Thanks Jakub. I deleted /var/lib/sss/db and restarted sssd, but that still
did not fix it: id user returned "Not found" and user ids still show as
nobody. I set debug_level = 9 and found the following error in sssd_nss.log:

...
(Wed Apr 13 03:09:46 2016) [sssd[nss]] [sss_dp_get_reply] (0x1000): Got
reply from Data Provider - DP error code: 1 errno: 11 error message: Fast
reply - offline
(Wed Apr 13 03:09:46 2016) [sssd[nss]] [nss_cmd_getby_dp_callback]
(0x0040): Unable to get information from Data Provider
Error: 1, 11, Fast reply - offline
...

What does that mean? I checked with "service sssd status", which showed sssd
running fine, and I have just run ldapsearch, which returned all the correct
information.
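
To see how often this happens on a broken node, the "Fast reply - offline" entries can be picked out of the log with a short script. This is only a minimal sketch: it assumes each message occupies a single line in the real file (the excerpt above was wrapped by the mail client), that the log lives at the usual /var/log/sssd/sssd_nss.log location, and the names offline_replies and scan_file are made up for illustration.

```python
import re

# Pattern for the responder message quoted above. Assumption: in the real
# log file the whole message sits on one line.
DP_REPLY = re.compile(
    r"^\((?P<timestamp>[^)]*)\) \[sssd\[nss\]\] \[sss_dp_get_reply\] "
    r"\(0x[0-9a-f]+\): Got reply from Data Provider - "
    r"DP error code: (?P<dp_error>\d+) errno: (?P<errno>\d+) "
    r"error message: (?P<message>.*)$"
)


def offline_replies(lines):
    """Yield (timestamp, dp_error, errno) for each reply whose message mentions 'offline'."""
    for line in lines:
        match = DP_REPLY.match(line.rstrip("\n"))
        if match and "offline" in match.group("message").lower():
            yield (
                match.group("timestamp"),
                int(match.group("dp_error")),
                int(match.group("errno")),
            )


def scan_file(path="/var/log/sssd/sssd_nss.log"):
    """Return all offline replies found in the log at `path` (path is an assumption)."""
    with open(path, errors="replace") as log:
        return list(offline_replies(log))
```

Calling scan_file() on an affected node should list the timestamps at which the NSS responder was told the back end was offline, which can then be compared against the times when id failed.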

Thank you very much.

Kind regards,

- h


On Wed, Apr 13, 2016 at 5:30 PM, Jakub Hrozek <jhro...@redhat.com> wrote:

> On Wed, Apr 13, 2016 at 10:52:15AM +1000, jupiter wrote:
> > Hi Jakub,
> >
> > Thanks for your response, please see following embedded comments.
> >
> > On Tue, Apr 12, 2016 at 6:24 PM, Jakub Hrozek <jhro...@redhat.com> wrote:
> >
> > > On Tue, Apr 12, 2016 at 11:03:47AM +1000, jupiter wrote:
> > > > Hi,
> > > >
> > > > We are running sssd version 1.12.4-47 on CentOS 6. It works fine in
> > > > general, but from time to time, some nodes listed all user ids with
> > > > "nobody",
> > >
> > > Was this problem happening only on an NFS share..?
> > >
> >
> > I don't think it is an NFS issue, it is an SSS issue.
> >
> > >
> > > > calling id username immediately returned "No such user",
> > >
> > > Hmm, I guess not, this sounds like a generic issue, if even id
> > > couldn't find the user.
> > >
> >
> > The user is fine, I can run "id username" in another healthy node without
> > any problems.
> >
> > >
> > > > it looks like
> > > > id went to the cache and did not contact LDAP.
> > >
> > > Please note that if the user was looked up at least once before, then
> > > even if SSSD couldn't contact the server for one reason or another, it
> > > should have returned entries from the cache.
> > >
> >
> > Once again, the user id is fine; we can verify that from other healthy
> > nodes. Besides, once the node was fixed by adding debug_level = 6,
> > everything was back to normal.
> >
> >
> > > >
> > > > On one occasion, I added debug_level = 6 to sssd.conf and restarted
> > > > sssd; the "nobody" was gone and id username returned the correct
> > > > LDAP user id. It did not make any sense to me how adding a
> > > > debug_level could fix the problem.
> > >
> > > I suspect it was actually the restart, because the restart might cause
> > > sssd to reconnect to servers and operate online.
> > >
> >
> > But prior to that change, I had restarted sssd a dozen times and nothing
> > fixed it until I set debug_level = 6, which fixed the issue, but it did
> > not make any sense to me.
> >
> > >
> > > What you can do, if for some reason running with debugging enabled all
> > > the time is not practical, is use the sss_debuglevel tool to bump
> > > debugging on the fly.
> > >
> > > But at any rate, we need to see the sssd logs to proceed.
> > >
> >
> > The error in the log file was: nss_getpwnam: name 'dhpec' not found in
> > domain 'hpc.org'. It seems to me sssd simply got information from the
> > invalid cache, not from LDAP.
> >
> > >
> > > > I suspect the sssd cache, but I cannot explain it, since all the
> > > > default cache settings are only some seconds, yet once a node gets
> > > > caught in this problem it can sit for many days with uids shown as
> > > > nobody and id returning no such user.
> > > >
> > > > After searching the Internet, I found a suggestion that running
> > > > sss_cache -E to invalidate all cached entries would solve the
> > > > problem. I tried it; it did not work.
> > >
> > > Well, if sssd was offline at that time, then invalidating the cache
> > > wouldn't help, because sssd wouldn't have anywhere to fetch the data
> > > from.
> > >
> >
> > I checked the sssd process before running sss_cache -E; sssd was always
> > online. My question is, how do you verify that the cache has been
> > cleaned? Or do you simply delete /var/lib/sss/db?
>
> sss_cache just expires user entries. If you want to really remove the
> cache (careful, though..) then yes, at the moment you need to remove the
> .ldb files under /var/lib/sss/db.
>
> The next version of sss_cache should also have an option to really
> delete the cache.
> _______________________________________________
> sssd-users mailing list
> sssd-users@lists.fedorahosted.org
>
> https://lists.fedorahosted.org/admin/lists/sssd-users@lists.fedorahosted.org
>