Package: libc6
Severity: important


-- System Information:
Debian Release: 8.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.16.0-4-amd64 (SMP w/64 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)


We are seeing intermittent failures to map UIDs to usernames:

** Note: joebob and bigcomputer are fabricated names

        joebob@bigcomputer:~$ whoami
        joebob
        joebob@bigcomputer:~$ whoami
        whoami: cannot find name for user ID 1234
        joebob@bigcomputer:~$ whoami
        whoami: cannot find name for user ID 1234
        joebob@bigcomputer:~$ whoami
        joebob
        joebob@bigcomputer:~$ whoami
        joebob

but
        ypcat passwd.byname | grep joebob

consistently works properly.

        ypcat passwd.byname | wc -l

always returns the same value (well above 1,000, as I expect, and never
zero ;) ).  So NIS itself appears to be functioning correctly.


However,

        I have no name!@bigcomputer:~$ while getent passwd | wc -l; do sleep 1; done
        2462
        2462
        2462
        2462
        2462
        1377
        85
        36
        36
        36
        36
        36
        36
        50

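So 'getent passwd' intermittently fails to return the full passwd database
(the "I have no name!" prompt above is the shell hitting the same failed
UID lookup).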
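
To make the drop in entry count easier to line up with system load, each
sample could be logged with a timestamp and the 1-minute load average. This
is only a minimal sketch: the helper name sample_passwd_count is
hypothetical, and reading /proc/loadavg assumes Linux.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): print one line of the form
#   <epoch seconds> <1-minute load average> <number of lines returned>
# The enumeration command is passed as arguments so it can be swapped out.
sample_passwd_count() {
    # Count the lines the command prints; strip any padding from wc.
    count=$("$@" | wc -l | tr -d ' ')
    # Assumption: Linux /proc/loadavg; fall back to "?" if it is unreadable.
    load=$(cut -d' ' -f1 /proc/loadavg 2>/dev/null || echo "?")
    printf '%s %s %s\n' "$(date +%s)" "$load" "$count"
}

# Example use (commented out so sourcing this file does not loop forever):
#   while sample_passwd_count getent passwd; do sleep 1; done
```

Redirecting that loop to a file would give a record that can be compared
against load after the fact.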


/etc/nsswitch.conf contains:

        passwd:         compat
        group:          compat nis  *
        netgroup:       nis
* this has been in our systems for years due to bug 584914

I have also tried the following, without success:

        passwd:  compat nis  (to see whether bug 584914 is related)
        passwd:  files nis
        passwd:  compat [NOTFOUND=continue] compat [NOTFOUND=continue] compat [NOTFOUND=continue] compat

The last one is because libnss_nis appears to return NOTFOUND rather than
UNAVAIL, so I was hoping to force multiple retries, but I am not sure that
line actually does what I intended.
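
A retry can also be attempted outside nsswitch.conf, in whatever script
needs the lookup. The following is a minimal sketch of that idea only; the
helper name retry_lookup is hypothetical, and it works around the symptom
rather than fixing the cause.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to MAX times, stopping at the
# first success. Returns 0 on success, 1 if every attempt failed.
retry_lookup() {
    max=$1; shift
    n=1
    while [ "$n" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
    done
    return 1
}

# Example use (commented out):
#   retry_lookup 5 getent passwd joebob
```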


Also:

   getent -s nis passwd joebob
   getent -s compat passwd joebob

both show the same intermittent failure/success (so it is not just
libnss_compat here).

When it fails, 'getent' returns exit status 2 ("One or more supplied key
could not be found in the database.").
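
To put a number on how often the lookup fails, the exit statuses of
repeated runs can be tallied. A minimal sketch, assuming a hypothetical
helper named tally_exit_codes; the lookup command is passed in as
arguments.

```shell
#!/bin/sh
# Hypothetical helper: run a command ATTEMPTS times, discard its output,
# and print how many times each exit status occurred.
tally_exit_codes() {
    attempts=$1; shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        "$@" >/dev/null 2>&1
        echo "$?"
        i=$((i + 1))
    done | sort -n | uniq -c | awk '{ printf "status %s: %s\n", $2, $1 }'
}

# Example use (commented out):
#   tally_exit_codes 100 getent -s nis passwd joebob
```

Running it once with '-s nis' and once with '-s compat' would show whether
both services fail at a similar rate.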


So, it seems to me that the common component here is libnss_nis.so.

The machine this is running on is a rather beefy server:

Dell PowerEdge R820
256GB RAM
4 x Intel(R) Xeon(R) CPU E5-4640 0 @ 2.40GHz


The problem appears when the system is heavily loaded, so it looks like a
race condition somewhere.
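
One way to probe this might be to start many lookups at the same moment
and count how many fail. This is a sketch under assumptions, not a
confirmed reproducer: the helper name count_parallel_failures is
hypothetical, and concurrent lookups are only a stand-in for the real
load on the machine.

```shell
#!/bin/sh
# Hypothetical reproduction aid: start JOBS copies of a command in the
# background, wait for all of them, and print how many exited non-zero.
count_parallel_failures() {
    jobs_wanted=$1; shift
    tmp=$(mktemp) || return 1
    i=0
    while [ "$i" -lt "$jobs_wanted" ]; do
        # Each background job appends one line only if its command fails.
        ( "$@" >/dev/null 2>&1 || echo fail >>"$tmp" ) &
        i=$((i + 1))
    done
    wait
    failures=$(wc -l <"$tmp" | tr -d ' ')
    rm -f "$tmp"
    echo "$failures"
}

# Example use (commented out):
#   count_parallel_failures 64 getent -s nis passwd joebob
```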

I may not be able to do much testing, as the system is being urgently
reconfigured to remove its NIS dependency, but let me know if you need any
further information/testing.

btw, 'nscd' is NOT running. In other bug reports about these
libnss_nis/compat issues I see lots of folks saying 'nscd' made no
difference, so I didn't test it.


thanks,
--stephen
