--- Comment From mauri...@br.ibm.com 2015-01-27 12:55 EDT---
The error happened because the chroot was mounted over NFSv4 and the
NFSv4 server had an incorrect domain name configuration.
As a result, the NFSv4 idmapd could not match 'localdomain' (server) with
'cluster.com' (client), so the chfn binary (and others) showed up as
owned by nobody/nogroup. Combined with the suid bit on that binary, this
made the kernel deny it during the PAM/audit check (the failure occurs
right after the socket/sendto/recvfrom syscalls from PAM to kernel
audit).
Solution was to configure the domain name correctly on the server.
Possible workarounds were:
- Use NFSv3 (which has no Name-ID Mapping / idmapd)
- Clear the suid bit
More detailed description, from e-mail:
The problem happened because the 'chfn' binary had the suid bit set and was
not owned by root (actually, nobody/nogroup), so the kernel audit refused it
during the PAM/auth step (the PAM error follows right after the
socket/sendto/recvfrom syscalls for kernel audit).
That ownership mistake only exists on the NFS mount/client (on tulgpu002).
It is correct (root/root) on the NFS server (bgxcat).
That happened due to a misconfiguration in the NFSv4 rpc.idmapd on the bgxcat
server: bgxcat had no FQDN (fully-qualified domain name) configured, so the
NFSv4 idmapd didn't allow bgxcat user 'root' to be 'root' on tulgpu002,
because of a mismatch between their domains ('localdomain' on bgxcat,
'cluster.com' on tulgpu002).
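
To make the mismatch above concrete, here is a simplified model in Python.
It is only a sketch, not the actual rpc.idmapd/libnfsidmap logic: the
function name map_owner, the case-insensitive domain comparison and the
local_users set are assumptions made for illustration.

```python
def map_owner(wire_name, client_domain, local_users, nobody="nobody"):
    """Return the local user a client would show for an NFSv4 owner string.

    wire_name is the "user@domain" string sent by the server.
    Assumption: domains are compared case-insensitively.
    """
    user, sep, domain = wire_name.rpartition("@")
    if not sep or domain.lower() != client_domain.lower():
        # Domain mismatch (or no domain at all): fall back to the nobody user.
        return nobody
    # Domains agree: the name maps only if the user exists locally.
    return user if user in local_users else nobody


if __name__ == "__main__":
    users = {"root", "games"}
    # bgxcat without an FQDN announces 'localdomain'; tulgpu002 uses 'cluster.com'.
    print(map_owner("root@localdomain", "cluster.com", users))
    # After fixing the server's domain, both sides agree.
    print(map_owner("root@cluster.com", "cluster.com", users))
```

The first call models the broken setup (root on bgxcat shows up as nobody on
tulgpu002), the second the setup after the domain fix.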
For a solution, either fixing the network/domain configuration in bgxcat, or
using NFSv3, works. I have already performed the former for you, and
validated the latter.
[root@bgxcat mauricfo]# cat /etc/sysconfig/network
NETWORKING=yes
#HOSTNAME=bgxcat
HOSTNAME=bgxcat.cluster.com
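
As a rough illustration of why the HOSTNAME change above matters, the Python
sketch below derives a domain from the HOSTNAME value. The helper names are
hypothetical, and the 'localdomain' fallback simply mirrors what was observed
on bgxcat; the real lookup done by rpc.idmapd (Domain setting in
/etc/idmapd.conf, DNS) is more involved.

```python
def hostname_from_sysconfig(text):
    """Return the first uncommented HOSTNAME= value, or None if absent."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("HOSTNAME="):
            return line.split("=", 1)[1].strip().strip('"')
    return None


def domain_from_hostname(hostname, fallback="localdomain"):
    """Strip the host part of an FQDN; use the fallback if there is no domain."""
    _host, dot, domain = hostname.strip().partition(".")
    return domain if dot and domain else fallback


if __name__ == "__main__":
    config = "NETWORKING=yes\n#HOSTNAME=bgxcat\nHOSTNAME=bgxcat.cluster.com\n"
    name = hostname_from_sysconfig(config)
    print(name, "->", domain_from_hostname(name))
```

With the old value (bgxcat) this model yields 'localdomain'; with the new
value it yields 'cluster.com', matching the client.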
For those interested, more technical details / demonstration are
provided below.
I'm happy to extend the discussion if anyone has questions/comments.
Demonstrating problem/workaround w/ NFSv4 ID Mapping (misconfiguration on
bgxcat server)
Trying in tulgpu002 (/install mounted over NFSv4)
root@tulgpu002:~/mauricio# mount | grep /install
bgxcat:/install on /install type nfs (rw,vers=4,addr=10.0.0.1,clientaddr=10.0.0.7)
root@tulgpu002:~/mauricio# chroot /install/netboot/ubuntu14.10/ppc64el/tulgpu-0001-netboot-compute/rootimg /usr/bin/chfn -f 'games user' games
chfn: PAM: System error
The chfn binary has the suid bit set, but uid/gid are NOT root
(nobody/nogroup). This leads to the problem.
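
The same condition (suid bit set, owner not root) can be checked across a
whole image tree. The following Python sketch is one way to do that; the
function names and the expected_uid parameter are assumptions made for
illustration, and a setuid file owned by a non-root user can be legitimate
in some images, so hits are candidates to review, not proven errors.

```python
import os
import stat


def is_suspicious(mode, uid, expected_uid=0):
    """True for a regular file with the setuid bit whose owner is not expected_uid."""
    return stat.S_ISREG(mode) and bool(mode & stat.S_ISUID) and uid != expected_uid


def find_suspicious(root, expected_uid=0):
    """Walk root and return sorted paths of setuid files not owned by expected_uid."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks out of the tree
            if is_suspicious(st.st_mode, st.st_uid, expected_uid):
                hits.append(path)
    return hits if not hits else sorted(hits)


if __name__ == "__main__":
    import sys
    for path in find_suspicious(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

Run against the rootimg directory on the NFS client, a scan like this would
be expected to list chfn and the other affected binaries while the ID
mapping is broken.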
root@tulgpu002:~/mauricio# ls -lh /install/netboot/ubuntu14.10/ppc64el/tulgpu-0001-netboot-compute/rootimg/usr/bin/chfn
-rwsr-sr-x 1 nobody nogroup 53K Jul 18 2014 /install/netboot/ubuntu14.10/ppc64el/tulgpu-0001-netboot-compute/rootimg/usr/bin/chfn
On bgxcat (the source of /install), all is fine: suid is set, and the
uid/gid are root. No problems there.
[root@bgxcat mauricfo]# ls -lh /install/netboot/ubuntu14.10/ppc64el/tulgpu-0001-netboot-compute/rootimg/usr/bin/chfn
-rwsr-sr-x 1 root root 53K Jul 18 2014 /install/netboot/ubuntu14.10/ppc64el/tulgpu-0001-netboot-compute/rootimg/usr/bin/chfn
The owner/group changes because of NFSv4 ID Mapping (rpc.idmapd): names it
cannot map fall back to the Nobody-User/Nobody-Group settings.
root@tulgpu002:~/mauricio# grep ^No /etc/idmapd.conf
Nobody-User = nobody
Nobody-Group = nogroup
# mount | grep /install
bgxcat:/install on /install type nfs (rw,vers=4,addr=10.0.0.1,clientaddr=10.0.0.7)
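
For checking the relevant idmapd.conf settings on several nodes, a small
Python sketch like the one below may help. It assumes the file is INI-style
with [General] and [Mapping] sections; the SAMPLE text is hypothetical apart
from the two Nobody-* lines shown above, and read_idmapd_settings is an
illustrative name.

```python
import configparser

# Hypothetical sample; only the Nobody-* lines come from the output above.
SAMPLE = """
[General]
# Domain = localdomain

[Mapping]
Nobody-User = nobody
Nobody-Group = nogroup
"""


def read_idmapd_settings(text):
    """Extract the domain and nobody mappings; None means 'not set in the file'."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        "domain": parser.get("General", "Domain", fallback=None),
        "nobody_user": parser.get("Mapping", "Nobody-User", fallback=None),
        "nobody_group": parser.get("Mapping", "Nobody-Group", fallback=None),
    }


if __name__ == "__main__":
    print(read_idmapd_settings(SAMPLE))
```

A domain of None here means the file leaves the domain unset, which is the
case where the host's own domain name configuration decides the outcome.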
If you retry with NFSv3, which has no ID Mapping, it works.
root@tulgpu002:~/mauricio# umount /install
root@tulgpu002:~/mauricio# mount -t nfs -o vers=3 bgxcat:/install /install
root@tulgpu002:~/mauricio# mount | grep /install
bgxcat:/install on /install type nfs (rw,vers=3,addr=10.0.0.1)
The user/group show up as root.
root@tulgpu002:~/mauricio# ls -lh /install/netboot/ubuntu14.10/ppc64el/tulgpu-0001-netboot-compute/rootimg/usr/bin/chfn
-rwsr-sr-x 1 root root 53K Jul 18 2014 /install/netboot/ubuntu14.10/ppc64el/tulgpu-0001-netboot-compute/rootimg/usr/bin/chfn
And chroot chfn passes.
root@tulgpu002:~/mauricio# chroot /install/netboot/ubuntu14.10/ppc64el/tulgpu-0001-netboot-compute/rootimg /usr/bin/chfn -f 'games user' games
root@tulgpu002:~/mauricio#
Go back to NFSv4, and you'll see the problem.
root@tulgpu002:~/mauricio# umount /install
root@tulgpu002:~/mauricio# mount -t nfs -o vers=4 bgxcat:/install /install
root@tulgpu002:~/mauricio# mount | grep /install
bgxcat:/install on /install type nfs (rw,vers=4,addr=10.0.0.1,clientaddr=10.0.0.7)
User/group are not root anymore.
root@tulgpu002:~/mauricio# ls -lh /install/netboot/ubuntu14.10/ppc64el/tulgpu-0001-netboot-compute/rootimg/usr/bin/chfn
-rwsr-sr-x 1 nobody nogroup 53K Jul 18 2014 /install/netboot/ubuntu14.10/ppc64el/tulgpu-0001-netboot-compute/rootimg/usr/bin/chfn
root@tulgpu002:~/mauricio# chroot
/install/netboot/ubuntu14.10/ppc64el/tulgpu-0001-netboot-compute/rootimg
/usr/bin/ch