[SSSD-users] Re: [External] Re: 'automount -m' segfaults when using sssd

Prentice Bisbal Wed, 25 Jan 2023 08:34:58 -0800

(difficult to confirm without coredump/backtraces).

Would a stack trace of automount or the sssd daemon be sufficient? If so, which sssd daemon should I trace: sssd_be, sssd_autofs, sssd?


Here's what I see when I do an strace of 'automount -m':

newfstatat(AT_FDCWD, "/etc/nsswitch.conf", {st_mode=S_IFREG|0644, st_size=2980, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/nsswitch.conf", {st_mode=S_IFREG|0644, st_size=2980, ...}, 0) = 0

openat(AT_FDCWD, "/etc/group", O_RDONLY|O_CLOEXEC) = 7

newfstatat(7, "", {st_mode=S_IFREG|0644, st_size=653, ...}, AT_EMPTY_PATH) = 0

lseek(7, 0, SEEK_SET)                   = 0
read(7, "root:x:0:\nbin:x:1:\ndaemon:x:2:\ns"..., 4096) = 653
read(7, "", 4096)                       = 0
close(7)                                = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x8} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
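
Since the run ends in "core dumped", the backtrace asked for earlier in the thread could probably be taken from that core rather than from strace. A minimal sketch, assuming systemd-coredump captured the crash and that coredumpctl and gdb are installed; the function name, the /tmp output path, and the /usr/sbin/automount binary path are illustrative assumptions, not something confirmed in this thread:

```shell
# Sketch: extract the newest automount core from systemd-coredump and
# print a backtrace of every thread. Assumes coredumpctl and gdb exist.
automount_backtrace() {
    core="${1:-/tmp/automount.core}"   # hypothetical output path
    for tool in coredumpctl gdb; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing tool: $tool" >&2
            return 1
        }
    done
    # Write the most recent core whose command name is "automount" to $core.
    coredumpctl --output="$core" dump automount || return 1
    # Batch mode: print all thread backtraces with locals, then exit.
    # The binary path is an assumption; adjust it if automount lives elsewhere.
    gdb -batch -ex 'thread apply all bt full' /usr/sbin/automount "$core"
}
```

Run it as root with `automount_backtrace`. Without the debuginfo packages for autofs and the SSSD client libraries, the frames would likely show only addresses.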

Here's what I see when I attach strace to sssd_autofs:

recvfrom(18, "", 1536, 0, NULL, NULL)   = 0
write(0, "(2023-01-25 10:52:46): [autofs] "..., 85) = 85
getpid()                                = 1348
epoll_ctl(3, EPOLL_CTL_DEL, 18, 0x7fff6d209c8c) = 0
close(18)                               = 0
write(0, "(2023-01-25 10:52:46): [autofs] "..., 107) = 107
getpid()                                = 1348
epoll_wait(3, 0x7fff6d209e0c, 1, 2171)  = -1 EINTR (Interrupted system call)

--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_int=1165744160, si_ptr=0x7efc457bd820} ---
rt_sigreturn({mask=[INT FPE USR1 USR2 PIPE]}) = -1 EINTR (Interrupted system call)

getpid()                                = 1348
epoll_wait(3, [], 1, 420)

and from an strace of sssd_nss:

write(0, "(2023-01-25 10:54:12): [nss] [cl"..., 83) = 83
getpid()                                = 1345
epoll_ctl(3, EPOLL_CTL_DEL, 22, 0x7ffdca2b446c) = 0
close(22)                               = 0
write(0, "(2023-01-25 10:54:12): [nss] [cl"..., 105) = 105
getpid()                                = 1345
epoll_wait(3, 0x7ffdca2b45ec, 1, 5949)  = -1 EINTR (Interrupted system call)

--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_int=959572000, si_ptr=0x7fb93931e820} ---
rt_sigreturn({mask=[INT FPE USR1 USR2 PIPE]}) = -1 EINTR (Interrupted system call)

getpid()                                = 1345
epoll_wait(3, [], 1, 414)               = 0
getpid()                                = 1345
epoll_wait(3,

And from sssd_be:

write(0, "(2023-01-25 11:11:23): [be[defau"..., 100) = 100
getpid()                                = 1251

epoll_wait(3, [{events=EPOLLIN, data={u32=2661562720, u64=94882784098656}}], 1, 5775) = 1
recvmsg(17, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\2\1\1\0\0\0\0A\0\0\0\0\0\0\6\1s\0\v\0\0\0sssd.aut"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 48
recvmsg(17, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
sendmsg(21, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\2\1\1\0\0\0\0A\0\0\0<\0\0\0\6\1s\0\v\0\0\0sssd.aut"..., iov_len=80}, {iov_base="", iov_len=0}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 80

getpid()                                = 1251
epoll_wait(3, 0x7ffd39a53ccc, 1, 5775)  = -1 EINTR (Interrupted system call)

--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_int=-186955744, si_ptr=0x7fa6f4db4820} ---

rt_sigreturn({mask=[INT FPE PIPE]})     = -1 EINTR (Interrupted system call)
getpid()                                = 1251
epoll_wait(3, [], 1, 872)               = 0
getpid()                                = 1251

I tried to compile the trivial reproducer from the GitHub Issue you linked to, but I'm getting an error when I try to compile it:


# gcc sssd_tester.c -o sssd_tester -lsss_nss_idmap -lpthread -ldl
sssd_tester.c: In function ‘thread’:

sssd_tester.c:12:5: warning: implicit declaration of function ‘sss_getpwnam’; did you mean ‘getpwnam’? [-Wimplicit-function-declaration]

   12 |     sss_getpwnam("test", &res, buff, sizeof(buff), &errnop);
      |     ^~~~~~~~~~~~
      |     getpwnam

It looks like sss_getpwnam isn't defined in any of the header files in /usr/include, or I just can't find what package provides it.
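
One way to check where a name like this is declared or exported is to search the installed headers and then the dynamic symbol table of the SSSD NSS module. A rough sketch; the helper names are made up for illustration, and the library path in the usage note is an assumption for an x86_64 EL9 system:

```shell
# Sketch: look for an identifier in installed headers, then in a
# library's exported dynamic symbols. Both helper names are hypothetical.
find_decl() {     # usage: find_decl NAME [INCLUDE_DIR]
    # -r recurse, -l print file names only, -w match NAME as a whole word
    grep -rlw -- "$1" "${2:-/usr/include}" 2>/dev/null
}

find_export() {   # usage: find_export NAME LIBRARY
    # List defined dynamic symbols and keep those containing NAME.
    nm -D --defined-only "$2" 2>/dev/null | grep -- "$1"
}
```

`find_decl sss_getpwnam` prints any header under /usr/include that mentions the name, and `find_export getpwnam /usr/lib64/libnss_sss.so.2` lists matching exported symbols. NSS modules conventionally export entry points named like `_nss_sss_getpwnam_r`, so the reproducer's `sss_getpwnam` may be a local wrapper or alias for that; this is a guess, not something the thread confirms.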


Prentice

On 1/25/23 4:30 AM, Alexey Tikhonov wrote:

Hi,

might be https://github.com/SSSD/sssd/issues/6505 / https://bugzilla.redhat.com/show_bug.cgi?id=2143159

(difficult to confirm without coredump/backtraces).

Should be fixed in C9S/9.2

(https://composes.stream.centos.org/development/latest-CentOS-Stream/compose/BaseOS/x86_64/os/Packages/sssd-2.8.2-2.el9.x86_64.rpm...)




On Tue, Jan 24, 2023 at 10:13 PM Prentice Bisbal <[email protected]> wrote:

    sssd-users,

    I'm having an odd problem with automount that seems to be specific to
    using sssd for autofs. The operating system is Springdale Open
    Enterprise Linux 9.1, which is a rebuild of RHEL maintained by
    Princeton University, so it should be 100% bug compatible with RHEL
    (and Rocky).

    My configuration is using Kerberos for auth, and LDAP directory
    services. When my nsswitch.conf entry for automount looks like this:

    automount: sss files

    Testing the configuration of automount/sssd with 'automount -m'
    leads to a segfault:

    # automount -m

    autofs dump map information
    ===========================

    global options: none configured

    Mount point: /p

    source(s):
    Segmentation fault (core dumped)

    If I change nsswitch.conf to use files only or use ldap like this:

    automount: files

    or this:

    automount: files ldap

    Everything works as expected. LDAP searches using ldapsearch work
    just fine, and using getent to get user and group information (which
    is stored in LDAP) works just fine. I've increased the debugging
    levels for the relevant SSSD daemons:

    # egrep '^\[|debug_level' /etc/sssd/sssd.conf
    [domain/PPPL]
    debug_level = 8
    [sssd]
    debug_level = 8
    [nss]
    debug_level = 8
    [pam]
    [autofs]
    debug_level = 8

    Looking in the related log files the logs for my default domain show
    that it is getting information from the LDAP directory, and then it
    fails saying it can't contact the LDAP server:

    (2023-01-24 16:01:31): [be[default]] [sysdb_set_entry_attr] (0x0200): [RID#6] Entry [name=/lldap:ou\3Dauto.local\,ou\3Dmounts\,dc\3Dunix\,dc\3Dpppl\,dc\3Dgov,name=auto.master,cn=autofsmaps,cn=custom,cn=default,cn=sysdb] has set [cache] attrs.
    (2023-01-24 16:01:31): [be[default]] [sysdb_entry_attrs_diff] (0x0400): [RID#6] Entry [name=/pfsldap:ou\3Dauto.pfs\,ou\3Dmounts\,dc\3Dunix\,dc\3Dpppl\,dc\3Dgov,name=auto.master,cn=autofsmaps,cn=custom,cn=default,cn=sysdb] differs, reason: ts_cache doesn't trace this type of entry.
    (2023-01-24 16:01:31): [be[default]] [sysdb_set_entry_attr] (0x0200): [RID#6] Entry [name=/pfsldap:ou\3Dauto.pfs\,ou\3Dmounts\,dc\3Dunix\,dc\3Dpppl\,dc\3Dgov,name=auto.master,cn=autofsmaps,cn=custom,cn=default,cn=sysdb] has set [cache] attrs.
    (2023-01-24 16:01:31): [be[default]] [fo_resolve_service_send] (0x0100): [RID#6] Trying to resolve service 'LDAP'
    (2023-01-24 16:01:31): [be[default]] [get_server_status] (0x1000): [RID#6] Status of server 'host-a.pppl.gov' is 'working'
    (2023-01-24 16:01:31): [be[default]] [get_port_status] (0x1000): [RID#6] Port status of port 389 for server 'host-a.pppl.gov' is 'not working'

    Not only do the earlier log file entries show that sssd_be is actually
    getting data from LDAP before it reports an error, but I can run
    queries from this machine to our LDAP server with 'ldapsearch', and
    all the other computers in our environment, which are running CentOS 7
    or Rocky 8, work using the same configuration files.
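
To see when the port flipped to 'not working', it may help to pull only the failover lines out of the domain log. A small sketch; the function name is invented here, and the log path in the usage note assumes SSSD's usual sssd_<domain>.log naming for the [domain/PPPL] section:

```shell
# Sketch: print only the failover-related lines from an SSSD backend log.
# The function name is hypothetical; the patterns are the log function
# names that appear in the excerpt above.
failover_status() {   # usage: failover_status LOGFILE
    grep -E -- '\[(fo_resolve_service_send|get_server_status|get_port_status)\]' "$1"
}
```

For example, `failover_status /var/log/sssd/sssd_PPPL.log | tail -n 20` would show the most recent server and port status decisions, which can then be lined up against the timestamps of the segfault.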

--Prentice

    _______________________________________________
    sssd-users mailing list -- [email protected]
    To unsubscribe send an email to
    [email protected]
    Fedora Code of Conduct:
    https://docs.fedoraproject.org/en-US/project/code-of-conduct/
    List Guidelines:
    https://fedoraproject.org/wiki/Mailing_list_guidelines
    List Archives:
    
https://lists.fedorahosted.org/archives/list/[email protected]
    Do not reply to spam, report it:
    https://pagure.io/fedora-infrastructure/new_issue

