Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD
On Sat, February 4, 2012 14:58, Stephen Gallagher wrote:
On Fri, 2012-02-03 at 12:53 +0100, Sigbjorn Lie wrote:
On Wed, February 1, 2012 15:04, Simo Sorce wrote:
On Wed, 2012-02-01 at 07:28 -0500, Stephen Gallagher wrote:
On Wed, 2012-02-01 at 11:02 +0100, Sigbjorn Lie wrote:

Hi,

Is this more like the expected output? :)

No, I'm afraid it's not. That's a log of a legitimate shutdown, not a segmentation fault. (Receiving SIGTERM means that the monitor told the process to exit.) Possibly this happened because the time between attaching to the process and typing 'cont' was more than about 30 seconds. The monitor will assume the sssd_be process isn't responding and will kill and restart it.

You will know you got the correct results if you see

Program received signal SIGSEGV, Segmentation fault.

and then you can immediately perform the 'bt full'.

For better results with gdb I suggest to kill -SIGSTOP the monitor before attaching gdb to any of the responders or the providers; this way the monitor will be prevented from sending termination signals to the children. However, don't do this for long, only for short periods, and kill -SIGCONT the monitor immediately after.

Please see below. Does this help?

Yes, thank you it does.

(gdb) bt full
#0  sysdb_attrs_get_el_int (attrs=0x6c616d726f6e2d72, name=0x43c75d "name", alloc=true, el=0x7fffe9e0dab8) at src/db/sysdb.c:254
        e = <value optimized out>
        i = <value optimized out>
#1  0x004221d7 in sysdb_attrs_primary_name (sysdb=0xf725e00, attrs=0x6c616d726f6e2d72, ldap_attr=0xf741110 "cn", _primary=0x7fffe9e0db58) at src/db/sysdb.c:2441

The memory address for attrs here is WAY out of range. That suggests that this is an uninitialized value.

        ret = <value optimized out>
        rdn_attr = 0x0
        rdn_val = 0x0
        sysdb_name_el = 0x61
        orig_dn_el = <value optimized out>
        i = <value optimized out>
        tmpctx = 0xf768ce0
        __FUNCTION__ = "sysdb_attrs_primary_name"
#2  0x0042290d in sysdb_attrs_primary_name_list (sysdb=0xf725e00, mem_ctx=<value optimized out>, attr_list=0xf772e20, attr_count=2, ldap_attr=0xf741110 "cn", name_list=0x7fffe9e0dc88) at src/db/sysdb.c:2606
        ret = 259427552
        i = 1

i = 1, so it's the second entry in the attr_list being passed in. My spidey-sense is tingling here. Probably the array is one entry too long above.

        j = 1
        list = <value optimized out>
        name = 0xf769580 "ac_server-normal"
        __FUNCTION__ = "sysdb_attrs_primary_name_list"
#3  0x2b20c9684456 in sdap_initgr_nested_get_membership_diff (state=0xf7726f0) at src/providers/ldap/sdap_async_accounts.c:3061
        __FUNCTION__ = "sdap_initgr_nested_get_membership_diff"

This is the function that is creating that array (well, actually it's sdap_initgr_nested_get_direct_parents()). So the bug must be occurring here. We're somehow creating an array of two entries but not populating the second one.

That said, I'm not sure how that's possible. The code there is very short and seems pretty carefully written to avoid this possibility.

I don't have time today to dig into this any further, but I wanted to get my findings down in an email so that if anyone else wanted to jump on this before I get back to it, they don't have to start from scratch.

Hi,

Any progress on this?

Regards,
Siggi

___ Freeipa-users mailing list Freeipa-users@redhat.com https://www.redhat.com/mailman/listinfo/freeipa-users
Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD
On 02/04/2012 02:58 PM, Stephen Gallagher wrote:
[...]

Is there anything further I can do to help troubleshoot this issue?

I have opened a case (case id 00594772) and referenced this thread, as this issue occurred at a paying customer's site.

Regards,
Siggi
Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD
Hi,

Is this more like the expected output? :)

#0  0x0039a1ad4ce3 in __epoll_wait_nocancel () from /lib64/libc.so.6
No symbol table info available.
#1  0x0039a2e0514e in ?? () from /usr/lib64/libtevent.so.0
No symbol table info available.
#2  0x0039a2e02690 in _tevent_loop_once () from /usr/lib64/libtevent.so.0
No symbol table info available.
#3  0x0039a2e026fb in ?? () from /usr/lib64/libtevent.so.0
No symbol table info available.
#4  0x00435771 in server_loop (main_ctx=0x16211b10) at src/util/server.c:526
No locals.
#5  0x0040ef2f in main (argc=6, argv=0x7fff66702d88) at src/providers/data_provider_be.c:1333
        opt = <value optimized out>
        pc = 0x1620f5e0
        be_domain = 0x1620f9c0 "dns.local"
        srv_name = <value optimized out>
        conf_entry = <value optimized out>
        main_ctx = 0x16211b10
        ret = <value optimized out>
        long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x6496a0, val = 0, descrip = 0x43a7d2 "Help options:", argDescrip = 0x0}, {longName = 0x43a7e0 "debug-level", shortName = 100 'd', argInfo = 2, arg = 0x649778, val = 0, descrip = 0x43a7b1 "Debug level", argDescrip = 0x0}, {longName = 0x43a7ec "debug-to-files", shortName = 102 'f', argInfo = 0, arg = 0x64977c, val = 0, descrip = 0x43b448 "Send the debug output to files instead of stderr", argDescrip = 0x0}, {longName = 0x43a7fb "debug-timestamps", shortName = 0 '\000', argInfo = 2, arg = 0x649680, val = 0, descrip = 0x43a7bd "Add debug timestamps", argDescrip = 0x0}, {longName = 0x43bd90 "domain", shortName = 0 '\000', argInfo = 1, arg = 0x7fff66702c48, val = 0, descrip = 0x43b480 "Domain of the information provider (mandatory)", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 0, descrip = 0x0, argDescrip = 0x0}}
        __FUNCTION__ = "main"
(gdb) cont
Continuing.

Program received signal SIGTERM, Terminated.
0x0039a1ad4ce3 in __epoll_wait_nocancel () from /lib64/libc.so.6

Rgds,
Siggi

On Tue, January 31, 2012 13:40, Stephen Gallagher wrote:
[...]
Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD
On Wed, 2012-02-01 at 11:02 +0100, Sigbjorn Lie wrote:

Hi,

Is this more like the expected output? :)

No, I'm afraid it's not. That's a log of a legitimate shutdown, not a segmentation fault. (Receiving SIGTERM means that the monitor told the process to exit.) Possibly this happened because the time between attaching to the process and typing 'cont' was more than about 30 seconds. The monitor will assume the sssd_be process isn't responding and will kill and restart it.

You will know you got the correct results if you see

Program received signal SIGSEGV, Segmentation fault.

and then you can immediately perform the 'bt full'.
Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD
On Tue, 2012-01-31 at 13:35 +0100, Sigbjorn Lie wrote:

Ok, please see below for the output from gdb.

I notice that it's not happening every time. All this morning I could unlock without any issues. Around lunchtime the issue started occurring again, but the number of times I have to restart sssd before I can successfully unlock my desktop is different each time.

warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff3fbfd000
0x2b104670cce3 in __epoll_wait_nocancel () from /lib64/libc.so.6
(gdb) cont
Continuing.
Detaching after fork from child process 22008.
Detaching after fork from child process 23608.
Detaching after fork from child process 28122.
Detaching after fork from child process 32315.

Program received signal SIGSEGV, Segmentation fault.
sysdb_attrs_get_el_int (attrs=0x6c616d726f6e2d72, name=0x43c75d "name", alloc=true, el=0x7fff3fafbb18) at src/db/sysdb.c:254
254         for (i = 0; i < attrs->num; i++) {
(gdb)
Continuing.

Don't do continue here. This is where we needed the 'bt full'. Once you continue from here, it just exits and we lose the state. Please rerun this test.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) bt full
No stack.
(gdb)

Regards,
Siggi
Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD
On Mon, 2012-01-30 at 16:01 +0100, Sigbjorn Lie wrote:

Hi,

I'm doing a pre-implementation project for a customer that has RHEL 5.7 workstations with KDE as their window manager.

When using KDE on a RHEL 5.7 (or 5.8 BETA) workstation connected to an IPA 2.1.3 server running on RHEL 6.2, sssd crashes every time I attempt to unlock the screen. To work around the issue I switch to tty1, log in as root, and restart sssd. After attempting this several times (2-5 times), I can finally unlock the screen.

I have attempted to update one workstation to 5.8 BETA to see if the issue was resolved there. No such luck.

Is this a known issue?

The log displays the following:

Jan 30 15:49:16 svg118 kdesktop_lock: on 0
Jan 30 15:49:21 svg118 kernel: sssd_be[9873] general protection rip:41dc3d rsp:7fffc57c9f10 error:0
Jan 30 15:49:22 svg118 sssd[be[no.ep.corp.local]]: Starting up
Jan 30 15:49:33 svg118 sssd[be[no.ep.corp.local]]: Shutting down
Jan 30 15:49:33 svg118 sssd[pam]: Shutting down
Jan 30 15:49:33 svg118 kcheckpass[9896]: Authentication failure for username (invoked by uid 12345)
Jan 30 15:49:33 svg118 sssd[nss]: Shutting down
Jan 30 15:49:33 svg118 sssd: Starting up
Jan 30 15:49:34 svg118 sssd[be[no.ep.corp.local]]: Starting up
Jan 30 15:49:34 svg118 sssd[nss]: Starting up
Jan 30 15:49:34 svg118 sssd[pam]: Starting up
Jan 30 15:49:42 svg118 kernel: sssd_be[9928] general protection rip:41dc3d rsp:7fff70baba70 error:0
Jan 30 15:49:43 svg118 sssd[be[no.ep.corp.local]]: Starting up
Jan 30 15:49:52 svg118 sssd[be[no.ep.corp.local]]: Shutting down
Jan 30 15:49:52 svg118 sssd[pam]: Shutting down
Jan 30 15:49:52 svg118 kcheckpass[9933]: Authentication failure for username (invoked by uid 12345)
Jan 30 15:49:52 svg118 sssd[nss]: Shutting down
Jan 30 15:49:52 svg118 sssd: Starting up
Jan 30 15:49:52 svg118 sssd[be[no.ep.corp.local]]: Starting up
Jan 30 15:49:52 svg118 sssd[pam]: Starting up
Jan 30 15:49:52 svg118 sssd[nss]: Starting up
Jan 30 15:49:59 svg118 kernel: sssd_be[9985] general protection rip:41dc3d rsp:7fff40912260 error:0

Definitely not a known issue. Do you think you could attach gdb to the sssd_be process and try to get a backtrace for me to look at, please?
Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD
Sure. I've left the office for today; I will do so tomorrow. I'm not very familiar with gdb. Any particular syntax / switches to add?

Rgds,
Siggi.

Stephen Gallagher sgall...@redhat.com wrote:
[...]
Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD
On Mon, 2012-01-30 at 18:00 +0100, Sigbjorn Lie wrote:

Sure. I've left the office for today; I will do so tomorrow. I'm not very familiar with gdb. Any particular syntax / switches to add?

Rgds,
Siggi.

You'll want to do this in a non-graphical terminal, so you can switch to it if KDE gets into trouble.

First, install the sssd debuginfo packages (debuginfo-install sssd) and install gdb (yum install gdb).

Then run:

gdb -p $(pidof sssd_be)

Then at the gdb prompt, type 'cont' (this will resume execution of sssd_be).

Now, switch back to KDE and unlock the screen. Then switch back to this virtual terminal. You should be back at the prompt, with gdb telling you that you received a SIGSEGV or SIGABRT. Type 'bt full' and reply with all pages of output from that (there may be several, requiring you to hit Enter to see them).
Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD
Excellent, thank you. I will get this done tomorrow.

Stephen Gallagher sgall...@redhat.com wrote:
[...]