Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD

2012-02-08 Thread Sigbjorn Lie

Hi,

Any progress on this?


Regards,
Siggi



___
Freeipa-users mailing list
Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users


Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD

2012-02-04 Thread Sigbjorn Lie

On 02/04/2012 02:58 PM, Stephen Gallagher wrote:

On Fri, 2012-02-03 at 12:53 +0100, Sigbjorn Lie wrote:

On Wed, February 1, 2012 15:04, Simo Sorce wrote:

On Wed, 2012-02-01 at 07:28 -0500, Stephen Gallagher wrote:


On Wed, 2012-02-01 at 11:02 +0100, Sigbjorn Lie wrote:


Hi,


Is this more like the expected output? :)



No, I'm afraid it's not. That's a log of a legitimate shutdown, not a
segmentation fault. (Receiving SIGTERM means that the monitor told the
process to exit).

Possibly this happened if the time between attaching to the process and
typing 'cont' was more than about 30 seconds. The monitor will assume the
sssd_be process isn't responding and will kill and restart it.

You will know you got the correct results if you see

Program received signal SIGSEGV, Segmentation fault.

and then you can immediately run 'bt full'.

For better results with gdb, I suggest sending SIGSTOP to the monitor (the
main sssd process, e.g. kill -STOP <pid>) before attaching gdb to any of the
responders or providers; this way the monitor is prevented from sending
termination signals to the children. However, don't do this for long, only
for short periods, and send SIGCONT (kill -CONT <pid>) to resume the monitor
immediately afterwards.



Please see below. Does this help?

Yes, thank you, it does.



(gdb) bt full
#0  sysdb_attrs_get_el_int (attrs=0x6c616d726f6e2d72, name=0x43c75d "name",
    alloc=true, el=0x7fffe9e0dab8) at src/db/sysdb.c:254
        e = <value optimized out>
        i = <value optimized out>
#1  0x004221d7 in sysdb_attrs_primary_name (sysdb=0xf725e00,
    attrs=0x6c616d726f6e2d72, ldap_attr=0xf741110 "cn",

The memory address for attrs here is WAY out of range. That suggests
that this is an uninitialized value.


    _primary=0x7fffe9e0db58) at src/db/sysdb.c:2441
        ret = <value optimized out>
        rdn_attr = 0x0
        rdn_val = 0x0
        sysdb_name_el = 0x61
        orig_dn_el = <value optimized out>
        i = <value optimized out>
        tmpctx = 0xf768ce0
        __FUNCTION__ = "sysdb_attrs_primary_name"
#2  0x0042290d in sysdb_attrs_primary_name_list (sysdb=0xf725e00,
    mem_ctx=<value optimized out>, attr_list=0xf772e20, attr_count=2,
    ldap_attr=0xf741110 "cn", name_list=0x7fffe9e0dc88) at src/db/sysdb.c:2606
        ret = 259427552
        i = 1

i = 1, so it's the second entry in the attr_list being passed in. My
spidey-sense is tingling here. Probably the array is one entry too long
above.


        j = 1
        list = <value optimized out>
        name = 0xf769580 "ac_server-normal"
        __FUNCTION__ = "sysdb_attrs_primary_name_list"
#3  0x2b20c9684456 in sdap_initgr_nested_get_membership_diff (
    state=0xf7726f0) at src/providers/ldap/sdap_async_accounts.c:3061
        __FUNCTION__ = "sdap_initgr_nested_get_membership_diff"


This is the function that is creating that array (well, actually it's
sdap_initgr_nested_get_direct_parents()). So the bug must be occurring
here. We're somehow creating an array of two entries but not populating
the second one.

That said, I'm not sure how that's possible. The code there is very
short and seems pretty carefully-written to avoid this possibility.

I don't have time today to dig into this any further, but I wanted to
get my findings down in an email so that if anyone else wanted to jump
on this before I get back to it, they don't have to start from scratch.


Is there anything further I can do to help out troubleshooting this issue?

I have opened a case (case ID 00594772) and referenced this thread, as
this issue occurred at a paying customer's site.



Regards,
Siggi



Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD

2012-02-01 Thread Sigbjorn Lie
Hi,

Is this more like the expected output? :)



#0  0x0039a1ad4ce3 in __epoll_wait_nocancel () from /lib64/libc.so.6
No symbol table info available.
#1  0x0039a2e0514e in ?? () from /usr/lib64/libtevent.so.0
No symbol table info available.
#2  0x0039a2e02690 in _tevent_loop_once () from /usr/lib64/libtevent.so.0
No symbol table info available.
#3  0x0039a2e026fb in ?? () from /usr/lib64/libtevent.so.0
No symbol table info available.
#4  0x00435771 in server_loop (main_ctx=0x16211b10)
    at src/util/server.c:526
No locals.
#5  0x0040ef2f in main (argc=6, argv=0x7fff66702d88)
    at src/providers/data_provider_be.c:1333
        opt = <value optimized out>
        pc = 0x1620f5e0
        be_domain = 0x1620f9c0 "dns.local"
        srv_name = <value optimized out>
        conf_entry = <value optimized out>
        main_ctx = 0x16211b10
        ret = <value optimized out>
        long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4,
            arg = 0x6496a0, val = 0, descrip = 0x43a7d2 "Help options:",
            argDescrip = 0x0}, {longName = 0x43a7e0 "debug-level",
            shortName = 100 'd', argInfo = 2, arg = 0x649778, val = 0,
            descrip = 0x43a7b1 "Debug level", argDescrip = 0x0}, {
            longName = 0x43a7ec "debug-to-files", shortName = 102 'f',
            argInfo = 0, arg = 0x64977c, val = 0,
            descrip = 0x43b448 "Send the debug output to files instead of stderr",
            argDescrip = 0x0}, {longName = 0x43a7fb "debug-timestamps",
            shortName = 0 '\000', argInfo = 2, arg = 0x649680, val = 0,
            descrip = 0x43a7bd "Add debug timestamps", argDescrip = 0x0}, {
            longName = 0x43bd90 "domain", shortName = 0 '\000', argInfo = 1,
            arg = 0x7fff66702c48, val = 0,
            descrip = 0x43b480 "Domain of the information provider (mandatory)",
            argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0,
            arg = 0x0, val = 0, descrip = 0x0, argDescrip = 0x0}}
        __FUNCTION__ = "main"

(gdb) cont
Continuing.

Program received signal SIGTERM, Terminated.
0x0039a1ad4ce3 in __epoll_wait_nocancel () from /lib64/libc.so.6


Rgds,
Siggi




Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD

2012-02-01 Thread Stephen Gallagher
On Wed, 2012-02-01 at 11:02 +0100, Sigbjorn Lie wrote:
 Hi,
 
 Is this more like the expected output? :)
 

No, I'm afraid it's not. That's a log of a legitimate shutdown, not a
segmentation fault. (Receiving SIGTERM means that the monitor told the
process to exit).

Possibly this happened if the time between attaching to the process and
typing 'cont' was more than about 30 seconds. The monitor will assume
the sssd_be process isn't responding and will kill and restart it.

You will know you got the correct results if you see

Program received signal SIGSEGV, Segmentation fault.

and then you can immediately run 'bt full'.



Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD

2012-01-31 Thread Stephen Gallagher
On Tue, 2012-01-31 at 13:35 +0100, Sigbjorn Lie wrote:
 
 
 Ok, please see below for the output from gdb.
 
 I notice that it's not happening every time. All this morning I could unlock
 without any issues. Around lunchtime the issue started occurring again, and
 the number of times I have to restart sssd before I can successfully unlock
 my desktop is different each time.
 
 
 
 warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff3fbfd000
 0x2b104670cce3 in __epoll_wait_nocancel () from /lib64/libc.so.6
 (gdb) cont
 Continuing.
 
 Detaching after fork from child process 22008.
 Detaching after fork from child process 23608.
 Detaching after fork from child process 28122.
 Detaching after fork from child process 32315.
 
 Program received signal SIGSEGV, Segmentation fault.
 sysdb_attrs_get_el_int (attrs=0x6c616d726f6e2d72, name=0x43c75d "name",
     alloc=true, el=0x7fff3fafbb18) at src/db/sysdb.c:254
 254         for (i = 0; i < attrs->num; i++) {
 (gdb)
 Continuing.

Don't do continue here. This is where we needed the 'bt full'. Once
you continue from here, it just exits and we lose the state.

Please rerun this test.

 
 Program terminated with signal SIGSEGV, Segmentation fault.
 The program no longer exists.
 (gdb) bt full
 No stack.
 (gdb)
 
 
 Regards,
 Siggi
 
 





Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD

2012-01-30 Thread Stephen Gallagher
On Mon, 2012-01-30 at 16:01 +0100, Sigbjorn Lie wrote:
 Hi,
 
 I'm doing a pre-implementation project for a customer that has RHEL 5.7
 workstations with KDE as their window manager.
 
 When using KDE on a RHEL 5.7 (or 5.8 BETA) workstation connected to an IPA
 2.1.3 server running on RHEL 6.2, sssd will crash every time I attempt to
 unlock the screen.
 
 To work around the issue I switch to tty1, log in as root, and restart sssd.
 After doing this several times (2-5 times), I can finally unlock the screen.
 I have tried updating one workstation to 5.8 beta to see if the issue was
 resolved there. No such luck.
 
 Is this a known issue?
 
 
 The log displays the following:
 
 Jan 30 15:49:16 svg118 kdesktop_lock: on 0
 Jan 30 15:49:21 svg118 kernel: sssd_be[9873] general protection rip:41dc3d rsp:7fffc57c9f10 error:0
 Jan 30 15:49:22 svg118 sssd[be[no.ep.corp.local]]: Starting up
 Jan 30 15:49:33 svg118 sssd[be[no.ep.corp.local]]: Shutting down
 Jan 30 15:49:33 svg118 sssd[pam]: Shutting down
 Jan 30 15:49:33 svg118 kcheckpass[9896]: Authentication failure for username (invoked by uid 12345)
 Jan 30 15:49:33 svg118 sssd[nss]: Shutting down
 Jan 30 15:49:33 svg118 sssd: Starting up
 Jan 30 15:49:34 svg118 sssd[be[no.ep.corp.local]]: Starting up
 Jan 30 15:49:34 svg118 sssd[nss]: Starting up
 Jan 30 15:49:34 svg118 sssd[pam]: Starting up
 Jan 30 15:49:42 svg118 kernel: sssd_be[9928] general protection rip:41dc3d rsp:7fff70baba70 error:0
 Jan 30 15:49:43 svg118 sssd[be[no.ep.corp.local]]: Starting up
 Jan 30 15:49:52 svg118 sssd[be[no.ep.corp.local]]: Shutting down
 Jan 30 15:49:52 svg118 sssd[pam]: Shutting down
 Jan 30 15:49:52 svg118 kcheckpass[9933]: Authentication failure for username (invoked by uid 12345)
 Jan 30 15:49:52 svg118 sssd[nss]: Shutting down
 Jan 30 15:49:52 svg118 sssd: Starting up
 Jan 30 15:49:52 svg118 sssd[be[no.ep.corp.local]]: Starting up
 Jan 30 15:49:52 svg118 sssd[pam]: Starting up
 Jan 30 15:49:52 svg118 sssd[nss]: Starting up
 Jan 30 15:49:59 svg118 kernel: sssd_be[9985] general protection rip:41dc3d rsp:7fff40912260 error:0
 

Definitely not a known issue. Do you think you could attach gdb to the
sssd_be process and try to get a backtrace for me to look at, please?



Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD

2012-01-30 Thread Sigbjorn Lie
Sure. I've left the office for today, will do so tomorrow.

I'm not very familiar with gdb. Any particular syntax / switches to add?

Rgds,
Siggi.
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.


Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD

2012-01-30 Thread Stephen Gallagher
On Mon, 2012-01-30 at 18:00 +0100, Sigbjorn Lie wrote:
 Sure. I've left the office for today, will do so tomorrow.
 
 I'm not very familiar with gdb. Any particular syntax / switches to
 add?
 
 Rgds,
 Siggi.

You'll want to do this in a non-graphical terminal, so you can switch to
it if KDE gets into trouble.

First, install the sssd-debuginfo packages (debuginfo-install sssd)
and install gdb (yum install gdb).

Then run:
gdb -p $(pidof sssd_be)

Then at the gdb prompt, type 'cont' (this will resume execution of
sssd_be).

Now, switch back to KDE and unlock the screen. Then switch back to this
virtual terminal.

You should be back at the prompt, with gdb telling you that the program
received a SIGSEGV or SIGABRT.

Type 'bt full' and reply with all pages of output from that (there may
be multiple, requiring you to hit Enter to see them).



Re: [Freeipa-users] RHEL 5.7 / 5.8 BETA and KDE crashing SSSD

2012-01-30 Thread Sigbjorn Lie
Excellent, thank you. I will get this done tomorrow.