Timo Aaltonen wrote:
> On Thu, 20 Jan 2011, Howard Chu wrote:
>
>> [email protected] wrote:
>>> Hi
>>>
>>> Here's some information that Stephen asked would be of use. There is
>>> one forest, one domain, but three sites in the layout. The functional
>>> level of the forest and the domain is W2008, but the servers have 2008R2.
>>>
>>> And the full backtrace of the hung process:
>>
>>> #3 0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock
>>> (mutex=0x7f8f6553fc80)
>>> at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296
>>> No locals.
>>> #4 0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20,
>>> dn=0x0,
>>> mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0,
>>> flags=2,
>>> interact=0x7f8f61405120<sdap_sasl_interact>, defaults=0x2124a50) at
>>> sasl.c:426
>>> rc = -1921681294
>>> smechs = 0x0
>>
>> This particular mutex seems kind of bogus to me; the code is from rev 1.31 in
>> June 2001. Perhaps back then it was unsafe to have multiple SASL operations
>> outstanding at once; I would expect that was only an issue in the Cyrus 1.5
>> days and it should be safe now with Cyrus 2.x. We should probably just delete
>> this mutex.
>
> Ok, so by doing this:
>
> --- openldap-2.4.23.orig/libraries/libldap/sasl.c
> +++ openldap-2.4.23/libraries/libldap/sasl.c
> @@ -421,10 +421,11 @@
> {
> int rc;
> char *smechs = NULL;
> -
> +/*
> #if defined( LDAP_R_COMPILE )&& defined( HAVE_CYRUS_SASL )
> ldap_pvt_thread_mutex_lock(&ldap_int_sasl_mutex );
> #endif
> +*/
> #ifdef LDAP_CONNECTIONLESS
> if( LDAP_IS_UDP(ld) ) {
> /* Just force it to simple bind, silly to make the user
>
> --
>
> .. the process doesn't hang anymore. But it still doesn't do what it's
> supposed to, but that could be a bug in SSSD. I'll investigate further.
>
> Thanks!
>

As I noted in a previous followup, it's not clear to me that the Cyrus SASL library is actually safe to use without that mutex.

Also, going through your provided backtraces, I see the real issue is that two different requests were active at the same time. I.e., there was an active request that triggered a referral, and an unrelated request. You would also have avoided this issue if you waited for the request that triggered the referrals to complete before issuing any other requests.
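
To illustrate why overlapping requests can hang here, the following is a minimal sketch that does not call libldap or Cyrus SASL at all. It assumes one reading of the backtrace: the thread that already holds the process-wide SASL lock ends up in a second bind (for example while chasing a referral for the other request) and blocks on the same lock. The names demo_sasl_mutex, demo_init and demo_bind are hypothetical, and the mutex is created as error-checking so the re-entry reports EDEADLK rather than hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a process-wide lock such as ldap_int_sasl_mutex. */
static pthread_mutex_t demo_sasl_mutex;

/* Create the lock as an error-checking mutex, so that a second lock attempt
 * by the owning thread fails with EDEADLK instead of blocking. */
static int demo_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&demo_sasl_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical bind. With reenter != 0 it starts a second bind while the
 * first still holds the lock, modelling two overlapping requests.
 * Returns 0 on success, or the error code from the failed lock attempt. */
static int demo_bind(int reenter)
{
    int rc = pthread_mutex_lock(&demo_sasl_mutex);
    if (rc != 0)
        return rc;              /* lock is already held by this thread */

    int inner = reenter ? demo_bind(0) : 0;

    rc = pthread_mutex_unlock(&demo_sasl_mutex);
    assert(rc == 0);
    return inner;
}
```

After demo_init(), calling demo_bind(0) repeatedly models the serialized case, where each request completes before the next one starts and the lock is always free. demo_bind(1) models the overlapping case and returns the lock error; with a default, non-checking mutex that call would be expected to block much like frame #3 above.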

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
