Howard Chu wrote:
After looking over slapd.1.log I believe I understand the problem now.
On slapd.1 the connection in question did this:
conn=7 fd=20 ACCEPT from IP=127.0.0.1:36431 (IP=127.0.0.1:9011)
connection_get(20)
connection_get(20): got connid=7
connection_read(20): checking for input on id=7
ber_get_next
ber_get_next: tag 0x30 len 46 contents:
ber_get_next
do_bind
...
connection_get(20)
connection_get(20): got connid=7
connection_read(20): checking for input on id=7
ber_get_next
ber_get_next: tag 0x30 len 46 contents:
connection_input: conn=7 deferring operation: binding
ber_get_next
ber_get_next: tag 0x30 len 46 contents:
connection_input: conn=7 deferring operation: pending operations
ber_get_next
...
ber_get_next on fd 20 failed errno=11 (Resource temporarily unavailable)
do_bind: version=3 dn="cn=Manager,dc=example,dc=com" method=128
conn=7 op=0 BIND dn="cn=Manager,dc=example,dc=com" method=128
==> hdb_bind: dn: cn=Manager,dc=example,dc=com
conn=7 op=0 BIND dn="cn=Manager,dc=example,dc=com" mech=SIMPLE ssf=0
do_bind: v3 bind: "cn=Manager,dc=example,dc=com" to
"cn=Manager,dc=example,dc=co
m"
send_ldap_result: conn=7 op=0 p=3
send_ldap_result: err=0 matched="" text=""
send_ldap_response: msgid=1 tag=97 err=0
ber_flush: 14 bytes to sd 20
conn=7 op=0 RESULT tag=97 err=0 text=
conn=8 fd=21 ACCEPT from IP=127.0.0.1:36432 (IP=127.0.0.1:9011)
do_bind
conn=7 op=2 BIND anonymous mech=implicit ssf=0
What's important to note here is that it skips from op=0 to op=2 and
never responds to op=1. By the time everything hangs, there are 15
pending operations on this connection but nothing is making progress any
more. I believe what has happened here is that op=1 got implicitly
abandoned once slapd recognized that op=2 was a Bind request. (This
occurs in connection_input(), before the operation gets queued.)
I think a correct fix might be to put off the abandon step until the
Bind request actually gets dequeued.
Also I'll note that this whole situation occurs due to back-ldap reusing
an existing connection. When I wrote the connection-sharing code in
back-ldap, I specifically wrote it to only reuse a connection if the
userID was the same, so it should never actually be sending multiple
Bind requests on the same connection. A lot of that has changed with
idassert support etc., so I suspect that there's really a bug in
back-ldap that needs to be fixed. But we should still consider when is
the correct time for an incoming Bind to trigger the abandons.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/