Hi Christopher,

On 2019-07-02 at 17:49, [ext] Osipov, Michael wrote:

[...]
>> During your ~1min stall, Tomcat is still waiting for data, right? When
>> the connection fails, Tomcat drops its error message at the same time,
>> right? Can you post a stack trace of what the Tomcat thread is doing
>> at that time? I assume it's blocked on a read of some kind.
>
> I need to check this with jstack. I'll get back to you as soon as possible.

So I checked this and was able to take the dump at the very moment the request stalled. To my disappointment, the offending thread was neither holding a lock nor waiting in read() on the native socket.

I have noticed this:
"http-apr-127.0.1.2-8081-exec-3" #33 daemon prio=5 os_prio=15 tid=0x0000000a68036800 nid=0x188be runnable [0x00007fffdd1cc000]
   java.lang.Thread.State: RUNNABLE
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    - locked <0x0000000965edc140> (a java.net.SocksSocketImpl)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at java.net.Socket.connect(Socket.java:538)
    at java.net.Socket.<init>(Socket.java:434)
    at java.net.Socket.<init>(Socket.java:211)
    at com.sun.jndi.ldap.Connection.createSocket(Connection.java:375)
    at com.sun.jndi.ldap.Connection.<init>(Connection.java:215)
    at com.sun.jndi.ldap.LdapClient.<init>(LdapClient.java:137)
    at com.sun.jndi.ldap.LdapClient.getInstance(LdapClient.java:1609)
    at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2749)
    at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:319)
    at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:199)
    at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:217)
    at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:195)
    at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:217)
    at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:156)
    at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:86)
    at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:684)
    at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313)
    at javax.naming.InitialContext.init(InitialContext.java:244)
    at javax.naming.InitialContext.<init>(InitialContext.java:216)
    at javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:101)
    at net.sf.michaelo.dirctxsrc.DirContextSource$GSSInitialDirContext.<init>(DirContextSource.java:115)
    at net.sf.michaelo.dirctxsrc.DirContextSource$1.run(DirContextSource.java:606)
    at net.sf.michaelo.dirctxsrc.DirContextSource$1.run(DirContextSource.java:583)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at net.sf.michaelo.dirctxsrc.DirContextSource.getGssApiDirContext(DirContextSource.java:583)
    at net.sf.michaelo.dirctxsrc.DirContextSource.getDirContext(DirContextSource.java:692)
    at net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.open(ActiveDirectoryRealm.java:321)
    at net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.getPrincipal(ActiveDirectoryRealm.java:268)
    at net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.authenticate(ActiveDirectoryRealm.java:255)
    at net.sf.michaelo.tomcat.authenticator.SpnegoAuthenticator.doAuthenticate(SpnegoAuthenticator.java:166)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:575)
    at org.apache.catalina.valves.rewrite.RewriteValve.invoke(RewriteValve.java:556)

We query Active Directory via LDAP with the user's Kerberos principal. As you can see, the thread is waiting for a socket to connect. No DCs are hardcoded; they are all retrieved via DNS SRV lookups for our AD site. The point here is that we have major trouble with two of the four DCs at our site, which do not respond properly for services like DNS, Kerberos, and LDAP (completely out of my department's control). I wrote a quick standalone reproducer to try those faulty DCs on ports 389 and 3268, and it confirmed my suspicion: they block the thread for more than a minute (the OS connect timeout).
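For anyone who wants to run a similar check, here is a minimal sketch of such a probe. It is not my actual reproducer; the class name ConnectProbe and the method probe are made up for illustration, and the host names are whatever you pass on the command line. It only uses java.net.Socket with an explicit connect timeout, so a dead DC shows up as a timeout instead of a one-minute stall.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of any library: probes TCP connectivity
// with an explicit connect timeout instead of the OS default.
class ConnectProbe {

    /** Returns true if a TCP connection could be established within timeoutMillis. */
    static boolean probe(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        boolean connected;
        try (Socket socket = new Socket()) {
            // Unlike new Socket(host, port), this call fails after
            // timeoutMillis instead of blocking until the OS gives up.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            connected = true;
        } catch (IOException e) {
            // Covers SocketTimeoutException, connection refused, unknown host, ...
            connected = false;
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("%s:%d connected=%b after %d ms%n",
                host, port, connected, elapsedMillis);
        return connected;
    }

    public static void main(String[] args) {
        // Pass the DC host names as arguments; 389 is LDAP, 3268 the global catalog.
        for (String host : args) {
            probe(host, 389, 1000);
            probe(host, 3268, 1000);
        }
    }
}
```

Run it with the DC host names returned by the SRV lookup as arguments; a healthy DC should connect within a few milliseconds, while a faulty one should report connected=false after roughly the timeout.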

Our countermeasures were to reduce the default connect timeout for InitialDirContext to 1000 ms and to query another local AD site that does not serve our subnet.
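In plain JNDI terms, the connect timeout is set through the environment property com.sun.jndi.ldap.connect.timeout of the JDK's LDAP provider, given as a string in milliseconds. The following is only a sketch of that idea, not the code from our realm; the class LdapTimeoutEnv, its build method, and the example URL in the comment are assumptions for illustration.

```java
import java.util.Hashtable;
import javax.naming.Context;

// Hypothetical helper: builds a JNDI environment with a bounded
// LDAP connect timeout for the JDK's built-in LDAP provider.
class LdapTimeoutEnv {

    static Hashtable<String, Object> build(String providerUrl, int connectTimeoutMillis) {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        // e.g. "ldap://dc.example.com:389" (placeholder host)
        env.put(Context.PROVIDER_URL, providerUrl);
        // The provider expects the timeout as a string holding milliseconds.
        env.put("com.sun.jndi.ldap.connect.timeout",
                Integer.toString(connectTimeoutMillis));
        return env;
    }
}
```

Passing the result of build(url, 1000) to new InitialDirContext(env) should make the connect attempt fail with a NamingException after about one second when the DC does not answer, rather than blocking the request thread for the OS connect timeout.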

So, thank you very much for giving me the right pointer to start!

One question remains, though: how do I properly size the ProxyTimeout parameter? Should it cover the longest possible request?

Regards,

Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org