[jira] [Created] (HADOOP-13653) ZKDelegationTokenSecretManager curator client seems to rapidly connect & disconnect from ZK
Alex Ivanov created HADOOP-13653: Summary: ZKDelegationTokenSecretManager curator client seems to rapidly connect & disconnect from ZK Key: HADOOP-13653 URL: https://issues.apache.org/jira/browse/HADOOP-13653 Project: Hadoop Common Issue Type: Bug Components: kms Reporter: Alex Ivanov Priority: Critical During periods of time, KMS gets in a connect/disconnect loop from Zookeeper. It is not clear what causes the connection to be closed. I didn't see any issues on the ZK server side, so the issue must reside on client side. {{Example errors}} NOTE: I had to filter the logs heavily since they were many GB in size. What is left is an illustration of the delegation token creations, and the Zookeeper sessions getting lost and re-established over the course of 2 hours. {code} 2016-09-25 01:43:04,377 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [75027a21ab399aa7789d6907d70fadc4, 46] 2016-09-25 01:43:04,557 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [1106d0754d43dcf29324d7be737f51f0, 46] 2016-09-25 01:43:11,846 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [4426092c861f49c6ba0c60b49b9539e5, 46] 2016-09-25 01:43:48,974 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [a99efff2705d6489deb059098f18818f, 46] 2016-09-25 01:43:49,174 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [398b5962fd647880961ba5e86a77b414, 46] 2016-09-25 01:44:03,359 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [413187e62a21b5459422b5c524315d06, 46] 2016-09-25 01:44:03,625 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [7cc2c0d82edd40e7e6f6f40af20d04d3, 46] 2016-09-25 01:44:06,062 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [bd9394fce20607c12bc00104bea49284, 46] 2016-09-25 01:44:07,134 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [7dad3bd10526517e5e1cfccd2e96074a, 46] 2016-09-25 01:44:07,230 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [a712ed40687580647d070c9c7f525e15, 46] 2016-09-25 01:44:48,481 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [44bfefa31192c68e3cc053eec4e57e14, 46] 2016-09-25 01:44:48,522 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [67efc2aa65eeba701ad7d3d7bab51def, 46] 2016-09-25 01:44:50,259 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [b43e641f58dfbd2c72550ab6804f37d1, 46] 2016-09-25 01:44:54,271 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [ac2fbcf404c633759b75e6d6aae00e05, 46] 2016-09-25 01:44:56,141 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [cdbd224079a4a10400d00d0b8eece008, 46] 2016-09-25 01:45:01,328 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [e03218f4835524f3d05519d27bb04e35, 46] 2016-09-25 01:45:02,728 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [569ae6d666d584b6843fffc47a63d147, 46] 2016-09-25 01:45:02,832 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [c9048271483da234c12f75569b9513c6, 46] 2016-09-25 01:45:05,536 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [f519d621389e41b63e8d92b4cb15f832, 46] 2016-09-25 01:45:07,886 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [45cf6ba58b2bb348ac5e88fa18fe9dad, 46] 2016-09-25 01:47:24,346 WARN ConnectionState - Connection attempt unsuccessful after 66294 (greater than max timeout of 6). Resetting connection and trying again with a new connection. 2016-09-25 01:47:25,120 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [f160a865db69ef33548f146c9b3b84c6, 46] 2016-09-25 01:47:25,276 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [9d60add471464e01ef691c43bd901d96, 46] 2016-09-25 01:47:28,739 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [659ecabf02ff809736202a8484ff2be8, 46] 2016-09-25 01:48:33,233 WARN ConnectionState - Connection attempt unsuccessful after 64494 (greater than max timeout of 6). Resetting connection and trying again with a new connection. 2016-09-25 01:48:33,306 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [15b87dd6c2251177d8db9dda415d0e06, 46] 2016-09-25 01:48:33,459 INFO AbstractDelegationTokenSecretManager - Creating password for identifier: [306aa796017aab2b559bf503f81175e0, 46] 2016-09-25 01:48:34,665 INFO AbstractDelegationTokenSecretManager - Creating password for identifier:
[jira] [Created] (HADOOP-13652) ZKDelegationTokenSecretManager doesn't seem to honor ZK connection/session timeouts
Alex Ivanov created HADOOP-13652: Summary: ZKDelegationTokenSecretManager doesn't seem to honor ZK connection/session timeouts Key: HADOOP-13652 URL: https://issues.apache.org/jira/browse/HADOOP-13652 Project: Hadoop Common Issue Type: Bug Components: kms Reporter: Alex Ivanov Looking at some of the errors I've seen due to Zookeeper connection issues from KMS, it doesn't seem like the following timeouts are picked up. {code} package org.apache.hadoop.security.token.delegation; public abstract class ZKDelegationTokenSecretManager extends AbstractDelegationTokenSecretManager { public static final int ZK_DTSM_ZK_SESSION_TIMEOUT_DEFAULT = 1; public static final int ZK_DTSM_ZK_CONNECTION_TIMEOUT_DEFAULT = 1; ... } {code} Instead, the connection/session timeouts are, correspondingly, 15 & 60 secs: the curator defaults. {code} package org.apache.curator.framework; public class CuratorFrameworkFactory { private static final int DEFAULT_SESSION_TIMEOUT_MS = Integer.getInteger("curator-default-session-timeout", 60 * 1000); private static final int DEFAULT_CONNECTION_TIMEOUT_MS = Integer.getInteger("curator-default-connection-timeout", 15 * 1000); ... } {code} It looks like DelegationTokenAuthenticationFilter is setting curator, and that may cause an issue: {code} package org.apache.hadoop.security.token.delegation.web; public class DelegationTokenAuthenticationFilter extends AuthenticationFilter { protected void initializeAuthHandler(String authHandlerClassName, FilterConfig filterConfig) throws ServletException { ZKDelegationTokenSecretManager.setCurator((CuratorFramework) filterConfig.getServletContext().getAttribute(ZKSignerSecretProvider. ZOOKEEPER_SIGNER_SECRET_PROVIDER_CURATOR_CLIENT_ATTRIBUTE)); super.initializeAuthHandler(authHandlerClassName, filterConfig); ZKDelegationTokenSecretManager.setCurator(null); } {code} Example errors: {code} 2016-09-25 01:46:33,053 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (15001) 2016-09-25 01:46:33,053 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (15001) 2016-09-25 01:46:34,028 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (15976) 2016-09-25 01:46:34,053 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (16001) 2016-09-25 01:46:37,053 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (19001) 2016-09-25 01:46:40,053 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (22001) 2016-09-25 01:46:49,055 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (31003) 2016-09-25 01:46:52,029 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (33977) 2016-09-25 01:47:05,344 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (47292) 2016-09-25 01:47:09,345 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (51292) 2016-09-25 01:47:24,346 WARN ConnectionState - Connection attempt unsuccessful after 66294 (greater than max timeout of 6). Resetting connection and trying again with a new connection. 2016-09-25 01:47:43,740 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (15001) 2016-09-25 01:47:43,740 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (15001) {code} There are also some connections issues between KMS and Zookeeper. It is sporadic, that's why I'm still trying to pinpoint them, but essentially KMS can get into this perpetual connect/disconnect cycle from which it eventually recovers or a restart also helps. I'm mentioning this fact in case it is related to this jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-13487) Hadoop KMS doesn't clean up old delegation tokens stored in Zookeeper
Alex Ivanov created HADOOP-13487: Summary: Hadoop KMS doesn't clean up old delegation tokens stored in Zookeeper Key: HADOOP-13487 URL: https://issues.apache.org/jira/browse/HADOOP-13487 Project: Hadoop Common Issue Type: Bug Components: kms Affects Versions: 2.6.0 Reporter: Alex Ivanov Configuration: CDH 5.5.1 (Hadoop 2.6+) KMS configured to store delegation tokens in Zookeeper DEBUG logging enabled in /etc/hadoop-kms/conf/kms-log4j.properties Findings: It seems to me delegation tokens never get cleaned up from Zookeeper past their renewal date. I can see in the logs that the removal thread is started with the expected interval: {code} 2016-08-11 08:15:24,511 INFO AbstractDelegationTokenSecretManager - Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s) {code} However, I don't see any delegation token removals, indicated by the following log message: org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager --> removeStoredToken(TokenIdent ident), line 769 [CDH] {code} if (LOG.isDebugEnabled()) { LOG.debug("Removing ZKDTSMDelegationToken_" + ident.getSequenceNumber()); } {code} Meanwhile, I see a lot of expired delegation tokens in Zookeeper that don't get cleaned up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org