[jira] [Created] (HADOOP-13653) ZKDelegationTokenSecretManager curator client seems to rapidly connect & disconnect from ZK

2016-09-25 Thread Alex Ivanov (JIRA)
Alex Ivanov created HADOOP-13653:


 Summary: ZKDelegationTokenSecretManager curator client seems to 
rapidly connect & disconnect from ZK
 Key: HADOOP-13653
 URL: https://issues.apache.org/jira/browse/HADOOP-13653
 Project: Hadoop Common
  Issue Type: Bug
  Components: kms
Reporter: Alex Ivanov
Priority: Critical


During periods of time, KMS gets in a connect/disconnect loop from Zookeeper. 
It is not clear what causes the connection to be closed. I didn't see any 
issues on the ZK server side, so the issue must reside on client side.

{{Example errors}}
NOTE: I had to filter the logs heavily since they were many GB in size. What is 
left is an illustration of the delegation token creations, and the Zookeeper 
sessions getting lost and re-established over the course of 2 hours.
{code}
2016-09-25 01:43:04,377 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [75027a21ab399aa7789d6907d70fadc4, 46]
2016-09-25 01:43:04,557 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [1106d0754d43dcf29324d7be737f51f0, 46]
2016-09-25 01:43:11,846 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [4426092c861f49c6ba0c60b49b9539e5, 46]
2016-09-25 01:43:48,974 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [a99efff2705d6489deb059098f18818f, 46]
2016-09-25 01:43:49,174 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [398b5962fd647880961ba5e86a77b414, 46]
2016-09-25 01:44:03,359 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [413187e62a21b5459422b5c524315d06, 46]
2016-09-25 01:44:03,625 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [7cc2c0d82edd40e7e6f6f40af20d04d3, 46]
2016-09-25 01:44:06,062 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [bd9394fce20607c12bc00104bea49284, 46]
2016-09-25 01:44:07,134 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [7dad3bd10526517e5e1cfccd2e96074a, 46]
2016-09-25 01:44:07,230 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [a712ed40687580647d070c9c7f525e15, 46]
2016-09-25 01:44:48,481 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [44bfefa31192c68e3cc053eec4e57e14, 46]
2016-09-25 01:44:48,522 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [67efc2aa65eeba701ad7d3d7bab51def, 46]
2016-09-25 01:44:50,259 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [b43e641f58dfbd2c72550ab6804f37d1, 46]
2016-09-25 01:44:54,271 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [ac2fbcf404c633759b75e6d6aae00e05, 46]
2016-09-25 01:44:56,141 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [cdbd224079a4a10400d00d0b8eece008, 46]
2016-09-25 01:45:01,328 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [e03218f4835524f3d05519d27bb04e35, 46]
2016-09-25 01:45:02,728 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [569ae6d666d584b6843fffc47a63d147, 46]
2016-09-25 01:45:02,832 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [c9048271483da234c12f75569b9513c6, 46]
2016-09-25 01:45:05,536 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [f519d621389e41b63e8d92b4cb15f832, 46]
2016-09-25 01:45:07,886 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [45cf6ba58b2bb348ac5e88fa18fe9dad, 46]
2016-09-25 01:47:24,346 WARN  ConnectionState - Connection attempt unsuccessful 
after 66294 (greater than max timeout of 6). Resetting connection and 
trying again with a new connection.
2016-09-25 01:47:25,120 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [f160a865db69ef33548f146c9b3b84c6, 46]
2016-09-25 01:47:25,276 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [9d60add471464e01ef691c43bd901d96, 46]
2016-09-25 01:47:28,739 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [659ecabf02ff809736202a8484ff2be8, 46]
2016-09-25 01:48:33,233 WARN  ConnectionState - Connection attempt unsuccessful 
after 64494 (greater than max timeout of 6). Resetting connection and 
trying again with a new connection.
2016-09-25 01:48:33,306 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [15b87dd6c2251177d8db9dda415d0e06, 46]
2016-09-25 01:48:33,459 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: [306aa796017aab2b559bf503f81175e0, 46]
2016-09-25 01:48:34,665 INFO  AbstractDelegationTokenSecretManager - Creating 
password for identifier: 

[jira] [Created] (HADOOP-13652) ZKDelegationTokenSecretManager doesn't seem to honor ZK connection/session timeouts

2016-09-25 Thread Alex Ivanov (JIRA)
Alex Ivanov created HADOOP-13652:


 Summary: ZKDelegationTokenSecretManager doesn't seem to honor ZK 
connection/session timeouts
 Key: HADOOP-13652
 URL: https://issues.apache.org/jira/browse/HADOOP-13652
 Project: Hadoop Common
  Issue Type: Bug
  Components: kms
Reporter: Alex Ivanov


Looking at some of the errors I've seen due to Zookeeper connection issues from 
KMS, it doesn't seem like the following timeouts are picked up.
{code}
package org.apache.hadoop.security.token.delegation;

public abstract class ZKDelegationTokenSecretManager
extends AbstractDelegationTokenSecretManager {
  public static final int ZK_DTSM_ZK_SESSION_TIMEOUT_DEFAULT = 1;
  public static final int ZK_DTSM_ZK_CONNECTION_TIMEOUT_DEFAULT = 1;
...
}
{code}

Instead, the connection/session timeouts are, correspondingly, 15 & 60 secs: 
the curator defaults.
{code}
package org.apache.curator.framework;

public class CuratorFrameworkFactory
{
private static final int DEFAULT_SESSION_TIMEOUT_MS = 
Integer.getInteger("curator-default-session-timeout", 60 * 1000);
private static final int DEFAULT_CONNECTION_TIMEOUT_MS = 
Integer.getInteger("curator-default-connection-timeout", 15 * 1000);
...
}
{code}

It looks like DelegationTokenAuthenticationFilter is setting curator, and that 
may cause an issue:
{code}
package org.apache.hadoop.security.token.delegation.web;

public class DelegationTokenAuthenticationFilter
extends AuthenticationFilter {

  protected void initializeAuthHandler(String authHandlerClassName,
  FilterConfig filterConfig) throws ServletException {
ZKDelegationTokenSecretManager.setCurator((CuratorFramework)
filterConfig.getServletContext().getAttribute(ZKSignerSecretProvider.
ZOOKEEPER_SIGNER_SECRET_PROVIDER_CURATOR_CLIENT_ATTRIBUTE));
super.initializeAuthHandler(authHandlerClassName, filterConfig);
ZKDelegationTokenSecretManager.setCurator(null);
}
{code}

Example errors:
{code}
2016-09-25 01:46:33,053 ERROR ConnectionState - Connection timed out for 
connection string (host1, host2, host3) and timeout (15000) / elapsed (15001)
2016-09-25 01:46:33,053 ERROR ConnectionState - Connection timed out for 
connection string (host1, host2, host3) and timeout (15000) / elapsed (15001)
2016-09-25 01:46:34,028 ERROR ConnectionState - Connection timed out for 
connection string (host1, host2, host3) and timeout (15000) / elapsed (15976)
2016-09-25 01:46:34,053 ERROR ConnectionState - Connection timed out for 
connection string (host1, host2, host3) and timeout (15000) / elapsed (16001)
2016-09-25 01:46:37,053 ERROR ConnectionState - Connection timed out for 
connection string (host1, host2, host3) and timeout (15000) / elapsed (19001)
2016-09-25 01:46:40,053 ERROR ConnectionState - Connection timed out for 
connection string (host1, host2, host3) and timeout (15000) / elapsed (22001)
2016-09-25 01:46:49,055 ERROR ConnectionState - Connection timed out for 
connection string (host1, host2, host3) and timeout (15000) / elapsed (31003)
2016-09-25 01:46:52,029 ERROR ConnectionState - Connection timed out for 
connection string (host1, host2, host3) and timeout (15000) / elapsed (33977)
2016-09-25 01:47:05,344 ERROR ConnectionState - Connection timed out for 
connection string (host1, host2, host3) and timeout (15000) / elapsed (47292)
2016-09-25 01:47:09,345 ERROR ConnectionState - Connection timed out for 
connection string (host1, host2, host3) and timeout (15000) / elapsed (51292)
2016-09-25 01:47:24,346 WARN  ConnectionState - Connection attempt unsuccessful 
after 66294 (greater than max timeout of 6). Resetting connection and 
trying again with a new connection.
2016-09-25 01:47:43,740 ERROR ConnectionState - Connection timed out for 
connection string (host1, host2, host3) and timeout (15000) / elapsed (15001)
2016-09-25 01:47:43,740 ERROR ConnectionState - Connection timed out for 
connection string (host1, host2, host3) and timeout (15000) / elapsed (15001)
{code}

There are also some connections issues between KMS and Zookeeper. It is 
sporadic, that's why I'm still trying to pinpoint them, but essentially KMS can 
get into this perpetual connect/disconnect cycle from which it eventually 
recovers or a restart also helps. I'm mentioning this fact in case it is 
related to this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13487) Hadoop KMS doesn't clean up old delegation tokens stored in Zookeeper

2016-08-11 Thread Alex Ivanov (JIRA)
Alex Ivanov created HADOOP-13487:


 Summary: Hadoop KMS doesn't clean up old delegation tokens stored 
in Zookeeper
 Key: HADOOP-13487
 URL: https://issues.apache.org/jira/browse/HADOOP-13487
 Project: Hadoop Common
  Issue Type: Bug
  Components: kms
Affects Versions: 2.6.0
Reporter: Alex Ivanov


Configuration:
CDH 5.5.1 (Hadoop 2.6+)
KMS configured to store delegation tokens in Zookeeper
DEBUG logging enabled in /etc/hadoop-kms/conf/kms-log4j.properties

Findings:
It seems to me delegation tokens never get cleaned up from Zookeeper past their 
renewal date. I can see in the logs that the removal thread is started with the 
expected interval:
{code}
2016-08-11 08:15:24,511 INFO  AbstractDelegationTokenSecretManager - Starting 
expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
{code}
However, I don't see any delegation token removals, indicated by the following 
log message:
org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager --> 
removeStoredToken(TokenIdent ident), line 769 [CDH]
{code}
if (LOG.isDebugEnabled()) {
  LOG.debug("Removing ZKDTSMDelegationToken_"
  + ident.getSequenceNumber());
}
{code}
Meanwhile, I see a lot of expired delegation tokens in Zookeeper that don't get 
cleaned up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org