[jira] [Comment Edited] (HDFS-15756) RBF: Cannot get updated delegation token from zookeeper

2021-04-16 Thread zhangxiping (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322640#comment-17322640
 ] 

zhangxiping edited comment on HDFS-15756 at 4/16/21, 7:05 AM:
--

[~hexiaoqiao] Thank you for your reply. Yes, I have the same question: I 
think Spark should not renew the token so soon. To work around the problem, we 
made the Router wait 100 ms and then retry fetching the recently generated 
token from ZK. The problem is relatively rare (it occurs dozens of times a 
day), so the retry does not affect Router performance, and the retried read 
does obtain the token from ZK. This is not a perfect solution, though, and it 
only circumvents the problem temporarily; I think it would be better to retry 
on the client side. I hope for a better solution.
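
The Router-side workaround described above can be sketched as a small retry helper. This is a minimal sketch under stated assumptions, not the patch that was actually deployed: `TokenLookupRetry` and `getWithRetry` are hypothetical names, and the `Supplier` stands in for whatever call reads the token from ZK and returns null on a miss.

```java
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not part of Hadoop. */
final class TokenLookupRetry {
  private TokenLookupRetry() {}

  /**
   * Calls the lookup once. While it returns null, sleeps retryDelayMs and
   * calls it again, for at most maxRetries additional attempts.
   * Returns the first non-null value, or null if every attempt missed.
   */
  static <T> T getWithRetry(Supplier<T> lookup, int maxRetries, long retryDelayMs) {
    T value = lookup.get();
    for (int attempt = 0; value == null && attempt < maxRetries; attempt++) {
      try {
        Thread.sleep(retryDelayMs);
      } catch (InterruptedException e) {
        // Preserve the interrupt status and give up on further retries.
        Thread.currentThread().interrupt();
        return null;
      }
      value = lookup.get();
    }
    return value;
  }
}
```

With `maxRetries = 1` and `retryDelayMs = 100` this matches the single 100 ms wait mentioned above. The delay only applies when the first read misses, which is why the cost stays small if misses are rare.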

 


was (Author: zhangxiping):
[~hexiaoqiao] Thank you for your reply. Yes, I have the same question: I 
think Spark should not renew the token so soon. To work around the problem, we 
made the Router wait 100 ms and then retry fetching the recently generated 
token from ZK. The problem is relatively rare (it occurs dozens of times a 
day), so the retry does not affect Router performance, and the retried read 
does obtain the token from ZK. This is not a perfect solution, though, and it 
only circumvents the problem temporarily; I think it would be better to retry 
on the client side. I hope for a better solution.

 
{code:java}
// Code placeholder: AbstractDelegationTokenSecretManager

protected synchronized byte[] createPassword(TokenIdent identifier) {
  int sequenceNum;
  long now = Time.now();
  sequenceNum = incrementDelegationTokenSeqNum();
  identifier.setIssueDate(now);
  identifier.setMaxDate(now + tokenMaxLifetime);
  identifier.setMasterKeyId(currentKey.getKeyId());
  identifier.setSequenceNumber(sequenceNum);
  LOG.info("Creating password for identifier: " + formatTokenId(identifier)
      + ", currentKey: " + currentKey.getKeyId());
  byte[] password = createPassword(identifier.getBytes(), currentKey.getKey());
  DelegationTokenInformation tokenInfo = new DelegationTokenInformation(now
      + tokenRenewInterval, password, getTrackingIdIfEnabled(identifier));
  try {
    // The token is written to the store (ZK for the Router) at creation time.
    storeToken(identifier, tokenInfo);
  } catch (IOException ioe) {
    LOG.error("Could not store token " + formatTokenId(identifier) + "!!",
        ioe);
  }
  return password;
}
{code}
 

> RBF: Cannot get updated delegation token from zookeeper
> ---
>
> Key: HDFS-15756
> URL: https://issues.apache.org/jira/browse/HDFS-15756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.0.0
>Reporter: hbprotoss
>Priority: Major
>
> Affected versions: all versions with RBF
> When RBF works with Spark 2.4 in client mode, there is a chance that a token 
> is missing on some nodes in the RBF cluster. The root cause is that 
> Spark renews the token (via the ResourceManager) immediately after getting 
> it. Because ZooKeeper does not guarantee strong consistency right after an 
> update in the cluster, a ZooKeeper client may read a stale value from a 
> follower that has not yet synced with the other nodes.
>  
> We applied a patch in Spark, but this is still an RBF problem. Is it possible 
> for RBF to replace the delegation token store with some other 
> data source (Redis, for example)?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15756) RBF: Cannot get updated delegation token from zookeeper

2021-04-15 Thread zhangxiping (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321866#comment-17321866
 ] 

zhangxiping edited comment on HDFS-15756 at 4/15/21, 6:10 AM:
--

-_- like :
{code:java}
// Code placeholder: AbstractDelegationTokenSecretManager

public synchronized long renewToken(Token<TokenIdent> token,
    String renewer) throws InvalidToken, IOException {
  ByteArrayInputStream buf = new ByteArrayInputStream(token.getIdentifier());
  DataInputStream in = new DataInputStream(buf);
  TokenIdent id = createIdentifier();
  id.readFields(in);
  LOG.info("Token renewal for identifier: " + formatTokenId(id)
      + "; total currentTokens " + currentTokens.size());

  ...
  ...

  DelegationTokenInformation tokenInfo = getTokenInfo(id);

  // Unknown token that was issued very recently: it may not be visible on
  // this ZK follower yet, so make the client fail over and retry.
  if (tokenInfo == null && id.getIssueDate() + 6 > now) {
    throw new StandbyException("Renewal request for unknown token "
        + formatTokenId(id) + ", will failover to other server and retry");
  }

  if (tokenInfo == null) {
    throw new InvalidToken("Renewal request for unknown token "
        + formatTokenId(id));
  }
  updateToken(id, info);
  return renewTime;
}

{code}


was (Author: zhangxiping):
-_- like :
{code:java}
// Code placeholder: AbstractDelegationTokenSecretManager

public synchronized long renewToken(Token<TokenIdent> token,
    String renewer) throws InvalidToken, IOException {
  ByteArrayInputStream buf = new ByteArrayInputStream(token.getIdentifier());
  DataInputStream in = new DataInputStream(buf);
  TokenIdent id = createIdentifier();
  id.readFields(in);
  LOG.info("Token renewal for identifier: " + formatTokenId(id)
      + "; total currentTokens " + currentTokens.size());

  long now = Time.now();
  ...
  ...

  // Unknown token that was issued very recently: it may not be visible on
  // this ZK follower yet, so make the client fail over and retry.
  if (getTokenInfo(id) == null && id.getIssueDate() + 6 > now) {
    throw new StandbyException("Renewal request for unknown token "
        + formatTokenId(id) + ", will failover to other server and retry");
  }

  // if (getTokenInfo(id) == null) {
  //   throw new InvalidToken("Renewal request for unknown token "
  //       + formatTokenId(id));
  // }
  updateToken(id, info);
  return renewTime;
}

{code}




[jira] [Comment Edited] (HDFS-15756) RBF: Cannot get updated delegation token from zookeeper

2021-04-15 Thread zhangxiping (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321866#comment-17321866
 ] 

zhangxiping edited comment on HDFS-15756 at 4/15/21, 6:07 AM:
--

-_- like :
{code:java}
// Code placeholder: AbstractDelegationTokenSecretManager

public synchronized long renewToken(Token<TokenIdent> token,
    String renewer) throws InvalidToken, IOException {
  ByteArrayInputStream buf = new ByteArrayInputStream(token.getIdentifier());
  DataInputStream in = new DataInputStream(buf);
  TokenIdent id = createIdentifier();
  id.readFields(in);
  LOG.info("Token renewal for identifier: " + formatTokenId(id)
      + "; total currentTokens " + currentTokens.size());

  long now = Time.now();
  ...
  ...

  // Unknown token that was issued very recently: it may not be visible on
  // this ZK follower yet, so make the client fail over and retry.
  if (getTokenInfo(id) == null && id.getIssueDate() + 6 > now) {
    throw new StandbyException("Renewal request for unknown token "
        + formatTokenId(id) + ", will failover to other server and retry");
  }

  // if (getTokenInfo(id) == null) {
  //   throw new InvalidToken("Renewal request for unknown token "
  //       + formatTokenId(id));
  // }
  updateToken(id, info);
  return renewTime;
}

{code}


was (Author: zhangxiping):
-_- like :
{code:java}
// Code placeholder: AbstractDelegationTokenSecretManager

public synchronized long renewToken(Token<TokenIdent> token,
    String renewer) throws InvalidToken, IOException {
  ByteArrayInputStream buf = new ByteArrayInputStream(token.getIdentifier());
  DataInputStream in = new DataInputStream(buf);
  TokenIdent id = createIdentifier();
  id.readFields(in);
  LOG.info("Token renewal for identifier: " + formatTokenId(id)
      + "; total currentTokens " + currentTokens.size());

  long now = Time.now();
  ...
  ...

  // Unknown token that was issued very recently: it may not be visible on
  // this ZK follower yet, so make the client fail over and retry.
  if (getTokenInfo(id) == null && id.getIssueDate() + 6 > now) {
    throw new StandbyException("Renewal request for unknown token "
        + formatTokenId(id) + ", will failover to other server and retry");
  }

  if (getTokenInfo(id) == null) {
    throw new InvalidToken("Renewal request for unknown token "
        + formatTokenId(id));
  }
  updateToken(id, info);
  return renewTime;
}

{code}




[jira] [Comment Edited] (HDFS-15756) RBF: Cannot get updated delegation token from zookeeper

2021-04-14 Thread zhangxiping (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321866#comment-17321866
 ] 

zhangxiping edited comment on HDFS-15756 at 4/15/21, 3:04 AM:
--

-_- like :
{code:java}
// Code placeholder: AbstractDelegationTokenSecretManager

public synchronized long renewToken(Token<TokenIdent> token,
    String renewer) throws InvalidToken, IOException {
  ByteArrayInputStream buf = new ByteArrayInputStream(token.getIdentifier());
  DataInputStream in = new DataInputStream(buf);
  TokenIdent id = createIdentifier();
  id.readFields(in);
  LOG.info("Token renewal for identifier: " + formatTokenId(id)
      + "; total currentTokens " + currentTokens.size());

  long now = Time.now();
  ...
  ...

  // Unknown token that was issued very recently: it may not be visible on
  // this ZK follower yet, so make the client fail over and retry.
  if (getTokenInfo(id) == null && id.getIssueDate() + 6 > now) {
    throw new StandbyException("Renewal request for unknown token "
        + formatTokenId(id) + ", will failover to other server and retry");
  }

  if (getTokenInfo(id) == null) {
    throw new InvalidToken("Renewal request for unknown token "
        + formatTokenId(id));
  }
  updateToken(id, info);
  return renewTime;
}

{code}


was (Author: zhangxiping):
like :
{code:java}
// Code placeholder: AbstractDelegationTokenSecretManager

public synchronized long renewToken(Token<TokenIdent> token,
    String renewer) throws InvalidToken, IOException {
  ByteArrayInputStream buf = new ByteArrayInputStream(token.getIdentifier());
  DataInputStream in = new DataInputStream(buf);
  TokenIdent id = createIdentifier();
  id.readFields(in);
  LOG.info("Token renewal for identifier: " + formatTokenId(id)
      + "; total currentTokens " + currentTokens.size());

  long now = Time.now();
  ...
  ...

  // Unknown token that was issued very recently: it may not be visible on
  // this ZK follower yet, so make the client fail over and retry.
  if (getTokenInfo(id) == null && id.getIssueDate() + 6 > now) {
    throw new StandbyException("Renewal request for unknown token "
        + formatTokenId(id) + ", will failover to other server and retry");
  }

  if (getTokenInfo(id) == null) {
    throw new InvalidToken("Renewal request for unknown token "
        + formatTokenId(id));
  }
  updateToken(id, info);
  return renewTime;
}

{code}




[jira] [Comment Edited] (HDFS-15756) RBF: Cannot get updated delegation token from zookeeper

2021-04-14 Thread zhangxiping (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321862#comment-17321862
 ] 

zhangxiping edited comment on HDFS-15756 at 4/15/21, 2:35 AM:
--

[~hexiaoqiao]  

I am very happy to see your comment. We are also using the Router now and have 
encountered this problem when submitting a Spark application. During job 
submission, the following log appeared:

 
{code:java}
// Code placeholder
2021-04-13 01:01:13 CST DFSClient INFO - Created HDFS_DELEGATION_TOKEN token 205440696 for da_music on ha-hdfs:hz-cluster11
2021-04-13 01:01:13 CST SparkContext ERROR - Error initializing SparkContext.
org.apache.hadoop.security.token.SecretManager$InvalidToken: Renewal request for unknown token (token for da_music: HDFS_DELEGATION_TOKEN owner=da_music/d...@hadoop.hz.netease.com, renewer=da_music, realUser=, issueDate=1618246873345, maxDate=1618851673345, sequenceNumber=205440696, masterKeyId=161)
    at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:543)

{code}
{code:java}
// Code placeholder
  private def getTokenRenewalInterval(
      hadoopConf: Configuration,
      sparkConf: SparkConf,
      filesystems: Set[FileSystem]): Option[Long] = {
    // We cannot use the tokens generated with renewer yarn. Trying to renew
    // those will fail with an access control issue. So create new tokens with
    // the logged in user as renewer.
    sparkConf.get(PRINCIPAL).flatMap { renewer =>
      val creds = new Credentials()
      fetchDelegationTokens(renewer, filesystems, creds)

      val renewIntervals = creds.getAllTokens.asScala.filter {
        _.decodeIdentifier().isInstanceOf[AbstractDelegationTokenIdentifier]
      }.flatMap { token =>
        Try {
          // The freshly fetched token is renewed right away.
          val newExpiration = token.renew(hadoopConf)
          val identifier =
            token.decodeIdentifier().asInstanceOf[AbstractDelegationTokenIdentifier]
          val interval = newExpiration - identifier.getIssueDate
          logInfo(s"Renewal interval is $interval for token ${token.getKind.toString}")
          interval
        }.toOption
      }
      if (renewIntervals.isEmpty) None else Some(renewIntervals.min)
    }
  }
}
{code}
Looking at the Spark 2.4.7 code, we found that the time between creating a 
token and renewing it is very short.

So could we retry renewal requests only for tokens that were created recently, 
for example within the last minute?
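
The proposal above can be sketched as a pure decision function. This is a minimal sketch under stated assumptions: `RenewalDecision`, `Outcome`, and `decide` are hypothetical names that do not exist in Hadoop, and the grace window is passed in as a parameter (60,000 ms would correspond to the "within 1 minute" suggestion).

```java
/** Hypothetical illustration of the proposed check; not part of Hadoop. */
final class RenewalDecision {
  private RenewalDecision() {}

  /** What the server should do with a renewal request. */
  enum Outcome { RENEW, RETRY_ELSEWHERE, REJECT }

  /**
   * @param tokenFound  whether the token was found in the local view of the store
   * @param issueDateMs issue date carried in the token identifier (epoch millis)
   * @param nowMs       current time (epoch millis)
   * @param graceMs     how long after issuance a miss is treated as "not synced yet"
   */
  static Outcome decide(boolean tokenFound, long issueDateMs, long nowMs, long graceMs) {
    if (tokenFound) {
      return Outcome.RENEW;
    }
    // A miss shortly after issuance is likely a stale read, so ask for a retry.
    if (nowMs - issueDateMs < graceMs) {
      return Outcome.RETRY_ELSEWHERE;
    }
    // An old token that is still unknown is treated as invalid.
    return Outcome.REJECT;
  }
}
```

For the log above, the renewal arrived within the same second as `issueDate=1618246873345`, so a 60,000 ms window would classify that request as `RETRY_ELSEWHERE` rather than `REJECT`.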

 


was (Author: zhangxiping):
[~hexiaoqiao]  

I am very happy to see your comment. We are also using the Router now and have 
encountered this problem when submitting a Spark application. During job 
submission, the following log appeared:

 
{code:java}
// Code placeholder
2021-04-13 01:01:13 CST DFSClient INFO - Created HDFS_DELEGATION_TOKEN token 205440696 for da_music on ha-hdfs:hz-cluster11
2021-04-13 01:01:13 CST SparkContext ERROR - Error initializing SparkContext.
org.apache.hadoop.security.token.SecretManager$InvalidToken: Renewal request for unknown token (token for da_music: HDFS_DELEGATION_TOKEN owner=da_music/d...@hadoop.hz.netease.com, renewer=da_music, realUser=, issueDate=1618246873345, maxDate=1618851673345, sequenceNumber=205440696, masterKeyId=161)
    at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:543)

{code}
!image-2021-04-15-10-27-29-927.png!

Looking at the Spark 2.4.7 code, we found that the time between creating a 
token and renewing it is very short.

So could we retry renewal requests only for tokens that were created recently, 
for example within the last minute?

 
