Github user marsishandsome commented on the pull request:
https://github.com/apache/spark/pull/9168#issuecomment-149398132
In my opinion, the reason is the following:
1. The Spark AM obtains an HDFS delegation token and adds it to the current user's credentials. This token looks like:
token1: "ha-hdfs:hadoop-namenode" -> "Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hadoop-namenode, Ident: (HDFS_DELEGATION_TOKEN token 328709 for test)"
2. The DFSClient then generates another two tokens, one for each NameNode:
token2: "ha-hdfs://xxx.xxx.xxx.xxx:8020" -> "Kind: HDFS_DELEGATION_TOKEN, Service: xxx.xxx.xxx.xxx:8020, Ident: (HDFS_DELEGATION_TOKEN token 328708 for test)"
token3: "ha-hdfs://yyy:yyy:yyy:yyy:8020" -> "Kind: HDFS_DELEGATION_TOKEN, Service: yyy:yyy:yyy:yyy:8020, Ident: (HDFS_DELEGATION_TOKEN token 328708 for test)"
3. When Spark updates token1, the DFSClient does not regenerate token2 and token3 automatically, yet the DFSClient only uses token2 and token3 to communicate with the two NameNodes.
4. FileSystem has a cache, so calling FileSystem.get returns a cached DFSClient, which still holds the old tokens. Spark only updates token1, but the DFSClient keeps using token2 and token3.
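The staleness in step 4 can be sketched generically: a cache keyed by URI hands back the same client instance, and that client snapshots its credentials at construction time, so later credential updates never reach it. This is a minimal illustration with hypothetical Token/Client/get names, not the actual Hadoop FileSystem or DFSClient code:

```java
import java.util.HashMap;
import java.util.Map;

public class CacheSketch {
    // Hypothetical stand-in for a delegation token.
    static class Token {
        final String service;
        final int ident;
        Token(String service, int ident) { this.service = service; this.ident = ident; }
    }

    // Stand-in for DFSClient: it copies the tokens it was given at
    // construction time and keeps using that snapshot afterwards.
    static class Client {
        final Map<String, Token> tokens;
        Client(Map<String, Token> creds) { this.tokens = new HashMap<>(creds); }
    }

    // Stand-in for the FileSystem cache: the first get() for a URI builds a
    // client; every later call returns the same (possibly stale) instance.
    static final Map<String, Client> CACHE = new HashMap<>();

    static Client get(String uri, Map<String, Token> creds) {
        return CACHE.computeIfAbsent(uri, u -> new Client(creds));
    }

    public static void main(String[] args) {
        Map<String, Token> creds = new HashMap<>();
        creds.put("ha-hdfs:hadoop-namenode",
                  new Token("ha-hdfs:hadoop-namenode", 328709));
        Client first = get("hdfs://hadoop-namenode", creds);

        // Spark later refreshes the token in the current user's credentials...
        creds.put("ha-hdfs:hadoop-namenode",
                  new Token("ha-hdfs:hadoop-namenode", 999999));

        // ...but the cached client still holds the old snapshot.
        Client second = get("hdfs://hadoop-namenode", creds);
        System.out.println(second == first);                                    // true
        System.out.println(second.tokens.get("ha-hdfs:hadoop-namenode").ident); // 328709
    }
}
```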
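For reference (standard Hadoop behavior, independent of whatever this PR changes): the per-scheme cache in step 4 can be bypassed either by calling FileSystem.newInstance(...), which always constructs a fresh client, or by disabling the cache in the configuration:

```xml
<!-- Disable the FileSystem cache for the hdfs:// scheme, so each
     FileSystem.get(...) builds a new client with current credentials. -->
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
```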