yqwang-ms commented on pull request #28323:
URL: https://github.com/apache/spark/pull/28323#issuecomment-620338444


   For testing, so far I have done:
   1. An end-to-end simple Spark job test (e.g. Spark word count) on a secure Hadoop 
cluster.
   2. Verified that all existing unit tests pass.
   
   We can also reason briefly about why this change does not introduce more 
issues than current Spark (given the code change is small).
   In current Spark,
   
https://github.com/apache/spark/blob/410fa913215665db72f871b556569cba3dc9ee0a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L66-L76
   The *transferCredentials* func can only transfer Hadoop creds, such as 
delegation tokens.
    However, other creds stored in UGI.subject.getPrivateCredentials will be 
lost here, such as:
   - Non-Hadoop creds:
    For example, [Kafka 
creds](https://github.com/apache/kafka/blob/f3c8bff311b0e4c4d0e316ac949fe4491f9b107f/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/OAuthBearerLoginModule.java#L395)
   - Newly supported or third-party Hadoop creds:
    For example, to support OAuth/JWT token authn on Hadoop, we need to store the 
OAuth/JWT token in UGI.subject.getPrivateCredentials. However, these tokens 
are not supposed to be managed by Hadoop Credentials (which currently covers only 
Hadoop secret keys and delegation tokens).
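   To illustrate the loss, here is a small self-contained sketch using only the JDK's `javax.security.auth.Subject` (the `HadoopCredentials` and `KafkaOAuthToken` classes are hypothetical stand-ins, not the real Hadoop/Kafka types): a transferCredentials-style copy that moves only the Hadoop-Credentials-shaped entries silently drops every other private credential.

   ```java
   import java.util.Set;
   import javax.security.auth.Subject;

   public class CredsLossDemo {
       // Hypothetical stand-in for Hadoop's Credentials (tokens, secret keys).
       static final class HadoopCredentials {
           final String token;
           HadoopCredentials(String token) { this.token = token; }
       }
       // Hypothetical stand-in for a non-Hadoop cred, e.g. a Kafka OAuth bearer token.
       static final class KafkaOAuthToken {
           final String value;
           KafkaOAuthToken(String value) { this.value = value; }
       }

       public static void main(String[] args) {
           Subject source = new Subject();
           source.getPrivateCredentials().add(new HadoopCredentials("hdfs-delegation-token"));
           source.getPrivateCredentials().add(new KafkaOAuthToken("kafka-oauth-jwt"));

           // transferCredentials-style copy: only the Hadoop-typed creds move over.
           Subject dest = new Subject();
           Set<HadoopCredentials> hadoopOnly = source.getPrivateCredentials(HadoopCredentials.class);
           dest.getPrivateCredentials().addAll(hadoopOnly);

           System.out.println("dest Hadoop creds: " + dest.getPrivateCredentials(HadoopCredentials.class).size());
           System.out.println("dest Kafka creds: " + dest.getPrivateCredentials(KafkaOAuthToken.class).size());
           // The Kafka token never reaches dest: it prints 1 then 0.
       }
   }
   ```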
   
   BTW, if we use SPARK_USER as the effective user, and 
UserGroupInformation.getCurrentUser as the real user (to impersonate the effective 
user), we should use createProxyUser instead of directly calling transferCredentials 
from UserGroupInformation.getCurrentUser to SPARK_USER: the transferred creds 
may not match SPARK_USER (e.g. the user name does not match, so the server 
may choose to reject the creds). You can also check other places in Spark and 
Hadoop; they all use createProxyUser instead of hacks like 
transferCredentials.
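   To make the contrast concrete, here is a minimal, untested sketch (assuming hadoop-common on the classpath and a `sparkUser` string holding the SPARK_USER value; transferCredentials is shown inlined as the `addCredentials` call it boils down to):

   ```java
   import org.apache.hadoop.security.UserGroupInformation;

   // Current approach: build a fresh UGI for SPARK_USER and copy over only the
   // login user's Hadoop Credentials (delegation tokens, secret keys).
   // Subject private credentials are dropped, and the copied creds may not
   // match the new user name.
   UserGroupInformation target = UserGroupInformation.createRemoteUser(sparkUser);
   target.addCredentials(UserGroupInformation.getCurrentUser().getCredentials());

   // Proposed approach: make SPARK_USER a proxy (effective) user on top of the
   // real user's UGI, so all creds stay attached to the real user that owns them.
   UserGroupInformation proxy =
       UserGroupInformation.createProxyUser(sparkUser, UserGroupInformation.getCurrentUser());
   ```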
   
   What this change does is simply replace transferCredentials with 
createProxyUser, so the creds match the real user; besides, it will 
not lose any creds stored in UGI.subject.getPrivateCredentials. So it can 
**only increase** the possibility that a target RPC server accepts our UGI, i.e. 
authenticates successfully. So it will have no impact on Spark jobs that 
currently work well.
   
   For more, see createProxyUser:
   
https://github.com/apache/hadoop/blob/5e0eda5d5f696aba7fc209874d232baf2a50d547/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1442-L1451
   
   
   For the 
   
https://github.com/apache/spark/blob/410fa913215665db72f871b556569cba3dc9ee0a/core/src/main/scala/org/apache/spark/util/Utils.scala#L2410-L2413
   , I agree it is not safe to change it to use getUserName,
   so I kept it using getShortUserName; pls check the change.
   
   
   For this,
   
https://github.com/apache/spark/blob/410fa913215665db72f871b556569cba3dc9ee0a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L844-L849
   This is not related to the current creds-loss issue, but it looks like a bug 
in Spark, because the RM principal should be set to the fully qualified name, i.e. 
getUserName instead of getShortUserName. Pls check the related Hadoop code and 
usage. However, if you think it is not suitable or safe for the current PR, I can 
remove it :)
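   As a refresher on why the distinction matters: getUserName returns the full Kerberos principal (e.g. alice@EXAMPLE.COM), while getShortUserName maps it down to a local name via hadoop.security.auth_to_local. The self-contained sketch below uses a hypothetical `shortName` helper that mirrors only the default rule (first component before '/' or '@'); the real mapping is configurable:

   ```java
   public class PrincipalNamesDemo {
       // Hypothetical helper mirroring Hadoop's DEFAULT auth_to_local rule:
       // the short name is the principal's first component, i.e. everything
       // before the first '/' or '@'. Not the real Hadoop implementation.
       static String shortName(String fullPrincipal) {
           for (int i = 0; i < fullPrincipal.length(); i++) {
               char c = fullPrincipal.charAt(i);
               if (c == '/' || c == '@') {
                   return fullPrincipal.substring(0, i);
               }
           }
           return fullPrincipal;
       }

       public static void main(String[] args) {
           String full = "alice@EXAMPLE.COM";   // what getUserName would return
           System.out.println("full: " + full);
           System.out.println("short: " + shortName(full)); // like getShortUserName
       }
   }
   ```

   A server that authorizes against the realm-qualified principal would see "alice" and "alice@EXAMPLE.COM" as different identities, which is why the RM principal needs the full form.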
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
