yqwang-ms edited a comment on pull request #28323: URL: https://github.com/apache/spark/pull/28323#issuecomment-620338444
For testing, so far I have done:

1. An end-to-end simple Spark job test (e.g. Spark word count) on a secure Hadoop cluster.
2. All existing unit tests pass.

We can also do a little reasoning to **show that this change does not introduce more issues than current Spark** (given that the code change is small).

In current Spark: https://github.com/apache/spark/blob/410fa913215665db72f871b556569cba3dc9ee0a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L66-L76

The **transferCredentials** function can only transfer Hadoop credentials such as delegation tokens. Other credentials stored in `UGI.subject.getPrivateCredentials` are lost here, such as:
- Non-Hadoop credentials, e.g. [Kafka creds](https://github.com/apache/kafka/blob/f3c8bff311b0e4c4d0e316ac949fe4491f9b107f/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/OAuthBearerLoginModule.java#L395)
- Newly supported or third-party Hadoop credentials: for example, to support OAuth/JWT token authentication on Hadoop, we need to store the OAuth/JWT token into `UGI.subject.getPrivateCredentials`. However, these tokens are not supposed to be managed by Hadoop `Credentials` (currently it is only for Hadoop secret keys and delegation tokens).

BTW, if we use SPARK_USER as the effective user and `UserGroupInformation.getCurrentUser` as the real user (to impersonate the effective user), we should use **createProxyUser** instead of directly calling transferCredentials from `UserGroupInformation.getCurrentUser` to SPARK_USER. This is because the transferred credentials may not match SPARK_USER (e.g. the user name does not match, so the server may choose to reject the credentials). You can also check other places in Spark and Hadoop: they all use createProxyUser instead of hacks like transferCredentials.
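To make the credential-loss argument concrete, here is a minimal, self-contained sketch. This is plain Java with made-up names (`ToyUgi`, `ToyCredentials`), not the real Hadoop classes: a transferCredentials-style copy only moves the Hadoop tokens, while a createProxyUser-style wrapping keeps the real user's whole private-credential set visible.

```java
import java.util.HashSet;
import java.util.Set;

public class ProxyVsTransfer {
    // Stand-in for org.apache.hadoop.security.Credentials (tokens/secret keys only).
    static class ToyCredentials { final Set<String> tokens = new HashSet<>(); }

    // Stand-in for a UGI: its subject holds arbitrary private credentials,
    // one of which is the Hadoop Credentials object.
    static class ToyUgi {
        final Set<Object> privateCredentials = new HashSet<>();
        final ToyCredentials hadoopCreds = new ToyCredentials();
        ToyUgi() { privateCredentials.add(hadoopCreds); }
    }

    // transferCredentials-style copy: only the Hadoop tokens move.
    static void transferCredentials(ToyUgi source, ToyUgi dest) {
        dest.hadoopCreds.tokens.addAll(source.hadoopCreds.tokens);
    }

    // createProxyUser-style wrapping: the proxy sees the real user's
    // entire private-credential set, so nothing is lost.
    static ToyUgi createProxyUser(ToyUgi realUser) {
        ToyUgi proxy = new ToyUgi();
        proxy.privateCredentials.addAll(realUser.privateCredentials);
        return proxy;
    }

    public static void main(String[] args) {
        ToyUgi realUser = new ToyUgi();
        realUser.hadoopCreds.tokens.add("hdfs-delegation-token");
        realUser.privateCredentials.add("kafka-oauth-bearer-token"); // non-Hadoop cred

        ToyUgi viaTransfer = new ToyUgi();
        transferCredentials(realUser, viaTransfer);
        // The non-Hadoop credential is lost with transfer-style copying:
        System.out.println(viaTransfer.privateCredentials.contains("kafka-oauth-bearer-token")); // false

        ToyUgi viaProxy = createProxyUser(realUser);
        // The proxy still sees every private credential of the real user:
        System.out.println(viaProxy.privateCredentials.contains("kafka-oauth-bearer-token")); // true
    }
}
```

The real `UserGroupInformation.createProxyUser(user, realUser)` achieves this by reusing the real user's subject for authentication, rather than copying a subset of credentials into a new one.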
What this change does is simply replace transferCredentials with createProxyUser, so the credentials match the real user; besides, it does not lose any credentials stored in `UGI.subject.getPrivateCredentials` (including the credentials transferred by transferCredentials). So it can **only increase** the possibility that a target RPC server accepts our UGI, i.e. successful authentication. **So it has no impact on Spark jobs that currently work well.** For more, see createProxyUser: https://github.com/apache/hadoop/blob/5e0eda5d5f696aba7fc209874d232baf2a50d547/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1442-L1451

For https://github.com/apache/spark/blob/410fa913215665db72f871b556569cba3dc9ee0a/core/src/main/scala/org/apache/spark/util/Utils.scala#L2410-L2413 , I agree it is not safe to change it to use getUserName, so I kept it using getShortUserName; please check the change.

As for https://github.com/apache/spark/blob/410fa913215665db72f871b556569cba3dc9ee0a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L844-L849 , it is unrelated to the current credential-loss issue, but it should be a bug in Spark, because the RM principal should be set to the fully qualified name, i.e. getUserName instead of getShortUserName. Please check the related Hadoop code and usage. Again, a fully qualified name will **only increase** successful authentication. However, if you think it is not suitable or safe for the current PR, I can remove it :)

The only change that may impact existing Spark jobs is this: someone who previously set SPARK_USER to a fully qualified name (such as yqwang-ms/y...@zz.com) will now get the short user name from getCurrentUserName. However, in previous Spark, if no SPARK_USER was provided, we also returned `UserGroupInformation.getCurrentUser().getShortUserName()`. So let's align getCurrentUserName to return the short user name; I think this is an inevitable change and impact.
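On the short-vs-fully-qualified name point: Hadoop derives the short user name from a Kerberos principal of the form `primary[/instance][@REALM]` by keeping only the primary component (the real `getShortUserName` applies configurable `auth_to_local` rules; the default rule behaves like this for local-realm principals). A hedged sketch of that default rule, with a hypothetical helper name:

```java
public class ShortName {
    // Hypothetical simplification of Hadoop's default short-name mapping:
    // strip the "@REALM" suffix, then strip the "/instance" component.
    // Real Hadoop applies configurable hadoop.security.auth_to_local rules.
    static String shortUserName(String principal) {
        int at = principal.indexOf('@');
        String withoutRealm = (at >= 0) ? principal.substring(0, at) : principal;
        int slash = withoutRealm.indexOf('/');
        return (slash >= 0) ? withoutRealm.substring(0, slash) : withoutRealm;
    }

    public static void main(String[] args) {
        System.out.println(shortUserName("alice/host.example.com@EXAMPLE.COM")); // alice
        System.out.println(shortUserName("alice@EXAMPLE.COM"));                  // alice
        System.out.println(shortUserName("alice"));                              // alice
    }
}
```

This is why a user who exported SPARK_USER as a fully qualified principal will observe a different (shorter) name once getCurrentUserName is aligned to the short form.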
---------------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org