yqwang-ms commented on pull request #28323:
URL: https://github.com/apache/spark/pull/28323#issuecomment-620338444


   For testing, so far I have done:
   1. An end-to-end simple Spark job test (e.g. Spark word count) on a secure Hadoop 
cluster.
   2. Verified that all existing unit tests pass.
   
   We can also reason briefly about why this change does not introduce more 
issues than current Spark (given the code change is small).
   In current Spark,
   
https://github.com/apache/spark/blob/410fa913215665db72f871b556569cba3dc9ee0a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L66-L76
   The *transferCredentials* func can only transfer Hadoop creds, such as 
delegation tokens.
    However, other creds stored in UGI.subject.getPrivateCredentials will be 
lost here, such as:
   - Non-Hadoop creds:
    For example, [Kafka 
creds](https://github.com/apache/kafka/blob/f3c8bff311b0e4c4d0e316ac949fe4491f9b107f/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/OAuthBearerLoginModule.java#L395)
   - Newly supported or third-party Hadoop creds:
    For example, to support OAuth/JWT token authn on Hadoop, we need to store the 
OAuth/JWT token in UGI.subject.getPrivateCredentials. However, these tokens 
are not supposed to be managed by Hadoop Credentials (which currently covers only 
Hadoop secret keys and delegation tokens).
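   To illustrate the loss, here is a small self-contained sketch using only the JDK's `javax.security.auth.Subject` (the `HadoopCredentials` and `KafkaOAuthToken` classes are hypothetical stand-ins, not the real Hadoop/Kafka types): a transferCredentials-style copy that moves only the Hadoop-Credentials-shaped entries silently drops every other private credential.

   ```java
   import java.util.Set;
   import javax.security.auth.Subject;

   public class CredsLossDemo {
       // Hypothetical stand-in for Hadoop's Credentials (tokens, secret keys).
       static final class HadoopCredentials {
           final String token;
           HadoopCredentials(String token) { this.token = token; }
       }
       // Hypothetical stand-in for a non-Hadoop cred, e.g. a Kafka OAuth bearer token.
       static final class KafkaOAuthToken {
           final String value;
           KafkaOAuthToken(String value) { this.value = value; }
       }

       public static void main(String[] args) {
           Subject source = new Subject();
           source.getPrivateCredentials().add(new HadoopCredentials("hdfs-delegation-token"));
           source.getPrivateCredentials().add(new KafkaOAuthToken("kafka-oauth-jwt"));

           // transferCredentials-style copy: only the Hadoop-typed creds move over.
           Subject dest = new Subject();
           Set<HadoopCredentials> hadoopOnly = source.getPrivateCredentials(HadoopCredentials.class);
           dest.getPrivateCredentials().addAll(hadoopOnly);

           System.out.println("dest Hadoop creds: " + dest.getPrivateCredentials(HadoopCredentials.class).size());
           System.out.println("dest Kafka creds: " + dest.getPrivateCredentials(KafkaOAuthToken.class).size());
           // The Kafka token never reaches dest: it prints 1 then 0.
       }
   }
   ```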
   
   BTW, if we use SPARK_USER as the effective user, and 
UserGroupInformation.getCurrentUser as the real user (to impersonate the effective 
user), we should use createProxyUser instead of directly calling transferCredentials 
from UserGroupInformation.getCurrentUser to SPARK_USER: the transferred creds 
may not match SPARK_USER (e.g. the user name does not match, so the server 
may choose to reject the creds). You can also check other places in Spark and 
Hadoop; they all use createProxyUser instead of hacks like 
transferCredentials.
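   To make the contrast concrete, here is a minimal, untested sketch (assuming hadoop-common on the classpath and a `sparkUser` string holding the SPARK_USER value; transferCredentials is shown inlined as the `addCredentials` call it boils down to):

   ```java
   import org.apache.hadoop.security.UserGroupInformation;

   // Current approach: build a fresh UGI for SPARK_USER and copy over only the
   // login user's Hadoop Credentials (delegation tokens, secret keys).
   // Subject private credentials are dropped, and the copied creds may not
   // match the new user name.
   UserGroupInformation target = UserGroupInformation.createRemoteUser(sparkUser);
   target.addCredentials(UserGroupInformation.getCurrentUser().getCredentials());

   // Proposed approach: make SPARK_USER a proxy (effective) user on top of the
   // real user's UGI, so all creds stay attached to the real user that owns them.
   UserGroupInformation proxy =
       UserGroupInformation.createProxyUser(sparkUser, UserGroupInformation.getCurrentUser());
   ```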
   
   What this change does is simply replace transferCredentials with 
createProxyUser, so the creds match the real user; besides, it will 
not lose any creds stored in UGI.subject.getPrivateCredentials. So it can 
**only increase** the possibility that a target RPC server accepts our UGI, i.e. 
authenticates successfully. So it will have no impact on Spark jobs that 
currently work well.
   
   For more, see createProxyUser:
   
https://github.com/apache/hadoop/blob/5e0eda5d5f696aba7fc209874d232baf2a50d547/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1442-L1451
   
   
   For the 
   
https://github.com/apache/spark/blob/410fa913215665db72f871b556569cba3dc9ee0a/core/src/main/scala/org/apache/spark/util/Utils.scala#L2410-L2413
   , I agree it is not safe to change it to use getUserName,
   so I kept it using getShortUserName; pls check the change.
   
   
   For this,
   
https://github.com/apache/spark/blob/410fa913215665db72f871b556569cba3dc9ee0a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L844-L849
   This is not related to the current creds-loss issue, but it looks like a bug 
in Spark, because the RM principal should be set to the fully qualified name, i.e. 
getUserName instead of getShortUserName. Pls check the related Hadoop code and 
usage. However, if you think it is not suitable or safe for the current PR, I can 
remove it :)
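   As a refresher on why the distinction matters: getUserName returns the full Kerberos principal (e.g. alice@EXAMPLE.COM), while getShortUserName maps it down to a local name via hadoop.security.auth_to_local. The self-contained sketch below uses a hypothetical `shortName` helper that mirrors only the default rule (first component before '/' or '@'); the real mapping is configurable:

   ```java
   public class PrincipalNamesDemo {
       // Hypothetical helper mirroring Hadoop's DEFAULT auth_to_local rule:
       // the short name is the principal's first component, i.e. everything
       // before the first '/' or '@'. Not the real Hadoop implementation.
       static String shortName(String fullPrincipal) {
           for (int i = 0; i < fullPrincipal.length(); i++) {
               char c = fullPrincipal.charAt(i);
               if (c == '/' || c == '@') {
                   return fullPrincipal.substring(0, i);
               }
           }
           return fullPrincipal;
       }

       public static void main(String[] args) {
           String full = "alice@EXAMPLE.COM";   // what getUserName would return
           System.out.println("full: " + full);
           System.out.println("short: " + shortName(full)); // like getShortUserName
       }
   }
   ```

   A server that authorizes against the realm-qualified principal would see "alice" and "alice@EXAMPLE.COM" as different identities, which is why the RM principal needs the full form.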
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
