Yuqi Wang created SPARK-31551:
---------------------------------

             Summary: createSparkUser loses the user's non-Hadoop credentials 
and fully qualified user name
                 Key: SPARK-31551
                 URL: https://issues.apache.org/jira/browse/SPARK-31551
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.5, 2.4.4
            Reporter: Yuqi Wang


Current createSparkUser:

[https://github.com/apache/spark/blob/263f04db865920d9c10251517b00a1b477b58ff1/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L66-L76]

{code:java}
  def createSparkUser(): UserGroupInformation = {
    val user = Utils.getCurrentUserName()
    logDebug("creating UGI for user: " + user)
    val ugi = UserGroupInformation.createRemoteUser(user)
    transferCredentials(UserGroupInformation.getCurrentUser(), ugi)
    ugi
  }

  def transferCredentials(source: UserGroupInformation, dest: UserGroupInformation): Unit = {
    dest.addCredentials(source.getCredentials())
  }

  def getCurrentUserName(): String = {
    Option(System.getenv("SPARK_USER"))
      .getOrElse(UserGroupInformation.getCurrentUser().getShortUserName())
  }
{code}

The transferCredentials function can only transfer Hadoop credentials, such as 
delegation tokens.
Other credentials stored in UGI.subject.getPrivateCredentials are lost here, 
for example:
1. Non-Hadoop credentials:
    For example, Kafka credentials: 
https://github.com/apache/kafka/blob/f3c8bff311b0e4c4d0e316ac949fe4491f9b107f/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/OAuthBearerLoginModule.java#L395

2. Customized Hadoop credentials:
    For example, to support OAuth/JWT token authentication on Hadoop, the 
OAuth/JWT token must be stored in UGI.subject.getPrivateCredentials. However, 
these tokens are not supposed to be managed by Hadoop Credentials (which 
currently holds only Hadoop secret keys and delegation tokens).
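
As a rough illustration of the loss described above, the sketch below uses only 
the JDK's javax.security.auth.Subject, with no Hadoop dependency. BearerToken 
and copyAllCredentials are hypothetical names introduced for this example: the 
first stands in for a non-Hadoop credential, the second for one possible way to 
carry such credentials over. It is not the Spark implementation, and how the 
Subject inside a UGI would be reached is deliberately left out.

```java
import javax.security.auth.Subject;

class SubjectCredentialCopy {
    // Hypothetical stand-in for a non-Hadoop credential (e.g. an OAuth/JWT token).
    static final class BearerToken {
        final String value;
        BearerToken(String value) { this.value = value; }
    }

    // Assumed helper: copies every private and public credential object from
    // one JAAS Subject to another, regardless of the credential's type.
    static void copyAllCredentials(Subject source, Subject dest) {
        dest.getPrivateCredentials().addAll(source.getPrivateCredentials());
        dest.getPublicCredentials().addAll(source.getPublicCredentials());
    }

    public static void main(String[] args) {
        Subject source = new Subject();
        source.getPrivateCredentials().add(new BearerToken("example-token"));

        // A freshly created Subject starts with no private credentials, which
        // mirrors what happens to a token that is never transferred.
        Subject dest = new Subject();
        System.out.println(dest.getPrivateCredentials(BearerToken.class).size());

        // After an explicit copy the token is visible on the destination.
        copyAllCredentials(source, dest);
        System.out.println(dest.getPrivateCredentials(BearerToken.class).size());
    }
}
```

The first size printed is 0 and the second is 1, which is the gap a transfer 
limited to Hadoop Credentials would leave open for objects of this kind.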

Another issue is that getCurrentUserName returns only the user's 
getShortUserName, which may lose the fully qualified user name that needs to 
be passed to RPC servers (such as YARN, HDFS, Kafka). We should use getUserName 
to get the fully qualified user name on the client side.
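
To make the difference between the two name forms concrete, here is a small 
JDK-only sketch. shortName is a hypothetical, simplified stand-in for how a 
Kerberos-style principal is commonly shortened (dropping the host and realm 
parts); in Hadoop the real mapping is governed by the configured auth_to_local 
rules, so treat this only as an approximation. The principals are made-up 
examples.

```java
class UserNameForms {
    // Simplified, assumed shortening rule: keep everything before the first
    // '/' (host part) or '@' (realm part), whichever comes first.
    static String shortName(String principal) {
        int cut = principal.length();
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash >= 0) cut = Math.min(cut, slash);
        if (at >= 0) cut = Math.min(cut, at);
        return principal.substring(0, cut);
    }

    public static void main(String[] args) {
        // Two distinct fully qualified names...
        String a = "alice@EXAMPLE.COM";
        String b = "alice/host1@OTHER.ORG";
        // ...become indistinguishable once shortened.
        System.out.println(shortName(a));
        System.out.println(shortName(b));
        System.out.println(shortName(a).equals(shortName(b)));
    }
}
```

Under this simplified rule both principals shorten to the same string, so a 
server that needs the realm or host part cannot recover it from the short name.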





--
This message was sent by Atlassian Jira
(v8.3.4#803005)
