Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/5031#discussion_r26778672
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -903,6 +908,30 @@ object Client extends Logging {
}
  /**
+   * Obtains token for the Hive metastore and adds them to the credentials.
+   */
+  private def obtainTokenForHiveMetastore(conf: Configuration, credentials: Credentials) {
+    if (UserGroupInformation.isSecurityEnabled /* And Hive is enabled */) {
+      val hc = org.apache.hadoop.hive.ql.metadata.Hive.get
+      val principal = hc.getConf().get(HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL.varname)
+      val username = UserGroupInformation.getCurrentUser().getUserName
+
+      if (principal == null) {
+        val errorMessage = "Required hive metastore principal is not configured!"
+        logError(errorMessage)
+        throw new IllegalArgumentException(errorMessage)
+      }
+
+      val tokenStr = hc.getDelegationToken(username, principal)
+      val hive2Token = new Token[DelegationTokenIdentifier]()
+      hive2Token.decodeFromUrlString(tokenStr)
+      credentials.addToken(new Text("hive.server2.delegation.token"), hive2Token)
+      logDebug("Added the Hive Server 2 token to conf.")
+      org.apache.hadoop.hive.ql.metadata.Hive.closeCurrent
--- End diff --
It looks like most of the existing Hive conditional-compile stuff just
builds separate jars and includes them. That's not ideal here unless we made a
separate jar for this. Reflection seems like the more straightforward way to go;
a rough sketch of what that could look like is below.
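To make that concrete, here is a rough, untested sketch of the reflection
approach (the method name and error handling are just illustrative, not what the
PR would necessarily end up with):

    // Rough sketch only: load the Hive classes reflectively so the yarn module
    // needs no compile-time Hive dependency. Skips quietly if Hive isn't on the
    // classpath at all.
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.Text
    import org.apache.hadoop.security.{Credentials, UserGroupInformation}
    import org.apache.hadoop.security.token.{Token, TokenIdentifier}

    def obtainTokenForHiveMetastoreViaReflection(
        conf: Configuration,
        credentials: Credentials): Unit = {
      if (!UserGroupInformation.isSecurityEnabled) return
      val loader = Thread.currentThread().getContextClassLoader
      try {
        val hiveClass = loader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
        val hive = hiveClass.getMethod("get").invoke(null)
        val hiveConf = hiveClass.getMethod("getConf").invoke(hive)
        // Same key as HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL.varname.
        val principal = loader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
          .getMethod("get", classOf[String])
          .invoke(hiveConf, "hive.metastore.kerberos.principal")
          .asInstanceOf[String]
        require(principal != null, "Required hive metastore principal is not configured!")

        val user = UserGroupInformation.getCurrentUser.getUserName
        val tokenStr = hiveClass
          .getMethod("getDelegationToken", classOf[String], classOf[String])
          .invoke(hive, user, principal)
          .asInstanceOf[String]

        val hiveToken = new Token[TokenIdentifier]()
        hiveToken.decodeFromUrlString(tokenStr)
        credentials.addToken(new Text("hive.server2.delegation.token"), hiveToken)
        hiveClass.getMethod("closeCurrent").invoke(null)
      } catch {
        // No Hive on the classpath: nothing to do for this application.
        case _: ClassNotFoundException =>
      }
    }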
Also, on whether the Hive token is required: it might make sense to add a
config for it (like spark.yarn.hive.token.required), similar to how MR has a
config to get the job history token if your job is going to talk to the history
server. It's not as nice as fetching the token automatically, and we could try
other things like checking whether the hive.metastore.uris config is set, but an
explicit config makes it obvious to the user what is going on and it won't try
unless told to. That avoids problems for people who have Hive compiled in but
not configured, or who are running an app that doesn't need Hive access. A
sketch of the gating is after this paragraph.
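The gating itself could be as simple as something like this
(spark.yarn.hive.token.required is just the name floated above, not an existing
config; sparkConf, hadoopConf and credentials are assumed to be in scope in
Client):

    // Only fetch the metastore token when the user explicitly asks for it.
    // "spark.yarn.hive.token.required" is a hypothetical config name.
    val hiveTokenRequired = sparkConf.getBoolean("spark.yarn.hive.token.required", false)
    // Alternative heuristic mentioned above: only try when a metastore is configured.
    // val hiveTokenRequired = hadoopConf.get("hive.metastore.uris", "").nonEmpty
    if (hiveTokenRequired) {
      obtainTokenForHiveMetastore(hadoopConf, credentials)
    }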
One other thing I'm not familiar with and need to look into is the renewal
of these tokens. I know the RM doesn't renew them, so is there something in
Hive that does? If not, the Spark app could only run for 24 hours. Or perhaps
they don't expire that often.
The other option is to provide some way for the user to supply the tokens
themselves: they get any tokens they need and have Spark add them. As far as I
know, Hive and HBase don't provide a command-line way to get the tokens like
HDFS does, so that would mean writing a Java (or Scala) program to fetch them
first; a rough sketch of such a fetcher is below.
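If we went that route, the fetcher the user has to write doesn't need to be
much. A rough sketch (the class name and output handling are made up, and it
assumes the same Hive client API used in the diff above):

    // Hypothetical standalone fetcher a user could run before submitting, writing
    // the Hive token into a Hadoop token file (the kind HADOOP_TOKEN_FILE_LOCATION
    // points at) for Spark to pick up.
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hive.conf.HiveConf
    import org.apache.hadoop.hive.ql.metadata.Hive
    import org.apache.hadoop.io.Text
    import org.apache.hadoop.security.{Credentials, UserGroupInformation}
    import org.apache.hadoop.security.token.{Token, TokenIdentifier}

    object FetchHiveToken {
      def main(args: Array[String]): Unit = {
        val hc = Hive.get()
        val principal = hc.getConf.get(HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL.varname)
        val user = UserGroupInformation.getCurrentUser.getUserName
        val tokenStr = hc.getDelegationToken(user, principal)

        val hiveToken = new Token[TokenIdentifier]()
        hiveToken.decodeFromUrlString(tokenStr)

        val creds = new Credentials()
        creds.addToken(new Text("hive.server2.delegation.token"), hiveToken)
        // args(0): where to write the token file, e.g. /tmp/hive.token
        creds.writeTokenStorageFile(new Path(args(0)), hc.getConf)
      }
    }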
@vanzin does secure HBase access not work either? I haven't tried it
myself, but I'm guessing it doesn't.