Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/607#discussion_r12212185
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala 
---
    @@ -35,15 +35,28 @@ class SparkHadoopUtil {
       val conf: Configuration = newConfiguration()
       UserGroupInformation.setConfiguration(conf)
     
    -  def runAsUser(user: String)(func: () => Unit) {
     +  /** Creates a UserGroupInformation for Spark based on the SPARK_USER environment variable. */
    +  def createSparkUser(): Option[UserGroupInformation] = {
     +    val user = Option(System.getenv("SPARK_USER")).getOrElse(SparkContext.SPARK_UNKNOWN_USER)
         if (user != SparkContext.SPARK_UNKNOWN_USER) {
    -      val ugi = UserGroupInformation.createRemoteUser(user)
    -      transferCredentials(UserGroupInformation.getCurrentUser(), ugi)
    -      ugi.doAs(new PrivilegedExceptionAction[Unit] {
    -        def run: Unit = func()
    -      })
    +      Some(UserGroupInformation.createRemoteUser(user))
    --- End diff --
    
    This is my first time looking at this code, so bear with me a little. :-)
    
    I'm not sure what the objective of calling createRemoteUser() here is. What purpose does it serve? Isn't it better to just rely on getCurrentUser() to define the user? Then you wouldn't need SPARK_USER or SPARK_UNKNOWN_USER.
    
    Unless you want to create a dummy user for the non-Kerberos case that is different from the logged-in user? I'd say that, in that case, it's better to let users do this in their own code (by wrapping their app in a UGI.doAs() call) instead of building it into Spark.
    
    As for the approach, I think this should work. But to address @pwendell's comments about tokens, there should be code somewhere that renews the Kerberos ticket (by calling UserGroupInformation.reloginFromKeytab() at appropriate intervals). Unfortunately I don't know what the best practices are around this; in our internal code, we just call reloginFromKeytab() periodically as part of our framework for talking to Hadoop services (so individual clients don't need to worry about it), and that seems to work fine.
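    For illustration, periodic renewal can be sketched with a scheduled task like the following. The one-hour interval and the executor setup are assumptions from our internal usage, not a recommendation for Spark specifically:

    ```scala
    import java.util.concurrent.{Executors, TimeUnit}
    import org.apache.hadoop.security.UserGroupInformation

    object TicketRenewer {
      // Single background thread that refreshes the Kerberos login.
      private val renewer = Executors.newSingleThreadScheduledExecutor()

      def start(): Unit = {
        renewer.scheduleAtFixedRate(new Runnable {
          override def run(): Unit = {
            // Re-login from the keytab so the TGT stays valid;
            // assumes loginUserFromKeytab() was called at startup.
            UserGroupInformation.getLoginUser.reloginFromKeytab()
          }
        }, 1, 1, TimeUnit.HOURS)
      }
    }
    ```

    Hadoop's UGI also offers checkTGTAndReloginFromKeytab(), which only re-logs in when the ticket is close to expiring, so the exact interval matters less.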


