Github user squito commented on a diff in the pull request:
    --- Diff: 
    @@ -18,221 +18,156 @@ package
     import java.util.concurrent.{ScheduledExecutorService, TimeUnit}
    +import java.util.concurrent.atomic.AtomicReference
     import org.apache.hadoop.conf.Configuration
    -import org.apache.hadoop.fs.{FileSystem, Path}
    +import org.apache.hadoop.security.{Credentials, UserGroupInformation}
     import org.apache.spark.SparkConf
     import org.apache.spark.deploy.SparkHadoopUtil
    -import org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
     import org.apache.spark.deploy.yarn.config._
     import org.apache.spark.internal.Logging
     import org.apache.spark.internal.config._
    +import org.apache.spark.rpc.RpcEndpointRef
    +import org.apache.spark.ui.UIUtils
     import org.apache.spark.util.ThreadUtils
    - * The following methods are primarily meant to make sure long-running apps like Spark
    - * Streaming apps can run without interruption while accessing secured services. The
    - * scheduleLoginFromKeytab method is called on the AM to get the new credentials.
    - * This method wakes up a thread that logs into the KDC
    - * once 75% of the renewal interval of the original credentials used for the container
    - * has elapsed. It then obtains new credentials and writes them to HDFS in a
    - * pre-specified location - the prefix of which is specified in the sparkConf by
    - * spark.yarn.credentials.file (so the file(s) would be named c-timestamp1-1, c-timestamp2-2 etc.
    - * - each update goes to a new file, with a monotonically increasing suffix), also the
    - * timestamp1, timestamp2 here indicates the time of next update for these credentials.
    - * After this, the credentials are renewed once 75% of the new tokens renewal interval has elapsed.
    + * A manager tasked with periodically updating delegation tokens needed by the application.
    - * On the executor and driver (yarn client mode) side, the updateCredentialsIfRequired method is
    - * called once 80% of the validity of the original credentials has elapsed. At that time the
    - * executor finds the credentials file with the latest timestamp and checks if it has read those
    - * credentials before (by keeping track of the suffix of the last file it read). If a new file has
    - * appeared, it will read the credentials and update the currently running UGI with it. This
    - * process happens again once 80% of the validity of this has expired.
    + * This manager is meant to make sure long-running apps (such as Spark Streaming apps) can run
    + * without interruption while accessing secured services. It periodically logs in to the KDC with
    + * user-provided credentials, and contacts all the configured secure services to obtain delegation
    + * tokens to be distributed to the rest of the application.
    --- End diff --
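As a side note for readers of the diff: the file-based scheme the removed scaladoc describes can be sketched roughly as follows. This is a minimal illustration of the 75% scheduling rule and the file-naming convention only, not Spark's actual code; `OldCredentialScheme` and its methods are hypothetical names.

```scala
// Sketch of the removed design: renew once 75% of the renewal interval has
// elapsed, and write each update to a new HDFS file named
// <prefix>-<nextUpdateTimestamp>-<suffix> with a monotonically increasing
// suffix. Hypothetical helpers, not Spark's actual API.
object OldCredentialScheme {
  // "once 75% of the renewal interval ... has elapsed"
  def nextRenewalDelay(renewalIntervalMs: Long): Long =
    (renewalIntervalMs * 0.75).toLong

  // File names like c-timestamp1-1, c-timestamp2-2, where the timestamp is
  // the time of the next update and the suffix increases monotonically.
  def credentialsFileName(prefix: String, nextUpdateMs: Long, suffix: Int): String =
    s"$prefix-$nextUpdateMs-$suffix"
}
```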
    For folks like me less familiar with this, this seems like a good spot to explain the overall flow a little bit more. E.g.:
    The KDC provides a ticket-granting ticket (TGT), which is then used to obtain delegation tokens for each service. The KDC does not expose the TGT's expiry time, so its renewal is controlled by a conf (by default 1m, much more frequent than usual expiry times). Each delegation token provider should determine the expiry time of its delegation token, so the tokens can be renewed appropriately.
    (In particular, I needed an extra read to figure out why the TGT had its own renewal mechanism.)
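The two renewal cadences above could be sketched like this. This is illustrative only: the object and method names are hypothetical, and the one-minute default comes from this comment's description, not from Spark's API.

```scala
// Illustrative sketch of the two mechanisms: the TGT's expiry is not exposed
// by the KDC, so its relogin period comes from a conf value; each delegation
// token's renewal time is derived from the expiry its provider reports.
// Hypothetical names, not Spark's actual API.
object RenewalSchedule {
  // TGT relogin: fixed, conf-driven period (1 minute by default per the
  // discussion above), independent of any expiry time.
  def kerberosReloginPeriodMs(confValueMs: Option[Long]): Long =
    confValueMs.getOrElse(60 * 1000L)

  // Delegation tokens: schedule renewal at a fraction of the remaining
  // lifetime reported by the token provider.
  def tokenRenewalDelayMs(nowMs: Long, expiryMs: Long, ratio: Double = 0.75): Long =
    math.max(0L, ((expiryMs - nowMs) * ratio).toLong)
}
```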

