[GitHub] LucaCanali commented on a change in pull request #23525: [SPARK-26595][core] Allow credential renewal based on kerberos ticket cache.

2019-01-18 Thread GitBox
LucaCanali commented on a change in pull request #23525: [SPARK-26595][core] 
Allow credential renewal based on kerberos ticket cache.
URL: https://github.com/apache/spark/pull/23525#discussion_r248966327
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
 ##
 @@ -97,28 +106,37 @@ private[spark] class HadoopDelegationTokenManager(
       ThreadUtils.newDaemonSingleThreadScheduledExecutor("Credential Renewal Thread")
 
     val ugi = UserGroupInformation.getCurrentUser()
-    if (ugi.isFromKeytab()) {
+    val tgtRenewalTask = if (ugi.isFromKeytab()) {
       // In Hadoop 2.x, renewal of the keytab-based login seems to be automatic, but in Hadoop 3.x,
       // it is configurable (see hadoop.kerberos.keytab.login.autorenewal.enabled, added in
       // HADOOP-9567). This task will make sure that the user stays logged in regardless of that
       // configuration's value. Note that checkTGTAndReloginFromKeytab() is a no-op if the TGT does
       // not need to be renewed yet.
-      val tgtRenewalTask = new Runnable() {
+      new Runnable() {
         override def run(): Unit = {
           ugi.checkTGTAndReloginFromKeytab()
 
 Review comment:
   I should clarify that the warning messages I reported are for the case where I use the TGT and `--conf spark.kerberos.renewal.credentials=ccache` rather than a keytab; apologies for the possible confusion this may have generated.
   Looking now at the code for `UserGroupInformation.reloginFromTicketCache`, I can see that it calls `hasSufficientTimeElapsed`, which is responsible for generating the warning message in question when users try to renew more frequently than a certain threshold. As you pointed out, with `hadoop.kerberos.min.seconds.before.relogin` set to its default value of 60 we are OK, as it matches the default for `spark.kerberos.relogin.period` (but this requires HADOOP-7930, i.e. Hadoop version >= 2.8).
   On a related topic, I can see that `checkTGTAndReloginFromKeytab` has a "silent" way of checking whether the rate of renewal requests exceeds the threshold, so no warnings are generated in that case.
   Does this make sense, and is it reproducible in your Hadoop 2.7 environment too?
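   To illustrate the rate-limiting behavior discussed above, here is a minimal sketch of the idea behind `hasSufficientTimeElapsed` (the class and method names below are illustrative assumptions, not the actual Hadoop implementation): a relogin attempt is refused if less than `hadoop.kerberos.min.seconds.before.relogin` seconds have passed since the previous one.

```scala
// Illustrative sketch of Hadoop's relogin rate limiting; names are
// hypothetical, not the real UserGroupInformation code.
class ReloginRateLimiter(minSecondsBeforeRelogin: Long) {
  private var lastReloginMillis: Long = 0L

  // Refuse a relogin attempt if the previous one happened less than the
  // configured threshold ago; otherwise record the attempt and allow it.
  def shouldAttemptRelogin(nowMillis: Long): Boolean = {
    if (nowMillis - lastReloginMillis < minSecondsBeforeRelogin * 1000L) {
      false // this is where Hadoop logs the "Not attempting to re-login" warning
    } else {
      lastReloginMillis = nowMillis
      true
    }
  }
}
```

   With the Spark relogin period and the Hadoop threshold both at 60 seconds, every scheduled attempt passes this check, which is why the defaults line up without warnings.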


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] LucaCanali commented on a change in pull request #23525: [SPARK-26595][core] Allow credential renewal based on kerberos ticket cache.

2019-01-17 Thread GitBox
LucaCanali commented on a change in pull request #23525: [SPARK-26595][core] 
Allow credential renewal based on kerberos ticket cache.
URL: https://github.com/apache/spark/pull/23525#discussion_r248634712
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
 ##
 @@ -97,28 +106,37 @@ private[spark] class HadoopDelegationTokenManager(
       ThreadUtils.newDaemonSingleThreadScheduledExecutor("Credential Renewal Thread")
 
     val ugi = UserGroupInformation.getCurrentUser()
-    if (ugi.isFromKeytab()) {
+    val tgtRenewalTask = if (ugi.isFromKeytab()) {
       // In Hadoop 2.x, renewal of the keytab-based login seems to be automatic, but in Hadoop 3.x,
       // it is configurable (see hadoop.kerberos.keytab.login.autorenewal.enabled, added in
       // HADOOP-9567). This task will make sure that the user stays logged in regardless of that
       // configuration's value. Note that checkTGTAndReloginFromKeytab() is a no-op if the TGT does
       // not need to be renewed yet.
-      val tgtRenewalTask = new Runnable() {
+      new Runnable() {
         override def run(): Unit = {
           ugi.checkTGTAndReloginFromKeytab()
 
 Review comment:
   Thanks @vanzin for the detailed explanations.
   After some additional investigation I found that if I compile Spark with Hadoop 3.1 the behavior is OK. I can still reproduce the issue I mentioned with the standard 2.7 version in my environment. It appears that `hadoop.kerberos.min.seconds.before.relogin` is not available in Hadoop 2.7 and was only introduced in 2.8?





[GitHub] LucaCanali commented on a change in pull request #23525: [SPARK-26595][core] Allow credential renewal based on kerberos ticket cache.

2019-01-15 Thread GitBox
LucaCanali commented on a change in pull request #23525: [SPARK-26595][core] 
Allow credential renewal based on kerberos ticket cache.
URL: https://github.com/apache/spark/pull/23525#discussion_r247846894
 
 

 ##
 File path: docs/security.md
 ##
 @@ -776,16 +776,32 @@ The following options provides finer-grained control for this feature:
 Long-running applications may run into issues if their run time exceeds the maximum delegation
 token lifetime configured in services it needs to access.
 
-Spark supports automatically creating new tokens for these applications when running in YARN mode.
-Kerberos credentials need to be provided to the Spark application via the `spark-submit` command,
-using the `--principal` and `--keytab` parameters.
+This feature is not available everywhere. In particular, it's only implemented
+on YARN and Kubernetes (both client and cluster modes), and on Mesos when using client mode.
 
-The provided keytab will be copied over to the machine running the Application Master via the Hadoop
-Distributed Cache. For this reason, it's strongly recommended that both YARN and HDFS be secured
-with encryption, at least.
+Spark supports automatically creating new tokens for these applications. There are two ways to
+enable this functionality.
 
-The Kerberos login will be periodically renewed using the provided credentials, and new delegation
-tokens for supported will be created.
+### Using a Keytab
+
+By providing Spark with a principal and keytab (e.g. using `spark-submit` with `--principal`
+and `--keytab` parameters), the application will maintain a valid Kerberos login that can be
+used to retrieve delegation tokens indefinitely.
+
+Note that when using a keytab in cluster mode, it will be copied over to the machine running the
+Spark driver. In the case of YARN, this means using HDFS as a staging area for the keytab, so it's
+strongly recommended that both YARN and HDFS be secured with encryption, at least.
+
+### Using a ticket cache
 
 Review comment:
   Very nice improvement in this PR. I guess it is worth documenting it also in `docs/running-on-yarn.md`.





[GitHub] LucaCanali commented on a change in pull request #23525: [SPARK-26595][core] Allow credential renewal based on kerberos ticket cache.

2019-01-15 Thread GitBox
LucaCanali commented on a change in pull request #23525: [SPARK-26595][core] 
Allow credential renewal based on kerberos ticket cache.
URL: https://github.com/apache/spark/pull/23525#discussion_r247840564
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
 ##
 @@ -236,11 +257,19 @@ private[spark] class HadoopDelegationTokenManager(
   }
 
   private def doLogin(): UserGroupInformation = {
-    logInfo(s"Attempting to login to KDC using principal: $principal")
-    require(new File(keytab).isFile(), s"Cannot find keytab at $keytab.")
-    val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
-    logInfo("Successfully logged into KDC.")
-    ugi
+    if (principal != null) {
+      logInfo(s"Attempting to login to KDC using principal: $principal")
+      require(new File(keytab).isFile(), s"Cannot find keytab at $keytab.")
+      val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
+      logInfo("Successfully logged into KDC.")
+      ugi
+    } else {
+      logInfo(s"Attempting to load user's ticket cache.")
+      val ccache = sparkConf.getenv("KRB5CCNAME")
+      val user = Option(sparkConf.getenv("KRB5PRINCIPAL")).getOrElse(
 
 Review comment:
   Would it make sense to also check/use the value of `spark.yarn.principal` (or an ad-hoc config parameter, if "reusing" this one is not OK) if provided by the user?
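   A minimal sketch of the resolution order being suggested (the helper name and the idea of treating the config as highest priority are assumptions for illustration, not Spark's actual code): an explicitly configured principal wins, then the `KRB5PRINCIPAL` environment variable, then the current OS user as a fallback.

```scala
// Hypothetical helper sketching the suggested principal resolution order.
def resolvePrincipal(
    configPrincipal: Option[String], // e.g. spark.yarn.principal, if set by the user
    envPrincipal: Option[String],    // e.g. Option(sparkConf.getenv("KRB5PRINCIPAL"))
    currentUser: String): String =   // e.g. UserGroupInformation.getCurrentUser().getUserName()
  configPrincipal.orElse(envPrincipal).getOrElse(currentUser)
```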





[GitHub] LucaCanali commented on a change in pull request #23525: [SPARK-26595][core] Allow credential renewal based on kerberos ticket cache.

2019-01-15 Thread GitBox
LucaCanali commented on a change in pull request #23525: [SPARK-26595][core] 
Allow credential renewal based on kerberos ticket cache.
URL: https://github.com/apache/spark/pull/23525#discussion_r247844744
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
 ##
 @@ -97,28 +106,37 @@ private[spark] class HadoopDelegationTokenManager(
       ThreadUtils.newDaemonSingleThreadScheduledExecutor("Credential Renewal Thread")
 
     val ugi = UserGroupInformation.getCurrentUser()
-    if (ugi.isFromKeytab()) {
+    val tgtRenewalTask = if (ugi.isFromKeytab()) {
       // In Hadoop 2.x, renewal of the keytab-based login seems to be automatic, but in Hadoop 3.x,
       // it is configurable (see hadoop.kerberos.keytab.login.autorenewal.enabled, added in
       // HADOOP-9567). This task will make sure that the user stays logged in regardless of that
       // configuration's value. Note that checkTGTAndReloginFromKeytab() is a no-op if the TGT does
       // not need to be renewed yet.
-      val tgtRenewalTask = new Runnable() {
+      new Runnable() {
         override def run(): Unit = {
           ugi.checkTGTAndReloginFromKeytab()
 
 Review comment:
   When testing this I get a warning message "WARN UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.." every minute (I am using the default value of `spark.yarn.kerberos.relogin.period`).
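   One way to avoid the mismatch behind this warning would be to never schedule the renewal task more frequently than Hadoop's `hadoop.kerberos.min.seconds.before.relogin` threshold allows. The helper below is a hypothetical sketch of that idea, not something Spark actually does:

```scala
// Hypothetical: clamp the configured relogin period to Hadoop's minimum
// relogin interval, so scheduled attempts are never rejected (and logged
// as warnings) for being too frequent.
def effectiveReloginPeriodSecs(configuredPeriodSecs: Long, hadoopMinSecs: Long): Long =
  math.max(configuredPeriodSecs, hadoopMinSecs)
```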





[GitHub] LucaCanali commented on a change in pull request #23525: [SPARK-26595][core] Allow credential renewal based on kerberos ticket cache.

2019-01-15 Thread GitBox
LucaCanali commented on a change in pull request #23525: [SPARK-26595][core] 
Allow credential renewal based on kerberos ticket cache.
URL: https://github.com/apache/spark/pull/23525#discussion_r247840564
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
 ##
 @@ -236,11 +257,19 @@ private[spark] class HadoopDelegationTokenManager(
   }
 
   private def doLogin(): UserGroupInformation = {
-    logInfo(s"Attempting to login to KDC using principal: $principal")
-    require(new File(keytab).isFile(), s"Cannot find keytab at $keytab.")
-    val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
-    logInfo("Successfully logged into KDC.")
-    ugi
+    if (principal != null) {
+      logInfo(s"Attempting to login to KDC using principal: $principal")
+      require(new File(keytab).isFile(), s"Cannot find keytab at $keytab.")
+      val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
+      logInfo("Successfully logged into KDC.")
+      ugi
+    } else {
+      logInfo(s"Attempting to load user's ticket cache.")
+      val ccache = sparkConf.getenv("KRB5CCNAME")
+      val user = Option(sparkConf.getenv("KRB5PRINCIPAL")).getOrElse(
 
 Review comment:
   Would it make sense to also check the value of `spark.yarn.principal`, if provided by the user?





[GitHub] LucaCanali commented on a change in pull request #23525: [SPARK-26595][core] Allow credential renewal based on kerberos ticket cache.

2019-01-15 Thread GitBox
LucaCanali commented on a change in pull request #23525: [SPARK-26595][core] 
Allow credential renewal based on kerberos ticket cache.
URL: https://github.com/apache/spark/pull/23525#discussion_r247839666
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
 ##
 @@ -236,11 +257,19 @@ private[spark] class HadoopDelegationTokenManager(
   }
 
   private def doLogin(): UserGroupInformation = {
-    logInfo(s"Attempting to login to KDC using principal: $principal")
-    require(new File(keytab).isFile(), s"Cannot find keytab at $keytab.")
-    val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
-    logInfo("Successfully logged into KDC.")
-    ugi
+    if (principal != null) {
+      logInfo(s"Attempting to login to KDC using principal: $principal")
+      require(new File(keytab).isFile(), s"Cannot find keytab at $keytab.")
+      val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
+      logInfo("Successfully logged into KDC.")
+      ugi
+    } else {
+      logInfo(s"Attempting to load user's ticket cache.")
+      val ccache = sparkConf.getenv("KRB5CCNAME")
 
 Review comment:
   I was wondering if adding an additional optional configuration parameter with the path of the KRB5CC file could also be useful? Possibly more useful when using this in cluster mode?
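   The suggestion could be sketched as follows (the config key `spark.kerberos.ticketCache` is a hypothetical name, not an existing Spark config): an optional configured path would take precedence over the `KRB5CCNAME` environment variable, which would matter in cluster mode where the submitter's environment is not visible on the driver node.

```scala
// Hypothetical sketch: resolve the ticket cache path from an optional
// config value first, falling back to the KRB5CCNAME environment variable.
def resolveTicketCachePath(
    configPath: Option[String],       // e.g. a "spark.kerberos.ticketCache" setting
    env: Map[String, String]): Option[String] =
  configPath.orElse(env.get("KRB5CCNAME"))
```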

