[jira] [Commented] (SPARK-26254) Move delegation token providers into a separate project
[ https://issues.apache.org/jira/browse/SPARK-26254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609802#comment-17609802 ] forrest lv commented on SPARK-26254:

nice job

> Move delegation token providers into a separate project
> ---
>
> Key: SPARK-26254
> URL: https://issues.apache.org/jira/browse/SPARK-26254
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Gabor Somogyi
> Assignee: Gabor Somogyi
> Priority: Major
> Fix For: 3.0.0
>
>
> There was a discussion in
> [PR#22598|https://github.com/apache/spark/pull/22598] that there are several
> provided dependencies inside core project which shouldn't be there (for ex.
> hive and kafka). This jira is to solve this problem.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26254) Move delegation token providers into a separate project
[ https://issues.apache.org/jira/browse/SPARK-26254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734393#comment-16734393 ] Gabor Somogyi commented on SPARK-26254:

[~hyukjin.kwon] In my last comment, right before the ping, almost everything is clear, but I don't know what the suggestion is regarding Kafka.
[jira] [Commented] (SPARK-26254) Move delegation token providers into a separate project
[ https://issues.apache.org/jira/browse/SPARK-26254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734004#comment-16734004 ] Hyukjin Kwon commented on SPARK-26254:

[~gsomogyi], can you point out which comment is about the discussion exactly?
[jira] [Commented] (SPARK-26254) Move delegation token providers into a separate project
[ https://issues.apache.org/jira/browse/SPARK-26254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733956#comment-16733956 ] Gabor Somogyi commented on SPARK-26254:

ping [~vanzin]
[jira] [Commented] (SPARK-26254) Move delegation token providers into a separate project
[ https://issues.apache.org/jira/browse/SPARK-26254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720633#comment-16720633 ] Steve Loughran commented on SPARK-26254:

bq. There was concern about using ServiceLoader before, but if the interface being loaded is private to Spark, it's fine with me.

That concern is HADOOP-15808. If you have any class which declares a delegation token, but that class doesn't actually load (missing class, transitive CNFE, etc.), and the jar containing that META-INF/services declaration gets onto the classpath of your resource manager, there goes your cluster as soon as the first job is submitted. Traumatic.
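One way to blunt the failure mode described above is to treat every ServiceLoader entry independently, so that a single provider which fails to load is skipped instead of aborting discovery for everyone. The following is a minimal Scala sketch, not Spark's actual code; `TokenProviderLike`, `DefensiveServiceLoading`, `drain` and `maxFailures` are hypothetical names standing in for a Spark-private provider interface and its loader.

```scala
import java.util.{Iterator => JIterator, ServiceConfigurationError, ServiceLoader}
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for a Spark-private delegation token provider interface.
trait TokenProviderLike {
  def serviceName: String
}

object DefensiveServiceLoading {
  /** Drains a service iterator, collecting failures instead of propagating them.
    * Both hasNext and next are guarded because, depending on the JDK version,
    * a broken entry may surface from either call. maxFailures bounds the loop
    * in case an iterator keeps failing without advancing.
    */
  def drain[T](it: JIterator[T], maxFailures: Int = 100): (List[T], List[Throwable]) = {
    val loaded = ListBuffer.empty[T]
    val failures = ListBuffer.empty[Throwable]
    var more = true
    while (more && failures.size < maxFailures) {
      try {
        if (it.hasNext) loaded += it.next()
        else more = false
      } catch {
        // A missing provider class or a missing transitive dependency
        // (e.g. NoClassDefFoundError) shows up as one of these: skip it.
        case e @ (_: ServiceConfigurationError | _: LinkageError) => failures += e
      }
    }
    (loaded.toList, failures.toList)
  }

  /** Loads whatever TokenProviderLike implementations are declared on the classpath. */
  def loadProviders(loader: ClassLoader = getClass.getClassLoader): List[TokenProviderLike] = {
    val (providers, failures) =
      drain(ServiceLoader.load(classOf[TokenProviderLike], loader).iterator())
    failures.foreach(f => System.err.println(s"Skipping token provider: ${f.getMessage}"))
    providers
  }
}
```

Whether to log-and-skip or fail fast is a policy choice: skipping keeps one bad jar from taking down every submission, at the cost of silently missing tokens for that one service.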
[jira] [Commented] (SPARK-26254) Move delegation token providers into a separate project
[ https://issues.apache.org/jira/browse/SPARK-26254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720241#comment-16720241 ] Gabor Somogyi commented on SPARK-26254:

{quote}There was concern about using ServiceLoader before, but if the interface being loaded is private to Spark, it's fine with me.{quote}
org.apache.spark.deploy.security.HadoopDelegationTokenProvider fulfils that.

{quote}Keep HDFS and HBase in core{quote}
Clear, and the same idea I had.

{quote}move the Kafka one to some Kafka package{quote}
You mean module, don't you?
* If we move it inside core, then the Kafka deps remain.
* If we move it to kafka-sql, then DStreams will not be able to reach it.

My suggestion is to create a module, something like kafka-token-provider, that kafka-sql (and later DStreams) can depend on.

{quote}the Hive one to the Hive module{quote}
Clear, and the same idea. For example, hive-token-provider, which extracts the ugly dependencies from core.
[jira] [Commented] (SPARK-26254) Move delegation token providers into a separate project
[ https://issues.apache.org/jira/browse/SPARK-26254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719574#comment-16719574 ] Marcelo Vanzin commented on SPARK-26254:

bq. loaded the providers with ServiceLoader

If you're going to use that, then you probably don't need a new module. Keep HDFS and HBase in core, move the Kafka one to some Kafka package (which one is TBD, especially if you want to support both DStreams and Structured Streaming), and the Hive one to the Hive module.

There was concern about using ServiceLoader before, but if the interface being loaded is private to Spark, it's fine with me.

My original idea was to move everything (renewer code et al.) to a new module, and make core not have this feature at all; YARN, Mesos and others would depend on this new module. But the above change might be simpler / better.
[jira] [Commented] (SPARK-26254) Move delegation token providers into a separate project
[ https://issues.apache.org/jira/browse/SPARK-26254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712843#comment-16712843 ] Steve Loughran commented on SPARK-26254:

Maybe ask the Kafka people for opinions; [~jkreps] can probably nominate someone.

bq. token-providers provided dependency to kafka-sql project => It's kinda weird but at the moment looks the least problematic

Probably makes sense, then.
[jira] [Commented] (SPARK-26254) Move delegation token providers into a separate project
[ https://issues.apache.org/jira/browse/SPARK-26254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711516#comment-16711516 ] Gabor Somogyi commented on SPARK-26254:

I've reached a point where a tradeoff has to be made, so I'm interested in opinions [~vanzin] [~ste...@apache.org]

I've created a project named token-providers which depends on core. With this, all the nasty Hive + Kafka dependencies were successfully extracted, and all the token providers live there. Then I loaded the providers with ServiceLoader, which also works fine. Finally, I reached a point where the kafka-sql project expects a couple of things from KafkaTokenUtil, which is now in token-providers. Here is the list of problems:
{noformat}
[error] /Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala:31: object KafkaTokenUtil is not a member of package org.apache.spark.deploy.security
[error] import org.apache.spark.deploy.security.KafkaTokenUtil
[error]^
[error] /Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelper.scala:25: object KafkaTokenUtil is not a member of package org.apache.spark.deploy.security
[error] import org.apache.spark.deploy.security.KafkaTokenUtil
[error]^
[error] /Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelper.scala:32: not found: value KafkaTokenUtil
[error] KafkaTokenUtil.TOKEN_SERVICE) != null
[error] ^
[error] /Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelper.scala:37: not found: value KafkaTokenUtil
[error] KafkaTokenUtil.TOKEN_SERVICE)
[error] ^
[error] /Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala:566: not found: value KafkaTokenUtil
[error] if (KafkaTokenUtil.isGlobalJaasConfigurationProvided) {
[error] ^
+ all isTokenAvailable tests expect TOKEN_KIND, TOKEN_SERVICE + KafkaDelegationTokenIdentifier in KafkaSecurityHelperSuite
{noformat}
Here I see these possibilities:
* Hardcode TOKEN_KIND + TOKEN_SERVICE and duplicate isGlobalJaasConfigurationProvided => The drawback here is that we can't really test whether a token created by the provider can be read in kafka-sql (we actually can, but only with hardcoded strings on both sides, which makes it brittle).
* As we're loading providers with ServiceLoader, the Kafka-related one can be moved to kafka-sql => The drawback is that the providers get spread around and this code can't really be reused in DStreams.
* Add token-providers as a provided dependency of the kafka-sql project => It's kinda weird, but at the moment it looks the least problematic.

Waiting on opinions...
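To make the brittleness of the first option concrete: the provider stores a token under some kind/service pair and kafka-sql later looks it up under what it hopes is the same pair. A minimal sketch, assuming a simplified in-memory credentials map instead of Hadoop's `Credentials`/`Token` classes; every name and string literal below is hypothetical and not Spark's actual value.

```scala
import scala.collection.mutable

// Simplified stand-in for a Hadoop token: kind, service and an opaque secret.
final case class FakeToken(kind: String, service: String, secret: Seq[Byte])

// Simplified stand-in for Hadoop Credentials: tokens indexed by service name.
final class FakeCredentials {
  private val tokens = mutable.Map.empty[String, FakeToken]
  def addToken(t: FakeToken): Unit = tokens(t.service) = t
  def getToken(service: String): Option[FakeToken] = tokens.get(service)
}

// Option 1: each module hardcodes its own copy of the identifiers.
object ProviderSideKeys { val Kind = "HYPOTHETICAL_KAFKA_TOKEN"; val Service = "hypothetical.kafka.service" }
object SqlSideKeys { val Kind = "HYPOTHETICAL_KAFKA_TOKEN"; val Service = "hypothetical.kafka.service" }

// Provider side (would live in token-providers).
object FakeKafkaProvider {
  def obtainToken(creds: FakeCredentials): Unit =
    creds.addToken(FakeToken(ProviderSideKeys.Kind, ProviderSideKeys.Service, Seq[Byte](1, 2, 3)))
}

// Consumer side (would live in kafka-sql): it only finds a token whose
// service and kind match its own copy of the strings.
object FakeSqlHelper {
  def isTokenAvailable(
      creds: FakeCredentials,
      kind: String = SqlSideKeys.Kind,
      service: String = SqlSideKeys.Service): Boolean =
    creds.getToken(service).exists(_.kind == kind)
}
```

Nothing in the compiler ties the two copies together: if one of them is edited, `isTokenAvailable` simply returns `false`. Having kafka-sql depend on token-providers (the third option) lets both sides reference a single definition instead.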
[jira] [Commented] (SPARK-26254) Move delegation token providers into a separate project
[ https://issues.apache.org/jira/browse/SPARK-26254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708542#comment-16708542 ] Gabor Somogyi commented on SPARK-26254:

HBase libs are not in provided scope in core's pom.xml, but the provider should move too.
[jira] [Commented] (SPARK-26254) Move delegation token providers into a separate project
[ https://issues.apache.org/jira/browse/SPARK-26254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707679#comment-16707679 ] Steve Loughran commented on SPARK-26254:

+HBase

I don't have any opinions on the best place; people who know the Spark packaging are the ones to ask. And people deploying to infrastructures other than YARN will have their opinions too.

Token loading can be fairly brittle to classpath problems (HADOOP-15808); it's good not to trust everything to be well-configured.
[jira] [Commented] (SPARK-26254) Move delegation token providers into a separate project
[ https://issues.apache.org/jira/browse/SPARK-26254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706970#comment-16706970 ] Gabor Somogyi commented on SPARK-26254:

I've created this jira to discuss the details. cc [~vanzin] [~steveloughran]

I've taken a look at the code and I see mainly two problematic library sets: Hive + Kafka. So these should be moved, together with all the delegation token providers. What do you think, guys? Any thoughts welcome.