[GitHub] spark issue #22179: [SPARK-25258][SPARK-23131][SPARK-25176][BUILD] Upgrade K...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22179 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22333: [SPARK-25335][BUILD] Skip Zinc downloading if it'...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22333
[GitHub] spark issue #22333: [SPARK-25335][BUILD] Skip Zinc downloading if it's insta...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22333 Merged to master
[GitHub] spark pull request #22112: [SPARK-23243][Core] Fix RDD.repartition() data co...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22112
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22112 Thanks! Merged to master.
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215446604 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilder.scala --- @@ -51,7 +51,11 @@ private[spark] class KubernetesDriverBuilder( provideJavaStep: ( KubernetesConf[KubernetesDriverSpecificConf] => JavaDriverFeatureStep) = -new JavaDriverFeatureStep(_)) { +new JavaDriverFeatureStep(_), --- End diff -- The argument type is a function, with the default value assigned in the primary constructor. It might be cleaner to write this as follows: - The constructor in the class declaration has _no_ default arguments. - Provide a no-arg constructor (`def this()`) which passes all the default implementations along to the full-args constructor. In this case default arguments are hurting more than they are helping: in the unit tests we always supply mocks for _every_ arg, so we always use either all defaults (real implementations) or no defaults (test implementations).
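The constructor pattern suggested above can be sketched in isolation. This is a minimal illustration, not Spark's actual code: the names `Builder`, `FeatureStep`, and `RealStep` are hypothetical stand-ins. The primary constructor takes every dependency explicitly (so tests cannot accidentally mix mocks with real implementations), and a no-arg auxiliary constructor wires in all the real implementations.

```scala
// Hypothetical sketch of the suggested "no default arguments" pattern.
trait FeatureStep {
  def configure(pod: String): String
}

class RealStep extends FeatureStep {
  // Stand-in for a production feature step.
  override def configure(pod: String): String = pod + "+real"
}

class Builder(provideStep: String => FeatureStep) {
  // Auxiliary no-arg constructor supplies every real implementation at once;
  // the primary constructor has no defaults, so tests must pass every arg.
  def this() = this(_ => new RealStep)

  def build(conf: String): String = provideStep(conf).configure(conf)
}
```

A production caller writes `new Builder()`, while a test writes `new Builder(_ => mockStep)`; there is no in-between state where some dependencies are mocked and others silently default to real implementations.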
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22270 @dilipbiswal Any update on this PR?
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r21566 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopStepsOrchestrator.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.features.hadoopsteps + +import java.io.File + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.features.OptionRequirements +import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager +import org.apache.spark.internal.Logging + +private[spark] class HadoopStepsOrchestrator( + conf: SparkConf, + kubernetesResourceNamePrefix: String, + hadoopConfDir: String, + hadoopConfigMapName: String, + isKerberosEnabled: Boolean) extends Logging { + + private val maybePrincipal = conf.get(KUBERNETES_KERBEROS_PRINCIPAL) + private val maybeKeytab = conf.get(KUBERNETES_KERBEROS_KEYTAB) +.map(k => new File(k)) + private val maybeExistingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) + private val maybeExistingSecretItemKey = +conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) + private val maybeRenewerPrincipal = +conf.get(KUBERNETES_KERBEROS_RENEWER_PRINCIPAL) + + require(maybeKeytab.forall( _ => isKerberosEnabled ), +"You must enable Kerberos support if you are specifying a Kerberos Keytab") + + require(maybeExistingSecretName.forall( _ => isKerberosEnabled ), +"You must enable Kerberos support if you are specifying a Kerberos Secret") + + OptionRequirements.requireBothOrNeitherDefined( +maybeKeytab, +maybePrincipal, +"If a Kerberos keytab is specified you must also specify a Kerberos principal", +"If a Kerberos principal is specified you must also specify a Kerberos keytab") + + OptionRequirements.requireBothOrNeitherDefined( +maybeExistingSecretName, +maybeExistingSecretItemKey, +"If a secret storing a Kerberos Delegation Token is specified you must also" + + " specify the label where the data is stored", +"If a secret data item-key where the data of the Kerberos Delegation Token is specified" + + " you must also specify the name of the secret") + + def getHadoopSteps(kubeTokenManager: KubernetesHadoopDelegationTokenManager): --- End diff 
-- We did this before, in the prototype, but since we made the new feature steps API I'm not sure about having two tiers of steps. I think it'll be clearer to use the top-level feature steps and do all the Hadoop work in a single step class. Specific submodules can be factored out into separate classes, without the strict requirement of sharing the same interface.
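The `OptionRequirements.requireBothOrNeitherDefined` calls in the quoted orchestrator enforce that paired settings (keytab/principal, secret name/item key) are supplied together. A minimal sketch of what such a helper might look like — an assumption on my part, since the thread does not show Spark's actual implementation:

```scala
// Hypothetical sketch of an OptionRequirements-style helper: two related
// options must be either both defined or both absent.
object OptionRequirements {

  def requireBothOrNeitherDefined(
      opt1: Option[_],
      opt2: Option[_],
      errMessageWhenSecondIsMissing: String,
      errMessageWhenFirstIsMissing: String): Unit = {
    requireSecondIfFirstIsDefined(opt1, opt2, errMessageWhenSecondIsMissing)
    requireSecondIfFirstIsDefined(opt2, opt1, errMessageWhenFirstIsMissing)
  }

  private def requireSecondIfFirstIsDefined(
      first: Option[_],
      second: Option[_],
      errMessage: String): Unit = {
    // Only when the first option is present does the second become mandatory.
    first.foreach { _ =>
      require(second.isDefined, errMessage)
    }
  }
}
```

With this shape, specifying a keytab without a principal (or vice versa) fails fast at submission time with a targeted message, rather than surfacing later as a confusing Kerberos login error.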
[GitHub] spark issue #22288: [SPARK-22148][Scheduler] Acquire new executors to avoid ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22288 Merged build finished. Test FAILed.
[GitHub] spark issue #22288: [SPARK-22148][Scheduler] Acquire new executors to avoid ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22288 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95729/ Test FAILed.
[GitHub] spark issue #22333: [SPARK-25335][BUILD] Skip Zinc downloading if it's insta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22333 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95728/ Test PASSed.
[GitHub] spark issue #22333: [SPARK-25335][BUILD] Skip Zinc downloading if it's insta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22333 Merged build finished. Test PASSed.
[GitHub] spark issue #22288: [SPARK-22148][Scheduler] Acquire new executors to avoid ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22288 **[Test build #95729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95729/testReport)** for PR 22288 at commit [`87c4e57`](https://github.com/apache/spark/commit/87c4e57bb966078c8a78eabc5a5e4b6f60c78f28). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22192 > I'm looking up other spark-submit options that can be used to provide the jar to other nodes I don't think there is anything like that currently; so using `--jars` for this will only work in YARN mode. But that can be just a documentation thingy, when/if this feature is documented. (It may work with kubernetes too but haven't really tried that out since 2.3.)
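The point being made is that `--jars` both puts the plugin jar on the classpath and (in YARN mode) distributes it to the nodes running executors. A hypothetical invocation sketch — the config key, jar path, and class name below are illustrative assumptions, not confirmed by this thread:

```shell
# Sketch: ship an executor-plugin jar with --jars so executors can load it.
# Per the comment above, this distribution mechanism is only expected to
# work reliably in YARN mode. All names here are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --jars /path/to/my-executor-plugin.jar \
  --conf spark.executor.plugins=com.example.MyExecutorPlugin \
  my-application.jar
```

On other cluster managers the jar would have to be made available to executors by some other means (e.g. baking it into the image or a shared filesystem), which is why the comment flags this as a documentation concern.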
[GitHub] spark issue #22333: [SPARK-25335][BUILD] Skip Zinc downloading if it's insta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22333 **[Test build #95728 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95728/testReport)** for PR 22333 at commit [`14d1017`](https://github.com/apache/spark/commit/14d10172f9d62ba9a8eabec5aaa11759757278bf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...
Github user pgandhi999 commented on the issue: https://github.com/apache/spark/pull/22144 @cloud-fan I have updated the PR and added the configurable flags for Window UDAF and Hive UDAF for now. Would appreciate hearing your thoughts. Thank you.
[GitHub] spark issue #21525: [SPARK-24513][ML] Attribute support in UnaryTransformer
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21525 **[Test build #95732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95732/testReport)** for PR 21525 at commit [`cfeae4d`](https://github.com/apache/spark/commit/cfeae4d5282be98d619da97f350c11339057e92f).
[GitHub] spark issue #22271: [SPARK-25268][GraphX]run Parallel Personalized PageRank ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22271 **[Test build #4332 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4332/testReport)** for PR 22271 at commit [`7651652`](https://github.com/apache/spark/commit/76516529e31a5507afc0a1de31b30104a4d7c6ad).
[GitHub] spark issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Struc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22282 **[Test build #95731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95731/testReport)** for PR 22282 at commit [`b0f64e9`](https://github.com/apache/spark/commit/b0f64e91cb4f6306a7c0c60d4a17f1a0aacb3a51).
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215429800 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopGlobalFeatureDriverStep.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, ContainerBuilder, HasMetadata, PodBuilder} + +import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod} +import org.apache.spark.deploy.k8s.Config.{KUBERNETES_KERBEROS_KRB5_FILE, KUBERNETES_KERBEROS_PROXY_USER} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadoopsteps.{HadoopBootstrapUtil, HadoopConfigSpec, HadoopConfigurationStep} +import org.apache.spark.internal.Logging + + /** + * This is the main method that runs the hadoopConfigurationSteps defined + * by the HadoopStepsOrchestrator. 
These steps are run to modify the + SparkPod and Kubernetes Resources using the additive method of the feature steps + */ +private[spark] class HadoopGlobalFeatureDriverStep( + kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) + extends KubernetesFeatureConfigStep with Logging { + private val hadoopTestOrchestrator = + kubernetesConf.getHadoopStepsOrchestrator --- End diff -- fits in previous line.
[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21710 @felixcheung Are there any other things I need to change? If not, could this PR be merged in 2.4? Thanks!
[GitHub] spark issue #21649: [SPARK-23648][R][SQL]Adds more types for hint in SparkR
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21649 @felixcheung Are there any other things I need to change? If not, could this PR be merged in 2.4? Thanks!
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215433952 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopKerberosSecretResolverStep.scala --- @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features.hadoopsteps + +import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager +import org.apache.spark.internal.Logging + + /** + * This step assumes that you have already done all the heavy lifting in retrieving a --- End diff -- This whole "steps" thing used in the k8s backend always throws me off... but given the comment, why isn't the code in this step just part of the step that actually does the required initialization of the delegation token stuff?
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215431441 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopBootstrapUtil.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.features.hadoopsteps + +import java.io.File + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model.{ConfigMap, ConfigMapBuilder, ContainerBuilder, KeyToPathBuilder, PodBuilder} + +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.SparkPod +import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager + +private[spark] object HadoopBootstrapUtil { + + /** +* Mounting the DT secret for both the Driver and the executors +* +* @param dtSecretName Name of the secret that stores the Delegation Token +* @param dtSecretItemKey Name of the Item Key storing the Delegation Token +* @param userName Name of the SparkUser to set SPARK_USER +* @param fileLocation Location of the krb5 file +* @param krb5ConfName Name of the ConfigMap for Krb5 +* @param pod Input pod to be appended to +* @return a modified SparkPod +*/ + def bootstrapKerberosPod( + dtSecretName: String, + dtSecretItemKey: String, + userName: String, + fileLocation: String, + krb5ConfName: String, + pod: SparkPod) : SparkPod = { + val krb5File = new File(fileLocation) --- End diff -- Whole body is indented too far.
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215429715 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopGlobalFeatureDriverStep.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, ContainerBuilder, HasMetadata, PodBuilder} + +import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod} +import org.apache.spark.deploy.k8s.Config.{KUBERNETES_KERBEROS_KRB5_FILE, KUBERNETES_KERBEROS_PROXY_USER} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadoopsteps.{HadoopBootstrapUtil, HadoopConfigSpec, HadoopConfigurationStep} +import org.apache.spark.internal.Logging + + /** + * This is the main method that runs the hadoopConfigurationSteps defined + * by the HadoopStepsOrchestrator. 
These steps are run to modify the + SparkPod and Kubernetes Resources using the additive method of the feature steps + */ +private[spark] class HadoopGlobalFeatureDriverStep( + kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) --- End diff -- nit: params in their own line are double-indented. Also happens in other places.
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215433543 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopKerberosKeytabResolverStep.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features.hadoopsteps + +import java.io._ +import java.security.PrivilegedExceptionAction + +import scala.collection.JavaConverters._ + +import io.fabric8.kubernetes.api.model.SecretBuilder +import org.apache.commons.codec.binary.Base64 + +import org.apache.spark.{SparkConf, SparkException} +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager +import org.apache.spark.deploy.security.HadoopDelegationTokenManager +import org.apache.spark.internal.Logging + + /** + * This step does all the heavy lifting for Delegation Token logic. This step + * assumes that the job user has either specified a principal and keytab or ran + * $kinit before running spark-submit. 
With a TGT stored locally, by running + * UGI.getCurrentUser you are able to obtain the current user, alternatively + * you can run UGI.loginUserFromKeytabAndReturnUGI and by running .doAs run + * as the logged into user instead of the current user. With the Job User principal + * you then retrieve the delegation token from the NameNode and store values in + * DelegationToken. Lastly, the class puts the data into a secret. All this is + * appended to the current HadoopSpec which in turn will append to the current + * DriverSpec. + */ +private[spark] class HadoopKerberosKeytabResolverStep( +submissionSparkConf: SparkConf, +kubernetesResourceNamePrefix : String, +maybePrincipal: Option[String], +maybeKeytab: Option[File], +maybeRenewerPrincipal: Option[String], +tokenManager: KubernetesHadoopDelegationTokenManager) + extends HadoopConfigurationStep with Logging { + +override def configureHadoopSpec(hSpec: HadoopConfigSpec): HadoopConfigSpec = { + val hadoopConf = SparkHadoopUtil.get.newConfiguration(submissionSparkConf) + if (!tokenManager.isSecurityEnabled) { +throw new SparkException("Hadoop not configured with Kerberos") + } + val maybeJobUserUGI = +for { + principal <- maybePrincipal + keytab <- maybeKeytab +} yield { + logDebug("Logged into KDC with keytab using Job User UGI") + tokenManager.loginUserFromKeytabAndReturnUGI( +principal, +keytab.toURI.toString) +} + // In the case that keytab is not specified we will read from Local Ticket Cache + val jobUserUGI = maybeJobUserUGI.getOrElse(tokenManager.getCurrentUser) + val originalCredentials = jobUserUGI.getCredentials + // It is necessary to run as jobUserUGI because logged in user != Current User + val (tokenData, renewalInterval) = jobUserUGI.doAs( +new PrivilegedExceptionAction[(Array[Byte], Long)] { +override def run(): (Array[Byte], Long) = { + val hadoopTokenManager: HadoopDelegationTokenManager = +new HadoopDelegationTokenManager(submissionSparkConf, hadoopConf) + tokenManager.getDelegationTokens( 
+originalCredentials, +submissionSparkConf, +hadoopConf, +hadoopTokenManager) +}}) + if (tokenData.isEmpty) throw new SparkException(s"Did not obtain any delegation tokens") --- End diff -- Just use `require`?
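The review suggestion refers to Scala's built-in `require`, which collapses the explicit check-and-throw into a single call. One trade-off worth noting: `require` throws `IllegalArgumentException` (with a "requirement failed:" prefix) rather than a `SparkException`. A self-contained sketch of the two styles, using a plain exception as a stand-in for `SparkException` since Spark's class isn't available here:

```scala
// Explicit style, as in the quoted diff (IllegalStateException stands in
// for SparkException, which is a Spark-internal class).
def validateExplicit(tokenData: Array[Byte]): Unit = {
  if (tokenData.isEmpty) {
    throw new IllegalStateException("Did not obtain any delegation tokens")
  }
}

// Suggested style: require throws IllegalArgumentException with the message
// prefixed by "requirement failed:" when the predicate is false.
def validateWithRequire(tokenData: Array[Byte]): Unit = {
  require(tokenData.nonEmpty, "Did not obtain any delegation tokens")
}
```

So the choice between the two is mostly about which exception type callers should expect; the `require` form is shorter and idiomatic for precondition checks.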
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215430312 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopGlobalFeatureDriverStep.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, ContainerBuilder, HasMetadata, PodBuilder} + +import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod} +import org.apache.spark.deploy.k8s.Config.{KUBERNETES_KERBEROS_KRB5_FILE, KUBERNETES_KERBEROS_PROXY_USER} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadoopsteps.{HadoopBootstrapUtil, HadoopConfigSpec, HadoopConfigurationStep} +import org.apache.spark.internal.Logging + + /** + * This is the main method that runs the hadoopConfigurationSteps defined --- End diff -- All this seems to just say "Runs the configuration steps defined by HadoopStepsOrchestrator" which is a lot shorter. 
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215428372 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -225,6 +225,60 @@ private[spark] object Config extends Logging { "Ensure that major Python version is either Python2 or Python3") .createWithDefault("2") + val KUBERNETES_KERBEROS_PROXY_USER = +ConfigBuilder("spark.kubernetes.kerberos.proxyUser") + .doc("Specify the proxy user " + +"for HadoopUGI login for the Driver + Executors") + .internal() + .stringConf + .createWithDefault("false") + + val KUBERNETES_KERBEROS_KRB5_FILE = +ConfigBuilder("spark.kubernetes.kerberos.krb5location") + .doc("Specify the location of the krb5 file " + +"to be mounted on the driver and executors for Secure HDFS") + .stringConf + .createOptional + + val KUBERNETES_KERBEROS_KEYTAB = +ConfigBuilder("spark.kubernetes.kerberos.keytab") + .doc("Specify the location of keytab " + --- End diff -- Fits in one line. Probably similar for other confs you're adding.
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215435777 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBuilder.scala --- @@ -17,8 +17,8 @@ package org.apache.spark.scheduler.cluster.k8s import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesExecutorSpecificConf, KubernetesRoleSpecificConf, SparkPod} -import org.apache.spark.deploy.k8s.features._ -import org.apache.spark.deploy.k8s.features.{BasicExecutorFeatureStep, EnvSecretsFeatureStep, LocalDirsFeatureStep, MountSecretsFeatureStep} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.features.{BasicExecutorFeatureStep, EnvSecretsFeatureStep, HadoopConfExecutorFeatureStep, HadoopSparkUserExecutorFeatureStep, KerberosConfExecutorFeatureStep, KubernetesFeatureConfigStep, LocalDirsFeatureStep, MountSecretsFeatureStep, MountVolumesFeatureStep} --- End diff -- Same: if the import list is getting this noisy, just import everything.
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215429895 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopGlobalFeatureDriverStep.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, ContainerBuilder, HasMetadata, PodBuilder} + +import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod} +import org.apache.spark.deploy.k8s.Config.{KUBERNETES_KERBEROS_KRB5_FILE, KUBERNETES_KERBEROS_PROXY_USER} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadoopsteps.{HadoopBootstrapUtil, HadoopConfigSpec, HadoopConfigurationStep} +import org.apache.spark.internal.Logging + + /** + * This is the main method that runs the hadoopConfigurationSteps defined + * by the HadoopStepsOrchestrator. 
These steps are run to modify the + * SparkPod and Kubernetes Resources using the additive method of the feature steps + */ +private[spark] class HadoopGlobalFeatureDriverStep( + kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) + extends KubernetesFeatureConfigStep with Logging { + private val hadoopTestOrchestrator = + kubernetesConf.getHadoopStepsOrchestrator + require(kubernetesConf.hadoopConfDir.isDefined && + hadoopTestOrchestrator.isDefined, "Ensure that HADOOP_CONF_DIR is defined") --- End diff -- keep condition in previous line, break only the message if it doesn't fit. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
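The layout vanzin prefers for the flagged `require` call can be sketched with stand-in values (the real fields live on `KubernetesConf`, so the two `Option`s below are illustrative):

```scala
// Stand-ins for kubernetesConf.hadoopConfDir and hadoopTestOrchestrator.
val hadoopConfDir: Option[String] = Some("/etc/hadoop/conf")
val hadoopTestOrchestrator: Option[String] = Some("orchestrator")

// Condition stays on the first line; only the message wraps when the call
// does not fit -- rather than breaking mid-condition as in the diff.
require(hadoopConfDir.isDefined && hadoopTestOrchestrator.isDefined,
  "Ensure that HADOOP_CONF_DIR is defined")
```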
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215428441 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -225,6 +225,60 @@ private[spark] object Config extends Logging { "Ensure that major Python version is either Python2 or Python3") .createWithDefault("2") + val KUBERNETES_KERBEROS_PROXY_USER = +ConfigBuilder("spark.kubernetes.kerberos.proxyUser") + .doc("Specify the proxy user " + +"for HadoopUGI login for the Driver + Executors") + .internal() + .stringConf + .createWithDefault("false") + + val KUBERNETES_KERBEROS_KRB5_FILE = +ConfigBuilder("spark.kubernetes.kerberos.krb5location") + .doc("Specify the location of the krb5 file " + +"to be mounted on the driver and executors for Secure HDFS") + .stringConf + .createOptional + + val KUBERNETES_KERBEROS_KEYTAB = +ConfigBuilder("spark.kubernetes.kerberos.keytab") + .doc("Specify the location of keytab " + +"for Kerberos in order to access Secure HDFS") + .stringConf + .createOptional + + val KUBERNETES_KERBEROS_PRINCIPAL = +ConfigBuilder("spark.kubernetes.kerberos.principal") + .doc("Specify the principal " + +"for Kerberos in order to access Secure HDFS") + .stringConf + .createOptional + + val KUBERNETES_KERBEROS_RENEWER_PRINCIPAL = +ConfigBuilder("spark.kubernetes.kerberos.renewer.principal") + .doc("Specify the principal " + +"you wish to renew and retrieve your Kerberos values with") + .stringConf + .createOptional + + val KUBERNETES_KERBEROS_DT_SECRET_NAME = +ConfigBuilder("spark.kubernetes.kerberos.tokensecret.name") + .doc("Specify the name of the secret where " + +"your existing delegation token is stored. 
This removes the need " + +"for the job user to provide any keytab for launching a job") + .stringConf + .createOptional + + val KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY = +ConfigBuilder("spark.kubernetes.kerberos.tokensecret.itemkey") + .doc("Specify the item key of the data where " + +"your existing delegation token is stored. This removes the need " + +"for the job user to provide any keytab for launching a job") + .stringConf + .createOptional + --- End diff -- Too many empty lines. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215434425 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/security/KubernetesHadoopDelegationTokenManager.scala --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.k8s.security + +import java.io.File + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.FileSystem +import org.apache.hadoop.security.{Credentials, UserGroupInformation} + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.security.HadoopDelegationTokenManager +import org.apache.spark.internal.Logging +import org.apache.spark.util.{Clock, SystemClock} + + /** + * The KubernetesHadoopDelegationTokenManager fetches and updates Hadoop delegation tokens + * on the behalf of the Kubernetes submission client. It is modeled after the YARN + * AMCredentialRenewer, renewals in Kubernetes happen in a seperate microservice that will + * automatically update the Tokens via Kubernetes Secrets. 
The principal difference is that + * instead of writing the new credentials to HDFS and incrementing the timestamp of the file, --- End diff -- This is not what the YARN backend does anymore. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215435116 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilder.scala --- @@ -17,7 +17,7 @@ package org.apache.spark.deploy.k8s.submit import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesDriverSpec, KubernetesDriverSpecificConf, KubernetesRoleSpecificConf} -import org.apache.spark.deploy.k8s.features.{BasicDriverFeatureStep, DriverKubernetesCredentialsFeatureStep, DriverServiceFeatureStep, EnvSecretsFeatureStep, LocalDirsFeatureStep, MountSecretsFeatureStep, MountVolumesFeatureStep} +import org.apache.spark.deploy.k8s.features.{BasicDriverFeatureStep, DriverKubernetesCredentialsFeatureStep, DriverServiceFeatureStep, EnvSecretsFeatureStep, HadoopGlobalFeatureDriverStep, KubernetesFeatureConfigStep, LocalDirsFeatureStep, MountSecretsFeatureStep, MountVolumesFeatureStep} --- End diff -- At this point, just import `.features._`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215435834 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBuilder.scala --- @@ -35,10 +35,26 @@ private[spark] class KubernetesExecutorBuilder( new LocalDirsFeatureStep(_), provideVolumesStep: (KubernetesConf[_ <: KubernetesRoleSpecificConf] => MountVolumesFeatureStep) = - new MountVolumesFeatureStep(_)) { +new MountVolumesFeatureStep(_), --- End diff -- Same thing about this code being hard to parse visually. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215426740 --- Diff: docs/security.md --- @@ -722,6 +722,62 @@ with encryption, at least. The Kerberos login will be periodically renewed using the provided credentials, and new delegation tokens for supported will be created. +## Secure Interaction with Kubernetes + +When talking to Hadoop-based services behind Kerberos, it was noted that Spark needs to obtain delegation tokens +so that non-local processes can authenticate. These delegation tokens in Kubernetes are stored in Secrets that are +shared by the Driver and its Executors. As such, there are three ways of submitting a kerberos job: + +1. Submitting with a $kinit that stores a TGT in the Local Ticket Cache: +```bash +/usr/bin/kinit -kt / +/opt/spark/bin/spark-submit \ +--deploy-mode cluster \ +--class org.apache.spark.examples.HdfsTest \ +--master k8s:// \ +--conf spark.executor.instances=1 \ +--conf spark.app.name=spark-hdfs \ +--conf spark.kubernetes.container.image=spark:latest \ +--conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \ + local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar \ + +``` +3. Submitting with a local keytab and principle --- End diff -- principal --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215426993 --- Diff: docs/security.md --- @@ -722,6 +722,62 @@ with encryption, at least. The Kerberos login will be periodically renewed using the provided credentials, and new delegation tokens for supported will be created. +## Secure Interaction with Kubernetes + +When talking to Hadoop-based services behind Kerberos, it was noted that Spark needs to obtain delegation tokens +so that non-local processes can authenticate. These delegation tokens in Kubernetes are stored in Secrets that are +shared by the Driver and its Executors. As such, there are three ways of submitting a kerberos job: + +1. Submitting with a $kinit that stores a TGT in the Local Ticket Cache: +```bash +/usr/bin/kinit -kt / +/opt/spark/bin/spark-submit \ +--deploy-mode cluster \ +--class org.apache.spark.examples.HdfsTest \ +--master k8s:// \ +--conf spark.executor.instances=1 \ +--conf spark.app.name=spark-hdfs \ +--conf spark.kubernetes.container.image=spark:latest \ +--conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \ + local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar \ + +``` +3. Submitting with a local keytab and principle +```bash +/opt/spark/bin/spark-submit \ +--deploy-mode cluster \ +--class org.apache.spark.examples.HdfsTest \ +--master k8s:// \ +--conf spark.executor.instances=1 \ +--conf spark.app.name=spark-hdfs \ +--conf spark.kubernetes.container.image=spark:latest \ +--conf spark.kubernetes.kerberos.keytab= \ +--conf spark.kubernetes.kerberos.principal= \ +--conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \ + local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar \ + +``` + +3. Submitting with pre-populated secrets already existing within the namespace --- End diff -- What are these secrets? Are they the ticket cache, a delegation token cache, something else? 
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215427665 --- Diff: examples/src/main/scala/org/apache/spark/examples/HdfsTest.scala --- @@ -41,6 +41,8 @@ object HdfsTest { val end = System.currentTimeMillis() println(s"Iteration $iter took ${end-start} ms") } +println(s"File contents: ${file.map(s => s.toString).collect().mkString(",")}") --- End diff -- Spark's convention is either `.map { s=> s.toString }` or `.map(_.toString)`. I noticed this in a few other places too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
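The convention being pointed out can be illustrated with a plain `Seq` standing in for the RDD in `HdfsTest` (the `file` value here is a local stand-in, not Spark's API):

```scala
// Spark's style for one-argument lambdas: braces with a named parameter,
// or the underscore placeholder -- not parentheses with a named parameter.
val file = Seq(1, 2, 3)  // local stand-in for the RDD in HdfsTest

val flagged     = file.map(s => s.toString)    // the form the review flags
val braceForm   = file.map { s => s.toString } // accepted
val placeholder = file.map(_.toString)         // accepted, most common

println(s"File contents: ${placeholder.mkString(",")}")  // prints: File contents: 1,2,3
```

All three produce the same result; the comment is purely about which spelling the codebase standardizes on.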
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215429280 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala --- @@ -60,7 +62,23 @@ private[spark] case class KubernetesConf[T <: KubernetesRoleSpecificConf]( roleSecretEnvNamesToKeyRefs: Map[String, String], roleEnvs: Map[String, String], roleVolumes: Iterable[KubernetesVolumeSpec[_ <: KubernetesVolumeSpecificConf]], -sparkFiles: Seq[String]) { +sparkFiles: Seq[String], +hadoopConfDir: Option[String]) { + + def getHadoopConfigMapName: String = s"$appResourceNamePrefix-hadoop-config" + + def getKRBConfigMapName: String = s"$appResourceNamePrefix-krb5-file" + + def getHadoopStepsOrchestrator : Option[HadoopStepsOrchestrator] = hadoopConfDir.map { +hConf => new HadoopStepsOrchestrator( --- End diff -- nit: `hConf =>` stays together with the `{`; if it doesn't fit, use a full method declaration with enclosing braces. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
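The nit can be sketched with simplified types — the orchestrator construction is replaced by a string here, since `HadoopStepsOrchestrator` takes several constructor arguments not shown in the diff:

```scala
val hadoopConfDir: Option[String] = Some("/etc/hadoop/conf")

// Flagged layout -- `hConf =>` pushed onto the next line:
//   hadoopConfDir.map {
//     hConf => new HadoopStepsOrchestrator(...)
//   }

// Preferred layout -- `hConf =>` stays together with the opening brace:
val orchestrator = hadoopConfDir.map { hConf =>
  s"orchestrator($hConf)"  // stand-in for new HadoopStepsOrchestrator(...)
}
```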
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215434745 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/security/KubernetesHadoopDelegationTokenManager.scala --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.k8s.security + +import java.io.File + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.FileSystem +import org.apache.hadoop.security.{Credentials, UserGroupInformation} + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.security.HadoopDelegationTokenManager +import org.apache.spark.internal.Logging +import org.apache.spark.util.{Clock, SystemClock} + + /** + * The KubernetesHadoopDelegationTokenManager fetches and updates Hadoop delegation tokens + * on the behalf of the Kubernetes submission client. It is modeled after the YARN + * AMCredentialRenewer, renewals in Kubernetes happen in a seperate microservice that will + * automatically update the Tokens via Kubernetes Secrets. 
The principal difference is that + * instead of writing the new credentials to HDFS and incrementing the timestamp of the file, + * the new credentials (called Tokens when they are serialized) are stored in Secrets accessible + * to the driver and executors, when new Tokens are received they overwrite the current Secrets. + */ +private[spark] class KubernetesHadoopDelegationTokenManager extends Logging { + + // HadoopUGI Util methods + private val clock: Clock = new SystemClock() + def getCurrentUser: UserGroupInformation = UserGroupInformation.getCurrentUser + def getShortUserName : String = getCurrentUser.getShortUserName + def getFileSystem(hadoopConf: Configuration) : FileSystem = FileSystem.get(hadoopConf) + def isSecurityEnabled: Boolean = UserGroupInformation.isSecurityEnabled + def loginUserFromKeytabAndReturnUGI(principal: String, keytab: String): UserGroupInformation = + UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab) + def getCurrentTime: Long = clock.getTimeMillis() + def serializeCreds(creds: Credentials): Array[Byte] = SparkHadoopUtil.get.serialize(creds) + def nextRT(rt: Long, conf: SparkConf): Long = SparkHadoopUtil.nextCredentialRenewalTime(rt, conf) + + // Grab files in the HADOOP_CONF directory + def getHadoopConfFiles(path: String) : Seq[File] = { + val dir = new File(path) + if (dir.isDirectory) { + dir.listFiles.flatMap { file => Some(file).filter(_.isFile) }.toSeq + } else { + Seq.empty[File] + } + } + + // Principle method in charge of retrieving new Delegation Tokens --- End diff -- Main method. Or just drop the comment altogether, it's not really adding anything. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
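As a side note on the helper quoted above, `dir.listFiles.flatMap { file => Some(file).filter(_.isFile) }` can be written more directly. A sketch with the same contract (method name kept from the diff; behavior for non-directories unchanged):

```scala
import java.io.File

// List the regular files in a directory; empty Seq when the path is not
// a directory -- same contract as the reviewed getHadoopConfFiles.
def getHadoopConfFiles(path: String): Seq[File] = {
  val dir = new File(path)
  if (dir.isDirectory) dir.listFiles.filter(_.isFile).toSeq
  else Seq.empty
}
```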
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215428117 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -225,6 +225,60 @@ private[spark] object Config extends Logging { "Ensure that major Python version is either Python2 or Python3") .createWithDefault("2") + val KUBERNETES_KERBEROS_PROXY_USER = --- End diff -- I see this config being set in the code, but never read? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22305 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95730/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215427909 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -225,6 +225,60 @@ private[spark] object Config extends Logging { "Ensure that major Python version is either Python2 or Python3") .createWithDefault("2") + val KUBERNETES_KERBEROS_PROXY_USER = +ConfigBuilder("spark.kubernetes.kerberos.proxyUser") + .doc("Specify the proxy user " + --- End diff -- Fits in one line. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215431579 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopBootstrapUtil.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.features.hadoopsteps + +import java.io.File + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model.{ConfigMap, ConfigMapBuilder, ContainerBuilder, KeyToPathBuilder, PodBuilder} + +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.SparkPod +import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager + +private[spark] object HadoopBootstrapUtil { + + /** +* Mounting the DT secret for both the Driver and the executors +* +* @param dtSecretName Name of the secret that stores the Delegation Token +* @param dtSecretItemKey Name of the Item Key storing the Delegation Token +* @param userName Name of the SparkUser to set SPARK_USER +* @param fileLocation Location of the krb5 file +* @param krb5ConfName Name of the ConfigMap for Krb5 +* @param pod Input pod to be appended to +* @return a modified SparkPod +*/ + def bootstrapKerberosPod( + dtSecretName: String, + dtSecretItemKey: String, + userName: String, + fileLocation: String, + krb5ConfName: String, + pod: SparkPod) : SparkPod = { + val krb5File = new File(fileLocation) + val fileStringPath = krb5File.toPath.getFileName.toString + val kerberizedPod = new PodBuilder(pod.pod) +.editOrNewSpec() + .addNewVolume() +.withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME) +.withNewSecret() + .withSecretName(dtSecretName) + .endSecret() +.endVolume() + .addNewVolume() +.withName(KRB_FILE_VOLUME) + .withNewConfigMap() +.withName(krb5ConfName) +.withItems(new KeyToPathBuilder() + .withKey(fileStringPath) + .withPath(fileStringPath) + .build()) +.endConfigMap() + .endVolume() + .endSpec() +.build() + val kerberizedContainer = new ContainerBuilder(pod.container) +.addNewVolumeMount() + .withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME) + .withMountPath(SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR) + .endVolumeMount() +.addNewVolumeMount() + 
.withName(KRB_FILE_VOLUME) + .withMountPath(KRB_FILE_DIR_PATH + "/krb5.conf") + .withSubPath("krb5.conf") + .endVolumeMount() +.addNewEnv() + .withName(ENV_HADOOP_TOKEN_FILE_LOCATION) + .withValue(s"$SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR/$dtSecretItemKey") + .endEnv() +.addNewEnv() + .withName(ENV_SPARK_USER) + .withValue(userName) + .endEnv() +.build() +SparkPod(kerberizedPod, kerberizedContainer) + } + + /** +* setting ENV_SPARK_USER when HADOOP_FILES are detected +* +* @param sparkUserName Name of the SPARK_USER +* @param pod Input pod to be appended to +* @return a modified SparkPod +*/ + def bootstrapSparkUserPod( + sparkUserName: String, + pod: SparkPod) : SparkPod = { + val envModifiedContainer = new ContainerBuilder(pod.container) + .addNewEnv() --- End diff -- indent --- - To unsubscribe, e-mail:
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215431371 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/OptionRequirements.scala --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +private[spark] object OptionRequirements { --- End diff -- These methods seem to fit just fine in `KubernetesUtils`, instead of adding a new object for them. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
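The diff truncates before the method bodies, so the exact signatures are not visible; a helper of this kind presumably looks something like the following (names illustrative), and a small utility like this would indeed fit in `KubernetesUtils` rather than its own object:

```scala
// Hypothetical sketch: require that two Options are either both defined
// or both empty, with a tailored message for each missing side.
def requireBothOrNeitherDefined(
    opt1: Option[_],
    opt2: Option[_],
    errMessageWhenFirstIsMissing: String,
    errMessageWhenSecondIsMissing: String): Unit = {
  opt1.foreach(_ => require(opt2.isDefined, errMessageWhenSecondIsMissing))
  opt2.foreach(_ => require(opt1.isDefined, errMessageWhenFirstIsMissing))
}
```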
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215435605 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilder.scala --- @@ -51,7 +51,11 @@ private[spark] class KubernetesDriverBuilder( provideJavaStep: ( KubernetesConf[KubernetesDriverSpecificConf] => JavaDriverFeatureStep) = -new JavaDriverFeatureStep(_)) { +new JavaDriverFeatureStep(_), --- End diff -- Indentation here is super odd, but it is also odd for all the existing code. But it's kinda hard to tell what's going on. Seems like ` new JavaDriverFeatureStep(_)` is the body of a closure, in which case it might be clearer to either keep in the same line as the `=`, or just use braces and indent the block properly. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
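The two clearer layouts suggested here can be sketched with a simplified stand-in for the real feature-step and conf types:

```scala
// Simplified stand-in for the real step class.
class JavaDriverFeatureStep(conf: String)

// Option 1: keep the default closure on the same line as `=`.
class BuilderA(
    provideJavaStep: String => JavaDriverFeatureStep = new JavaDriverFeatureStep(_))

// Option 2: braces, with the closure body indented as a block.
class BuilderB(
    provideJavaStep: String => JavaDriverFeatureStep = { conf =>
      new JavaDriverFeatureStep(conf)
    })
```

mccheah's earlier comment in this thread goes a step further: drop the default arguments entirely and add a no-arg `def this()` that passes the real implementations to the full-args constructor, since tests always override every argument anyway.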
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215427411 --- Diff: docs/security.md --- @@ -722,6 +722,62 @@ with encryption, at least. The Kerberos login will be periodically renewed using the provided credentials, and new delegation tokens for supported will be created. +## Secure Interaction with Kubernetes + +When talking to Hadoop-based services behind Kerberos, it was noted that Spark needs to obtain delegation tokens +so that non-local processes can authenticate. These delegation tokens in Kubernetes are stored in Secrets that are +shared by the Driver and its Executors. As such, there are three ways of submitting a kerberos job: + +1. Submitting with a $kinit that stores a TGT in the Local Ticket Cache: +```bash +/usr/bin/kinit -kt / +/opt/spark/bin/spark-submit \ --- End diff -- Instead of using a command line example with some configs that are really unrelated to the feature being explained, how about only explaining the configs that need to be set to enable the feature? Preferably using a table like other config-related documents use. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
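A sketch of the table-style documentation being requested, in the HTML-table form Spark's configuration pages use (property names taken from the `Config.scala` diff above; descriptions paraphrased and defaults assumed):

```html
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
  <td><code>spark.kubernetes.kerberos.krb5location</code></td>
  <td>(none)</td>
  <td>Location of the krb5 file to be mounted on the driver and executors for Secure HDFS.</td>
</tr>
<tr>
  <td><code>spark.kubernetes.kerberos.keytab</code></td>
  <td>(none)</td>
  <td>Location of the keytab used to log in to Kerberos in order to access Secure HDFS.</td>
</tr>
<tr>
  <td><code>spark.kubernetes.kerberos.principal</code></td>
  <td>(none)</td>
  <td>Principal used to log in to Kerberos in order to access Secure HDFS.</td>
</tr>
<tr>
  <td><code>spark.kubernetes.kerberos.tokensecret.name</code></td>
  <td>(none)</td>
  <td>Name of a pre-existing secret holding a delegation token, removing the need for a keytab.</td>
</tr>
</table>
```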
[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22305 **[Test build #95730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95730/testReport)** for PR 22305 at commit [`09a79c2`](https://github.com/apache/spark/commit/09a79c216a42b3f7a20e4aaef6f56bcf79e405cb). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215429047 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala --- @@ -60,7 +62,23 @@ private[spark] case class KubernetesConf[T <: KubernetesRoleSpecificConf]( roleSecretEnvNamesToKeyRefs: Map[String, String], roleEnvs: Map[String, String], roleVolumes: Iterable[KubernetesVolumeSpec[_ <: KubernetesVolumeSpecificConf]], -sparkFiles: Seq[String]) { +sparkFiles: Seq[String], +hadoopConfDir: Option[String]) { + + def getHadoopConfigMapName: String = s"$appResourceNamePrefix-hadoop-config" --- End diff -- Please avoid `get` in the property names (see existing methods in this class). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215432887 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopKerberosKeytabResolverStep.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features.hadoopsteps + +import java.io._ +import java.security.PrivilegedExceptionAction + +import scala.collection.JavaConverters._ + +import io.fabric8.kubernetes.api.model.SecretBuilder +import org.apache.commons.codec.binary.Base64 + +import org.apache.spark.{SparkConf, SparkException} +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager +import org.apache.spark.deploy.security.HadoopDelegationTokenManager +import org.apache.spark.internal.Logging + + /** + * This step does all the heavy lifting for Delegation Token logic. This step + * assumes that the job user has either specified a principal and keytab or ran + * $kinit before running spark-submit. 
With a TGT stored locally, by running + * UGI.getCurrentUser you are able to obtain the current user, alternatively + * you can run UGI.loginUserFromKeytabAndReturnUGI and by running .doAs run + * as the logged into user instead of the current user. With the Job User principal + * you then retrieve the delegation token from the NameNode and store values in + * DelegationToken. Lastly, the class puts the data into a secret. All this is + * appended to the current HadoopSpec which in turn will append to the current + * DriverSpec. + */ +private[spark] class HadoopKerberosKeytabResolverStep( +submissionSparkConf: SparkConf, +kubernetesResourceNamePrefix : String, +maybePrincipal: Option[String], +maybeKeytab: Option[File], +maybeRenewerPrincipal: Option[String], +tokenManager: KubernetesHadoopDelegationTokenManager) + extends HadoopConfigurationStep with Logging { + +override def configureHadoopSpec(hSpec: HadoopConfigSpec): HadoopConfigSpec = { + val hadoopConf = SparkHadoopUtil.get.newConfiguration(submissionSparkConf) + if (!tokenManager.isSecurityEnabled) { +throw new SparkException("Hadoop not configured with Kerberos") + } + val maybeJobUserUGI = --- End diff -- `SparkSubmit` already logs in the user if a keytab is provided, so you could reuse that. Only issue is that it uses the existing configs which have "yarn" in their name. I think it would be better to create common names for the principal and keytab configs, and deprecate the YARN-specific ones. Then you can just use `UGI.getCurrentUser` here. You could even simplify the logic below (no need for `doAs` in that case). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215430810 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopGlobalFeatureDriverStep.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, ContainerBuilder, HasMetadata, PodBuilder} + +import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod} +import org.apache.spark.deploy.k8s.Config.{KUBERNETES_KERBEROS_KRB5_FILE, KUBERNETES_KERBEROS_PROXY_USER} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadoopsteps.{HadoopBootstrapUtil, HadoopConfigSpec, HadoopConfigurationStep} +import org.apache.spark.internal.Logging + + /** + * This is the main method that runs the hadoopConfigurationSteps defined + * by the HadoopStepsOrchestrator. 
These steps are run to modify the + * SparkPod and Kubernetes Resources using the additive method of the feature steps + */ +private[spark] class HadoopGlobalFeatureDriverStep( + kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) + extends KubernetesFeatureConfigStep with Logging { + private val hadoopTestOrchestrator = + kubernetesConf.getHadoopStepsOrchestrator + require(kubernetesConf.hadoopConfDir.isDefined && + hadoopTestOrchestrator.isDefined, "Ensure that HADOOP_CONF_DIR is defined") + private val hadoopSteps = + hadoopTestOrchestrator + .map(hto => hto.getHadoopSteps(kubernetesConf.getTokenManager)) + .getOrElse(Seq.empty[HadoopConfigurationStep]) + + var currentHadoopSpec = HadoopConfigSpec( + configMapProperties = Map.empty[String, String], + dtSecret = None, + dtSecretName = KERBEROS_DELEGEGATION_TOKEN_SECRET_NAME, + dtSecretItemKey = None, + jobUserName = None) + + for (nextStep <- hadoopSteps) { + currentHadoopSpec = nextStep.configureHadoopSpec(currentHadoopSpec) + } + + override def configurePod(pod: SparkPod): SparkPod = { +val hadoopBasedSparkPod = HadoopBootstrapUtil.bootstrapHadoopConfDir( + kubernetesConf.hadoopConfDir.get, + kubernetesConf.getHadoopConfigMapName, + kubernetesConf.getTokenManager, + pod) + +val maybeKerberosModification = + for { +secretItemKey <- currentHadoopSpec.dtSecretItemKey +userName <- currentHadoopSpec.jobUserName +krb5fileLocation <- kubernetesConf.get(KUBERNETES_KERBEROS_KRB5_FILE) + } yield { +HadoopBootstrapUtil.bootstrapKerberosPod( + currentHadoopSpec.dtSecretName, + secretItemKey, + userName, + krb5fileLocation, + kubernetesConf.getKRBConfigMapName, + hadoopBasedSparkPod) + } +maybeKerberosModification.getOrElse( + HadoopBootstrapUtil.bootstrapSparkUserPod( +kubernetesConf.getTokenManager.getCurrentUser.getShortUserName, +hadoopBasedSparkPod)) + } + + override def getAdditionalPodSystemProperties(): Map[String, String] = { +val maybeKerberosConfValues = + for { +secretItemKey <- 
currentHadoopSpec.dtSecretItemKey +userName <- currentHadoopSpec.jobUserName + } yield { +Map(KERBEROS_KEYTAB_SECRET_NAME -> currentHadoopSpec.dtSecretName, + KERBEROS_KEYTAB_SECRET_KEY -> secretItemKey, + KERBEROS_SPARK_USER_NAME -> userName) +} --- End diff -- nit: more indent --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:
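The step chaining quoted above (`var currentHadoopSpec` mutated in a `for` loop) is equivalent to a left fold over the steps. A self-contained sketch of the additive-step pattern, using toy stand-ins rather than the real `HadoopConfigSpec`/`HadoopConfigurationStep` types:

```scala
// Toy stand-ins for HadoopConfigSpec and HadoopConfigurationStep.
case class Spec(props: Map[String, String])
trait Step { def configure(spec: Spec): Spec }

// Each step only adds to the spec it receives, so chaining is a foldLeft:
// no mutable var needed.
def runSteps(initial: Spec, steps: Seq[Step]): Spec =
  steps.foldLeft(initial)((spec, step) => step.configure(spec))
```

The fold makes the "additive method of the feature steps" explicit: the output of each step is the input of the next, and the initial spec is returned unchanged when the step list is empty.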
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215435018 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala --- @@ -109,7 +112,8 @@ private[spark] class Client( def run(): Unit = { val resolvedDriverSpec = builder.buildFromFeatures(kubernetesConf) val configMapName = s"$kubernetesResourceNamePrefix-driver-conf-map" -val configMap = buildConfigMap(configMapName, resolvedDriverSpec.systemProperties) +val configMap = --- End diff -- Fits in one line. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215431778 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopBootstrapUtil.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.features.hadoopsteps + +import java.io.File + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model.{ConfigMap, ConfigMapBuilder, ContainerBuilder, KeyToPathBuilder, PodBuilder} + +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.SparkPod +import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager + +private[spark] object HadoopBootstrapUtil { + + /** +* Mounting the DT secret for both the Driver and the executors +* +* @param dtSecretName Name of the secret that stores the Delegation Token +* @param dtSecretItemKey Name of the Item Key storing the Delegation Token +* @param userName Name of the SparkUser to set SPARK_USER +* @param fileLocation Location of the krb5 file +* @param krb5ConfName Name of the ConfigMap for Krb5 +* @param pod Input pod to be appended to +* @return a modified SparkPod +*/ + def bootstrapKerberosPod( + dtSecretName: String, + dtSecretItemKey: String, + userName: String, + fileLocation: String, + krb5ConfName: String, + pod: SparkPod) : SparkPod = { + val krb5File = new File(fileLocation) + val fileStringPath = krb5File.toPath.getFileName.toString + val kerberizedPod = new PodBuilder(pod.pod) +.editOrNewSpec() + .addNewVolume() +.withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME) +.withNewSecret() + .withSecretName(dtSecretName) + .endSecret() +.endVolume() + .addNewVolume() +.withName(KRB_FILE_VOLUME) + .withNewConfigMap() +.withName(krb5ConfName) +.withItems(new KeyToPathBuilder() + .withKey(fileStringPath) + .withPath(fileStringPath) + .build()) +.endConfigMap() + .endVolume() + .endSpec() +.build() + val kerberizedContainer = new ContainerBuilder(pod.container) +.addNewVolumeMount() + .withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME) + .withMountPath(SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR) + .endVolumeMount() +.addNewVolumeMount() + 
.withName(KRB_FILE_VOLUME) + .withMountPath(KRB_FILE_DIR_PATH + "/krb5.conf") + .withSubPath("krb5.conf") + .endVolumeMount() +.addNewEnv() + .withName(ENV_HADOOP_TOKEN_FILE_LOCATION) + .withValue(s"$SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR/$dtSecretItemKey") + .endEnv() +.addNewEnv() + .withName(ENV_SPARK_USER) + .withValue(userName) + .endEnv() +.build() +SparkPod(kerberizedPod, kerberizedContainer) + } + + /** +* setting ENV_SPARK_USER when HADOOP_FILES are detected +* +* @param sparkUserName Name of the SPARK_USER +* @param pod Input pod to be appended to +* @return a modified SparkPod +*/ + def bootstrapSparkUserPod( + sparkUserName: String, --- End diff -- this is actually indented 3 spaces from the `def`... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215434370 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/security/KubernetesHadoopDelegationTokenManager.scala --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.k8s.security + +import java.io.File + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.FileSystem +import org.apache.hadoop.security.{Credentials, UserGroupInformation} + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.security.HadoopDelegationTokenManager +import org.apache.spark.internal.Logging +import org.apache.spark.util.{Clock, SystemClock} + + /** + * The KubernetesHadoopDelegationTokenManager fetches and updates Hadoop delegation tokens + * on the behalf of the Kubernetes submission client. 
It is modeled after the YARN + * AMCredentialRenewer, renewals in Kubernetes happen in a separate microservice that will --- End diff -- I commented on the spec, and I have issues with this microservice thingy, but since it's not being added in this PR the comment should not refer to it; instead just say renewals are not yet supported. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22305 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22305 **[Test build #95730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95730/testReport)** for PR 22305 at commit [`09a79c2`](https://github.com/apache/spark/commit/09a79c216a42b3f7a20e4aaef6f56bcf79e405cb). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2884/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22305 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22325 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95726/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22325 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22325 **[Test build #95726 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95726/testReport)** for PR 22325 at commit [`ec069b3`](https://github.com/apache/spark/commit/ec069b3d269b6ec2e6ca9edcc0f6981ccecdad3d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22221: [SPARK-25231] : Fix synchronization of executor h...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/22112 testing so far looks good. I'm +1 for this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22221: [SPARK-25231] : Fix synchronization of executor heartbea...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/1 merged to master and 2.3, thanks @pgandhi999 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20999: [WIP][SPARK-14922][SPARK-23866][SQL] Support partition f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20999 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95725/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20999: [WIP][SPARK-14922][SPARK-23866][SQL] Support partition f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20999 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20999: [WIP][SPARK-14922][SPARK-23866][SQL] Support partition f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20999 **[Test build #95725 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95725/testReport)** for PR 20999 at commit [`148f477`](https://github.com/apache/spark/commit/148f47742cae892260c46f9ffa97bb2d0422701d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/22209 would be nice to put into 2.3 as well, I realize we are close to rc but I don't think we should stop backporting fixes since I don't expect 2.4 to be stable for a while. If we stop for a bit for this rc we should have some way to track to pull back after rc. thoughts? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22299: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics...
Github user arunmahadevan closed the pull request at: https://github.com/apache/spark/pull/22299 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22318: [SPARK-25150][SQL] Rewrite condition when dedupli...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22318#discussion_r215402140 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -921,12 +924,18 @@ class Analyzer( failAnalysis("Invalid usage of '*' in explode/json_tuple/UDTF") // To resolve duplicate expression IDs for Join and Intersect - case j @ Join(left, right, _, _) if !j.duplicateResolved => -j.copy(right = dedupRight(left, right)) + case j @ Join(left, right, _, condition) if !j.duplicateResolved => +val (dedupedRight, attributeRewrites) = dedupRight(left, right) +val changedCondition = condition.map(_.transform { + case attr: Attribute if attr.resolved => attributeRewrites.getOrElse(attr, attr) --- End diff -- This looks fishy. We will not rewrite attributes in the join condition when they are not resolved? What will happen then? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
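The rewrite under discussion maps each attribute in the join condition through `attributeRewrites`, leaving it unchanged when no replacement exists (the `getOrElse(attr, attr)` fallback). A toy model of that shape, with plain strings standing in for Spark's `Attribute` expressions (which are really keyed by `ExprId`):

```scala
// Toy attribute rewrite: attributes are plain strings here; in Spark they
// are Attribute expressions and the rewrite map is an AttributeMap.
def rewriteCondition(
    condition: Option[Seq[String]],
    rewrites: Map[String, String]): Option[Seq[String]] =
  condition.map(_.map(attr => rewrites.getOrElse(attr, attr)))
```

gatorsmile's concern is about the extra `attr.resolved` guard in the actual patch: an unresolved attribute silently skips the rewrite, which is worth either justifying or removing.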
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user NiharS commented on the issue: https://github.com/apache/spark/pull/22192 You're right, I ran in local-cluster and it exited very quickly citing executors shutting down after not being able to find my test plugin. Although, the logs say that it does use a CoarseGrainedExecutorBackend: `18/09/05 12:03:20 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@nihar-xxx:45767` unless you mean Yarn only uses that one command line option. I'm looking into how it reacts in regular standalone mode with different thread counts. Thanks for pointing this out! I'm looking up other spark-submit options that can be used to provide the jar to other nodes, but I'm not super hopeful it's going to work out. If it indeed doesn't, I'll start exploring other options once I figure out why the pyspark tests are failing --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22325: [SPARK-25318]. Add exception handling when wrappi...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22325#discussion_r215396772 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -444,10 +444,23 @@ final class ShuffleBlockFetcherIterator( throwFetchFailedException(blockId, address, e) } - input = streamWrapper(blockId, in) + var wrapCorruption: Boolean = false + try { +input = streamWrapper(blockId, in) + } catch { +case e: IOException => + buf.release() + logWarning(s"got a corrupted block $blockId from $address while wrapping it" +
+s" locally, fetch again", e) + corruptedBlocks += blockId + fetchRequests += FetchRequest(address, Array((blockId, size))) + wrapCorruption = true + result = null + in.close + } // Only copy the stream if it's wrapped by compression or encryption, also the size of // block is small (the decompressed block is smaller than maxBytesInFlight) - if (detectCorrupt && !wrapCorruption && !input.eq(in) && size < maxBytesInFlight / 3) { val originalInput = input --- End diff -- @davies would appreciate your comments about this change as well. Also I have a qq: what is the use of originalInput? It seems that this var isn't used anywhere except in the finally clause. I think it can be removed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
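The control flow being added here — treat an `IOException` thrown while wrapping the stream as block corruption and re-enqueue the fetch rather than failing the task — can be sketched independently of `ShuffleBlockFetcherIterator`. `wrap` stands in for `streamWrapper`; the `Either` models the "refetch this block" signal that the real code expresses with `wrapCorruption`:

```scala
import java.io.IOException

// Either a successfully wrapped value (Right) or the id of a block that
// must be refetched because wrapping detected corruption (Left).
def wrapOrRefetch[A](blockId: String, wrap: String => A): Either[String, A] =
  try Right(wrap(blockId))
  catch { case _: IOException => Left(blockId) } // caller re-enqueues the fetch
```

This keeps the corruption branch explicit at the type level instead of threading a mutable flag through the subsequent checks.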
[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22328 Merged into master. Thanks @WeichenXu123 for the implementation and everyone for the review! I created the following JIRAs as follow-ups: * deprecate ImageSchema: https://issues.apache.org/jira/browse/SPARK-25345 * list built-in data sources in doc site: https://issues.apache.org/jira/browse/SPARK-25346 * doc for image data source: https://issues.apache.org/jira/browse/SPARK-25347 * data source for binary files: https://issues.apache.org/jira/browse/SPARK-25348 * support sample pushdown: https://issues.apache.org/jira/browse/SPARK-25349 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22325: [SPARK-25318]. Add exception handling when wrappi...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22325#discussion_r215391696 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -444,10 +444,23 @@ final class ShuffleBlockFetcherIterator( throwFetchFailedException(blockId, address, e) } - input = streamWrapper(blockId, in) + var wrapCorruption: Boolean = false --- End diff -- yeah that is right. I will change that --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22318: [SPARK-25150][SQL] Rewrite condition when dedupli...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22318#discussion_r215391569 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -805,10 +807,10 @@ class Analyzer( * that this rule cannot handle. When that is the case, there must be another rule * that resolves these conflicts. Otherwise, the analysis will fail. */ - right + (right, AttributeMap.empty) --- End diff -- `AttributeMap.empty[Attribute]` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22318: [SPARK-25150][SQL] Rewrite condition when dedupli...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22318#discussion_r215391402 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeMap.scala --- @@ -23,12 +23,14 @@ package org.apache.spark.sql.catalyst.expressions * of the name, or the expected nullability). */ object AttributeMap { + val empty = new AttributeMap(Map.empty) --- End diff -- ```Scala def empty[A]: AttributeMap[A] = new AttributeMap(Map.empty) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
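The point behind suggesting `def empty[A]` rather than `val empty`: with an invariant type parameter, a single `val` can only have one fixed element type, whereas a polymorphic `def` yields a correctly-typed empty map for each call site. A stripped-down model (string keys instead of `ExprId`), illustrative rather than Spark's actual class:

```scala
// Invariant toy version of AttributeMap.
class AttrMap[A](val baseMap: Map[String, A]) {
  def get(k: String): Option[A] = baseMap.get(k)
}

object AttrMap {
  // A def, not a val: each call site gets an AttrMap[A] for its own A,
  // so no covariance annotation on the class is needed.
  def empty[A]: AttrMap[A] = new AttrMap(Map.empty)
}
```

This is also why the `+A` covariance change in the sibling comment can be reverted: the polymorphic `empty` removes the only motivation for it.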
[GitHub] spark pull request #22318: [SPARK-25150][SQL] Rewrite condition when dedupli...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22318#discussion_r215391459 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeMap.scala --- @@ -23,12 +23,14 @@ package org.apache.spark.sql.catalyst.expressions * of the name, or the expected nullability). */ object AttributeMap { + val empty = new AttributeMap(Map.empty) + def apply[A](kvs: Seq[(Attribute, A)]): AttributeMap[A] = { new AttributeMap(kvs.map(kv => (kv._1.exprId, kv)).toMap) } } -class AttributeMap[A](val baseMap: Map[ExprId, (Attribute, A)]) +class AttributeMap[+A](val baseMap: Map[ExprId, (Attribute, A)]) --- End diff -- revert the change here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22328 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22325: [SPARK-25318]. Add exception handling when wrappi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22325#discussion_r215390096 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -444,10 +444,23 @@ final class ShuffleBlockFetcherIterator( throwFetchFailedException(blockId, address, e) } - input = streamWrapper(blockId, in) + var wrapCorruption: Boolean = false --- End diff -- The reason for the error is the same (corruption in the source data), so a different error message doesn't really add any value. The exception will contain the line where it happened so that's enough for debugging. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22325: [SPARK-25318]. Add exception handling when wrappi...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22325#discussion_r215388601 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -444,10 +444,23 @@ final class ShuffleBlockFetcherIterator( throwFetchFailedException(blockId, address, e) } - input = streamWrapper(blockId, in) + var wrapCorruption: Boolean = false --- End diff -- Thanks for the review. I did that to give a more specific warning message and to be able to point out whether the problem happened during wrapping or after, but if there isn't any value in separating them I can change it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22288: [SPARK-22148][Scheduler] Acquire new executors to avoid ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22288 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2883/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22288: [SPARK-22148][Scheduler] Acquire new executors to avoid ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22288 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95723/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22318 **[Test build #95723 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95723/testReport)** for PR 22318 at commit [`809b8a8`](https://github.com/apache/spark/commit/809b8a83b7ec3d62ba6d65f6aff6a7d3175bd3e3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22288: [SPARK-22148][Scheduler] Acquire new executors to avoid ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22288 **[Test build #95729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95729/testReport)** for PR 22288 at commit [`87c4e57`](https://github.com/apache/spark/commit/87c4e57bb966078c8a78eabc5a5e4b6f60c78f28). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22328 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22328 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95727/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22338 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95721/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22338 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22328 **[Test build #95727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95727/testReport)** for PR 22328 at commit [`218ce4c`](https://github.com/apache/spark/commit/218ce4cf796308c8705a27889b25100e2b779365). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22338 **[Test build #95721 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95721/testReport)** for PR 22338 at commit [`6cb94ed`](https://github.com/apache/spark/commit/6cb94ed5ed76e23e4fb775a2fd6f0e66e9c15abd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22340: [SPARK-25337][SQL][TEST] `runSparkSubmit` should provide...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22340 Thank you for approval, @srowen .
[GitHub] spark pull request #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ig...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22336
[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22336 Wow, Great. Thanks, @gatorsmile !
[GitHub] spark issue #22335: [SPARK-25091][SQL] reduce the storage memory in Executor...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22335 Also, this should be `[core]` not `[sql]`.
[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22336 Sorry... I just merged it.
[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22336 Finally, it's passed. Can I merge this, @cloud-fan and @HyukjinKwon ? :)
[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22336 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95722/
[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22336 Merged build finished. Test PASSed.
[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22336 **[Test build #95722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95722/testReport)** for PR 22336 at commit [`69f207f`](https://github.com/apache/spark/commit/69f207f8a4531435c4a8df790780557968a33bb1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.