[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22087 PredictionModel as a super-class of unsupervised and supervised seems sane to me. Returning a double to unify the signature also seems sane, although the thought of casting it might irk people. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22087 This LGTM, but it raises a more general question about the lack of single-sample prediction over the entire hierarchy. For example (IMO) there should be some kind of single-sample method associated with `org.apache.spark.ml.Model`, or `org.apache.spark.ml.Transformer`.
[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22146 @mccheah integration testing is passing with the latest container selection policy, good to merge?
[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22146 thanks @shaneknapp !
[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22146 The k8s integration test failure appears repeatable:

```
Downloading from central: https://repo.maven.apache.org/maven2/com/squareup/okhttp3/okhttp/3.9.1/okhttp-3.9.1.jar
[WARNING] Failed to write tracking file /home/jenkins/.m2/repository/com/squareup/okhttp3/okhttp/3.9.1/okhttp-3.9.1.jar.lastUpdated
java.io.FileNotFoundException: /home/jenkins/.m2/repository/com/squareup/okhttp3/okhttp/3.9.1/okhttp-3.9.1.jar.lastUpdated (Permission denied)
```
[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22146 Latest container selection, with default to the first container, LGTM
[GitHub] spark issue #22820: [SPARK-25828][K8S] Bumping Kubernetes-Client version to ...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22820 merging
[GitHub] spark pull request #22820: [SPARK-25828][K8S][BUILD] Bumping Kubernetes-Clie...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/22820#discussion_r228321586

```
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala ---
@@ -157,7 +157,10 @@ private[spark] object KubernetesUtils {
     }.getOrElse(Seq(("container state", "N/A")))
   }

-  def formatTime(time: Time): String = {
-    if (time != null) time.getTime else "N/A"
+  def formatTime(time: String): String = {
--- End diff --
```

I would keep it, but possibly add a comment about the data type change
[GitHub] spark pull request #22820: [SPARK-25828][K8S][BUILD] Bumping Kubernetes-Clie...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/22820#discussion_r228321407

```
--- Diff: docs/running-on-kubernetes.md ---
@@ -45,7 +45,8 @@ logs and remains in "completed" state in the Kubernetes API until it's eventuall
 Note that in the completed state, the driver pod does *not* use any computational or memory resources.

-The driver and executor pod scheduling is handled by Kubernetes. It is possible to schedule the
+The driver and executor pod scheduling is handled by Kubernetes. Communication to the Kubernetes API is done via fabric8, and we are
+currently running kubernetes-client version 4.1.0. Make sure that any infrastructure additions are weary of said version. It is possible to schedule the
--- End diff --
```

nit: "wary"? or "aware of"?
[GitHub] spark issue #22838: [SPARK-25835][K8s] fix issues with k8s tests
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22838 LGTM - the k8s integration tests are off by default, can be run via `-P...`, and are passing via Jenkins
[GitHub] spark issue #22433: [SPARK-25442][SQL][K8S] Support STS to run in k8s deploy...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22433 @suryag10, all things being equal, it is considered preferable to provide testing for new functionality on the same PR. Are there logistical problems with adding tests here?
[GitHub] spark issue #22433: [SPARK-25442][SQL][K8S] Support STS to run in k8s deploy...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22433 @suryag10 you were probably encountering the GitHub server problems from yesterday: https://status.github.com/messages
[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/13440 Update: I'm consulting with some teammates about what it might mean to also support Bayesian variations on split quality, since there has been a lot of interest in the last few years in Bayesian alternatives to more traditional p-value based statistics.
[GitHub] spark pull request #22608: [SPARK-23257][K8S][TESTS] Kerberos Support Integr...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/22608#discussion_r225625939

```
--- Diff: bin/docker-image-tool.sh ---
@@ -71,18 +71,29 @@ function build {
     --build-arg base_img=$(image_ref spark)
   )

-  local BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/spark/Dockerfile"}
-  local PYDOCKERFILE=${PYDOCKERFILE:-"$IMG_PATH/spark/bindings/python/Dockerfile"}
-  local RDOCKERFILE=${RDOCKERFILE:-"$IMG_PATH/spark/bindings/R/Dockerfile"}
+  local BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/main/dockerfiles/spark/Dockerfile"}
+  local PYDOCKERFILE=${PYDOCKERFILE:-"$IMG_PATH/main/dockerfiles/spark/bindings/python/Dockerfile"}
+  local KDOCKERFILE=${KDOCKERFILE:-"$IMG_PATH/test/dockerfiles/spark/kerberos/Dockerfile"}
+  local RDOCKERFILE=${RDOCKERFILE:-"$IMG_PATH/main/dockerfiles/spark/bindings/R/Dockerfile"}

+  # Spark Base
   docker build $NOCACHEARG "${BUILD_ARGS[@]}" \
     -t $(image_ref spark) \
     -f "$BASEDOCKERFILE" .

+  # PySpark
   docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \
     -t $(image_ref spark-py) \
     -f "$PYDOCKERFILE" .

+  # The following are optional docker builds for Kerberos Testing
+  docker pull ifilonenko/hadoop-base:latest
--- End diff --
```

I had also discussed concerns re: pulling a "personal" image with @ifilonenko - I'd prefer some other solution here. A docker build seems plausible, but it would be best if it can be done once per test run, for efficiency reasons
[GitHub] spark issue #22608: [SPARK-23257][K8S][TESTS] Kerberos Support Integration T...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22608 Although this is a large patch, its impact on existing code is small, and it is nearly all testing code. Unless the tests themselves are unstable, I'd consider this plausible to include with the 2.4 release.
[GitHub] spark issue #22608: [SPARK-23257][K8S][TESTS] Kerberos Support Integration T...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22608 re: hadoop-2.7.3.tgz, is that something Shane needs to install on the testing infra to build the images you want?
[GitHub] spark issue #22608: [SPARK-23257][K8S][TESTS] Kerberos Support Integration T...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22608 @ifilonenko can we work with the existing service-account-name config parameters for obtaining the resource permissions?
[GitHub] spark issue #22433: [SPARK-25442][SQL][K8S] Support STS to run in k8s deploy...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22433 If possible, there should be some basic integration testing: run a thrift server command against the minishift cluster used by the other tests.
[GitHub] spark issue #22433: [SPARK-25442][SQL][K8S] Support STS to run in k8s deploy...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22433 In the scenario of a cluster-mode submission, what is the command-line behavior? Does the thrift-server script "block" until the thrift server pod is shut down?
[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/13440 I think targeting 3.0 with a refactor makes the most sense. There's no way to do this without making small breaking changes, but slightly larger changes could clean up the design. `ImpurityCalculator` can subsume `Impurity`, and a more general rethinking of gain and impurity can be accommodated too.
[GitHub] spark pull request #13440: [SPARK-15699] [ML] Implement a Chi-Squared test s...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/13440#discussion_r218997600

```
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Gini.scala ---
@@ -71,6 +71,23 @@ object Gini extends Impurity {
   @Since("1.1.0")
   def instance: this.type = this

+  /**
+   * :: DeveloperApi ::
+   * p-values for test-statistic measures, unsupported for [[Gini]]
+   */
+  @Since("2.2.0")
+  @DeveloperApi
+  def calculate(calcL: ImpurityCalculator, calcR: ImpurityCalculator): Double =
--- End diff --
```

@srowen @willb I cached the design of the metrics back in. In general, `Impurity` already uses methods that are only defined on certain impurity sub-classes, so this new method does not change that situation. My take on the "problem" is that the existing measures are all based on a localized concept of "purity" (or impurity) that can be calculated using _only_ the data at a single node. Splitting based on statistical tests (p-values) breaks that model, since it makes use of a more generalized concept of split quality that requires the sample populations of both children from a candidate split. A maximally general signature would probably involve the parent and both children.

Another kink in the current design is that `ImpurityCalculator` is essentially parallel to `Impurity`, and in fact `ImpurityCalculator#calculate()` is how impurity measures are currently requested. `Impurity` seems somewhat redundant, and might be factored out in favor of `ImpurityCalculator`. The current signature `calculate()` might be generalized into a more inclusive concept of split quality that expects to make use of {parent, left, right}. Calls to `calculate()` are not very widespread, but threading that change through is outside the scope of this particular PR. If people are interested in that kind of refactoring I could look into it in the near future, but probably not in the next couple of weeks. That kind of change would also be API-breaking, and so a good target for 3.0.
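The "maximally general signature" involving the parent and both children could be sketched as below. All names here (`ClassCounts`, `SplitQuality`, `GiniGain`) are illustrative stand-ins, not Spark's actual `Impurity`/`ImpurityCalculator` API; the point is only that classic weighted impurity gain and cross-child test statistics both fit one interface:

```scala
// Hypothetical sketch: a split-quality interface that receives the parent and
// both children. A test-statistic measure can ignore `parent`; classic
// weighted impurity gain uses all three populations.
final case class ClassCounts(counts: Array[Long]) {
  def total: Double = counts.sum.toDouble
  // Gini impurity: 1 - sum_k p_k^2
  def gini: Double =
    if (total == 0.0) 0.0 else 1.0 - counts.map(c => math.pow(c / total, 2)).sum
}

trait SplitQuality {
  // Larger is better, by convention.
  def quality(parent: ClassCounts, left: ClassCounts, right: ClassCounts): Double
}

object GiniGain extends SplitQuality {
  // The existing gain-based calculation expressed in the generalized signature:
  // weighted decrease in Gini impurity from parent to children.
  def quality(parent: ClassCounts, left: ClassCounts, right: ClassCounts): Double =
    parent.gini -
      (left.total / parent.total) * left.gini -
      (right.total / parent.total) * right.gini
}
```

A perfectly separating split of a balanced binary parent yields the maximum Gini gain of 0.5, while a split that reproduces the parent's class mix in both children yields 0.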
[GitHub] spark pull request #13440: [SPARK-15699] [ML] Implement a Chi-Squared test s...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/13440#discussion_r218252461

```
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala ---
@@ -52,6 +52,49 @@ trait Impurity extends Serializable {
   @Since("1.0.0")
   @DeveloperApi
   def calculate(count: Double, sum: Double, sumSquares: Double): Double
+
+  /**
+   * :: DeveloperApi ::
+   * Compute a test-statistic p-value quality measure from left and right split populations
+   * @param calcL impurity calculator for the left split population
+   * @param calcR impurity calculator for the right split population
+   * @return The p-value for the null hypothesis; that left and right split populations
+   *         represent the same distribution
+   * @note Unless overridden this method will fail with an exception, for backward compatability
+   */
+  @Since("2.2.0")
+  @DeveloperApi
+  def calculate(calcL: ImpurityCalculator, calcR: ImpurityCalculator): Double
+
+  /**
+   * :: DeveloperApi ::
+   * Determine if this impurity measure is a test-statistic measure
+   * @return True if this is a split quality measure based on a test statistic (i.e. returns a
+   *         p-value) or false otherwise.
+   * @note Unless overridden this method returns false by default, for backward compatability
+   */
+  @Since("2.2.0")
+  @DeveloperApi
+  def isTestStatistic: Boolean
--- End diff --
```

Can this be customized or extended externally to Spark? I'm wondering why it is public.
[GitHub] spark pull request #13440: [SPARK-15699] [ML] Implement a Chi-Squared test s...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/13440#discussion_r218252156

```
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Gini.scala ---
@@ -71,6 +71,23 @@ object Gini extends Impurity {
   @Since("1.1.0")
   def instance: this.type = this

+  /**
+   * :: DeveloperApi ::
+   * p-values for test-statistic measures, unsupported for [[Gini]]
+   */
+  @Since("2.2.0")
+  @DeveloperApi
+  def calculate(calcL: ImpurityCalculator, calcR: ImpurityCalculator): Double =
--- End diff --
```

I suspect that the generalization is closer to my newer signature `val pval = imp.calculate(leftImpurityCalculator, rightImpurityCalculator)`, where you have all the context from the left and right nodes. The existing gain-based calculation should fit into this framework, just doing its current weighted average of purity gain.
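For intuition about the cross-child test being discussed: in binary classification, a candidate split's left and right children define a 2x2 contingency table (class x child), and the chi-squared statistic for such a table has a closed form. The following is a self-contained illustration of that statistic, not the PR's actual `ChiSquared` implementation:

```scala
// Chi-squared statistic for the 2x2 table formed by the per-class counts of
// the left and right children of a candidate split. Larger values mean the
// children look like draws from different distributions (smaller p-value);
// 0 means the children have identical class mixes.
def chiSqStatistic(left: Array[Long], right: Array[Long]): Double = {
  require(left.length == 2 && right.length == 2, "binary-classification sketch")
  val (a, b) = (left(0).toDouble, left(1).toDouble)
  val (c, d) = (right(0).toDouble, right(1).toDouble)
  val n = a + b + c + d
  val det = a * d - b * c
  // Product of the row and column marginals of the 2x2 table
  val den = (a + b) * (c + d) * (a + c) * (b + d)
  if (den == 0.0) 0.0 else n * det * det / den
}
```

Mapping the statistic to a p-value then only needs the chi-squared CDF with one degree of freedom, which is where a stats library (or the PR's own code) would come in.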
[GitHub] spark pull request #13440: [SPARK-15699] [ML] Implement a Chi-Squared test s...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/13440#discussion_r218246415

```
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Gini.scala ---
@@ -71,6 +71,23 @@ object Gini extends Impurity {
   @Since("1.1.0")
   def instance: this.type = this

+  /**
+   * :: DeveloperApi ::
+   * p-values for test-statistic measures, unsupported for [[Gini]]
+   */
+  @Since("2.2.0")
+  @DeveloperApi
+  def calculate(calcL: ImpurityCalculator, calcR: ImpurityCalculator): Double =
--- End diff --
```

I'll consider whether there's a unifying idea here. p-value-based metrics require integrating information across the new split children, which I believe was not the case for the existing methods.
[GitHub] spark pull request #13440: [SPARK-15699] [ML] Implement a Chi-Squared test s...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/13440#discussion_r218245862

```
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/ChiSquared.scala ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.tree.impurity
+
+import org.apache.spark.annotation.{DeveloperApi, Experimental, Since}
+
+/**
+ * :: Experimental ::
+ * Class for calculating Chi Squared as a split quality metric during binary classification.
+ */
+@Since("2.2.0")
--- End diff --
```

I'll update those. 3.0 might be a good target, especially if I can't do this without the new `isTestStatistic`
[GitHub] spark pull request #13440: [SPARK-15699] [ML] Implement a Chi-Squared test s...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/13440#discussion_r218245670

```
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -670,14 +670,32 @@ private[spark] object RandomForest extends Logging {
     val leftImpurity = leftImpurityCalculator.calculate() // Note: This equals 0 if count = 0
     val rightImpurity = rightImpurityCalculator.calculate()

-    val leftWeight = leftCount / totalCount.toDouble
-    val rightWeight = rightCount / totalCount.toDouble
+    val gain = metadata.impurity match {
+      case imp if (imp.isTestStatistic) =>
+        // For split quality measures based on a test-statistic, run the test on the
+        // left and right sub-populations to get a p-value for the null hypothesis
+        val pval = imp.calculate(leftImpurityCalculator, rightImpurityCalculator)
+        // Transform the test statistic p-val into a larger-is-better gain value
+        Impurity.pValToGain(pval)
+
+      case _ =>
+        // Default purity-gain logic:
+        // measure the weighted decrease in impurity from parent to the left and right
+        val leftWeight = leftCount / totalCount.toDouble
+        val rightWeight = rightCount / totalCount.toDouble
+
+        impurity - leftWeight * leftImpurity - rightWeight * rightImpurity
+    }

-    val gain = impurity - leftWeight * leftImpurity - rightWeight * rightImpurity
+    // If the impurity being used is a test statistic p-val, apply a standard transform into
+    // a larger-is-better gain value for the minimum-gain threshold
+    val minGain =
+      if (metadata.impurity.isTestStatistic) Impurity.pValToGain(metadata.minInfoGain)
+      else metadata.minInfoGain
--- End diff --
```

The main issue I recall was that all of the existing metrics assume some kind of "larger is better" gain, and p-values are "smaller is better." I'll take another pass over it and see if I can push that distinction down so it doesn't require exposing new methods.
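The actual `Impurity.pValToGain` from the patch is not quoted in this thread, but any strictly decreasing map from p-value to gain satisfies the "larger is better" convention discussed above. As an illustration only (assuming the simplest such transform, `1 - p`, which is not necessarily the PR's formula):

```scala
// Illustrative only: map a smaller-is-better p-value onto a larger-is-better
// gain scale, so test-statistic measures can reuse the existing minInfoGain
// threshold machinery. The PR's real pValToGain may use a different transform.
def pValToGain(pval: Double): Double = {
  require(pval >= 0.0 && pval <= 1.0, "p-values live in [0, 1]")
  1.0 - pval
}

// Applying the same transform to the configured cutoff (as the quoted minGain
// logic does) keeps the comparison direction consistent: a split passes when
// its p-value is at or below the cutoff.
def passesThreshold(pval: Double, pvalCutoff: Double): Boolean =
  pValToGain(pval) >= pValToGain(pvalCutoff)
```

With a cutoff of 0.05, a split with p = 0.01 passes and a split with p = 0.2 does not, which is exactly the behavior the transformed min-gain threshold is meant to preserve.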
[GitHub] spark issue #22433: [SPARK-25442][SQL][K8S] Support STS to run in k8s deploy...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22433 I'm wondering, is there some reason this isn't supported in cluster mode for yarn & mesos? Or, put another way, what is the rationale for adding k8s as an exception to this rule?
[GitHub] spark issue #22381: [SPARK-25394][CORE] Add an application status metrics so...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22381 These new metrics seem useful. Is there a way to provide unit or integration testing for them? Can they be enabled/disabled via metrics.properties files?
[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22146 +1 for @skonto's recommendation about CI testing of templates populated with larger numbers of fields, and of template errors
[GitHub] spark issue #22294: [SPARK-25287][INFRA] Add up-front check for JIRA_USERNAM...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22294 cc @srowen
[GitHub] spark pull request #22294: [SPARK-25287][INFRA] Add up-front check for JIRA_...
GitHub user erikerlandson opened a pull request: https://github.com/apache/spark/pull/22294 [SPARK-25287][INFRA] Add up-front check for JIRA_USERNAME and JIRA_PASSWORD

## What changes were proposed in this pull request?

Add an up-front check that `JIRA_USERNAME` and `JIRA_PASSWORD` have been set. If they haven't, ask the user if they want to continue. This prevents the JIRA state update from failing at the very end of the process because the user forgot to set these environment variables.

## How was this patch tested?

I ran the script with the environment vars set, and unset, to verify it works as specified.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/erikerlandson/spark spark-25287

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22294.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22294

commit 1ed41ddc922cd07f6d6c2384c5aa248699f9ef87
Author: Erik Erlandson
Date: 2018-08-30T21:52:29Z
[SPARK-25287][INFRA] Add up-front check for JIRA_USERNAME and JIRA_PASSWORD
[GitHub] spark issue #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22179 For maximum breaking-change forgiveness, the "safe" option would be to punt it to 3.0, which seems likely to be the next release after this.
[GitHub] spark issue #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22179 If it doesn't expose any breaking changes to users, that seems reasonable. I assume there would be no integration problems with Scala 2.12?
[GitHub] spark issue #22285: [SPARK-25275][K8S] require memberhip in wheel to run 'su...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22285 Looks like Alpine doesn't have PAM installed
[GitHub] spark issue #22285: [SPARK-25275][K8S] require memberhip in wheel to run 'su...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22285 jenkins test this please
[GitHub] spark pull request #22285: [SPARK-25275][K8S] require memberhip in wheel to ...
GitHub user erikerlandson opened a pull request: https://github.com/apache/spark/pull/22285 [SPARK-25275][K8S] require memberhip in wheel to run 'su' in dockerfiles

## What changes were proposed in this pull request?

Add a PAM configuration in the k8s dockerfile to require authentication into wheel to run as `su`.

## How was this patch tested?

Verify against CI that the PAM config succeeds & causes no regressions.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/erikerlandson/spark spark-25275

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22285.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22285

commit f29051edb52c038b18592915187dab2955a87d7f
Author: Erik Erlandson
Date: 2018-08-30T15:02:04Z
[SPARK-25275][K8S] require memberhip in wheel to run 'su' in dockerfiles
[GitHub] spark pull request #21279: [SPARK-24219][k8s] Improve the docker building sc...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21279#discussion_r213317184

```
--- Diff: bin/docker-image-tool.sh ---
@@ -44,15 +44,37 @@ function image_ref {
 function build {
   local BUILD_ARGS
   local IMG_PATH
+  local TMPFOLDER

   if [ ! -f "$SPARK_HOME/RELEASE" ]; then
     # Set image build arguments accordingly if this is a source repo and not a distribution archive.
+    local JARS="${SPARK_HOME}/assembly/target/scala-${SPARK_SCALA_VERSION}/jars"
+    TMPFOLDER=`mktemp -q -d examples.XX`
+    if [ $? -ne 0 ]; then
+      ehco "Cannot create temp folder, exiting..."
+      exit 1
+    fi
+
+    mkdir -p "${TMPFOLDER}/jars"
--- End diff --
```

Also, is using `${SPARK_SCALA_VERSION}` preferable to `scala*`?
[GitHub] spark pull request #21279: [SPARK-24219][k8s] Improve the docker building sc...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21279#discussion_r213316408

```
--- Diff: bin/docker-image-tool.sh ---
@@ -44,15 +44,37 @@ function image_ref {
 function build {
   local BUILD_ARGS
   local IMG_PATH
+  local TMPFOLDER

   if [ ! -f "$SPARK_HOME/RELEASE" ]; then
     # Set image build arguments accordingly if this is a source repo and not a distribution archive.
+    local JARS="${SPARK_HOME}/assembly/target/scala-${SPARK_SCALA_VERSION}/jars"
+    TMPFOLDER=`mktemp -q -d examples.XX`
+    if [ $? -ne 0 ]; then
+      ehco "Cannot create temp folder, exiting..."
+      exit 1
+    fi
+
+    mkdir -p "${TMPFOLDER}/jars"
--- End diff --
```

(nit) `cp -R ${SPARK_HOME}/examples/target/scala*/jars ${TMPFOLDER}` ?
[GitHub] spark pull request #21279: [SPARK-24219][k8s] Improve the docker building sc...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21279#discussion_r213314870

```
--- Diff: bin/docker-image-tool.sh ---
@@ -44,15 +44,37 @@ function image_ref {
 function build {
   local BUILD_ARGS
   local IMG_PATH
+  local TMPFOLDER

   if [ ! -f "$SPARK_HOME/RELEASE" ]; then
     # Set image build arguments accordingly if this is a source repo and not a distribution archive.
+    local JARS="${SPARK_HOME}/assembly/target/scala-${SPARK_SCALA_VERSION}/jars"
+    TMPFOLDER=`mktemp -q -d examples.XX`
--- End diff --
```

This will probably work; just wondering if we can use `-p /some/path`, where `/some/path` is guaranteed to exist. This could fail in theory if for some reason `/tmp` doesn't exist (or doesn't have space?) or if the user's TMPDIR is set to something non-writable
[GitHub] spark issue #22212: [SPARK-25220] Seperate kubernetes node selector config b...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22212 I agree there's an argument for keeping this, but an alternative would be to leave the original for backward compatibility, deprecate it, and recommend people make use of custom pod templates (#22146)
[GitHub] spark pull request #22212: [SPARK-25220] Seperate kubernetes node selector c...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/22212#discussion_r213127037

```
--- Diff: docs/running-on-kubernetes.md ---
@@ -663,11 +663,21 @@ specific to Spark on Kubernetes.
-  spark.kubernetes.node.selector.[labelKey]
+  spark.kubernetes.driver.selector.[labelKey]
--- End diff --
```

agreed we should keep it, but recommend annotating it as deprecated
[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/22146#discussion_r212668742

```
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
@@ -225,6 +225,18 @@ private[spark] object Config extends Logging {
       "Ensure that major Python version is either Python2 or Python3")
     .createWithDefault("2")

+  val KUBERNETES_DRIVER_CONTAINER_NAME =
+    ConfigBuilder("spark.kubernetes.driver.containerName")
--- End diff --
```

I'm sensing some confusion over the purpose of `spark.kubernetes.driver.containerName`: is it to select the container out of the `containers` array in the template, or is it to *set* that name? Since this PR is creating the `containerName` spark conf, I'm assuming it is no longer hard-coded, and we're exposing it to user control. IIUC, the current concept is that the user *must* declare a container name *and* must specify it via `spark.kubernetes.driver.containerName`; if not, I'm not sure what the default behavior is. I agree that leaning on convention (the spark container is the first one) can cause problems, although the same problems would happen if the user specifies the wrong container name in the config. Clear documentation would be important regardless. In general, user-supplied templates are a "power-user" feature, and some increased opportunity for error is inherent here.
[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/22146#discussion_r212469389 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -225,6 +225,18 @@ private[spark] object Config extends Logging { "Ensure that major Python version is either Python2 or Python3") .createWithDefault("2") + val KUBERNETES_DRIVER_CONTAINER_NAME = +ConfigBuilder("spark.kubernetes.driver.containerName") --- End diff -- I'm having mixed feelings about utility of convention vs configuration on this. Mostly just because the purpose of the PR is to prevent further spark configurations. If the feature leans on convention that first container in the list is to be the spark driver/executor, how confusing is that likely to be? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/22146#discussion_r212460665 --- Diff: docs/running-on-kubernetes.md --- @@ -775,4 +787,183 @@ specific to Spark on Kubernetes. This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3. + + spark.kubernetes.driver.containerName + "spark-kubernetes-driver" + + This sets the driver container name. If you are specifying a driver [pod template](#pod-template), you can match this name to the --- End diff -- Does it mean "this is the name to look for in a list of containers coming in from the template?" --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/22146#discussion_r212438850 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -225,6 +225,18 @@ private[spark] object Config extends Logging { "Ensure that major Python version is either Python2 or Python3") .createWithDefault("2") + val KUBERNETES_DRIVER_CONTAINER_NAME = +ConfigBuilder("spark.kubernetes.driver.containerName") --- End diff -- @mccheah it raises interesting question: a user might use this feature to add containers, e.g. side-car proxies, service mesh, etc. In fact, I'd want users to be able to do this. If there *are* multiple containers defined, we may need a convention for identifying which one is driver/executor. Alternatives might include additive-only, like labels, or disallowing, but supporting this seems very desirable. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/22146#discussion_r212423082 --- Diff: docs/running-on-kubernetes.md --- @@ -775,4 +787,183 @@ specific to Spark on Kubernetes. This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3. + + spark.kubernetes.driver.containerName + "spark-kubernetes-driver" + + This sets the driver container name. If you are specifying a driver [pod template](#pod-template), you can match this name to the --- End diff -- What does "match this name to the name set in the template" mean? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/22146#discussion_r212424515 --- Diff: docs/running-on-kubernetes.md --- @@ -775,4 +787,183 @@ specific to Spark on Kubernetes. This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3. + + spark.kubernetes.driver.containerName + "spark-kubernetes-driver" + + This sets the driver container name. If you are specifying a driver [pod template](#pod-template), you can match this name to the + driver container name set in the template. + + + + spark.kubernetes.executor.containerName + "spark-kubernetes-executor" + + This sets the executor container name. If you are specifying a an executor [pod template](#pod-template), you can match this name to the + driver container name set in the template. + + + + spark.kubernetes.driver.podTemplateFile + (none) + + Specify the local file that contains the driver [pod template](#pod-template). For example + spark.kubernetes.driver.podTemplateFile=/path/to/driver-pod-template.yaml` + + + + spark.kubernetes.executor.podTemplateFile + (none) + + Specify the local file that contains the executor [pod template](#pod-template). For example + spark.kubernetes.executor.podTemplateFile=/path/to/executor-pod-template.yaml` + + + + + Pod template properties + +See the below table for the full list of pod specifications that will be overwritten by spark. + +### Pod Metadata + + +Pod metadata keyModified valueDescription + + name + Value of spark.kubernetes.driver.pod.name + +The driver pod name will be overwritten with either the configured or default value of +spark.kubernetes.driver.pod.name. The executor pod names will be unaffected. + + + + namespace + Value of spark.kubernetes.namespace + +Spark makes strong assumptions about the driver and executor namespaces. Both driver and executor namespaces will +be replaced by this spark conf value. 
+ + + + labels + Adds the labels from spark.kubernetes.{driver,executor}.label.* + +Spark will add additional labels specified by the spark configuration. + + + + annotations + Adds the annotations from spark.kubernetes.{driver,executor}.annotation.* + +Spark will add additional labels specified by the spark configuration. + + + + +### Pod Spec + + +Pod spec keyModified valueDescription + + imagePullSecrets + Adds image pull secrets from spark.kubernetes.container.image.pullSecrets + +Additional pull secrets will be added from the spark configuration to both executor pods. + + + + nodeSelector + Adds node selectors from spark.kubernetes.node.selector.* + +Additional node selectors will be added from the spark configuration to both executor pods. + + + + restartPolicy + "never" + +Spark assumes that both drivers and executors never restart. + + + + serviceAccount + Value of spark.kubernetes.authenticate.driver.serviceAccountName + +Spark will override serviceAccount with the value of the spark configuration for only --- End diff -- similar Q to namespace: if no spark-conf is set, will spark's conf default override here as well? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/22146#discussion_r212423941 --- Diff: docs/running-on-kubernetes.md --- @@ -775,4 +787,183 @@ specific to Spark on Kubernetes. This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3. + + spark.kubernetes.driver.containerName + "spark-kubernetes-driver" + + This sets the driver container name. If you are specifying a driver [pod template](#pod-template), you can match this name to the + driver container name set in the template. + + + + spark.kubernetes.executor.containerName + "spark-kubernetes-executor" + + This sets the executor container name. If you are specifying a an executor [pod template](#pod-template), you can match this name to the + driver container name set in the template. + + + + spark.kubernetes.driver.podTemplateFile + (none) + + Specify the local file that contains the driver [pod template](#pod-template). For example + spark.kubernetes.driver.podTemplateFile=/path/to/driver-pod-template.yaml` + + + + spark.kubernetes.executor.podTemplateFile + (none) + + Specify the local file that contains the executor [pod template](#pod-template). For example + spark.kubernetes.executor.podTemplateFile=/path/to/executor-pod-template.yaml` + + + + + Pod template properties + +See the below table for the full list of pod specifications that will be overwritten by spark. + +### Pod Metadata + + +Pod metadata keyModified valueDescription + + name + Value of spark.kubernetes.driver.pod.name + +The driver pod name will be overwritten with either the configured or default value of +spark.kubernetes.driver.pod.name. The executor pod names will be unaffected. + + + + namespace + Value of spark.kubernetes.namespace + +Spark makes strong assumptions about the driver and executor namespaces. 
Both driver and executor namespaces will --- End diff -- if the spark conf value for namespace isn't set, can spark use the template setting, or will spark's conf default also override the template? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/22146#discussion_r212399808 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -225,6 +225,18 @@ private[spark] object Config extends Logging { "Ensure that major Python version is either Python2 or Python3") .createWithDefault("2") + val KUBERNETES_DRIVER_CONTAINER_NAME = +ConfigBuilder("spark.kubernetes.driver.containerName") --- End diff -- If you're saying that manipulating `containers` doesn't give specific control over which name is assigned to which container, then I agree --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/22146#discussion_r212364915 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -225,6 +225,18 @@ private[spark] object Config extends Logging { "Ensure that major Python version is either Python2 or Python3") .createWithDefault("2") + val KUBERNETES_DRIVER_CONTAINER_NAME = +ConfigBuilder("spark.kubernetes.driver.containerName") --- End diff -- Is there any way to do this via the `containers` array from the pod template? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22146: [WIP][SPARK-24434][K8S] pod template files
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22146 Thanks @onursatici! Can you please resolve merge conflicts (rebase?), so CI can build it? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22146: [WIP][SPARK-24434][K8S] pod template files
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22146 Jenkins, OK to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21555: [SPARK-24547][K8S] Allow for building spark on k8s docke...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/21555 LGTM, I am OK to merge. Most of the automated image build tooling I've seen is custom, but I agree w/ Matt that being able to selectively build is worth supporting, via a follow-up PR. It might make it easier to write custom logic using this as an abstraction. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21572: [SPARK-24534][K8S] Bypass non spark-on-k8s commands
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/21572 The testing environment is synced again and it's passing, so I'm going to merge. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21551: [K8S] Fix issue in 'docker-image-tool.sh'
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/21551 @fabriziocucci can you close this PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21572: [SPARK-24534][K8S] Bypass non spark-on-k8s commands
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/21572 Jenkins, test this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21551: [K8S] Fix issue in 'docker-image-tool.sh'
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/21551 OK, I see it referenced above as commit b8dbfcc. I'd close this, but that option is disabled for me; presumably that's expected to happen via the merge-pr script. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21551: [K8S] Fix issue in 'docker-image-tool.sh'
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/21551 @mccheah this isn't showing as merged; you said you ran the merge script? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21572: [SPARK-24534][K8S] Bypass non spark-on-k8s commands
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/21572 please test this --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21572: [SPARK-24534][K8S] Bypass non spark-on-k8s commands
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/21572 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21572: [SPARK-24534][K8S] Bypass non spark-on-k8s comman...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21572#discussion_r195866275 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh --- @@ -37,11 +37,17 @@ if [ -z "$uidentry" ] ; then fi SPARK_K8S_CMD="$1" -if [ -z "$SPARK_K8S_CMD" ]; then - echo "No command to execute has been provided." 1>&2 - exit 1 -fi -shift 1 +case "$SPARK_K8S_CMD" in +driver | driver-py | executor) + shift 1 + ;; +"") + ;; +*) + echo "No SPARK_K8S_CMD provided: proceeding in pass-through mode..." --- End diff -- Overall this looks good to me. Looking at this simplified logic, the logging message should probably be more like: "Non-spark-k8s command provided, proceeding in pass-through mode..." --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21572: [SPARK-24534][K8S] Bypass non spark-on-k8s comman...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21572#discussion_r195811228 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh --- @@ -38,10 +38,10 @@ fi SPARK_K8S_CMD="$1" --- End diff -- @tmckayus that seems wrong, since the `CMD` that gets executed after the `case` is now missing the first element. Am I missing something? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21572: [SPARK-24534][K8S] Bypass non spark-on-k8s comman...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21572#discussion_r195804610 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh --- @@ -38,10 +38,10 @@ fi SPARK_K8S_CMD="$1" --- End diff -- And it does the right thing w.r.t. `shift` in both cases? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21572: [SPARK-24534][K8S] Bypass non spark-on-k8s comman...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21572#discussion_r195794051 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh --- @@ -38,10 +38,10 @@ fi SPARK_K8S_CMD="$1" --- End diff -- I see two possible pass-through conditions here: one is "empty SPARK_K8S_CMD" and the other is "SPARK_K8S_CMD is non-empty but contains a non-spark command." Is that the convention, or is the pass-through case always expected to be "empty SPARK_K8S_CMD"? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
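The two conditions can be sketched as a small dispatch function mirroring entrypoint.sh's `case` logic. This is a hedged sketch, not the actual script: `classify_cmd` is a hypothetical helper introduced only to make the three handling paths explicit.

```shell
#!/bin/sh
# classify_cmd: report how a given first argument would be handled,
# mirroring the two pass-through conditions described above.
classify_cmd() {
  case "$1" in
    driver | driver-py | executor) echo "spark" ;;              # recognized spark-on-k8s command
    "")                            echo "pass-through-empty" ;; # no command supplied at all
    *)                             echo "pass-through-other" ;; # non-spark command, run as-is
  esac
}

classify_cmd driver       # prints "spark"
classify_cmd ""           # prints "pass-through-empty"
classify_cmd /bin/bash    # prints "pass-through-other"
```

Distinguishing the second and third paths matters only if the entrypoint wants to log a "pass-through mode" message for non-spark commands while staying silent for an empty command.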
[GitHub] spark pull request #21572: [SPARK-24534][K8S] Bypass non spark-on-k8s comman...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21572#discussion_r195793358 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh --- @@ -110,8 +110,7 @@ case "$SPARK_K8S_CMD" in ;; *) -echo "Unknown command: $SPARK_K8S_CMD" 1>&2 -exit 1 +CMD=("$@") --- End diff -- Should log a message here too, about "executing in pass-through mode" since this is the guaranteed code path for pass through --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21572: [SPARK-24534][K8S] Bypass non spark-on-k8s comman...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21572#discussion_r195787809 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh --- @@ -38,10 +38,10 @@ fi SPARK_K8S_CMD="$1" if [ -z "$SPARK_K8S_CMD" ]; then - echo "No command to execute has been provided." 1>&2 - exit 1 + echo "No command to execute has been provided. Ignoring spark-on-k8s workflow..." 1>&2 --- End diff -- I'd propose: "No SPARK_K8S_CMD provided: proceeding in pass-through mode..." --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21572: [SPARK-24534][K8S] Bypass non spark-on-k8s commands
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/21572 @rimolive there are some significant differences between the head of master and your file. I'm wondering if you should get the head of master and re-edit from there, because rebasing didn't seem to sync it --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21572: Bypass non spark-on-k8s commands
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/21572 @rimolive I think you may need to rebase this to latest head of master, to pick up the pyspark updates --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21572: Bypass non spark-on-k8s commands
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/21572 Thanks @rimolive ! Can you please prepend `[SPARK-24534][K8S]` to the title of this PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r190755750 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/DeterministicExecutorPodsEventQueue.scala --- @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.scheduler.cluster.k8s + +import io.fabric8.kubernetes.api.model.Pod +import scala.collection.mutable + +class DeterministicExecutorPodsEventQueue extends ExecutorPodsEventQueue { + + private val eventBuffer = mutable.Buffer.empty[Pod] + private val subscribers = mutable.Buffer.empty[(Seq[Pod]) => Unit] + + override def addSubscriber + (processBatchIntervalMillis: Long) + (onNextBatch: (Seq[Pod]) => Unit): Unit = { +subscribers += onNextBatch + } + + override def stopProcessingEvents(): Unit = {} + + override def pushPodUpdate(updatedPod: Pod): Unit = eventBuffer += updatedPod --- End diff -- So events are pods themselves, as opposed to some event structure on pods? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r189735073 --- Diff: pom.xml --- @@ -150,6 +150,7 @@ 4.5.4 4.4.8 +3.0.1 --- End diff -- My take is that the performance is probably not worth the additional dependency. I also noticed that the trove dep is LGPL, which is considered incompatible with Apache license. Although I believe this is not a show-stopper with respect to "containerized" dependencies, it probably is for a direct dependency in the code. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r189435162 --- Diff: pom.xml --- @@ -150,6 +150,7 @@ 4.5.4 4.4.8 +3.0.1 --- End diff -- it looks like this dep is being taken on for a couple data structures: are these important or could they be replaced with scala Array and HashMap? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/21238 I agree with @mccheah that the potential code reuse is small. Keeping this as a separate pod construction step, decoupled from the user-exposed step, is cleaner. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21241: [SPARK-24135][K8s] Resilience to init-container e...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21241#discussion_r186859389 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala --- @@ -320,50 +322,83 @@ private[spark] class KubernetesClusterSchedulerBackend( override def eventReceived(action: Action, pod: Pod): Unit = { val podName = pod.getMetadata.getName val podIP = pod.getStatus.getPodIP - + val podPhase = pod.getStatus.getPhase action match { -case Action.MODIFIED if (pod.getStatus.getPhase == "Running" +case Action.MODIFIED if (podPhase == "Running" && pod.getMetadata.getDeletionTimestamp == null) => val clusterNodeName = pod.getSpec.getNodeName logInfo(s"Executor pod $podName ready, launched at $clusterNodeName as IP $podIP.") executorPodsByIPs.put(podIP, pod) -case Action.DELETED | Action.ERROR => +case Action.MODIFIED if (podPhase == "Init:Error" || podPhase == "Init:CrashLoopBackoff") + && pod.getMetadata.getDeletionTimestamp == null => val executorId = getExecutorId(pod) - logDebug(s"Executor pod $podName at IP $podIP was at $action.") - if (podIP != null) { -executorPodsByIPs.remove(podIP) + failedInitExecutors.add(executorId) + if (failedInitExecutors.size >= executorMaxInitErrors) { +val errorMessage = s"Aborting Spark application because $executorMaxInitErrors" + + s" executors failed to start. The maximum number of allowed startup failures is" + + s" $executorMaxInitErrors. Please contact your cluster administrator or increase" + + s" your setting of ${KUBERNETES_EXECUTOR_MAX_INIT_ERRORS.key}." +logError(errorMessage) + KubernetesClusterSchedulerBackend.this.scheduler.sc.stopInNewThread() --- End diff -- Leaning on configured minimum executors could be construed as a way to let the user express application-dependent throughput minimums. If the use case is terabyte-scale, then setting minimum executors appropriately would address that. 
A %-threshold is also reasonable, it just adds a new knob. Requesting new executors indefinitely seems plausible, as long as failed pods don't accumulate. Having a submission hang indefinitely while kube churns may be confusing behavior, at least if users are not familiar with kube and hoping to treat it transparently. Maybe mesos backend policies could provide an analogy? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21241: [SPARK-24135][K8s] Resilience to init-container e...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21241#discussion_r186855467 --- Diff: docs/running-on-kubernetes.md --- @@ -561,6 +561,13 @@ specific to Spark on Kubernetes. This is distinct from spark.executor.cores: it is only used and takes precedence over spark.executor.cores for specifying the executor pod cpu request if set. Task parallelism, e.g., number of tasks an executor can run concurrently is not affected by this. + + spark.kubernetes.executor.maxInitFailures + 10 + +Maximum number of times executors are allowed to fail with an Init:Error state before failing the application. Note that Init:Error failures should not be caused by Spark itself because Spark does not attach init-containers to pods. Init-containers can be attached by the cluster itself. Users should check with their cluster administrator if these kinds of failures to start the executor pod occur frequently. --- End diff -- As long as it's relatively easy to extend, generalizing on a case by case basis should be OK --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21241: [SPARK-24135][K8s] Resilience to init-container e...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21241#discussion_r186846341 --- Diff: docs/running-on-kubernetes.md --- @@ -561,6 +561,13 @@ specific to Spark on Kubernetes. This is distinct from spark.executor.cores: it is only used and takes precedence over spark.executor.cores for specifying the executor pod cpu request if set. Task parallelism, e.g., number of tasks an executor can run concurrently is not affected by this. + + spark.kubernetes.executor.maxInitFailures + 10 + +Maximum number of times executors are allowed to fail with an Init:Error state before failing the application. Note that Init:Error failures should not be caused by Spark itself because Spark does not attach init-containers to pods. Init-containers can be attached by the cluster itself. Users should check with their cluster administrator if these kinds of failures to start the executor pod occur frequently. --- End diff -- I'm in agreement w/ @foxish about designing for wider categories of error, which can be future-proofed --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21241: [SPARK-24135][K8s] Resilience to init-container e...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21241#discussion_r186839692 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala --- @@ -320,50 +322,83 @@ private[spark] class KubernetesClusterSchedulerBackend( override def eventReceived(action: Action, pod: Pod): Unit = { val podName = pod.getMetadata.getName val podIP = pod.getStatus.getPodIP - + val podPhase = pod.getStatus.getPhase action match { -case Action.MODIFIED if (pod.getStatus.getPhase == "Running" +case Action.MODIFIED if (podPhase == "Running" && pod.getMetadata.getDeletionTimestamp == null) => val clusterNodeName = pod.getSpec.getNodeName logInfo(s"Executor pod $podName ready, launched at $clusterNodeName as IP $podIP.") executorPodsByIPs.put(podIP, pod) -case Action.DELETED | Action.ERROR => +case Action.MODIFIED if (podPhase == "Init:Error" || podPhase == "Init:CrashLoopBackoff") + && pod.getMetadata.getDeletionTimestamp == null => val executorId = getExecutorId(pod) - logDebug(s"Executor pod $podName at IP $podIP was at $action.") - if (podIP != null) { -executorPodsByIPs.remove(podIP) + failedInitExecutors.add(executorId) + if (failedInitExecutors.size >= executorMaxInitErrors) { +val errorMessage = s"Aborting Spark application because $executorMaxInitErrors" + + s" executors failed to start. The maximum number of allowed startup failures is" + + s" $executorMaxInitErrors. Please contact your cluster administrator or increase" + + s" your setting of ${KUBERNETES_EXECUTOR_MAX_INIT_ERRORS.key}." 
+logError(errorMessage) + KubernetesClusterSchedulerBackend.this.scheduler.sc.stopInNewThread() --- End diff -- I'm unsure how elaborate we can/should get with this, but a basic test from the back-end side for context seems like "are there any registered executors", which isn't perfect but might be a decent heuristic for whether it's worth trying to continue or just fail the job entirely. If there are no currently registered executors, and we hit a failure threshold, just failing out altogether seems like a sane response. If there are executors running (satisfying configured minimum?), then continuing might be reasonable since that is graceful degradation. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
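The heuristic proposed here can be summarized in a few lines. This is Python pseudocode of the suggested policy, not the scheduler backend's actual Scala code; the function and parameter names are hypothetical:

```python
def should_abort(failed_init_count, max_init_errors, registered_executors):
    """Abort the application only when the init-failure threshold is reached
    AND no executors have successfully registered; if some executors are
    running, continue in gracefully degraded mode instead."""
    return failed_init_count >= max_init_errors and registered_executors == 0

# No executors ever came up and the threshold is hit: fail the job.
assert should_abort(10, 10, 0)
# Some executors are alive: keep going despite init failures.
assert not should_abort(10, 10, 3)
```

A configured-minimum variant would replace `registered_executors == 0` with a comparison against the user's minimum executor count.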
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/20697 @ssuchter good point, they should. I thought they were set up to be invoked because kube has already been added to modules.py
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/20697 Barring further feedback, resolving two issues should have this ready to merge: 1. move cloud-backed testing to a new PR 2. remove the repository clone logic
[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/21092 @holdenk I think your [comment above](https://github.com/apache/spark/pull/21092#issuecomment-383211329) gets at a use-case "ambiguity" that containerization causes. There are now at least two choices of channel for supplying dependencies: from the command line, or by customized container (and here there are at least two sub-cases: manually created customizations, or via source-to-image tooling). When specifying deps via the command line, particularly in cluster mode, we have backed out of staging local files via init-container; does pulling from URI suffice?
[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21092#discussion_r183161979 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile --- @@ -0,0 +1,33 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +ARG base_img +FROM $base_img +WORKDIR / +COPY python /opt/spark/python +RUN apk add --no-cache python && \ +python -m ensurepip && \ +rm -r /usr/lib/python*/ensurepip && \ +pip install --upgrade pip setuptools && \ +rm -r /root/.cache +ENV PYTHON_VERSION 2.7.13 --- End diff -- I think a canonical container should include both. My instinct is that a user should be able to "force" the use of one or the other. If someone is invoking spark-submit in cluster-mode, with a supplied python file, some kind of CLI argument (--conf or otherwise) seems like the only totally foolproof way to identify that for the eventual pod construction, but maybe there is a better way?
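One way the "force via CLI argument" idea could look as a submit invocation. This is a sketch only — the conf key name (`spark.kubernetes.pyspark.pythonVersion`) and the example path are assumptions for illustration, not a settled API; the command is assembled and printed rather than executed:

```shell
# User forces python 3 regardless of what the image defaults to.
PY_VERSION=3
submit_cmd=(
  bin/spark-submit
  --deploy-mode cluster
  --conf "spark.kubernetes.pyspark.pythonVersion=${PY_VERSION}"
  local:///opt/spark/examples/src/main/python/pi.py
)
# Echo the command so its shape is visible; a real run would invoke it.
printf '%s\n' "${submit_cmd[@]}"
```

The point of a `--conf` channel is that it survives cluster mode: the submission client can read it and bake the interpreter choice into the driver pod spec, independent of how the image was built.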
[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21092#discussion_r182564020 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala --- @@ -71,7 +77,7 @@ private[spark] class BasicDriverFeatureStep( ("cpu", new QuantityBuilder(false).withAmount(limitCores).build()) } -val driverContainer = new ContainerBuilder(pod.container) +val withoutArgsDriverContainer: ContainerBuilder = new ContainerBuilder(pod.container) --- End diff -- is there a corresponding driver container _with_ args?
[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/21092#discussion_r182563748 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh --- @@ -62,6 +69,14 @@ case "$SPARK_K8S_CMD" in "$@" ) ;; + driver-py) +CMD=( + "$SPARK_HOME/bin/spark-submit" + --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" + --deploy-mode client + "$@" $PYSPARK_PRIMARY $PYSPARK_SECONDARY --- End diff -- +1
[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/21092 Thanks @ifilonenko ! I'm interested in figuring out what it means for the container images to be "python 2/3 generic" - does that imply being able to run either, based on submit parameters?
[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/20669 @mccheah workflow is to use `dev/merge_spark_pr.py`
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r175532438 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh --- @@ -53,14 +53,10 @@ fi case "$SPARK_K8S_CMD" in driver) CMD=( - ${JAVA_HOME}/bin/java - "${SPARK_JAVA_OPTS[@]}" - -cp "$SPARK_CLASSPATH" - -Xms$SPARK_DRIVER_MEMORY - -Xmx$SPARK_DRIVER_MEMORY - -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS - $SPARK_DRIVER_CLASS - $SPARK_DRIVER_ARGS + "$SPARK_HOME/bin/spark-submit" + --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" + --deploy-mode client + "$@" --- End diff -- as @mccheah mentioned, I included some logic on the current entrypoint.sh to allow Spark to work in cases such as an anonymous uid (another way to look at it is managing Spark's long-standing quirk of failing when it can't find a passwd entry). Putting it in entrypoint.sh was a good way to make sure this happened regardless of how the actual CMD evolved. A kind of defensive future-proofing, which is important for reference Dockerfiles. It also provides execution via `tini` as "pid 1", which is considered good standard practice. All of this is done in part with the expectation that _most_ users are liable to just want to customize their images by building the reference dockerfiles and using those as base images for their own, without modifying the CMD or entrypoint.sh That said, I think that in terms of formally documenting a container API, entrypoint.sh may be a red herring. In theory, a user _should_ be able to build their own custom container from the ground up, up to and including a different entrypoint, or default entrypoint, etc. Part of the reason we went with an externalized CMD (instead of creating one in the backend code) was to allow maximum flexibility in how images were constructed. The back-end provides certain information to the pod. 
The "API" is a catalogue of this information, combined with any behaviors that the user's container _must_ implement. API doc shouldn't assume the existence of entrypoint.sh
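For context, the anonymous-uid handling described in the comment above follows a pattern along these lines. This is a sketch modeled on the reference entrypoint.sh behavior, not a verbatim copy; the passwd fields and fallback paths are illustrative:

```shell
ensure_passwd_entry() {
  local myuid mygid uidentry
  myuid=$(id -u)
  mygid=$(id -g)
  # getent exits nonzero when the uid has no passwd entry -- the anonymous-uid
  # case -- so guard it rather than letting a `set -e` script die here.
  uidentry=$(getent passwd "$myuid" || true)
  if [ -z "$uidentry" ] && [ -w /etc/passwd ]; then
    # Synthesize an entry so tools that insist on a passwd lookup keep working.
    echo "$myuid:x:$myuid:$mygid:anonymous uid:${SPARK_HOME:-/opt/spark}:/bin/false" >> /etc/passwd
  fi
}

ensure_passwd_entry
# The reference entrypoint then hands off through tini as pid 1, e.g.:
#   exec /sbin/tini -s -- "${CMD[@]}"
getent passwd "$(id -u)" | cut -d: -f1
```

Because the fix-up happens in the entrypoint rather than in any particular CMD, it keeps working however the launch command evolves — which is the "defensive future-proofing" point above.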
[GitHub] spark issue #20822: [SPARK-23680] Fix entrypoint.sh to properly support Arbi...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/20822 Merged to master
[GitHub] spark issue #20822: [SPARK-23680] Fix entrypoint.sh to properly support Arbi...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/20822 LGTM, pending tests
[GitHub] spark pull request #20822: [SPARK-23680] Fix entrypoint.sh to properly suppo...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/20822#discussion_r174527459 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh --- @@ -17,13 +17,15 @@ # # echo commands to the terminal output -set -ex +set -x # Check whether there is a passwd entry for the container UID myuid=$(id -u) mygid=$(id -g) uidentry=$(getent passwd $myuid) +set -e --- End diff -- it is probably best to limit the "no -e" specifically to `getent`, such as:
```bash
set -ex
myuid=$(id -u)
mygid=$(id -g)
# turn off -e for getent because it will return error code in anonymous uid case
set +e
uidentry=$(getent passwd $myuid)
set -e
```
[GitHub] spark issue #20822: [SPARK-23680] Fix entrypoint.sh to properly support Arbi...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/20822 jenkins please test this
[GitHub] spark issue #20822: [SPARK-23680] Fix entrypoint.sh to properly support Arbi...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/20822 @rimolive thanks for this!
[GitHub] spark issue #19775: [SPARK-22343][core] Add support for publishing Spark met...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/19775 Although this is not kube-specific, kubernetes deployment is a major prometheus use case. Has it been tested in a kube environment?
[GitHub] spark issue #19775: [SPARK-22343][core] Add support for publishing Spark met...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/19775 I agree w/ @jerryshao that adding new deps to core isn't ideal. (Also that having #11994 would be really nice) New deps on a sub-project seems more palatable, but interested in what other Sparkers think.
[GitHub] spark issue #20007: [SPARK-22777][Scheduler] Kubernetes mode dockerfile perm...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/20007 If we wanted to lock down permissions a bit more, we might consider `750`, but I'd prefer to make permission changes with more time for testing, maybe for 2.4
[GitHub] spark issue #20007: [SPARK-22777][Scheduler] Kubernetes mode dockerfile perm...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/20007 Yes, by default in openshift, containers run as an anonymous uid, and as group id 0. So there are a few things that need to be given access to gid 0. I asked some of our security people and they considered these mods to be acceptable.
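The gid-0 accommodation described above typically reduces to a Dockerfile-style permission pass like the following. This is a sketch run against a scratch directory (paths and modes are illustrative, not the actual image layout): make the tree group-owned by gid 0 and give the group the same rights as the owner, so an arbitrary uid running with group 0 can still use it.

```shell
demo_dir=$(mktemp -d)          # stand-in for something like /opt/spark
mkdir -p "$demo_dir/work-dir"
# chgrp to the root group (gid 0); this needs privilege, so tolerate failure
# when demonstrating outside a container build.
chgrp -R 0 "$demo_dir" 2>/dev/null || true
# g=u copies the owner's permission bits to the group on every entry.
chmod -R g=u "$demo_dir"
stat -c '%a' "$demo_dir"       # mktemp creates mode 700, so g=u yields 770
```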
[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/19995 LGTM
[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/19995 I'm picking up a few stragglers from `git grep`
```
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BaseDriverConfigurationStep.scala
33=private[spark] class BaseDriverConfigurationStep(
49: private val driverDockerImage = submissionSparkConf
51:.getOrElse(throw new SparkException("Must specify the driver Docker image"))
113: .withImage(driverDockerImage)
```
```
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
95=class SparkSubmitSuite
399: "--conf", "spark.kubernetes.driver.docker.image=bar",
415:conf.get("spark.kubernetes.driver.docker.image") should be ("bar")
```
[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/19946#discussion_r157274013 --- Diff: docs/running-on-kubernetes.md --- @@ -0,0 +1,502 @@ +--- +layout: global +title: Running Spark on Kubernetes +--- +* This will become a table of contents (this text will be scraped). +{:toc} + +Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). This feature makes use of native +Kubernetes scheduler that has been added to Spark. + +# Prerequisites + +* A runnable distribution of Spark 2.3 or above. +* A running Kubernetes cluster at version >= 1.6 with access configured to it using +[kubectl](https://kubernetes.io/docs/user-guide/prereqs/). If you do not already have a working Kubernetes cluster, +you may setup a test cluster on your local machine using +[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/). + * We recommend using the latest release of minikube with the DNS addon enabled. +* You must have appropriate permissions to list, create, edit and delete +[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources +by running `kubectl auth can-i <list|create|edit|delete> pods`. + * The service account credentials used by the driver pods must be allowed to create pods, services and configmaps. +* You must have [Kubernetes DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) configured in your cluster. + +# How it works + + + + + +spark-submit can be directly used to submit a Spark application to a Kubernetes cluster. +The submission mechanism works as follows: + +* Spark creates a Spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/). +* The driver creates executors which are also running within Kubernetes pods and connects to them, and executes application code. 
+* When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists +logs and remains in "completed" state in the Kubernetes API until it's eventually garbage collected or manually cleaned up. + +Note that in the completed state, the driver pod does *not* use any computational or memory resources. + +The driver and executor pod scheduling is handled by Kubernetes. It will be possible to affect Kubernetes scheduling +decisions for driver and executor pods using advanced primitives like +[node selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector) +and [node/pod affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity) +in a future release. + +# Submitting Applications to Kubernetes + +## Docker Images + +Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to +be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is --- End diff -- I'd expect the container run-time to be transparent to the particular image, and very definitely transparent to spark. Using `container.image` makes sense from either standpoint.
[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/19946#discussion_r157076409 --- Diff: docs/running-on-kubernetes.md --- @@ -0,0 +1,502 @@ +--- +layout: global +title: Running Spark on Kubernetes +--- +* This will become a table of contents (this text will be scraped). +{:toc} + +Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). This feature makes use of native +Kubernetes scheduler that has been added to Spark. + +# Prerequisites + +* A runnable distribution of Spark 2.3 or above. +* A running Kubernetes cluster at version >= 1.6 with access configured to it using +[kubectl](https://kubernetes.io/docs/user-guide/prereqs/). If you do not already have a working Kubernetes cluster, +you may setup a test cluster on your local machine using +[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/). + * We recommend using the latest release of minikube with the DNS addon enabled. +* You must have appropriate permissions to list, create, edit and delete +[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources +by running `kubectl auth can-i <list|create|edit|delete> pods`. + * The service account credentials used by the driver pods must be allowed to create pods, services and configmaps. +* You must have [Kubernetes DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) configured in your cluster. + +# How it works + + + + + +spark-submit can be directly used to submit a Spark application to a Kubernetes cluster. +The submission mechanism works as follows: + +* Spark creates a Spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/). +* The driver creates executors which are also running within Kubernetes pods and connects to them, and executes application code. 
+* When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists +logs and remains in "completed" state in the Kubernetes API until it's eventually garbage collected or manually cleaned up. + +Note that in the completed state, the driver pod does *not* use any computational or memory resources. + +The driver and executor pod scheduling is handled by Kubernetes. It will be possible to affect Kubernetes scheduling +decisions for driver and executor pods using advanced primitives like +[node selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector) +and [node/pod affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity) +in a future release. + +# Submitting Applications to Kubernetes + +## Docker Images + +Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to +be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is --- End diff -- the images should be runnable by other container runtimes (cri-o, rkt, etc), although I'm not aware of anybody testing that.
[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/19946#discussion_r157075744 --- Diff: docs/running-on-kubernetes.md --- @@ -0,0 +1,502 @@ +--- +layout: global +title: Running Spark on Kubernetes +--- +* This will become a table of contents (this text will be scraped). +{:toc} + +Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). This feature makes use of native +Kubernetes scheduler that has been added to Spark. + +# Prerequisites + +* A runnable distribution of Spark 2.3 or above. +* A running Kubernetes cluster at version >= 1.6 with access configured to it using +[kubectl](https://kubernetes.io/docs/user-guide/prereqs/). If you do not already have a working Kubernetes cluster, +you may setup a test cluster on your local machine using +[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/). + * We recommend using the latest release of minikube with the DNS addon enabled. +* You must have appropriate permissions to list, create, edit and delete +[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources +by running `kubectl auth can-i <list|create|edit|delete> pods`. + * The service account credentials used by the driver pods must be allowed to create pods, services and configmaps. +* You must have [Kubernetes DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) configured in your cluster. + +# How it works + + + + + +spark-submit can be directly used to submit a Spark application to a Kubernetes cluster. +The submission mechanism works as follows: + +* Spark creates a Spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/). +* The driver creates executors which are also running within Kubernetes pods and connects to them, and executes application code. 
+* When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists +logs and remains in "completed" state in the Kubernetes API until it's eventually garbage collected or manually cleaned up. + +Note that in the completed state, the driver pod does *not* use any computational or memory resources. + +The driver and executor pod scheduling is handled by Kubernetes. It will be possible to affect Kubernetes scheduling +decisions for driver and executor pods using advanced primitives like +[node selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector) +and [node/pod affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity) +in a future release. + +# Submitting Applications to Kubernetes + +## Docker Images + +Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to +be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is +frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized +and built for your usage. + +You may build these docker images from sources. +There is a script, `sbin/build-push-docker-images.sh` that you can use to build and push +customized Spark distribution images consisting of all the above components. + +Example usage is: + +./sbin/build-push-docker-images.sh -r -t my-tag build +./sbin/build-push-docker-images.sh -r -t my-tag push + +Docker files are under the `dockerfiles/` and can be customized further before +building using the supplied script, or manually. 
+ +## Cluster Mode + +To launch Spark Pi in cluster mode, + +{% highlight bash %} +$ bin/spark-submit \ +--master k8s://https://: \ +--deploy-mode cluster \ +--name spark-pi \ +--class org.apache.spark.examples.SparkPi \ +--conf spark.executor.instances=5 \ +--conf spark.kubernetes.driver.docker.image= \ +--conf spark.kubernetes.executor.docker.image= \ +local:///path/to/examples.jar +{% endhighlight %} + +The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting +`spark.master` in the application's configuration, must be a URL with the format `k8s://`. Prefixing the +master string with `k8s://` will cause the Spark application to launch on the Kubernetes cluster, with the API server +being contacted at `api_server_url`. If no HTTP protocol is specified in the URL, it defaults to `https`. For example, +setting the master to `k8