[spark] branch master updated: [SPARK-37319][K8S][FOLLOWUP] Set JAVA_HOME for Java 17 installed by apt-get
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a3886ba  [SPARK-37319][K8S][FOLLOWUP] Set JAVA_HOME for Java 17 installed by apt-get
a3886ba is described below

commit a3886ba976469bef0dfafc3da8686a53c5a59d95
Author: Kousuke Saruta
AuthorDate: Sun Nov 28 21:44:42 2021 -0800

    [SPARK-37319][K8S][FOLLOWUP] Set JAVA_HOME for Java 17 installed by apt-get

    ### What changes were proposed in this pull request?

    This PR adds a line to `Dockerfile.java17` that sets the environment variable `JAVA_HOME` for the Java 17 runtime installed by apt-get.

    ### Why are the changes needed?

    `entrypoint.sh` invokes `${JAVA_HOME}/bin/java`, but a container built from `Dockerfile.java17` does not set that environment variable. With `JAVA_HOME` unset, `${JAVA_HOME}/bin/java` expands to `/bin/java`, which does not exist in the image, so executors can't launch:

    ```
    + CMD=(${JAVA_HOME}/bin/java "${SPARK_EXECUTOR_JAVA_OPTS[@]}" -Xms$SPARK_EXECUTOR_MEMORY -Xmx$SPARK_EXECUTOR_MEMORY -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH" org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBackend --driver-url $SPARK_DRIVER_URL --executor-id $SPARK_EXECUTOR_ID --cores $SPARK_EXECUTOR_CORES --app-id $SPARK_APPLICATION_ID --hostname $SPARK_EXECUTOR_POD_IP --resourceProfileId $SPARK_RESOURCE_PROFILE_ID --podName $SPARK_EXECUTOR_POD_NAME)
    + exec /usr/bin/tini -s -- /bin/java -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.at [...]
    [FATAL tini (15)] exec /bin/java failed: No such file or directory
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Confirmed that the following simple job runs successfully with a container image built from the modified `Dockerfile.java17`.

    ```
    $ bin/spark-shell --master k8s://https://: --conf spark.kubernetes.container.image=spark:

    scala> spark.range(10).show
    +---+
    | id|
    +---+
    |  0|
    |  1|
    |  2|
    |  3|
    |  4|
    |  5|
    |  6|
    |  7|
    |  8|
    |  9|
    +---+
    ```

    Closes #34722 from sarutak/java17-home-kube.

    Authored-by: Kousuke Saruta
    Signed-off-by: Dongjoon Hyun
---
 .../kubernetes/docker/src/main/dockerfiles/spark/Dockerfile.java17 | 1 +
 1 file changed, 1 insertion(+)

diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile.java17 b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile.java17
index f9ab64e..96dd6c9 100644
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile.java17
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile.java17
@@ -51,6 +51,7 @@ COPY kubernetes/tests /opt/spark/tests
 COPY data /opt/spark/data
 
 ENV SPARK_HOME /opt/spark
+ENV JAVA_HOME /usr/lib/jvm/java-17-openjdk-amd64/
 
 WORKDIR /opt/spark/work-dir
 RUN chmod g+w /opt/spark/work-dir
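A quick way to confirm the fix end-to-end is to read `JAVA_HOME` from inside an executor task in the same spark-shell session. The snippet below is a minimal illustrative sketch, not part of the patch; it assumes the shell is connected to executors running the patched image, and the expected value mirrors the `ENV` line added above.

```scala
// Illustrative check (not part of the patch): report JAVA_HOME as seen by
// the executors, assuming a spark-shell bound to the patched container image.
import spark.implicits._  // already in scope in spark-shell; harmless to repeat

spark.range(3)                                         // a few rows -> a few tasks
  .map(_ => sys.env.getOrElse("JAVA_HOME", "<unset>")) // evaluated on executors
  .distinct()
  .show(false)
// Expected single row with the patched image:
// /usr/lib/jvm/java-17-openjdk-amd64/
```

The `map` closure runs on the executors, not the driver, so the value shown is the one `entrypoint.sh` saw when launching them.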
[spark] branch master updated (db9a982 -> e91ef19)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from db9a982  [SPARK-37461][YARN] YARN-CLIENT mode client.appId is always null
 add e91ef19  [SPARK-37443][PYTHON] Provide a profiler for Python/Pandas UDFs

No new revisions were added by this update.

Summary of changes:
 dev/sparktestsupport/modules.py                    |  1 +
 python/docs/source/development/debugging.rst       | 56 -
 python/pyspark/context.py                          | 10 ++-
 python/pyspark/context.pyi                         |  1 +
 python/pyspark/profiler.py                         | 45 +--
 python/pyspark/profiler.pyi                        | 17 +++-
 .../tests/test_udf_profiler.py}                    | 91 +++---
 python/pyspark/sql/udf.py                          | 32 ++--
 .../sql/catalyst/expressions/Expression.scala      | 11 +++
 .../spark/sql/catalyst/expressions/PythonUDF.scala | 20 -
 .../catalyst/expressions/namedExpressions.scala    | 10 ---
 .../apache/spark/sql/catalyst/util/package.scala   |  1 +
 .../apache/spark/sql/IntegratedUDFTestUtils.scala  | 35 -
 13 files changed, 251 insertions(+), 79 deletions(-)
 copy python/pyspark/{tests/test_profiler.py => sql/tests/test_udf_profiler.py} (52%)
[spark] branch master updated: [SPARK-37461][YARN] YARN-CLIENT mode client.appId is always null
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new db9a982  [SPARK-37461][YARN] YARN-CLIENT mode client.appId is always null
db9a982 is described below

commit db9a982a1441810314be07e2c3b7cc77d1f1
Author: Angerszh
AuthorDate: Sun Nov 28 08:53:25 2021 -0600

    [SPARK-37461][YARN] YARN-CLIENT mode client.appId is always null

    ### What changes were proposed in this pull request?

    In yarn-client mode, the `Client.appId` field is never assigned, so it is always `null`; in cluster mode it is assigned the real value. This patch assigns the real application id to `appId` in client mode too.

    ### Why are the changes needed?

    1. It refactors the code to avoid defining a separate local id in each function; the field can be used directly.
    2. In client mode, users can read this field to get the application id.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Manually tested. We have an internal proxy server that replaces the YARN tracking URL and uses `appId`; with this patch it is no longer null.

    ```
    21/11/26 12:38:44 INFO Client:
         client token: N/A
         diagnostics: AM container is launched, waiting for AM container to Register with RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: user_queue
         start time: 1637901520956
         final status: UNDEFINED
         tracking URL: http://internal-proxy-server/proxy?applicationId=application_1635856758535_4209064
         user: user_name
    ```

    Closes #34710 from AngersZh/SPARK-37461.

    Authored-by: Angerszh
    Signed-off-by: Sean Owen
---
 .../main/scala/org/apache/spark/deploy/yarn/Client.scala | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 7787e2f..e6136fc 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -169,7 +169,6 @@ private[spark] class Client(
   def submitApplication(): ApplicationId = {
     ResourceRequestHelper.validateResources(sparkConf)
 
-    var appId: ApplicationId = null
     try {
       launcherBackend.connect()
       yarnClient.init(hadoopConf)
@@ -181,7 +180,7 @@ private[spark] class Client(
       // Get a new application from our RM
       val newApp = yarnClient.createApplication()
       val newAppResponse = newApp.getNewApplicationResponse()
-      appId = newAppResponse.getApplicationId()
+      this.appId = newAppResponse.getApplicationId()
 
       // The app staging dir based on the STAGING_DIR configuration if configured
       // otherwise based on the users home directory.
@@ -207,8 +206,7 @@ private[spark] class Client(
       yarnClient.submitApplication(appContext)
       launcherBackend.setAppId(appId.toString)
       reportLauncherState(SparkAppHandle.State.SUBMITTED)
-
-      appId
+      this.appId
     } catch {
       case e: Throwable =>
         if (stagingDirPath != null) {
@@ -915,7 +913,6 @@ private[spark] class Client(
   private def createContainerLaunchContext(newAppResponse: GetNewApplicationResponse)
     : ContainerLaunchContext = {
     logInfo("Setting up container launch context for our AM")
-    val appId = newAppResponse.getApplicationId
     val pySparkArchives =
       if (sparkConf.get(IS_PYTHON_APP)) {
         findPySparkArchives()
@@ -971,7 +968,7 @@ private[spark] class Client(
     if (isClusterMode) {
       sparkConf.get(DRIVER_JAVA_OPTIONS).foreach { opts =>
         javaOpts ++= Utils.splitCommandString(opts)
-          .map(Utils.substituteAppId(_, appId.toString))
+          .map(Utils.substituteAppId(_, this.appId.toString))
           .map(YarnSparkHadoopUtil.escapeForShell)
       }
       val libraryPaths = Seq(sparkConf.get(DRIVER_LIBRARY_PATH),
@@ -996,7 +993,7 @@ private[spark] class Client(
         throw new SparkException(msg)
       }
       javaOpts ++= Utils.splitCommandString(opts)
-        .map(Utils.substituteAppId(_, appId.toString))
+        .map(Utils.substituteAppId(_, this.appId.toString))
         .map(YarnSparkHadoopUtil.escapeForShell)
     }
     sparkConf.get(AM_LIBRARY_PATH).foreach { paths =>
@@ -1269,7 +1266,7 @@ private[spark] class Client(
    * throw an appropriate SparkException.
    */
   def run(): Unit = {
-    this.appId = submitApplication()
+    submitApplication()
     if (!launcherBackend.isConnected() && fireAndForget) {
       val report = getApplicationReport(appId)
       val state =
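For context on the diff above: the pre-fix `submitApplication()` declared a local `var appId` that shadowed the class field of the same name, and only `run()` copied the return value into the field. Callers that invoke `submitApplication()` directly, as yarn-client mode does, therefore always observed a `null` `Client.appId`. Below is a minimal, self-contained sketch of that pitfall, using simplified `String` ids and hypothetical names (`ClientSketch`, `submitApplicationBuggy`, `submitApplicationFixed`), not the real Spark classes:

```scala
// Self-contained illustration of the field-shadowing pitfall behind SPARK-37461.
// Simplified String ids and hypothetical names; not the actual Spark code.
class ClientSketch {
  private var appId: String = null // class field, read by other methods

  // Pre-fix shape: the local `var appId` shadows the field, so the field
  // stays null unless the caller assigns the return value itself.
  def submitApplicationBuggy(): String = {
    var appId: String = null
    appId = "application_1635856758535_4209064"
    appId // returned correctly, but this.appId is still null
  }

  // Post-fix shape: assign the field at the point the id is obtained.
  def submitApplicationFixed(): String = {
    this.appId = "application_1635856758535_4209064"
    this.appId
  }

  def currentAppId: String = appId
}

object ShadowDemo extends App {
  val client = new ClientSketch
  client.submitApplicationBuggy()
  println(client.currentAppId) // null -- the yarn-client symptom
  client.submitApplicationFixed()
  println(client.currentAppId) // application_1635856758535_4209064
}
```

Assigning `this.appId` once, at the point where the id is created, removes both the shadowing hazard and the per-method local copies, which is exactly the shape of the patch.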