Niels Pardon created SPARK-45557:
------------------------------------

             Summary: Spark Connect cannot be started because of missing user home dir in Docker container
                 Key: SPARK-45557
                 URL: https://issues.apache.org/jira/browse/SPARK-45557
             Project: Spark
          Issue Type: Bug
          Components: Spark Docker
    Affects Versions: 3.5.0, 3.4.1, 3.4.0
            Reporter: Niels Pardon


I was trying to start Spark Connect within a container using the Spark Docker 
container images and ran into an issue where Ivy could not pull the Spark 
Connect JAR because the user home directory /home/spark does not exist.

Steps to reproduce:

1. Start the Spark container with `/bin/bash` as the command:
{code:java}
docker run -it --rm apache/spark:3.5.0 /bin/bash {code}
2. Try to start Spark Connect within the container:
{code:java}
/opt/spark/sbin/start-connect-server.sh --packages 
org.apache.spark:spark-connect_2.12:3.5.0 {code}
which led to this output:
{code:java}
starting org.apache.spark.sql.connect.service.SparkConnectServer, logging to 
/opt/spark/logs/spark--org.apache.spark.sql.connect.service.SparkConnectServer-1-d8470a71dbd7.out
failed to launch: nice -n 0 bash /opt/spark/bin/spark-submit --class 
org.apache.spark.sql.connect.service.SparkConnectServer --name Spark Connect 
server --packages org.apache.spark:spark-connect_2.12:3.5.0
        at 
org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1535)
        at 
org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:185)
        at 
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:964)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
full log in 
/opt/spark/logs/spark--org.apache.spark.sql.connect.service.SparkConnectServer-1-d8470a71dbd7.out
 {code}
The full log file then looks like this:
{code:java}
Spark Command: /opt/java/openjdk/bin/java -cp /opt/spark/conf:/opt/spark/jars/* 
-Xmx1g -XX:+IgnoreUnrecognizedVMOptions 
--add-opens=java.base/java.lang=ALL-UNNAMED 
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED 
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED 
--add-opens=java.base/java.io=ALL-UNNAMED 
--add-opens=java.base/java.net=ALL-UNNAMED 
--add-opens=java.base/java.nio=ALL-UNNAMED 
--add-opens=java.base/java.util=ALL-UNNAMED 
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED 
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED 
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED 
--add-opens=java.base/sun.nio.cs=ALL-UNNAMED 
--add-opens=java.base/sun.security.action=ALL-UNNAMED 
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED 
--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED 
-Djdk.reflect.useDirectMethodHandle=false org.apache.spark.deploy.SparkSubmit 
--class org.apache.spark.sql.connect.service.SparkConnectServer --name Spark 
Connect server --packages org.apache.spark:spark-connect_2.12:3.5.0 
spark-internal
========================================
:: loading settings :: url = 
jar:file:/opt/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/spark/.ivy2/cache
The jars for the packages stored in: /home/spark/.ivy2/jars
org.apache.spark#spark-connect_2.12 added as a dependency
:: resolving dependencies :: 
org.apache.spark#spark-submit-parent-f8a04936-e8af-4f37-bdb0-e4026a8a3be5;1.0
        confs: [default]
Exception in thread "main" java.io.FileNotFoundException: 
/home/spark/.ivy2/cache/resolved-org.apache.spark-spark-submit-parent-f8a04936-e8af-4f37-bdb0-e4026a8a3be5-1.0.xml
 (No such file or directory)
        at java.base/java.io.FileOutputStream.open0(Native Method)
        at java.base/java.io.FileOutputStream.open(Unknown Source)
        at java.base/java.io.FileOutputStream.<init>(Unknown Source)
        at java.base/java.io.FileOutputStream.<init>(Unknown Source)
        at 
org.apache.ivy.plugins.parser.xml.XmlModuleDescriptorWriter.write(XmlModuleDescriptorWriter.java:71)
        at 
org.apache.ivy.plugins.parser.xml.XmlModuleDescriptorWriter.write(XmlModuleDescriptorWriter.java:63)
        at 
org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.toIvyFile(DefaultModuleDescriptor.java:553)
        at 
org.apache.ivy.core.cache.DefaultResolutionCacheManager.saveResolvedModuleDescriptor(DefaultResolutionCacheManager.java:184)
        at 
org.apache.ivy.core.resolve.ResolveEngine.resolve(ResolveEngine.java:259)
        at org.apache.ivy.Ivy.resolve(Ivy.java:522)
        at 
org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1535)
        at 
org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:185)
        at 
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:964)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code}
 

The root cause is that the user home directory /home/spark does not exist:
{code:java}
$ ls -l /home
total 0
{code}
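Until the image is fixed, a possible workaround is to point Ivy at a writable 
directory via spark.jars.ivy, the documented setting for the Ivy user 
directory; /tmp/.ivy2 below is just an arbitrary writable path chosen for 
illustration:
{code:java}
# Workaround sketch: redirect the Ivy cache away from the missing
# /home/spark by choosing a writable Ivy user directory.
/opt/spark/sbin/start-connect-server.sh \
  --conf spark.jars.ivy=/tmp/.ivy2 \
  --packages org.apache.spark:spark-connect_2.12:3.5.0 {code}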
It seems there is an easy fix: switching from useradd to adduser (or passing 
useradd the -m/--create-home flag) in the Dockerfile should create the user 
home directory. A sketch of the change follows.
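For illustration, a minimal sketch of that change, assuming the user is 
currently created with a plain useradd call (the exact invocation and UID in 
the real Dockerfile may differ):
{code:java}
# Hypothetical Dockerfile excerpt -- the real lines may differ.
# Before: useradd without -m never creates /home/spark.
RUN groupadd --gid 185 spark && \
    useradd --uid 185 --gid spark spark

# After: --create-home (-m) creates /home/spark with the right
# ownership; adduser would create it by default on Debian/Ubuntu.
RUN groupadd --gid 185 spark && \
    useradd --create-home --uid 185 --gid spark spark {code}
Either variant would also let Ivy default to /home/spark/.ivy2 without any 
extra configuration.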

 


