Building Spark + hadoop docker for openshift

Antoine DUBOIS Mon, 30 Mar 2020 05:03:58 -0700

Hello, 
I'm trying to build a Spark + Hadoop Docker image compatible with OpenShift.
I used the oshinko Spark build scripts from
https://github.com/radanalyticsio/openshift-spark
to build an image with the Hadoop jars on the classpath, so that S3 storage can be used.
However, I'm now stuck on the Spark entrypoint.sh script.
For reasons unknown, this script (kubernetes/dockerfiles/spark/entrypoint.sh)
contains a reference to SPARK_JAVA_OPS, which seems to have been deprecated since 2.2:
https://issues.apache.org/jira/browse/SPARK-24577
I'm using Spark 2.4.5 and trying to integrate Hadoop 2.9.2. So far the image builds,
but every submit fails with an error in the entrypoint script.


Has anyone managed to use spark-submit on K8s, and if so, how? Is this
entrypoint.sh file still relevant?
Here are my spark-submit options:
./spark-submit \
--master k8s://https://wok.in2p3.fr \
--deploy-mode cluster \
--conf "spark.kubernetes.driverEnv.SPARK_DRIVER_CLASS=SparkPi" \
--conf "spark.kubernetes.driverEnv.SPARK_DRIVER_MEMORY=1024m" \
--conf "spark.kubernetes.driverEnv.SPARK_EXECUTOR_CORES=2" \
--conf "spark.kubernetes.driverEnv.SPARK_EXECUTOR_MEMORY=2048g" \
--name "test_$(date +'%m-%d-%y_%H:%m')" \
--conf "spark.kubernetes.container.image=private.repo/spark-docker:latest" \
--conf "spark.kubernetes.container.image.pullPolicy=Always" \
--conf "spark.kubernetes.container.image.pullSecrets=mysecret" \
--conf "spark.kubernetes.namespace=spark2" \
--conf "spark.executor.instances=4" \
--class SparkPi "local:///opt/jar/sparkpi_2.10-1.0.jar" 1000000000

Of course, /opt/jar/sparkpi_2.10-1.0.jar is part of my Docker build.
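
For comparison, here is a minimal sketch of the same submission written with
the standard Spark properties (spark.driver.memory, spark.executor.cores,
spark.executor.memory) instead of the driverEnv variables. It only prints the
arguments and does not run spark-submit. Several things in it are assumptions,
not verified settings: the executor memory is written as 2048m on the guess
that 2048g was not intended, and the job name uses %M (minutes) where the
command above uses %m (month) in the time part. Dashes replace '_' and ':'
because Kubernetes object names are limited to lowercase alphanumerics, '-'
and '.', which may matter if the name is used to derive pod names.

```shell
#!/bin/sh
# Dry-run sketch: prints spark-submit arguments, one per line, without running anything.

# %M is minutes; %m is the month. Dashes avoid ':' and '_' in the name.
APP_NAME="test-$(date +'%m-%d-%y-%H-%M')"

build_submit_args() {
  # Standard Spark properties stand in for the driverEnv variables (assumption).
  # 2048m is a guess at the intended executor memory, not a confirmed value.
  printf '%s\n' \
    --master "k8s://https://wok.in2p3.fr" \
    --deploy-mode cluster \
    --name "$APP_NAME" \
    --conf "spark.driver.memory=1024m" \
    --conf "spark.executor.cores=2" \
    --conf "spark.executor.memory=2048m" \
    --conf "spark.executor.instances=4" \
    --conf "spark.kubernetes.container.image=private.repo/spark-docker:latest" \
    --conf "spark.kubernetes.container.image.pullPolicy=Always" \
    --conf "spark.kubernetes.container.image.pullSecrets=mysecret" \
    --conf "spark.kubernetes.namespace=spark2" \
    --class SparkPi \
    "local:///opt/jar/sparkpi_2.10-1.0.jar" \
    1000000000
}

build_submit_args
```

Printing the arguments first makes it easy to check the generated name and
memory values before handing them to ./spark-submit.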

Thank you in advance. 


Antoine DUBOIS 
CCIN2P3
