Thanks, Mich .. let me check this
On Wed, Feb 15, 2023 at 1:42 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

It may help to check this article of mine:

Spark on Kubernetes, A Practitioner’s Guide
<https://www.linkedin.com/pulse/spark-kubernetes-practitioners-guide-mich-talebzadeh-ph-d-/?trackingId=FDQORri0TBeJl02p3D%2B2JA%3D%3D>

HTH

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Wed, 15 Feb 2023 at 09:12, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Your submit command:

```
spark-submit --master k8s://https://34.74.22.140:7077 \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.kubernetes.container.image=pyspark-example:0.1 \
  --conf spark.kubernetes.file.upload.path=/myexample \
  src/StructuredStream-on-gke.py
```

Pay attention to what it says:

  --conf spark.kubernetes.file.upload.path

That refers to your Python package on GCS storage, not inside the docker image itself.

From https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management:

"... The app jar file will be uploaded to the S3 and then when the driver is launched it will be downloaded to the driver pod and will be added to its classpath. Spark will generate a subdir under the upload path with a random name to avoid conflicts with spark apps running in parallel. User could manage the subdirs created according to his needs..."

In your case it is gs, not s3.

There is no point putting your python file in the docker image itself!
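Mich's advice above, with the upload path pointed at a GCS bucket rather than a directory inside the image, might look roughly like this. This is a sketch only: the bucket name gs://my-spark-staging is purely illustrative, and the GCS connector (and credentials with write access to the bucket) must be available to spark-submit for the gs:// scheme to resolve:

```
# Sketch: gs://my-spark-staging is a placeholder bucket name.
# Spark stages src/StructuredStream-on-gke.py into a random subdir
# under the upload path, then the driver pod downloads it from there.
spark-submit --master k8s://https://34.74.22.140:7077 \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.kubernetes.container.image=pyspark-example:0.1 \
  --conf spark.kubernetes.file.upload.path=gs://my-spark-staging \
  src/StructuredStream-on-gke.py
```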
HTH

On Wed, 15 Feb 2023 at 07:46, karan alang <karan.al...@gmail.com> wrote:

Hi Ye,

This is the error I get when I don't set spark.kubernetes.file.upload.path.

Any ideas on how to fix this?

```
Exception in thread "main" org.apache.spark.SparkException: Please specify spark.kubernetes.file.upload.path property.
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:299)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:248)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.map(TraversableLike.scala:238)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:247)
	at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:173)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:164)
	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:89)
	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
	at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```

On Tue, Feb 14, 2023 at 1:33 AM Ye Xianjin <advance...@gmail.com> wrote:

The configuration of '…file.upload.path' is wrong. It means a distributed fs path where Spark temporarily stores your archives/resources/jars, which are then distributed by Spark to the drivers/executors. For your case, you don't need to set this configuration.

Sent from my iPhone

On Feb 14, 2023, at 5:43 AM, karan alang <karan.al...@gmail.com> wrote:

Hello All,

I'm trying to run a simple application on GKE (Kubernetes), and it is failing.
Note: I have Spark (bitnami spark chart) installed on GKE using helm install.

Here is what was done:

1. Created a docker image using a Dockerfile.

Dockerfile:
```
FROM python:3.7-slim

RUN apt-get update && \
    apt-get install -y default-jre && \
    apt-get install -y openjdk-11-jre-headless && \
    apt-get clean

ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64

RUN pip install pyspark
RUN mkdir -p /myexample && chmod 755 /myexample
WORKDIR /myexample

COPY src/StructuredStream-on-gke.py /myexample/StructuredStream-on-gke.py

CMD ["pyspark"]
```

Simple pyspark application:
```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("StructuredStreaming-on-gke").getOrCreate()

data = [('k1', 123000), ('k2', 234000), ('k3', 456000)]
df = spark.createDataFrame(data, ('id', 'salary'))

df.show(5, False)
```

Spark-submit command:
```
spark-submit --master k8s://https://34.74.22.140:7077 \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.kubernetes.container.image=pyspark-example:0.1 \
  --conf spark.kubernetes.file.upload.path=/myexample \
  src/StructuredStream-on-gke.py
```

Error I get:
```
23/02/13 13:18:27 INFO KubernetesUtils: Uploading file: /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/StructuredStream-on-gke.py to dest: /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a/StructuredStream-on-gke.py...

Exception in thread "main" org.apache.spark.SparkException: Uploading file /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/StructuredStream-on-gke.py failed...
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:296)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.renameMainAppResource(KubernetesUtils.scala:270)
	at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configureForPython(DriverCommandFeatureStep.scala:109)
	at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configurePod(DriverCommandFeatureStep.scala:44)
	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:89)
	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
	at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Error uploading file StructuredStream-on-gke.py
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:319)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:292)
	... 21 more
Caused by: java.io.IOException: Mkdirs failed to create /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a
	at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:317)
	at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:305)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:414)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387)
	at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2369)
	at org.apache.hadoop.fs.FilterFileSystem.copyFromLocalFile(FilterFileSystem.java:368)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:316)
	... 22 more
```

Any ideas on how to fix this & get it to work?
tia!

Pls see the stackoverflow link:

https://stackoverflow.com/questions/75441360/running-spark-application-on-gke-failing-on-spark-submit
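The "Mkdirs failed to create /myexample/spark-upload-..." trace above shows Hadoop's RawLocalFileSystem treating /myexample as a path on the submitting machine, because the upload path carries no filesystem scheme. Since the Dockerfile already copies the script into the image at /myexample, one alternative, per the Spark on Kubernetes documentation, is to reference it with the local:// scheme, which tells Spark the file pre-exists inside the container image, so no spark.kubernetes.file.upload.path is needed at all. A sketch, assuming the image path from the Dockerfile above:

```
# Sketch: local:// marks the app file as already present in the
# container image, so spark-submit skips the upload step entirely.
spark-submit --master k8s://https://34.74.22.140:7077 \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.kubernetes.container.image=pyspark-example:0.1 \
  local:///myexample/StructuredStream-on-gke.py
```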