Thanks, Mich. Let me check this.


On Wed, Feb 15, 2023 at 1:42 AM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

>
> It may help to check this article of mine
>
>
> Spark on Kubernetes, A Practitioner’s Guide
> <https://www.linkedin.com/pulse/spark-kubernetes-practitioners-guide-mich-talebzadeh-ph-d-/?trackingId=FDQORri0TBeJl02p3D%2B2JA%3D%3D>
>
>
> HTH
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Wed, 15 Feb 2023 at 09:12, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Your submit command:
>>
>> spark-submit --master k8s://https://34.74.22.140:7077 --deploy-mode
>> cluster --name pyspark-example --conf 
>> spark.kubernetes.container.image=pyspark-example:0.1
>> --conf spark.kubernetes.file.upload.path=/myexample
>> src/StructuredStream-on-gke.py
>>
>>
>> Pay attention to this option:
>>
>>
>> --conf spark.kubernetes.file.upload.path
>>
>>
>> That refers to your Python package on GCS storage, not inside the Docker image itself.
>>
>>
>> From
>> https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management
>>
>>
>> "... The app jar file will be uploaded to the S3 and then when the
>> driver is launched it will be downloaded to the driver pod and will be
>> added to its classpath. Spark will generate a subdir under the upload path
>> with a random name to avoid conflicts with spark apps running in parallel.
>> User could manage the subdirs created according to his needs..."
>>
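>>
>> A hedged sketch of managing those generated subdirs with gsutil (the bucket name and upload path here are hypothetical, adjust to your setup):
>>
>> ```
>> # List the per-app upload subdirs Spark created under the upload path
>> gsutil ls gs://my-spark-bucket/uploads/
>>
>> # Remove old upload subdirs once the corresponding apps have finished
>> gsutil -m rm -r gs://my-spark-bucket/uploads/spark-upload-*
>> ```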
>>
>> In your case it is gs, not S3.
>>
>>
>> There is no point in putting your Python file in the Docker image itself!
>>
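>>
>> For example, pointing the upload path at a GCS bucket instead of a local directory (the bucket name is hypothetical here, adjust to your environment), something like:
>>
>> ```
>> spark-submit --master k8s://https://34.74.22.140:7077 \
>>   --deploy-mode cluster \
>>   --name pyspark-example \
>>   --conf spark.kubernetes.container.image=pyspark-example:0.1 \
>>   --conf spark.kubernetes.file.upload.path=gs://my-spark-bucket/uploads \
>>   src/StructuredStream-on-gke.py
>> ```
>>
>> This assumes the GCS connector and credentials are available on the submitting machine so spark-submit can write to gs://.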
>>
>> HTH
>>
>>
>>
>>
>>
>>
>> On Wed, 15 Feb 2023 at 07:46, karan alang <karan.al...@gmail.com> wrote:
>>
>>> Hi Ye,
>>>
>>> This is the error I get when I don't set
>>> spark.kubernetes.file.upload.path.
>>>
>>> Any ideas on how to fix this?
>>>
>>> ```
>>>
>>> Exception in thread "main" org.apache.spark.SparkException: Please specify spark.kubernetes.file.upload.path property.
>>>   at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:299)
>>>   at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:248)
>>>   at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>>>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>>>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>>>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>>>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>>>   at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>>>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>>>   at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:247)
>>>   at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:173)
>>>   at scala.collection.immutable.List.foreach(List.scala:392)
>>>   at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:164)
>>>   at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60)
>>>   at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>>>   at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>>>   at scala.collection.immutable.List.foldLeft(List.scala:89)
>>>   at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
>>>   at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
>>>   at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
>>>   at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
>>>   at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
>>>   at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
>>>   at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
>>>   at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
>>>   at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>>>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>>>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>>>   at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
>>>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
>>>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>> ```
>>>
>>> On Tue, Feb 14, 2023 at 1:33 AM Ye Xianjin <advance...@gmail.com> wrote:
>>>
>>>> The configuration '…file.upload.path' is wrong. It specifies a
>>>> distributed filesystem path where Spark temporarily stores your
>>>> archives/resources/jars before they are distributed to the
>>>> drivers/executors.
>>>> For your case, you don't need to set this configuration.
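>>>>
>>>> In other words, since the script already sits inside the image, it can be referenced with the local:// scheme so that no upload is attempted. A hedged sketch (the in-image path is taken from the Dockerfile below):
>>>>
>>>> ```
>>>> spark-submit --master k8s://https://34.74.22.140:7077 \
>>>>   --deploy-mode cluster \
>>>>   --name pyspark-example \
>>>>   --conf spark.kubernetes.container.image=pyspark-example:0.1 \
>>>>   local:///myexample/StructuredStream-on-gke.py
>>>> ```
>>>>
>>>> With local://, Spark skips the upload step entirely, so spark.kubernetes.file.upload.path is not needed.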
>>>>
>>>> On Feb 14, 2023, at 5:43 AM, karan alang <karan.al...@gmail.com> wrote:
>>>>
>>>>
>>>> Hello All,
>>>>
>>>> I'm trying to run a simple application on GKE (Kubernetes), and it is
>>>> failing.
>>>> Note: I have Spark (the Bitnami Spark chart) installed on GKE using helm
>>>> install.
>>>>
>>>> Here is what was done:
>>>> 1. Created a docker image using a Dockerfile
>>>>
>>>> Dockerfile :
>>>> ```
>>>>
>>>> FROM python:3.7-slim
>>>>
>>>> RUN apt-get update && \
>>>>     apt-get install -y default-jre && \
>>>>     apt-get install -y openjdk-11-jre-headless && \
>>>>     apt-get clean
>>>>
>>>> ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64
>>>>
>>>> RUN pip install pyspark
>>>> RUN mkdir -p /myexample && chmod 755 /myexample
>>>> WORKDIR /myexample
>>>>
>>>> COPY src/StructuredStream-on-gke.py /myexample/StructuredStream-on-gke.py
>>>>
>>>> CMD ["pyspark"]
>>>>
>>>> ```
>>>> Simple PySpark application:
>>>> ```
>>>>
>>>> from pyspark.sql import SparkSession
>>>> spark = 
>>>> SparkSession.builder.appName("StructuredStreaming-on-gke").getOrCreate()
>>>>
>>>> data = [('k1', 123000), ('k2', 234000), ('k3', 456000)]
>>>> df = spark.createDataFrame(data, ('id', 'salary'))
>>>>
>>>> df.show(5, False)
>>>>
>>>> ```
>>>>
>>>> Spark-submit command:
>>>> ```
>>>>
>>>> spark-submit --master k8s://https://34.74.22.140:7077 --deploy-mode
>>>> cluster --name pyspark-example --conf
>>>> spark.kubernetes.container.image=pyspark-example:0.1 --conf
>>>> spark.kubernetes.file.upload.path=/myexample src/StructuredStream-on-gke.py
>>>> ```
>>>>
>>>> The error I get:
>>>> ```
>>>>
>>>> 23/02/13 13:18:27 INFO KubernetesUtils: Uploading file: /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/StructuredStream-on-gke.py to dest: /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a/StructuredStream-on-gke.py...
>>>>
>>>> Exception in thread "main" org.apache.spark.SparkException: Uploading file /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/StructuredStream-on-gke.py failed...
>>>>   at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:296)
>>>>   at org.apache.spark.deploy.k8s.KubernetesUtils$.renameMainAppResource(KubernetesUtils.scala:270)
>>>>   at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configureForPython(DriverCommandFeatureStep.scala:109)
>>>>   at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configurePod(DriverCommandFeatureStep.scala:44)
>>>>   at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
>>>>   at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>>>>   at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>>>>   at scala.collection.immutable.List.foldLeft(List.scala:89)
>>>>   at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
>>>>   at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
>>>>   at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
>>>>   at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
>>>>   at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
>>>>   at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
>>>>   at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
>>>>   at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
>>>>   at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>>>>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>>>>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>>>>   at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
>>>>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
>>>>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>
>>>> Caused by: org.apache.spark.SparkException: Error uploading file StructuredStream-on-gke.py
>>>>   at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:319)
>>>>   at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:292)
>>>>   ... 21 more
>>>>
>>>> Caused by: java.io.IOException: Mkdirs failed to create /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a
>>>>   at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:317)
>>>>   at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:305)
>>>>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
>>>>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
>>>>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:414)
>>>>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387)
>>>>   at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2369)
>>>>   at org.apache.hadoop.fs.FilterFileSystem.copyFromLocalFile(FilterFileSystem.java:368)
>>>>   at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:316)
>>>>   ... 22 more
>>>> ```
>>>>
>>>> Any ideas on how to fix this and get it to work?
>>>> Thanks in advance!
>>>>
>>>> Please see the Stack Overflow link:
>>>>
>>>>
>>>> https://stackoverflow.com/questions/75441360/running-spark-application-on-gke-failing-on-spark-submit
>>>>
>>>>
