Re: Upgrading from Spark SQL 3.2 to 3.3 failed

2023-02-15 Thread lk_spark
If I wrap the computed expression with a cast function, then EXPLAIN on the
SQL works, for example:

cast(a.Split_Amt * b.percent / 100 as decimal(20,8)) as split_amt



I don't know why. Is there a config property that restores compatibility with
Spark 3.2?
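
For reference, a fuller PySpark sketch of the workaround (the target table and
join condition are hypothetical; only the expression and alias come from my
actual SQL, which inserts into a decimal(20,8) column as the stack trace below
suggests):

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cast-workaround").getOrCreate()

# Without the explicit CAST, explain/execution fails on 3.3.1 with the
# Literal-cannot-be-cast-to-AnsiCast error shown below.
spark.sql("""
    INSERT INTO target_table  -- hypothetical table with a decimal(20,8) column
    SELECT CAST(a.Split_Amt * b.percent / 100 AS DECIMAL(20,8)) AS split_amt
    FROM a
    JOIN b ON a.id = b.id     -- hypothetical join condition
""")
```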

At 2023-02-16 13:47:25, "lk_spark"  wrote:

Hi all,
I have a SQL statement which runs on Spark 3.2.1 but not on Spark 3.3.1.
When I try to explain it, I get an error with the message:
org.apache.spark.sql.catalyst.expressions.Literal cannot be cast to
org.apache.spark.sql.catalyst.expressions.AnsiCast


Executing the SQL, the error stack is:


ERROR ExecuteStatement: Error operating ExecuteStatement: 
java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.Literal 
cannot be cast to org.apache.spark.sql.catalyst.expressions.AnsiCast
at 
org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2362)
at 
org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2360)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.$anonfun$mapChildren$5(TreeNode.scala:1233)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1232)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:498)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:635)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:635)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:498)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:635)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:188)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:200)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:200)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:211)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:216)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:216)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:221)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:427)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:221)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:188)
at 
org.apache.spark.sql.catalyst.optimizer.NullPropagation$$anonfun$apply$19.applyOrElse(expressions.scala:804)
at 
org.apache.spark.sql.catalyst.optimizer.NullPropagation$$anonfun$apply$19.applyOrElse(expressions.scala:801)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
at 

Upgrading from Spark SQL 3.2 to 3.3 failed

2023-02-15 Thread lk_spark
Hi all,
I have a SQL statement which runs on Spark 3.2.1 but not on Spark 3.3.1.
When I try to explain it, I get an error with the message:
org.apache.spark.sql.catalyst.expressions.Literal cannot be cast to
org.apache.spark.sql.catalyst.expressions.AnsiCast


Executing the SQL, the error stack is:


ERROR ExecuteStatement: Error operating ExecuteStatement: 
java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.Literal 
cannot be cast to org.apache.spark.sql.catalyst.expressions.AnsiCast
at 
org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2362)
at 
org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2360)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.$anonfun$mapChildren$5(TreeNode.scala:1233)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1232)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:498)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:635)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:635)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:498)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:635)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:188)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:200)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:200)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:211)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:216)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:216)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:221)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:427)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:221)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:188)
at 
org.apache.spark.sql.catalyst.optimizer.NullPropagation$$anonfun$apply$19.applyOrElse(expressions.scala:804)
at 
org.apache.spark.sql.catalyst.optimizer.NullPropagation$$anonfun$apply$19.applyOrElse(expressions.scala:801)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at 

Re: [Spark Structured Streaming] Could we apply new options of readStream/writeStream without stopping spark application (zero downtime)?

2023-02-15 Thread Jack Goodson
Hi,

There is some good documentation here:

https://docs.databricks.com/structured-streaming/query-recovery.html


Under the “recovery after change in structured streaming query” heading there
are good general guidelines on what can be changed during a “pause” of a
stream.
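
In practice that “pause” means stopping the query (not the whole Spark
application) and starting it again with the new options against the same
checkpoint. A rough sketch of the pattern, with placeholder brokers, topic and
checkpoint path, using maxOffsetsPerTrigger as the kind of throughput option
the original question mentions (per the doc above, only some options may be
changed between restarts):

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("restart-with-new-options").getOrCreate()

def start_query(max_offsets):
    # Build the stream with the throughput option we want to be able to change.
    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
              .option("subscribe", "events")                     # placeholder topic
              .option("maxOffsetsPerTrigger", max_offsets)
              .load())
    return (stream.writeStream
            .format("console")
            .option("checkpointLocation", "/tmp/checkpoints/events")  # same path on restart
            .start())

query = start_query(10000)
# ... later, to apply a new throughput setting without restarting the whole app:
query.stop()                 # stops only this query, not the application
query = start_query(50000)   # restarts from the same checkpoint with the new option
```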

On Thu, 16 Feb 2023 at 5:12 PM, hueiyuan su  wrote:

> *Component*: Spark Structured Streaming
> *Level*: Advanced
> *Scenario*: How-to
>
> -
> *Problem Description*
> I would like to confirm whether we can apply new readStream/writeStream
> options without stopping a currently running Spark Structured Streaming
> application, for example if we just want to adjust the throughput properties
> of a Kafka readStream. Is there a way to adjust them without stopping the
> application? If you have any ideas, please let me know; I would appreciate
> your answer.
>
>
> --
> Best Regards,
>
> Mars Su
> *Phone*: 0988-661-013
> *Email*: hueiyua...@gmail.com
>


[Spark Structured Streaming] Could we apply new options of readStream/writeStream without stopping spark application (zero downtime)?

2023-02-15 Thread hueiyuan su
*Component*: Spark Structured Streaming
*Level*: Advanced
*Scenario*: How-to

-
*Problem Description*
I would like to confirm whether we can apply new readStream/writeStream
options without stopping a currently running Spark Structured Streaming
application, for example if we just want to adjust the throughput properties
of a Kafka readStream. Is there a way to adjust them without stopping the
application? If you have any ideas, please let me know; I would appreciate
your answer.


-- 
Best Regards,

Mars Su
*Phone*: 0988-661-013
*Email*: hueiyua...@gmail.com


Re: ADLS Gen2 adfs sample yaml configuration

2023-02-15 Thread Jayabindu Singh
Here you go. Please update the values for your specific storage account/container.

spark-defaults.conf - to make sure event logs go to ADLS

spark.eventLog.enabled   true

spark.eventLog.dir abfss://containen...@storageaccount.dfs.core.windows.net/tenant/spark/eventlogs

spark.history.fs.logDirectory abfss://containen...@storageaccount.dfs.core.windows.net/tenant/spark/eventlogs



core-site.xml

<property>
  <name>fs.abfss.impl</name>
  <value>org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem</value>
</property>

<property>
  <name>fs.adl.oauth2.access.token.provider.type</name>
  <value>ClientCredential</value>
</property>

<property>
  <name>fs.adl.oauth2.refresh.url</name>
  <value>https://login.microsoftonline.com//oauth2/v2.0/authorize</value>
</property>

<property>
  <name>fs.adl.oauth2.client.id</name>
  <value>client_id</value>
</property>

<property>
  <name>fs.adl.oauth2.credential</name>
  <value>client secret</value>
</property>

<property>
  <name>fs.azure.account.key.incortapocstorage.dfs.core.windows.net</name>
  <value>access key</value>
</property>

You need to have the ADLS jars on the Spark classpath. I just copy them
from hadoop/share/tools/lib to spark/jars.
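
As a quick sanity check, here is a sketch of the same settings applied
programmatically from PySpark (the account, container, key and paths are
placeholders; it assumes the hadoop-azure jars are already on the classpath as
above, and spark.hadoop.* is just the prefix that forwards a setting into the
Hadoop configuration):

```
from pyspark.sql import SparkSession

# Placeholder account/container names -- substitute your own.
account = "storageaccount"
container = "container"

spark = (SparkSession.builder
         .appName("adls-gen2-example")
         # Event logs to ADLS, mirroring the spark-defaults.conf entries above.
         .config("spark.eventLog.enabled", "true")
         .config("spark.eventLog.dir",
                 f"abfss://{container}@{account}.dfs.core.windows.net/tenant/spark/eventlogs")
         # Account-key auth, mirroring the core-site.xml entry above.
         .config(f"spark.hadoop.fs.azure.account.key.{account}.dfs.core.windows.net",
                 "<access key>")
         .getOrCreate())

# Once configured, abfss:// paths can be used like any other filesystem.
df = spark.read.parquet(
    f"abfss://{container}@{account}.dfs.core.windows.net/tenant/data/")
df.show()
```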


Regards

Jay






On Tue, Feb 14, 2023 at 9:50 PM Kondala Ponnaboina (US)
 wrote:

> Hello,
>
> I need help with a sample ADFS (Active Directory Federation Services) /
> ADLS Gen2 setup: how do I configure ADLS Gen2 (ADFS) settings in a yaml
> file for the Spark history server?
>
> I would like to see running jobs from a JupyterLab notebook with the
> SparkOnK8sV3.0.2 kernel shell.
>
>
>
> Any help is much appreciated ..
>
>
>
> Thanks,
>
> Kondal
>


Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-15 Thread karan alang
Thanks, Mich, let me check this.



On Wed, Feb 15, 2023 at 1:42 AM Mich Talebzadeh 
wrote:

>
> It may help to check this article of mine
>
>
> Spark on Kubernetes, A Practitioner’s Guide
> 
>
>
> HTH
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Wed, 15 Feb 2023 at 09:12, Mich Talebzadeh 
> wrote:
>
>> Your submit command
>>
>> spark-submit --master k8s://https://34.74.22.140:7077 --deploy-mode
>> cluster --name pyspark-example --conf 
>> spark.kubernetes.container.image=pyspark-example:0.1
>> --conf spark.kubernetes.file.upload.path=/myexample
>> src/StructuredStream-on-gke.py
>>
>>
>> pay attention to what it says
>>
>>
>> --conf spark.kubernetes.file.upload.path
>>
>> That refers to your Python package on GCS storage not in the docker itself
>>
>>
>> From
>> https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management
>>
>>
>> "... The app jar file will be uploaded to the S3 and then when the
>> driver is launched it will be downloaded to the driver pod and will be
>> added to its classpath. Spark will generate a subdir under the upload path
>> with a random name to avoid conflicts with spark apps running in parallel.
>> User could manage the subdirs created according to his needs..."
>>
>>
>> In your case it is gs not s3
>>
>>
>> There is no point putting your python file in the docker image itself!
>>
>>
>> HTH
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Wed, 15 Feb 2023 at 07:46, karan alang  wrote:
>>
>>> Hi Ye,
>>>
>>> This is the error I get when I don't set the
>>> spark.kubernetes.file.upload.path property.
>>>
>>> Any ideas on how to fix this?
>>>
>>> ```
>>>
>>> Exception in thread "main" org.apache.spark.SparkException: Please
>>> specify spark.kubernetes.file.upload.path property.
>>>
>>> at
>>> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:299)
>>>
>>> at
>>> org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:248)
>>>
>>> at
>>> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>>>
>>> at
>>> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>>>
>>> at
>>> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>>>
>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>>>
>>> at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>>>
>>> at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>>>
>>> at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>>>
>>> at
>>> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:247)
>>>
>>> at
>>> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:173)
>>>
>>> at scala.collection.immutable.List.foreach(List.scala:392)
>>>
>>> at
>>> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:164)
>>>
>>> at
>>> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60)
>>>
>>> at
>>> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>>>
>>> at
>>> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>>>
>>> at scala.collection.immutable.List.foldLeft(List.scala:89)
>>>
>>> at
>>> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
>>>
>>> at
>>> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
>>>
>>> at
>>> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
>>>
>>> at
>>> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
>>>
>>> at 

Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-15 Thread Mich Talebzadeh
It may help to check this article of mine


Spark on Kubernetes, A Practitioner’s Guide



HTH


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 15 Feb 2023 at 09:12, Mich Talebzadeh 
wrote:

> Your submit command
>
> spark-submit --master k8s://https://34.74.22.140:7077 --deploy-mode
> cluster --name pyspark-example --conf 
> spark.kubernetes.container.image=pyspark-example:0.1
> --conf spark.kubernetes.file.upload.path=/myexample
> src/StructuredStream-on-gke.py
>
>
> pay attention to what it says
>
>
> --conf spark.kubernetes.file.upload.path
>
> That refers to your Python package on GCS storage not in the docker itself
>
>
> From
> https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management
>
>
> "... The app jar file will be uploaded to the S3 and then when the driver
> is launched it will be downloaded to the driver pod and will be added to
> its classpath. Spark will generate a subdir under the upload path with a
> random name to avoid conflicts with spark apps running in parallel. User
> could manage the subdirs created according to his needs..."
>
>
> In your case it is gs not s3
>
>
> There is no point putting your python file in the docker image itself!
>
>
> HTH
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Wed, 15 Feb 2023 at 07:46, karan alang  wrote:
>
>> Hi Ye,
>>
>> This is the error I get when I don't set the
>> spark.kubernetes.file.upload.path property.
>>
>> Any ideas on how to fix this?
>>
>> ```
>>
>> Exception in thread "main" org.apache.spark.SparkException: Please
>> specify spark.kubernetes.file.upload.path property.
>>
>> at
>> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:299)
>>
>> at
>> org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:248)
>>
>> at
>> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>>
>> at
>> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>>
>> at
>> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>>
>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>>
>> at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>>
>> at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>>
>> at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>>
>> at
>> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:247)
>>
>> at
>> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:173)
>>
>> at scala.collection.immutable.List.foreach(List.scala:392)
>>
>> at
>> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:164)
>>
>> at
>> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60)
>>
>> at
>> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>>
>> at
>> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>>
>> at scala.collection.immutable.List.foldLeft(List.scala:89)
>>
>> at
>> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
>>
>> at
>> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
>>
>> at
>> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
>>
>> at
>> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
>>
>> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
>>
>> at
>> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
>>
>> at
>> 

Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-15 Thread Mich Talebzadeh
Your submit command

spark-submit --master k8s://https://34.74.22.140:7077 --deploy-mode cluster
--name pyspark-example --conf
spark.kubernetes.container.image=pyspark-example:0.1
--conf spark.kubernetes.file.upload.path=/myexample
src/StructuredStream-on-gke.py


pay attention to what it says


--conf spark.kubernetes.file.upload.path

That refers to your Python package on GCS storage not in the docker itself


From
https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management


"... The app jar file will be uploaded to the S3 and then when the driver
is launched it will be downloaded to the driver pod and will be added to
its classpath. Spark will generate a subdir under the upload path with a
random name to avoid conflicts with spark apps running in parallel. User
could manage the subdirs created according to his needs..."


In your case it is gs not s3


There is no point putting your python file in the docker image itself!
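
For completeness, a sketch of what the corrected submit would look like (the
gs:// bucket name is a placeholder; everything else mirrors your original
command, and it assumes the GCS connector is available to spark-submit):

spark-submit --master k8s://https://34.74.22.140:7077 --deploy-mode cluster
--name pyspark-example --conf
spark.kubernetes.container.image=pyspark-example:0.1
--conf spark.kubernetes.file.upload.path=gs://<your-bucket>/spark-upload
src/StructuredStream-on-gke.py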


HTH


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 15 Feb 2023 at 07:46, karan alang  wrote:

> Hi Ye,
>
> This is the error I get when I don't set the
> spark.kubernetes.file.upload.path property.
>
> Any ideas on how to fix this?
>
> ```
>
> Exception in thread "main" org.apache.spark.SparkException: Please specify
> spark.kubernetes.file.upload.path property.
>
> at
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:299)
>
> at
> org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:248)
>
> at
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>
> at
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>
> at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>
> at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>
> at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>
> at
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:247)
>
> at
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:173)
>
> at scala.collection.immutable.List.foreach(List.scala:392)
>
> at
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:164)
>
> at
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60)
>
> at
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>
> at
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>
> at scala.collection.immutable.List.foldLeft(List.scala:89)
>
> at
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
>
> at
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
>
> at
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
>
> at
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
>
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
>
> at
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
>
> at
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
>
> at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
>
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>
> at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
>
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
>
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> ```
>
> On Tue, Feb 14, 2023 at 1:33 AM Ye Xianjin  wrote:
>
>> The configuration of ‘…file.upload.path’ is wrong. It means a distributed
>> FS path where Spark temporarily stores your archives/resources/jars before
>> distributing them to the drivers/executors.
>> For your case, you don’t need to set this configuration.
>> Sent from my