Re: Re: should one ever make a Spark streaming job in PySpark

2022-11-03 Thread Lingzhe Sun
In addition to that:

For now, some stateful operations in Structured Streaming have no equivalent 
Python API, e.g. flatMapGroupsWithState. However, the Spark engineers are adding 
Python support for arbitrary stateful processing in the upcoming release. See more: 
https://www.databricks.com/blog/2022/10/18/python-arbitrary-stateful-processing-structured-streaming.html
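
For context, the feature that post describes is applyInPandasWithState on a 
grouped streaming DataFrame. Below is a minimal sketch of the user-side function, 
assuming Spark 3.4+ and a stream with an "id" column; the column names and the 
wiring (left in comments, since it needs a running streaming query) are 
illustrative, not a definitive implementation:

```python
import pandas as pd

def count_updates(key, pdf_iter, state):
    """Per-key running count, kept in Spark's GroupState.

    key      -- tuple of grouping values, e.g. ("a",)
    pdf_iter -- iterator of pandas DataFrames for this key in this micro-batch
    state    -- GroupState; .exists / .get / .update() carry state across batches
    """
    running = state.get[0] if state.exists else 0
    for pdf in pdf_iter:
        running += len(pdf)          # count rows seen for this key
    state.update((running,))         # persist the new count for the next batch
    yield pd.DataFrame({"id": [key[0]], "count": [running]})

# Hedged wiring sketch (Spark 3.4+; stream_df is an assumed streaming DataFrame):
#
# from pyspark.sql.streaming.state import GroupStateTimeout
# out = (stream_df.groupBy("id")
#        .applyInPandasWithState(count_updates,
#                                outputStructType="id string, count long",
#                                stateStructType="count long",
#                                outputMode="Update",
#                                timeoutConf=GroupStateTimeout.NoTimeout))
```

The Scala equivalent today would use flatMapGroupsWithState on a typed Dataset.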



Best Regards!
...
Lingzhe Sun 
Hirain Technology / APIC
 
From: Mich Talebzadeh
Date: 2022-11-03 19:15
To: Joris Billen
CC: User
Subject: Re: should one ever make a Spark streaming job in PySpark
Well, your mileage varies, so to speak.

   - Spark itself is written in Scala. However, that does not imply you should 
   stick with Scala.
   - I have used both for Spark Streaming and Spark Structured Streaming; they 
   both work fine.
   - PySpark has become popular with the widespread use of Data Science projects.
   - What matters normally is the skill set you already have in-house. The 
   likelihood is that there are more Python developers than Scala developers, and 
   the learning curve for Scala has to be taken into account.
   - The idea of performance etc. is tangential.
   - With regard to the Spark code itself, there should be little effort in 
   converting from Scala to PySpark or vice versa.
HTH

   view my LinkedIn profile
Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 
 


On Wed, 2 Nov 2022 at 08:54, Joris Billen  wrote:
Dear community,
I had a general question about the use of Scala vs PySpark for Spark streaming.
I believe Spark streaming will work most efficiently when written in Scala. I 
believe, however, that things can be implemented in PySpark. My questions:
1) Is it completely dumb to make a streaming job in PySpark?
2) What are the technical reasons that it is done best in Scala (is it easy to 
understand why)?
3) Are there any good links with numbers on the difference in performance, under 
what circumstances, plus an explanation?
4) Are there certain scenarios where the use of PySpark can be motivated (maybe 
when someone doesn't feel comfortable writing a job in Scala and the number of 
messages per minute isn't gigantic, so performance isn't that crucial)?

Thanks for any input!
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

