Re: Adding pause() method to pyspark.sql.streaming.StreamingQuery

2023-03-14 Thread Mich Talebzadeh
Hi Martin,

The major benefit I see in the Spark stop() method is that it gives us the
ability to shut down the main topic gracefully. I have explained this in the
SPIP titled "SPIP: Shutting down spark structured streaming when the
streaming process completed current process".


With regard to pause(), I saw a request from a member:


[Spark Structured Streaming] Could we apply new options of
readStream/writeStream without stopping spark application (zero downtime)?



I think it would be good to have this pause() added so we can adjust Spark
streaming parameters without shutting down the streaming process,
effectively with zero streaming downtime. This is a challenge because, until
now, the parameters can only be changed at start-up.
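
Since StreamingQuery has no pause() today, the closest approximation I can
think of is to stop the query and start a replacement with the changed
options against the same checkpoint location. Below is a minimal sketch of
that pattern, not a proposal for the actual API. The names
restart_with_options and start_query are hypothetical, and the sketch assumes
the changed options are ones that a restart from an existing checkpoint
tolerates.

```python
from typing import Any, Callable, Dict


def restart_with_options(
    query: Any,
    start_query: Callable[[Dict[str, str]], Any],
    new_options: Dict[str, str],
) -> Any:
    """Stop a running streaming query and start a replacement with new options.

    `query` is expected to behave like pyspark.sql.streaming.StreamingQuery;
    only stop() and awaitTermination() are used here. `start_query` is a
    hypothetical caller-supplied function that rebuilds the
    readStream/writeStream pipeline from the given options and returns the
    new query. It should reuse the same checkpointLocation so the new query
    resumes from the last committed offsets.
    """
    query.stop()              # stop the current execution
    query.awaitTermination()  # make sure the old query has terminated
    return start_query(new_options)
```

There is still a gap between the stop and the restart, which is the downtime
a real pause() would aim to remove.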


HTH


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 14 Mar 2023 at 12:33, Martin Andersson 
wrote:

> Hi Mich.
>
> I'm trying to understand: can you please provide some use cases where a
> pause would be beneficial, and explain how a pause would differ
> functionally from a stop?
>
> Best regards, Martin
> --
> *From:* Mich Talebzadeh 
> *Sent:* Thursday, March 9, 2023 17:12
> *To:* Spark dev list 
> *Subject:* Adding pause() method to pyspark.sql.streaming.StreamingQuery
>
>
>
> Hi,
>
>
> Currently for Spark Streaming we have the following class:
>
>
> pyspark.sql.streaming.StreamingQuery
> 
>
>
> There are a number of useful methods, for example stop() which stops the
> streaming process gracefully.
>
>
> Can we add another method, pause(), so we can pause the processing? This
> will come in handy on a number of occasions.
>
>
>
> Thanks
>
>
>


Re: Topics for Spark online classes & webinars

2023-03-14 Thread Mich Talebzadeh
Hi Denny,

That Apache Spark Linkedin page
https://www.linkedin.com/company/apachespark/ looks fine. It also allows a
wider audience to benefit from it.

+1 for me







On Tue, 14 Mar 2023 at 14:23, Denny Lee  wrote:

> In the past, we've been using the Apache Spark LinkedIn page and group to
> broadcast these types of events, if you're cool with this? Or we could go
> through the process of submitting and updating the current
> https://spark.apache.org or request to leverage the original Spark
> confluence page. WDYT?
>
> On Mon, Mar 13, 2023 at 9:34 AM Mich Talebzadeh 
> wrote:
>
>> Well, that needs to be created first for this purpose. The appropriate
>> name etc. is to be decided. Maybe @Denny Lee can
>> facilitate this as he offered his help.
>>
>>
>> cheers
>>
>>
>>
>>
>>
>>
>>
>> On Mon, 13 Mar 2023 at 16:29, asma zgolli  wrote:
>>
>>> Hello Mich,
>>>
>>> Can you please provide the link for the confluence page?
>>>
>>> Many thanks
>>> Asma
>>> Ph.D. in Big Data - Applied Machine Learning
>>>
>>> On Mon, 13 Mar 2023 at 17:21, Mich Talebzadeh
>>> wrote:
>>>
>>>> Apologies I missed the list.
>>>>
>>>> To move forward I selected these topics from the thread "Online classes
>>>> for spark topics".
>>>>
>>>> To take this further I propose that a confluence page be set up.
>>>>
>>>>
>>>>    1. Spark UI
>>>>    2. Dynamic allocation
>>>>    3. Tuning of jobs
>>>>    4. Collecting spark metrics for monitoring and alerting
>>>>    5. For those who prefer to use Pandas API on Spark since the
>>>>    release of Spark 3.2, what are some important notes for those users?
>>>>    For example, what are the additional factors affecting the Spark
>>>>    performance using Pandas API on Spark? How to tune them in addition
>>>>    to the conventional Spark tuning methods applied to Spark SQL users.
>>>>    6. Spark internals and/or comparing spark 3 and 2
>>>>    7. Spark Streaming & Spark Structured Streaming
>>>>    8. Spark on notebooks
>>>>    9. Spark on serverless (for example Spark on Google Cloud)
>>>>    10. Spark on k8s
>>>>
>>>> Opinions and how-tos are welcome






>>>> On Mon, 13 Mar 2023 at 16:16, Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi guys
>>>>>
>>>>> To move forward I selected these topics from the thread "Online
>>>>> classes for spark topics".
>>>>>
>>>>> To take this further I propose that a confluence page be set up.
>>>>>
>>>>> Opinions and how-tos are welcome
>>>>>
>>>>> Cheers
>
>
>
>
>
>

>>>
>>>
>>>


Re: Topics for Spark online classes & webinars

2023-03-14 Thread Denny Lee
In the past, we've been using the Apache Spark LinkedIn page and group to
broadcast these types of events, if you're cool with this? Or we could go
through the process of submitting and updating the current
https://spark.apache.org or request to leverage the original Spark
confluence page. WDYT?

On Mon, Mar 13, 2023 at 9:34 AM Mich Talebzadeh 
wrote:

> Well, that needs to be created first for this purpose. The appropriate name
> etc. is to be decided. Maybe @Denny Lee can
> facilitate this as he offered his help.
>
>
> cheers
>
>
>
>
>
>
>
> On Mon, 13 Mar 2023 at 16:29, asma zgolli  wrote:
>
>> Hello Mich,
>>
>> Can you please provide the link for the confluence page?
>>
>> Many thanks
>> Asma
>> Ph.D. in Big Data - Applied Machine Learning
>>
>> On Mon, 13 Mar 2023 at 17:21, Mich Talebzadeh
>> wrote:
>>
>>> Apologies I missed the list.
>>>
>>> To move forward I selected these topics from the thread "Online classes
>>> for spark topics".
>>>
>>> To take this further I propose that a confluence page be set up.
>>>
>>>
>>>    1. Spark UI
>>>    2. Dynamic allocation
>>>    3. Tuning of jobs
>>>    4. Collecting spark metrics for monitoring and alerting
>>>    5. For those who prefer to use Pandas API on Spark since the
>>>    release of Spark 3.2, what are some important notes for those users?
>>>    For example, what are the additional factors affecting the Spark
>>>    performance using Pandas API on Spark? How to tune them in addition
>>>    to the conventional Spark tuning methods applied to Spark SQL users.
>>>    6. Spark internals and/or comparing spark 3 and 2
>>>    7. Spark Streaming & Spark Structured Streaming
>>>    8. Spark on notebooks
>>>    9. Spark on serverless (for example Spark on Google Cloud)
>>>    10. Spark on k8s
>>>
>>> Opinions and how-tos are welcome
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, 13 Mar 2023 at 16:16, Mich Talebzadeh 
>>> wrote:
>>>
>>>> Hi guys
>>>>
>>>> To move forward I selected these topics from the thread "Online classes
>>>> for spark topics".
>>>>
>>>> To take this further I propose that a confluence page be set up.
>>>>
>>>> Opinions and how-tos are welcome
>>>>
>>>> Cheers






>>>
>>
>>
>>


unsubscribe

2023-03-14 Thread Atheer Abdullatif
unsubscribe


Re: Topics for Spark online classes & webinars

2023-03-14 Thread Suttraway, Manisha
+1 for these classes.
Please share the confluence page link.

-Manisha

From: asma zgolli 
Date: Monday 13 March 2023 at 16:30
To: Mich Talebzadeh 
Cc: "user @spark" , Spark dev list 
Subject: RE: [EXTERNAL]Topics for Spark online classes & webinars




Hello Mich,

Can you please provide the link for the confluence page?

Many thanks
Asma
Ph.D. in Big Data - Applied Machine Learning

On Mon, 13 Mar 2023 at 17:21, Mich Talebzadeh
<mich.talebza...@gmail.com> wrote:
Apologies I missed the list.

To move forward I selected these topics from the thread "Online classes for 
spark topics".

To take this further I propose that a confluence page be set up.


  1.  Spark UI
  2.  Dynamic allocation
  3.  Tuning of jobs
  4.  Collecting spark metrics for monitoring and alerting
  5.  For those who prefer to use Pandas API on Spark since the release of
Spark 3.2, what are some important notes for those users? For example, what are
the additional factors affecting the Spark performance using Pandas API on
Spark? How to tune them in addition to the conventional Spark tuning methods
applied to Spark SQL users.
  6.  Spark internals and/or comparing spark 3 and 2
  7.  Spark Streaming & Spark Structured Streaming
  8.  Spark on notebooks
  9.  Spark on serverless (for example Spark on Google Cloud)
  10. Spark on k8s

Opinions and how-tos are welcome






On Mon, 13 Mar 2023 at 16:16, Mich Talebzadeh
<mich.talebza...@gmail.com> wrote:
Hi guys

To move forward I selected these topics from the thread "Online classes for 
spark topics".

To take this further I propose that a confluence page be set up.

Opinions and how-tos are welcome

Cheers











Re: Adding pause() method to pyspark.sql.streaming.StreamingQuery

2023-03-14 Thread Martin Andersson
Hi Mich.

I'm trying to understand: can you please provide some use cases where a pause
would be beneficial, and explain how a pause would differ functionally from a
stop?

Best regards, Martin

From: Mich Talebzadeh 
Sent: Thursday, March 9, 2023 17:12
To: Spark dev list 
Subject: Adding pause() method to pyspark.sql.streaming.StreamingQuery





Hi,


Currently for Spark Streaming we have the following class:


pyspark.sql.streaming.StreamingQuery


There are a number of useful methods, for example stop() which stops the 
streaming process gracefully.


Can we add another method, pause(), so we can pause the processing? This will
come in handy on a number of occasions.



Thanks



 




Re: spark executor pod has same memory value for request and limit

2023-03-14 Thread Martin Andersson
There is a very good reason for this. When using k8s, it is recommended that
you set the memory request and limit to the same value, and that you set a CPU
request but not a CPU limit. More info here:
https://home.robusta.dev/blog/kubernetes-memory-limit
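
As a rough illustration of that recommendation on the Spark side, the sketch
below builds executor settings that include a CPU request and leave the CPU
limit unset. The helper names executor_resource_conf and to_submit_args are
hypothetical; the property names are the ones I believe the Spark on
Kubernetes documentation lists, so please check them against your Spark
version.

```python
from typing import Dict, List, Optional


def executor_resource_conf(
    memory: str,
    request_cores: str,
    limit_cores: Optional[str] = None,
) -> Dict[str, str]:
    """Build executor resource settings for Spark on k8s.

    The CPU limit is only added when explicitly given, so by default the
    executor pod gets a CPU request but no CPU limit.
    """
    conf = {
        "spark.executor.memory": memory,
        "spark.kubernetes.executor.request.cores": request_cores,
    }
    if limit_cores is not None:
        conf["spark.kubernetes.executor.limit.cores"] = limit_cores
    return conf


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Turn the settings into spark-submit style --conf arguments."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```

For example, to_submit_args(executor_resource_conf("4g", "1")) yields two
--conf pairs and no spark.kubernetes.executor.limit.cores entry.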

BR, Martin

From: Mich Talebzadeh 
Sent: Friday, March 10, 2023 20:25
To: Ismail Yenigul 
Cc: dev@spark.apache.org 
Subject: Re: spark executor pod has same memory value for request and limit




Agreed. This needs to be enhanced!


HTH


 




On Fri, 10 Mar 2023 at 19:15, Ismail Yenigul
<ismailyeni...@gmail.com> wrote:
Hi Mich,

The issue here is that there is no parameter to set the executor pod's memory
request value.
Currently we have only one parameter, spark.executor.memory, and it sets both
the pod resource limit and request.
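
To make the point concrete: as far as I understand, the executor pod's memory
figure is derived from spark.executor.memory plus an overhead, and that single
figure is used for both the request and the limit. The sketch below is only an
approximation of that arithmetic. It assumes the documented defaults for a JVM
job (overhead factor 0.1 with a 384 MiB minimum) and ignores PySpark and
off-heap memory; the function name is hypothetical.

```python
from typing import Dict


def executor_pod_memory_mib(
    executor_memory_mib: int,
    overhead_factor: float = 0.1,
    min_overhead_mib: int = 384,
) -> Dict[str, int]:
    """Approximate the memory request and limit of an executor pod.

    Assumption: overhead = max(factor * executor memory, minimum), and the
    same total is applied to both the request and the limit.
    """
    overhead = max(int(executor_memory_mib * overhead_factor), min_overhead_mib)
    total = executor_memory_mib + overhead
    return {"request_mib": total, "limit_mib": total}
```

With these assumptions a 4096 MiB executor ends up with the same value for
request and limit, and there is no separate knob to lower only the request.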

On Fri, 10 Mar 2023 at 22:04, Mich Talebzadeh
<mich.talebza...@gmail.com> wrote:
Yes, both EKS and GKE (Google) are on 3.1.2, so I am not sure those parameters
will work :(



 




On Fri, 10 Mar 2023 at 19:01, Ismail Yenigul
<ismailyeni...@gmail.com> wrote:
Hi Mich,

it is on AWS EKS


On Fri, 10 Mar 2023 at 21:11, Mich Talebzadeh
<mich.talebza...@gmail.com> wrote:
I forgot to ask which k8s cluster you are using; I assume it is from some cloud vendor.



 