Re: Topics for Spark online classes & webinars

2023-03-15 Thread Denny Lee
What we can do is get into the habit of compiling the list on LinkedIn but
making sure this list is shared and broadcast here, eh?!

As well, when we broadcast the videos, we can do this using
zoom/jitsi/riverside.fm as well as simulcasting on LinkedIn. This way you
can view directly on the former without ever logging in with a user ID.

HTH!!

On Wed, Mar 15, 2023 at 4:30 PM Mich Talebzadeh 
wrote:

> Understood, Nitin. It would be wrong to act against one's convictions. I am
> sure we can find another way to provide the contents.
>
> Regards
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead
> Palantir Technologies Limited
>
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Wed, 15 Mar 2023 at 22:34, Nitin Bhansali 
> wrote:
>
>> Hi Mich,
>>
>> Thanks for your prompt response, much appreciated. I know how to create
>> login IDs on such sites, but some 20 years ago I made a conscious decision
>> not to be on them, and joining now would go against my principles. Hence I
>> asked whether there is any other way I can join the webinar or view a
>> recording of it.
>>
>> Anyways not to worry.
>>
>> Thanks & Regards
>>
>> Nitin.
>>
>>
>> On Wednesday, 15 March 2023 at 20:37:55 GMT, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>
>> Hi Nitin,
>>
>> LinkedIn is more of a professional medium. FYI, I am only a member of
>> LinkedIn, not Facebook etc. There is no reason for you NOT to create a
>> profile for yourself on LinkedIn :)
>>
>>
>> https://www.linkedin.com/help/linkedin/answer/a1338223/sign-up-to-join-linkedin?lang=en
>>
>> see you there as well.
>>
>> Best of luck.
>>
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead,
>> Palantir Technologies Limited
>>
>>
>>
>>
>>
>>
>> On Wed, 15 Mar 2023 at 18:31, Nitin Bhansali 
>> wrote:
>>
>> Hello Mich,
>>
>> My apologies, but I am not on any such social/professional sites.
>> Is there any other way to access such webinars/classes?
>>
>> Thanks & Regards
>> Nitin.
>>
>> On Wednesday, 15 March 2023 at 18:26:51 GMT, Denny Lee <
>> denny.g@gmail.com> wrote:
>>
>>
>> Thanks Mich for tackling this!  I encourage everyone to add to the list
>> so we can have a comprehensive list of topics, eh?!
>>
>> On Wed, Mar 15, 2023 at 10:27 Mich Talebzadeh 
>> wrote:
>>
>> Hi all,
>>
>> Thanks to @Denny Lee for giving access to
>>
>> https://www.linkedin.com/company/apachespark/
>>
>> and for the contribution from @asma zgolli.
>>
>> You will see my post at the bottom. Please add anything else on topics to
>> the list as a comment.
>>
>> We will then put them together in an article perhaps. Comments and
>> contributions are welcome.
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead,
>> Palantir Technologies Limited
>>
>>
>>
>>
>>
>>
>>
>> On Tue, 14 Mar 2023 at 15:09, Mich Talebzadeh 
>> wrote:
>>
>> Hi Denny,
>>
>> That Apache Spark Linkedin page
>> https://www.linkedin.com/company/apachespark/ looks fine. It also allows
>> a wider audience to benefit from it.
>>
>> +1 for me
>>
>>
>>

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Mich Talebzadeh
Understood, Nitin. It would be wrong to act against one's convictions. I am
sure we can find another way to provide the contents.

Regards

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited







On Wed, 15 Mar 2023 at 22:34, Nitin Bhansali 
wrote:

> Hi Mich,
>
> Thanks for your prompt response, much appreciated. I know how to create
> login IDs on such sites, but some 20 years ago I made a conscious decision
> not to be on them, and joining now would go against my principles. Hence I
> asked whether there is any other way I can join the webinar or view a
> recording of it.
>
> Anyways not to worry.
>
> Thanks & Regards
>
> Nitin.
>
>
> On Wednesday, 15 March 2023 at 20:37:55 GMT, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>
> Hi Nitin,
>
> LinkedIn is more of a professional medium. FYI, I am only a member of
> LinkedIn, not Facebook etc. There is no reason for you NOT to create a
> profile for yourself on LinkedIn :)
>
>
> https://www.linkedin.com/help/linkedin/answer/a1338223/sign-up-to-join-linkedin?lang=en
>
> see you there as well.
>
> Best of luck.
>
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead,
> Palantir Technologies Limited
>
>
>
>
>
>
> On Wed, 15 Mar 2023 at 18:31, Nitin Bhansali 
> wrote:
>
> Hello Mich,
>
> My apologies, but I am not on any such social/professional sites.
> Is there any other way to access such webinars/classes?
>
> Thanks & Regards
> Nitin.
>
> On Wednesday, 15 March 2023 at 18:26:51 GMT, Denny Lee <
> denny.g@gmail.com> wrote:
>
>
> Thanks Mich for tackling this!  I encourage everyone to add to the list so
> we can have a comprehensive list of topics, eh?!
>
> On Wed, Mar 15, 2023 at 10:27 Mich Talebzadeh 
> wrote:
>
> Hi all,
>
> Thanks to @Denny Lee for giving access to
>
> https://www.linkedin.com/company/apachespark/
>
> and for the contribution from @asma zgolli.
>
> You will see my post at the bottom. Please add anything else on topics to
> the list as a comment.
>
> We will then put them together in an article perhaps. Comments and
> contributions are welcome.
>
> HTH
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead,
> Palantir Technologies Limited
>
>
>
>
>
>
>
> On Tue, 14 Mar 2023 at 15:09, Mich Talebzadeh 
> wrote:
>
> Hi Denny,
>
> That Apache Spark Linkedin page
> https://www.linkedin.com/company/apachespark/ looks fine. It also allows
> a wider audience to benefit from it.
>
> +1 for me
>
>
>
>
>
>
>
> On Tue, 14 Mar 2023 at 14:23, Denny Lee  wrote:
>
> In the past, we've been using the Apache Spark LinkedIn page and group to
> broadcast these types of events, if you're cool with this? Or we could go
> through the process of submitting updates to the current
> https://spark.apache.org site or request to leverage the original Spark
> Confluence page. WDYT?
>
> On Mon, Mar 13, 2023 at 9:34 AM Mich Talebzadeh 
> wrote:
>
> Well that needs to be created first for this purpose. The 

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Mich Talebzadeh
Hi Nitin,

LinkedIn is more of a professional medium. FYI, I am only a member of
LinkedIn, not Facebook etc. There is no reason for you NOT to create a
profile for yourself on LinkedIn :)

https://www.linkedin.com/help/linkedin/answer/a1338223/sign-up-to-join-linkedin?lang=en

see you there as well.

Best of luck.


Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead,
Palantir Technologies Limited






On Wed, 15 Mar 2023 at 18:31, Nitin Bhansali 
wrote:

> Hello Mich,
>
> My apologies, but I am not on any such social/professional sites.
> Is there any other way to access such webinars/classes?
>
> Thanks & Regards
> Nitin.
>
> On Wednesday, 15 March 2023 at 18:26:51 GMT, Denny Lee <
> denny.g@gmail.com> wrote:
>
>
> Thanks Mich for tackling this!  I encourage everyone to add to the list so
> we can have a comprehensive list of topics, eh?!
>
> On Wed, Mar 15, 2023 at 10:27 Mich Talebzadeh 
> wrote:
>
> Hi all,
>
> Thanks to @Denny Lee for giving access to
>
> https://www.linkedin.com/company/apachespark/
>
> and for the contribution from @asma zgolli.
>
> You will see my post at the bottom. Please add anything else on topics to
> the list as a comment.
>
> We will then put them together in an article perhaps. Comments and
> contributions are welcome.
>
> HTH
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead,
> Palantir Technologies Limited
>
>
>
>
>
>
>
> On Tue, 14 Mar 2023 at 15:09, Mich Talebzadeh 
> wrote:
>
> Hi Denny,
>
> That Apache Spark Linkedin page
> https://www.linkedin.com/company/apachespark/ looks fine. It also allows
> a wider audience to benefit from it.
>
> +1 for me
>
>
>
>
>
>
>
> On Tue, 14 Mar 2023 at 14:23, Denny Lee  wrote:
>
> In the past, we've been using the Apache Spark LinkedIn page and group to
> broadcast these types of events, if you're cool with this? Or we could go
> through the process of submitting updates to the current
> https://spark.apache.org site or request to leverage the original Spark
> Confluence page. WDYT?
>
> On Mon, Mar 13, 2023 at 9:34 AM Mich Talebzadeh 
> wrote:
>
> Well, that needs to be created first for this purpose. The appropriate name
> etc. are still to be decided. Maybe @Denny Lee can facilitate this, as he
> offered his help.
>
>
> cheers
>
>
>
>
>
>
>
> On Mon, 13 Mar 2023 at 16:29, asma zgolli  wrote:
>
> Hello Mich,
>
> Can you please provide the link for the confluence page?
>
> Many thanks
> Asma
> Ph.D. in Big Data - Applied Machine Learning
>
> Le lun. 13 mars 2023 à 17:21, Mich Talebzadeh 
> a écrit :
>
> Apologies I missed the list.
>
> To move forward I selected these topics from the thread "Online classes
> for spark topics".
>
> To take this further I propose that a Confluence page be set up.
>
>
>1. Spark UI
>2. Dynamic allocation
>3. Tuning of jobs
>4. Collecting spark metrics for monitoring and alerting
>5.  For those who prefer to use Pandas API on Spark since the release
>of Spark 3.2, What are some 

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Bjørn Jørgensen
Great.
One case that I hope can be better documented, especially now that we have
the Pandas API on Spark and many potential new users coming from pandas, is
how to start Spark with all available memory and CPU cores.
I use this function to do this in a notebook.

import multiprocessing
import os
import sys
from pyspark import SparkConf, SparkContext
from pyspark import pandas as ps
from pyspark.sql import SparkSession

os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"

# Number of logical CPU cores on this machine
number_cores = int(multiprocessing.cpu_count())

# Total physical memory in bytes, e.g. 4015976448 (POSIX only)
mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
memory_gb = int(mem_bytes / (1024.0**3))  # e.g. 3 (3.74 truncated)


def get_spark_session(app_name: str, conf: SparkConf):
    # Run locally with one worker thread per core
    conf.setMaster("local[{}]".format(number_cores))
    conf.set("spark.driver.memory", "{}g".format(memory_gb)).set(
        "spark.sql.adaptive.enabled", "True"
    ).set(
        "spark.serializer", "org.apache.spark.serializer.KryoSerializer"
    ).set(
        "spark.sql.repl.eagerEval.maxNumRows", "1"
    )

    spark = SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
    # The log level is not a configuration key; set it on the SparkContext
    spark.sparkContext.setLogLevel("ERROR")
    return spark


spark = get_spark_session("My app", SparkConf())
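
As a possible variation on the function above (a sketch only, not something
from the original message): the helper below keeps the sizing logic separate
from Spark, so it can be checked without a running session, and it reserves
some memory for the operating system instead of handing all of it to the
driver. The names local_spark_settings and detect_machine and the reserve_gb
default are assumptions made for illustration; spark.master and
spark.driver.memory are standard Spark configuration keys.

```python
import multiprocessing
import os


def local_spark_settings(cores: int, total_bytes: int, reserve_gb: int = 1) -> dict:
    """Build local-mode Spark settings from a core count and a memory size.

    reserve_gb (an assumed default) is left free for the OS; the driver
    always gets at least 1 GB.
    """
    total_gb = total_bytes // (1024 ** 3)
    driver_gb = max(1, total_gb - reserve_gb)
    return {
        "spark.master": "local[{}]".format(cores),
        "spark.driver.memory": "{}g".format(driver_gb),
    }


def detect_machine():
    """Return (cores, total_bytes) for this machine; os.sysconf is POSIX only."""
    cores = multiprocessing.cpu_count()
    total_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return cores, total_bytes


# Example with fixed numbers: 8 cores and 16 GiB of RAM
print(local_spark_settings(cores=8, total_bytes=16 * 1024 ** 3))
```

The resulting dictionary could then be applied with
SparkConf().setAll(list(settings.items())) before the session is created.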

ons. 15. mar. 2023 kl. 19:27 skrev Denny Lee :

> Thanks Mich for tackling this!  I encourage everyone to add to the list so
> we can have a comprehensive list of topics, eh?!
>
> On Wed, Mar 15, 2023 at 10:27 Mich Talebzadeh 
> wrote:
>
>> Hi all,
>>
>> Thanks to @Denny Lee for giving access to
>>
>> https://www.linkedin.com/company/apachespark/
>>
>> and for the contribution from @asma zgolli.
>>
>> You will see my post at the bottom. Please add anything else on topics to
>> the list as a comment.
>>
>> We will then put them together in an article perhaps. Comments and
>> contributions are welcome.
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead,
>> Palantir Technologies Limited
>>
>>
>>
>>
>>
>>
>>
>> On Tue, 14 Mar 2023 at 15:09, Mich Talebzadeh 
>> wrote:
>>
>>> Hi Denny,
>>>
>>> That Apache Spark Linkedin page
>>> https://www.linkedin.com/company/apachespark/ looks fine. It also
>>> allows a wider audience to benefit from it.
>>>
>>> +1 for me
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, 14 Mar 2023 at 14:23, Denny Lee  wrote:
>>>
 In the past, we've been using the Apache Spark LinkedIn page and group to
 broadcast these types of events, if you're cool with this? Or we could go
 through the process of submitting updates to the current
 https://spark.apache.org site or request to leverage the original Spark
 Confluence page. WDYT?

 On Mon, Mar 13, 2023 at 9:34 AM Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

> Well that needs to be created first for this purpose. The appropriate
> name etc. to be decided. Maybe @Denny Lee 
> can facilitate this as he offered his help.
>
>
> cheers
>
>
>
>
>
>
>
> On Mon, 13 Mar 2023 at 16:29, asma zgolli 
> wrote:
>
>> Hello Mich,
>>
>> Can you please provide the link for the confluence page?
>>
>> Many thanks
>> Asma
>> Ph.D. in Big Data - Applied Machine Learning
>>
>> Le lun. 13 mars 2023 à 17:21, Mich Talebzadeh <
>> mich.talebza...@gmail.com> a écrit :
>>
>>> Apologies I missed the 

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Denny Lee
Thanks Mich for tackling this!  I encourage everyone to add to the list so
we can have a comprehensive list of topics, eh?!

On Wed, Mar 15, 2023 at 10:27 Mich Talebzadeh 
wrote:

> Hi all,
>
> Thanks to @Denny Lee for giving access to
>
> https://www.linkedin.com/company/apachespark/
>
> and for the contribution from @asma zgolli.
>
> You will see my post at the bottom. Please add anything else on topics to
> the list as a comment.
>
> We will then put them together in an article perhaps. Comments and
> contributions are welcome.
>
> HTH
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead,
> Palantir Technologies Limited
>
>
>
>
>
>
>
> On Tue, 14 Mar 2023 at 15:09, Mich Talebzadeh 
> wrote:
>
>> Hi Denny,
>>
>> That Apache Spark Linkedin page
>> https://www.linkedin.com/company/apachespark/ looks fine. It also allows
>> a wider audience to benefit from it.
>>
>> +1 for me
>>
>>
>>
>>
>>
>>
>>
>> On Tue, 14 Mar 2023 at 14:23, Denny Lee  wrote:
>>
>>> In the past, we've been using the Apache Spark LinkedIn page and group to
>>> broadcast these types of events, if you're cool with this? Or we could go
>>> through the process of submitting updates to the current
>>> https://spark.apache.org site or request to leverage the original Spark
>>> Confluence page. WDYT?
>>>
>>> On Mon, Mar 13, 2023 at 9:34 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 Well that needs to be created first for this purpose. The appropriate
 name etc. to be decided. Maybe @Denny Lee   can
 facilitate this as he offered his help.


 cheers







 On Mon, 13 Mar 2023 at 16:29, asma zgolli  wrote:

> Hello Mich,
>
> Can you please provide the link for the confluence page?
>
> Many thanks
> Asma
> Ph.D. in Big Data - Applied Machine Learning
>
> Le lun. 13 mars 2023 à 17:21, Mich Talebzadeh <
> mich.talebza...@gmail.com> a écrit :
>
>> Apologies I missed the list.
>>
>> To move forward I selected these topics from the thread "Online
>> classes for spark topics".
>>
>> To take this further I propose that a Confluence page be set up.
>>
>>
>>    1. Spark UI
>>    2. Dynamic allocation
>>    3. Tuning of jobs
>>    4. Collecting Spark metrics for monitoring and alerting
>>    5. For those who prefer to use the Pandas API on Spark (available
>>       since the release of Spark 3.2): what are some important notes for
>>       those users? For example, what additional factors affect Spark
>>       performance when using the Pandas API on Spark, and how can they be
>>       tuned in addition to the conventional Spark tuning methods applied
>>       to Spark SQL users?
>>    6. Spark internals and/or comparing Spark 3 and 2
>>    7. Spark Streaming & Spark Structured Streaming
>>    8. Spark on notebooks
>>    9. Spark on serverless (for example Spark on Google Cloud)
>>    10. Spark on k8s
>>
>> Opinions and how-tos are welcome.
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>> for any loss, damage or destruction of data or 

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Mich Talebzadeh
Hi all,

Thanks to @Denny Lee for giving access to

https://www.linkedin.com/company/apachespark/

and for the contribution from @asma zgolli.

You will see my post at the bottom. Please add anything else on topics to
the list as a comment.

We will then put them together in an article perhaps. Comments and
contributions are welcome.

HTH

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead,
Palantir Technologies Limited







On Tue, 14 Mar 2023 at 15:09, Mich Talebzadeh 
wrote:

> Hi Denny,
>
> That Apache Spark Linkedin page
> https://www.linkedin.com/company/apachespark/ looks fine. It also allows
> a wider audience to benefit from it.
>
> +1 for me
>
>
>
>
>
>
>
> On Tue, 14 Mar 2023 at 14:23, Denny Lee  wrote:
>
>> In the past, we've been using the Apache Spark LinkedIn page and group to
>> broadcast these types of events, if you're cool with this? Or we could go
>> through the process of submitting updates to the current
>> https://spark.apache.org site or request to leverage the original Spark
>> Confluence page. WDYT?
>>
>> On Mon, Mar 13, 2023 at 9:34 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Well that needs to be created first for this purpose. The appropriate
>>> name etc. to be decided. Maybe @Denny Lee   can
>>> facilitate this as he offered his help.
>>>
>>>
>>> cheers
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, 13 Mar 2023 at 16:29, asma zgolli  wrote:
>>>
 Hello Mich,

 Can you please provide the link for the confluence page?

 Many thanks
 Asma
 Ph.D. in Big Data - Applied Machine Learning

 Le lun. 13 mars 2023 à 17:21, Mich Talebzadeh <
 mich.talebza...@gmail.com> a écrit :

> Apologies I missed the list.
>
> To move forward I selected these topics from the thread "Online
> classes for spark topics".
>
> To take this further I propose that a Confluence page be set up.
>
>
>    1. Spark UI
>    2. Dynamic allocation
>    3. Tuning of jobs
>    4. Collecting Spark metrics for monitoring and alerting
>    5. For those who prefer to use the Pandas API on Spark (available
>       since the release of Spark 3.2): what are some important notes for
>       those users? For example, what additional factors affect Spark
>       performance when using the Pandas API on Spark, and how can they be
>       tuned in addition to the conventional Spark tuning methods applied
>       to Spark SQL users?
>    6. Spark internals and/or comparing Spark 3 and 2
>    7. Spark Streaming & Spark Structured Streaming
>    8. Spark on notebooks
>    9. Spark on serverless (for example Spark on Google Cloud)
>    10. Spark on k8s
>
> Opinions and how-tos are welcome.
>
>
>
>
>
>
> On Mon, 13 Mar 2023 at 16:16, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>

Re: Adding pause() method to pyspark.sql.streaming.StreamingQuery

2023-03-15 Thread Mich Talebzadeh
Hi Martin.

Yes, that is the intent. There may be other ways, but I cannot think of any.


HTH






On Wed, 15 Mar 2023 at 11:21, Martin Andersson 
wrote:

> Hi Mich.
>
> So it sounds like what you're really after is a way to apply new stream
> options at runtime without downtime?
>
> BR, Martin
> --
> *From:* Mich Talebzadeh 
> *Sent:* Tuesday, March 14, 2023 16:39
> *To:* Martin Andersson 
> *Cc:* Spark dev list 
> *Subject:* Re: Adding pause() method to
> pyspark.sql.streaming.StreamingQuery
>
>
>
> Hi Martin,
>
> I see the major benefit of the Spark stop() method as the ability to shut
> down the main topic gracefully. I have explained this in the SPIP
> "Shutting down spark structured streaming when the streaming process
> completed current process".
>
> With regard to pause(), I saw a request from a member:
>
> [Spark Structured Streaming] Could we apply new options of
> readStream/writeStream without stopping spark application (zero downtime)?
>
> I think it would be good to have this pause() added so we can adjust Spark
> streaming parameters without shutting down the streaming process,
> effectively with zero streaming downtime. This "change" is a challenge
> because, until now, the parameters could only be changed at start-up.
>
>
> HTH
>
>
>
>
>
>
> On Tue, 14 Mar 2023 at 12:33, Martin Andersson 
> wrote:
>
> Hi Mich.
>
> I'm trying to understand. Can you please provide some use cases where a
> pause would be beneficial, and explain how a pause would differ functionally
> from a stop?
>
> Best regards, Martin
> --
> *From:* Mich Talebzadeh 
> *Sent:* Thursday, March 9, 2023 17:12
> *To:* Spark dev list 
> *Subject:* Adding pause() method to pyspark.sql.streaming.StreamingQuery
>
>
>
>
> Hi,
>
>
> Currently for Spark Streaming we have the following class:
>
>
> pyspark.sql.streaming.StreamingQuery
> 

Re: Adding pause() method to pyspark.sql.streaming.StreamingQuery

2023-03-15 Thread Martin Andersson
Hi Mich.

So it sounds like what you're really after is a way to apply new stream options
at runtime without downtime?

BR, Martin

From: Mich Talebzadeh 
Sent: Tuesday, March 14, 2023 16:39
To: Martin Andersson 
Cc: Spark dev list 
Subject: Re: Adding pause() method to pyspark.sql.streaming.StreamingQuery




Hi Martin,

I see the major benefit of the Spark stop() method as the ability to shut
down the main topic gracefully. I have explained this in the SPIP
"Shutting down spark structured streaming when the streaming process
completed current process".

With regard to pause(), I saw a request from a member:

[Spark Structured Streaming] Could we apply new options of
readStream/writeStream without stopping spark application (zero downtime)?

I think it would be good to have this pause() added so we can adjust Spark
streaming parameters without shutting down the streaming process,
effectively with zero streaming downtime. This "change" is a challenge
because, until now, the parameters could only be changed at start-up.


HTH


 




On Tue, 14 Mar 2023 at 12:33, Martin Andersson <martin.anders...@kambi.com> wrote:
Hi Mich.

I'm trying to understand. Can you please provide some use cases where a pause
would be beneficial, and explain how a pause would differ functionally from a
stop?

Best regards, Martin

From: Mich Talebzadeh <mich.talebza...@gmail.com>
Sent: Thursday, March 9, 2023 17:12
To: Spark dev list <dev@spark.apache.org>
Subject: Adding pause() method to pyspark.sql.streaming.StreamingQuery





Hi,


Currently for Spark Streaming we have the following class:


pyspark.sql.streaming.StreamingQuery


There are a number of useful methods, for example stop() which stops the 
streaming process gracefully.


Can we add another method, pause(), so we can pause the processing? This would
come in handy on a number of occasions.



Thanks
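
For context, a rough sketch of the workaround available without pause():
stop the query and start a new one with the changed options, relying on the
same checkpoint location so that processing resumes where it left off. This
is not zero downtime, since there is a gap between stop() and the restart.
RestartableQuery and start_fn are hypothetical names, not part of PySpark;
start_fn is assumed to build and start a query from a dict of options (for
example via readStream ... writeStream ... start()) and to return an object
with a stop() method, as StreamingQuery has.

```python
from typing import Any, Callable, Dict


class RestartableQuery:
    """Stop-and-restart wrapper; a stand-in for the proposed pause()."""

    def __init__(self, start_fn: Callable[[Dict[str, str]], Any],
                 options: Dict[str, str]):
        # start_fn is a hypothetical factory that starts a streaming query
        self._start_fn = start_fn
        self.options = dict(options)
        self.query = start_fn(dict(self.options))

    def restart_with(self, **new_options: str) -> Any:
        """Stop the running query, merge the new options, and start again."""
        self.query.stop()
        self.options.update(new_options)
        self.query = self._start_fn(dict(self.options))
        return self.query
```

With a real pipeline, start_fn would need to reuse the same checkpointLocation
on every start for the restarted query to pick up from the last committed
offsets.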