Re: How do you debug a code-generated aggregate?

2024-02-13 Thread Jack Goodson

Re: How do you debug a code-generated aggregate?

2024-02-12 Thread Jack Goodson
I may be ignorant of other debugging methods in Spark, but the best success I've had is using smaller datasets (if runs take a long time) and adding intermediate output steps. This is quite different from application development in non-distributed systems, where a debugger is trivial to attach, but I
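A minimal sketch of that approach (a small input slice plus an output step after each transformation), offered as an illustration rather than the poster's actual code. The helper names `checkpoint` and `debug_pipeline` and the `steps` list are hypothetical; the only DataFrame methods assumed are `limit`, `count` and `show`.

```python
def checkpoint(df, label, max_rows=5):
    """Print the row count and a few rows of an intermediate result.

    Returns the DataFrame unchanged so the call can be dropped into a chain.
    Note that count() and show() each trigger a Spark job, so this is only
    sensible on a small slice of the data.
    """
    print(f"[{label}] rows={df.count()}")
    df.show(max_rows, truncate=False)
    return df


def debug_pipeline(df, steps, sample_rows=1000):
    """Run each (label, function) step on a small slice, inspecting as we go.

    `steps` is a hypothetical list of (label, callable) pairs, where each
    callable takes a DataFrame and returns a DataFrame.
    """
    current = checkpoint(df.limit(sample_rows), "input")
    for label, step in steps:
        current = checkpoint(step(current), label)
    return current


# Hypothetical usage with a real DataFrame `df` that has a column "key":
# debug_pipeline(df, [("aggregated", lambda d: d.groupBy("key").count())])
```

Keeping each step as a named function makes it easier to see which stage first produces an unexpected row count or value.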

Re: [DISCUSS] Show Python code examples first in Spark documentation

2023-02-22 Thread Jack Goodson
Good idea. At the company I work at, we discussed using Scala as our primary language because technically it is slightly stronger than Python, but we ultimately chose Python because it’s easier to onboard other devs to our platform, and future hiring for the team, etc., would be easier. On

Jira Account for Contributions

2023-02-09 Thread Jack Goodson
Hi, I'd like to start contributing to the Spark project. Do I need a Jira account at https://issues.apache.org/jira/projects/SPARK/summary before I can do this? If so, can one please be created with this email address? Thank you

Re: How can I get the same spark context in two different python processes

2022-12-12 Thread Jack Goodson
Apologies, the code should read as below:

import pyspark
from threading import Thread

context = pyspark.sql.SparkSession.builder.appName("spark").getOrCreate()
t1 = Thread(target=my_func, args=(context,))
t1.start()
t2 = Thread(target=my_func, args=(context,))
t2.start()

On Tue, Dec 13, 2022 at 4:

Re: How can I get the same spark context in two different python processes

2022-12-12 Thread Jack Goodson
Hi Kevin, I had a similar use case (see below code), but with something that wasn’t Spark-related. I think the below should work for you. You may need to edit the context variable to suit your needs, but hopefully it gives the general idea of sharing a single object between multiple threads.
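The general idea of sharing one object between several threads can be sketched without Spark as follows. This is an illustration only: `SharedCounter`, `run_shared` and the loop count are hypothetical stand-ins, and the lock is an assumption added because this particular shared object is mutated by every thread.

```python
from threading import Lock, Thread


class SharedCounter:
    """Hypothetical stand-in for any single object shared between threads."""

    def __init__(self):
        self.value = 0
        self._lock = Lock()

    def increment(self):
        # Guard the read-modify-write so concurrent updates are not lost.
        with self._lock:
            self.value += 1


def my_func(context, n):
    """Worker: every thread receives the same `context` object."""
    for _ in range(n):
        context.increment()


def run_shared(num_threads=2, n=1000):
    """Start the workers on one shared object and wait for them to finish."""
    context = SharedCounter()
    threads = [Thread(target=my_func, args=(context, n)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return context.value
```

Because threads live in one process, they all see the same object in memory. Separate Python processes do not share memory this way, so this pattern only applies when the work can run as threads.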