Re: How can I get the same spark context in two different python processes
Hi,

Unfortunately, I don't have a working example at hand that I could share, but the flow will be roughly like this:

- Retrieve the existing Python ClientServer (gateway) from the SparkContext.
- Get its gateway_parameters (some are constant for PySpark, but you'll need at least port and auth_token).
- Pass these to a new process and use them to initialize a new ClientServer.
- From the ClientServer's jvm, retrieve bindings for the JVM SparkContext.
- Use the JVM binding and the gateway to initialize a Python SparkContext in your process.

Just to reiterate: this is not something that we support (nor does Py4J, for that matter), so don't do it unless you fully understand the implications (including, but not limited to, the risk of leaking the token). Use this approach at your own risk.

On 12/13/22 03:52, Kevin Su wrote:
> Maciej, thanks for the reply. Could you share an example to achieve it?

-- 
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC
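The flow described in this message can be sketched roughly as follows. This is an untested sketch under several assumptions, not a supported recipe. `sc._gateway` and the `_jconf` argument are private PySpark details. The environment variable names `PYSPARK_GATEWAY_PORT` and `PYSPARK_GATEWAY_SECRET` are what PySpark's own `launch_gateway()` appears to read when it attaches to an already running JVM, and that is an internal behavior that may change between versions. The helper names `read_gateway_params`, `gateway_env`, and `attach_to_parent_jvm` are made up for illustration. Instead of building the ClientServer by hand, the child lets `launch_gateway()` do it, which is a substitution for the third step rather than what the message literally says.

```python
import os


def read_gateway_params(sc):
    """Parent side: pull the port and token from the live gateway.

    Relies on the private attribute `sc._gateway` (assumption: a Py4J
    gateway object that exposes `gateway_parameters`).
    """
    params = sc._gateway.gateway_parameters
    return params.port, params.auth_token


def gateway_env(port, auth_token, base_env=None):
    """Build the child's environment with the gateway coordinates.

    Anything able to read the child's environment can read the token.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_GATEWAY_PORT"] = str(port)
    env["PYSPARK_GATEWAY_SECRET"] = auth_token
    return env


def attach_to_parent_jvm():
    """Child side: wrap the parent's JVM SparkContext in a Python one.

    Expects the two variables set by `gateway_env` to be present.
    """
    # Imported lazily so the helpers above work without PySpark installed.
    from pyspark import SparkConf, SparkContext
    from pyspark.java_gateway import launch_gateway

    # Assumption: with the variables set, this connects to the existing
    # JVM instead of starting a new one.
    gateway = launch_gateway()
    # JVM-side context that the parent process is already using.
    jvm_sc = gateway.jvm.org.apache.spark.SparkContext.getOrCreate()
    jsc = gateway.jvm.org.apache.spark.api.java.JavaSparkContext(jvm_sc)
    # `_jconf` is a private argument: reuse the JVM's configuration.
    conf = SparkConf(_jconf=jsc.getConf())
    return SparkContext(gateway=gateway, jsc=jsc, conf=conf)
```

In the parent, something like `subprocess.run([sys.executable, "child.py"], env=gateway_env(*read_gateway_params(sc)))` would hand the values over, where `child.py` is a hypothetical script that calls `attach_to_parent_jvm()`. The child should not call `stop()` on the returned context, since that would stop the JVM context the parent is still using.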
Re: How can I get the same spark context in two different python processes
Hi Jack,

My use case is a bit different: I created a subprocess instead of a thread, and I can't pass the args to the subprocess.

On Mon, Dec 12, 2022 at 8:03 PM, Jack Goodson wrote:
> Apologies, the code should read as below [...]
Re: How can I get the same spark context in two different python processes
Apologies, the code should read as below:

    from threading import Thread

    import pyspark.sql

    context = pyspark.sql.SparkSession.builder.appName("spark").getOrCreate()

    # my_func is your own function; both threads receive the same session.
    t1 = Thread(target=my_func, args=(context,))
    t1.start()

    t2 = Thread(target=my_func, args=(context,))
    t2.start()

On Tue, Dec 13, 2022 at 4:10 PM, Jack Goodson wrote:
> Hi Kevin,
>
> I had a similar use case (see below code) but with something that wasn't
> Spark related. [...]
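A minimal sketch of the same idea that runs without a cluster: one object is created once and handed to several threads. The helper name `run_with_shared` and the dictionary standing in for the session are assumptions for illustration; with Spark, the shared object would be the `SparkSession` returned by `getOrCreate()`.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_shared(context, funcs):
    """Call every function in `funcs` with the same `context` object,
    each in its own worker thread, and return the results in order."""
    with ThreadPoolExecutor(max_workers=max(1, len(funcs))) as pool:
        futures = [pool.submit(func, context) for func in funcs]
        return [future.result() for future in futures]


if __name__ == "__main__":
    shared = {"app": "spark"}  # stand-in for a SparkSession
    # Each worker reports the identity of the object it received.
    ids = run_with_shared(shared, [id, id])
    print(ids[0] == ids[1] == id(shared))
```

This only works because threads live in one Python process and can share the object directly; it does not carry over to a subprocess, which starts with its own interpreter state.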
Re: How can I get the same spark context in two different python processes
Hi Kevin,

I had a similar use case (see the code below), but with something that wasn't Spark related. I think the below should work for you. You may need to edit the context variable to suit your needs, but hopefully it gives the general idea of sharing a single object between multiple threads.

Thanks

    from threading import Thread

    context = pyspark.sql.SparkSession.builder.appName("spark").getOrCreate()

    # NOTE: Thread.start() accepts no arguments, so the start() calls below
    # raise TypeError as posted.
    t1 = Thread(target=order_creator, args=(app_id, sleep_time,))
    t1.start(target=my_func, args=(context,))

    t2 = Thread(target=order_creator, args=(app_id, sleep_time,))
    t2.start(target=my_func, args=(context,))
Re: How can I get the same spark context in two different python processes
Maciej,

Thanks for the reply. Could you share an example of how to achieve it?

On Mon, Dec 12, 2022 at 4:41 PM, Maciej wrote:
> Technically speaking, it is possible in stock distribution (can't speak
> for Databricks) and not super hard to do [...]
Re: How can I get the same spark context in two different python processes
Technically speaking, it is possible in the stock distribution (can't speak for Databricks) and not super hard to do (just check out how we initialize sessions), but it is definitely not something that we test or support, especially in the scenario you described.

If you want to achieve concurrent execution, multithreading is normally more than sufficient and avoids problems with the context.

On 12/13/22 00:40, Kevin Su wrote:
> I ran my spark job by using databricks job with a single python script.
> [...]

-- 
Best regards,
Maciej Szymkiewicz
Re: How can I get the same spark context in two different python processes
I ran my Spark job as a Databricks job with a single Python script. IIUC, the Databricks platform creates a Spark context for this Python script. However, I create a new subprocess in this script and run some Spark code in it, and the subprocess can't find the context created by Databricks. I'm not sure if there is any API I can use to get the default context.

On Mon, Dec 12, 2022 at 3:27 PM, bo yang wrote:
> In theory, maybe a Jupyter notebook or something similar could achieve
> this? [...]
Re: How can I get the same spark context in two different python processes
In theory, maybe a Jupyter notebook or something similar could achieve this? E.g., run a Jupyter kernel inside the Spark driver; then another Python process could connect to that kernel.

But in the end, this is like Spark Connect :)

On Mon, Dec 12, 2022 at 2:55 PM, Kevin Su wrote:
> Also, is there any way to work around this issue without using Spark
> Connect?
Re: How can I get the same spark context in two different python processes
Also, is there any way to work around this issue without using Spark Connect?

On Mon, Dec 12, 2022 at 2:52 PM, Kevin Su wrote:
> nvm, I found the ticket.
> Also, is there any way to work around this issue without using Spark
> Connect?
>
> On Mon, Dec 12, 2022 at 2:42 PM, Kevin Su wrote:
>> Thanks for the quick response! Do we have any PR or Jira ticket for it?
>>
>> On Mon, Dec 12, 2022 at 2:39 PM, Reynold Xin wrote:
>>> Spark Connect :)
>>>
>>> (It's work in progress)
Re: How can I get the same spark context in two different python processes
Spark Connect :)

(It's work in progress)

On Mon, Dec 12, 2022 at 2:29 PM, Kevin Su wrote:
> Hey there, How can I get the same spark context in two different python
> processes? [...]
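For readers arriving later, here is a rough sketch of what connecting through Spark Connect looks like from a second Python process. It assumes a PySpark release that ships the Spark Connect client (3.4 or later, as far as I know) and a Connect server that is already running. The host name is a placeholder, 15002 is assumed to be the server's port, and `connect_url` and `remote_session` are hypothetical helper names. Both processes would then talk to one server rather than share one in-process context.

```python
def connect_url(host, port=15002):
    """Build a Spark Connect connection string of the form sc://host:port."""
    return f"sc://{host}:{port}"


def remote_session(host, port=15002):
    """Open a session against an already running Spark Connect server."""
    # Imported lazily so connect_url() works without PySpark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(connect_url(host, port)).getOrCreate()
```

Each process would call `remote_session("localhost")` (or whatever host the server runs on) and get its own client-side session backed by the same server.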
How can I get the same spark context in two different python processes
Hey there,

How can I get the same Spark context in two different Python processes? Let's say I create a context in process A, and then I want to use Python subprocess B to get the Spark context created by process A. How can I achieve that?

I've tried pyspark.sql.SparkSession.builder.appName("spark").getOrCreate(), but it creates a new Spark context.