Re: How can I get the same spark context in two different python processes

Maciej Tue, 13 Dec 2022 04:04:41 -0800

Hi,

Unfortunately, I don't have a working example at hand that I could share, but the flow is roughly as follows:


- Retrieve the existing Python ClientServer (gateway) from the SparkContext.
- Get its gateway_parameters (some are constant for PySpark, but you'll need at least port and auth_token).
- Pass these to a new process and use them to initialize a new ClientServer.
- From the new ClientServer's jvm, retrieve bindings for the JVM SparkContext.
- Use the JVM binding and the gateway to initialize a Python SparkContext in your process.

Just to reiterate ‒ it is not something that we support (or Py4j for that matter) so don't do it unless you fully understand the implications (including, but not limited to, risk of leaking the token). Use this approach at your own risk.
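
The flow described above could be sketched roughly as follows. This is an unverified sketch, not a supported recipe: it relies on private PySpark attributes (`_gateway`) and on internal constructor arguments (`gateway`, `jsc`, `_jvm`, `_jconf`) that may change between versions. The helper names and the `SHARED_PY4J_GATEWAY` environment variable are hypothetical, and the `java_import` list is an assumption meant to mirror what PySpark's own gateway launcher sets up.

```python
import json
from typing import Dict, Mapping, Tuple

# Hypothetical variable name used to hand the parameters to the child process.
# Anything that can read the child's environment can read the token.
GATEWAY_ENV_VAR = "SHARED_PY4J_GATEWAY"


def export_gateway_params(sc) -> Dict[str, str]:
    """Process A: read port and auth token from the existing gateway."""
    params = sc._gateway.gateway_parameters  # private attribute, unsupported
    payload = {"port": params.port, "auth_token": params.auth_token}
    return {GATEWAY_ENV_VAR: json.dumps(payload)}


def parse_gateway_params(env: Mapping[str, str]) -> Tuple[int, str]:
    """Process B: recover port and auth token from the environment."""
    payload = json.loads(env[GATEWAY_ENV_VAR])
    return int(payload["port"]), payload["auth_token"]


def attach_to_existing_context(env: Mapping[str, str]):
    """Process B: build a Python SparkContext on top of the existing JVM one."""
    # Imported here so the two helpers above work without Spark installed.
    from py4j.clientserver import ClientServer, JavaParameters, PythonParameters
    from py4j.java_gateway import java_import
    from pyspark import SparkConf, SparkContext

    port, auth_token = parse_gateway_params(env)
    gateway = ClientServer(
        java_parameters=JavaParameters(
            port=port, auth_token=auth_token, auto_convert=True
        ),
        python_parameters=PythonParameters(port=0, eager_load=False),
    )

    # Assumption: the same imports PySpark performs when it launches a gateway.
    for name in (
        "org.apache.spark.SparkConf",
        "org.apache.spark.api.java.*",
        "org.apache.spark.api.python.*",
        "org.apache.spark.ml.python.*",
        "org.apache.spark.mllib.api.python.*",
        "org.apache.spark.resource.*",
        "org.apache.spark.sql.*",
        "org.apache.spark.sql.api.python.*",
        "org.apache.spark.sql.hive.*",
        "scala.Tuple2",
    ):
        java_import(gateway.jvm, name)

    # Retrieve the JVM SparkContext that process A already created.
    jvm_sc = gateway.jvm.org.apache.spark.SparkContext.getOrCreate()
    jsc = gateway.jvm.org.apache.spark.api.java.JavaSparkContext(jvm_sc)
    conf = SparkConf(_jvm=gateway.jvm, _jconf=jvm_sc.getConf())
    return SparkContext(gateway=gateway, jsc=jsc, conf=conf)
```

In this sketch the parent would merge `export_gateway_params(sc)` into the environment it passes to the subprocess, and the child would call `attach_to_existing_context(os.environ)`. Note that calling `stop()` on the context returned in the child stops the shared JVM context for the parent as well.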



On 12/13/22 03:52, Kevin Su wrote:

Maciej, Thanks for the reply.
Could you share an example to achieve it?

Maciej <mszymkiew...@gmail.com> wrote on Mon, Dec 12, 2022 at 4:41 PM:


    Technically speaking, it is possible in stock distribution (can't speak
    for Databricks) and not super hard to do (just check out how we
    initialize sessions), but definitely not something that we test or
    support, especially in a scenario you described.

    If you want to achieve concurrent execution, multithreading is normally
    more than sufficient and avoids problems with the context.



    On 12/13/22 00:40, Kevin Su wrote:
     > I run my Spark job as a Databricks job with a single Python script.
     > IIUC, the Databricks platform creates a Spark context for this
     > Python script.
     > However, I create a new subprocess in this script and run some Spark
     > code in it, but the subprocess can't find the
     > context created by Databricks.
     > Not sure if there is any API I can use to get the default context.
     >
     > bo yang <bobyan...@gmail.com> wrote on Mon, Dec 12, 2022 at 3:27 PM:
     >
     >     In theory, maybe a Jupyter notebook or something similar could
     >     achieve this? E.g. running some Jupyter kernel inside the Spark driver,
     >     then another Python process could connect to that kernel.
     >
     >     But in the end, this is like Spark Connect :)
     >
     >
     >     On Mon, Dec 12, 2022 at 2:55 PM Kevin Su <pings...@gmail.com> wrote:
     >
     >         Also, is there any way to work around this issue without
     >         using Spark Connect?
     >
     >         Kevin Su <pings...@gmail.com> wrote on Mon, Dec 12, 2022 at 2:52 PM:
     >
     >             nvm, I found the ticket.
     >             Also, is there any way to work around this issue without
     >             using Spark Connect?
     >
     >             Kevin Su <pings...@gmail.com> wrote on Mon, Dec 12, 2022 at 2:42 PM:
     >
     >                 Thanks for the quick response! Do we have any PR or Jira
     >                 ticket for it?
     >
     >                 Reynold Xin <r...@databricks.com> wrote on Mon, Dec 12, 2022 at 2:39 PM:
     >
     >                     Spark Connect :)
     >
     >                     (It’s work in progress)
     >
     >
     >                     On Mon, Dec 12, 2022 at 2:29 PM, Kevin Su
     >                     <pings...@gmail.com> wrote:
     >
     >                         Hey there, how can I get the same Spark context
     >                         in two different Python processes?
     >                         Let’s say I create a context in process A, and
     >                         then I want to use Python subprocess B to get
     >                         the Spark context created by process A. How can
     >                         I achieve that?
     >
     >                         I've tried pyspark.sql.SparkSession.builder.appName("spark").getOrCreate(),
     >                         but it will create a new Spark context.

-- Best regards,

    Maciej Szymkiewicz

    Web: https://zero323.net <https://zero323.net>
    PGP: A30CEF0C31A501EC


