Hi,

This is Chenyang. I am working on a project using PySpark and I am blocked 
because I need to share data between different Spark applications. The 
situation is that we have a running Java server that handles incoming 
requests with a thread pool, and each thread has a corresponding Python 
process. We want to use pandas on Spark, but in such a way that any of the 
Python processes can access the same data in Spark. For example, one Python 
process creates a SparkSession, reads some data, and modifies it using the 
pandas-on-Spark API; we then want to access that data from a different 
Python process. Someone from the Spark community pointed me to Apache 
Zeppelin because it implements logic to share a single SparkSession. How 
did you achieve that? Is there any documentation or other reference I can 
refer to? Thanks so much for your help.
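For concreteness, here is a minimal sketch of what we are trying to do. The file path, table name, and column are hypothetical placeholders, not our actual setup:

```python
import pyspark.pandas as ps
from pyspark.sql import SparkSession

# --- Python process A ---
spark_a = SparkSession.builder.appName("worker-a").getOrCreate()
psdf = ps.read_parquet("/data/events.parquet")  # hypothetical path
psdf = psdf[psdf["value"] > 0]                  # some pandas-on-Spark transformation
# Register the result so other code using the SAME Spark application
# could read it back via spark.table("global_temp.shared_events"):
psdf.to_spark().createOrReplaceGlobalTempView("shared_events")

# --- Python process B (separate process) ---
# getOrCreate() here starts a NEW SparkSession in a new Spark application,
# so the global temp view created by process A is not visible:
spark_b = SparkSession.builder.appName("worker-b").getOrCreate()
# spark_b.table("global_temp.shared_events")  # fails: view not found
```

What we are missing is how to make `spark_b` attach to the same running Spark application as `spark_a` instead of creating its own.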

Best regards,
Chenyang



Chenyang Zhang
Software Engineering Intern, Platform
Redwood City, California

(c) 2022 C3.ai. Confidential Information.
