Hi,

This is Chenyang. I am working on a project using PySpark, and I am blocked because I want to share data between different Spark applications. The situation is that we have a running Java server which handles incoming requests with a thread pool, and each thread has a corresponding Python process. We want to use pandas on Spark, but in such a way that any of the Python processes can access the same data in Spark. For example, in one Python process we create a SparkSession, read some data, and modify it using the pandas-on-Spark API; we then want to access that data from a different Python process. Someone from the Spark community pointed me to Apache Zeppelin because it implements logic to share one SparkSession. How did you achieve that? Is there any documentation or reference I can refer to? Thanks so much for your help.
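To make the constraint concrete, here is a minimal, Spark-agnostic sketch of the situation described above: two independent Python processes cannot see each other's in-memory objects (including DataFrames held by a SparkSession), so the usual workaround is for process A to persist its result to a shared location that process B then reads. This is only an illustration of the pattern, not Spark code: the file path, the JSON file, and the helper snippets are all hypothetical stand-ins for what would be a Parquet or metastore table in a real pipeline.

```python
import json
import os
import subprocess
import sys
import tempfile

# Shared location standing in for a Parquet/metastore table.
shared_path = os.path.join(tempfile.mkdtemp(), "shared_data.json")

# --- Process A: produce, "modify", and persist the data ---
writer = (
    "import json, sys\n"
    "data = [{'id': 1, 'value': 10}, {'id': 2, 'value': 20}]\n"
    "data = [{**row, 'value': row['value'] * 2} for row in data]  # modification step\n"
    "with open(sys.argv[1], 'w') as f:\n"
    "    json.dump(data, f)\n"
)
subprocess.run([sys.executable, "-c", writer, shared_path], check=True)

# --- Process B: a separate interpreter reads the same data back ---
reader = (
    "import json, sys\n"
    "with open(sys.argv[1]) as f:\n"
    "    data = json.load(f)\n"
    "print(sum(row['value'] for row in data))\n"
)
result = subprocess.run(
    [sys.executable, "-c", reader, shared_path],
    check=True, capture_output=True, text=True,
)
print(result.stdout.strip())  # sum of the modified values
```

In pandas-on-Spark terms, the analogous handoff would be `DataFrame.to_parquet(...)` in one process and `pyspark.pandas.read_parquet(...)` in another, at the cost of re-reading from storage rather than sharing one live SparkSession, which is the part Zeppelin's shared-interpreter logic addresses.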
Best regards,
Chenyang Zhang
Software Engineering Intern, Platform
Redwood City, California