Hi Folks,

I have been trying to dig up information on what the options are when
deploying more than one client process that consumes Spark.

Let's say I have a Spark cluster of 10 servers, and would like to set up 2
additional servers that send requests to it through a SparkContext, each
referencing one specific file of 1TB of data.

Each client process has its own SparkContext instance.
Currently, the result is that the same file is loaded into memory twice,
because SparkContext resources are not shared between processes/JVMs.
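To make the setup concrete, here is a minimal sketch of what each client
process does today (the app name, master URL, and file path are just
placeholders, not my actual configuration):

    import org.apache.spark.{SparkConf, SparkContext}

    // Run independently by each client process / JVM.
    object ClientApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("client-app")          // each client appears as a separate application
          .setMaster("spark://master:7077")  // assumed standalone master URL
        val sc = new SparkContext(conf)

        // Each application gets its own executors, so this cache is not shared:
        // a second client running the same code caches the 1TB file again.
        val data = sc.textFile("hdfs:///data/big-file-1tb")  // placeholder path
        data.cache()

        println(data.count())  // example action that materializes the cached data

        sc.stop()
      }
    }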


I would rather not have the same file loaded over and over again with
every new client that is introduced.
What would be the best practice here? Am I missing something?

Thank you,
Asaf
