Hi folks,

I have been trying to dig up information about the options for deploying more than one client process that uses Spark.
Let's say I have a Spark cluster of 10 servers, and I would like to set up 2 additional servers that send requests to it through a SparkContext, all referencing one specific 1 TB file. Each client process has its own SparkContext instance. Currently, the result is that the same file is loaded into memory twice, because SparkContext resources are not shared between processes/JVMs. I would rather not have that file loaded again every time a new client is introduced.

What would be the best practice here? Am I missing something?

Thank you,
Asaf
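A toy Python model may make the duplication described above more concrete. It is only an analogy, not Spark code. `ClientContext` is a hypothetical class standing in for each client's SparkContext, and the path is made up. In Spark's standard deployment modes, each application (each SparkContext) gets its own executors, and cached data lives in those executors, so two clients that each cache the same file end up holding two copies.

```python
class ClientContext:
    """Hypothetical stand-in for one client's SparkContext (not a Spark API).

    Each instance keeps a private cache, mirroring how cached data belongs to
    a single Spark application and is not visible to another client's context.
    """

    total_loads = 0  # how many times the "1 TB file" was actually read, across all clients

    def __init__(self, name):
        self.name = name
        self._cache = {}  # private to this context; never shared with other clients

    def read_cached(self, path):
        # Roughly analogous to sc.textFile(path).cache() followed by an action:
        # the first access in *this* context loads the data, later ones reuse it.
        if path not in self._cache:
            ClientContext.total_loads += 1
            self._cache[path] = f"<contents of {path}>"  # placeholder, not real data
        return self._cache[path]


if __name__ == "__main__":
    path = "hdfs:///data/big-file"  # hypothetical path
    clients = [ClientContext("client-1"), ClientContext("client-2")]
    for client in clients:
        client.read_cached(path)
        client.read_cached(path)  # second request hits this client's own cache
    print(ClientContext.total_loads)  # prints 2: one load per client, not one in total
```

Repeated requests within one client are served from its cache, but every additional client adds another full load, which is the growth the question is trying to avoid.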