Spark clustered client

Asaf Lahav Tue, 22 Jul 2014 16:21:30 -0700

Hi Folks,

I have been trying to find information on the options for deploying more
than one client process that uses Spark.


Let's say I have a Spark cluster of 10 servers, and would like to set up 2
additional servers that send requests to it through a Spark
context, each referencing the same 1TB file.

Each client process has its own SparkContext instance.
Currently, the result is that the same file is loaded into memory twice,
because SparkContext resources are not shared between processes/JVMs.
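
To make the problem concrete, here is a toy model in plain Python. It is
not Spark code and does not use the Spark API; ToyContext, ToyStorage and
the two helper functions are hypothetical names made up for illustration.
It only assumes that cached data is private to each context, which is the
behaviour described above.

```python
# Toy model only: plain Python, not the Spark API. All names are hypothetical.

class ToyStorage:
    """Counts how many times each path is actually loaded."""

    def __init__(self):
        self.loads = {}

    def load(self, path):
        self.loads[path] = self.loads.get(path, 0) + 1
        return f"contents of {path}"


class ToyContext:
    """Stands in for one client's context, with a cache private to it."""

    def __init__(self, storage):
        self.storage = storage   # shared backing store (think: the 1TB file)
        self._cache = {}         # cached data, visible to this context only

    def read(self, path):
        # Load from storage only if this context has not cached the path yet.
        if path not in self._cache:
            self._cache[path] = self.storage.load(path)
        return self._cache[path]


def loads_with_separate_contexts(path, clients):
    """Each client creates its own context, as in my current setup."""
    storage = ToyStorage()
    for _ in range(clients):
        ToyContext(storage).read(path)
    return storage.loads[path]


def loads_with_shared_context(path, clients):
    """All clients go through a single context."""
    storage = ToyStorage()
    shared = ToyContext(storage)
    for _ in range(clients):
        shared.read(path)
    return storage.loads[path]
```

In this model, the load count grows with the number of clients when each
has its own context, and stays at one when they share a context. The
second behaviour is what I am after.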


I would rather not have the same file loaded again with
every new client that is introduced.
What would be the best practice here? Am I missing something?

Thank you,
Asaf
