Hi Craig,

Unfortunately there's nothing like SPARK_WORKER_INSTANCES for Mesos right now 
-- you'd probably have to run two mesos-slave processes per node if you wanted 
two instances of Spark, which would be harder to set up on the Mesos side. This 
would be a good thing to add in the future. As for the local disk settings, you 
can set the spark.local.dir system property in your driver application (the one 
that creates the SparkContext) and still use those disks.
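If it helps, here's a minimal sketch of that in Java -- the mount points and the Mesos master URL are hypothetical placeholders, so substitute your own. The key point is just to set the property before the SparkContext is constructed, since the context reads it at creation time:

```java
// Minimal sketch (hypothetical paths): set spark.local.dir before creating
// the SparkContext, as a comma-separated list with one directory per disk.
public class LocalDirConfig {
    public static void main(String[] args) {
        System.setProperty("spark.local.dir", "/mnt/disk1/spark,/mnt/disk2/spark");
        // ...then create the context as usual, e.g. (hypothetical master URL):
        // JavaSparkContext sc = new JavaSparkContext("mesos://host:5050", "MyApp");
        System.out.println(System.getProperty("spark.local.dir"));
    }
}
```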

The size of the local disks doesn't matter much, since they presumably won't 
fill up with intermediate data. Any modern disk should be fine. It's more the 
number of disks that helps, because more disks means more aggregate throughput.

Matei

On Oct 15, 2013, at 1:10 PM, Craig Vanderborgh <[email protected]> 
wrote:

> Finally:  how big do the "multiple disks configured as separate filesystems" 
> that are used for temporary Spark storage need to be?
> 
> Thanks,
> Craig
> 
> 
> On Tue, Oct 15, 2013 at 1:12 PM, Craig Vanderborgh 
> <[email protected]> wrote:
> In particular: If I make the "SPARK_WORKER_INSTANCES" env variable setting in 
> spark-env.sh, will this propagate through Mesos and result in (say) two 
> workers per cluster node?
> 
> Thanks,
> Craig
> 
> 
> On Tue, Oct 15, 2013 at 1:07 PM, Craig Vanderborgh 
> <[email protected]> wrote:
> Hi Matei,
> 
> This is helpful but it would be even more so if this documentation could 
> describe how to make these settings correctly in a Spark-on-Mesos 
> environment.  Can you describe the differences for Mesos?
> 
> Thanks again,
> Craig
> 
> 
> On Mon, Oct 14, 2013 at 6:15 PM, Matei Zaharia <[email protected]> 
> wrote:
> Hi Craig,
> 
> The best configuration is to have multiple disks configured as separate 
> filesystems (so no RAID), and set the spark.local.dir property, which 
> configures Spark's scratch space directories, to be a comma-separated list of 
> directories, one per disk. In 0.8 we've written a bit on how to configure 
> machines for Spark here: 
> http://spark.incubator.apache.org/docs/latest/hardware-provisioning.html. For 
> the filesystem I'd suggest ext3 with noatime set.
> 
> Matei
> 
> On Oct 14, 2013, at 11:28 AM, Craig Vanderborgh <[email protected]> 
> wrote:
> 
> > Hi All,
> >
> > We're setting up a new Spark-on-Mesos cluster.  I'd like anyone who has 
> > already done this to suggest a disk partitioning/filesystem layout that has 
> > worked well in their cluster deployment.
> >
> > We are running MapR M3 on the cluster, but only for maprfs.  Our jobs will 
> > be programmed for and run on Spark.
> >
> > Thanks in advance,
> > Craig Vanderborgh
