Hi Sid,

We are successfully running Spark on an HPC cluster, and it works great. Here's 
some info on our setup and approach.

We have a cluster with 256 nodes running Scientific Linux 6.3 and scheduled by 
Univa Grid Engine. The environment also has a DDN GRIDScaler running GPFS and 
several EMC Isilon clusters serving NFS to the compute cluster.

We wrote a custom qsub job to spin up Spark dynamically on a user-designated 
quantity of nodes. The UGE scheduler first designates a set of nodes that will 
be used to run Spark. Once the nodes are available, we use the start-master.sh 
script to launch a master, and send it the addresses of the other nodes. The 
master then starts the workers with start-all.sh. At that point, the Spark 
cluster is usable and remains active until the user issues a qdel, which 
triggers stop-all.sh on the master and takes down the cluster. 
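
As a rough illustration of the node-assignment step (not the actual launch 
scripts, which are not public), here is a minimal Python sketch. It assumes the 
scheduler exposes the allocated nodes through a Grid Engine PE hostfile (the 
PE_HOSTFILE environment variable, one "host slots queue processors" line per 
node), that the first host becomes the master, and that the workers are listed 
in Spark's conf/slaves file for start-all.sh to pick up. The function names 
(parse_pe_hostfile, plan_cluster, write_slaves_file) are hypothetical.

```python
# Hypothetical sketch: function names and the "first host is master" rule
# are illustrative assumptions, not taken from the scripts described above.
import os


def parse_pe_hostfile(text):
    """Return the unique hostnames listed in a Grid Engine PE hostfile.

    Each non-empty line is assumed to look like:
        <hostname> <slots> <queue> <processor-range>
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] not in hosts:
            hosts.append(fields[0])
    return hosts


def plan_cluster(hosts, port=7077):
    """Pick the first host as master and the rest as workers."""
    if not hosts:
        raise ValueError("scheduler assigned no hosts")
    master, workers = hosts[0], hosts[1:]
    return {
        "master": master,
        "workers": workers,
        # 7077 is the default port of a Spark standalone master.
        "master_url": "spark://%s:%d" % (master, port),
    }


def write_slaves_file(workers, spark_home):
    """Write the worker list to conf/slaves under the given Spark directory."""
    path = os.path.join(spark_home, "conf", "slaves")
    with open(path, "w") as f:
        f.write("\n".join(workers) + "\n")
    return path


if __name__ == "__main__":
    hostfile = os.environ.get("PE_HOSTFILE")
    if hostfile:
        with open(hostfile) as f:
            plan = plan_cluster(parse_pe_hostfile(f.read()))
        # Users would point their Spark driver at this URL.
        print(plan["master_url"])
```

Inside the job, something like this would run before the Spark start scripts are 
called, and the printed master URL is what users pass to their driver.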

This has worked well for us because users can pick the number of nodes to suit 
their job, and multiple users can run their own Spark clusters on the same 
system (alongside other non-Spark jobs).

We don't use HDFS for the filesystem, instead relying on NFS and GPFS, and the 
cluster is not running Hadoop. In tests, we've seen similar performance between 
our setup and Spark with HDFS on EC2 using higher-end instances (matched 
roughly for memory and number of cores).

Unfortunately, we can't open source the launch scripts because they contain 
proprietary UGE stuff, but we're happy to try to answer any follow-up questions.

-- Jeremy

---------------------
Jeremy Freeman, PhD
Neuroscientist
@thefreemanlab

On May 28, 2014, at 11:02 AM, Sidharth Kashyap <sidharth.n.kash...@outlook.com> 
wrote:

> Hi,
> 
> Has anyone tried to get Spark working on an HPC setup?
> If yes, can you please share your learnings and how you went about doing it?
> 
> An HPC setup typically comes bundled with a dynamically allocated cluster and a 
> very efficient scheduler.
> 
> Configuring Spark standalone in this mode of operation is challenging as the 
> Hadoop dependencies need to be eliminated and the cluster needs to be 
> configured on the fly.
> 
> Thanks,
> Sid
> 
> 
> 
