subject:"Spark on an HPC setup"

RE: Spark on an HPC setup

2014-08-11 Thread Sidharth Kashyap

Hi Jeremy,
Thanks for the reply.
We got Spark running on our setup after adapting a similar script to work with 
LSF.
Really appreciate your help.
Will keep in touch on Twitter.
Thanks,
@sidkashyap :)


Re: Spark on an HPC setup

2014-05-28 Thread Jeremy Freeman

Hi Sid,

We are successfully running Spark on an HPC cluster, and it works great. Here's 
info on our setup and approach.

We have a cluster with 256 nodes running Scientific Linux 6.3 and scheduled by 
Univa Grid Engine (UGE). The environment also has a DDN GRIDScaler running GPFS 
and several EMC Isilon clusters serving NFS to the compute cluster.

We wrote a custom qsub job to spin up Spark dynamically on a user-designated 
number of nodes. The UGE scheduler first designates a set of nodes that will 
be used to run Spark. Once the nodes are available, we use the start-master.sh 
script to launch a master and send it the addresses of the other nodes. The 
master then starts the workers with start-all.sh. At that point, the Spark 
cluster is usable and remains active until the user issues a qdel, which 
triggers stop-all.sh on the master and takes down the cluster.
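
As a rough illustration of the node-assignment step described above (not the 
actual launch scripts, which are not public), here is a minimal Python sketch. 
It assumes the job runs in a Grid Engine parallel environment that exposes the 
granted hosts through a PE_HOSTFILE-style file, that the first host becomes the 
master, and that Spark's standalone scripts read worker hosts from conf/slaves. 
The function names and the default port 7077 are illustrative choices.

```python
from pathlib import Path


def parse_pe_hostfile(text):
    """Return the unique host names from PE_HOSTFILE-style contents.

    Each non-empty line is assumed to start with a host name; the remaining
    fields (slots, queue, processor range) are ignored in this sketch.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] not in hosts:
            hosts.append(fields[0])
    return hosts


def plan_cluster(hosts, master_port=7077):
    """Pick the first host as master and treat the others as workers."""
    if not hosts:
        raise ValueError("scheduler granted no hosts")
    master, workers = hosts[0], hosts[1:]
    return {
        "master": master,
        "workers": workers,
        "master_url": f"spark://{master}:{master_port}",
    }


def write_slaves_file(spark_home, workers):
    """Write one worker host per line to <spark_home>/conf/slaves."""
    path = Path(spark_home) / "conf" / "slaves"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(f"{host}\n" for host in workers))
    return path
```

A job script could call these three functions in order and then run the start 
script on the master host. Applications would connect using the returned 
master_url.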

This worked well for us because users can pick the number of nodes to suit 
their job, and multiple users can run their own Spark clusters on the same 
system (alongside other non-Spark jobs).

We don't use HDFS for the filesystem, relying instead on NFS and GPFS, and the 
cluster is not running Hadoop. In tests, we've seen similar performance between 
our setup and Spark with HDFS on EC2 with higher-end instances (matched 
roughly for memory and number of cores).

Unfortunately, we can't open source the launch scripts because they contain 
proprietary UGE stuff, but I'm happy to try to answer any follow-up questions.

-- Jeremy

-
Jeremy Freeman, PhD
Neuroscientist
@thefreemanlab

On May 28, 2014, at 11:02 AM, Sidharth Kashyap sidharth.n.kash...@outlook.com 
wrote:

 Hi,
 
 Has anyone tried to get Spark working on an HPC setup?
 If yes, can you please share your learnings and how you went about doing it?
 
 An HPC setup typically comes bundled with a dynamically allocated cluster and 
 a very efficient scheduler.
 
 Configuring Spark standalone in this mode of operation is challenging as the 
 Hadoop dependencies need to be eliminated and the cluster needs to be 
 configured on the fly.
 
 Thanks,
 Sid
