Re: Optimal Server Design for Spark

2014-04-03 Thread Matei Zaharia
To run multiple workers with Spark’s standalone mode, set 
SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES in conf/spark-env.sh. For 
example, if you have 16 cores and want 2 workers, you could add

export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_CORES=8
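
You'd probably also want to set SPARK_WORKER_MEMORY, since each worker
instance otherwise defaults to advertising nearly all of the machine's memory.
The 56g below is only an example for a 128 GB node, leaving headroom for the
OS and any HDFS daemons:

export SPARK_WORKER_MEMORY=56g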

Matei

On Apr 3, 2014, at 12:38 PM, Mayur Rustagi mayur.rust...@gmail.com wrote:

 Are your workers not utilizing all the cores?
 One worker will utilize multiple cores depending on resource allocation.
 Regards
 Mayur
 
 Mayur Rustagi
 Ph: +1 (760) 203 3257
 http://www.sigmoidanalytics.com
 @mayur_rustagi
 
 
 
 On Wed, Apr 2, 2014 at 7:19 PM, Debasish Das debasish.da...@gmail.com wrote:
 Hi Matei,
 
 How can I run multiple Spark workers per node? I am running a 10-node cluster 
 with 8 cores per node, but I do have 8 more cores on each node, so having 2 
 workers per node will definitely help my use case.
 
 Thanks.
 Deb
 
 
 
 
 On Wed, Apr 2, 2014 at 3:58 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
 Hey Steve,
 
 This configuration sounds pretty good. The one thing I would consider is 
 having more disks, for two reasons — Spark uses the disks for large shuffles 
 and out-of-core operations, and often it’s better to run HDFS or your storage 
 system on the same nodes. But whether this is valuable will depend on whether 
 you plan to do that in your deployment. You should determine that and go from 
 there.
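
 For instance, with the four data drives in Steve's proposed layout, the
 workers' scratch space for shuffle files and spilled data could be spread
 across those disks in conf/spark-env.sh (the mount points below are just
 placeholders for however the drives get mounted):

 export SPARK_LOCAL_DIRS=/data1/spark,/data2/spark,/data3/spark,/data4/spark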
 
 The number of cores and the amount of RAM are both good. Actually, with a lot 
 more of these you would probably want to run multiple Spark workers per node, 
 which is more work to configure. Your numbers are in line with other deployments.
 
 There’s a provisioning overview with more details at 
 https://spark.apache.org/docs/latest/hardware-provisioning.html but what you 
 have sounds fine.
 
 Matei
 
 On Apr 2, 2014, at 2:58 PM, Stephen Watt sw...@redhat.com wrote:
 
  Hi Folks
 
  I'm looking to buy some gear to run Spark. I'm quite well versed in Hadoop 
  server design, but there does not seem to be much Spark-related collateral 
  around infrastructure guidelines (or at least I haven't been able to find 
  any). My current thinking for server design is something along these lines.
 
  - 2 x 10GbE NICs
  - 128 GB RAM
  - 6 x 1 TB Small Form Factor Disks (2 x RAID 1 Mirror for O/S and Runtimes, 
  4 x 1TB for Data Drives)
  - 1 Disk Controller
  - 2 x 2.6 GHz 6-core processors
 
  If I stick with 1U servers then I lose disk capacity per rack, but I get a 
  lot more memory and CPU capacity per rack. This increases my total cluster 
  memory footprint, and it doesn't seem to make sense to have super-dense 
  storage servers because I can't fit all that data on disk in memory 
  anyway. So at present, my thinking is to go with 1U servers instead of 2U 
  servers. Is 128 GB RAM per server normal? Do you guys use more or less than 
  that?
 
  Any feedback would be appreciated
 
  Regards
  Steve Watt
 
 
 



Re: Optimal Server Design for Spark

2014-04-03 Thread Debasish Das
@Mayur...I am hitting ulimits on the cluster if I go beyond 4 cores per
worker, and I don't think I can change the ulimit due to sudo issues etc...

If I have more workers, then in ALS I can go for 20 blocks (right now I am
running 10 blocks on 10 nodes with 4 cores each, and then I could go up to 20
blocks on 10 nodes with 4 cores each) and per process I can still stay within
the ulimit...
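
For reference, a minimal sketch of how the block count gets passed to MLlib's
ALS from the spark-shell (the input path, rank, iterations and lambda below
are placeholders; the last argument is the number of blocks):

import org.apache.spark.mllib.recommendation.{ALS, Rating}

// hypothetical ratings file with "user,product,rating" per line
val ratings = sc.textFile("hdfs:///path/to/ratings.csv").map { line =>
  val Array(u, p, r) = line.split(',')
  Rating(u.toInt, p.toInt, r.toDouble)
}

// rank = 20, iterations = 10, lambda = 0.01, blocks = 20 (2 per node on 10 nodes)
val model = ALS.train(ratings, 20, 10, 0.01, 20)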

For the ALS stress case, right now with 10 blocks it seems like I have to
persist RDDs to HDFS each iteration, which I want to avoid if possible...
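
Depending on what is being written out each iteration, one general alternative
(a sketch only, not specific to ALS's internals) is to persist the intermediate
RDD with a storage level that spills to the workers' local disks instead of
saving to HDFS:

import org.apache.spark.storage.StorageLevel

// reusing the hypothetical ratings RDD from the sketch above
ratings.persist(StorageLevel.MEMORY_AND_DISK)
ratings.count()   // materialize once; later iterations reuse the cached copy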

@Matei Thanks, Trying those configs out...




On Thu, Apr 3, 2014 at 2:47 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

 To run multiple workers with Spark's standalone mode, set
 SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES in conf/spark-env.sh. For
 example, if you have 16 cores and want 2 workers, you could add

 export SPARK_WORKER_INSTANCES=2
 export SPARK_WORKER_CORES=8

 Matei

 On Apr 3, 2014, at 12:38 PM, Mayur Rustagi mayur.rust...@gmail.com
 wrote:

  Are your workers not utilizing all the cores?
  One worker will utilize multiple cores depending on resource allocation.
  Regards
  Mayur
 
  Mayur Rustagi
  Ph: +1 (760) 203 3257
  http://www.sigmoidanalytics.com
  @mayur_rustagi
 
 
 
  On Wed, Apr 2, 2014 at 7:19 PM, Debasish Das debasish.da...@gmail.com
 wrote:
  Hi Matei,
 
  How can I run multiple Spark workers per node? I am running a 10-node
 cluster with 8 cores per node, but I do have 8 more cores on each node, so
 having 2 workers per node will definitely help my use case.
 
  Thanks.
  Deb
 
 
 
 
  On Wed, Apr 2, 2014 at 3:58 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
  Hey Steve,
 
  This configuration sounds pretty good. The one thing I would consider is
 having more disks, for two reasons -- Spark uses the disks for large
 shuffles and out-of-core operations, and often it's better to run HDFS or
 your storage system on the same nodes. But whether this is valuable will
 depend on whether you plan to do that in your deployment. You should
 determine that and go from there.
 
  The number of cores and the amount of RAM are both good. Actually, with a lot
 more of these you would probably want to run multiple Spark workers per node,
 which is more work to configure. Your numbers are in line with other deployments.
 
  There's a provisioning overview with more details at
 https://spark.apache.org/docs/latest/hardware-provisioning.html but what
 you have sounds fine.
 
  Matei
 
  On Apr 2, 2014, at 2:58 PM, Stephen Watt sw...@redhat.com wrote:
 
   Hi Folks
  
   I'm looking to buy some gear to run Spark. I'm quite well versed in
 Hadoop server design, but there does not seem to be much Spark-related
 collateral around infrastructure guidelines (or at least I haven't been
 able to find any). My current thinking for server design is something
 along these lines.
  
   - 2 x 10GbE NICs
   - 128 GB RAM
   - 6 x 1 TB Small Form Factor Disks (2 x RAID 1 Mirror for O/S and
 Runtimes, 4 x 1TB for Data Drives)
   - 1 Disk Controller
   - 2 x 2.6 GHz 6-core processors
  
   If I stick with 1U servers then I lose disk capacity per rack, but I
 get a lot more memory and CPU capacity per rack. This increases my total
 cluster memory footprint, and it doesn't seem to make sense to have
 super-dense storage servers because I can't fit all that data on disk in
 memory anyway. So at present, my thinking is to go with 1U servers instead
 of 2U servers. Is 128 GB RAM per server normal? Do you guys use more or
 less than that?
  
   Any feedback would be appreciated
  
   Regards
   Steve Watt
 
 
 




Re: Optimal Server Design for Spark

2014-04-02 Thread Mayur Rustagi
I would suggest starting with cloud hosting if you can; depending on your
use case, memory requirements may vary a lot.
Regards
Mayur
On Apr 2, 2014 3:59 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

 Hey Steve,

 This configuration sounds pretty good. The one thing I would consider is
 having more disks, for two reasons — Spark uses the disks for large
 shuffles and out-of-core operations, and often it’s better to run HDFS or
 your storage system on the same nodes. But whether this is valuable will
 depend on whether you plan to do that in your deployment. You should
 determine that and go from there.

 The number of cores and the amount of RAM are both good. Actually, with a lot
 more of these you would probably want to run multiple Spark workers per node,
 which is more work to configure. Your numbers are in line with other deployments.

 There’s a provisioning overview with more details at
 https://spark.apache.org/docs/latest/hardware-provisioning.html but what
 you have sounds fine.

 Matei

 On Apr 2, 2014, at 2:58 PM, Stephen Watt sw...@redhat.com wrote:

  Hi Folks
 
  I'm looking to buy some gear to run Spark. I'm quite well versed in
 Hadoop server design, but there does not seem to be much Spark-related
 collateral around infrastructure guidelines (or at least I haven't been
 able to find any). My current thinking for server design is something
 along these lines.
 
  - 2 x 10GbE NICs
  - 128 GB RAM
  - 6 x 1 TB Small Form Factor Disks (2 x RAID 1 Mirror for O/S and
 Runtimes, 4 x 1TB for Data Drives)
  - 1 Disk Controller
  - 2 x 2.6 GHz 6-core processors
 
  If I stick with 1U servers then I lose disk capacity per rack, but I get
 a lot more memory and CPU capacity per rack. This increases my total
 cluster memory footprint, and it doesn't seem to make sense to have
 super-dense storage servers because I can't fit all that data on disk in
 memory anyway. So at present, my thinking is to go with 1U servers instead
 of 2U servers. Is 128 GB RAM per server normal? Do you guys use more or
 less than that?
 
  Any feedback would be appreciated
 
  Regards
  Steve Watt




Re: Optimal Server Design for Spark

2014-04-02 Thread Debasish Das
Hi Matei,

How can I run multiple Spark workers per node? I am running a 10-node cluster
with 8 cores per node, but I do have 8 more cores on each node, so having 2
workers per node will definitely help my use case.

Thanks.
Deb




On Wed, Apr 2, 2014 at 3:58 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

 Hey Steve,

 This configuration sounds pretty good. The one thing I would consider is
 having more disks, for two reasons -- Spark uses the disks for large
 shuffles and out-of-core operations, and often it's better to run HDFS or
 your storage system on the same nodes. But whether this is valuable will
 depend on whether you plan to do that in your deployment. You should
 determine that and go from there.

  The number of cores and the amount of RAM are both good. Actually, with a lot
 more of these you would probably want to run multiple Spark workers per node,
 which is more work to configure. Your numbers are in line with other deployments.

 There's a provisioning overview with more details at
 https://spark.apache.org/docs/latest/hardware-provisioning.html but what
 you have sounds fine.

 Matei

 On Apr 2, 2014, at 2:58 PM, Stephen Watt sw...@redhat.com wrote:

  Hi Folks
 
  I'm looking to buy some gear to run Spark. I'm quite well versed in
 Hadoop server design, but there does not seem to be much Spark-related
 collateral around infrastructure guidelines (or at least I haven't been
 able to find any). My current thinking for server design is something
 along these lines.
 
  - 2 x 10GbE NICs
  - 128 GB RAM
  - 6 x 1 TB Small Form Factor Disks (2 x RAID 1 Mirror for O/S and
 Runtimes, 4 x 1TB for Data Drives)
  - 1 Disk Controller
  - 2 x 2.6 GHz 6-core processors
 
  If I stick with 1U servers then I lose disk capacity per rack, but I get
 a lot more memory and CPU capacity per rack. This increases my total
 cluster memory footprint, and it doesn't seem to make sense to have
 super-dense storage servers because I can't fit all that data on disk in
 memory anyway. So at present, my thinking is to go with 1U servers instead
 of 2U servers. Is 128 GB RAM per server normal? Do you guys use more or
 less than that?
 
  Any feedback would be appreciated
 
  Regards
  Steve Watt