Re: Optimal Server Design for Spark

2014-04-03 Thread Debasish Das
@Mayur... I am hitting ulimits on the cluster if I go beyond 4 cores per
worker, and I don't think I can change the ulimit due to sudo issues, etc.

If I have more workers, in ALS I can go to 20 blocks (right now I am running
10 blocks on 10 nodes with 4 cores each, and with more workers I could go up to
20 blocks on the same 10 nodes with 4 cores each), and per process I can still
stay within the ulimit...
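
For concreteness, the "blocks" above is just the blocks argument to MLlib's
ALS.train; a minimal Scala sketch (the rank/iterations/lambda values below are
made up for illustration, not what I actually run):

import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.rdd.RDD

// Minimal sketch only -- hyperparameter values here are illustrative.
def trainWithBlocks(ratings: RDD[Rating]): Unit = {
  val rank = 20        // number of latent factors (made up)
  val iterations = 10  // ALS iterations (made up)
  val lambda = 0.01    // regularization (made up)
  val blocks = 20      // number of user/product blocks -- the knob discussed above
  val model = ALS.train(ratings, rank, iterations, lambda, blocks)
  println(s"Model has rank ${model.rank}, trained with $blocks blocks")
}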

For the ALS stress case, right now with 10 blocks it seems like I have to
persist RDDs to HDFS each iteration, which I want to avoid if possible...
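
(By "persist RDDs to HDFS each iteration" I mean roughly the sketch below --
the checkpoint directory is hypothetical, and `factors` stands in for an
intermediate ALS RDD:)

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Rough sketch only: per-iteration persistence to HDFS via checkpointing.
def persistEachIteration(sc: SparkContext, factors: RDD[(Int, Array[Double])]): Unit = {
  sc.setCheckpointDir("hdfs:///tmp/als-checkpoints")  // hypothetical path; set once up front
  factors.checkpoint()  // marks the RDD to be written to the checkpoint dir on HDFS
  factors.count()       // forces evaluation so the checkpoint is actually materialized
}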

@Matei Thanks, Trying those configs out...




On Thu, Apr 3, 2014 at 2:47 PM, Matei Zaharia wrote:

> To run multiple workers with Spark's standalone mode, set
> SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES in conf/spark-env.sh. For
> example, if you have 16 cores and want 2 workers, you could add
>
> export SPARK_WORKER_INSTANCES=2
> export SPARK_WORKER_CORES=8
>
> Matei
>
> On Apr 3, 2014, at 12:38 PM, Mayur Rustagi 
> wrote:
>
> > Are your workers not utilizing all the cores?
> > One worker will utilize multiple cores depending on resource allocation.
> > Regards
> > Mayur
> >
> > Mayur Rustagi
> > Ph: +1 (760) 203 3257
> > http://www.sigmoidanalytics.com
> > @mayur_rustagi
> >
> >
> >
> > On Wed, Apr 2, 2014 at 7:19 PM, Debasish Das 
> wrote:
> > Hi Matei,
> >
> > How can I run multiple Spark workers per node ? I am running 8 core 10
> node cluster but I do have 8 more cores on each node... So having 2 workers
> per node will definitely help my usecase.
> >
> > Thanks.
> > Deb
> >
> >
> >
> >
> > On Wed, Apr 2, 2014 at 3:58 PM, Matei Zaharia 
> wrote:
> > Hey Steve,
> >
> > This configuration sounds pretty good. The one thing I would consider is
> having more disks, for two reasons -- Spark uses the disks for large
> shuffles and out-of-core operations, and often it's better to run HDFS or
> your storage system on the same nodes. But whether this is valuable will
> depend on whether you plan to do that in your deployment. You should
> determine that and go from there.
> >
> > The amount of cores and RAM are both good -- actually with a lot more of
> these you would probably want to run multiple Spark workers per node, which
> is more work to configure. Your numbers are in line with other deployments.
> >
> > There's a provisioning overview with more details at
> https://spark.apache.org/docs/latest/hardware-provisioning.html but what
> you have sounds fine.
> >
> > Matei
> >
> > On Apr 2, 2014, at 2:58 PM, Stephen Watt  wrote:
> >
> > > Hi Folks
> > >
> > > I'm looking to buy some gear to run Spark. I'm quite well versed in
> Hadoop Server design but there does not seem to be much Spark related
> collateral around infrastructure guidelines (or at least I haven't been
> able to find them). My current thinking for server design is something
> along these lines.
> > >
> > > - 2 x 10Gbe NICs
> > > - 128 GB RAM
> > > - 6 x 1 TB Small Form Factor Disks (2 x RAID 1 Mirror for O/S and
> Runtimes, 4 x 1TB for Data Drives)
> > > - 1 Disk Controller
> > > - 2 x 2.6 GHz 6 core processors
> > >
> > > If I stick with 1u servers then I lose disk capacity per rack but I
> get a lot more memory and CPU capacity per rack. This increases my total
> cluster memory footprint and it doesn't seem to make sense to have super
> dense storage servers because I can't fit all that data on disk in memory
> anyways. So at present, my thinking is to go with 1u servers instead of 2u
> Servers. Is 128GB RAM per server normal? Do you guys use more or less than
> that?
> > >
> > > Any feedback would be appreciated
> > >
> > > Regards
> > > Steve Watt
> >
> >
> >
>
>


Re: Optimal Server Design for Spark

2014-04-03 Thread Matei Zaharia
To run multiple workers with Spark’s standalone mode, set 
SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES in conf/spark-env.sh. For 
example, if you have 16 cores and want 2 workers, you could add

export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_CORES=8

Matei

On Apr 3, 2014, at 12:38 PM, Mayur Rustagi  wrote:

> Are your workers not utilizing all the cores?
> One worker will utilize multiple cores depending on resource allocation.
> Regards
> Mayur
> 
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi
> 
> 
> 
> On Wed, Apr 2, 2014 at 7:19 PM, Debasish Das  wrote:
> Hi Matei,
> 
> How can I run multiple Spark workers per node ? I am running 8 core 10 node 
> cluster but I do have 8 more cores on each node... So having 2 workers per 
> node will definitely help my usecase.
> 
> Thanks.
> Deb
> 
> 
> 
> 
> On Wed, Apr 2, 2014 at 3:58 PM, Matei Zaharia  wrote:
> Hey Steve,
> 
> This configuration sounds pretty good. The one thing I would consider is 
> having more disks, for two reasons — Spark uses the disks for large shuffles 
> and out-of-core operations, and often it’s better to run HDFS or your storage 
> system on the same nodes. But whether this is valuable will depend on whether 
> you plan to do that in your deployment. You should determine that and go from 
> there.
> 
> The amount of cores and RAM are both good — actually with a lot more of these 
> you would probably want to run multiple Spark workers per node, which is more 
> work to configure. Your numbers are in line with other deployments.
> 
> There’s a provisioning overview with more details at 
> https://spark.apache.org/docs/latest/hardware-provisioning.html but what you 
> have sounds fine.
> 
> Matei
> 
> On Apr 2, 2014, at 2:58 PM, Stephen Watt  wrote:
> 
> > Hi Folks
> >
> > I'm looking to buy some gear to run Spark. I'm quite well versed in Hadoop 
> > Server design but there does not seem to be much Spark related collateral 
> > around infrastructure guidelines (or at least I haven't been able to find 
> > them). My current thinking for server design is something along these lines.
> >
> > - 2 x 10Gbe NICs
> > - 128 GB RAM
> > - 6 x 1 TB Small Form Factor Disks (2 x RAID 1 Mirror for O/S and Runtimes, 
> > 4 x 1TB for Data Drives)
> > - 1 Disk Controller
> > - 2 x 2.6 GHz 6 core processors
> >
> > If I stick with 1u servers then I lose disk capacity per rack but I get a 
> > lot more memory and CPU capacity per rack. This increases my total cluster 
> > memory footprint and it doesn't seem to make sense to have super dense 
> > storage servers because I can't fit all that data on disk in memory 
> > anyways. So at present, my thinking is to go with 1u servers instead of 2u 
> > Servers. Is 128GB RAM per server normal? Do you guys use more or less than 
> > that?
> >
> > Any feedback would be appreciated
> >
> > Regards
> > Steve Watt
> 
> 
> 



Re: Optimal Server Design for Spark

2014-04-03 Thread Mayur Rustagi
Are your workers not utilizing all the cores?
One worker will utilize multiple cores depending on resource allocation.
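
As a rough illustration (the property names are from the standalone-mode docs;
the values are made up), an application caps how many of those cores it takes
like this:

import org.apache.spark.{SparkConf, SparkContext}

// Rough illustration: one worker can give an application several cores;
// the application-side cap in standalone mode is spark.cores.max (values made up).
val conf = new SparkConf()
  .setAppName("core-allocation-sketch")
  .set("spark.cores.max", "8")         // total cores this app may use across the cluster
  .set("spark.executor.memory", "8g")  // memory per executor on each worker
val sc = new SparkContext(conf)
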
Regards
Mayur

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi 



On Wed, Apr 2, 2014 at 7:19 PM, Debasish Das wrote:

> Hi Matei,
>
> How can I run multiple Spark workers per node ? I am running 8 core 10
> node cluster but I do have 8 more cores on each node... So having 2 workers
> per node will definitely help my usecase.
>
> Thanks.
> Deb
>
>
>
>
> On Wed, Apr 2, 2014 at 3:58 PM, Matei Zaharia wrote:
>
>> Hey Steve,
>>
>> This configuration sounds pretty good. The one thing I would consider is
>> having more disks, for two reasons — Spark uses the disks for large
>> shuffles and out-of-core operations, and often it’s better to run HDFS or
>> your storage system on the same nodes. But whether this is valuable will
>> depend on whether you plan to do that in your deployment. You should
>> determine that and go from there.
>>
>> The amount of cores and RAM are both good — actually with a lot more of
>> these you would probably want to run multiple Spark workers per node, which
>> is more work to configure. Your numbers are in line with other deployments.
>>
>> There’s a provisioning overview with more details at
>> https://spark.apache.org/docs/latest/hardware-provisioning.html but what
>> you have sounds fine.
>>
>> Matei
>>
>> On Apr 2, 2014, at 2:58 PM, Stephen Watt  wrote:
>>
>> > Hi Folks
>> >
>> > I'm looking to buy some gear to run Spark. I'm quite well versed in
>> Hadoop Server design but there does not seem to be much Spark related
>> collateral around infrastructure guidelines (or at least I haven't been
>> able to find them). My current thinking for server design is something
>> along these lines.
>> >
>> > - 2 x 10Gbe NICs
>> > - 128 GB RAM
>> > - 6 x 1 TB Small Form Factor Disks (2 x RAID 1 Mirror for O/S and
>> Runtimes, 4 x 1TB for Data Drives)
>> > - 1 Disk Controller
>> > - 2 x 2.6 GHz 6 core processors
>> >
>> > If I stick with 1u servers then I lose disk capacity per rack but I get
>> a lot more memory and CPU capacity per rack. This increases my total
>> cluster memory footprint and it doesn't seem to make sense to have super
>> dense storage servers because I can't fit all that data on disk in memory
>> anyways. So at present, my thinking is to go with 1u servers instead of 2u
>> Servers. Is 128GB RAM per server normal? Do you guys use more or less than
>> that?
>> >
>> > Any feedback would be appreciated
>> >
>> > Regards
>> > Steve Watt
>>
>>
>


Re: Optimal Server Design for Spark

2014-04-02 Thread Debasish Das
Hi Matei,

How can I run multiple Spark workers per node? I am running an 8-core, 10-node
cluster, but I do have 8 more cores on each node... So having 2 workers per
node will definitely help my use case.

Thanks.
Deb




On Wed, Apr 2, 2014 at 3:58 PM, Matei Zaharia wrote:

> Hey Steve,
>
> This configuration sounds pretty good. The one thing I would consider is
> having more disks, for two reasons -- Spark uses the disks for large
> shuffles and out-of-core operations, and often it's better to run HDFS or
> your storage system on the same nodes. But whether this is valuable will
> depend on whether you plan to do that in your deployment. You should
> determine that and go from there.
>
> The amount of cores and RAM are both good -- actually with a lot more of
> these you would probably want to run multiple Spark workers per node, which
> is more work to configure. Your numbers are in line with other deployments.
>
> There's a provisioning overview with more details at
> https://spark.apache.org/docs/latest/hardware-provisioning.html but what
> you have sounds fine.
>
> Matei
>
> On Apr 2, 2014, at 2:58 PM, Stephen Watt  wrote:
>
> > Hi Folks
> >
> > I'm looking to buy some gear to run Spark. I'm quite well versed in
> Hadoop Server design but there does not seem to be much Spark related
> collateral around infrastructure guidelines (or at least I haven't been
> able to find them). My current thinking for server design is something
> along these lines.
> >
> > - 2 x 10Gbe NICs
> > - 128 GB RAM
> > - 6 x 1 TB Small Form Factor Disks (2 x RAID 1 Mirror for O/S and
> Runtimes, 4 x 1TB for Data Drives)
> > - 1 Disk Controller
> > - 2 x 2.6 GHz 6 core processors
> >
> > If I stick with 1u servers then I lose disk capacity per rack but I get
> a lot more memory and CPU capacity per rack. This increases my total
> cluster memory footprint and it doesn't seem to make sense to have super
> dense storage servers because I can't fit all that data on disk in memory
> anyways. So at present, my thinking is to go with 1u servers instead of 2u
> Servers. Is 128GB RAM per server normal? Do you guys use more or less than
> that?
> >
> > Any feedback would be appreciated
> >
> > Regards
> > Steve Watt
>
>


Re: Optimal Server Design for Spark

2014-04-02 Thread Mayur Rustagi
I would suggest starting with cloud hosting if you can; depending on your use
case, memory requirements may vary a lot.
Regards
Mayur
On Apr 2, 2014 3:59 PM, "Matei Zaharia"  wrote:

> Hey Steve,
>
> This configuration sounds pretty good. The one thing I would consider is
> having more disks, for two reasons — Spark uses the disks for large
> shuffles and out-of-core operations, and often it’s better to run HDFS or
> your storage system on the same nodes. But whether this is valuable will
> depend on whether you plan to do that in your deployment. You should
> determine that and go from there.
>
> The amount of cores and RAM are both good — actually with a lot more of
> these you would probably want to run multiple Spark workers per node, which
> is more work to configure. Your numbers are in line with other deployments.
>
> There’s a provisioning overview with more details at
> https://spark.apache.org/docs/latest/hardware-provisioning.html but what
> you have sounds fine.
>
> Matei
>
> On Apr 2, 2014, at 2:58 PM, Stephen Watt  wrote:
>
> > Hi Folks
> >
> > I'm looking to buy some gear to run Spark. I'm quite well versed in
> Hadoop Server design but there does not seem to be much Spark related
> collateral around infrastructure guidelines (or at least I haven't been
> able to find them). My current thinking for server design is something
> along these lines.
> >
> > - 2 x 10Gbe NICs
> > - 128 GB RAM
> > - 6 x 1 TB Small Form Factor Disks (2 x RAID 1 Mirror for O/S and
> Runtimes, 4 x 1TB for Data Drives)
> > - 1 Disk Controller
> > - 2 x 2.6 GHz 6 core processors
> >
> > If I stick with 1u servers then I lose disk capacity per rack but I get
> a lot more memory and CPU capacity per rack. This increases my total
> cluster memory footprint and it doesn't seem to make sense to have super
> dense storage servers because I can't fit all that data on disk in memory
> anyways. So at present, my thinking is to go with 1u servers instead of 2u
> Servers. Is 128GB RAM per server normal? Do you guys use more or less than
> that?
> >
> > Any feedback would be appreciated
> >
> > Regards
> > Steve Watt
>
>


Re: Optimal Server Design for Spark

2014-04-02 Thread Matei Zaharia
Hey Steve,

This configuration sounds pretty good. The one thing I would consider is having 
more disks, for two reasons — Spark uses the disks for large shuffles and 
out-of-core operations, and often it’s better to run HDFS or your storage 
system on the same nodes. But whether this is valuable will depend on whether 
you plan to do that in your deployment. You should determine that and go from 
there.
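
(As a concrete example -- the mount points below are hypothetical -- you can
spread Spark's shuffle and spill space across several data disks by listing one
directory per disk in spark.local.dir:)

import org.apache.spark.SparkConf

// Hypothetical mount points: one local directory per data disk, so shuffle
// files and spilled data get striped across spindles. Pass this conf to
// SparkContext as usual.
val conf = new SparkConf()
  .setAppName("local-dirs-sketch")
  .set("spark.local.dir", "/mnt/disk1/spark,/mnt/disk2/spark,/mnt/disk3/spark,/mnt/disk4/spark")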

The number of cores and the amount of RAM both sound good. Actually, with a lot 
more of either you would probably want to run multiple Spark workers per node, 
which is more work to configure. Your numbers are in line with other deployments.

There’s a provisioning overview with more details at 
https://spark.apache.org/docs/latest/hardware-provisioning.html but what you 
have sounds fine.

Matei

On Apr 2, 2014, at 2:58 PM, Stephen Watt  wrote:

> Hi Folks
> 
> I'm looking to buy some gear to run Spark. I'm quite well versed in Hadoop 
> Server design but there does not seem to be much Spark related collateral 
> around infrastructure guidelines (or at least I haven't been able to find 
> them). My current thinking for server design is something along these lines.
> 
> - 2 x 10Gbe NICs
> - 128 GB RAM
> - 6 x 1 TB Small Form Factor Disks (2 x RAID 1 Mirror for O/S and Runtimes, 4 
> x 1TB for Data Drives)
> - 1 Disk Controller
> - 2 x 2.6 GHz 6 core processors
> 
> If I stick with 1u servers then I lose disk capacity per rack but I get a lot 
> more memory and CPU capacity per rack. This increases my total cluster memory 
> footprint and it doesn't seem to make sense to have super dense storage 
> servers because I can't fit all that data on disk in memory anyways. So at 
> present, my thinking is to go with 1u servers instead of 2u Servers. Is 128GB 
> RAM per server normal? Do you guys use more or less than that?
> 
> Any feedback would be appreciated
> 
> Regards
> Steve Watt



Optimal Server Design for Spark

2014-04-02 Thread Stephen Watt
Hi Folks

I'm looking to buy some gear to run Spark. I'm quite well versed in Hadoop 
server design, but there does not seem to be much Spark-related collateral 
around infrastructure guidelines (or at least I haven't been able to find any). 
My current thinking for server design is something along these lines:

- 2 x 10GbE NICs
- 128 GB RAM
- 6 x 1 TB small form factor disks (2 x 1 TB in a RAID 1 mirror for O/S and 
runtimes, 4 x 1 TB for data drives)
- 1 disk controller
- 2 x 2.6 GHz 6-core processors

If I stick with 1U servers then I lose disk capacity per rack, but I get a lot 
more memory and CPU capacity per rack. This increases my total cluster memory 
footprint, and it doesn't seem to make sense to have super-dense storage servers 
because I can't fit all that data on disk in memory anyway. So at present my 
thinking is to go with 1U servers instead of 2U servers. Is 128 GB of RAM per 
server normal? Do you guys use more or less than that?

Any feedback would be appreciated

Regards
Steve Watt