Re: Optimal Server Design for Spark
To run multiple workers with Spark's standalone mode, set SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES in conf/spark-env.sh. For example, if you have 16 cores and want 2 workers, you could add:

export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_CORES=8

Matei

On Apr 3, 2014, at 12:38 PM, Mayur Rustagi mayur.rust...@gmail.com wrote:

Are your workers not utilizing all the cores? One worker will utilize multiple cores depending on resource allocation.

Regards,
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi
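As an illustration only, a fuller conf/spark-env.sh sketch for the two-workers-per-node setup described above. The SPARK_WORKER_MEMORY split is an assumption based on the 128 GB nodes discussed later in this thread, not a value given by Matei:

# Sketch of conf/spark-env.sh for two workers per node (assumed values).
# Split the node's cores and memory between the workers, leaving headroom
# for the OS and any co-located HDFS daemons.
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_CORES=8
export SPARK_WORKER_MEMORY=56g   # per worker; 2 x 56 GB leaves ~16 GB of a 128 GB node free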
Re: Optimal Server Design for Spark
@Mayur... I am hitting ulimits on the cluster if I go beyond 4 cores per worker, and I don't think I can change the ulimit due to sudo issues, etc. If I have more workers, in ALS I can go for 20 blocks (right now I am running 10 blocks on 10 nodes with 4 cores each, and with more workers I could go up to 20 blocks on 10 nodes with 4 cores each) and per process I can still stay within the ulimit.

For the ALS stress case, right now with 10 blocks it seems like I have to persist RDDs to HDFS each iteration, which I want to avoid if possible.

@Matei Thanks, trying those configs out...

On Thu, Apr 3, 2014 at 2:47 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

To run multiple workers with Spark's standalone mode, set SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES in conf/spark-env.sh.
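For context, the ulimit ceiling described here typically shows up as "Too many open files" errors during shuffles. A quick sketch for checking the limits actually in effect, assuming a Linux cluster; the 65536 value and the <pid> placeholder are illustrative, not from the thread:

# Check the per-process open-file limit for the user running the workers
ulimit -n

# Count how many file descriptors a running executor actually holds
# (<pid> is a placeholder for the executor's process id)
ls /proc/<pid>/fd | wc -l

# A non-root user can raise the soft limit up to the existing hard limit;
# raising the hard limit itself is what needs the sudo access mentioned above
ulimit -n 65536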
Re: Optimal Server Design for Spark
I would suggest starting with cloud hosting if you can; depending on your use case, memory requirements may vary a lot.

Regards,
Mayur

On Apr 2, 2014 3:59 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

Hey Steve,

This configuration sounds pretty good. The one thing I would consider is having more disks, for two reasons: Spark uses the disks for large shuffles and out-of-core operations, and often it's better to run HDFS or your storage system on the same nodes. Whether this is valuable will depend on whether you plan to do that in your deployment, so you should determine that and go from there.

The number of cores and the amount of RAM are both good. In fact, with a lot more of either you would probably want to run multiple Spark workers per node, which is more work to configure. Your numbers are in line with other deployments. There's a provisioning overview with more details at https://spark.apache.org/docs/latest/hardware-provisioning.html, but what you have sounds fine.

Matei

On Apr 2, 2014, at 2:58 PM, Stephen Watt sw...@redhat.com wrote:

Hi Folks,

I'm looking to buy some gear to run Spark. I'm quite well versed in Hadoop server design, but there does not seem to be much Spark-related collateral around infrastructure guidelines (or at least I haven't been able to find any). My current thinking for server design is something along these lines:

- 2 x 10GbE NICs
- 128 GB RAM
- 6 x 1 TB small form factor disks (2 x RAID 1 mirror for O/S and runtimes, 4 x 1 TB for data drives)
- 1 disk controller
- 2 x 2.6 GHz 6-core processors

If I stick with 1U servers then I lose disk capacity per rack, but I get a lot more memory and CPU capacity per rack. This increases my total cluster memory footprint, and it doesn't seem to make sense to have super-dense storage servers because I can't fit all that data on disk in memory anyway. So at present my thinking is to go with 1U servers instead of 2U servers.

Is 128 GB RAM per server normal? Do you guys use more or less than that? Any feedback would be appreciated.

Regards,
Steve Watt
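As an illustration of the cloud-hosting suggestion, Spark releases of that era bundled an ec2/spark-ec2 script for spinning up a throwaway evaluation cluster on EC2. The flags below are a sketch from memory and worth checking against ec2/spark-ec2 --help for your Spark version; the key pair, instance type, slave count, and cluster name are all placeholders:

# Launch a 10-slave evaluation cluster (placeholder key pair, instance type, name)
./spark-ec2 -k my-keypair -i ~/my-keypair.pem -s 10 -t m3.2xlarge launch spark-eval

# Log in to the master once the cluster is up
./spark-ec2 -k my-keypair -i ~/my-keypair.pem login spark-eval

# Tear everything down when the evaluation is done
./spark-ec2 -k my-keypair -i ~/my-keypair.pem destroy spark-eval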
Re: Optimal Server Design for Spark
Hi Matei,

How can I run multiple Spark workers per node? I am running an 8-core, 10-node cluster, but I do have 8 more cores on each node, so having 2 workers per node will definitely help my use case.

Thanks.
Deb

On Wed, Apr 2, 2014 at 3:58 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

The number of cores and the amount of RAM are both good. In fact, with a lot more of either you would probably want to run multiple Spark workers per node, which is more work to configure.