It is a known limitation that spark-ec2 is very slow for large
clusters and, as you mention, most of this is due to the use of rsync to
transfer files from the master to all the slaves.
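
To make the alternative you suggest concrete (stage the files in S3, then
have every node pull them in parallel), here is a rough sketch. It is not
how spark-ec2 works today. The function names, the bucket path and the
worker count are made up for illustration, and it assumes the AWS CLI is
installed on every node with read access to the bucket.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_pull_command(host, s3_uri, dest):
    """Build an ssh command that makes `host` pull the files itself."""
    # Assumption: `aws` is on the remote PATH and the instance can read s3_uri.
    remote = "aws s3 sync {} {}".format(shlex.quote(s3_uri), shlex.quote(dest))
    return ["ssh", "-o", "StrictHostKeyChecking=no", host, remote]


def pull_on_all(hosts, s3_uri, dest, max_workers=64, run=subprocess.call):
    """Start the pull on every host concurrently; return {host: exit_code}."""
    def pull_one(host):
        return host, run(build_pull_command(host, s3_uri, dest))

    # The master only issues short ssh commands; the bulk transfer goes
    # from S3 to each node instead of from the master to each node.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(pull_one, hosts))


if __name__ == "__main__":
    # Hypothetical host names and bucket, for illustration only.
    results = pull_on_all(["slave-1", "slave-2"], "s3://my-bucket/spark", "/root/spark")
    failed = [h for h, code in results.items() if code != 0]
    print("failed hosts:", failed)
```

Collecting the exit code per host also gives you a list of nodes to retry,
rather than silently moving on.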

Nick (cc'd) has been working on an alternative approach at
https://github.com/nchammas/flintrock that is more scalable.
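
On the nodes that are not ready when setup starts: one common approach
(I am not claiming this is what spark-ec2 or flintrock does) is to poll
each node's SSH port until it accepts connections or a deadline passes,
and then report the stragglers explicitly. A minimal sketch, with
hypothetical function names and timeouts:

```python
import socket
import time


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_nodes(hosts, deadline_s=600, poll_s=5,
                   is_ready=ssh_port_open, sleep=time.sleep,
                   clock=time.monotonic):
    """Poll until every host is ready or the deadline passes.

    Returns (ready, not_ready) so the caller can retry or replace
    the nodes that never came up.
    """
    pending = set(hosts)
    ready = []
    start = clock()
    while pending:
        for host in sorted(pending):
            if is_ready(host):
                ready.append(host)
        pending -= set(ready)
        if not pending or clock() - start >= deadline_s:
            break
        sleep(poll_s)
    return ready, sorted(pending)
```

An open port 22 only shows that sshd is listening, so a real check might
also run a trivial remote command before starting the installs.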

Thanks
Shivaram

On Thu, Nov 5, 2015 at 8:12 AM, Christian <engr...@gmail.com> wrote:
> For starters, thanks for the awesome product!
>
> When creating EC2 clusters of 20-40 nodes, things work great. When we create
> a larger cluster with the provided spark-ec2 script, it takes hours: about
> 2 1/2 hours for a 200-node cluster and over 5 hours for a 500-node cluster.
> One other problem we are having is that some nodes don't come up when the
> others do, and the process seems to just move on, skipping the rsync and
> any installs on those nodes.
>
> My guess is that setting up a large cluster takes so long because of the
> use of rsync. What if, instead of using rsync, you synced to S3 and then
> used pdsh to pull it down on all of the machines? This is a big deal for us,
> and if we can come up with a good plan, we might be able to help out with
> the required changes.
>
> Are there any suggestions on how to deal with some of the nodes not being
> ready when the process starts?
>
> Thanks for your time,
> Christian
>

