For starters, thanks for the awesome product!

When creating EC2 clusters of 20-40 nodes, things work great. When we
create a larger cluster with the provided spark-ec2 script, however, it
takes hours: a 200-node cluster takes 2 1/2 hours, and a 500-node cluster
takes over 5 hours. Another problem we are having is that some nodes don't
come up when the others do, and the process seems to just move on,
skipping the rsync and any installs on those nodes.

My guess is that setting up a large cluster takes so long because of the
use of rsync. What if, instead of using rsync, you synced to S3 and then
used pdsh to pull it down on all of the machines? This is a big deal for
us, and if we can come up with a good plan, we might be able to help out
with the required changes.
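
To make the S3 idea concrete, here is a rough sketch of the two steps in
Python. It is only an illustration under several assumptions: the AWS CLI
(aws s3 sync) is installed on the master and on every node, pdsh is
installed on the master with its ssh module, and the bucket name, paths,
and function names are all hypothetical. It is not how spark-ec2 works
today.

```python
import shlex
import subprocess


def build_upload_cmd(local_dir, s3_uri):
    """Command that pushes the master's copy of the files to S3 once."""
    return ["aws", "s3", "sync", local_dir, s3_uri]


def build_pull_cmd(hosts, s3_uri, remote_dir, fanout=64):
    """Command that asks every node, in parallel, to pull the files from S3.

    -R ssh : use ssh as the remote command module
    -f N   : contact up to N nodes at a time
    -w ... : comma-separated list of target hosts
    """
    remote = "aws s3 sync {} {}".format(shlex.quote(s3_uri), shlex.quote(remote_dir))
    return ["pdsh", "-R", "ssh", "-f", str(fanout), "-w", ",".join(hosts), remote]


def deploy(hosts, local_dir, s3_uri, remote_dir):
    """Upload once, then let all nodes download concurrently."""
    subprocess.check_call(build_upload_cmd(local_dir, s3_uri))
    subprocess.check_call(build_pull_cmd(hosts, s3_uri, remote_dir))
```

The point of the design is that the master uploads a single copy and each
node then fetches it directly from S3, so the master is no longer the
source for every per-node transfer.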

Are there any suggestions on how to deal with some of the nodes not being
ready when the process starts?
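
One option, sketched below purely as an assumption on my part, would be to
poll each node's SSH port before starting the rsync, and to report the
nodes that never come up instead of silently skipping them. The function
names, timeouts, and the choice of port 22 are hypothetical, and an open
port does not guarantee that the node is fully ready for logins.

```python
import socket
import time


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_nodes(hosts, max_wait=600.0, interval=10.0,
                   check=ssh_port_open, sleep=time.sleep):
    """Poll until every host passes `check` or `max_wait` seconds elapse.

    Returns (ready, not_ready) as two sets so the caller can decide
    whether to retry, replace, or abort on the stragglers.
    """
    pending = set(hosts)
    ready = set()
    deadline = time.monotonic() + max_wait
    while pending:
        for host in sorted(pending):
            if check(host):
                ready.add(host)
        pending -= ready
        if not pending or time.monotonic() >= deadline:
            break
        sleep(interval)
    return ready, pending
```

The setup would then run only against the ready set, and the not-ready set
could be retried later or flagged to the user.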

Thanks for your time,
Christian
