Hi,
I would like to run a Spark application on Amazon EMR, and I have some
questions about that:
1. My input data is on another HDFS cluster (not on Amazon). Can I copy all
the input data from that cluster directly to HDFS on the Amazon EMR cluster
(assuming it has enough storage), or do I have to send it to Amazon S3
first and then load it onto the EMR cluster where I want to run my
application?
2. Which nodes should be on-demand instances and which can be spot
instances? (I don't want to spend too much money, but I also don't want to
lose my data or have to recompute everything after a spot instance
interruption.)
3. Can I use Amazon S3 for input and output data so that I can run with
fewer on-demand instances and more spot instances? (Or maybe there is
another way to lower costs?)
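For reference, regarding question 1, I assume the two transfer options
would look something like the commands below (the hostnames, bucket name,
and paths are just placeholders, not my real setup):

```shell
# Option A: copy directly from the source HDFS into the EMR cluster's HDFS
# (the source NameNode would have to be reachable from the EMR cluster)
hadoop distcp hdfs://source-namenode:8020/data/input hdfs:///data/input

# Option B: stage the data in S3 first, then either copy it into the EMR
# cluster's HDFS or read it straight from S3 in the Spark job
hadoop distcp hdfs://source-namenode:8020/data/input s3://my-bucket/input/

# For question 3: the Spark job could then read input from and write
# output to S3 directly, e.g.
spark-submit --class com.example.MyApp my-app.jar \
    s3://my-bucket/input/ s3://my-bucket/output/
```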

I would like to run this application once, and I expect the computation to
take around 30 hours.

Could you answer (at least some of) these questions?

Thanks,
Grzegorz
