If I recall correctly, there is no additional premium for using EMR unless you use one of the MapR distributions they offer, or one of their other value-adds.
So a vanilla EMR cluster running on spot instances will cost the same as one launched with spark-ec2.

> On 28 Jan 2016, at 01:34, Sung Hwan Chung <coded...@cs.stanford.edu> wrote:
>
> Hm, thanks.
>
> I think what you are suggesting sounds like a recommendation for AWS EMR.
> However, my questions were wrt spark-ec2. For our uses involving
> spot instances, EMR could potentially double or triple prices due to the
> additional premiums.
>
> Thanks anyway!
>
>> On Wed, Jan 27, 2016 at 2:12 PM, Alexander Pivovarov <apivova...@gmail.com> wrote:
>> You can use EMR 4.3.0 running on spot instances to control the price.
>>
>> Yes, you can add/remove instances to the cluster on the fly (CORE instances
>> support add only; TASK instances support add and remove).
>>
>>> On Wed, Jan 27, 2016 at 2:07 PM, Sung Hwan Chung <coded...@cs.stanford.edu> wrote:
>>> I noticed that in the main branch, the ec2 directory along with the
>>> spark-ec2 script is no longer present.
>>>
>>> Is spark-ec2 going away in the next release? If so, what would be the best
>>> alternative at that time?
>>>
>>> A couple of additional questions:
>>> 1. Is there any way to add/remove workers while the cluster is
>>> running, without stopping/starting the EC2 cluster?
>>> 2. For 1, if no such capability is provided with the current script, do we
>>> have to write it ourselves? Or is there any plan to add such functions
>>> in the future?
>>> 3. In PySpark, is it possible to dynamically change driver/executor memory
>>> and the number of cores per executor without having to restart it (e.g. by
>>> changing the sc configuration or recreating sc)?
>>>
>>> Our ideal scenario is to keep running PySpark (in our case, as a notebook)
>>> and connect to/disconnect from any Spark cluster on demand.
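For the on-the-fly resizing Alexander mentions, here is a minimal sketch of how it might be scripted with boto3's EMR client. It assumes boto3 is installed and AWS credentials are configured; the helper names (`check_resize`, `build_modify_request`, `resize`), the region default, and the `j-EXAMPLE`/`ig-EXAMPLE` IDs are all hypothetical. The "CORE can only grow, TASK can grow or shrink" check simply encodes the rule as stated in the thread for EMR 4.x at the time, not a verified current limit.

```python
# Hypothetical helpers; the CORE/TASK rule below is taken from the thread (EMR 4.x era).


def check_resize(role: str, current: int, target: int) -> None:
    """Raise ValueError if a resize breaks the rule quoted in the thread."""
    if role not in ("CORE", "TASK"):
        raise ValueError(f"unsupported role for resizing: {role}")
    if target < 0:
        raise ValueError("instance count cannot be negative")
    if role == "CORE" and target < current:
        # Per the thread: CORE groups support adding instances only.
        raise ValueError("CORE groups support adding instances only")


def build_modify_request(cluster_id: str, group_id: str, role: str,
                         current: int, target: int) -> dict:
    """Build keyword arguments for boto3's EMR modify_instance_groups call."""
    check_resize(role, current, target)
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [{"InstanceGroupId": group_id, "InstanceCount": target}],
    }


def resize(cluster_id: str, group_id: str, role: str, current: int, target: int,
           region: str = "us-east-1") -> dict:
    """Send the resize to EMR (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the validation logic runs without boto3

    client = boto3.client("emr", region_name=region)
    return client.modify_instance_groups(
        **build_modify_request(cluster_id, group_id, role, current, target)
    )
```

For example, `build_modify_request("j-EXAMPLE", "ig-EXAMPLE", "TASK", 4, 2)` yields the arguments for shrinking a TASK group from 4 to 2 instances, while the same call with `"CORE"` raises `ValueError`. Validating locally before calling AWS keeps the rule in one place and testable without a live cluster.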