Thanks! That's very helpful.

On Wed, Jan 27, 2016 at 3:33 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> I noticed that in the main branch, the ec2 directory along with the
> spark-ec2 script is no longer present.
>
> It’s been moved out of the main repo to its own location:
> https://github.com/amplab/spark-ec2/pull/21
>
> Is spark-ec2 going away in the next release? If so, what would be the best
> alternative at that time?
>
> It’s not going away. It’s just being removed from the main Spark repo and
> maintained separately.
>
> There are many alternatives like EMR, which was already mentioned, as well
> as more full-service solutions like Databricks. It depends on what you’re
> looking for.
>
> If you want something as close to spark-ec2 as possible but more actively
> developed, you might be interested in checking out Flintrock
> <https://github.com/nchammas/flintrock>, which I built.
>
> Is there any way to add/remove additional workers while the cluster is
> running without stopping/starting the EC2 cluster?
>
> Not currently possible with spark-ec2 and a bit difficult to add. See:
> https://issues.apache.org/jira/browse/SPARK-2008
>
> For 1, if no such capability is provided with the current script, do we
> have to write it ourselves? Or is there any plan in the future to add such
> functions?
>
> No "official" plans to add this to spark-ec2. It’s up to a contributor to
> step up and implement this feature, basically. Otherwise it won’t happen.
>
> Nick
>
> On Wed, Jan 27, 2016 at 5:13 PM Alexander Pivovarov <apivova...@gmail.com>
> wrote:
>
>> you can use EMR-4.3.0 run on spot instances to control the price
>>
>> yes, you can add/remove instances to the cluster on the fly (CORE instances
>> support add only, TASK instances - add and remove)
>>
>> On Wed, Jan 27, 2016 at 2:07 PM, Sung Hwan Chung <coded...@cs.stanford.edu>
>> wrote:
>>
>>> I noticed that in the main branch, the ec2 directory along with the
>>> spark-ec2 script is no longer present.
>>>
>>> Is spark-ec2 going away in the next release? If so, what would be the
>>> best alternative at that time?
>>>
>>> A couple of additional questions:
>>> 1. Is there any way to add/remove additional workers while the cluster
>>> is running without stopping/starting the EC2 cluster?
>>> 2. For 1, if no such capability is provided with the current script, do
>>> we have to write it ourselves? Or is there any plan in the future to add
>>> such functions?
>>> 3. In PySpark, is it possible to dynamically change driver/executor
>>> memory, number of cores per executor without having to restart it? (e.g.
>>> via changing sc configuration or recreating sc?)
>>>
>>> Our ideal scenario is to keep running PySpark (in our case, as a
>>> notebook) and connect/disconnect to any spark clusters on demand.