> I noticed that in the main branch, the ec2 directory along with the
> spark-ec2 script is no longer present.

It’s been moved out of the main repo to its own location:
https://github.com/amplab/spark-ec2/pull/21

> Is spark-ec2 going away in the next release? If so, what would be the best
> alternative at that time?

It’s not going away. It’s just being removed from the main Spark repo and
maintained separately.

There are many alternatives like EMR, which was already mentioned, as well
as more full-service solutions like Databricks. It depends on what you’re
looking for.
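
For what it's worth, resizing on EMR (as Alexander describes below) comes
down to adding or modifying instance groups. A rough sketch with the AWS
CLI follows; the cluster and instance group IDs are placeholders, and the
exact shorthand syntax may vary with your CLI version:

    # Add a TASK instance group of spot instances to a running cluster
    # (placeholder cluster ID and bid price).
    aws emr add-instance-groups --cluster-id j-XXXXXXXXXXXXX \
        --instance-groups InstanceGroupType=TASK,InstanceType=m3.xlarge,InstanceCount=3,BidPrice=0.10

    # Later, shrink that TASK group back down (placeholder group ID).
    # CORE groups can be grown the same way, but not shrunk.
    aws emr modify-instance-groups \
        --instance-groups InstanceGroupId=ig-XXXXXXXXXXXX,InstanceCount=1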

If you want something as close to spark-ec2 as possible but more actively
developed, you might be interested in checking out Flintrock
<https://github.com/nchammas/flintrock>, which I built.
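
To give a rough idea of what that looks like (command names and flags may
differ depending on the Flintrock version you install; add-slaves and
remove-slaves in particular may only be available in newer releases):

    # Launch a small cluster. The key, identity file, and AMI values are
    # placeholders.
    flintrock launch test-cluster \
        --num-slaves 2 \
        --spark-version 1.6.0 \
        --ec2-instance-type m3.medium \
        --ec2-region us-east-1 \
        --ec2-key-name my-key \
        --ec2-identity-file /path/to/my-key.pem \
        --ec2-ami ami-XXXXXXXX \
        --ec2-user ec2-user

    # Grow or shrink the cluster while it's running.
    flintrock add-slaves test-cluster --num-slaves 2
    flintrock remove-slaves test-cluster --num-slaves 2

    # Tear it down when you're done.
    flintrock destroy test-cluster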

> Is there any way to add/remove additional workers while the cluster is
> running without stopping/starting the EC2 cluster?

Not currently possible with spark-ec2 and a bit difficult to add. See:
https://issues.apache.org/jira/browse/SPARK-2008

> For 1, if no such capability is provided with the current script, do we
> have to write it ourselves? Or is there any plan to add such functionality
> in the future?

No "official" plans to add this to spark-ec2. It’s up to a contributor to
step up and implement this feature, basically. Otherwise it won’t happen.

Nick

On Wed, Jan 27, 2016 at 5:13 PM Alexander Pivovarov <apivova...@gmail.com>
wrote:

> you can use EMR-4.3.0 running on spot instances to control the price
>
> yes, you can add/remove instances to the cluster on the fly (CORE instances
> support add only; TASK instances support both add and remove)
>
>
>
> On Wed, Jan 27, 2016 at 2:07 PM, Sung Hwan Chung <coded...@cs.stanford.edu
> > wrote:
>
>> I noticed that in the main branch, the ec2 directory along with the
>> spark-ec2 script is no longer present.
>>
>> Is spark-ec2 going away in the next release? If so, what would be the
>> best alternative at that time?
>>
>> A couple more additional questions:
>> 1. Is there any way to add/remove additional workers while the cluster is
>> running without stopping/starting the EC2 cluster?
>> 2. For 1, if no such capability is provided with the current script, do
>> we have to write it ourselves? Or is there any plan to add such
>> functionality in the future?
>> 3. In PySpark, is it possible to dynamically change driver/executor
>> memory or the number of cores per executor without having to restart it?
>> (e.g. via changing the sc configuration or recreating sc?)
>>
>> Our ideal scenario is to keep running PySpark (in our case, as a
>> notebook) and connect/disconnect to any spark clusters on demand.
>>
>
