Hi Alex,

If they're both configured correctly, there's no reason that Spark
Standalone should provide a performance or memory improvement over Spark
on YARN.
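Configuring YARN correctly includes accounting for memory overhead. On YARN, the container limit covers the executor heap plus spark.yarn.executor.memoryOverhead, so with 52GB per node (the r3.2xlarge figure in the quoted table) the heap set by spark.executor.memory has to be somewhat below 52GB. As far as I know, the Standalone worker budgets only the heap. The sketch below is a rough calculation under stated assumptions: it uses the Spark 1.4/1.5-era default overhead of max(384MB, 10% of executor memory) and assumes one executor gets the whole 52GB as a single container. The function names are illustrative only.

```python
# Assumption: Spark 1.4/1.5-era default for spark.yarn.executor.memoryOverhead,
# i.e. max(384 MB, 10% of spark.executor.memory), truncated to whole MB.
OVERHEAD_MIN_MB = 384


def yarn_container_mb(executor_memory_mb: int) -> int:
    """Memory YARN must grant for one executor container: heap plus overhead."""
    overhead_mb = max(OVERHEAD_MIN_MB, executor_memory_mb // 10)
    return executor_memory_mb + overhead_mb


def max_executor_memory_mb(container_limit_mb: int) -> int:
    """Largest spark.executor.memory (in MB) whose container fits the limit."""
    # yarn_container_mb never decreases as the heap grows, so the first heap
    # that fits while counting down from the limit is the largest one.
    for heap_mb in range(container_limit_mb, 0, -1):
        if yarn_container_mb(heap_mb) <= container_limit_mb:
            return heap_mb
    return 0  # even a 1 MB heap plus the minimum overhead does not fit


if __name__ == "__main__":
    limit_mb = 52 * 1024  # 52GB available per node, as in the quoted table
    heap_mb = max_executor_memory_mb(limit_mb)
    print(heap_mb, yarn_container_mb(heap_mb))  # prints: 48408 53248
```

That works out to roughly 47GB of heap for a 52GB container. YARN may also round each container request up to a multiple of yarn.scheduler.minimum-allocation-mb, so leaving a little extra headroom is safer.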

-Sandy

On Fri, Sep 4, 2015 at 1:24 PM, Alexander Pivovarov <apivova...@gmail.com>
wrote:

> Hi everyone,
>
> We are trying the latest AWS EMR release (emr-4.0.0) with Spark, and my
> question is about YARN vs. Standalone mode.
> Our use case is:
> - start a 100-150 node cluster every week
> - run one heavy Spark job (5-6 hours)
> - save the data to S3
> - stop the cluster
>
> Officially, AWS emr-4.0.0 comes with Spark on YARN.
> It's probably possible to hack EMR by creating a bootstrap script that
> stops YARN and starts the Spark master and slaves on each machine (to run
> Spark in Standalone mode).
>
> My questions are:
> - Does Spark Standalone provide a significant performance/memory
> improvement compared to YARN mode?
> - Is it worth hacking the official EMR Spark on YARN setup and switching
> Spark to Standalone mode?
>
>
> I have already created a comparison table and would like you to check
> whether my understanding is correct.
>
> Let's say an r3.2xlarge instance has 52GB of RAM available for Spark
> executor JVMs.
>
>                 Standalone vs. YARN comparison
>
> Question                                            Standalone  YARN
> --------------------------------------------------  ----------  -----------
> Can an executor allocate up to 52GB of RAM?         yes         yes
> Will an executor become unresponsive due to GC      yes         yes
>   after using all 52GB of RAM?
> Additional JVMs on each slave besides the           Worker      NodeManager
>   Spark executor
> Are the additional JVMs lightweight?                yes         yes
>
>
> Thank you
>
> Alex
>
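For the weekly workflow in the quoted message (start the cluster, run one job, save to S3, stop the cluster), the stock EMR setup can already shut the cluster down once its steps finish, without replacing YARN. Below is a minimal sketch that only builds the keyword arguments for boto3's EMR `run_job_flow` call; nothing here contacts AWS. The helper name, the cluster and step names, the single instance type, and the spark-submit arguments are hypothetical placeholders.

```python
def weekly_cluster_request(core_nodes: int, spark_submit_args: list) -> dict:
    """Build kwargs for boto3's EMR run_job_flow (hypothetical helper).

    The cluster runs a single Spark step and terminates when it completes.
    """
    return {
        "Name": "weekly-spark-job",  # hypothetical cluster name
        "ReleaseLabel": "emr-4.0.0",
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "r3.2xlarge",
            "SlaveInstanceType": "r3.2xlarge",
            "InstanceCount": core_nodes + 1,  # InstanceCount includes the master
            # Terminate the cluster once there are no more steps to run.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "heavy-spark-job",  # hypothetical step name
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    # command-runner.jar runs spark-submit on EMR 4.x clusters.
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster"]
                    + list(spark_submit_args),
                },
            }
        ],
        # Default role names created by `aws emr create-default-roles`.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With boto3 installed and credentials configured, the request could then be submitted with `boto3.client("emr").run_job_flow(**request)`.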
