Hi Alex,

If they're both configured correctly, there's no reason that Spark Standalone should provide a performance or memory improvement over Spark on YARN.
-Sandy

On Fri, Sep 4, 2015 at 1:24 PM, Alexander Pivovarov <apivova...@gmail.com> wrote:

> Hi Everyone
>
> We are trying the latest AWS emr-4.0.0 and Spark, and my question is about
> YARN vs. Standalone mode.
> Our use case is:
> - start a 100-150 node cluster every week
> - run one heavy Spark job (5-6 hours)
> - save data to S3
> - stop the cluster
>
> Officially, AWS emr-4.0.0 comes with Spark on YARN.
> It's probably possible to hack EMR by creating a bootstrap script which
> stops YARN and starts a master and slaves on each computer (to start Spark
> in standalone mode).
>
> My questions are:
> - Does Spark standalone provide a significant performance / memory
>   improvement in comparison to YARN mode?
> - Is it worth hacking the official EMR Spark on YARN and switching Spark to
>   Standalone mode?
>
> I already created a comparison table and want you to check if my
> understanding is correct.
>
> Let's say an r3.2xlarge computer has 52GB RAM available for Spark executor JVMs.
>
> Standalone to YARN comparison:
>
>                                               STDLN   | YARN
> can an executor allocate up to 52GB RAM       yes     | yes
> will an executor be unresponsive after using
>   all 52GB RAM because of GC                  yes     | yes
> additional JVMs on slave besides the executor Worker  | NodeManager
> are the additional JVMs lightweight           yes     | yes
>
> Thank you
>
> Alex
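A note on the first table row: on YARN, each executor container is requested as the executor heap plus spark.yarn.executor.memoryOverhead, so the heap itself cannot take the whole per-node memory budget. The following is a minimal Python sketch of the largest heap that still fits in one node's budget. It assumes a default overhead of max(384 MB, 10% of executor memory), which is what Spark 1.4-era releases are understood to use (check the docs for your version). It also assumes the 52GB budget is a multiple of yarn.scheduler.minimum-allocation-mb, so YARN's rounding of container requests does not change the result. The function and constant names are hypothetical.

```python
# Hypothetical sizing helper: largest executor heap that fits in one
# NodeManager's memory budget once YARN's memory overhead is added.
# ASSUMPTION: default spark.yarn.executor.memoryOverhead is
# max(384 MB, 10% of executor memory), truncated to whole MB.
# ASSUMPTION: the budget is a multiple of the YARN minimum allocation,
# so rounding container requests up does not matter here.

OVERHEAD_FRACTION = 0.10
OVERHEAD_MIN_MB = 384


def yarn_overhead_mb(executor_memory_mb: int) -> int:
    """Assumed default off-heap overhead added on top of the executor heap."""
    return max(OVERHEAD_MIN_MB, int(executor_memory_mb * OVERHEAD_FRACTION))


def max_executor_heap_mb(container_budget_mb: int) -> int:
    """Largest heap (in MB) such that heap + overhead <= budget."""
    # heap + overhead(heap) never decreases as heap grows, so a binary
    # search for the last heap size that still fits is exact.
    lo, hi = 0, container_budget_mb
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid + yarn_overhead_mb(mid) <= container_budget_mb:
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    budget_mb = 52 * 1024  # the 52GB from the table, in MB
    heap_mb = max_executor_heap_mb(budget_mb)
    print(heap_mb, yarn_overhead_mb(heap_mb))  # prints: 48408 4840
```

Under those assumptions, a 52GB (53248 MB) budget leaves room for a heap of about 47GB (48408 MB) plus roughly 4.7GB of overhead, rather than a full 52GB heap.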