I don't think the YARN default of an 8GB maximum container size is a good justification for limiting memory per worker. That's a somewhat arbitrary number that dates from an era when MapReduce was the main YARN application and machines generally had less memory. I expect it to be configured much higher in practice on most clusters running Spark.
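To make that concrete, here's a rough sketch (not from the thread) of requesting larger executors from YARN. It assumes the cluster admin has already raised yarn.scheduler.maximum-allocation-mb (the 8192 MB default in yarn-site.xml) above the requested size; the app name and memory figure are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    // Ask YARN for executors larger than the old 8 GB default container cap.
    // This only works if yarn.scheduler.maximum-allocation-mb on the cluster
    // has been raised to at least the requested executor size.
    val conf = new SparkConf()
      .setMaster("yarn-client")                // run against the YARN ResourceManager
      .setAppName("large-executor-sketch")     // placeholder name
      .set("spark.executor.memory", "24g")     // per-executor JVM heap requested from YARN

    val sc = new SparkContext(conf)

The point being that the 8 GB figure is a scheduler-side ceiling, not anything inherent to Spark's per-worker memory.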
YARN integration is actually complete in CDH5.0. We support it as well as standalone mode.

On Fri, Apr 18, 2014 at 11:49 AM, Sean Owen <so...@cloudera.com> wrote:
> On Fri, Apr 18, 2014 at 7:31 PM, Sung Hwan Chung
> <coded...@cs.stanford.edu> wrote:
> > Debasish,
> >
> > Unfortunately, we are bound to YARN, at least for the time being, because
> > that's what most of our customers would be using (unless, all the Hadoop
> > vendors start supporting standalone Spark - I think Cloudera might do
> > that?).
>
> Yes the CDH5.0.0 distro just runs Spark in stand-alone mode. Using the
> YARN integration is still being worked on.
>
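For anyone choosing between the two, the difference mostly comes down to the master URL handed to Spark; a minimal sketch, with the standalone master host being a placeholder:

    import org.apache.spark.{SparkConf, SparkContext}

    // YARN client mode: resources are allocated by the YARN ResourceManager.
    val onYarn = new SparkConf().setMaster("yarn-client").setAppName("yarn-sketch")

    // Standalone mode: connect to a Spark master (7077 is the default port);
    // spark-master.example.com is a placeholder host name.
    val standalone = new SparkConf()
      .setMaster("spark://spark-master.example.com:7077")
      .setAppName("standalone-sketch")

    val sc = new SparkContext(onYarn)  // or: new SparkContext(standalone)

Everything else in the application code stays the same either way.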