I don't think the YARN default of an 8GB maximum container size is a good justification for limiting memory per worker. That's a somewhat arbitrary number that dates from an era when MapReduce was the main YARN application and machines generally had less memory. I expect it to be configured much higher in practice on most clusters running Spark.
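To make that concrete, here's a rough sketch (not from the thread) of requesting larger executors from YARN. It assumes the cluster admin has already raised yarn.scheduler.maximum-allocation-mb (the 8192 MB default in yarn-site.xml) above the requested size; the app name and memory figure are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    // Ask YARN for executors larger than the old 8 GB default container cap.
    // This only works if yarn.scheduler.maximum-allocation-mb on the cluster
    // has been raised to at least the requested executor size.
    val conf = new SparkConf()
      .setMaster("yarn-client")                // run against the YARN ResourceManager
      .setAppName("large-executor-sketch")     // placeholder name
      .set("spark.executor.memory", "24g")     // per-executor JVM heap requested from YARN

    val sc = new SparkContext(conf)

The point being that the 8 GB figure is a scheduler-side ceiling, not anything inherent to Spark's per-worker memory.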
YARN integration is actually complete in CDH5.0. We support it as well as standalone mode.

On Fri, Apr 18, 2014 at 11:49 AM, Sean Owen <so...@cloudera.com> wrote:
> On Fri, Apr 18, 2014 at 7:31 PM, Sung Hwan Chung
> <coded...@cs.stanford.edu> wrote:
> > Debasish,
> >
> > Unfortunately, we are bound to YARN, at least for the time being, because
> > that's what most of our customers would be using (unless, all the Hadoop
> > vendors start supporting standalone Spark - I think Cloudera might do
> > that?).
>
> Yes the CDH5.0.0 distro just runs Spark in stand-alone mode. Using the
> YARN integration is still being worked on.
>
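For anyone choosing between the two, the difference mostly comes down to the master URL handed to Spark; a minimal sketch, with the standalone master host being a placeholder:

    import org.apache.spark.{SparkConf, SparkContext}

    // YARN client mode: resources are allocated by the YARN ResourceManager.
    val onYarn = new SparkConf().setMaster("yarn-client").setAppName("yarn-sketch")

    // Standalone mode: connect to a Spark master (7077 is the default port);
    // spark-master.example.com is a placeholder host name.
    val standalone = new SparkConf()
      .setMaster("spark://spark-master.example.com:7077")
      .setAppName("standalone-sketch")

    val sc = new SparkContext(onYarn)  // or: new SparkContext(standalone)

Everything else in the application code stays the same either way.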