Hi Sebastian,

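It is true that each process gets its own copy of the data, but only for
arrays smaller than 1M, the default value of joblib's "max_nbytes" parameter.
Above that size, an array is memmapped to a temp folder and shared by all
worker processes (see
https://pythonhosted.org/joblib/parallel.html#working-with-numerical-data-in-shared-memory-memmaping
).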
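
To put rough numbers on this, here is a back-of-envelope sketch. The helper
estimate_worker_input_bytes is hypothetical (not a joblib or scikit-learn
API), it only counts the input array, and it assumes joblib's default
threshold of "1M", which joblib parses as 1024**2 bytes.

```python
# Hypothetical helper (not a joblib or scikit-learn API): a rough estimate of
# how much memory one input array occupies on the worker side.
ONE_MEGABYTE = 1024 ** 2  # joblib parses max_nbytes="1M" as 1024**2 bytes


def estimate_worker_input_bytes(nbytes, n_jobs, max_nbytes=ONE_MEGABYTE):
    """Estimate worker-side bytes used by one input array of size `nbytes`."""
    if max_nbytes is not None and nbytes > max_nbytes:
        # Memmapped: the array is dumped once to a temp folder and every
        # worker maps the same file, so it is counted only once.
        return nbytes
    # Not memmapped: each worker process unpickles its own private copy.
    return n_jobs * nbytes


# Example: a 2 GiB training set processed by 8 worker processes.
data = 2 * 1024 ** 3
print(estimate_worker_input_bytes(data, n_jobs=8))                   # 2147483648
print(estimate_worker_input_bytes(data, n_jobs=8, max_nbytes=None))  # 17179869184
```

This ignores the memory used by the fitted trees themselves, which can also
be large, so treat it as a lower bound on the input-data side only.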

You can try lowering the "max_nbytes" parameter wherever Parallel is called
in the regression tree code, so that memmapping is triggered even for smaller
data and the data is not duplicated across all processes.
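
A minimal sketch of the idea using joblib directly, assuming a joblib version
whose Parallel accepts the documented "max_nbytes" and "mmap_mode"
parameters. column_sum and parallel_column_sums are hypothetical helpers for
illustration, not scikit-learn code, and the "1K" threshold is an arbitrary
low value chosen so that even a small array gets memmapped.

```python
import numpy as np
from joblib import Parallel, delayed


def column_sum(X, j):
    # When memmapping is active, X arrives in the worker as a read-only
    # np.memmap backed by a file in joblib's temp folder, not a private copy.
    return float(X[:, j].sum())


def parallel_column_sums(X, n_jobs=2, max_nbytes="1K"):
    # max_nbytes: arrays larger than this are memmapped instead of being
    # pickled once per worker (joblib's default is "1M").
    # mmap_mode="r": workers see the shared array as read-only.
    return Parallel(n_jobs=n_jobs, max_nbytes=max_nbytes, mmap_mode="r")(
        delayed(column_sum)(X, j) for j in range(X.shape[1])
    )


if __name__ == "__main__":
    # 1000 x 4 float64 values is about 32 KB: below the default "1M",
    # but above the "1K" threshold used here, so it is memmapped.
    X = np.random.RandomState(0).rand(1000, 4)
    print(parallel_column_sums(X))
```

Passing max_nbytes=None instead disables memmapping entirely, which brings
back one copy of the array per worker process.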

On Fri, Feb 12, 2016 at 11:42 AM, Sebastian Raschka <se.rasc...@gmail.com>
wrote:

> Hi, Waseem,
> I think lowering the value of n_jobs would help; as far as I know, each
> process gets its own copy of the data? I just stumbled upon spark-sklearn a
> few days ago; maybe that could help as well:
>
>
> https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html
>
> If I understand correctly, the data is still copied there, but each node
> gets a copy instead of one machine holding many copies.
>
> > On Feb 12, 2016, at 11:35 AM, muhammad waseem <m.waseem.ah...@gmail.com> wrote:
> >
> > Hi,
> >
> > I am trying to fit my model using regression trees, but it consumes a lot
> > of RAM, which makes my code unresponsive. From looking at different forums
> > and platforms, I think this is a common problem. I was wondering how to
> > free up memory, or what the best ways are to run the fitting process and
> > cross-validation without running out of memory. This problem occurs with
> > most regression tree models (and, I think, with other ML algorithms as
> > well). Should I avoid n_jobs=-1 and use some other value (e.g. n_jobs=10)
> > in cross_validation?
> >
> > Thanks
> > Kindest Regards
> > Waseem
> >
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general



-- 
Manoj,
http://github.com/MechCoder