@Sebastian: I tried running it with n_jobs=2 and you were right: it
uses around 27% of the RAM.
Does this mean I can use at most n_jobs=8 in my case (obviously this will
also depend on the number of estimators; more will require more RAM, won't
they?), or is there a bug?
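
A rough way to sanity-check the n_jobs=8 guess is to extrapolate from the two
figures reported in this thread (about 12% RAM at n_jobs=1 and about 27% at
n_jobs=2). The sketch below assumes that memory grows linearly with n_jobs,
which may not hold if there is a leak or an unnecessary copy; the function
name and the 100% budget are made up for illustration.

```python
def max_jobs_linear(pct_at_1, pct_at_2, budget_pct=100.0):
    """Largest n_jobs whose extrapolated RAM usage stays within budget_pct.

    Assumes usage(n) = pct_at_1 + (n - 1) * (pct_at_2 - pct_at_1),
    i.e. every extra job costs as much as the second one did.
    """
    per_extra_job = pct_at_2 - pct_at_1
    if per_extra_job <= 0:
        raise ValueError("usage did not grow between n_jobs=1 and n_jobs=2")
    # Solve pct_at_1 + (n - 1) * per_extra_job <= budget_pct for integer n.
    return int((budget_pct - pct_at_1) // per_extra_job) + 1


# Figures reported in this thread: ~12% at n_jobs=1, ~27% at n_jobs=2.
estimate = max_jobs_linear(12, 27)
print(estimate)
```

For these two figures the linear estimate is 6 rather than 8, since each
additional job appears to cost about 15% of RAM, not 12%.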

Also, could you share the code showing how you tackled it? I have seen part
of it, but is it possible to see the full code?

Thanks for your time.

Regards
Waseem

On Mon, Feb 15, 2016 at 9:25 PM, Sebastian Raschka <se.rasc...@gmail.com>
wrote:

> Hm, unfortunately, that's what I thought -- it sounds like a bug in
> joblib? Does anyone have any ideas how to track this down?
>
> @Waseem Can you also try n_jobs=2? Here, I'd expect one of the following:
> 1) It uses maybe 2 times the 12% plus a little bit extra if everything
> is working correctly with the multi-threading.
> 2) If you see something like ~30%, I'd say that there's an unnecessary
> copy being made.
> 3) If you see something like > 30%, there would be a memory leak somewhere.
>
> I mentioned scenario 3, because I observed a very similar behavior once:
> (see https://github.com/scikit-learn/scikit-learn/issues/3973)
>
> "I made some weird observations that my GridSearches keep failing after a
> couple of hours and I initially couldn't figure out why. I monitored the
> memory usage then over time and saw that it started with a few gigabytes
> (~6 Gb) and kept increasing until it crashed the node when it reached the
> max. 128 Gb the hardware can take. I was experimenting with random forests
> for classification of a large number of text documents. For simplicity --
> to figure out what's going on -- I went back to naive Bayes.
> ...
> After some experimentation, I finally found out that
>
> gc.collect()
> len(gc.get_objects()) # particularly this part!
>
> in the for loop solves the problem and the memory usage stays constantly
> at 6.5 Gb over the run time of ~10 hours."
>
>
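
For reference, the workaround quoted above amounts to forcing a garbage
collection once per loop iteration. Below is a minimal stdlib-only sketch of
that pattern. fit_one_fold is a hypothetical stand-in for the real fit/score
step, and the reference cycle it leaves behind is only there to give the
collector something to free. Whether this helps in a given scikit-learn run
is not something the sketch can show.

```python
import gc


def fit_one_fold(fold):
    """Hypothetical stand-in for one fit/score step of a search loop."""
    scratch = {"fold": fold}
    scratch["self"] = scratch  # reference cycle: freed only by the cyclic GC
    return fold * fold


def run_folds(n_folds):
    scores = []
    unreachable_found = 0
    for fold in range(n_folds):
        scores.append(fit_one_fold(fold))
        # The two calls from the quoted workaround, once per iteration.
        unreachable_found += gc.collect()
        len(gc.get_objects())
    return scores, unreachable_found


scores, unreachable_found = run_folds(3)
print(scores, unreachable_found)
```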
> > On Feb 15, 2016, at 9:37 AM, muhammad waseem <m.waseem.ah...@gmail.com> wrote:
> >
> > @Sebastian: I have tried to run cross_validation by using n_jobs=1 and
> > it did not use SWAP memory, even the RAM usage was quite low (maximum 12%).
> > However, this will take a longer time to finish. Any idea what to try now?
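
To compare runs with something more precise than a system monitor, the peak
memory of a call can be recorded from inside Python. The sketch below uses the
standard-library tracemalloc module; measure_peak and the bytearray workload
are placeholders, not code from this thread. Note that tracemalloc only sees
allocations made through Python's allocators in the current process, so
worker processes and overall RAM/swap usage are not covered.

```python
import tracemalloc


def measure_peak(func, *args, **kwargs):
    """Run func and return (result, peak traced bytes during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def workload(n_bytes):
    """Placeholder for a model fit: allocates one buffer of n_bytes."""
    buf = bytearray(n_bytes)
    return len(buf)


size, peak_bytes = measure_peak(workload, 5_000_000)
print(size, peak_bytes)
```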
> >
> > Thanks
> > Kindest Regards
> > Waseem
> >
> > On Fri, Feb 12, 2016 at 9:58 PM, Jacob Schreiber <jmschreibe...@gmail.com> wrote:
> > I don't think that the data is copied for tree based classifiers. It
> > uses the threading backend, so each thread should be sharing memory.
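
The point about threads sharing memory can be illustrated with the standard
library alone. This is only an analogy using concurrent.futures, not joblib's
or scikit-learn's actual code; data and worker are made-up names. Every
thread reports the identity of the object it sees, and all of them see the
same one, so no per-thread copy of the data exists.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(100_000))  # stand-in for the training data


def worker(_):
    # Threads live in one process, so they all reference the same object.
    return id(data)


with ThreadPoolExecutor(max_workers=4) as pool:
    seen_ids = set(pool.map(worker, range(4)))

shared = seen_ids == {id(data)}
print(shared)
```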
> >
> > On Fri, Feb 12, 2016 at 12:32 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> > I'd suggest trying n_jobs=1 and checking if swap memory is used (you don't
> > have to run it until completion). If this runs fine without swap, we can
> > work further from there.
> >
> > On Feb 12, 2016, at 2:57 PM, muhammad waseem <m.waseem.ah...@gmail.com> wrote:
> >
> >> @Sebastian: I tried with n_jobs=10 (total is equal to 12) and it still
> >> caused the same problem. I could try running it with n_jobs=1, but it
> >> would be so slow that it would take ages to complete. The machine has 32GB
> >> RAM and it started using swap memory after consuming all of the RAM.
> >>
> >> Is there a way to tackle this, or do you really think that all this k-fold
> >> cross-validation and training should be done using Spark's MLlib?
> >>
> >> Thanks
> >> Regards
> >> Waseem
> >>
> >>
> >> On Fri, Feb 12, 2016 at 6:40 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> >> Thanks for the note, Manoj, didn't know that!
> >>
> >> @muhammad So if there's no duplication of data across all processes, I
> >> guess that you would also run into trouble with n_jobs=1. But just to
> >> make sure that data duplication is not an issue, could you try running it
> >> with n_jobs=1? In this case, probably only a smaller data set or a machine
> >> with larger memory would help. Here, I'd probably think about using Spark's
> >> MLlib to deal with this particular dataset.
> >>
> >>> On Feb 12, 2016, at 12:30 PM, muhammad waseem <m.waseem.ah...@gmail.com> wrote:
> >>>
> >>> Hi Sebastian and Manoj,
> >>> @Manoj: What should the value of the max_nbytes parameter be, and will this
> >>> affect the results and the time it takes to run cross_validation,
> >>> grid_search, etc.?
> >>> @Sebastian: Will the Spark implementation also improve the memory
> >>> use or just the CPU?
> >>>
> >>>
> >>> Thanks
> >>> Kindest Regards
> >>>
> >>> On Fri, Feb 12, 2016 at 5:29 PM, muhammad waseem <m.waseem.ah...@gmail.com> wrote:
> >>> Hi Sebastian and Manoj,
> >>> @Manoj: What should the value of the max_nbytes parameter be, and will this
> >>> affect the results and the time it takes to run cross_validation,
> >>> grid_search, etc.?
> >>>
> >>> Thanks
> >>> Kindest Regards
> >>> Waseem
> >>>
> >>> On Fri, Feb 12, 2016 at 4:42 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> >>> Hi, Waseem,
> >>> I think lowering the value of n_jobs would help; as far as I know,
> >>> each process gets a copy of the data? Just stumbled upon spark-sklearn a few
> >>> days ago, maybe that could help as well:
> >>>
> >>> https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html
> >>>
> >>> If I understand correctly, the data is still copied, but here, each
> >>> node gets a copy instead of one machine with many copies.
> >>>
> >>>
> >>>
> >>>
> >>> > On Feb 12, 2016, at 11:35 AM, muhammad waseem <m.waseem.ah...@gmail.com> wrote:
> >>> >
> >>> > Hi,
> >>> >
> >>> > I am trying to fit my model using regression trees, but the problem
> >>> > is that it consumes a lot of RAM, which makes my code unresponsive. From
> >>> > looking at different forums and platforms, I think this is a common
> >>> > problem. I was wondering how you free up memory, or what the best ways are
> >>> > to run the fitting process/cross-validation without running out of memory?
> >>> > This problem occurs with almost all regression trees (I think with other ML
> >>> > algorithms as well). Shall I try to run without n_jobs=-1 and use some other
> >>> > value (e.g. n_jobs=10) in cross_validation?
> >>> >
> >>> > Thanks
> >>> > Kindest Regards
> >>> > Waseem
> >>> >
> >>> > ------------------------------------------------------------------------------
> >>> > Site24x7 APM Insight: Get Deep Visibility into Application Performance
> >>> > APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
> >>> > Monitor end-to-end web transactions and take corrective actions now
> >>> > Troubleshoot faster and improve end-user experience. Signup Now!
> >>> > http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
> >>> > _______________________________________________
> >>> > Scikit-learn-general mailing list
> >>> > Scikit-learn-general@lists.sourceforge.net
> >>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general