Hm, if the others are right and the data set shouldn't be copied for each 
process, then I guess that's a bug. Maybe you could create a reproducible 
example and post it on the issue tracker?
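
As a rough back-of-the-envelope check on the numbers quoted below (about 12% of RAM at n_jobs=1 and about 27% at n_jobs=2), here is a small sketch. It assumes that memory grows linearly with the number of workers and that nothing else on the machine needs RAM; both are simplifications, and the function names are just made up for illustration.

```python
def per_worker_overhead(baseline_pct, observed_pct, n_jobs):
    """Average extra RAM (in % of total) that each additional worker adds."""
    if n_jobs < 2:
        raise ValueError("need a run with n_jobs >= 2 to estimate overhead")
    return (observed_pct - baseline_pct) / (n_jobs - 1)


def max_jobs_that_fit(baseline_pct, observed_pct, n_jobs, budget_pct=100.0):
    """Largest n_jobs whose projected usage stays within budget_pct,
    assuming usage grows linearly with the number of workers."""
    extra = per_worker_overhead(baseline_pct, observed_pct, n_jobs)
    if extra <= 0:
        return None  # no measurable per-worker growth: memory looks shared
    return 1 + int((budget_pct - baseline_pct) // extra)


# Figures reported in this thread: ~12% at n_jobs=1, ~27% at n_jobs=2.
extra = per_worker_overhead(12.0, 27.0, 2)
limit = max_jobs_that_fit(12.0, 27.0, 2)
print(extra, limit)
```

With these figures the second worker costs about as much as the whole n_jobs=1 run, which is what a full per-worker copy of the data would look like. Measuring one or two more n_jobs values would show whether the linear assumption holds.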

> Also, could you share the code for the way you tackled it? I have seen part 
> of it but is it possible to see full code?

I have it here:
https://github.com/rasbt/bugreport/tree/master/scikit-learn/gridsearch_memory

(but be aware that it's more than a year old and I haven't tested it again since then)
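
The core of that workaround, as quoted from the issue report further down, was calling gc.collect() and len(gc.get_objects()) inside the search loop. Below is a minimal sketch of where those two calls would sit, assuming a hand-written loop over a parameter grid. fit_and_score and the grid values are hypothetical placeholders, not the code from the repository above, and the thread does not explain why the second call matters.

```python
import gc
from itertools import product


def fit_and_score(params):
    """Hypothetical stand-in for fitting one model and returning its score."""
    # Allocate some throwaway objects, as a real fit would.
    scratch = [[p] * 1000 for p in params.values()]
    return sum(len(row) for row in scratch) * params["alpha"]


# Hypothetical parameter grid; names and values are placeholders.
grid = {"alpha": [0.1, 1.0, 10.0], "ngram_max": [1, 2]}

results = []
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    results.append((params, fit_and_score(params)))

    # The two calls quoted from the issue report: force a collection
    # pass, then walk the list of objects tracked by the collector.
    gc.collect()
    len(gc.get_objects())

best_params, best_score = max(results, key=lambda item: item[1])
print(best_params, best_score)
```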

> On Feb 17, 2016, at 2:25 PM, muhammad waseem <m.waseem.ah...@gmail.com> wrote:
> 
> @Sebastian: I have tried running it with n_jobs=2 and you were right: it 
> uses around 27% of the RAM.
> Does this mean I can only use at most n_jobs=8 in my case (obviously this 
> will also depend on the number of estimators; more will require more RAM, 
> won't it?), or is there a bug?
> 
> Also, could you share the code for the way you tackled it? I have seen part 
> of it but is it possible to see full code?
> 
> Thanks for your time.
> 
> Regards
> Waseem
> 
> On Mon, Feb 15, 2016 at 9:25 PM, Sebastian Raschka <se.rasc...@gmail.com> 
> wrote:
> Hm, unfortunately, that's what I thought -- sounds like a bug in joblib? 
> Does anyone have any ideas how to track this down?
> 
> @Waseem Can you also try n_jobs=2? Here, I'd expect one of the following:
> 1) It uses maybe 2 times the 12% plus a little bit extra, if everything is 
> working correctly with the multi-threading.
> 2) If you see something like ~30%, I'd say that there's an unnecessary copy 
> being made.
> 3) If you see something > 30%, there would be a memory leak somewhere.
> 
> I mentioned scenario 3 because I observed very similar behavior once
> (see https://github.com/scikit-learn/scikit-learn/issues/3973):
> 
> "I made some weird observations that my GridSearches keep failing after a 
> couple of hours and I initially couldn't figure out why. I then monitored 
> the memory usage over time and saw that it started with a few gigabytes 
> (~6 GB) and kept increasing until it crashed the node when it reached the 
> max. 128 GB the hardware can take. I was experimenting with random forests 
> for classification of a large number of text documents. For simplicity -- to 
> figure out what's going on -- I went back to naive Bayes.
> ...
> After some experimentation, I finally found out that
> 
> gc.collect()
> len(gc.get_objects()) # particularly this part!
> 
> in the for loop solves the problem and the memory usage stays constant at 
> 6.5 GB over the run time of ~10 hours."
> 
> 
> > On Feb 15, 2016, at 9:37 AM, muhammad waseem <m.waseem.ah...@gmail.com> 
> > wrote:
> >
> > @Sebastian: I have tried to run cross_validation with n_jobs=1 and it 
> > did not use swap memory; even the RAM usage was quite low (maximum 12%). 
> > However, this will take a longer time to finish. Any idea what to try now?
> >
> > Thanks
> > Kindest Regards
> > Waseem
> >
> > On Fri, Feb 12, 2016 at 9:58 PM, Jacob Schreiber <jmschreibe...@gmail.com> 
> > wrote:
> > I don't think that the data is copied for tree-based classifiers. They use 
> > the threading backend, so each thread should be sharing memory.
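
To illustrate the general point about threads sharing memory, here is a small sketch using only the Python standard library. It is not scikit-learn's or joblib's code; the list `data` is a hypothetical stand-in for a large training set. Every worker thread reads the same object instead of receiving its own copy.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(100_000))  # stand-in for a large training set


def worker(offset):
    # Each thread reads the very same list object; it is never pickled
    # or copied on its way into the thread.
    partial = sum(data[i] for i in range(offset, len(data), 4))
    return id(data), partial


with ThreadPoolExecutor(max_workers=4) as pool:
    outcomes = list(pool.map(worker, range(4)))

seen_ids = {obj_id for obj_id, _ in outcomes}
total = sum(part for _, part in outcomes)
print(len(seen_ids), total == sum(data))
```

With a process-based backend the arguments generally have to be transferred to each worker, which is where per-worker copies can come from.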
> >
> > On Fri, Feb 12, 2016 at 12:32 PM, Sebastian Raschka <se.rasc...@gmail.com> 
> > wrote:
> > I'd suggest trying n_jobs=1 and checking if swap memory is used (you don't 
> > have to run it until completion). If this runs fine without swap, we can 
> > work further from there.
> >
> > Sent from my iPhone
> >
> > On Feb 12, 2016, at 2:57 PM, muhammad waseem <m.waseem.ah...@gmail.com> 
> > wrote:
> >
> >> @Sebastian: I tried with n_jobs=10 (the total is 12) and it still 
> >> caused the same problem. I could try running it with n_jobs=1, but it 
> >> would be so slow that it would take ages to complete. The machine has 32GB 
> >> of RAM and it started using swap memory after filling up the RAM.
> >>
> >> Is there a way to tackle this, or do you really think that all this k-fold 
> >> cross-validation and training should be done using Spark's MLlib?
> >>
> >> Thanks
> >> Regards
> >> Waseem
> >>
> >>
> >> On Fri, Feb 12, 2016 at 6:40 PM, Sebastian Raschka <se.rasc...@gmail.com> 
> >> wrote:
> >> Thanks for the note, Manoj, didn't know that!
> >>
> >> @muhammad So if there's no duplication of data across all processes, I 
> >> guess that you would also run into trouble with n_jobs=1. But just to 
> >> make sure that data duplication is not an issue, could you try running it 
> >> with n_jobs=1? If it fails there too, probably only a smaller data set or 
> >> a machine with more memory would help. Here, I'd probably think about 
> >> using Spark's MLlib to deal with this particular dataset.
> >>
> >>> On Feb 12, 2016, at 12:30 PM, muhammad waseem <m.waseem.ah...@gmail.com> 
> >>> wrote:
> >>>
> >>> Hi Sebastian and Manoj,
> >>> @Manoj: What should the value of the max_nbytes parameter be, and will 
> >>> this affect the results and the time it takes to run cross_validation, 
> >>> grid_search, etc.?
> >>> @Sebastian: Will the Spark implementation also improve the memory use 
> >>> or just the CPU use?
> >>>
> >>>
> >>> Thanks
> >>> Kindest Regards
> >>>
> >>> On Fri, Feb 12, 2016 at 5:29 PM, muhammad waseem 
> >>> <m.waseem.ah...@gmail.com> wrote:
> >>> Hi Sebastian and Manoj,
> >>> @Manoj: What should the value of the max_nbytes parameter be, and will 
> >>> this affect the results and the time it takes to run cross_validation, 
> >>> grid_search, etc.?
> >>>
> >>> Thanks
> >>> Kindest Regards
> >>> Waseem
> >>>
> >>> On Fri, Feb 12, 2016 at 4:42 PM, Sebastian Raschka <se.rasc...@gmail.com> 
> >>> wrote:
> >>> Hi, Waseem,
> >>> I think lowering the value of n_jobs would help; as far as I know, each 
> >>> process gets a copy of the data? Just stumbled upon spark-sklearn a few 
> >>> days ago, maybe that could help as well:
> >>>
> >>> https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html
> >>>
> >>> If I understand correctly, the data is still copied, but here, each 
> >>> node gets a copy instead of one machine holding many copies.
> >>>
> >>>
> >>>
> >>>
> >>> > On Feb 12, 2016, at 11:35 AM, muhammad waseem <m.waseem.ah...@gmail.com> 
> >>> > wrote:
> >>> >
> >>> > Hi,
> >>> >
> >>> > I am trying to fit my model using regression trees, but the problem is 
> >>> > that it consumes a lot of RAM, which makes my code unresponsive. From 
> >>> > looking at different forums and platforms, I think this is a common 
> >>> > problem. I was wondering how you free up memory, or what the best ways 
> >>> > are to run the fitting process/cross-validation without running out of 
> >>> > memory. This problem occurs with mostly all regression trees (and, I 
> >>> > think, with other ML algorithms as well). Shall I try to run without 
> >>> > n_jobs=-1 and use some other value (e.g. n_jobs=10) in cross_validation?
> >>> >
> >>> > Thanks
> >>> > Kindest Regards
> >>> > Waseem
> >>> > ------------------------------------------------------------------------------
> >>> > Site24x7 APM Insight: Get Deep Visibility into Application Performance
> >>> > APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
> >>> > Monitor end-to-end web transactions and take corrective actions now
> >>> > Troubleshoot faster and improve end-user experience. Signup Now!
> >>> > http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
> >>> > _______________________________________________
> >>> > Scikit-learn-general mailing list
> >>> > Scikit-learn-general@lists.sourceforge.net
> >>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >>>
> >>>
> >>
> >>
> >
> 
> 
