@Waseem Oh, wait, I just saw that we already have an open issue for that; 
please see: https://github.com/scikit-learn/scikit-learn/issues/3973 It would 
be great if you could add to the discussion there. Meanwhile, I will try to run 
my code again in the next few days to check whether this bug still persists.


> On Feb 15, 2016, at 4:25 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> 
> Hm, unfortunately, that's what I thought -- sounds like a bug involving 
> joblib? Does anyone have any ideas on how to track this down?
> 
> @Waseem Can you also try n_jobs=2? Here, I'd expect one of the following:
> 1) It would use maybe 2 times the 12% plus a little bit extra if everything 
> is working correctly with the multi-threading. 
> 2) If you see something like ~30%, I'd say that an unnecessary copy is 
> being made. 
> 3) If you see something like more than 30%, there would be a memory leak 
> somewhere.
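
To make the arithmetic behind the three scenarios above explicit, here is a rough sketch. The 12% baseline and the ~30% mark are taken from the message; the function name `triage_memory_usage` and the `tolerance_pct` fudge factor are purely hypothetical, so treat the cut-offs as a rule of thumb rather than a diagnostic.

```python
def triage_memory_usage(observed_pct, baseline_pct=12.0, n_jobs=2,
                        copy_mark_pct=30.0, tolerance_pct=2.0):
    """Map an observed RAM percentage onto the three scenarios above.

    baseline_pct and copy_mark_pct come from the message (12% with
    n_jobs=1, ~30% as the "unnecessary copy" mark). tolerance_pct is a
    hypothetical allowance for what counts as "about 30%".
    """
    # Ideal case before any overhead: 2 x 12% = 24%.
    expected_pct = n_jobs * baseline_pct
    if observed_pct < copy_mark_pct - tolerance_pct:
        scenario = "1) roughly as expected"
    elif observed_pct <= copy_mark_pct + tolerance_pct:
        scenario = "2) possibly an unnecessary copy"
    else:
        scenario = "3) possibly a memory leak"
    return expected_pct, scenario


for pct in (26.0, 30.0, 60.0):
    print(pct, triage_memory_usage(pct))
```

The point is only that the n_jobs=1 measurement gives a baseline to compare against; the exact thresholds would need adjusting for a different machine or dataset.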
> 
> I mention scenario 3 because I once observed very similar behavior 
> (see https://github.com/scikit-learn/scikit-learn/issues/3973):
> 
> "I made some weird observations that my GridSearches keep failing after a 
> couple of hours and I initially couldn't figure out why. I monitored the 
> memory usage then over time and saw that it started with a few gigabytes 
> (~6 GB) and kept increasing until it crashed the node when it reached the 
> max. 128 GB the hardware can take. I was experimenting with random forests 
> for classification of a large number of text documents. For simplicity -- to 
> figure out what's going on -- I went back to naive Bayes.
> ...
> After some experimentation, I finally found out that 
> 
> gc.collect()
> len(gc.get_objects()) # particularly this part!
> 
> in the for loop solves the problem and the memory usage stays constant at 
> 6.5 GB over the run time of ~10 hours."
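
For reference, a minimal sketch of where those two calls would sit in a loop. The original loop is not shown in this thread, so `run_one_iteration` is a hypothetical stand-in for one model-fitting step, and the sketch only reproduces the call pattern; it does not explain why the `gc.get_objects()` call made a difference.

```python
import gc


def run_one_iteration(i):
    # Hypothetical stand-in for one grid-search / model-fitting step.
    data = [[j] * 10 for j in range(1000)]
    return sum(len(row) for row in data) + i


def run_with_gc(n_iterations):
    """Run the loop, forcing a garbage collection after each iteration."""
    results = []
    object_counts = []
    for i in range(n_iterations):
        results.append(run_one_iteration(i))
        gc.collect()  # force a full collection pass
        # Number of objects currently tracked by the collector.
        object_counts.append(len(gc.get_objects()))
    return results, object_counts


results, object_counts = run_with_gc(5)
print(results)
print(object_counts)
```

Logging the object count per iteration also gives a cheap way to see whether something keeps accumulating between iterations.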
> 
> 
>> On Feb 15, 2016, at 9:37 AM, muhammad waseem <m.waseem.ah...@gmail.com> 
>> wrote:
>> 
>> @Sebastian: I have tried to run cross_validation using n_jobs=1 and it 
>> did not use swap memory; even the RAM usage was quite low (maximum 12%). 
>> However, this will take longer to finish. Any idea what to try now?
>> 
>> Thanks
>> Kindest Regards
>> Waseem
>> 
>> On Fri, Feb 12, 2016 at 9:58 PM, Jacob Schreiber <jmschreibe...@gmail.com> 
>> wrote:
>> I don't think that the data is copied for tree-based classifiers. They use 
>> the threading backend, so each thread should be sharing memory.
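
As a general illustration of why threads should not need a copy of the data, here is a small sketch using the standard library's `ThreadPoolExecutor`. This is an assumption-laden stand-in: it is not joblib's threading backend and says nothing about scikit-learn's internals; it only shows that worker threads in one process see the very same object.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(1_000_000))  # stand-in for a large training set


def worker(_):
    # Every thread reads the same `data` object; the large list is not
    # copied (only the tiny 10-element slice below is).
    return id(data), sum(data[:10])


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(worker, range(4)))

# True if every worker saw the identical object in memory.
same_object = all(obj_id == id(data) for obj_id, _ in results)
print(same_object)
```

With separate worker processes instead of threads, each worker would typically receive its own pickled copy of the input, which is the duplication discussed elsewhere in this thread.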
>> 
>> On Fri, Feb 12, 2016 at 12:32 PM, Sebastian Raschka <se.rasc...@gmail.com> 
>> wrote:
>> I'd suggest trying n_jobs=1 and checking whether swap memory is used (you 
>> don't have to run it until completion). If this runs fine without swap, we 
>> can work further from there. 
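
One way to check swap usage while the job runs, sketched under the assumption of a Linux machine that exposes `/proc/meminfo`. The helper name `swap_used_kib` is hypothetical and the sample numbers below are made up for illustration.

```python
def swap_used_kib(meminfo_text):
    """Return the swap in use (KiB) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            # The first token after the colon is the numeric value.
            fields[key.strip()] = int(rest.split()[0])
    return fields["SwapTotal"] - fields["SwapFree"]


# Made-up sample in the same layout as /proc/meminfo.
sample = """MemTotal:       32780000 kB
SwapTotal:       8388604 kB
SwapFree:        8000000 kB
"""
print(swap_used_kib(sample))
```

On a real Linux box, the same function could be fed `open("/proc/meminfo").read()` before and during the n_jobs=1 run; a value that stays near zero would suggest the run fits in RAM.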
>> 
>> Sent from my iPhone
>> 
>> On Feb 12, 2016, at 2:57 PM, muhammad waseem <m.waseem.ah...@gmail.com> 
>> wrote:
>> 
>>> @Sebastian: I tried with n_jobs=10 (the total is 12) and it still 
>>> caused the same problem. I could try running it with n_jobs=1, but it 
>>> would be so slow that it would take ages to complete. The machine has 32GB 
>>> RAM and it started using swap memory after consuming all of the RAM. 
>>> 
>>> Is there a way to tackle this, or do you really think that all this k-fold 
>>> cross-validation and training should be done using Spark's MLlib?
>>> 
>>> Thanks
>>> Regards
>>> Waseem
>>> 
>>> 
>>> On Fri, Feb 12, 2016 at 6:40 PM, Sebastian Raschka <se.rasc...@gmail.com> 
>>> wrote:
>>> Thanks for the note, Manoj, didn't know that!
>>> 
>>> @muhammad So if there's no duplication of data across all processes, I 
>>> guess that you would also run into trouble with n_jobs=1. But just to 
>>> make sure that data duplication is not an issue, could you try running it 
>>> with n_jobs=1? If that also runs out of memory, probably only a smaller 
>>> data set or a machine with more memory would help. Here, I'd probably 
>>> think about using Spark's MLlib to deal with this particular dataset.
>>> 
>>>> On Feb 12, 2016, at 12:30 PM, muhammad waseem <m.waseem.ah...@gmail.com> 
>>>> wrote:
>>>> 
>>>> Hi Sebastian and Manoj, 
>>>> @Manoj: What should the value of the max_nbytes parameter be, and will 
>>>> this affect the results and the time it takes to run cross_validation, 
>>>> grid_search, etc.?
>>>> @Sebastian: Will the Spark implementation also improve the memory use, or 
>>>> just the CPU use?
>>>> 
>>>> 
>>>> Thanks
>>>> Kindest Regards
>>>> 
>>>> On Fri, Feb 12, 2016 at 5:29 PM, muhammad waseem 
>>>> <m.waseem.ah...@gmail.com> wrote:
>>>> Hi Sebastian and Manoj, 
>>>> @Manoj: What should the value of the max_nbytes parameter be, and will 
>>>> this affect the results and the time it takes to run cross_validation, 
>>>> grid_search, etc.?
>>>> 
>>>> Thanks
>>>> Kindest Regards
>>>> Waseem 
>>>> 
>>>> On Fri, Feb 12, 2016 at 4:42 PM, Sebastian Raschka <se.rasc...@gmail.com> 
>>>> wrote:
>>>> Hi, Waseem,
>>>> I think lowering the value of n_jobs would help; as far as I know, each 
>>>> process gets a copy of the data? Just stumbled upon spark-sklearn a few 
>>>> days ago, maybe that could help as well:
>>>> 
>>>> https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html
>>>> 
>>>> If I understand correctly, the data is still copied, but here, each node 
>>>> gets a copy instead of one machine holding many copies.
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Feb 12, 2016, at 11:35 AM, muhammad waseem <m.waseem.ah...@gmail.com> 
>>>>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am trying to fit my model using regression trees, but the problem is 
>>>>> that it consumes a lot of RAM, which makes my code unresponsive. From 
>>>>> looking at different forums and platforms, I think this is a common 
>>>>> problem. I was wondering how you free up memory, or what the best ways 
>>>>> are to run the fitting process/cross-validation without running out of 
>>>>> memory? This problem occurs with nearly all regression trees (and, I 
>>>>> think, with other ML algorithms as well). Shall I try to run without 
>>>>> n_jobs=-1 and use some other value (e.g. n_jobs=10) in cross_validation?
>>>>> 
>>>>> Thanks
>>>>> Kindest Regards
>>>>> Waseem
>>>>> ------------------------------------------------------------------------------
>>>>> Site24x7 APM Insight: Get Deep Visibility into Application Performance
>>>>> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>>>>> Monitor end-to-end web transactions and take corrective actions now
>>>>> Troubleshoot faster and improve end-user experience. Signup Now!
>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
>>>>> _______________________________________________
>>>>> Scikit-learn-general mailing list
>>>>> Scikit-learn-general@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
