I'd suggest trying n_jobs=1 and checking whether swap memory is used (you don't have to 
run it to completion). If this runs fine without swap, we can work further 
from there. 
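A minimal sketch of the check suggested above. It assumes a Unix machine (the stdlib `resource` module is not available on Windows) and uses a `DecisionTreeRegressor` on hypothetical random data as a stand-in for the real model and dataset. With n_jobs=1 every fold runs in the current process, so that process's peak resident memory is a reasonable proxy for how much RAM the fit needs. `cross_val_score` is imported from the current `sklearn.model_selection`; in 2016 it lived in `sklearn.cross_validation`.

```python
import resource
import sys

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor


def peak_rss_mib():
    """Peak resident set size of the current process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return peak / (1024 ** 2 if sys.platform == "darwin" else 1024)


# Hypothetical stand-in data; substitute the real X and y.
rng = np.random.RandomState(0)
X = rng.rand(5000, 20)
y = rng.rand(5000)

before = peak_rss_mib()
# n_jobs=1: every fold is fitted in this process, so no per-worker copies of X exist.
scores = cross_val_score(DecisionTreeRegressor(random_state=0), X, y, cv=3, n_jobs=1)
after = peak_rss_mib()

print(f"peak RSS before CV: {before:.1f} MiB, after CV: {after:.1f} MiB")
```

Running this on a subsample and watching `free -m` or `top` in another terminal at the same time shows directly whether swap starts being used.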

> On Feb 12, 2016, at 2:57 PM, muhammad waseem <m.waseem.ah...@gmail.com> wrote:
> 
> @Sebastian: I tried with n_jobs=10 (the total is 12) and it still 
> caused the same problem. I could try running it with n_jobs=1, but it 
> would be so slow that it would take ages to complete. The machine has 32GB of RAM, 
> and it started using swap memory after filling up the RAM. 
> 
> Is there a way to tackle this, or do you really think that all this k-fold 
> cross-validation and training should be done using Spark's MLlib?
> 
> Thanks
> Regards
> Waseem
> 
> 
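One scikit-learn option that sits between n_jobs=10 and n_jobs=1 (it is not mentioned in the thread): `GridSearchCV` and `cross_val_score` accept a `pre_dispatch` argument that limits how many jobs are queued at once, which the scikit-learn documentation describes as a way to avoid an explosion of memory consumption when more jobs are dispatched than CPUs can process. This is a sketch assuming the modern `sklearn.model_selection` module (in 2016 `GridSearchCV` lived in `sklearn.grid_search`); the data and parameter grid are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in data; substitute the real X and y.
rng = np.random.RandomState(0)
X = rng.rand(2000, 10)
y = rng.rand(2000)

# Hypothetical grid; larger leaves give smaller fitted trees.
param_grid = {"min_samples_leaf": [1, 5, 20]}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=KFold(n_splits=5),
    n_jobs=4,               # fewer workers than cores -> fewer simultaneous copies
    pre_dispatch="n_jobs",  # queue only as many tasks as there are workers
)
search.fit(X, y)
print(search.best_params_)
```

The default is `pre_dispatch="2*n_jobs"`; lowering it trades a little scheduling slack for a smaller number of tasks (and their data) held in memory at any moment.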
>> On Fri, Feb 12, 2016 at 6:40 PM, Sebastian Raschka <se.rasc...@gmail.com> 
>> wrote:
>> Thanks for the note, Manoj, didn't know that!
>> 
>> @muhammad So if there's no duplication of data across all processes, I guess 
>> that you would also run into trouble with n_jobs=1. But just to make 
>> sure that data duplication is not the issue, could you try running it with 
>> n_jobs=1? In that case, probably only a smaller dataset or a machine with 
>> more memory would help. Here, I'd probably think about using Spark's MLlib 
>> to deal with this particular dataset.
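To make the duplication argument concrete, here is a back-of-the-envelope sketch assuming the worst case, in which every worker process holds its own full copy of the feature matrix. The 5,000,000 × 100 shape is hypothetical, not the actual dataset from the thread.

```python
import numpy as np


def copies_footprint_gib(n_rows, n_cols, dtype, n_workers):
    """Memory in GiB if each of n_workers holds its own copy of an
    n_rows x n_cols array of the given dtype."""
    itemsize = np.dtype(dtype).itemsize
    return n_rows * n_cols * itemsize * n_workers / 1024 ** 3


# Hypothetical dataset shape, compared for one worker vs. ten workers.
for workers in (1, 10):
    for dtype in (np.float64, np.float32):
        gib = copies_footprint_gib(5_000_000, 100, dtype, workers)
        print(f"{workers:>2} workers, {np.dtype(dtype).name}: {gib:.1f} GiB")
```

Under these assumptions a single float64 copy is about 3.7 GiB, while ten copies come to roughly 37 GiB, already more than 32GB of RAM before any trees are grown. That is why comparing n_jobs=1 against n_jobs=10 separates "too many copies" from "the data is simply too big".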
>> 
>>> On Feb 12, 2016, at 12:30 PM, muhammad waseem <m.waseem.ah...@gmail.com> 
>>> wrote:
>>> 
>>> Hi Sebastian and Manoj, 
>>> @Manoj: What should the value of the max_nbytes parameter be, and will it 
>>> affect the results and the time it takes to run cross_validation, grid_search, 
>>> etc.?
>>> @Sebastian: Will the Spark implementation also improve memory use, or 
>>> just CPU use?
>>> 
>>> 
>>> Thanks
>>> Kindest Regards
>>> 
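On the max_nbytes question: max_nbytes is a parameter of `joblib.Parallel`, the library behind scikit-learn's n_jobs. NumPy arrays larger than this threshold are dumped to a temporary folder and memory-mapped, so workers share one read-only copy instead of each receiving a pickled copy ("1M", one megabyte, is joblib's default). It changes how data reaches the workers, not what is computed, so results should be unaffected; the cost is a one-time write of the array to disk, and the workers see it as read-only. The sketch below calls `joblib.Parallel` directly with a hypothetical per-column function; whether a particular scikit-learn version lets you pass max_nbytes through `cross_val_score` or `GridSearchCV` is not confirmed in this thread.

```python
import numpy as np
from joblib import Parallel, delayed


def column_mean(X, j):
    # Workers only read X; memory-mapped inputs are read-only in the workers.
    return float(X[:, j].mean())


rng = np.random.RandomState(0)
X = rng.rand(20000, 50)  # about 8 MB of float64, hypothetical data

# Arrays larger than max_nbytes are memory-mapped and shared by the workers;
# max_nbytes=None would disable this and send a pickled copy of X with each task.
means = Parallel(n_jobs=2, max_nbytes="1M")(
    delayed(column_mean)(X, j) for j in range(X.shape[1])
)
print(np.allclose(means, X.mean(axis=0)))
```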
>>>> On Fri, Feb 12, 2016 at 5:29 PM, muhammad waseem 
>>>> <m.waseem.ah...@gmail.com> wrote:
>>>> Hi Sebastian and Manoj, 
>>>> @Manoj: What should the value of the max_nbytes parameter be, and will it 
>>>> affect the results and the time it takes to run cross_validation, grid_search, 
>>>> etc.?
>>>> 
>>>> Thanks
>>>> Kindest Regards
>>>> Waseem 
>>>> 
>>>>> On Fri, Feb 12, 2016 at 4:42 PM, Sebastian Raschka <se.rasc...@gmail.com> 
>>>>> wrote:
>>>>> Hi, Waseem,
>>>>> I think lowering the value of n_jobs would help; as far as I know, each 
>>>>> process gets its own copy of the data. I just stumbled upon spark-sklearn a 
>>>>> few days ago; maybe that could help as well:
>>>>> 
>>>>> https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html
>>>>> 
>>>>> If I understand correctly, the data is still copied, but here each 
>>>>> node gets a copy instead of one machine holding many copies.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> > On Feb 12, 2016, at 11:35 AM, muhammad waseem 
>>>>> > <m.waseem.ah...@gmail.com> wrote:
>>>>> >
>>>>> > Hi,
>>>>> >
>>>>> > I am trying to fit my model using regression trees, but the problem is that 
>>>>> > it consumes a lot of RAM, which makes my code unresponsive. From looking 
>>>>> > at different forums and platforms, I think this is a common problem. I 
>>>>> > was wondering how to free up memory, or what the best ways are to run 
>>>>> > the fitting process/cross-validation without running out of memory. 
>>>>> > This problem occurs with most regression trees (and I think with other ML 
>>>>> > algorithms as well). Should I avoid n_jobs=-1 and use some 
>>>>> > other value (e.g. n_jobs=10) in cross_validation?
>>>>> >
>>>>> > Thanks
>>>>> > Kindest Regards
>>>>> > Waseem
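A sketch of a few memory-saving steps for the original question, under stated assumptions. A `RandomForestRegressor` on hypothetical data stands in for "regression trees". scikit-learn's tree estimators work on float32 features internally, so converting once up front should avoid a float64-to-float32 copy inside every fit. A fully grown regression tree can end up with close to one leaf per training sample, so `min_samples_leaf` and `max_depth` are used to cap tree size. A moderate outer n_jobs with `pre_dispatch` limits how many folds are in flight. The specific values are illustrative and would need tuning.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in data; substitute the real X and y.
rng = np.random.RandomState(0)
X = rng.rand(5000, 20)
y = X[:, 0] + 0.1 * rng.rand(5000)

# Tree estimators use float32 features internally; converting once here
# avoids a float64 -> float32 copy being made inside every fit.
X32 = np.ascontiguousarray(X, dtype=np.float32)

model = RandomForestRegressor(
    n_estimators=50,
    min_samples_leaf=5,  # larger leaves -> fewer nodes -> smaller trees in memory
    max_depth=20,        # hard cap on tree depth
    n_jobs=1,            # no nested parallelism inside each CV worker
    random_state=0,
)

# Moderate outer parallelism; pre_dispatch keeps only n_jobs folds queued at once.
scores = cross_val_score(model, X32, y, cv=5, n_jobs=2, pre_dispatch="n_jobs")
print(scores.mean())
```

Keeping the forest itself at n_jobs=1 while the cross-validation runs two workers avoids multiplying worker processes, and with them the number of data copies.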
>>>>> > ------------------------------------------------------------------------------
>>>>> > Site24x7 APM Insight: Get Deep Visibility into Application Performance
>>>>> > APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>>>>> > Monitor end-to-end web transactions and take corrective actions now
>>>>> > Troubleshoot faster and improve end-user experience. Signup Now!
>>>>> > http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140_______________________________________________
>>>>> > Scikit-learn-general mailing list
>>>>> > Scikit-learn-general@lists.sourceforge.net
>>>>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general