Thanks for the note, Manoj, didn't know that!

@muhammad So if there's no duplication of data across the processes, I guess 
that you would also run into trouble with n_jobs=1. But just to make sure 
that data duplication is not the issue, could you try running it with n_jobs=1? 
If it still runs out of memory, probably only a smaller data set or a machine 
with more memory would help. Here, I'd probably think about using Spark's 
MLlib to deal with this particular dataset.
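
A minimal sketch of the n_jobs=1 check, assuming a recent scikit-learn where 
cross_val_score lives in sklearn.model_selection (in the early-2016 releases 
it was in sklearn.cross_validation). The synthetic data, the tree settings 
and the run_cv helper are placeholders for illustration, not your actual setup:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real data set (sizes are hypothetical).
X, y = make_regression(n_samples=2000, n_features=20, noise=0.1, random_state=0)


def run_cv(n_jobs):
    """Cross-validate a regression tree with the given number of workers."""
    model = DecisionTreeRegressor(max_depth=8, random_state=0)
    return cross_val_score(model, X, y, cv=5, n_jobs=n_jobs)


# n_jobs=1 keeps all the work in the parent process, so no worker process
# can hold an extra copy of the data.
scores_serial = run_cv(n_jobs=1)
print("n_jobs=1, R^2 per fold:", np.round(scores_serial, 3))
```

If memory use (e.g. as shown by top) still climbs until the machine becomes 
unresponsive with n_jobs=1, duplication across workers is probably not the 
cause, and the data set itself is likely too large for the machine.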

> On Feb 12, 2016, at 12:30 PM, muhammad waseem <m.waseem.ah...@gmail.com> 
> wrote:
> 
> Hi Sebastian and Manoj, 
> @Manoj: What should be the value of max_nbytes parameter and will this affect 
> the results and time it takes to run cross_validation, grid_search etc?
> @Sebastian: Will the Spark implementation also improve the memory use or 
> just the CPU?
> 
> 
> Thanks
> Kindest Regards
> 
> On Fri, Feb 12, 2016 at 5:29 PM, muhammad waseem <m.waseem.ah...@gmail.com> wrote:
> Hi Sebastian and Manoj, 
> @Manoj: What should be the value of max_nbytes parameter and will this affect 
> the results and time it takes to run cross_validation, grid_search etc?
> 
> Thanks
> Kindest Regards
> Waseem 
> 
> On Fri, Feb 12, 2016 at 4:42 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> Hi, Waseem,
> I think lowering the value of n_jobs would help; as far as I know, each 
> process gets a copy of the data? I just stumbled upon spark-sklearn a few 
> days ago; maybe that could help as well:
> 
> https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html
> 
> If I understand correctly, the data is still copied, but there each node 
> gets one copy instead of a single machine holding many copies.
> 
> 
> 
> 
> > On Feb 12, 2016, at 11:35 AM, muhammad waseem <m.waseem.ah...@gmail.com> wrote:
> >
> > Hi,
> >
> > I am trying to fit my model using regression trees, but the problem is that 
> > it consumes a lot of RAM, which makes my code unresponsive. Looking at 
> > different forums and platforms, I think this is a common problem. I was 
> > wondering how to free up memory, or what the best ways are to run the 
> > fitting process/cross-validation without running out of memory. This 
> > problem occurs with most regression trees (and, I think, with other ML 
> > algorithms as well). Shall I try to run without n_jobs=-1 and use some 
> > other value (e.g. n_jobs=10) in cross_validation?
> >
> > Thanks
> > Kindest Regards
> > Waseem
> > ------------------------------------------------------------------------------
> > Site24x7 APM Insight: Get Deep Visibility into Application Performance
> > APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
> > Monitor end-to-end web transactions and take corrective actions now
> > Troubleshoot faster and improve end-user experience. Signup Now!
> > http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140_______________________________________________
> >  
> > <http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140_______________________________________________>
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net 
> > <mailto:Scikit-learn-general@lists.sourceforge.net>
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general 
> > <https://lists.sourceforge.net/lists/listinfo/scikit-learn-general>
> 
> 
> -- 
> Dr Muhammad Waseem Ahmad
> Research Associate,
> BRE Center for Sustainable Construction,
> School of Engineering,
> Cardiff University,
> Cardiff, UK.
