Hi Andy,

the models that I am using are Random Forests and naive Bayes classifiers.
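The workaround from the original message (quoted below) can be sketched roughly as follows. This is a minimal illustration rather than the actual script: `naive_grid_search`, `toy_score`, and the `force_gc` flag are hypothetical names, and the toy scoring function only stands in for fitting and cross-validating a classifier.

```python
import gc
from itertools import product


def naive_grid_search(evaluate, param_grid, force_gc=True):
    """Try every parameter combination and return (best_score, best_params).

    `evaluate` is a placeholder for fitting and scoring one model;
    `force_gc` is a hypothetical switch, not a scikit-learn parameter.
    """
    names = sorted(param_grid)
    best_score, best_params = float("-inf"), None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_score, best_params = score, params
        if force_gc:
            # The two lines from the original message: run a full
            # collection, then walk the list of tracked objects.
            gc.collect()
            len(gc.get_objects())
    return best_score, best_params


# Toy stand-in for cross-validating a classifier (assumption for illustration).
def toy_score(n_estimators, max_depth):
    return -abs(n_estimators - 100) - abs(max_depth - 5)


grid = {"n_estimators": [10, 100, 500], "max_depth": [3, 5, 10]}
best_score, best_params = naive_grid_search(toy_score, grid)
print(best_score, best_params)
```

Setting `force_gc=False` gives the plain nested-loop search, which makes it easy to compare memory usage with and without the explicit collection.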
Maybe it's something in scipy, according to Manoj's linked discussion ... In any case, a possible workaround for this issue and future ones would be a "force_clear_gc" parameter (default=False) that forces a garbage collection after every cycle for estimators and GridSearch?

Here are more details about the particular system setup. scikit-learn and scipy should be at their most recent versions. Python is installed via conda.

Python 3.4.2 |Continuum Analytics, Inc.| (default, Oct 21 2014, 17:16:37)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn
>>> import scipy
>>> scipy.__version__
'0.14.0'
>>> sklearn.__version__
'0.15.2'

Best,
Sebastian

> On Dec 16, 2014, at 11:33 AM, Andy <t3k...@gmail.com> wrote:
>
> Hi.
>
> Which models are you using and which version of scikit-learn?
>
> Cheers,
> Andy
>
> On 12/16/2014 11:19 AM, Sebastian Raschka wrote:
>> Hi all,
>>
>> I am wondering if someone noticed that GridSearch is eating more and more
>> memory over time? I read related discussion on the issue list on GitHub and
>> it sounds like that it has been solved (estimators are not kept anymore, and
>> the best estimator can optionally be refitted at the end of the GridSearch).
>>
>> However, when I ran the GridSearch, I noticed that it always "crashed" after
>> a couple of hours. When I monitored the system usage over time, I saw the
>> memory utilization (almost linearly) increasing over time until it reached
>> the 128 Gb max of the machine I was running it on.
>>
>> I then wrote a naive grid search with nested for loops and it had the same
>> issues. So, it is probably not the grid search but something with Python ...
>>
>> Eventually, I added the 2 lines
>>
>> gc.collect()
>> len(gc.get_objects())
>>
>> which seem to do the trick! Especially the 2nd one. Now, I can run the
>> gridsearch for hours and with a constant ~6.8 Gb memory utilization.
>>
>> I am curious, did anyone else have this memory issue?
>>
>> Best,
>> Sebastian

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general