Thank you, Olivier, for these suggestions. I'd gladly test them, but in the meantime I discovered that the dataset I was trying to use could never fit in the 72GB of memory of the machine I'm using. So I scaled it down, and unsurprisingly the error no longer occurs.
But I'd be curious to know: is there any mechanism I could use to let a Random Forest classifier work with datasets bigger than what fits in memory?

Thanks!

On 22 September 2012 16:18, Olivier Grisel <olivier.gri...@ensta.org> wrote:
> 2012/9/22 Christian Jauvin <cjau...@gmail.com>:
>> Hi,
>>
>> I have been doing multiple experiments using a RandomForestClassifier
>> (trained with the parallel code option) recently, without encountering
>> any particular problem. However as soon as I began using a much bigger
>> dataset (with the exact same code), I got this threading error:
>>
>> Exception in thread Thread-2:
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
>>     self.run()
>>   File "/usr/lib/python2.7/threading.py", line 504, in run
>>     self.__target(*self.__args, **self.__kwargs)
>>   File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
>>     put(task)
>> SystemError: NULL result without error in PyObject_Call
>>
>> I can provide additional details of course, but first maybe there is
>> something in particular I should be aware of, about size or memory
>> limit of the underlying objects in question?
>>
>
> It can be a memory error as the current implementation is very bad at
> managing the memory.
>
> You can try to replace the joblib folder in the sklearn source tree by
> the "pickling-pool" branch of my repo:
>
> https://github.com/joblib/joblib/pull/44
>
> That should help a lot. You can further memmap your original dataset
> as explained in the following doc to get even better memory usage
> reduction:
>
> https://github.com/ogrisel/joblib/blob/pickling-pool/doc/parallel_numpy.rst
>
> You might also want to set the TMP environment variable to a folder on
> a big partition.
>
> I am very interested in any feedback while using this branch.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
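
For reference, the memory-mapping idea mentioned in the quoted message can be sketched with plain NumPy. This is only an illustration under assumptions, not the code from the "pickling-pool" branch: the helper `dump_as_memmap`, the file name `X_train.npy`, and the toy array are all hypothetical, and whether an estimator then works on the mapped data without copying it depends on the scikit-learn and joblib versions in use.

```python
import os
import tempfile

import numpy as np


def dump_as_memmap(X, folder):
    """Write X to disk once and reopen it as a read-only memory map.

    Hypothetical helper for illustration; not part of joblib or sklearn.
    """
    path = os.path.join(folder, "X_train.npy")  # hypothetical file name
    np.save(path, X)
    # mmap_mode="r" maps the file lazily instead of loading it into RAM,
    # so several worker processes can share the same pages.
    return np.load(path, mmap_mode="r")


# tempfile looks at the TMPDIR, TEMP and TMP environment variables, so
# pointing one of them at a big partition moves this folder there.
folder = tempfile.mkdtemp()

# Toy data standing in for the real dataset (an assumption).
X = np.random.RandomState(0).rand(1000, 20).astype(np.float32)
X_mm = dump_as_memmap(X, folder)

print(type(X_mm).__name__, X_mm.shape, X_mm.dtype)
```

The mapped array behaves like a regular read-only NumPy array, so it can be sliced or handed to code that expects an ndarray. Note that this only avoids duplicating the input data across processes; it does not by itself make training work on a dataset whose working set exceeds available memory.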