2011/9/20 Robert Kern <[email protected]>:
> On Mon, Sep 19, 2011 at 20:00, xinfan meng <[email protected]> wrote:
>> Hi:
>> I ran the sklearn sentiment classification code
>> (https://github.com/scikit-learn/scikit-learn-tutorial/blob/master/solutions/exercise_02_sentiment.py)
>> and found that it keeps creating new python.exe instances, and then my
>> computer crashed because it ran out of memory.
>>
>> My OS is Windows 7, and the scikits.learn comes from the
>> Enthought Python Distribution bundle. Since this scikits.learn is version
>> 0.8, I made several necessary modifications to the code. The code behaved
>> normally on MacOS. So I wonder if this is a problem with my OS. Maybe
>> someone can try on their system to see if it is reproducible? Thanks.
>
> A proper fork does not exist on Windows so multiprocessing needs to do
> things differently than on POSIX-like platforms. In particular, it
> needs to import the __main__ module from the parent process in order
> to figure out parts of its environment. The expectation is that
> properly written scripts will not execute multiprocessing code in the
> main block outside of an "if __name__ == '__main__':" suite. Since
> this example code does everything at the top level, all of the child
> processes will execute the same code as if they were the main process,
> creating an explosion of processes.
>
> The fix is straightforward on sklearn's part: move the code into a
> function and call that function under an "if __name__ == '__main__':"
> test. Or just move everything under that __main__ test.

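For reference, a minimal sketch of the guard described in the quoted message, using only the standard library `multiprocessing` module rather than the tutorial's grid search. The names `square` and `run_jobs` are hypothetical and do not come from the tutorial script, and the `n_jobs=1` branch only imitates the idea of a sequential run; it is not how scikit-learn implements it.

```python
import multiprocessing


def square(x):
    # Hypothetical worker. It is defined at module top level so that
    # child processes can find it when they import this module.
    return x * x


def run_jobs(values, n_jobs=1):
    """Apply square() to each value, in parallel when n_jobs > 1."""
    values = list(values)
    if n_jobs == 1:
        # Sequential path: no child process is started at all.
        return [square(v) for v in values]
    with multiprocessing.Pool(processes=n_jobs) as pool:
        return pool.map(square, values)


if __name__ == '__main__':
    # Only the parent process runs this block. On Windows, children
    # import this module under a different __name__, so they skip it
    # instead of starting pools of their own.
    print(run_jobs(range(5), n_jobs=2))
```

With `n_jobs=1` the guard no longer matters because nothing is spawned; with `n_jobs` greater than 1 on Windows, the guard is what keeps each child from re-running the top-level code.
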
Indeed, I keep forgetting about this. That's a pity though: this script is just an exercise script in a tutorial. It was meant to be a short, readable sequence of operations executed once for teaching purposes, not a real packaged program...

I wonder if we should just remove the `n_jobs` argument so that the grid search runs sequentially in that case.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
