The same thing happens in test suites: every child process restarts the suite, and you get an endless stream of dots.
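For reference, the guard Robert describes in the quoted message below might look roughly like this. It is only a minimal sketch using the standard library `multiprocessing` module, not the tutorial script itself; `square` and `run_parallel` are hypothetical names made up for illustration.

```python
import multiprocessing


def square(x):
    # Worker function; defined at module level so child processes can import it.
    return x * x


def run_parallel(values, n_jobs=2):
    # Hypothetical helper: distribute the work across n_jobs worker processes.
    with multiprocessing.Pool(processes=n_jobs) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Only the parent process reaches this block. On Windows, child processes
    # re-import this module under a different __name__, so they skip it and
    # do not spawn workers of their own.
    print(run_parallel([1, 2, 3]))
```

Without the guard, the call to `run_parallel` would sit at the top level and be re-executed by every child on Windows, which is the process explosion reported in the thread.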

On Tue, Sep 20, 2011 at 10:49 AM, Olivier Grisel <[email protected]> wrote:
> 2011/9/20 Robert Kern <[email protected]>:
>> On Mon, Sep 19, 2011 at 20:00, xinfan meng <[email protected]> wrote:
>>> Hi:
>>> I ran the sklearn sentiment classification code
>>> (https://github.com/scikit-learn/scikit-learn-tutorial/blob/master/solutions/exercise_02_sentiment.py)
>>> and found that it keeps creating new python.exe instances, and then my
>>> computer crashed because it ran out of memory.
>>>
>>> My OS is Windows 7, and scikits.learn comes from the Enthought Python
>>> Distribution bundle. Since this scikits.learn is version 0.8, I made
>>> several necessary modifications to the code. The code behaved normally
>>> on MacOS, so I wonder if this is a problem with my OS. Maybe someone can
>>> try it on their system to see if it is reproducible? Thanks.
>>
>> A proper fork does not exist on Windows so multiprocessing needs to do
>> things differently than on POSIX-like platforms. In particular, it
>> needs to import the __main__ module from the parent process in order
>> to figure out parts of its environment. The expectation is that
>> properly written scripts will not execute multiprocessing code in the
>> main block outside of an "if __name__ == '__main__':" suite. Since
>> this example code does everything at the top level, all of the child
>> processes will execute the same code as if they were the main process
>> and create an explosion of processes.
>>
>> The fix is straightforward on sklearn's part: move the code into a
>> function and call that function under an "if __name__ == '__main__':"
>> test. Or just move everything under that __main__ test.
>
> Indeed I keep forgetting about this. That's a pity though: this script
> is just an exercise script in a tutorial, it was meant to be a short
> and readable sequence of operations executed once for teaching
> purposes, not a real packaged program...
>
> I wonder if we should just remove the `n_jobs` argument to run the
> grid search sequentially in that case.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure contains a
> definitive record of customers, application performance, security
> threats, fraudulent activity and more. Splunk takes this data and makes
> sense of it. Business sense. IT sense. Common sense.
> http://p.sf.net/sfu/splunk-d2dcopy1
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
