2011/9/20 Robert Kern <[email protected]>:
> On Mon, Sep 19, 2011 at 20:00, xinfan meng <[email protected]> wrote:
>> Hi:
>>     I ran the sklearn sentiment classification code
>> (https://github.com/scikit-learn/scikit-learn-tutorial/blob/master/solutions/exercise_02_sentiment.py)
>> and found that it keeps creating new python.exe instances, and then my
>> computer crashed because it ran out of memory.
>>
>>     My OS is Windows 7, and scikits.learn comes from the
>> Enthought Python Distribution bundle. Since this scikits.learn is version
>> 0.8, I made several necessary modifications to the code. The code behaved
>> normally on Mac OS, so I wonder if this is a problem with my OS. Maybe
>> someone can try it on their system to see if it is reproducible? Thanks.
>
> A proper fork does not exist on Windows so multiprocessing needs to do
> things differently than on POSIX-like platforms. In particular, it
> needs to import the __main__ module from the parent process in order
> to figure out parts of its environment. The expectation is that
> properly written scripts will not execute multiprocessing code in the
> main block outside of an "if __name__ == '__main__':" suite. Since
> this example code does everything at the top level, all of the child
> processes will execute the same code as if they were the main process,
> creating an explosion of processes.
>
> The fix is straightforward on sklearn's part: move the code into a
> function and call that function under an "if __name__ == '__main__':"
> test. Or just move everything under that __main__ test.

Indeed, I keep forgetting about this. That's a pity though: this script
is just an exercise script in a tutorial. It was meant to be a short
and readable sequence of operations executed once for teaching
purposes, not a real packaged program...

I wonder if we should just remove the `n_jobs` argument to run the
grid search sequentially in that case.
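
For reference, here is a minimal sketch of the guard Robert describes,
using only the standard library `multiprocessing` module. The names
`square` and `main` are made up for illustration and are not part of
the tutorial script:

```python
import multiprocessing


def square(x):
    # Work done in a child process. It is defined at module top level so
    # that a child re-importing this module can find it.
    return x * x


def main():
    # All code that starts worker processes lives inside a function,
    # so merely importing the module never spawns anything.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # On Windows each child re-imports this module under a different
    # __name__, so this branch only runs in the parent process.
    print(main())
```

The sequential alternative keeps the flat script and simply avoids
starting worker processes at all, at the cost of a slower grid search.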

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
