This also happens in test suites: every child process restarts the whole
suite, and you get an endless stream of dots.

On Tue, Sep 20, 2011 at 10:49 AM, Olivier Grisel
<[email protected]> wrote:
> 2011/9/20 Robert Kern <[email protected]>:
>> On Mon, Sep 19, 2011 at 20:00, xinfan meng <[email protected]> wrote:
>>> Hi:
>>>     I ran the sklearn sentiment classification codes
>>> (https://github.com/scikit-learn/scikit-learn-tutorial/blob/master/solutions/exercise_02_sentiment.py)
>>> and found that it kept creating new python.exe instances until my
>>> computer crashed because it ran out of memory.
>>>
>>>     My OS is Windows 7, and scikits.learn comes from the
>>> Enthought Python Distribution bundle. Since this scikits.learn is version
>>> 0.8, I made several necessary modifications to the code. The code behaved
>>> normally on MacOS, so I wonder if this is a problem with my OS. Maybe
>>> someone can try it on their system to see if it is reproducible? Thanks.
>>
>> A proper fork does not exist on Windows so multiprocessing needs to do
>> things differently than on POSIX-like platforms. In particular, it
>> needs to import the __main__ module from the parent process in order
>> to figure out parts of its environment. The expectation is that
>> properly written scripts will not execute multiprocessing code in the
>> main block outside of an "if __name__ == '__main__':" suite. Since
>> this example code does everything at the top level, all of the child
>> processes will execute the same code as if they were the main process,
>> creating an explosion of processes.
>>
>> The fix is straightforward on sklearn's part: move the code into a
>> function and call that function under an "if __name__ == '__main__':"
>> test. Or just move everything under that __main__ test.
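A minimal sketch of the fix described above, using only the standard-library `multiprocessing` module. The `square` worker and `main` function are hypothetical stand-ins for the tutorial script's code, and `get_context("spawn")` is an assumption used here to force Windows-style process start-up so the pattern can be tried on any OS:

```python
import multiprocessing


def square(x):
    # Worker function: defined at module top level so that child processes
    # started with "spawn" can find it by name after re-importing this file.
    return x * x


def main():
    # All side effects (creating the pool, loading data, running a grid
    # search, ...) live inside main(), so re-importing this module in a
    # child process only defines functions and runs nothing else.
    ctx = multiprocessing.get_context("spawn")  # mimic Windows behaviour on any OS
    with ctx.Pool(processes=2) as pool:
        results = pool.map(square, range(5))
    print(results)
    return results


if __name__ == "__main__":
    # Only the parent process takes this branch; in Python 3, spawned
    # children import the script under the name "__mp_main__" and skip it.
    main()
```

Run it as a script (e.g. `python spawn_demo.py`, a hypothetical file name); it should print `[0, 1, 4, 9, 16]`. Each spawned child still re-imports the file, but because the pool is only created inside `main()` behind the guard, the children just define `square` and wait for work instead of starting pools of their own.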
>
> Indeed I keep forgetting about this. That's a pity though: this script
> is just an exercise script in a tutorial, it was meant to be a short
> and readable sequence of operations executed once for teaching
> purposes, not a real packaged program...
>
> I wonder if we should just remove the `n_jobs` argument to run the
> grid search sequentially in that case.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure contains a
> definitive record of customers, application performance, security
> threats, fraudulent activity and more. Splunk takes this data and makes
> sense of it. Business sense. IT sense. Common sense.
> http://p.sf.net/sfu/splunk-d2dcopy1
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
