Thanks Maria.
What I meant was that you could use the debugger to see what
len(j_indices) is when it crashes.
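Something like this (a minimal sketch wrapped around your existing
fit_transform call) would drop you into the failing frame:

#######################################
import pdb

try:
    vectors = vectorizer.fit_transform(data)
except MemoryError:
    # Inspect the frame where the exception was raised; type `up` until
    # you reach _count_vocab, then `p len(j_indices)`.
    pdb.post_mortem()
#######################################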
I'm not sure whether this code has been improved since 0.15.2, but I'd
encourage you to upgrade to 0.16.1 anyhow.
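If you installed with pip, upgrading should just be:

pip install --upgrade scikit-learn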
Cheers,
Andy
On 08/04/2015 11:56 AM, Maria Gorinova wrote:
Hi Andreas,
Thank you for the reply. Yes, the error also happens if I load different
files, but here I am deliberately loading the SAME file, "a.txt", just
to demonstrate how odd the error is... I don't know what len(j_indices)
is; it's in sklearn\feature_extraction\text.py, as shown in the
exception trace.
The version I'm using is 0.15.2 (I think...)
Best,
Maria
On 4 August 2015 at 16:30, Andreas Mueller <t3k...@gmail.com> wrote:
Just to make sure, you are actually loading different files, not
the same file over and over again, right?
It seems an odd place for a memory error. Which version of
scikit-learn are you using?
What is ``len(j_indices)``?
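You can check the version from a Python prompt:

import sklearn
print(sklearn.__version__)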
On 08/04/2015 10:18 AM, Maria Gorinova wrote:
Hello,
(I think I might have sent this to the wrong address the first
time, so I'm sending it again)
I have been trying to find my way around a weird memory error for
days now. If I'm doing something wrong and this question is
completely dumb, I'm sorry for spamming the mailing list, but I'm
desperate.
When running this code, everything works as expected:
#######################################
import os

from sklearn.feature_extraction.text import CountVectorizer

DATA_DIR = "."  # placeholder; defined elsewhere in my script

# Build a list of 1000 copies of the path to the same file.
data = []
for i in range(0, 1000):
    filename = "a.txt"
    data.append(os.path.join(DATA_DIR, filename))

vectorizer = CountVectorizer(encoding='utf-8-sig', input='filename')
vectors = vectorizer.fit_transform(data)
#######################################
However, if I change the range to (0, 2000), it gives me a MemoryError
with the following trace:
#######################################
Traceback (most recent call last):
  File "C:\...\msin.py", line 16, in <module>
    vectors = vectorizer.fit_transform(data)
  File "C:\Python27\lib\site-packages\sklearn\feature_extraction\text.py", line 817, in fit_transform
    self.fixed_vocabulary_)
  File "C:\Python27\lib\site-packages\sklearn\feature_extraction\text.py", line 769, in _count_vocab
    values = np.ones(len(j_indices))
  File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 178, in ones
    a = empty(shape, dtype, order)
MemoryError
#######################################
Notes:
- the file is about 200,000 characters / 40,000 words.
- the OS is Windows 10.
- the Python process uses about 340 MB of RAM at the moment of the
MemoryError.
- I've seen my Python processes use about 1.8 GB before without any
problem, so Windows killing the process for trying to use too much
memory doesn't seem to be the case here.
- I keep receiving the error even if I restrict the vocabulary size
(see the sketch after this list).
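A minimal sketch of the vocabulary restriction (assuming it is done via
CountVectorizer's max_features parameter):

#######################################
# Hypothetical: cap the vocabulary at 10,000 terms via max_features.
vectorizer = CountVectorizer(encoding='utf-8-sig', input='filename',
                             max_features=10000)
vectors = vectorizer.fit_transform(data)
#######################################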
Thanks in advance!!!
Maria
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general