Hello,

(I think I might have sent this to the wrong address the first time, so I'm
sending it again)

I have been trying to work around a strange memory error for days now.
If I'm doing something wrong and this question is completely dumb,
I'm sorry for spamming the mailing list. But I'm desperate.

When running this code, everything works as expected:

#######################################
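import os
from sklearn.feature_extraction.text import CountVectorizer

# DATA_DIR is defined elsewhere in my script; "." is only a placeholder here.
DATA_DIR = "."

# Build a list that points at the same file 1000 times.
data = []
for i in range(0, 1000):
    filename = "a.txt"
    data.append(os.path.join(DATA_DIR, filename))

vectorizer = CountVectorizer(encoding='utf-8-sig', input='filename')
vectors = vectorizer.fit_transform(data)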
#######################################

However, if I change the range to (0, 2000), it raises a MemoryError with
the following traceback:

#######################################
Traceback (most recent call last):
  File "C:\...\msin.py", line 16, in <module>
    vectors = vectorizer.fit_transform(data)
  File "C:\Python27\lib\site-packages\sklearn\feature_extraction\text.py", line 817, in fit_transform
    self.fixed_vocabulary_)
  File "C:\Python27\lib\site-packages\sklearn\feature_extraction\text.py", line 769, in _count_vocab
    values = np.ones(len(j_indices))
  File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 178, in ones
    a = empty(shape, dtype, order)
MemoryError
#######################################

Notes:
- The file is about 200,000 characters / 40,000 words.
- The OS is Windows 10.
- The Python process is using about 340 MB of RAM at the moment of the
MemoryError.
- I've seen my Python processes use about 1.8 GB before without any
problem, so it doesn't look like Windows is killing the process for using
too much memory.
- I still get the error even if I restrict the vocabulary size.
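
For what it's worth, here is a rough estimate of how large the array at the
failing line (values = np.ones(len(j_indices))) might be. This is only a
sketch under assumptions I have not verified: that j_indices gets one entry
per token occurrence, that the tokenizer keeps roughly the 40,000 words per
file mentioned above, and that np.ones uses its default float64 dtype. The
helper names are my own, not part of scikit-learn. It also prints whether the
Python build is 32-bit or 64-bit, since that limits how much address space a
single process can use.

```python
import struct

import numpy as np

# Assumptions (not confirmed by the traceback): _count_vocab appends one
# entry to j_indices per token occurrence, and np.ones uses its default
# float64 dtype (8 bytes per element).
WORDS_PER_FILE = 40000  # approximate, taken from the notes above


def estimated_values_bytes(n_files, words_per_file=WORDS_PER_FILE):
    """Estimate the size in bytes of the np.ones(len(j_indices)) array."""
    n_tokens = n_files * words_per_file
    return n_tokens * np.dtype(np.float64).itemsize


def interpreter_bits():
    """Return 32 or 64 depending on the pointer size of this Python build."""
    return struct.calcsize("P") * 8


for n_files in (1000, 2000):
    mb = estimated_values_bytes(n_files) / 1e6
    print("%d files: about %.0f MB in one contiguous block" % (n_files, mb))
print("Python build: %d-bit" % interpreter_bits())
```

If these assumptions are roughly right, the failing call would need about
640 MB in a single contiguous block for 2000 files (versus about 320 MB for
1000), which a 32-bit Python might not be able to provide even when total
usage is well under 2 GB.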

Thanks in advance!!!
Maria
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
