Hm, I have never used Python on Windows, but I have heard from many people that it is buggier there than on POSIX systems; maybe it's just a quirk of the garbage collector?
Maybe you could try adding the following lines (with an "import gc" at the top of the script)

    gc.collect()
    len(gc.get_objects())

inside your for-loop and give it another try? I know it looks weird to "clear" the garbage collector this way, but it worked for me when I also had memory issues running it on a Torque cluster.

> On Aug 4, 2015, at 11:56 AM, Maria Gorinova <m.gorin...@gmail.com> wrote:
>
> Hi Andreas,
>
> Thank you for the reply. The error also happens if I load different files,
> yes, but here I am actually loading the SAME file "a.txt". Which I did, just
> to demonstrate how awkward the error is... I don't know what len(j_indices)
> is, that's in sklearn\feature_extraction\text.py as shown in the exception
> trace. The version I'm using is 0.15.2 (I think...)
>
> Best,
> Maria
>
> On 4 August 2015 at 16:30, Andreas Mueller <t3k...@gmail.com> wrote:
> Just to make sure, you are actually loading different files, not the same
> file over and over again, right?
> It seems an odd place for a memory error. Which version of scikit-learn are
> you using?
> What is ``len(j_indices)``?
>
> On 08/04/2015 10:18 AM, Maria Gorinova wrote:
>> Hello,
>>
>> (I think I might have sent this to the wrong address the first time, so I'm
>> sending it again)
>>
>> I have been trying to find my way around a weird memory error for days now.
>> If I'm doing something wrong and this question is completely dumb, I'm sorry
>> for spamming the maillist. But I'm desperate.
>>
>> When running this code, everything works as expected:
>>
>> #######################################
>> import os
>> from sklearn.feature_extraction.text import CountVectorizer
>>
>> data = []
>> for i in range(0, 1000):
>>     filename = "a.txt"
>>     data.append(os.path.join(DATA_DIR, filename))
>>
>> vectorizer = CountVectorizer(encoding = 'utf-8-sig', input = 'filename')
>> vectors = vectorizer.fit_transform(data)
>> #######################################
>>
>> However, if I change the range to (0, 2000) it gives me a Memory Error with
>> the following trace:
>>
>> #######################################
>> Traceback (most recent call last):
>>   File "C:\...\msin.py", line 16, in <module>
>>     vectors = vectorizer.fit_transform(data)
>>   File "C:\Python27\lib\site-packages\sklearn\feature_extraction\text.py", line 817, in fit_transform
>>     self.fixed_vocabulary_)
>>   File "C:\Python27\lib\site-packages\sklearn\feature_extraction\text.py", line 769, in _count_vocab
>>     values = np.ones(len(j_indices))
>>   File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 178, in ones
>>     a = empty(shape, dtype, order)
>> MemoryError
>> #######################################
>>
>> Notes:
>> - the file is about 200 000 characters / 40 000 words.
>> - OS is Windows 10.
>> - the python process takes about 340MB RAM at the moment of Memory Error.
>> - I've seen my python processes taking about 1.8GB before and there was
>>   never a problem. So Windows killing the process because it's trying to use
>>   too much memory doesn't seem to be the case here.
>> - I keep receiving the error even if I restrict the vocabulary size.
>>
>> Thanks in advance!!!
>> Maria
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
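
P.S. In case it helps, here is a minimal sketch of what I mean by calling the garbage collector inside a loop. The names process_batches and work are hypothetical placeholders of mine, not anything from scikit-learn or from your script; only gc.collect() and gc.get_objects() are the real standard-library calls. It assumes the work can be split into independent pieces.

```python
import gc


def process_batches(batches, work):
    """Call work(batch) for each batch, forcing a GC pass after each one.

    `process_batches` and `work` are hypothetical names used only for
    illustration; they are not part of scikit-learn or the original script.
    """
    results = []
    for batch in batches:
        results.append(work(batch))
        # Force a full collection so unreachable reference cycles are
        # freed now instead of at some later, automatic collection.
        unreachable = gc.collect()
        # Count the objects the collector is currently tracking; a number
        # that keeps growing across iterations hints at a leak.
        tracked = len(gc.get_objects())
        print("found %d unreachable objects, %d still tracked"
              % (unreachable, tracked))
    return results


if __name__ == "__main__":
    # Toy usage: sum each small batch of numbers.
    print(process_batches([[1, 2], [3]], sum))
```

Note that gc.collect() only reclaims objects that are already unreachable, so this is something to try rather than a guaranteed fix.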