Are you able to create a standalone np.ones array of that size?

On Tue, Aug 4, 2015 at 10:20 AM, Maria Gorinova <m.gorin...@gmail.com> wrote:
> Hi Andy,
>
> Thanks, I updated to 0.16.1, but the problem persists.
> len(j_indices) is 68 356 000 when running for range(0,2000) and exactly
> half of that when running for range(0,1000).
>
> Sebastian, thank you for the suggestion, but again, the issue doesn't seem
> to be that the process is using too much memory, thus calling the garbage
> collector doesn't help.
>
> Best,
> Maria
>
> On 4 August 2015 at 17:24, Andreas Mueller <t3k...@gmail.com> wrote:
>
>> Thanks Maria.
>> What I was asking was that you could use the debugger to see what
>> len(j_indices) is when it crashes.
>> I'm not sure if there were improvements to this code since 0.15.2 but I'd
>> encourage you to upgrade to 0.16.1 anyhow.
>>
>> Cheers,
>> Andy
>>
>> On 08/04/2015 11:56 AM, Maria Gorinova wrote:
>>
>> Hi Andreas,
>>
>> Thank you for the reply. The error also happens if I load different
>> files, yes, but here I am actually loading the SAME file "a.txt".
>> Which I did, just to demonstrate how awkward the error is... I don't know
>> what len(j_indices) is, that's in sklearn\feature_extraction\text.py as
>> shown in the exception trace. The version I'm using is 0.15.2 (I think...)
>>
>> Best,
>> Maria
>>
>> On 4 August 2015 at 16:30, Andreas Mueller <t3k...@gmail.com> wrote:
>>
>>> Just to make sure, you are actually loading different files, not the
>>> same file over and over again, right?
>>> It seems an odd place for a memory error. Which version of scikit-learn
>>> are you using?
>>> What is ``len(j_indices)``?
>>>
>>> On 08/04/2015 10:18 AM, Maria Gorinova wrote:
>>>
>>> Hello,
>>>
>>> (I think I might have sent this to the wrong address the first time, so
>>> I'm sending it again)
>>>
>>> I have been trying to find my way around a weird memory error for days
>>> now. If I'm doing something wrong and this question is completely dumb,
>>> I'm sorry for spamming the maillist. But I'm desperate.
>>>
>>> When running this code, everything works as expected:
>>>
>>> #######################################
>>> import os
>>> from sklearn.feature_extraction.text import CountVectorizer
>>>
>>> data = []
>>> for i in range(0, 1000):
>>>     filename = "a.txt"
>>>     data.append(os.path.join(DATA_DIR, filename))
>>>
>>> vectorizer = CountVectorizer(encoding = 'utf-8-sig', input = 'filename')
>>> vectors = vectorizer.fit_transform(data)
>>> #######################################
>>>
>>> However, if I change the range to (0, 2000) it gives me a Memory Error
>>> with the following trace:
>>>
>>> #######################################
>>> Traceback (most recent call last):
>>>   File "C:\...\msin.py", line 16, in <module>
>>>     vectors = vectorizer.fit_transform(data)
>>>   File
>>> "C:\Python27\lib\site-packages\sklearn\feature_extraction\text.py", line
>>> 817, in fit_transform
>>>     self.fixed_vocabulary_)
>>>   File
>>> "C:\Python27\lib\site-packages\sklearn\feature_extraction\text.py", line
>>> 769, in _count_vocab
>>>     values = np.ones(len(j_indices))
>>>   File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 178,
>>> in ones
>>>     a = empty(shape, dtype, order)
>>> MemoryError
>>> #######################################
>>>
>>> Notes:
>>> - the file is about 200 000 characters / 40 000 words.
>>> - OS is Windows 10.
>>> - the python process takes about 340MB RAM at the moment of Memory Error.
>>> - I've seen my python processes taking about 1.8GB before and there was
>>> never a problem. So Windows killing the process because it's trying to use
>>> too much memory doesn't seem to be the case here.
>>> - I keep receiving the error even if I restrict the vocabulary size.
>>>
>>> Thanks in advance!!!
>>> Maria
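
For reference, the standalone check asked about at the top of this message could look roughly like the sketch below. It only assumes what the trace shows: np.ones is called with no dtype, so it defaults to float64 (8 bytes per element), and len(j_indices) is the 68 356 000 reported above. The helper names (bytes_needed, interpreter_bits, can_allocate_ones) are made up for illustration and are not part of scikit-learn or NumPy. The interpreter bit-width is printed because a 32-bit Python could fail to find one contiguous block of this size even with free RAM, but that is only a guess; nothing in the thread confirms which build is in use.

```python
import struct

import numpy as np

# len(j_indices) reported in the thread for range(0, 2000).
N_INDICES = 68356000


def bytes_needed(n, dtype=np.float64):
    """Bytes required for a 1-D array of n elements of the given dtype."""
    return n * np.dtype(dtype).itemsize


def interpreter_bits():
    """Pointer size of the running Python interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def can_allocate_ones(n, dtype=np.float64):
    """Try the same allocation as the failing line; True if it succeeds."""
    try:
        values = np.ones(n, dtype=dtype)
    except MemoryError:
        return False
    del values  # release the block again
    return True


if __name__ == "__main__":
    print("interpreter: %d-bit" % interpreter_bits())
    print("float64 array needs %d bytes" % bytes_needed(N_INDICES))
    print("np.ones(N_INDICES) ok: %s" % can_allocate_ones(N_INDICES))
```

If this fails in a fresh interpreter as well, the problem is in the allocation itself rather than in CountVectorizer. If it succeeds, the state of the process just before line 769 of text.py is the next thing to look at.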
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general