Re: [scikit-learn] Memory efficient TfidfVectorizer

Peng Yu Tue, 28 Jan 2020 03:28:53 -0800

> Are you concerned about storing the whole corpus text in memory, or the
> whole corpus' statistics? If the text, use input='file' or input='filename'
> (or a generator of texts).


I am not sure which stage uses the most memory, because my program
is killed when it hits the memory limit. But I suspect it is the latter
(the whole corpus' statistics) that uses the most memory. (I used
1 <= ngram <= 3.)
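
One way to test that suspicion is to count how many distinct 1- to 3-grams a sample of the corpus produces, since that number is roughly the size of the vocabulary the vectorizer has to hold. The following is a minimal standard-library sketch, not scikit-learn's own code: the names `iter_ngrams` and `distinct_ngram_counts` are hypothetical, and the regex only approximates the default word tokenization (lowercased tokens of two or more word characters).

```python
import re

# Assumption: approximates scikit-learn's default tokenization
# (lowercased tokens of two or more word characters).
TOKEN_RE = re.compile(r"\b\w\w+\b")


def iter_ngrams(text, min_n=1, max_n=3):
    """Yield (n, ngram) for every word n-gram with min_n <= n <= max_n."""
    tokens = TOKEN_RE.findall(text.lower())
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield n, " ".join(tokens[i:i + n])


def distinct_ngram_counts(texts, min_n=1, max_n=3):
    """Count distinct n-grams per n over an iterable of texts, in one pass.

    `texts` may be a generator, so the raw text never has to sit in memory
    all at once; only the sets of distinct n-grams are kept.
    """
    seen = {n: set() for n in range(min_n, max_n + 1)}
    for text in texts:
        for n, gram in iter_ngrams(text, min_n, max_n):
            seen[n].add(gram)
    return {n: len(grams) for n, grams in seen.items()}


if __name__ == "__main__":
    sample = ["the cat sat on the mat", "the cat ran"]
    print(distinct_ngram_counts(sample))
```

Run on a sample small enough to fit in memory, this shows how quickly the bigram and trigram counts grow relative to the unigram count. If they dominate, the statistics rather than the text are the likely cause.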

-- 
Regards,
Peng
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
