Re: [scikit-learn] Memory efficient TfidfVectorizer

Joel Nothman Tue, 28 Jan 2020 02:22:23 -0800

Are you concerned about storing the whole corpus text in memory, or the
whole corpus' statistics? If the text, use input='file' or input='filename'
(or a generator of texts).
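Here is a minimal sketch of both options, assuming a recent scikit-learn. The small corpus, the temporary files, and the iter_texts helper are hypothetical and exist only to make the example self-contained.

```python
import os
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus, written to temporary files so the example is self-contained.
docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]
tmpdir = tempfile.mkdtemp()
paths = []
for i, text in enumerate(docs):
    path = os.path.join(tmpdir, f"doc{i}.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    paths.append(path)

# Option 1: pass file names; the vectorizer opens and reads each file itself,
# so the caller never holds the full list of document strings.
vec_files = TfidfVectorizer(input="filename")
X_files = vec_files.fit_transform(paths)


# Option 2 (hypothetical helper): a generator that yields one document at a time.
def iter_texts(file_paths):
    for p in file_paths:
        with open(p, encoding="utf-8") as fh:
            yield fh.read()


vec_gen = TfidfVectorizer()  # input="content" is the default
X_gen = vec_gen.fit_transform(iter_texts(paths))  # a generator can only be consumed once
```

Either way, only the raw text is streamed. The fitted vocabulary_ and the sparse document-term matrix are still built in memory, which is the "statistics" case distinguished above.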


On Tue, 28 Jan 2020 at 18:01, Peng Yu <pengyu...@gmail.com> wrote:

> Hi,
>
> To use TfidfVectorizer, the whole corpus must be loaded into memory.
> This can be a problem for machines without a lot of memory. Is there a
> way to use only a small amount of memory by saving most intermediate
> results on disk? Thanks.
>
> --
> Regards,
> Peng
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
