Are you concerned about storing the whole corpus text in memory, or the
whole corpus's statistics? If it's the text, use input='file' or
input='filename' (or pass a generator of texts).
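
For illustration, here is a minimal sketch of both options, assuming a recent scikit-learn. The toy corpus, the temporary files, and the `iter_texts` helper are hypothetical stand-ins for real documents on disk. With input='filename' the vectorizer opens each path only when it analyzes that document. With the default input='content', a generator yields one string at a time. In both cases only the raw text is streamed: the fitted vocabulary_ and the resulting sparse matrix are still held in memory, which is the "statistics" case distinguished above.

```python
import os
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus; in practice the files would already exist on disk.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]


def iter_texts(file_paths):
    """Hypothetical helper: yield one document's text at a time."""
    for p in file_paths:
        with open(p, encoding="utf-8") as f:
            yield f.read()


with tempfile.TemporaryDirectory() as tmpdir:
    paths = []
    for i, text in enumerate(docs):
        path = os.path.join(tmpdir, f"doc{i}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        paths.append(path)

    # Option 1: pass file paths; each file is read while it is being analyzed.
    vec_files = TfidfVectorizer(input="filename", encoding="utf-8")
    X_files = vec_files.fit_transform(paths)

    # Option 2: pass a generator of strings with the default input="content".
    vec_gen = TfidfVectorizer()
    X_gen = vec_gen.fit_transform(iter_texts(paths))

print(X_files.shape)
```

Both options should produce the same vocabulary and the same TF-IDF matrix; they differ only in who opens the files.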

On Tue, 28 Jan 2020 at 18:01, Peng Yu <pengyu...@gmail.com> wrote:

> Hi,
>
> To use TfidfVectorizer, the whole corpus must be loaded into memory.
> This can be a problem for machines without a lot of memory. Is there a
> way to use only a small amount of memory by saving most intermediate
> results on disk? Thanks.
>
> --
> Regards,
> Peng
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>