Are you concerned about storing the whole corpus text in memory, or the whole corpus's statistics? If it is the text, pass input='file' or input='filename' to TfidfVectorizer (or give it a generator of texts).
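A minimal sketch of both options, assuming the corpus is stored as one UTF-8 text file per document. The toy documents, the temporary directory, and the helper name iter_texts are made up for illustration; only TfidfVectorizer and its input parameter come from scikit-learn.

```python
import os
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer


def iter_texts(paths):
    """Yield one document at a time so the raw texts are never all held at once."""
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            yield fh.read()


# Hypothetical toy corpus, written to a temporary directory for illustration.
tmpdir = tempfile.mkdtemp()
docs = ["the cat sat on the mat", "the dog chased the cat", "dogs and cats"]
paths = []
for i, text in enumerate(docs):
    path = os.path.join(tmpdir, f"doc{i}.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    paths.append(path)

# Option 1: pass file names; the vectorizer opens and reads each file itself.
vec_files = TfidfVectorizer(input="filename")
X_files = vec_files.fit_transform(paths)

# Option 2: pass a generator of strings (the default input="content").
vec_gen = TfidfVectorizer()
X_gen = vec_gen.fit_transform(iter_texts(paths))

print(X_files.shape, X_gen.shape)
```

Either way only the raw text is streamed. The vocabulary and the resulting sparse matrix are still built in memory, so this helps with the first case above, not the second.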
On Tue, 28 Jan 2020 at 18:01, Peng Yu <pengyu...@gmail.com> wrote:
> Hi,
>
> To use TfidfVectorizer, the whole corpus must be used into memory.
> This can be a problem for machines without a lot of memory. Is there a
> way to use only a small amount of memory by saving most intermediate
> results in the disk? Thanks.
>
> --
> Regards,
> Peng
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn