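Are you concerned about storing the whole corpus text in memory, or the whole corpus' statistics? If it is the text, pass input='file' or input='filename' to TfidfVectorizer (or pass it a generator of texts), so that documents are read one at a time instead of all being loaded up front.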
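A minimal sketch of both options, in case it helps. The toy corpus, the temporary directory, and the iter_texts helper are made up for illustration; only TfidfVectorizer and its input parameter come from scikit-learn. Note that the vocabulary and the resulting sparse matrix are still held in memory either way.

```python
import os
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer


def iter_texts(paths):
    """Hypothetical helper: yield one document at a time from disk."""
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            yield fh.read()


# Toy corpus written to a temporary directory (illustrative data only).
tmpdir = tempfile.mkdtemp()
docs = ["the cat sat", "the dog barked", "a cat and a dog"]
paths = []
for i, text in enumerate(docs):
    path = os.path.join(tmpdir, f"doc{i}.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    paths.append(path)

# Option 1: pass filenames; the vectorizer opens and reads each file itself.
vec_files = TfidfVectorizer(input="filename")
X_files = vec_files.fit_transform(paths)

# Option 2: pass a generator of strings (the default input="content").
vec_gen = TfidfVectorizer()
X_gen = vec_gen.fit_transform(iter_texts(paths))

print(X_files.shape, X_gen.shape)
```

Both calls should produce the same matrix here; the difference is only in who reads the files and when.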
On Tue, 28 Jan 2020 at 18:01, Peng Yu <[email protected]> wrote:
> Hi,
>
> To use TfidfVectorizer, the whole corpus must be used into memory.
> This can be a problem for machines without a lot of memory. Is there a
> way to use only a small amount of memory by saving most intermediate
> results in the disk? Thanks.
>
> --
> Regards,
> Peng
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn
