> Are you concerned about storing the whole corpus text in memory, or the
> whole corpus' statistics? If the text, use input='file' or input='filename'
> (or a generator of texts).
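To make the distinction in the quoted question concrete, here is a minimal standard-library sketch, not scikit-learn's actual implementation. The names iter_texts, count_ngrams and demo, and the two-file toy corpus, are hypothetical and exist only for illustration. It streams the texts through a generator, so only one document is in memory at a time, while the n-gram counts (the "statistics") keep growing as the n-gram range widens.

```python
import os
import tempfile
from collections import Counter


def iter_texts(paths):
    """Yield one document at a time so the full corpus text never sits in memory."""
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            yield fh.read()


def count_ngrams(texts, min_n=1, max_n=3):
    """Accumulate word n-gram counts; this Counter plays the role of the corpus statistics."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        for n in range(min_n, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


def demo():
    """Compare the number of distinct entries for unigrams only vs. 1- to 3-grams."""
    # Hypothetical toy corpus written to a temporary directory.
    corpus = [("a.txt", "the cat sat on the mat"), ("b.txt", "the dog sat on the log")]
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, body in corpus:
            path = os.path.join(tmp, name)
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(body)
            paths.append(path)
        unigrams = count_ngrams(iter_texts(paths), 1, 1)
        up_to_trigrams = count_ngrams(iter_texts(paths), 1, 3)
    return len(unigrams), len(up_to_trigrams)
```

A generator like iter_texts could presumably be handed to a vectorizer's fit method in place of a list of strings. That only bounds the memory used for the text; the accumulated vocabulary still has to fit in memory.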
I am not sure which stage takes the most memory; all I see is that my program gets killed when it hits the memory limit. But I suspect it is the latter (the whole-corpus statistics) that takes the most memory. (I used n-grams with 1 <= n <= 3.)

-- 
Regards,
Peng
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn