> Are you concerned about storing the whole corpus text in memory, or the
> whole corpus' statistics? If the text, use input='file' or input='filename'
> (or a generator of texts).

I am not sure which stage uses the most memory, because my program gets
killed as soon as it hits the memory limit. But I suspect it is the
latter (the whole-corpus statistics) that uses the most memory, since I
used n-grams with 1 <= n <= 3.
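
One rough way to check that suspicion on a sample of the corpus is to
stream the texts and count how many distinct n-grams appear for each n.
This is only a sketch in plain Python, not scikit-learn's own code: the
function name distinct_ngrams_by_order is made up for illustration, and
the lowercase-and-split-on-whitespace tokenizer is an assumption that
will not match the vectorizer's tokenization exactly.

```python
from typing import Dict, Iterable


def distinct_ngrams_by_order(texts: Iterable[str], max_n: int = 3) -> Dict[int, int]:
    """Count distinct word n-grams for each n in 1..max_n.

    `texts` can be a generator, so only one document is held at a time;
    the sets of seen n-grams are what grows with the corpus.
    """
    seen = {n: set() for n in range(1, max_n + 1)}
    for text in texts:
        # Naive tokenizer (assumption): lowercase, split on whitespace.
        tokens = text.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                seen[n].add(tuple(tokens[i:i + n]))
    return {n: len(s) for n, s in seen.items()}


sample = ["the cat sat on the mat", "the cat ran"]
print(distinct_ngrams_by_order(sample))  # {1: 6, 2: 6, 3: 5}
```

If the bigram and trigram counts keep growing much faster than the
unigram count as more documents are fed in, that would point to the
n-gram vocabulary, rather than the raw text, as the main memory cost.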

-- 
Regards,
Peng
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
