Is there an established workflow for collecting and processing multiple document collections? That is, if I run N documents through SparseVectorsFromSequenceFiles and a month later have another 50K documents I'd like to add to the same corpus, what is the standard way to handle this?
Are people simply re-processing the entire corpus, including the new files? I haven't seen any code/classes in the Mahout vectorizer package for adding new documents to an existing dictionary and set of TF-IDF vectors. -- Thanks, John C
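
For reference, the full re-run I'd be doing today looks roughly like this (paths are placeholders and this is just a sketch of the from-scratch approach, not a recommendation):

```shell
# Hypothetical paths; adjust to your setup.
# 1. Convert the combined (old + new) raw documents to SequenceFiles.
mahout seqdirectory \
  -i /corpus/all-docs \
  -o /corpus/seqfiles \
  -c UTF-8

# 2. Re-vectorize the whole corpus from scratch, rebuilding the
#    dictionary and the TF-IDF weights over old and new docs together.
mahout seq2sparse \
  -i /corpus/seqfiles \
  -o /corpus/vectors \
  -wt tfidf \
  -ow
```

This rebuilds everything each time, which is what I'm hoping to avoid.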
