Hi Boris,

Unless Mahout has superpowers that I am not aware of, years of experience in text classification tell me that yes, you will have to rebuild the classifier model regularly as new labeled data becomes available.
If you are building a system that incorporates a user feedback loop, as it sounds like you are (i.e., "yes, this message is spam"), one thing that might reduce the amount of classifier retraining is to verify that each incoming labeled document is not already in your data set, i.e., not a dupe.

Additionally, you probably want to hold off on retraining until you have a critical mass of newly labeled documents, or until a single data point arrives that is important enough to include right away.

If someone has the ability to say "no, this is not spam", that is valuable too: keep it as labeled data and add it to your anti-content/negative-content set.

Best,
Temese

On Tue, Mar 6, 2012 at 7:48 AM, Boris Fersing <[email protected]> wrote:
> Hi all,
>
> is there a way to update a classifier model on the fly? Or do I need
> to recompute everything each time I add a document to a category in
> the training set?
>
> I would like to build something similar to some spam filters, where
> you can confirm that a message is a spam or not, and thus, train the
> classifier.
>
> regards,
> Boris
> --
> 42
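
P.S. In case it helps, here is a minimal sketch of the feedback handling described in my reply: skip duplicates, queue new labeled documents, and only signal a retrain once enough have accumulated. It is plain Python and does not use any Mahout API. The class name `FeedbackBuffer`, its methods, and the threshold value are all made up for illustration, and matching duplicates by a hash of the normalized text is just one possible choice.

```python
import hashlib


class FeedbackBuffer:
    """Hypothetical helper: collects user-labeled documents and
    reports when enough new ones have arrived to justify a retrain."""

    def __init__(self, retrain_threshold=100):
        # The threshold is an arbitrary placeholder; tune it for your data.
        self.retrain_threshold = retrain_threshold
        self.seen = set()    # fingerprints of every document accepted so far
        self.pending = []    # (text, label) pairs not yet used for training

    @staticmethod
    def fingerprint(text):
        # Lowercase and collapse whitespace so trivially different copies match.
        normalized = " ".join(text.lower().split())
        return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

    def add(self, text, label):
        """Queue the document and return True if it is new, else return False.

        Assumption: a repeated document is ignored even if its label differs;
        a real system would need a policy for conflicting labels.
        """
        key = self.fingerprint(text)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.pending.append((text, label))
        return True

    def should_retrain(self):
        # True once the number of queued documents reaches the threshold.
        return len(self.pending) >= self.retrain_threshold

    def drain(self):
        """Return the queued examples and clear the queue."""
        batch, self.pending = self.pending, []
        return batch
```

The idea would be to call `add()` each time a user confirms or rejects a spam label, and when `should_retrain()` returns true, append the result of `drain()` to the training set and rebuild the model with whatever training job you already run.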
