Hi Temese, thank you very much for this information.
Boris

On Tue, Mar 6, 2012 at 11:14, Temese Szalai <[email protected]> wrote:
> Hi Boris -
>
> Unless Mahout has super-powers that I am not aware of, years of experience
> in text classification tell me that - yes, you will have to rebuild the
> classifier model regularly as new labeled data becomes available.
>
> If you are building a system that incorporates a user feedback loop as it
> sounds like you are (i.e., "yes, this message is spam"), one thing that
> might reduce the amount of classifier re-training would be to verify that
> the new incoming labeled document is not already in your data set, i.e.,
> not a dupe. Additionally, you probably want to wait to retrain until you
> have some critical mass of newly labeled documents or else you have a
> critical data point to include.
>
> If someone has the ability to say "no this is not spam", keeping that data
> as labeled data to add to your anti-content/negative content set would be
> valuable.
>
> Best,
> Temese
>
> On Tue, Mar 6, 2012 at 7:48 AM, Boris Fersing <[email protected]> wrote:
>
>> Hi all,
>>
>> is there a way to update a classifier model on the fly? Or do I need
>> to recompute everything each time I add a document to a category in
>> the training set?
>>
>> I would like to build something similar to some spam filters, where
>> you can confirm that a message is a spam or not, and thus, train the
>> classifier.
>>
>> regards,
>> Boris
>> --
>> 42
>>

--
42
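
For reference, the two suggestions in Temese's reply (skip duplicate labeled documents, and retrain only once a critical mass of new labels has accumulated) could be sketched roughly as follows. This is only an illustration in plain Python, not Mahout code: the class name FeedbackBuffer, the retrain_threshold parameter, and the whitespace/case normalization used to detect dupes are all assumptions made for the example.

```python
import hashlib


class FeedbackBuffer:
    """Collects user-labeled documents and signals when a retrain is due.

    Hypothetical helper for illustration; not part of Mahout.
    """

    def __init__(self, retrain_threshold=100):
        # Assumed "critical mass" of new labels before a rebuild is worthwhile.
        self.retrain_threshold = retrain_threshold
        self.seen = set()    # fingerprints of every document already labeled
        self.pending = []    # (text, label) pairs added since the last retrain

    @staticmethod
    def fingerprint(text):
        # Normalize case and whitespace so trivial variants count as dupes.
        normalized = " ".join(text.lower().split())
        return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

    def add(self, text, label):
        """Return True if the document was new and stored, False if a dupe."""
        key = self.fingerprint(text)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.pending.append((text, label))
        return True

    def should_retrain(self):
        """True once enough new labeled documents have accumulated."""
        return len(self.pending) >= self.retrain_threshold

    def drain(self):
        """Return the pending batch for the trainer and reset the buffer."""
        batch, self.pending = self.pending, []
        return batch
```

Both "spam" and "not spam" confirmations would go through add(), so the negative examples are kept as well. When should_retrain() returns True, the batch from drain() would be merged into the full training set and the model rebuilt from scratch. The sketch does not handle a duplicate document arriving with a different label; that case would need its own policy.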
