Thanks Charles, I'll have a look at it.

cheers,
Boris
On Tue, Mar 6, 2012 at 11:25, Charles Earl <[email protected]> wrote:
> Boris,
> Have you looked at online decision trees and the like?
> http://www.cs.washington.edu/homes/pedrod/papers/kdd01b.pdf
> I think ultimately the concept boils down to Temese's observation that there
> is some measure (in the paper's case, concept drift) that triggers
> re-training on the entire set.
> C
> On Mar 6, 2012, at 11:17 AM, Boris Fersing wrote:
>
>> Hi Temese,
>>
>> thank you very much for this information.
>>
>> Boris
>>
>> On Tue, Mar 6, 2012 at 11:14, Temese Szalai <[email protected]> wrote:
>>> Hi Boris -
>>>
>>> Unless Mahout has super-powers that I am not aware of, years of experience
>>> in text classification tell me that - yes, you will have to rebuild the
>>> classifier model regularly as new labeled data becomes available.
>>>
>>> If you are building a system that incorporates a user feedback loop, as it
>>> sounds like you are (i.e., "yes, this message is spam"), one thing that
>>> might reduce the amount of classifier re-training would be to verify that
>>> the new incoming labeled document is not already in your data set, i.e.,
>>> not a dupe. Additionally, you probably want to wait to retrain until you
>>> have some critical mass of newly labeled documents, or a critical data
>>> point to include.
>>>
>>> If someone has the ability to say "no, this is not spam", keeping that
>>> data as labeled data to add to your anti-content/negative content set
>>> would be valuable.
>>>
>>> Best,
>>> Temese
>>>
>>> On Tue, Mar 6, 2012 at 7:48 AM, Boris Fersing <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> is there a way to update a classifier model on the fly? Or do I need
>>>> to recompute everything each time I add a document to a category in
>>>> the training set?
>>>>
>>>> I would like to build something similar to some spam filters, where
>>>> you can confirm whether a message is spam or not, and thus train the
>>>> classifier.
>>>>
>>>> regards,
>>>> Boris
>>>> --
>>>> 42
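Temese's suggestion — deduplicate incoming user-labeled documents and batch them until a critical mass before rebuilding the model — could be sketched roughly as below. This is a minimal, framework-agnostic illustration, not Mahout code; the `retrain` callback, the hash-based dupe check, and the threshold value are all assumptions for the sketch.

```python
import hashlib


class FeedbackBuffer:
    """Collects user-labeled documents ("yes, this is spam") and triggers
    retraining in batches rather than on every single label.

    Deduplicates by content hash, and only fires the `retrain` callback once
    a critical mass of genuinely new labeled documents has accumulated.
    """

    def __init__(self, retrain, critical_mass=100):
        self.retrain = retrain            # callback that rebuilds the model (assumption)
        self.critical_mass = critical_mass
        self.seen = set()                 # hashes of documents already labeled
        self.pending = []                 # new (text, label) pairs awaiting retraining

    def add(self, text, label):
        """Record one labeled document; returns False if it was a dupe."""
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in self.seen:
            return False                  # already in the data set: no retrain needed
        self.seen.add(digest)
        self.pending.append((text, label))
        if len(self.pending) >= self.critical_mass:
            self.retrain(self.pending)    # rebuild on the full, updated training set
            self.pending = []
        return True
```

A "critical data point" (Temese's other trigger) could be handled by calling `retrain` immediately for documents flagged as high-priority, bypassing the batch threshold.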

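Charles's summary point — some measured quantity (in the linked paper's case, concept drift) triggering a full re-train — could look roughly like the sketch below: track the live model's error rate on a sliding window of user feedback and signal a rebuild when it degrades past a baseline. The window size, baseline, and margin are illustrative assumptions, not values from the paper.

```python
from collections import deque


class DriftTrigger:
    """Tracks the classifier's error rate over a sliding window of user
    feedback and signals a full retrain when performance drifts below
    what it was at training time.
    """

    def __init__(self, baseline_error=0.05, margin=0.10, window=200):
        self.baseline = baseline_error    # error rate measured when last trained
        self.margin = margin              # degradation tolerated before retraining
        self.recent = deque(maxlen=window)

    def record(self, predicted_label, true_label):
        """Record one feedback event; return True when a retrain should fire."""
        self.recent.append(predicted_label != true_label)
        if len(self.recent) < self.recent.maxlen:
            return False                  # not enough evidence yet
        error = sum(self.recent) / len(self.recent)
        return error > self.baseline + self.margin
```

This only decides *when* to retrain; the rebuild itself would still recompute the model over the entire labeled set, as discussed above.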