Hi Boris -

Unless Mahout has super-powers that I am not aware of, years of experience
in text classification tell me that - yes, you will have to rebuild the
classifier model regularly as new labeled data becomes available.

If you are building a system that incorporates a user feedback loop, as it
sounds like you are (e.g., "yes, this message is spam"), one thing that
might reduce the amount of classifier re-training would be to verify that
each new incoming labeled document is not already in your data set, i.e.,
not a dupe. Additionally, you probably want to wait to retrain until you
have either a critical mass of newly labeled documents or a single data
point important enough to include right away.
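
To make the two ideas above concrete (skip dupes, retrain only in batches),
here is a rough Python sketch. It is not Mahout code; the class name
FeedbackBuffer, the retrain_threshold of 100, and the "critical" flag are all
made up for illustration, and matching dupes by a hash of the normalized text
is just one simple choice.

```python
import hashlib


class FeedbackBuffer:
    """Collects user-labeled documents and says when a retrain is worthwhile.

    Hypothetical helper for illustration; not part of Mahout.
    """

    def __init__(self, retrain_threshold=100):
        # Assumed "critical mass"; tune this for your own data volume.
        self.retrain_threshold = retrain_threshold
        self.seen = set()    # fingerprints of every document accepted so far
        self.pending = []    # (text, label) pairs not yet used for training

    @staticmethod
    def fingerprint(text):
        # Normalize case and whitespace so trivially different copies collide.
        normalized = " ".join(text.lower().split())
        return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

    def add(self, text, label, critical=False):
        """Store a labeled document; return True if a retrain is due now."""
        key = self.fingerprint(text)
        if key in self.seen:
            # Dupe (by text only, label is ignored): nothing new to learn.
            return False
        self.seen.add(key)
        self.pending.append((text, label))
        # Retrain on a critical data point or once enough documents pile up.
        return critical or len(self.pending) >= self.retrain_threshold

    def drain(self):
        """Hand the pending batch to the trainer and reset the buffer."""
        batch, self.pending = self.pending, []
        return batch
```

You would call add() for every piece of feedback, "spam" and "not spam"
alike, and only when it returns True pass drain() to whatever rebuilds the
model.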

If someone has the ability to say "no, this is not spam", it would be
valuable to keep that feedback as labeled data and add it to your
anti-content/negative content set.

Best,
Temese

On Tue, Mar 6, 2012 at 7:48 AM, Boris Fersing <[email protected]> wrote:

> Hi all,
>
> is there a way to update a classifier model on the fly? Or do I need
> to recompute everything each time I add a document to a category in
> the training set?
>
> I would like to build something similar to some spam filters, where
> you can confirm that a message is a spam or not, and thus, train the
> classifier.
>
> regards,
> Boris
> --
> 42
>
