Hi Temese,

thank you very much for this information.

Boris

On Tue, Mar 6, 2012 at 11:14, Temese Szalai wrote:
> Hi Boris -
>
> Unless Mahout has super-powers that I am not aware of, years of experience
> in text classification tell me that, yes, you will have to rebuild the
> classifier model regularly as new labeled data becomes available.
>
> If you are building a system that incorporates a user feedback loop, as it
> sounds like you are (i.e., "yes, this message is spam"), one thing that
> might reduce the amount of classifier retraining would be to verify that
> the new incoming labeled document is not already in your data set, i.e.,
> not a dupe. Additionally, you probably want to wait to retrain until you
> have some critical mass of newly labeled documents, or until you have a
> critical data point to include.
>
> If someone has the ability to say "no, this is not spam", keeping that data
> as labeled data to add to your anti-content/negative content set would be
> valuable.
>
> Best,
> Temese
>
> On Tue, Mar 6, 2012 at 7:48 AM, Boris Fersing wrote:
>
>> Hi all,
>>
>> is there a way to update a classifier model on the fly? Or do I need
>> to recompute everything each time I add a document to a category in
>> the training set?
>>
>> I would like to build something similar to some spam filters, where
>> you can confirm whether or not a message is spam and thus train the
>> classifier.
>>
>> regards,
>> Boris
>> --
>> 42
>>
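For illustration, here is a minimal Python sketch of the retraining policy Temese describes: drop exact duplicates, keep both "spam" and "not spam" (here "ham") confirmations, and only signal a full retrain once a batch threshold is reached or a critical example forces it. This is not a Mahout API. `FeedbackBuffer`, `fingerprint`, `retrain_threshold` and the "ham" label are hypothetical names, and the duplicate check (SHA-256 of lower-cased, whitespace-collapsed text) is an assumption that only catches exact copies, not near-duplicates. The actual retraining step is whatever batch training job you already run.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(text: str) -> str:
    """Hypothetical helper: hash a lower-cased, whitespace-collapsed body so
    exact re-sends of the same message map to the same key."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


@dataclass
class FeedbackBuffer:
    """Hypothetical helper: collects user-confirmed labels ("spam" or "ham")
    and reports when enough new, non-duplicate examples have accumulated to
    justify retraining the model from scratch."""
    retrain_threshold: int = 100
    # Pre-populate with fingerprints of the existing training set so that
    # feedback on already-known messages is not counted again.
    seen: set = field(default_factory=set)
    pending: list = field(default_factory=list)

    def add(self, text: str, label: str) -> bool:
        """Record one labeled message; return False if it is a duplicate."""
        if label not in ("spam", "ham"):
            raise ValueError(f"unknown label: {label!r}")
        key = fingerprint(text)
        if key in self.seen:
            return False
        self.seen.add(key)
        # "ham" confirmations are kept too: they feed the negative set.
        self.pending.append((text, label))
        return True

    def should_retrain(self, force: bool = False) -> bool:
        """True once the batch is big enough, or when the caller flags a
        critical example that must be included right away."""
        return force or len(self.pending) >= self.retrain_threshold

    def drain(self) -> list:
        """Hand the pending batch to the (external) training job and reset."""
        batch, self.pending = self.pending, []
        return batch


if __name__ == "__main__":
    buf = FeedbackBuffer(retrain_threshold=2)
    print(buf.add("Cheap pills!!!", "spam"))    # True
    print(buf.add("cheap   PILLS!!!", "spam"))  # False: same text after normalising
    print(buf.should_retrain())                 # False: only one new example
    print(buf.add("Lunch at noon?", "ham"))     # True
    print(buf.should_retrain())                 # True: threshold reached
```

Batching like this trades freshness for fewer expensive rebuilds; the `force` flag covers the "critical data point" case without waiting for the threshold.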



-- 
42
