On Wed, Apr 13, 2011 at 8:56 AM, Claudia Grieco <[email protected]> wrote:
> Thanks for the help :)
>
> > Why not just train with those documents and put a category tag of
> > "other" on them and run normal categorization? If you can distinguish
> > these documents by word frequencies, then this should do the trick.
>
> I don't know if this will help

Only an experiment will tell you.

> 1) I'm still not sure where to put the threshold (if a document has word
> frequency less than X... how to choose X?)

The classifier should handle that for you for the most part. Again,
experimentation is the way to go here. My first cut would be to assign to
the category with the highest score, possibly including the other category.

> 2) The classifier is built incrementally: a document who would be
> classified as "other" today may be classified as "new category the user
> has just added" tomorrow. New docs in the training set and new categories
> are added from time to time.

That is pretty easy. Just retrain with the new category assignments.
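To make the two suggestions concrete, here is a rough sketch of what I mean.
It assumes a toy multinomial naive Bayes written from scratch in Python, not
any particular library's API, and the function names (train, scores,
classify) and the sample documents are made up for illustration. "other" is
just one more label, the prediction is the label with the highest score, and
a new category is handled by retraining on the updated labels.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """Fit a toy multinomial naive Bayes model on (text, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of documents
    vocab = set()
    for text, category in docs:
        words = text.lower().split()
        word_counts[category].update(words)
        doc_counts[category] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab

def scores(model, text):
    """Return a log-probability score per category (Laplace smoothed)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    result = {}
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)
        for word in text.lower().split():
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        result[category] = score
    return result

def classify(model, text):
    """Assign the category with the highest score; "other" competes too."""
    s = scores(model, text)
    return max(s, key=s.get)

# Hypothetical training set: "other" is an ordinary category tag.
training = [
    ("goal match striker league", "sports"),
    ("team coach match season", "sports"),
    ("election vote parliament minister", "politics"),
    ("minister vote budget law", "politics"),
    ("recipe oven flour butter", "other"),
    ("guitar chord melody album", "other"),
]
model = train(training)
before = classify(model, "flour butter oven")

# Later the user adds a "cooking" category: relabel, add docs, retrain.
retraining = [d for d in training if d[0] != "recipe oven flour butter"]
retraining += [
    ("recipe oven flour butter", "cooking"),
    ("bake oven sugar flour", "cooking"),
]
model = train(retraining)
after = classify(model, "flour butter oven")
```

With this toy data the same document lands in "other" before the new
category exists and in "cooking" after retraining, with no hand-picked
threshold X. Whether that holds up on real documents is again something only
an experiment will tell you.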
