Hi guys,

I'm using SGD to classify a set of documents, but I have a problem: some
documents are not related to any of the categories, and I want to be able to
identify them and exclude them from the classification. My idea is to read
the documents of the training set (which are currently in a Lucene index) and
flag the incoming documents that have the fewest terms in common with the
training documents. Any ideas on how to do this?
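
To make the idea concrete, here is a rough sketch of the kind of check I have in mind. It is plain Python with no Lucene or Mahout calls; the function names, the toy vocabulary, and the 0.3 threshold are all placeholders I made up for illustration, and in practice the vocabulary would come from the terms in the index.

```python
def overlap_ratio(doc_terms, training_vocab):
    """Fraction of a document's distinct terms that also occur in the training vocabulary."""
    distinct = set(doc_terms)
    if not distinct:
        # An empty document shares nothing with the training set.
        return 0.0
    return len(distinct & training_vocab) / len(distinct)


def flag_unrelated(docs, training_vocab, min_overlap=0.3):
    """Return ids of documents whose overlap is below min_overlap (an arbitrary placeholder)."""
    return [doc_id for doc_id, terms in docs.items()
            if overlap_ratio(terms, training_vocab) < min_overlap]


# Hypothetical data: terms collected from the training documents.
training_vocab = {"loan", "bank", "interest", "match", "goal", "team"}

# Hypothetical tokenized documents waiting to be classified.
docs = {
    "a": ["bank", "loan", "interest", "rate"],
    "b": ["recipe", "flour", "oven", "sugar"],
}

print(flag_unrelated(docs, training_vocab))
```

Documents flagged this way would be skipped before they reach the classifier. A plain term count like this ignores how common each term is, so some weighting would probably be needed on real data.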

Thanks a lot

Claudia 
