Hi guys, I'm using SGD to classify a set of documents, but I have a problem: some documents are not related to any of the categories, and I want to identify them and exclude them from the classification. My idea is to read the documents of the training set (which are currently in a Lucene index) and flag the documents that have the fewest terms in common with the training documents. Any idea how to do this?
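One possible way to sketch this idea, assuming a plain vocabulary-overlap score rather than anything built into Lucene or the SGD classifier: collect every term that appears in the training documents, then score each new document by the fraction of its distinct terms found in that vocabulary. Documents scoring below a threshold are treated as unrelated. The regex tokenizer, the function names and the 0.5 threshold are all hypothetical choices for illustration, not Lucene or Mahout APIs.

```python
import re
from typing import Iterable, List, Set

# Hypothetical tokenizer: lowercase, keep runs of letters/digits.
# A stand-in for whatever analyzer built the Lucene index.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> Set[str]:
    """Return the set of distinct lowercase terms in a text."""
    return set(TOKEN_RE.findall(text.lower()))


def build_vocabulary(training_docs: Iterable[str]) -> Set[str]:
    """Union of all terms seen in the training documents."""
    vocab: Set[str] = set()
    for doc in training_docs:
        vocab |= tokenize(doc)
    return vocab


def overlap_ratio(doc: str, vocab: Set[str]) -> float:
    """Fraction of the document's distinct terms that occur in the training vocabulary."""
    terms = tokenize(doc)
    if not terms:
        return 0.0  # an empty document shares nothing with the training set
    return len(terms & vocab) / len(terms)


def find_unrelated(docs: List[str], vocab: Set[str], threshold: float = 0.5) -> List[int]:
    """Indices of documents whose overlap ratio falls below the (hypothetical) threshold."""
    return [i for i, d in enumerate(docs) if overlap_ratio(d, vocab) < threshold]


if __name__ == "__main__":
    training = ["the stock market fell sharply", "the team won the football match"]
    vocab = build_vocabulary(training)
    new_docs = ["stock prices and the market", "recipe for chocolate cake"]
    print([overlap_ratio(d, vocab) for d in new_docs])  # [0.6, 0.0]
    print(find_unrelated(new_docs, vocab))              # [1]
```

Common words such as "the" inflate the score (they account for part of the 0.6 above), so removing stopwords, or reusing the analyzer that built the Lucene index, would probably make any threshold more meaningful.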
Thanks a lot,
Claudia
