Try looking into Stochastic Gradient Descent (SGD). You could use
AdaptiveLogisticRegression to train multiple models simultaneously,
then run your tests with the best model that AdaptiveLogisticRegression
selects.
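To illustrate the idea of "train several SGD models, keep the best one", here is a minimal plain-Python sketch. It is not Mahout's API and not how AdaptiveLogisticRegression works internally. It assumes a single category treated one-vs-rest, a plain grid over learning rate and L2 penalty, and a single held-out split, all of which are simplifications. The names `bag`, `train_sgd`, `select_best` and the toy data are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(model, x):
    # Linear score b + w.x for a sparse feature dict x.
    w, b = model
    return b + sum(w.get(t, 0.0) * v for t, v in x.items())

def train_sgd(examples, lr, l2, epochs=20, seed=0):
    """Binary (one-vs-rest) logistic regression trained with plain SGD.
    examples: list of (features: dict[str, float], label: 0 or 1)."""
    rng = random.Random(seed)
    data = list(examples)
    w, b = {}, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            g = sigmoid(score((w, b), x)) - y  # d(log loss)/dz
            b -= lr * g
            for t, v in x.items():
                wt = w.get(t, 0.0)
                # Lazy L2: only weights of features present in x are shrunk.
                w[t] = wt - lr * (g * v + l2 * wt)
    return w, b

def accuracy(model, examples):
    hits = sum((score(model, x) > 0) == (y == 1) for x, y in examples)
    return hits / len(examples)

def select_best(train, held_out, lrs=(0.01, 0.1, 0.5), l2s=(0.0, 0.001, 0.01)):
    # Hypothetical stand-in for automatic model selection:
    # train one model per (lr, l2) pair, keep the best on held-out data.
    best = None
    for lr in lrs:
        for l2 in l2s:
            model = train_sgd(train, lr, l2)
            acc = accuracy(model, held_out)
            if best is None or acc > best[0]:
                best = (acc, lr, l2, model)
    return best

def bag(text):
    # Hypothetical bag-of-words featurizer: raw token counts.
    counts = {}
    for tok in text.lower().split():
        counts[tok] = counts.get(tok, 0.0) + 1.0
    return counts

train = [(bag("canine bone bark"), 1), (bag("dog bone leash"), 1),
         (bag("cat whiskers purr"), 0), (bag("cat litter mouse"), 0)]
held_out = [(bag("dog bark"), 1), (bag("cat purr"), 0)]
acc, lr, l2, (w, b) = select_best(train, held_out)
print(acc, lr, l2, sorted(w, key=w.get, reverse=True)[:3])
```

The highest-weighted terms of the selected model are candidate category terms. With ~140 categories you would repeat this once per category, or use a multinomial model instead.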



________________________________
From: Yuval Feinstein <[email protected]>
To: [email protected]
Sent: Monday, November 14, 2011 2:11 AM
Subject: Terminology Extraction

Hi all.
I am trying to use Mahout for terminology extraction:
I have ~140 classes, each of which contains ~100 text documents.
The class categories are distinct but may overlap a bit.
I want to extract terms related to each label. For example, if I have a
"dogs" category, the terms "canine", "German Shepherd", and "bone" may
be related to it.
What I have come up with so far is:
1. Learn a classifier using Mahout.
2. Look at the classifier's term weights: terms with high weights are
likely to represent the category.
I currently use only Naive Bayes, with ng=1 (unigrams).
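Step 2 can be sketched outside Mahout as follows. This is a minimal plain-Python illustration, not Mahout's own weighting. It assumes a multinomial Naive Bayes model with Laplace smoothing (`alpha`, a hypothetical parameter) and scores each term by the log ratio of its probability in the category versus in all other categories. The function name and toy documents are hypothetical.

```python
import math
from collections import Counter

def nb_term_scores(docs_by_class, alpha=1.0):
    """docs_by_class: dict[label, list[str]] of raw documents.
    Returns dict[label, list[(term, score)]], highest score first, where
    score = log P(term | label) - log P(term | other labels), Laplace-smoothed."""
    counts = {c: Counter(tok for d in docs for tok in d.lower().split())
              for c, docs in docs_by_class.items()}
    vocab = set().union(*counts.values())
    total = Counter()
    for cnt in counts.values():
        total.update(cnt)
    v = len(vocab)
    grand_total = sum(total.values())
    ranked = {}
    for c, cnt in counts.items():
        in_total = sum(cnt.values())
        out_total = grand_total - in_total
        scores = []
        for t in vocab:
            p_in = (cnt[t] + alpha) / (in_total + alpha * v)
            p_out = (total[t] - cnt[t] + alpha) / (out_total + alpha * v)
            scores.append((t, math.log(p_in) - math.log(p_out)))
        # Sort by descending score; break ties alphabetically.
        scores.sort(key=lambda ts: (-ts[1], ts[0]))
        ranked[c] = scores
    return ranked

docs = {"dogs": ["the canine chewed a bone", "a german shepherd is a canine"],
        "cats": ["the cat chased a mouse", "a cat likes the mouse"]}
ranked = nb_term_scores(docs)
print(ranked["dogs"][:3])  # top term for "dogs" is "canine"
```

With unigrams, "German Shepherd" is split into two separate terms. Capturing it as one term would need ng=2. In real data you would also want to drop stopwords, which can otherwise score highly.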
My questions are:
a. Is this a good approach for the problem at hand, or does Mahout have
a better algorithm for it?
b. Which Mahout classifier is best for this? I chose Naive Bayes first
because its parameters have a simple interpretation.
Which other (stronger) classifiers also have this property?
TIA,
Yuval
