Hi,
I'm testing the 20 newsgroups example, but I've run into a small problem.
When I classify some data using the model generated by the training step,
it always returns the "unknown" category.
I think the model is correct, because when I test it I can see the confusion
matrix and the results look right, so I don't understand why this happens.
Here is my custom classifier:
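    // Package names below are as I understand them for Mahout 0.5;
    // adjust them if your version differs.
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.io.StringReader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.mahout.classifier.BayesFileFormatter;
    import org.apache.mahout.classifier.ClassifierResult;
    import org.apache.mahout.classifier.bayes.algorithm.BayesAlgorithm;
    import org.apache.mahout.classifier.bayes.common.BayesParameters;
    import org.apache.mahout.classifier.bayes.datastore.InMemoryBayesDatastore;
    import org.apache.mahout.classifier.bayes.exceptions.InvalidDatastoreException;
    import org.apache.mahout.classifier.bayes.interfaces.Algorithm;
    import org.apache.mahout.classifier.bayes.interfaces.Datastore;
    import org.apache.mahout.classifier.bayes.model.ClassifierContext;
    import org.apache.mahout.vectorizer.DefaultAnalyzer;

    public class CustomClassifier {
        private ClassifierContext context;
        private Algorithm algorithm;
        private Datastore datastore;
        private File modelDirectory;
        Analyzer analyzer;
        BayesParameters p;

        public CustomClassifier() {
            analyzer = new DefaultAnalyzer();
        }

        // Parameters shared by every classification run.
        public static BayesParameters setParams() {
            BayesParameters bayesParams = new BayesParameters();
            bayesParams.setGramSize(1);
            bayesParams.set("dataSource", "hdfs");
            bayesParams.set("defaultCat", "unknown");
            bayesParams.set("encoding", "UTF-8");
            bayesParams.set("alpha_i", "1.0");
            return bayesParams;
        }

        // Loads the trained model found under basePath into memory.
        public void init(File basePath) throws FileNotFoundException,
                InvalidDatastoreException {
            algorithm = new BayesAlgorithm();
            p = setParams();
            p.set("basePath", basePath.getAbsolutePath());
            p.setGramSize(1);
            datastore = new InMemoryBayesDatastore(p);
            context = new ClassifierContext(algorithm, datastore);
            context.initialize();
        }

        // Classifies a fixed sample text and returns the predicted label.
        public String classify() throws IOException, InvalidDatastoreException {
            // The sample is split into concatenated literals because a Java
            // string literal cannot span several lines.
            StringReader reader = new StringReader(
                "Thanks to a reply from someone I looked a little further and "
                + "found what I was looking for. The April CR magazine has most "
                + "of the above things. Despite recent articles here the ratings "
                + "looked pretty good for relative comparison purposes. "
                + "Unfortunately the crash test comparisons didn't include half "
                + "of the cars I'm comparing. Anybody know how '93 Honda Civic "
                + "hatchbacks and Toyota Tercels fare in an accident? ");
            String[] document =
                BayesFileFormatter.readerToDocument(analyzer, reader);
            ClassifierResult result = context.classifyDocument(document,
                "unknown");
            return result.getLabel();
        }

        public static void main(String[] args) throws Exception {
            CustomClassifier cc;
            try {
                cc = new CustomClassifier();
                cc.init(new File(args[0]));
                System.out.println("Category::: " + cc.classify());
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }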
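As a side check on init(), which stores basePath.getAbsolutePath() as the
"basePath" parameter while "dataSource" is set to "hdfs", the minimal sketch
below prints what java.io.File turns a few argument values into. The class
name BasePathCheck and the sample paths (including the namenode host and
port) are hypothetical and only for illustration. Whether this has anything
to do with the empty model is not established; it only shows which string
ends up in the parameter.

```java
import java.io.File;

class BasePathCheck {
    // Mirrors the expression used in init(): basePath.getAbsolutePath().
    // java.io.File treats its argument as a local filesystem path and
    // resolves relative paths against the current working directory.
    static String resolveBasePath(String arg) {
        return new File(arg).getAbsolutePath();
    }

    public static void main(String[] args) {
        // Hypothetical argument values, not taken from the original post.
        String[] samples = {
            "/user/me/20news-model",
            "20news-model",
            "hdfs://namenode:8020/user/me/20news-model"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + resolveBasePath(sample));
        }
    }
}
```

If the printed value is not the path under which the model actually lives on
HDFS, passing the path to the parameters as a plain String instead of going
through File might be worth trying.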
I can't figure out why this happens. When I look at HDFS, I can see the
folders where the model is stored, and all of them have the correct structure.
If I get the labels from the classifier, they are all empty. I found that
when the classifier loads the model with:
    SequenceFileModelReader.loadModel(this, params, conf);
    loadFeatureWeights(datastore, new Path(params.get("sigma_j")), conf);
    loadLabelWeights(datastore, new Path(params.get("sigma_k")), conf);
    loadSumWeight(datastore, new Path(params.get("sigma_kSigma_j")), conf);
    loadThetaNormalizer(datastore, new Path(params.get("thetaNormalizer")), conf);
    loadWeightMatrix(datastore, new Path(params.get("weight")), conf);
All of the parts are loaded as empty, but the files themselves are not. Is
there a known problem with SequenceFileDirIterable? I'm using the Cloudera
distribution 3u3, so the Mahout version is 0.5.
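To back up the claim that the model files are not empty, here is a minimal
sketch of how the part files could be checked. It assumes the model directory
has first been copied from HDFS to the local filesystem (for example with
hadoop fs -get) and that the job output files follow the usual part-* naming.
The class name ModelPartCheck is hypothetical. It uses only the standard
library, so it reports file sizes and does not read the records.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ModelPartCheck {
    // Sums the sizes of regular files named part-* directly inside dir.
    static long totalPartBytes(Path dir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(dir, "part-*")) {
            for (Path part : parts) {
                if (Files.isRegularFile(part)) {
                    total += Files.size(part);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: local copy of the model directory (an assumption of this sketch).
        Path modelRoot = Paths.get(args[0]);
        List<Path> dirs;
        try (Stream<Path> walk = Files.walk(modelRoot)) {
            // Files.walk visits modelRoot itself and every nested directory.
            dirs = walk.filter(Files::isDirectory).collect(Collectors.toList());
        }
        for (Path dir : dirs) {
            System.out.println(dir + ": " + totalPartBytes(dir) + " bytes in part-* files");
        }
    }
}
```

A non-zero size does not prove that a file holds records, since a
SequenceFile has a header even when it contains none, so this only rules out
files that are completely empty.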
Thanks
--
View this message in context:
http://lucene.472066.n3.nabble.com/Problem-with-custom-classifier-tp3994358.html
Sent from the Mahout User List mailing list archive at Nabble.com.