Hi all,

I have a question about the labels in the random forest output. Suppose I
do binary classification with labels 0 and 1. In my data description file,
I have something like
{"values":["1","0"],"label":true,"type":"categorical"}, so label 1 is at
index 0 and label 0 is at index 1. Is it possible that the labels in the
.out output file are swapped?

I checked the source code of the Mahout Classifier
(mr/src/main/java/org/apache/mahout/classifier/df/mapreduce/Classifier.java).
The "parseOutput" function writes the result directly to the file
without trying to look up the right label.

In TestForest
(examples/src/main/java/org/apache/mahout/classifier/df/mapreduce/TestForest.java),
if I specify the -a parameter, it outputs the confusion matrix.
There, it looks like the right label is obtained by calling
dataset.getLabelString().

So my conclusion is that the confusion matrix is always right (the
user-provided labels are used to compute it), but the prediction output
could use different labels than the ones the user supplied. Is that right?
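
To make the concern concrete, here is a small standalone sketch (plain Java,
not Mahout code) of the mapping I think I would have to apply myself. It
assumes, and this is exactly the part I am unsure about, that the .out file
contains the internal label index rather than the original label string. The
class and method names (LabelIndexDemo, toLabel) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class LabelIndexDemo {
    // Label values in the same order as the "values" array in my descriptor.
    static final List<String> LABEL_VALUES = Arrays.asList("1", "0");

    // Maps an internal label index (ASSUMED to be what the .out file holds)
    // back to the user-supplied label string.
    static String toLabel(List<String> values, double predictedIndex) {
        int idx = (int) predictedIndex;
        if (idx < 0 || idx >= values.size()) {
            throw new IllegalArgumentException("No label at index " + idx);
        }
        return values.get(idx);
    }

    public static void main(String[] args) {
        // Hypothetical raw predictions, as they might appear in the .out file.
        double[] predictions = {0.0, 1.0, 0.0};
        for (double p : predictions) {
            System.out.println(p + " -> " + toLabel(LABEL_VALUES, p));
        }
    }
}
```

If that assumption holds, then with my descriptor a raw 0.0 in the .out file
would actually mean label "1", and a raw 1.0 would mean label "0".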

Thanks a lot

Best

Xuan
