Hi,
When I use the sequential method to classify the 20 Newsgroups dataset, everything works.
But when I change the method to mapreduce, every entry in the confusion matrix
becomes 0, and the output file shows every document classified as unknown.
Here are my shell scripts.
train.sh:
MAHOUT_HOME=/home/lijun/mahout-0.3
$HADOOP_HOME/bin/hadoop fs -put $MAHOUT_HOME/examples/20news-input 20news-input
hadoop \
jar \
$MAHOUT_HOME/examples/target/mahout-examples-0.3.job \
org.apache.mahout.classifier.bayes.TrainClassifier \
-i 20news-input \
-o newsmodel-ng1 \
-ng 1 \
-type bayes \
-source hdfs
test.sh:
hadoop \
jar \
$MAHOUT_HOME/examples/target/mahout-examples-0.3.job \
org.apache.mahout.classifier.bayes.TestClassifier \
-m newsmodel-ng1 \
-d 20news-input \
-ng 1 \
-type bayes \
-source hdfs \
-v \
-method mapreduce   # the only line I changed; everything else is untouched
With mapreduce, every entry in the result matrix is 0,
and the output file shows every document classified as unknown:
../bin/mahout seqdumper -s 20news-input-output/part-00000
Input Path: 20news-input-output/part-00000
Key class: class org.apache.mahout.common.StringTuple Value Class: class org.apache.hadoop.io.DoubleWritable
Key: [__CT, alt.atheism, unknown]: Value: 799.0
Key: [__CT, comp.graphics, unknown]: Value: 973.0
Key: [__CT, comp.os.ms-windows.misc, unknown]: Value: 985.0
Key: [__CT, comp.sys.ibm.pc.hardware, unknown]: Value: 982.0
Key: [__CT, comp.sys.mac.hardware, unknown]: Value: 961.0
Key: [__CT, comp.windows.x, unknown]: Value: 980.0
Key: [__CT, misc.forsale, unknown]: Value: 972.0
Key: [__CT, rec.autos, unknown]: Value: 990.0
Key: [__CT, rec.motorcycles, unknown]: Value: 994.0
Key: [__CT, rec.sport.baseball, unknown]: Value: 994.0
Key: [__CT, rec.sport.hockey, unknown]: Value: 999.0
Key: [__CT, sci.crypt, unknown]: Value: 991.0
Key: [__CT, sci.electronics, unknown]: Value: 981.0
Key: [__CT, sci.med, unknown]: Value: 990.0
Key: [__CT, sci.space, unknown]: Value: 987.0
Key: [__CT, soc.religion.christian, unknown]: Value: 997.0
Key: [__CT, talk.politics.guns, unknown]: Value: 910.0
Key: [__CT, talk.politics.mideast, unknown]: Value: 940.0
Key: [__CT, talk.politics.misc, unknown]: Value: 775.0
Key: [__CT, talk.religion.misc, unknown]: Value: 628.0
Count: 20
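
To check the dump without reading every line, the counts can be tallied with a small awk filter. This is only a sketch: it assumes the seqdumper text format shown above (one `Key: [__CT, <label>, <predicted>]: Value: <n>` line per entry), and `tally_unknown` is a hypothetical helper name, not part of Mahout.

```shell
# Sum the per-label counts from seqdumper text output read on stdin.
# Assumes lines of the form:
#   Key: [__CT, <correct label>, <predicted label>]: Value: <count>
# tally_unknown is a hypothetical helper, not a Mahout command.
tally_unknown() {
  awk -F'Value: ' '
    /^Key: \[__CT, / {
      total += $2                              # every confusion-matrix entry
      if ($1 ~ /, unknown\]/) unknown += $2    # entries predicted as "unknown"
    }
    END { printf "unknown=%d total=%d\n", unknown, total }
  '
}

# Usage:
#   ../bin/mahout seqdumper -s 20news-input-output/part-00000 | tally_unknown
```

For the dump above, the 20 values add up to 18828, so all of them fall under unknown.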
I suspect the bug is in model loading, before the mapper runs.
Any suggestions or a patch?
Thanks.
--
Li Jun