Hi,
When I use the sequential method to classify the 20 Newsgroups dataset,
everything works. But when I change the method to mapreduce, every entry
of the confusion matrix becomes 0, and the output file shows that every
document is classified as unknown.

The following are my shell scripts.

train.sh:
MAHOUT_HOME=/home/lijun/mahout-0.3
$HADOOP_HOME/bin/hadoop fs -put $MAHOUT_HOME/examples/20news-input 20news-input
hadoop \
    jar \
    $MAHOUT_HOME/examples/target/mahout-examples-0.3.job \
    org.apache.mahout.classifier.bayes.TrainClassifier \
    -i 20news-input \
    -o newsmodel-ng1 \
    -ng 1 \
    -type bayes \
    -source hdfs

test.sh :
hadoop \
    jar \
    $MAHOUT_HOME/examples/target/mahout-examples-0.3.job \
    org.apache.mahout.classifier.bayes.TestClassifier \
    -m newsmodel-ng1 \
    -d 20news-input \
    -ng 1 \
    -type bayes \
    -source hdfs \
    -v \
    -method mapreduce   # the only change; everything else is untouched

With mapreduce, every entry of the result matrix is 0, and the output
file shows that all documents are classified as unknown:
 ../bin/mahout seqdumper -s 20news-input-output/part-00000
Input Path: 20news-input-output/part-00000
Key class: class org.apache.mahout.common.StringTuple Value Class: class org.apache.hadoop.io.DoubleWritable
Key: [__CT, alt.atheism, unknown]: Value: 799.0
Key: [__CT, comp.graphics, unknown]: Value: 973.0
Key: [__CT, comp.os.ms-windows.misc, unknown]: Value: 985.0
Key: [__CT, comp.sys.ibm.pc.hardware, unknown]: Value: 982.0
Key: [__CT, comp.sys.mac.hardware, unknown]: Value: 961.0
Key: [__CT, comp.windows.x, unknown]: Value: 980.0
Key: [__CT, misc.forsale, unknown]: Value: 972.0
Key: [__CT, rec.autos, unknown]: Value: 990.0
Key: [__CT, rec.motorcycles, unknown]: Value: 994.0
Key: [__CT, rec.sport.baseball, unknown]: Value: 994.0
Key: [__CT, rec.sport.hockey, unknown]: Value: 999.0
Key: [__CT, sci.crypt, unknown]: Value: 991.0
Key: [__CT, sci.electronics, unknown]: Value: 981.0
Key: [__CT, sci.med, unknown]: Value: 990.0
Key: [__CT, sci.space, unknown]: Value: 987.0
Key: [__CT, soc.religion.christian, unknown]: Value: 997.0
Key: [__CT, talk.politics.guns, unknown]: Value: 910.0
Key: [__CT, talk.politics.mideast, unknown]: Value: 940.0
Key: [__CT, talk.politics.misc, unknown]: Value: 775.0
Key: [__CT, talk.religion.misc, unknown]: Value: 628.0
Count: 20
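
To double-check the dump, the counts can be totalled per predicted label. This is only a minimal sketch: it assumes the seqdumper output above has been saved to a file (dump.txt is a hypothetical name), and summarize_predictions is a helper name made up for this example, not part of Mahout.

```shell
# Total the __CT confusion counts per predicted label from seqdumper text output.
# Assumes lines of the form:
#   Key: [__CT, <actual label>, <predicted label>]: Value: <count>
# summarize_predictions is a hypothetical helper, not a Mahout command.
summarize_predictions() {
    awk '$1 == "Key:" && $2 == "[__CT," {
        label = $4                 # predicted label, still followed by "]:"
        sub(/\]:$/, "", label)     # strip the trailing "]:"
        total[label] += $NF        # last field is the count
    }
    END { for (label in total) printf "%s %d\n", label, total[label] }' "$@"
}
```

Running `summarize_predictions dump.txt` on the output above prints a single line, `unknown 18828`, so no document received any of the 20 real labels.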

I suspect the bug is in the model loading that happens before the mapper runs.
Any suggestions or a patch?
Thanks.



-- 
Li Jun
