Hi,
I am working on sentiment analysis of tweets, using the Mahout naive Bayes
classifier. I created a directory "data" containing three subdirectories
named "positive", "negative" and "uncertain", and put 151 files (151 MB in
total) in each of them. I then copied the data directory to HDFS. Below is
the set of commands I ran to generate the model and labelindex from it.

bin/mahout seqdirectory -i ${WORK_DIR}/data -o ${WORK_DIR}/data-seq
bin/mahout seq2sparse -i ${WORK_DIR}/data-seq -o ${WORK_DIR}/data-vectors \
    -lnorm -nv -wt tfidf
bin/mahout split -i ${WORK_DIR}/data-vectors/tfidf-vectors \
    --trainingOutput ${WORK_DIR}/data-train-vectors \
    --testOutput ${WORK_DIR}/data-test-vectors \
    --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
bin/mahout trainnb -i ${WORK_DIR}/data-train-vectors -el -o ${WORK_DIR}/model \
    -li ${WORK_DIR}/labelindex -ow $c
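
For reference, a minimal sketch of the input layout described at the top of this message. The local staging directory, the default value of WORK_DIR and the use of "hadoop fs -put" for the upload are assumptions for illustration, not the exact commands that were run.

```shell
#!/bin/sh
# Hypothetical default; WORK_DIR is assumed to be an HDFS path.
WORK_DIR="${WORK_DIR:-/tmp/mahout-work}"

# One subdirectory per class label; seqdirectory reads one document per file.
mkdir -p data/positive data/negative data/uncertain

# Copy the tweet files into data/<label>/ here, then upload the whole tree.
# The upload is skipped when no hadoop client is on the PATH.
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -put data "${WORK_DIR}/data"
else
    echo "hadoop not found; skipping upload to ${WORK_DIR}/data"
fi
```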

Testing the model on the same data with the "testnb" command below gives
the following confusion matrix:

bin/mahout testnb -i ${WORK_DIR}/data-train-vectors -m ${WORK_DIR}/model \
    -l ${WORK_DIR}/labelindex -ow -o ${WORK_DIR}/data-testing $c

Confusion Matrix
-------------------------------------------------------
a       b       c       <--Classified as
151     0       0        |  151    a = negative
0       151     0        |  151    b = positive
0       0       151      |  151    c = uncertain


Then I created another directory, "data2", in the same way and put a random
subset of the training data (30 files, 30 MB in total, per class) in its
positive, negative and uncertain subdirectories. I then created vectors from
it using the "seq2sparse" command given below:

bin/mahout seqdirectory -i ${WORK_DIR}/data2 -o ${WORK_DIR}/data2-seq
bin/mahout seq2sparse -i ${WORK_DIR}/data2-seq -o ${WORK_DIR}/data2-vectors \
    -lnorm -nv -wt tfidf

I then ran "testnb" on these vectors with the model and labelindex created
from the previous data set, using the command given below:

bin/mahout testnb -i ${WORK_DIR}/data2-vectors/tfidf-vectors/part-r-00000 \
    -m ${WORK_DIR}/model -l ${WORK_DIR}/labelindex -ow \
    -o ${WORK_DIR}/data2-testing $c

I get a confusion matrix like this:

Confusion Matrix
-------------------------------------------------------
a       b       c       <--Classified as
0       30      0        |  30     a = negative
0       30      0        |  30     b = positive
0       30      0        |  30     c = uncertain
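
To put a number on the second matrix, here is a small sketch that computes the overall accuracy from its rows. It is plain awk with nothing Mahout-specific; the cell values are copied from the testnb output above, and the variable names are only illustrative.

```shell
#!/bin/sh
# Rows are the actual classes (negative, positive, uncertain);
# columns are the predicted classes a, b, c from the testnb output.
matrix='0 30 0
0 30 0
0 30 0'

# Sum the diagonal (correct predictions) and all cells (total documents).
# On row NR, field $NR is the diagonal cell.
accuracy=$(printf '%s\n' "$matrix" | awk '
  { correct += $NR; for (i = 1; i <= NF; i++) total += $i }
  END { printf "%.2f", 100 * correct / total }')

echo "accuracy: ${accuracy}%"
```

Only the 30 positive documents land on the diagonal, so 30 of 90 are classified correctly, whereas the first matrix has all 453 documents on the diagonal.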

Can anyone tell me why this happens? Am I testing the model the correct way,
or is this a bug in Mahout 0.7? If this is not the correct way, please
suggest an alternative.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-model-of-mahout-0-7-tp4013891.html
Sent from the Mahout User List mailing list archive at Nabble.com.
