Hi again,
I'm now using Mahout 0.9, and I'm trying to use PCA (via SSVD) to
reduce the dimensionality of a dataset from 1600+ features to ~100, and then
use the reduced dataset to train a naive Bayes model and test it.
Here is my workflow:
- Transform my CSV into a SequenceFile using a custom MapReduce job, with
key = class as Text (containing a "/" so that NaiveBayes accepts it, i.e. in
the form "class/class")
value = features as VectorWritable
- Use the mahout command line to reduce the dimension of the dataset:
mahout ssvd -i /user/myCompany/Echant/echant100k.seq -o
/user/myCompany/Echant/echant100k_red.seq --rank 100 -us -V false -U true
-pca -ow -t 3
==> Here I get, if I understand things correctly, U as the reduced
dataset.
- Use the mahout command line to train the NaiveBayes model:
mahout trainnb -i /user/myCompany/Echant/echant100k_red.seq/U -o
/user/myCompany/Echant/echant100k_red.model -l 0,1
-li /user/myCompany/Echant/labelIndex100k_red -ow
- Use the mahout command line to test the generated model:
mahout testnb
-i /user/myCompany/Echant/echant100k_red.seq/U --model
/user/myCompany/Echant/echant100k_red.model -ow
-o /user/myCompany/Echant/predicted_echant100k --labelIndex
/user/myCompany/Echant/labelIndex100k_red
(Here I test with the same dataset, but I should try with other datasets as
well once it runs smoothly)
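To make the first step concrete, here is a minimal sketch of the record layout described above. It is plain Python for brevity, not the actual job (which is a Java MapReduce job writing Text keys and VectorWritable values). The function name, the label column position, and the separator are assumptions for illustration only.

```python
def csv_row_to_record(line, label_index=0, sep=","):
    """Turn one CSV line into a (key, features) pair.

    Hypothetical helper: assumes the class label sits in column
    `label_index` and every other column is a numeric feature.
    """
    fields = [f.strip() for f in line.split(sep)]
    label = fields[label_index]
    # Every column except the label becomes a feature value.
    features = [float(v) for i, v in enumerate(fields) if i != label_index]
    # The key repeats the label around a "/" ("class/class"),
    # as described in the first workflow step.
    key = label + "/" + label
    return key, features


if __name__ == "__main__":
    key, features = csv_row_to_record("1,0.5,2.0,0.0")
    print(key, features)
```

In the real job the key would be written as Text and the feature list as a VectorWritable.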
Here is my problem: everything seems to work well until I test my
model, at which point the output is full of NaN:
Key: 1: Value: {0:NaN,1:NaN}
Key: 1: Value: {0:NaN,1:NaN}
Key: 0: Value: {0:NaN,1:NaN}
Key: 0: Value: {0:NaN,1:NaN}
Key: 1: Value: {0:NaN,1:NaN}
Key: 0: Value: {0:NaN,1:NaN}
Key: 1: Value: {0:NaN,1:NaN}
Key: 0: Value: {0:NaN,1:NaN}
Key: 0: Value: {0:NaN,1:NaN}
Key: 0: Value: {0:NaN,1:NaN}
Key: 1: Value: {0:NaN,1:NaN}
I already have the same problem when training and testing with the full
dataset, but there about 15% of the rows still get values in the output and
are predicted; the rest are NaN and unpredicted.
Could you help me see what could be causing this?
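One check I can think of is whether the input vectors contain negative or non-finite values. Multinomial naive Bayes is normally defined on non-negative counts or weights, and PCA output generally contains negative entries, so this might be related. Below is a minimal sketch in plain Python (not Mahout code). The function name is hypothetical, and it assumes the vectors have already been dumped and parsed into lists of floats.

```python
import math


def summarize_rows(rows):
    """Count rows that contain negative or non-finite feature values.

    `rows` is any iterable of lists of floats (hypothetical input format,
    e.g. parsed from a text dump of the vectors).
    """
    counts = {"rows": 0, "with_negative": 0, "with_non_finite": 0}
    for row in rows:
        counts["rows"] += 1
        # NaN and +/-inf are reported as non-finite.
        if any(not math.isfinite(v) for v in row):
            counts["with_non_finite"] += 1
        # NaN compares False against 0, so it is not counted as negative.
        if any(v < 0 for v in row):
            counts["with_negative"] += 1
    return counts


if __name__ == "__main__":
    sample = [[0.1, -0.2], [1.0, 2.0], [float("nan"), 3.0]]
    print(summarize_rows(sample))
```

If most rows of U show up under "with_negative", that would at least be consistent with the NaN output, though I have not confirmed that this is the cause.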
Thanks in advance
Best,
Kévin Moulart