Hello, Ted.
I have a big problem when I run random forest to classify my dataset, which
is in the following format:
-2.73887,-2.731803,15,0.00009,3,8,0.002033,0.046203,0.000005,1,0.00009,1,1,3,8,0.002033,0.124838,0.125,0.000005,1,0.298142,0,0,0,0.001425,1,1,11,11,0.001425,0.062832,0.090909,0.00001,0.001425,1,1,11,11,0.001425,0.114017,0.090909,0.000008,5.466667,10,1
There are 44 numeric features, followed by one label in the last column.
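For illustration only, here is a minimal Python sketch (not part of the Mahout job; the variable names are hypothetical) that splits the sample row above into its 44 features and the trailing label:

```python
# Illustrative only: parse the sample row shown above.
# Assumption: the last comma-separated value is the class label.
row = (
    "-2.73887,-2.731803,15,0.00009,3,8,0.002033,0.046203,0.000005,1,"
    "0.00009,1,1,3,8,0.002033,0.124838,0.125,0.000005,1,0.298142,0,0,0,"
    "0.001425,1,1,11,11,0.001425,0.062832,0.090909,0.00001,0.001425,1,1,"
    "11,11,0.001425,0.114017,0.090909,0.000008,5.466667,10,1"
)

values = row.split(",")
features = [float(v) for v in values[:-1]]  # every column except the last
label = values[-1]                          # last column is the label

print(len(features), "features, label =", label)
```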
First, I split the dataset into two parts: 90% is used to train the model
and the remainder is used for testing. I then trained a random forest with
the following parameters:
    -Dmapred.max.split.size=13488881 -oob -sl 5 -p -t 100 -o forest-model
Here 13488881 is 1/10 of the size of the dataset. When I then use the test
set to predict the values, I get the following result:

12/02/24 22:12:55 INFO mapreduce.TestForest:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :       1999        3.7294%
Incorrectly Classified Instances        :      51602       96.2706%
Total Classified Instances              :      53601

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       c       d       e       f       g       h       <--Classified as
0       51      1       7154    5257    0       0       0        |  12463     a     = 1
0       183     0       3255    26901   0       0       0        |  30339     b     = 2
0       152     0       549     3742    0       0       0        |  4443      c     = 3
0       699     0       320     2280    0       0       0        |  3299      d     = 5
0       148     1       234     1472    0       0       0        |  1855      e     = 4
0       14      0       497     332     20      19      0        |  882       f     = 0
0       14      0       94      206     2       4       0        |  320       g     = 6
0       0       0       0       0       0       0       0        |  0         h     = unknown
Default Category: unknown: 7
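As a sanity check on how the summary relates to the confusion matrix above, the following Python sketch recomputes the overall accuracy from the diagonal and also prints the per-class recall and the share of the most frequent class. It assumes that rows are the actual classes and columns are the predicted classes (in the order a..h); the numbers are copied from the log, and the variable names are only illustrative.

```python
# Confusion matrix copied from the log above.
# Assumption: rows = actual class, columns = predicted class (a..h).
labels = ["1", "2", "3", "5", "4", "0", "6", "unknown"]
matrix = [
    [0, 51, 1, 7154, 5257, 0, 0, 0],
    [0, 183, 0, 3255, 26901, 0, 0, 0],
    [0, 152, 0, 549, 3742, 0, 0, 0],
    [0, 699, 0, 320, 2280, 0, 0, 0],
    [0, 148, 1, 234, 1472, 0, 0, 0],
    [0, 14, 0, 497, 332, 20, 19, 0],
    [0, 14, 0, 94, 206, 2, 4, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

row_totals = [sum(r) for r in matrix]
total = sum(row_totals)
correct = sum(matrix[i][i] for i in range(len(matrix)))  # diagonal entries
accuracy = correct / total

print(f"correct = {correct}, total = {total}, accuracy = {accuracy:.4%}")

# Per-class recall: diagonal entry divided by the row total (skip empty rows).
for i, name in enumerate(labels):
    if row_totals[i] > 0:
        print(f"label {name}: recall = {matrix[i][i] / row_totals[i]:.4f}")

# Share of the most frequent actual class, i.e. the accuracy of always
# predicting that class.
majority_share = max(row_totals) / total
print(f"majority-class share = {majority_share:.4%}")
```

The diagonal sums to 1999 out of 53601 instances, which matches the "Correctly Classified Instances" line in the summary.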

Is this really a bad result? I don't understand what it means, and I would
appreciate some help.
Thank you.
