OK, I will give it a try.
Because the dataset is large, I split it manually into two parts using the
vi editor: one file holds the first 90% of the lines and the other holds the
last 10%. I did not run a full cross-validation; I only completed the first
iteration. Is that a problem?
Thank you for your replies.
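
A note on the split: taking the first 90% and the last 10% of the file can
give the two parts different class distributions if the records happen to be
ordered by label or by collection time. Below is a minimal sketch of a random
split instead, assuming one CSV record per line. The function name
`split_file` and any file names passed to it are hypothetical and not part of
Mahout.

```python
import random


def split_file(src_path, train_path, test_path, test_fraction=0.1, seed=42):
    """Stream src_path once and assign each record at random.

    Each non-empty line goes to test_path with probability test_fraction
    and to train_path otherwise. Returns (n_train, n_test).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_train = 0
    n_test = 0
    with open(src_path) as src, \
            open(train_path, "w") as train, \
            open(test_path, "w") as test:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            if not line.endswith("\n"):
                line += "\n"  # last line of the file may lack a newline
            if rng.random() < test_fraction:
                test.write(line)
                n_test += 1
            else:
                train.write(line)
                n_train += 1
    return n_train, n_test
```

Calling `split_file("data.csv", "train.csv", "test.csv")` (hypothetical file
names) reads the input once, so the dataset does not need to fit in memory.
The test part ends up with roughly 10% of the lines rather than exactly 10%.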

On February 26, 2012 at 2:17 AM, deneche abdelhakim <[email protected]> wrote:

> If you have enough memory, you could try the in-memory implementation
> (remove -p parameter) and see if the results do improve.
>
> How did you split the data into train/test datasets?
>
> On Sat, Feb 25, 2012 at 12:57 PM, tanzek <[email protected]> wrote:
>
> > Hello, Ted.
> > I have a big problem when I run random forest to classify my dataset,
> > which is in the following format:
> >
> >
> -2.73887,-2.731803,15,0.00009,3,8,0.002033,0.046203,0.000005,1,0.00009,1,1,3,8,0.002033,0.124838,0.125,0.000005,1,0.298142,0,0,0,0.001425,1,1,11,11,0.001425,0.062832,0.090909,0.00001,0.001425,1,1,11,11,0.001425,0.114017,0.090909,0.000008,5.466667,10,1
> > There are 44 numeric features, followed by one label at the end.
> > First, I split the dataset into two parts: 90% is used to train the model
> > and the remainder is used for testing. I train the random forest with the
> > following parameters:
> >    -Dmapred.max.split.size=13488881 -oob -sl 5 -p -t 100 -o forest-model
> > where 13488881 is 1/10 of the size of the dataset. When I then use the
> > test set to predict the values, I get the following result:
> >
> > 12/02/24 22:12:55 INFO mapreduce.TestForest:
> > =======================================================
> > Summary
> > -------------------------------------------------------
> > Correctly Classified Instances          :       1999        3.7294%
> > Incorrectly Classified Instances        :      51602       96.2706%
> > Total Classified Instances              :      53601
> >
> > =======================================================
> > Confusion Matrix
> > -------------------------------------------------------
> > a       b       c       d       e       f       g       h       <--Classified as
> > 0       51      1       7154    5257    0       0       0        |  12463   a     = 1
> > 0       183     0       3255    26901   0       0       0        |  30339   b     = 2
> > 0       152     0       549     3742    0       0       0        |  4443    c     = 3
> > 0       699     0       320     2280    0       0       0        |  3299    d     = 5
> > 0       148     1       234     1472    0       0       0        |  1855    e     = 4
> > 0       14      0       497     332     20      19      0        |  882     f     = 0
> > 0       14      0       94      206     2       4       0        |  320     g     = 6
> > 0       0       0       0       0       0       0       0        |  0       h     = unknown
> > Default Category: unknown: 7
> >
> > Is this really a bad result? I don't know what it means. I need some help.
> > Thank you.
> >
>
