[R] random forest -optimising mtry

Ute Wed, 13 Oct 2004 03:00:54 -0700

Dear R-helpers,

I'm working on mass spectra with randomForest in R. Following the recommendations for the case of noisy variables, I don't want to use the default mtry (the square root of the number of variables), but I'm not sure up to which proportion mtry/nvariables it makes sense to increase mtry without "overtuning" the forest.

Here is my example: I have 106 spectra belonging to 4 classes, and the number of variables is 172. I'm interested in information about the variables (importance, split points etc.) and in the proximities. First I ran a forest with mtry = 30 and ntree = 2500. The result was an OOB estimate of the overall error rate of zero, i.e. perfect classification. To explore the results, I calculated the average proximity between the classes and got:

> res
           op12      op13       op14       op23      op24       op34
[1,] 0.06145473 0.1369406 0.08036264 0.06171053 0.1113126 0.06732087

For me, the important point is that classes 1 and 3, as well as classes 2 and 4, share more common features than the other pairs of classes. I have already worked a lot with these data and looked closely at my spectra, and I believe these proximities to be realistic.
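For reference, here is a minimal sketch of how average between-class proximities like the ones above could be computed. The helper name between_class_proximity and the toy matrix are hypothetical, made up for illustration; the sketch only assumes an n x n proximity matrix (such as the proximity component of a forest fitted with proximity = TRUE) and a factor of class labels. It is not necessarily how the values in this post were obtained.

```r
# Hypothetical helper: mean proximity between every pair of classes.
# prox : n x n proximity matrix (e.g. rf$proximity from a forest
#        fitted with proximity = TRUE)
# cls  : class labels of the n cases, in the same order as prox
between_class_proximity <- function(prox, cls) {
  cls <- as.factor(cls)
  pairs <- combn(levels(cls), 2)          # one column per pair of classes
  res <- apply(pairs, 2, function(p) {
    # average over all cases of class p[1] versus all cases of class p[2]
    mean(prox[cls == p[1], cls == p[2], drop = FALSE])
  })
  names(res) <- paste0("op", pairs[1, ], pairs[2, ])
  res
}

# Toy example with a hand-made 4 x 4 proximity matrix (invented numbers)
prox <- matrix(c(1.0, 0.8, 0.1, 0.3,
                 0.8, 1.0, 0.2, 0.4,
                 0.1, 0.2, 1.0, 0.6,
                 0.3, 0.4, 0.6, 1.0), nrow = 4, byrow = TRUE)
cls <- factor(c(1, 1, 2, 2))
toy <- between_class_proximity(prox, cls)
print(toy)
```

With four classes labelled 1 to 4, the result would have six elements named op12, op13, op14, op23, op24 and op34, in the same order as the output shown above.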

Then I ran the tuneRF function (step factor 1.5) and got mtry = 63. A new forest with this mtry and 2500 trees gave perfect classification as well, but the relation between the proximity values changed a lot:

> res
          op12     op13       op14       op23       op24       op34
[1,] 0.1092702 0.117489 0.09696328 0.08725208 0.08495621 0.06506148
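As an illustration of the tuning step just described, a rough sketch of a tuneRF call followed by a refit with the selected mtry is given below. The spectra are not available here, so the built-in iris data stands in for them, and mtryStart, ntreeTry and improve are assumed values chosen for the example; only the step factor of 1.5 and ntree = 2500 are taken from this post.

```r
# Sketch only: iris is a stand-in data set, not the mass spectra.
if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(1)
  x <- iris[, 1:4]
  y <- iris$Species

  # Search over mtry, moving away from mtryStart by the step factor
  # (mtryStart, ntreeTry and improve are assumptions for this example)
  tuned <- tuneRF(x, y, mtryStart = 2, ntreeTry = 500, stepFactor = 1.5,
                  improve = 0.01, trace = FALSE, plot = FALSE)

  # tuneRF returns a matrix with columns "mtry" and "OOBError"
  best_mtry <- tuned[which.min(tuned[, "OOBError"]), "mtry"]

  # Refit with the selected mtry and keep the proximities
  rf <- randomForest(x, y, mtry = best_mtry, ntree = 2500, proximity = TRUE)
  print(rf$err.rate[rf$ntree, "OOB"])   # OOB error after the last tree
  print(dim(rf$proximity))              # n x n proximity matrix
}
```

The OOB errors that tuneRF compares come from forests with only ntreeTry trees each, so small differences between neighbouring mtry values may partly reflect random variation.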

This is what makes me think that I have overtuned my second forest. So how should I choose mtry?

Best regards,
Ute
