Hi everyone, I haven't found anything similar in the forum, so here's my problem (I'm no expert in R or statistics):
I have a data set of 59,000 cases with 9 variables each (the fractional coverage of 9 different plant types, such as deciduous broad-leaved temperate trees or evergreen tropical trees), which was generated by a vegetation model. To evaluate the quality of the vegetation model's output, I want to compare it to a land-cover data set that has 23 different land-cover types (such as needle-leaved evergreen forest, dense broad-leaved forest, barren, etc.).

A statistician advised me to use the randomForest package in R. When I use a subset of the data to grow the random forest, I get a very good prediction for the rest. However, I need to evaluate how meaningful this classification is in an ecological sense (boreal trees should not play a role in the definition of tropical land-cover types, for example); otherwise I cannot judge the quality of the vegetation model's output.

Unfortunately, randomForest gives me about 15,000 splits, of which about 5,000 are end branches (rough guess), so it is very hard and time-consuming to check every single branch of one of the final trees for its ecological meaning.

Is there any utility to summarize the characteristics of each of the 23 prediction classes? Something like "land-cover class 1 has less than 5% of plant types 1-5, 20-50% of plant type 7 and at least 30% of plant type 8". Or is there a more suitable method to classify my data?

Thanks a lot in advance!
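In case it helps, here is a minimal sketch of roughly what I did. It uses synthetic stand-in data, not my real data set: the column names pft1..pft9, the object names, the 50/50 split and ntree = 100 are all made up for illustration, and the toy land-cover factor has only 9 classes instead of my 23.

```r
# Minimal sketch with synthetic stand-in data (NOT the real data set).
# All names (pft1..pft9, landcover, rf, ...) are hypothetical.
library(randomForest)

set.seed(1)
n <- 2000                                  # the real data has about 59,000 cases

# Nine fractional plant-type coverages per case, scaled to sum to 1 per row
m <- matrix(runif(n * 9), ncol = 9)
m <- m / rowSums(m)
x <- as.data.frame(m)
names(x) <- paste0("pft", 1:9)

# Toy land-cover class: simply the dominant plant type of each case
# (an assumption to get some structure; the real classes come from a separate map)
landcover <- factor(paste0("lc", max.col(m)))

# Grow the forest on a subset and predict the rest
train <- sample(n, n / 2)
rf <- randomForest(x[train, ], landcover[train], ntree = 100)
pred <- predict(rf, x[-train, ])
table(observed = landcover[-train], predicted = pred)

# Look at a single tree: one row per node, status == -1 marks a terminal node
tree1 <- getTree(rf, k = 1)
nrow(tree1)                                # total number of nodes in tree 1
sum(tree1[, "status"] == -1)               # number of end branches in tree 1
```

The last two lines are how I arrived at my rough node counts; even in this toy case, reading a tree node by node is not practical.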
Christoph