On Sun, 10 Sep 2006, Maciej Bliziński wrote:

> Hello all R-help list subscribers,
>
> I'd like to create a regression tree of a data set with a binary response
> variable. Only 5% of observations are a success, so the regression tree
> will not really find any combinations of variable values that yield
> more than a 50% probability of success.

This would be a misuse of a regression tree, for the exact problem for which classification trees were designed.

> I am, however, interested in areas where the probability of success is
> noticeably higher than 5%, for example 20%. I've tried rpart and the
> weights option, increasing the weights of the success observations.

You are 'misleading' rpart by using 'weights': you are claiming to have case weights for cases you do not have. You need to use 'cost' instead. This is a standard issue, discussed in all good books on classification (including mine).

> It works as expected in terms of tree creation: instead of a single
> root, a tree is built. But the output of plot() and text() is somewhat
> misleading. I'm interested in the observation counts inside each leaf,
> so I use the "use.n = TRUE" parameter. The counts displayed are misleading:
> the numbers of successes are not the original numbers from the sample;
> they seem to be cloned success observations.

They _are_ the original numbers, for that is what 'case weights' means.

> I'd like to split the tree just as the weights parameter allows me to,
> while keeping the original number of observations in the tree plot. Is it
> possible? If yes, how?
>
> Kind regards,
> Maciej

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
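A minimal sketch of the cost-based approach to the question in this thread, assuming the rpart package is installed. One terminology note, based on my reading of the rpart documentation: misclassification costs are passed to rpart() as a loss matrix through parms = list(loss = ...), while the separate top-level 'cost' argument scales the split improvement for each variable. The data are simulated and everything in them is hypothetical (the names d, x1, x2 and L, the 0.25/0.03 probabilities, and the 4:1 loss ratio). The ratio is chosen so that a node is labelled "success" once its success proportion exceeds 1/(1 + 4) = 20%, the level mentioned in the question. The loss-matrix orientation (rows = true class, columns = predicted class) is also my assumption and is worth checking against ?rpart.

```r
## Sketch only: simulated data, assuming the 'rpart' package is installed.
library(rpart)

set.seed(1)
n  <- 2000
x1 <- runif(n)
x2 <- runif(n)
## Hypothetical success probabilities: about 5% overall, 25% in one corner.
p  <- ifelse(x1 > 0.8 & x2 > 0.5, 0.25, 0.03)
y  <- factor(rbinom(n, 1, p), levels = c(0, 1),
             labels = c("fail", "success"))
d  <- data.frame(y, x1, x2)

## Loss matrix (assumed orientation: rows = true class, columns = predicted),
## in the order of levels(d$y): "fail", "success".
## Calling a true success a failure costs 4; a false alarm costs 1.
L <- matrix(c(0, 1,
              4, 0), nrow = 2, byrow = TRUE)

## Classification tree with misclassification costs and no case weights,
## so every observation is still counted exactly once.
fit <- rpart(y ~ x1 + x2, data = d, method = "class",
             parms = list(loss = L))
print(fit)

## With unit weights, the counts shown by use.n = TRUE are actual
## observation counts. Plot only if the tree has at least one split.
if (nrow(fit$frame) > 1) {
  plot(fit)
  text(fit, use.n = TRUE)
}
```

If the resulting tree is too small, lowering the complexity parameter (for example control = rpart.control(cp = 0.001)) lets it grow further. Altering the class priors, e.g. parms = list(prior = c(0.8, 0.2)), is a related alternative to supplying a loss matrix.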
