[R] rpart-question regarding relation between cp and rel error
Dear useRs,

I may be temporarily (I hope :-)) confused, and I hope that someone can answer this question that bugs me at the moment. In the CP table of rpart, I thought the following relation should hold:

    rel error = rel error(before) - (nsplit - nsplit(before)) * CP(before)

where (before) always denotes the entry in the row above. While this relation holds for many rows of the CP tables I've looked at, it doesn't hold for all. For example, in the table below,

    0.67182 != 0.68405 - (47 - 38) * 0.0010616 = 0.67450,

a difference of 0.002676, which appears larger than just numerical inaccuracy.

              CP nsplit rel error  xerror     xstd
    1  0.1820909      0   1.00000 1.00000 0.012890
    2  0.0526194      1   0.81791 0.81768 0.012062
    3  0.0070390      2   0.76529 0.76529 0.011780
    4  0.0043850      4   0.75121 0.77660 0.011842
    5  0.0036157      5   0.74683 0.77106 0.011812
    6  0.0032310      8   0.73598 0.77083 0.011810
    7  0.0026541      9   0.73275 0.77083 0.011810
    8  0.0025387     14   0.71936 0.76829 0.011796
    9  0.0016155     16   0.71429 0.76644 0.011786
    10 0.0013847     20   0.70759 0.76206 0.011761
    11 0.0011539     28   0.69605 0.76621 0.011785
    12 0.0010616     38   0.68405 0.76875 0.011799
    13 0.0010001     47   0.67182 0.76991 0.011805
    14 0.0010000     57   0.66144 0.77060 0.011809

Can someone explain why/when this happens?

Regards,
Ulrike

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
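For readers who want to check this relation on their own trees: here is a minimal sketch (not from the original post; it uses the built-in kyphosis data purely for illustration) that compares each row of the CP table with the value predicted from the row above.

```r
library(rpart)

## Fit a tree, then check row by row whether
##   rel error[i] = rel error[i-1] - (nsplit[i] - nsplit[i-1]) * CP[i-1]
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", control = rpart.control(cp = 0.0001))
tab <- fit$cptable

predicted <- tab[-nrow(tab), "rel error"] -
  diff(tab[, "nsplit"]) * tab[-nrow(tab), "CP"]

## Side-by-side comparison; discrepancies beyond rounding show
## rows where the poster's relation does not hold exactly.
cbind(actual = tab[-1, "rel error"], predicted = predicted)
```

Note that `fit$cptable` stores the unrounded values, so this avoids attributing any discrepancy to the rounding in `printcp()` output.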
Re: [R] rpart question
On Thu, 25 Jan 2007, Aimin Yan wrote:

> I make a classification tree with code like this:
>
>   p.t2.90 <- rpart(y ~ aa_three + bas + bcu + aa_ss, data = training,
>                    method = "class", control = rpart.control(cp = 0.0001))
>
> Here I want to set weights for the 4 predictors (aa_three, bas, bcu, aa_ss).
> I know that there is a weight set-up in rpart. Can this set-up satisfy my
> need?

It depends on what _you_ mean by 'set weight'. You will need to tell us in detail what exactly you want the weights to do.

Using the 'weights' argument specifies case weights (as the help says). There are also 'cost' and 'parms' for other aspects of weighting.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,     Tel: +44 1865 272861 (self)
1 South Parks Road,            +44 1865 272866 (PA)
Oxford OX1 3TG, UK        Fax: +44 1865 272595
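For context, a sketch of the three mechanisms Prof. Ripley names, on the built-in kyphosis data rather than the poster's (the specific weight and cost values below are arbitrary, purely to show the call syntax):

```r
library(rpart)

## 1. 'weights': one case weight per observation, e.g. upweighting
##    the rarer class (an arbitrary choice for illustration).
w <- ifelse(kyphosis$Kyphosis == "present", 2, 1)
fit_w <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
               method = "class", weights = w)

## 2. 'cost': one non-negative scaling per *variable*; a variable with
##    a higher cost must improve the fit more before it is chosen.
fit_c <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
               method = "class", cost = c(1, 2, 1))

## 3. 'parms': classification-specific settings such as priors and a
##    loss matrix (zero diagonal required).
fit_p <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
               method = "class",
               parms = list(loss = matrix(c(0, 2, 1, 0), nrow = 2)))
```

Only the first of these is per-observation weighting; 'cost' is the one that acts on predictors, which may be closer to what the original question was after.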
[R] rpart question
I make a classification tree with code like this:

    p.t2.90 <- rpart(y ~ aa_three + bas + bcu + aa_ss, data = training,
                     method = "class", control = rpart.control(cp = 0.0001))

Here I want to set weights for the 4 predictors (aa_three, bas, bcu, aa_ss). I know that there is a weight set-up in rpart. Can this set-up satisfy my need? If so, could someone give me an example?

Thanks,
Aimin Yan
[R] rpart question
I wondered about the best way to control for input variables that have a large number of levels in 'rpart' models. I understand the algorithm searches through all possible binary splits (2^(k-1) - 1 for k levels), so variables with more levels are more prone to look like good splitters... so I'm looking for ways to compensate and adjust for this complexity. For example, if two variables produce comparable splits in the data but one contains 2 levels and the other 13 levels, I would like the algorithm to choose the 'simpler' split.

Is this best done with the 'cost' argument in the rpart options? It defaults to one for all variables... so would it make sense to scale it by nlevels for each variable, or sqrt(nlevels), or something similar?

Thanks,
Landon
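The idea in the last paragraph can be sketched as follows (synthetic data standing in for the poster's; the sqrt(nlevels) scaling is the poster's suggestion, not an established recommendation):

```r
library(rpart)

set.seed(1)
## Synthetic data: f2 has 2 levels, f13 has 13 levels, neither related
## to y, so any preference for f13 reflects split-selection bias.
d <- data.frame(
  y   = factor(sample(c("a", "b"), 200, replace = TRUE)),
  f2  = factor(sample(letters[1:2],  200, replace = TRUE)),
  f13 = factor(sample(letters[1:13], 200, replace = TRUE))
)

## Penalize high-cardinality factors via 'cost', scaled by
## sqrt(nlevels) and normalized so the cheapest variable has cost 1.
costs <- sqrt(sapply(d[, c("f2", "f13")], nlevels))
costs <- costs / min(costs)

fit <- rpart(y ~ f2 + f13, data = d, method = "class", cost = costs)
```

Per the rpart help, 'cost' divides the improvement attributed to a split on that variable, so this makes many-level factors less likely to be chosen but does not change the set of splits that are examined.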
RE: [R] rpart question
AFAIK rpart does not have a built-in facility for adjusting bias in split selection. One possibility is to define your own splitting criterion that does the adjustment in some fashion. I believe the current version of rpart allows you to define a custom splitting criterion, but I have not tried it myself.

Prof. Wei-Yin Loh at UW-Madison (and his current and former students) has worked on algorithms that compensate for bias in split selection. There is software on his web page that you might want to check out.

HTH,
Andy

From: [EMAIL PROTECTED]
> I wondered about the best way to control for input variables that have a
> large number of levels in 'rpart' models. [...]
[R] rpart question on loss matrix
Hello again,

I've looked through ?rpart, Atkinson & Therneau (1997), Chapter 10 of Venables and Ripley, Breiman et al., and the R-help archives, but haven't seen the answer to these two questions:

1) How does rpart deal with asymmetric loss matrices? Breiman et al. suggest some possibilities but, of course, do not say how rpart does it.

2) In the loss matrix, which direction (column or row) is 'truth' and which is 'output of program'? E.g., if you have a 3-level DV (say the levels are A, B, C) and you want a higher cost for misclassifying as later in the alphabet, would it be

    0 3 5
    1 0 2
    2 1 0

or

    0 1 2
    3 0 1
    5 2 0

Thanks in advance,
Peter
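For question 2, my reading of the Therneau & Atkinson long introduction to rpart is that L[i, j] is the loss of classifying a class-i observation as class j, i.e. rows are truth and columns are the program's output, with classes taken in factor-level order; under that convention the first matrix above matches the stated goal. A sketch on synthetic data (worth verifying against the vignette for your version):

```r
library(rpart)

set.seed(42)
## Synthetic stand-in for a 3-level dependent variable.
d <- data.frame(y  = factor(sample(c("A", "B", "C"), 300, replace = TRUE)),
                x1 = rnorm(300), x2 = rnorm(300))

## Rows = true class (A, B, C), columns = predicted class; e.g.
## L[1, 3] = 5 is the cost of predicting C when the truth is A.
## The diagonal must be zero and off-diagonals positive.
L <- matrix(c(0, 3, 5,
              1, 0, 2,
              2, 1, 0), nrow = 3, byrow = TRUE)

fit <- rpart(y ~ x1 + x2, data = d, method = "class",
             parms = list(loss = L))
```

A quick empirical check of the convention is to make one off-diagonal entry huge and confirm that `predict(fit, type = "class")` avoids the corresponding prediction.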
Re: [R] Rpart question - labeling nodes with something not in x$frame
On Thu, 17 Jul 2003, Peter Flom wrote:

> I have a tree created with
>
>   tr.hh.logcas <- rpart(log(YCASSX + 1) ~ AGE + DRUGUSEY + SEX + OBSXNUM
>                         + WINDLE, xval = 10)
>
> I would like to label the nodes with YCASSX rather than log(YCASSX + 1).
> But the help file for text in library rpart says that you can only use
> labels that are part of x$frame, which YCASSX is not.

This may not be the best solution, but what I have done once is to add another column to the data frame with the labels I want. For example:

    data(iris)
    library(rpart)

    ## Recoding the response:
    ##   s: setosa
    ##   c: versicolor
    ##   v: virginica
    ir <- iris[, -5]
    Species <- rep(c("s", "c", "v"), rep(50, 3))
    ir <- as.data.frame(cbind(ir, Species))

    ir.rp <- rpart(Species ~ ., data = ir)
    plot(ir.rp)
    text(ir.rp)

This is probably the long/silly way, but it works ;-D

--
Cheers,
Kevin

--
"On two occasions, I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."
-- Charles Babbage (1791-1871)
From Computer Stupidities: http://rinkworks.com/stupid/

--
Ko-Kang Kevin Wang
Master of Science (MSc) Student
SLC Tutor and Lab Demonstrator
Department of Statistics
University of Auckland
New Zealand
Homepage: http://www.stat.auckland.ac.nz/~kwan022
Ph: 373-7599 x88475 (City) x88480 (Tamaki)
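For the original regression-tree case, another possibility is a hack I believe works because text() takes regression node labels from fit$frame$yval: back-transform the fitted node means in place (on a copy of the object, since this changes it). The data below are synthetic stand-ins for the poster's YCASSX and AGE; verify the labeling behaviour on your rpart version.

```r
library(rpart)

set.seed(1)
## Synthetic count response that depends on AGE, so the tree splits.
AGE <- runif(200, 18, 65)
d <- data.frame(AGE = AGE, YCASSX = rpois(200, lambda = AGE / 10))

fit <- rpart(log(YCASSX + 1) ~ AGE, data = d,
             control = rpart.control(cp = 0.001))

## Back-transform node means from the log(y + 1) scale to the
## original YCASSX scale, then plot with the usual labels.
fit2 <- fit
fit2$frame$yval <- exp(fit2$frame$yval) - 1
plot(fit2)
text(fit2)
```

Note the back-transformed mean of log(y + 1) is not the mean of y (Jensen's inequality), so these labels are approximate summaries on the original scale.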