[R] rpart-question regarding relation between cp and rel error

2007-03-06 Thread Ulrike Grömping

Dear useRs,

I may be temporarily (I hope :-)) confused, and I hope that someone can
answer this question that bugs me at the moment:

In the CP table of rpart, I thought the following equation should hold: 
 rel error = rel error(before) - (nsplit - nsplit(before)) * CP(before),
where (before) always denotes the entry in the row above.
While this equation holds for many rows of the CP tables I've looked at, it
doesn't hold for all. 

For example, in the table below, 0.67182 != 0.68405 - (47-38)*0.0010616,
with a difference of 0.002676 which appears larger than just numerical
inaccuracy.

          CP nsplit rel error  xerror     xstd
1  0.1820909      0   1.00000 1.00000 0.012890
2  0.0526194      1   0.81791 0.81768 0.012062
3  0.0070390      2   0.76529 0.76529 0.011780
4  0.0043850      4   0.75121 0.77660 0.011842
5  0.0036157      5   0.74683 0.77106 0.011812
6  0.0032310      8   0.73598 0.77083 0.011810
7  0.0026541      9   0.73275 0.77083 0.011810
8  0.0025387     14   0.71936 0.76829 0.011796
9  0.0016155     16   0.71429 0.76644 0.011786
10 0.0013847     20   0.70759 0.76206 0.011761
11 0.0011539     28   0.69605 0.76621 0.011785
12 0.0010616     38   0.68405 0.76875 0.011799
13 0.0010001     47   0.67182 0.76991 0.011805
14 0.0010000     57   0.66144 0.77060 0.011809
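
The row-by-row check described above can be sketched in a few lines of R. The three vectors are typed in from the table; the names cp, nsplit, relerr and the 1e-4 tolerance are illustrative choices, not anything rpart defines. The sketch only tests the conjectured relation, it does not explain the discrepancy.

```r
# CP table values typed in from the post (columns CP, nsplit, rel error).
cp     <- c(0.1820909, 0.0526194, 0.0070390, 0.0043850, 0.0036157,
            0.0032310, 0.0026541, 0.0025387, 0.0016155, 0.0013847,
            0.0011539, 0.0010616, 0.0010001, 0.0010000)
nsplit <- c(0, 1, 2, 4, 5, 8, 9, 14, 16, 20, 28, 38, 47, 57)
relerr <- c(1.00000, 0.81791, 0.76529, 0.75121, 0.74683, 0.73598, 0.73275,
            0.71936, 0.71429, 0.70759, 0.69605, 0.68405, 0.67182, 0.66144)

n <- length(cp)

# Conjectured relation:
#   rel error = rel error(before) - (nsplit - nsplit(before)) * CP(before)
predicted <- relerr[-n] - diff(nsplit) * cp[-n]

# Gap between the conjecture and the value actually printed in the table.
gap <- predicted - relerr[-1]

check <- data.frame(row       = 2:n,
                    predicted = round(predicted, 5),
                    actual    = relerr[-1],
                    gap       = round(gap, 6),
                    # Assumed tolerance: anything above 1e-4 is treated as
                    # more than rounding of the printed five-digit values.
                    flagged   = abs(gap) > 1e-4)
print(check)
```

For row 13 the gap comes out at about 0.002676, the figure quoted in the post. With a fitted tree, the same vectors could be taken from the columns of fit$cptable instead of being typed in.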

Can someone explain why/when this happens?

Regards, Ulrike
-- 
View this message in context: 
http://www.nabble.com/rpart-question-regarding-relation-between-cp-and-rel-error-tf3356652.html#a9335690
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rpart question

2007-01-25 Thread Prof Brian Ripley
On Thu, 25 Jan 2007, Aimin Yan wrote:

> I built a classification tree with this code:
> p.t2.90 <- rpart(y ~ aa_three + bas + bcu + aa_ss,
>   data = training, method = "class", control = rpart.control(cp = 0.0001))
>
> Here I want to set weights for the 4 predictors (aa_three, bas, bcu, aa_ss).
>
> I know that there is a weight set-up in rpart.
> Can this set-up satisfy my need?

It depends on what _you_ mean by 'set weight'.  You will need to tell us 
in detail what exactly you want the weights to do.

Using the 'weights' argument is specifying case weights (as the help 
says).  There are also 'cost' and 'parms' for other aspects of weighting.
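
To make the three arguments concrete, here is a minimal sketch of where each one goes in a call. It uses the kyphosis data that ships with rpart rather than the poster's data, and every numeric value (the case weights, the costs, the priors) is made up for illustration only.

```r
library(rpart)

# Case weights: one non-negative number per observation (row).
# Here, purely as an example, "present" cases count double.
w <- ifelse(kyphosis$Kyphosis == "present", 2, 1)

# Variable costs: one number per predictor, in formula order
# (Age, Number, Start). The improvement from a split on a variable
# is divided by its cost, so a higher cost makes it less attractive.
var_cost <- c(1, 2, 1)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method  = "class",
             weights = w,          # case weights
             cost    = var_cost,   # per-variable split costs
             # parms for classification: class priors and splitting index
             parms   = list(prior = c(0.65, 0.35), split = "information"),
             control = rpart.control(cp = 0.0001))
print(fit)
```

Of the three, only 'cost' attaches a number to each predictor; 'weights' attaches one to each case, and 'parms' controls priors, losses and the splitting index.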

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[R] rpart question

2007-01-24 Thread Aimin Yan
I built a classification tree with this code:
p.t2.90 <- rpart(y ~ aa_three + bas + bcu + aa_ss,
  data = training, method = "class", control = rpart.control(cp = 0.0001))

Here I want to set weights for the 4 predictors (aa_three, bas, bcu, aa_ss).

I know that there is a weight set-up in rpart.
Can this set-up satisfy my need?

If so, could someone give me an example?

Thanks,

Aimin Yan



[R] rpart question

2004-05-04 Thread lsjensen
Wondered about the best way to control for input variables that have a
large number of levels in 'rpart' models.  I understand the algorithm
searches through all possible splits (2^(k-1) for k levels) and so
variables with more levels are more prone to be good splitters... so I'm
looking for ways to compensate and adjust for this complexity.

For example, if two variables produce comparable splits in the data but
one contains 2 levels and the other 13 levels, then I would like to have
the algorithm choose the 'simpler' split.

Is this best done with the 'cost' argument in the rpart options?  This
defaults to one for all variables... so would it make sense to scale
this by nlevels in each variable or sqrt(nlevels) or something similar?
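
As a sketch of what that proposal would look like in code (not a recommendation, and not a tested remedy for split-selection bias), the cost vector can be built from nlevels() and passed straight to rpart. The data frame d and its columns y, f2 and f13 are synthetic and hypothetical; sqrt(nlevels) is just one of the scalings floated above.

```r
library(rpart)

set.seed(1)
# Synthetic data: a two-class response, a 2-level factor and a 13-level factor.
d <- data.frame(
  y   = factor(sample(c("a", "b"), 200, replace = TRUE)),
  f2  = factor(sample(letters[1:2], 200, replace = TRUE)),
  f13 = factor(sample(letters[1:13], 200, replace = TRUE))
)

# One cost per predictor, in the same order as in the formula.
# rpart divides the improvement of a split by the variable's cost,
# so the 13-level factor is penalised relative to the 2-level one.
vars     <- c("f2", "f13")
var_cost <- sqrt(sapply(d[vars], nlevels))

fit <- rpart(y ~ f2 + f13, data = d, method = "class", cost = var_cost)
print(var_cost)
print(fit)
```

Whether dividing by sqrt(nlevels), nlevels, or anything else actually balances the two variables is an open question here; the sketch only shows the mechanics.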

Thanks,
Landon




RE: [R] rpart question

2004-05-04 Thread Liaw, Andy
AFAIK rpart does not have a built-in facility for adjusting bias in split
selection.  One possibility is to define your own splitting criterion that
does the adjustment in some fashion.  I believe the current version of rpart
allows you to define a custom splitting criterion, but I have not tried it
myself.

Prof. Wei-Yin Loh at UW-Madison (and his current and former students) has
worked on algorithms that compensate for bias in split selection.  There is
software on his web page that you might want to check out.

HTH,
Andy



[R] rpart question on loss matrix

2004-01-07 Thread Peter Flom
Hello again

I've looked through ?rpart, Atkinson & Therneau (1997), Chap. 10 of
Venables and Ripley, Breiman et al., and the R-help archives, but
haven't seen the answer to these two questions:

1) How does rpart deal with asymmetric loss matrices?  Breiman et al.
suggest some possibilities, but, of course, do not say how rpart does
it.

2) In the loss matrix, which direction (column or row) is 'truth' and
which 'output of program'?  e.g., if you have a 3 level DV (say the
levels are A, B, C) and you want a higher cost for misclassifying as
later in the alphabet, would it be

0  3  5  
1  0  2
2  1  0

or

0  1  2
3  0  1  
5  2  0
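
One way to settle the direction empirically is to fit with both matrices and compare the resulting confusion tables. The sketch below uses iris as stand-in three-class data; the names loss1 and loss2 are illustrative. As I read the rpart introduction by Therneau and Atkinson, L(i, j) is the loss for classifying an i as a j, which would make rows the truth and columns the prediction, but treat that reading as an assumption to verify against your installed version.

```r
library(rpart)

# The two candidate matrices from the question, filled row by row.
loss1 <- matrix(c(0, 3, 5,
                  1, 0, 2,
                  2, 1, 0), nrow = 3, byrow = TRUE)
loss2 <- t(loss1)   # the second candidate is the transpose of the first

# rpart requires a zero diagonal and positive off-diagonal entries;
# the loss matrix is passed through the 'parms' list.
fit1 <- rpart(Species ~ ., data = iris, method = "class",
              parms = list(loss = loss1))
fit2 <- rpart(Species ~ ., data = iris, method = "class",
              parms = list(loss = loss2))

# Compare which kinds of error each fit avoids.
print(table(truth = iris$Species, predicted = predict(fit1, type = "class")))
print(table(truth = iris$Species, predicted = predict(fit2, type = "class")))
```

On well-separated data such as iris the two tables may be identical, so a data set with more class overlap is needed for the comparison to be informative.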


Thanks in advance

Peter



Re: [R] Rpart question - labeling nodes with something not in x$frame

2003-07-17 Thread Ko-Kang Kevin Wang
On Thu, 17 Jul 2003, Peter Flom wrote:

> I have a tree created with
>
> tr.hh.logcas <- rpart(log(YCASSX + 1) ~ AGE + DRUGUSEY + SEX + OBSXNUM + WINDLE,
>                       xval = 10)
>
> I would like to label the nodes with YCASSX rather than log(YCASSX + 1).
> But the help file for text in library rpart says that you can only
> use labels that are part of x$frame, which YCASSX is not.

This may not be the best solution, but what I have done once is to add 
another column into the data frame with the labels I want.

For example:
  data(iris)
  library(rpart)
  # Recode the response with one-letter labels:
  #   s: setosa
  #   c: versicolor
  #   v: virginica
  ir <- iris[, -5]
  Species <- factor(rep(c("s", "c", "v"), rep(50, 3)))
  ir <- as.data.frame(cbind(ir, Species))
  # Fit the tree on the recoded response, then draw and label it
  ir.rp <- rpart(Species ~ ., data = ir)
  plot(ir.rp)
  text(ir.rp)

This is probably the long/silly way, but it works ;-D
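
For a regression tree on a transformed response, a different route (a sketch, not something proposed in the thread) is to back-transform the fitted node values on a copy of the tree before labelling. It relies on text() reading its labels from the yval column of the frame. The mtcars data and the names fit and relabelled are stand-ins for the poster's data and object.

```r
library(rpart)

# Stand-in for the original model: a tree on log(response + 1).
fit <- rpart(log(hp + 1) ~ wt + disp + cyl, data = mtcars)

# Work on a copy so the fitted object itself stays on the log scale.
relabelled <- fit

# Undo log(y + 1) on the node values that text() prints.
# Note: this gives exp(mean(log(y + 1))) - 1 per node, which is not
# the arithmetic mean of the untransformed response in that node.
relabelled$frame$yval <- exp(fit$frame$yval) - 1

print(relabelled$frame[, c("var", "n", "yval")])
```

Calling plot(relabelled) followed by text(relabelled) then labels the nodes on the original scale.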

-- 
Cheers,

Kevin

--
On two occasions, I have been asked [by members of Parliament],
'Pray, Mr. Babbage, if you put into the machine wrong figures, will
the right answers come out?' I am not able to rightly apprehend the
kind of confusion of ideas that could provoke such a question.

-- Charles Babbage (1791-1871) 
 From Computer Stupidities: http://rinkworks.com/stupid/

--
Ko-Kang Kevin Wang
Master of Science (MSc) Student
SLC Tutor and Lab Demonstrator
Department of Statistics
University of Auckland
New Zealand
Homepage: http://www.stat.auckland.ac.nz/~kwan022
Ph: 373-7599
x88475 (City)
x88480 (Tamaki)
