[R] rpart weight prior

2007-07-09 Thread Aurélie Davranche
Hi! Could you please explain the difference between prior and weight in rpart? They seem to be the same. But in that case, why include a weight option in the latest versions? For unbalanced sampling, which is best to use: weight, prior, or both together? Thanks a lot. Aurélie

Re: [R] rpart weight prior

2007-07-09 Thread Prof Brian Ripley
On Sun, 8 Jul 2007, Aurélie Davranche wrote: Hi! Could you please explain the difference between prior and weight in rpart? They seem to be the same. But in that case, why include a weight option in the latest versions? For unbalanced sampling, which is best to use: weight, prior or
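
(The reply above is truncated in this archive. For reference, both controls are arguments to rpart() itself: weights takes one case weight per observation, while class priors go into parms. A minimal sketch on the built-in kyphosis data; the particular weight and prior values are illustrative only.)

  library(rpart)
  ## case weights: one non-negative weight per row of the training data
  w <- ifelse(kyphosis$Kyphosis == "present", 4, 1)
  fit.w <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                 weights = w, method = "class")
  ## class priors: positive, summing to 1, in the order of the factor levels
  fit.p <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                 parms = list(prior = c(0.5, 0.5)), method = "class")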

[R] rpart-question regarding relation between cp and rel error

2007-03-06 Thread Ulrike Grömping
Dear useRs, I may be temporarily (I hope :-)) confused, and I hope that someone can answer this question that bugs me at the moment: In the CP table of rpart, I thought the following equation should hold: rel error = rel error(before) - (nsplit - nsplit(before)) * CP(before), where

[R] rpart minimum sample size

2007-02-28 Thread Terry Therneau
Look at rpart.control. Rpart has two advisory parameters that control the tree size at the smallest nodes: minsplit (default 20): a node with fewer than this many subjects will not be considered worth splitting; minbucket (default 7): don't create any terminal nodes with fewer than 7
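
(A minimal sketch of how those two parameters are passed; the threshold values and the kyphosis example data are mine, chosen only for illustration.)

  library(rpart)
  fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
               control = rpart.control(minsplit = 10, minbucket = 5))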

[R] rpart minimum sample size

2007-02-27 Thread Amy Uhrin
Is there an optimal / minimum sample size for attempting to construct a classification tree using /rpart/? I have 27 seagrass disturbance sites (boat groundings) that have been monitored for a number of years. The monitoring protocol for each site is identical. From the monitoring data, I am

Re: [R] rpart minimum sample size

2007-02-27 Thread Wensui Liu
amy, without looking at your actual code, i would suggest you take a look at rpart.control() On 2/27/07, Amy Uhrin [EMAIL PROTECTED] wrote: Is there an optimal / minimum sample size for attempting to construct a classification tree using /rpart/? I have 27 seagrass disturbance sites (boat

Re: [R] rpart minimum sample size

2007-02-27 Thread Frank E Harrell Jr
Amy Uhrin wrote: Is there an optimal / minimum sample size for attempting to construct a classification tree using /rpart/? I have 27 seagrass disturbance sites (boat groundings) that have been monitored for a number of years. The monitoring protocol for each site is identical. From

[R] rpart with overdispersed count data?

2007-02-25 Thread David Farrar
I would like to do recursive partitioning when the response is a count variable subject to overdispersion, using say negative binomial likelihood or something like quasipoisson in glm. Would appreciate any thoughts on how to go about this (theory/computation). If I understand the rpart
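
(Not a full answer to the overdispersion question, but worth noting: rpart has a built-in Poisson splitting method for count responses. A hedged sketch, assuming a hypothetical data frame `counts` with a count column `y`, an observation-time column `time`, and some predictor columns; negative binomial or quasi-likelihood splitting would need a user-written method.)

  library(rpart)
  ## method = "poisson": the response is either a count or cbind(observation time, event count)
  fit <- rpart(cbind(time, y) ~ ., data = counts, method = "poisson")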

[R] rpart tree node label

2007-02-14 Thread Aimin Yan
I generate a tree using rpart. At the nodes of the tree, the splits are based on a factor. I want to label these nodes with the levels of this factor. Does anyone know how to do this? Thanks, Aimin

Re: [R] rpart tree node label

2007-02-14 Thread Wensui Liu
not sure how you want to label it. could you be more specific? thanks. On 2/14/07, Aimin Yan [EMAIL PROTECTED] wrote: I generate a tree using rpart. At the nodes of the tree, the splits are based on a factor. I want to label these nodes with the levels of this factor. Does anyone know how to do

Re: [R] rpart tree node label

2007-02-14 Thread Aimin Yan
levels(training$aa_one) [1] A C D E F H I K L M N P Q R S T V W Y; these are the 19 levels of aa_one. When I look at the tree, one node is labeled aa_one=bcdfgknop; it is obviously labeled with letters of the alphabet, not with the levels of aa_one. I want to get something like aa_one=CDE... instead. Do you

Re: [R] rpart tree node label [Broadcast]

2007-02-14 Thread Liaw, Andy
Try the following to see: library(rpart); iris.rp <- rpart(Sepal.Length ~ Species, iris); plot(iris.rp); text(iris.rp). Two possible solutions: 1. Use text(..., pretty=0). See ?text.rpart. 2. Use post(..., filename=). Andy From: Wensui Liu not sure how you want to label it. could you be more
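
(Andy's first suggestion, written out as a runnable snippet; pretty = 0 tells text.rpart to print full factor level names at each split instead of the single-letter abbreviations.)

  library(rpart)
  iris.rp <- rpart(Sepal.Length ~ Species, data = iris)
  plot(iris.rp)
  text(iris.rp, pretty = 0)   # e.g. "Species=setosa" rather than "Species=a"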

[R] rpart

2007-02-05 Thread Aimin Yan
Hello, I have a question about rpart. I am trying to use it to predict a continuous variable, but I get different prediction accuracy for the same training set. Does anyone know why? Aimin

Re: [R] rpart

2007-02-05 Thread Aimin Yan
Yes, I use the same settings, and I calculate MSE and CC as prediction accuracy measures. Someone told me I should not trust one tree and should do bagging. Is this correct? Aimin At 03:11 PM 2/5/2007, Wensui Liu wrote: are you sure you are using the same setting, tree size, and so on? On

Re: [R] rpart

2007-02-05 Thread Wensui Liu
man, oh, man Surely you can use bagging, or probably boosting. But that doesn't answer your question, does it? Believe me, even if you use bagging, the result will vary, depending on set.seed(). On 2/5/07, Aimin Yan [EMAIL PROTECTED] wrote: Yes, I use the same settings, and I calculate MSE and CC as
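
(For anyone who wants to try it, a minimal hand-rolled bagging sketch around rpart; the airquality data, the 25 trees, and the in-sample MSE at the end are all just for illustration, and fixing the seed is what makes a run reproducible.)

  library(rpart)
  set.seed(1)                                   # fix the RNG so repeated runs agree
  dat  <- airquality[complete.cases(airquality), ]
  B    <- 25
  pred <- matrix(NA_real_, nrow(dat), B)
  for (b in seq_len(B)) {
    idx <- sample(nrow(dat), replace = TRUE)    # bootstrap resample of the rows
    fit <- rpart(Ozone ~ ., data = dat[idx, ])
    pred[, b] <- predict(fit, newdata = dat)
  }
  bagged <- rowMeans(pred)                      # bagged prediction = average over the trees
  mean((bagged - dat$Ozone)^2)                  # in-sample MSE, illustration only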

Re: [R] rpart question

2007-01-25 Thread Prof Brian Ripley
On Thu, 25 Jan 2007, Aimin Yan wrote: I make a classification tree with this code: p.t2.90 <- rpart(y ~ aa_three + bas + bcu + aa_ss, data = training, method = "class", control = rpart.control(cp = 0.0001)) Here I want to set weights for the 4 predictors (aa_three, bas, bcu, aa_ss). I know that there is a weight set-up in
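
(The reply above is truncated, so to be clear about the terminology: rpart's weights argument is a vector of per-observation case weights and cannot weight individual predictors. The closest built-in facility I know of is the cost argument of rpart(), a per-variable scaling used when candidate splits are compared; the cost values below are invented purely for illustration, reusing the variable names from the post.)

  library(rpart)
  fit <- rpart(y ~ aa_three + bas + bcu + aa_ss, data = training, method = "class",
               cost = c(1, 2, 2, 4),            # one scaling per predictor, in formula order
               control = rpart.control(cp = 0.0001))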

[R] rpart question

2007-01-24 Thread Aimin Yan
I make a classification tree with this code: p.t2.90 <- rpart(y ~ aa_three + bas + bcu + aa_ss, data = training, method = "class", control = rpart.control(cp = 0.0001)) Here I want to set weights for the 4 predictors (aa_three, bas, bcu, aa_ss). I know that there is a weight set-up in rpart. Can this set-up satisfy my need?

[R] rpart - I'm confused by the loss matrix

2006-11-09 Thread Barbora Arendacká
Hello, As I couldn't find anywhere in the rpart help which element in the loss matrix means which loss, I played with this parameter and became a bit confused. What I did was this: I used the kyphosis data (classification absent/present; 64 'absent' cases and 17 'present' cases) and I
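
(For later readers: the loss matrix is supplied through parms. As I read the rpart documentation, rows correspond to the true class and columns to the predicted class, in the order of the factor levels, with a zero diagonal; since that orientation is exactly the point of confusion here, it is worth double-checking on a small example, as the original poster did. A sketch on kyphosis:)

  library(rpart)
  ## levels(kyphosis$Kyphosis) is c("absent", "present")
  L <- matrix(c(0, 1,    # true "absent":  0 if predicted absent, 1 if predicted present
                4, 0),   # true "present": 4 if predicted absent, 0 if predicted present
              nrow = 2, byrow = TRUE)
  fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
               method = "class", parms = list(loss = L))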

[R] rpart

2006-09-26 Thread henrigel
Dear r-help-list: If I use the rpart method like cfit <- rpart(y ~ ., data = data, ...), what kind of tree is stored in cfit? Is it right that this tree is not pruned at all, that it is the full tree? If so, it's up to me to choose a subtree by using the printcp method. In the technical report from

Re: [R] rpart

2006-09-26 Thread Prof Brian Ripley
On Mon, 25 Sep 2006, [EMAIL PROTECTED] wrote: Dear r-help-list: If I use the rpart method like cfit <- rpart(y ~ ., data = data, ...), what kind of tree is stored in cfit? Is it right that this tree is not pruned at all, that it is the full tree? It is an rpart object. This contains both the
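
(The usual workflow, as a minimal sketch: grow a deliberately large tree, inspect the cross-validated error in the cp table, then cut back with prune(). The cp values below are placeholders; in practice the pruning cp is read off printcp/plotcp, e.g. the row minimising xerror.)

  library(rpart)
  cfit <- rpart(Kyphosis ~ ., data = kyphosis, control = rpart.control(cp = 0.001))
  printcp(cfit)                      # cross-validated error for each candidate cp
  pruned <- prune(cfit, cp = 0.05)   # cut the tree back at the chosen complexity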

Re: [R] rpart

2006-09-26 Thread Prof Brian Ripley
On Tue, 26 Sep 2006, [EMAIL PROTECTED] wrote: Original Message Date: Tue, 26 Sep 2006 09:56:53 +0100 (BST) From: Prof Brian Ripley [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: Re: [R] rpart On Mon, 25 Sep 2006, [EMAIL PROTECTED] wrote: Dear r-help-list: If I

Re: [R] rpart

2006-09-26 Thread henrigel
Original Message Date: Tue, 26 Sep 2006 09:56:53 +0100 (BST) From: Prof Brian Ripley [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: Re: [R] rpart On Mon, 25 Sep 2006, [EMAIL PROTECTED] wrote: Dear r-help-list: If I use the rpart method like cfit <- rpart(y ~ ., data

Re: [R] rpart

2006-09-26 Thread henrigel
Original Message Date: Tue, 26 Sep 2006 12:54:22 +0100 (BST) From: Prof Brian Ripley [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: Re: [R] rpart On Tue, 26 Sep 2006, [EMAIL PROTECTED] wrote: Original Message Date: Tue, 26 Sep 2006 09:56:53

Re: [R] Rpart, custom penalty for an error

2006-09-15 Thread Maciej Bliziński
On Sun, 2006-09-10 at 20:36 +0100, Prof Brian Ripley wrote: I am however interested in areas where the probability of success is noticeably higher than 5%, for example 20%. I've tried rpart and the weights option, increasing the weights of the success-observations. You are 'misleading'

[R] Rpart, custom penalty for an error

2006-09-10 Thread Maciej Bliziński
Hello all R-help list subscribers, I'd like to create a regression tree for a data set with a binary response variable. Only 5% of observations are a success, so the regression tree will not really find any variable-value combinations that yield more than a 50% probability of success. I am

Re: [R] Rpart, custom penalty for an error

2006-09-10 Thread Prof Brian Ripley
On Sun, 10 Sep 2006, Maciej Bliziński wrote: Hello all R-help list subscribers, I'd like to create a regression tree for a data set with a binary response variable. Only 5% of observations are a success, so the regression tree will not really find any variable-value combinations that will
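
(The two rebalancing routes discussed in this thread, written out as a sketch; dat and its 0/1 response y are hypothetical, and the weight of 19 is simply what roughly balances a 5% success rate.)

  library(rpart)
  dat$y <- factor(dat$y)                        # 0/1 response as a factor => classification tree
  w     <- ifelse(dat$y == "1", 19, 1)          # upweight the rare successes
  fit1  <- rpart(y ~ ., data = dat, weights = w, method = "class")
  ## alternatively, leave the weights alone and shift the class priors
  fit2  <- rpart(y ~ ., data = dat, method = "class",
                 parms = list(prior = c(0.5, 0.5)))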

[R] rpart output: rule extraction beyond path.rpart()

2006-08-22 Thread Bryant, Benjamin
Greetings - Is there a way to automatically perform what I believe is called rule extraction (by Quinlan and the machine learning community at least) for the leaves of trees generated by rpart? I can use path.rpart() to automatically extract the paths to the leaves, but these can be
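
(For later readers, this is roughly what the path.rpart() extraction mentioned above looks like on the built-in kyphosis data; turning the printed paths into simplified rules is the part the poster is asking about and is not done here.)

  library(rpart)
  fit    <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
  leaves <- as.numeric(rownames(fit$frame))[fit$frame$var == "<leaf>"]
  path.rpart(fit, nodes = leaves)   # prints the chain of split conditions leading to each leaf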

[R] rpart unbalanced data

2006-07-21 Thread helen . mills
Hello all, I am currently working with rpart to classify vegetation types by spectral characteristics, and am coming up with poor classifications based on the fact that I have some vegetation types that have only 15 observations, while others have over 100. I have attempted to supply prior

Re: [R] rpart unbalanced data

2006-07-21 Thread Dr. Diego Kuonen
Dear Helen, You may want to have a look at http://www.togaware.com/datamining/survivor/Predicting_Fraud.html Greets, Diego Kuonen [EMAIL PROTECTED] wrote: Hello all, I am currently working with rpart to classify vegetation types by spectral characteristics, and am coming up with poor

[R] Rpart -- using predict() when missing data is present?

2005-10-08 Thread Ajay Narottam Shah
I am doing library(rpart) m <- rpart(y ~ x, D[insample,]) D[outsample,] y x 8 0.78391922 0.579025591 9 0.06629211 NA 10 NA 0.001593063 p <- predict(m, newdata=D[9,]) Error in model.frame(formula, rownames, variables, varnames, extras, extranames, :

Re: [R] Rpart -- using predict() when missing data is present?

2005-10-08 Thread Prof Brian Ripley
On Sat, 8 Oct 2005, Ajay Narottam Shah wrote: I am doing library(rpart) m <- rpart(y ~ x, D[insample,]) D[outsample,] y x 8 0.78391922 0.579025591 9 0.06629211 NA 10 NA 0.001593063 p <- predict(m, newdata=D[9,]) Error in model.frame(formula,
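
(The reply above is truncated. One thing worth knowing: predict.rpart accepts an na.action argument, and na.action = na.pass keeps rows whose predictors are missing so that the tree's surrogate-split/default-direction handling can still produce a prediction rather than model.frame dropping the row. Whether that resolves this particular 2005-era error I cannot verify; a sketch reusing the objects from the post:)

  p <- predict(m, newdata = D[9, , drop = FALSE], na.action = na.pass)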

[R] rpart Error in yval[, 1] : incorrect number of dimensions

2005-09-24 Thread Little, Mark P
I tried using rpart, as below, and got this error message rpart Error in yval[, 1] : incorrect number of dimensions. Thinking it might somehow be related to the large number of missing values, I tried using complete data, but with the same result. Does anyone know what may be going on, and how

Re: [R] rpart plot question

2005-08-11 Thread John Field
Petr Pikal wrote: Dear all, I am quite confused by rpart plotting. Here is an example: set.seed(1); y <- (c(rnorm(10), rnorm(10)+2, rnorm(10)+5)); x <- c(rep(c(1,2,5), c(10,10,10))); fit <- rpart(x~y) ## NB should be y~x; plot(fit); text(fit). Text on the first split says x 3.5 and on the second

[R] rpart plot question

2005-08-09 Thread Petr Pikal
Dear all, I am quite confused by rpart plotting. Here is an example: set.seed(1); y <- (c(rnorm(10), rnorm(10)+2, rnorm(10)+5)); x <- c(rep(c(1,2,5), c(10,10,10))); fit <- rpart(x~y); plot(fit); text(fit). Text on the first split says x 3.5 and on the second split x 1.5. What I understand: if x 3.5 then y is
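
(A detail that helps when reading these plots, and which the archive has made harder by eating the '<' signs in the quoted split labels: the condition printed at a split by text.rpart is the condition for going down the left branch. A sketch using the corrected formula from John Field's reply:)

  library(rpart)
  set.seed(1)
  x   <- rep(c(1, 2, 5), c(10, 10, 10))
  y   <- c(rnorm(10), rnorm(10) + 2, rnorm(10) + 5)
  fit <- rpart(y ~ x)     # response y, predictor x
  plot(fit)
  text(fit)               # a label such as "x< 3.5" means: cases with x < 3.5 go left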

[R] rpart memory problem

2005-03-21 Thread jenniferbecq
Hi everyone, I have a problem using rpart (R 2.0.1 under Unix). I have a large matrix (9271x7); my response variable is numeric and all my predictor variables are categorical (from 3 to 8 levels). Here is an example: mydata[1:5,] distance group3 group4 group5 group6

Re: [R] rpart memory problem

2005-03-21 Thread Uwe Ligges
[EMAIL PROTECTED] wrote: Hi everyone, I have a problem using rpart (R 2.0.1 under Unix) Indeed, I have a large matrix (9271x7), my response variable is numeric and all my predictor variables are categorical (from 3 to 8 levels). Your problem is the number of levels. You get a similar number of

[R] rpart

2005-01-17 Thread Weiwei Shi
Hi there: I am working on a classification problem using rpart. When my response variable y is binary, the trees grow very fast, but if I add one more case to y, that is, making y have 3 cases, the tree growing cannot be finished. The command looks like: x <- rpart(r0$V142~., data=r0[,1:141],

Re: [R] rpart

2005-01-17 Thread Prof Brian Ripley
On Mon, 17 Jan 2005, Weiwei Shi wrote: I am working on a classification problem by using rpart. when my response variable y is binary, the trees grow very fast, but if I add one more case to y, that is making y has 3 cases, Do you mean 3 classes?: you have many more than 3 cases below. the tree

[R] rpart problem

2004-09-06 Thread pfm401
Dear all, I am having some trouble getting the rpart function to work as expected. I am trying to use rpart to combine levels of a factor in order to reduce the number of levels of that factor. In exploring the code I have noticed that it is possible for chisq.test to return a statistically

Re: [R] rpart problem

2004-09-06 Thread Prof Brian Ripley
I think you are confusing the purpose of rpart, which is prediction. You want to predict `mysuccess'. One group has 90% success, so the best prediction is `success'. The other group has 60% success, so the best prediction is `success'. So there is no point in splitting into groups. Replace 60%

RE: [R] rpart and TREE, can be the same?

2004-07-19 Thread WWei
in tree( )? Thanks, Auston Liaw, Andy [EMAIL PROTECTED] 07/16/2004 02:04 PM To: '[EMAIL PROTECTED]' [EMAIL PROTECTED] cc: Subject: RE: [R] rpart and TREE, can be the same? Auston, tree() does not use Gini as splitting criterion, AFAIK. It uses deviance. You can try

RE: [R] rpart and TREE, can be the same?

2004-07-19 Thread WWei
/2004 09:38 AM To: '[EMAIL PROTECTED]' [EMAIL PROTECTED] cc: Subject: RE: [R] rpart and TREE, can be the same? Auston, I see that now. Have you tried setting mindev=0 in tree() and cp=0 in rpart(), to see if the unpruned trees are identical? If so, you can probably try pruning
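
(Andy's suggestion written out as a runnable sketch. The stopping-rule values are my own guesses at making the two growers comparable; even so, exact agreement is not guaranteed, since rpart splits on the Gini index by default while tree() uses deviance, and the two packages handle NAs differently.)

  library(rpart)
  library(tree)
  fit.rp <- rpart(Species ~ ., data = iris,
                  control = rpart.control(cp = 0, minsplit = 2, minbucket = 1))
  fit.tr <- tree(Species ~ ., data = iris,
                 control = tree.control(nobs = nrow(iris), mincut = 1, minsize = 2, mindev = 0))
  plot(fit.rp); text(fit.rp)
  plot(fit.tr); text(fit.tr)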

[R] rpart and TREE, can be the same?

2004-07-16 Thread WWei
Hi, all, I am wondering if it is possible to set parameters of 'rpart' and 'tree' such that they will produce the exact same tree? Thanks. Auston Wei Statistical Analyst Department of Biostatistics and Applied Mathematics The University of Texas MD Anderson Cancer Center Tel: 713-563-4281

RE: [R] rpart and TREE, can be the same?

2004-07-16 Thread Liaw, Andy
I guess if you define the splitting criterion in rpart so that it matches the one used in tree(), that's possible. However, I believe the two also differ in how they handle NAs. Andy From: [EMAIL PROTECTED] Hi, all, I am wondering if it is possible to set parameters of 'rpart' and

[R] rpart

2004-06-04 Thread h0444k87
Hello everyone, I'm a newbie to R and to CART so I hope my questions don't seem too stupid. 1.) My first question concerns the rpart() method. Which method does rpart use in order to get the best split - entropy impurity, Bayes error (min. error) or Gini index? Is there a way to make it use the

Re: [R] rpart

2004-06-04 Thread Ko-Kang Kevin Wang
] To: [EMAIL PROTECTED] Sent: Friday, June 04, 2004 9:59 PM Subject: [R] rpart Hello everyone, I'm a newbie to R and to CART so I hope my questions don't seem too stupid. 1.) My first question concerns the rpart() method. Which method does rpart use in order to get the best split - entropy
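
(To answer the first question for archive readers: rpart() uses the Gini index by default for classification trees, and can be switched to the information/entropy criterion via parms; a minimal sketch on kyphosis.)

  library(rpart)
  fit.gini <- rpart(Kyphosis ~ ., data = kyphosis, method = "class")   # Gini (the default)
  fit.info <- rpart(Kyphosis ~ ., data = kyphosis, method = "class",
                    parms = list(split = "information"))               # entropy-based splitting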

[R] rpart for CART with weights/priors

2004-05-07 Thread Carolin Strobl
Hi, I have a technical question about rpart: according to Breiman et al. 1984, different costs for misclassification in CART can be modelled either by means of modifying the loss matrix or by means of using different prior probabilities for the classes, which again should have the same effect as

[R] rpart question

2004-05-04 Thread lsjensen
Wondered about the best way to control for input variables that have a large number of levels in 'rpart' models. I understand the algorithm searches through all possible splits (2^(k-1) - 1 for k levels), so variables with more levels are more prone to be good splitters... so I'm looking for ways

RE: [R] rpart question

2004-05-04 Thread Liaw, Andy
AFAIK rpart does not have a built-in facility for adjusting bias in split selection. One possibility is to define your own splitting criterion that does the adjustment in some fashion. I believe the current version of rpart allows you to define a custom splitting criterion, but I have not tried it

[R] RPART drawing the tree

2004-04-29 Thread Rob Kamstra
Hello, I am using the RPART library to find patterns in HIV mutations regarding drug resistance. My data consists of amino acids at certain locations and two classes, resistant and susceptible. The classification and pruning work fine with Rpart. However, there is a problem with displaying the

Re: [R] RPART drawing the tree

2004-04-29 Thread Prof Brian Ripley
On Thu, 29 Apr 2004, Rob Kamstra wrote: I am using the RPART library to find patterns in HIV mutations regarding drug resistance. My data consists of amino acids at certain locations and two classes, resistant and susceptible. The classification and pruning work fine with Rpart. However

[R] rpart or mvpart

2004-03-29 Thread Ben Stewart-Koster

[R] rpart question on loss matrix

2004-01-07 Thread Peter Flom
Hello again. I've looked through ?rpart, Atkinson & Therneau (1997), Chap. 10 of Venables and Ripley, Breiman et al., and the R-help archives but haven't seen the answer to these two questions: 1) How does rpart deal with asymmetric loss matrices? Breiman et al. suggest some possibilities, but, of

Re: [R] rpart postscript graphics, Mac OS

2003-11-18 Thread Uwe Ligges
On Tue, 18 Nov 2003, Paul Murrell wrote: Hi Kaiser Fung wrote: I am running R on Mac OS X 10.2x. When I create postscript graphics of rpart tree objects, a tiny part of the tree gets trimmed off, even when it has only a few terminal nodes. This happens even without fancy but

[R] rpart postscript graphics, Mac OS

2003-11-17 Thread Kaiser Fung
I am running R on Mac OS X 10.2x. When I create postscript graphics of rpart tree objects, a tiny part of the tree gets trimmed off, even when it has only a few terminal nodes. This happens even without fancy, but is worse if fancy=T. (This doesn't happen with boxplot, scatter plots, etc.) How do

Re: [R] rpart postscript graphics, Mac OS

2003-11-17 Thread Paul Murrell
Hi Kaiser Fung wrote: I am running R on Mac OS X 10.2x. When I create postscript graphics of rpart tree objects, a tiny part of the tree gets trimmed off, even when it has only a few terminal nodes. This happens even without fancy but worse if fancy=T. (This doesn't happen with boxplot,

Re: [R] Rpart question - labeling nodes with something not in x$frame

2003-07-17 Thread Ko-Kang Kevin Wang
On Thu, 17 Jul 2003, Peter Flom wrote: I have a tree created with tr.hh.logcas <- rpart(log(YCASSX + 1) ~ AGE+DRUGUSEY+SEX+OBSXNUM+WINDLE, xval = 10) I would like to label the nodes with YCASSX rather than log(YCASSX + 1). But the help file for text in library rpart says that you can only

Re: [R] rpart vs. randomForest

2003-04-14 Thread Martin Maechler
Anonymous == [EMAIL PROTECTED] on Sat, 12 Apr 2003 14:41:00 -0700 writes: Greetings. I'm trying to determine whether to use rpart or randomForest for a classification tree. Has anybody tested efficacy formally? I've run both and the

Re: [R] rpart v. lda classification.

2003-02-12 Thread ripley
On Tue, 11 Feb 2003, Rolf Turner wrote: I've been groping my way through a classification/discrimination problem, from a consulting client. There are 26 observations, with 4 possible categories and 24 (!!!) potential predictor variables. I tried using lda() on the first 7 predictor

[R] rpart v. lda classification.

2003-02-11 Thread Rolf Turner
I've been groping my way through a classification/discrimination problem, from a consulting client. There are 26 observations, with 4 possible categories and 24 (!!!) potential predictor variables. I tried using lda() on the first 7 predictor variables and got 24 of the 26 observations