Hi!
Could you please explain the difference between prior and weights in
rpart? They seem to be the same. But in that case, why include a weights
option in the latest versions? For unbalanced sampling, which is
best to use: weights, priors, or both together?
Thanks a lot.
Aurélie
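For comparison, here is a minimal sketch (on the kyphosis data shipped with rpart) of the two mechanisms: `weights` multiplies each individual observation's contribution, while `parms = list(prior = ...)` rescales the class totals as wholes. The weight values here are arbitrary, chosen only to illustrate.

```r
library(rpart)

# Case weights: each row's contribution is multiplied by its weight.
w <- ifelse(kyphosis$Kyphosis == "present", 4, 1)
fit.w <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
               method = "class", weights = w)

# Priors: rescale the classes as wholes (level order: absent, present).
fit.p <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
               method = "class", parms = list(prior = c(0.5, 0.5)))
```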
On Sun, 8 Jul 2007, Aurélie Davranche wrote:
Hi!
Could you please explain the difference between prior and weights in
rpart? They seem to be the same. But in that case, why include a weights
option in the latest versions? For unbalanced sampling, which is best to
use: weights, priors, or
Dear useRs,
I may be temporarily (I hope :-)) confused, and I hope that someone can
answer this question that is bugging me at the moment:
In the CP table of rpart, I thought the following equation should hold:
rel error = rel error(before) - (nsplit - nsplit(before)) * CP(before),
where
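One way to inspect this directly is to compute the successive drops in rel error per added split from the CP table yourself and eyeball them against the CP column; a sketch on the kyphosis data:

```r
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")
tab <- fit$cptable  # columns: CP, nsplit, rel error, xerror, xstd
# Drop in rel error per additional split between successive rows
drop.per.split <- -diff(tab[, "rel error"]) / diff(tab[, "nsplit"])
print(cbind(tab[-1, c("CP", "nsplit", "rel error")], drop.per.split))
```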
Look at rpart.control. Rpart has two advisory parameters that control
the tree size at the smallest nodes:
minsplit (default 20): a node with fewer than this many subjects will
not be worth splitting
minbucket (default 7): don't create any terminal nodes with fewer than 7 subjects
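As a minimal illustration of those two parameters (the values here are arbitrary, just to show where they go):

```r
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class",
             control = rpart.control(minsplit = 10,   # consider splitting smaller nodes
                                     minbucket = 3))  # allow smaller terminal nodes
```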
Is there an optimal / minimum sample size for attempting to construct a
classification tree using /rpart/?
I have 27 seagrass disturbance sites (boat groundings) that have been
monitored for a number of years. The monitoring protocol for each site
is identical. From the monitoring data, I am
amy,
without looking at your actual code, i would suggest you take a
look at rpart.control()
On 2/27/07, Amy Uhrin [EMAIL PROTECTED] wrote:
Is there an optimal / minimum sample size for attempting to construct a
classification tree using /rpart/?
I have 27 seagrass disturbance sites (boat
I would like to do recursive partitioning when the response is a
count variable subject to overdispersion, using, say, a negative
binomial likelihood or something like quasipoisson in glm. I would
appreciate any thoughts on how to go about this (theory/computation).
If I understand the rpart
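rpart has no negative binomial method built in, but for plain counts `method = "poisson"` is available; a sketch on simulated data (handling overdispersion itself would need a user-written split function, which rpart also supports):

```r
library(rpart)
set.seed(1)
# Simulated count response whose mean depends on x1
d <- data.frame(x1 = runif(200), x2 = runif(200))
d$y <- rpois(200, lambda = exp(1 + 2 * (d$x1 > 0.5)))
fit <- rpart(y ~ x1 + x2, data = d, method = "poisson")
```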
I generated a tree using rpart.
In the nodes of the tree, splits are based on some factor.
I want to label these nodes based on the levels of this factor.
Does anyone know how to do this?
Thanks,
Aimin
not sure how you want to label it.
could you be more specific?
thanks.
On 2/14/07, Aimin Yan [EMAIL PROTECTED] wrote:
I generated a tree using rpart.
In the nodes of the tree, splits are based on some factor.
I want to label these nodes based on the levels of this factor.
Does anyone know how to do
levels(training$aa_one)
[1] A C D E F H I K L M N P Q R S T V
W Y
These are the 19 levels of aa_one.
When I look at the tree,
one node is labeled
aa_one=bcdfgknop
It is obviously labeled with alphabet letters, not with the levels of aa_one.
I want to get something like
aa_one=CDE.. instead.
Do you
Try the following to see:
library(rpart)
iris.rp <- rpart(Sepal.Length ~ Species, iris)
plot(iris.rp)
text(iris.rp)
Two possible solutions:
1. Use text(..., pretty=0). See ?text.rpart.
2. Use post(..., filename=).
Andy
From: Wensui Liu
not sure how you want to label it.
could you be more
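The effect of `pretty = 0` can be seen directly on a variant of the example in the reply above:

```r
library(rpart)
iris.rp <- rpart(Sepal.Length ~ Species, data = iris)
plot(iris.rp)
text(iris.rp, pretty = 0)  # prints full factor level names, e.g. Species=setosa,
                           # instead of the single-letter codes a, b, c
```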
Hello,
I have a question about rpart.
I try to use it to do prediction for a continuous variable,
but I get different prediction accuracy for the same training set.
Does anyone know why?
Aimin
Yes, I use the same settings, and I calculate MSE and CC as
prediction accuracy measures.
Someone told me I should not trust one tree and should do bagging.
Is this correct?
Aimin
At 03:11 PM 2/5/2007, Wensui Liu wrote:
are you sure you are using the same setting, tree size, and so on?
On
man, oh, man
Surely you can use bagging, or probably boosting. But that doesn't
answer your question, does it?
Believe me, even if you use bagging, the result will vary, depending on set.seed().
On 2/5/07, Aimin Yan [EMAIL PROTECTED] wrote:
Yes, I use the same setting, and I calculate MSE and CC as
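A minimal bagging sketch for a continuous response, using the built-in airquality data; fixing the seed makes the ensemble reproducible, while a different seed gives a (slightly) different result, as noted above:

```r
library(rpart)
set.seed(42)
d <- na.omit(airquality)
B <- 25
preds <- replicate(B, {
  idx <- sample(nrow(d), replace = TRUE)  # bootstrap resample
  fit <- rpart(Ozone ~ ., data = d[idx, ])
  predict(fit, newdata = d)               # predict on the full data
})
bagged <- rowMeans(preds)                 # average over the B trees
mse <- mean((d$Ozone - bagged)^2)
```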
On Thu, 25 Jan 2007, Aimin Yan wrote:
I build a classification tree like this:
p.t2.90 <- rpart(y ~ aa_three + bas + bcu + aa_ss,
data = training, method = "class", control = rpart.control(cp = 0.0001))
Here I want to set weights for the 4 predictors (aa_three, bas, bcu, aa_ss).
I know that there is a weights set-up in
I build a classification tree like this:
p.t2.90 <- rpart(y ~ aa_three + bas + bcu + aa_ss,
data = training, method = "class", control = rpart.control(cp = 0.0001))
Here I want to set weights for the 4 predictors (aa_three, bas, bcu, aa_ss).
I know that there is a weights set-up in rpart.
Can this set-up satisfy my need?
Hello,
As I couldn't find anywhere in the rpart help which element of the
loss matrix means which loss, I played with this parameter and got
a bit confused.
What I did was this:
I used the kyphosis data (classification absent/present; the number of 'absent'
cases is 64, of 'present' cases 17)
and I
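My reading of ?rpart (worth double-checking) is that in `parms = list(loss = L)` the rows of L are the true classes and the columns the predicted ones, in factor level order (absent, present for kyphosis), with a zero diagonal:

```r
library(rpart)
# L[1, 2]: cost of predicting 'present' for a true 'absent'
# L[2, 1]: cost of predicting 'absent' for a true 'present' (penalized 4x here)
L <- matrix(c(0, 1,
              4, 0), nrow = 2, byrow = TRUE)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", parms = list(loss = L))
```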
Dear r-help-list:
If I use the rpart method like
cfit <- rpart(y ~ ., data = data, ...),
what kind of tree is stored in cfit?
Is it right that this tree is not pruned at all, that it is the full tree?
If so, it's up to me to choose a subtree using the printcp method.
In the technical report from
On Mon, 25 Sep 2006, [EMAIL PROTECTED] wrote:
Dear r-help-list:
If I use the rpart method like
cfit <- rpart(y ~ ., data = data, ...),
what kind of tree is stored in cfit?
Is it right that this tree is not pruned at all, that it is the full tree?
It is an rpart object. This contains both the
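The fitted rpart object carries the CP table for its nested sequence of subtrees; a common (though not the only) recipe is to pick the CP value minimizing the cross-validated error and prune to it:

```r
library(rpart)
cfit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")
printcp(cfit)  # CP table for the nested sequence of subtrees
# Prune at the CP with the smallest cross-validated error
best <- cfit$cptable[which.min(cfit$cptable[, "xerror"]), "CP"]
pruned <- prune(cfit, cp = best)
```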
On Tue, 26 Sep 2006, [EMAIL PROTECTED] wrote:
Original message
Date: Tue, 26 Sep 2006 09:56:53 +0100 (BST)
From: Prof Brian Ripley [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: [R] rpart
On Mon, 25 Sep 2006, [EMAIL PROTECTED] wrote:
Dear r-help-list:
If I
Original message
Date: Tue, 26 Sep 2006 09:56:53 +0100 (BST)
From: Prof Brian Ripley [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: [R] rpart
On Mon, 25 Sep 2006, [EMAIL PROTECTED] wrote:
Dear r-help-list:
If I use the rpart method like
cfit <- rpart(y ~ ., data
Original message
Date: Tue, 26 Sep 2006 12:54:22 +0100 (BST)
From: Prof Brian Ripley [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: [R] rpart
On Tue, 26 Sep 2006, [EMAIL PROTECTED] wrote:
Original message
Date: Tue, 26 Sep 2006 09:56:53
On Sun, 2006-09-10 at 20:36 +0100, Prof Brian Ripley wrote:
I am however interested in areas where the probability of success is
noticeably higher than 5%, for example 20%. I've tried rpart and the
weights option, increasing the weights of the success observations.
You are 'misleading'
Hello all R-help list subscribers,
I'd like to create a regression tree for a data set with a binary response
variable. Only 5% of observations are a success, so the regression tree
will not really find any variable value combinations that yield
more than a 50% probability of success. I am
On Sun, 10 Sep 2006, Maciej Bliziński wrote:
Hello all R-help list subscribers,
I'd like to create a regression tree for a data set with a binary response
variable. Only 5% of observations are a success, so the regression tree
will not really find any variable value combinations that will
Greetings -
Is there a way to automatically perform what I believe is called rule
extraction (by Quinlan and the machine learning community at least) for
the leaves of trees generated by rpart? I can use path.rpart() to
automatically extract the paths to the leaves, but these can be
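Building on path.rpart(), one sketch for getting a rule (a vector of split conditions) per leaf; the leaf node numbers come from the rownames of fit$frame:

```r
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")
# Node numbers of the terminal nodes
leaf.ids <- as.numeric(rownames(fit$frame)[fit$frame$var == "<leaf>"])
# One character vector of conditions ("rule") per leaf
rules <- path.rpart(fit, nodes = leaf.ids, print.it = FALSE)
```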
Hello all,
I am currently working with rpart to classify vegetation types by spectral
characteristics, and am coming up with poor classifications based on the fact
that I have some vegetation types that have only 15 observations, while others
have over 100. I have attempted to supply prior
Dear Helen,
You may want to have a look at
http://www.togaware.com/datamining/survivor/Predicting_Fraud.html
Greets,
Diego Kuonen
[EMAIL PROTECTED] wrote:
Hello all,
I am currently working with rpart to classify vegetation types by spectral
characteristics, and am coming up with poor
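One thing worth trying for this kind of imbalance is equal priors, so the rare vegetation types still influence the splits. A sketch on simulated data (the class names and band columns are made up for illustration):

```r
library(rpart)
set.seed(3)
d <- data.frame(band1 = rnorm(150), band2 = rnorm(150),
                veg = factor(sample(c("marsh", "scrub", "forest"), 150,
                                    replace = TRUE, prob = c(0.1, 0.3, 0.6))))
k <- nlevels(d$veg)
# Equal prior weight per class, regardless of the observed frequencies
fit <- rpart(veg ~ band1 + band2, data = d, method = "class",
             parms = list(prior = rep(1 / k, k)))
```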
I am doing
library(rpart)
m <- rpart(y ~ x, D[insample, ])
D[outsample, ]
y x
8 0.78391922 0.579025591
9 0.06629211 NA
10 NA 0.001593063
p <- predict(m, newdata = D[9, ])
Error in model.frame(formula, rownames, variables, varnames, extras,
extranames, :
On Sat, 8 Oct 2005, Ajay Narottam Shah wrote:
I am doing
library(rpart)
m <- rpart(y ~ x, D[insample, ])
D[outsample, ]
y x
8 0.78391922 0.579025591
9 0.06629211 NA
10 NA 0.001593063
p <- predict(m, newdata = D[9, ])
Error in model.frame(formula,
I tried using rpart, as below, and got the error message "Error in
yval[, 1] : incorrect number of dimensions". Thinking it might somehow be
related to the large number of missing values, I tried using complete data, but
with the same result. Does anyone know what may be going on, and how
Petr Pikal wrote:
Dear all
I am quite confused by rpart plotting. Here is an example.
set.seed(1)
y <- c(rnorm(10), rnorm(10) + 2, rnorm(10) + 5)
x <- rep(c(1, 2, 5), c(10, 10, 10))
fit <- rpart(x ~ y)  ## NB should be y ~ x
plot(fit)
text(fit)
Text on the first split says x < 3.5 and on the second
Dear all
I am quite confused by rpart plotting. Here is an example.
set.seed(1)
y <- c(rnorm(10), rnorm(10) + 2, rnorm(10) + 5)
x <- rep(c(1, 2, 5), c(10, 10, 10))
fit <- rpart(x ~ y)
plot(fit)
text(fit)
Text on the first split says x < 3.5 and on the second split x < 1.5, which
I understand as:
If x < 3.5 then y is
Hi everyone,
I have a problem using rpart (R 2.0.1 under Unix).
Indeed, I have a large matrix (9271x7); my response variable is numeric and all
my predictor variables are categorical (from 3 to 8 levels).
Here is an example:
mydata[1:5,]
distance group3 group4 group5 group6
[EMAIL PROTECTED] wrote:
Hi everyone,
I have a problem using rpart (R 2.0.1 under Unix)
Indeed, I have a large matrix (9271x7), my response variable is numeric and all
my predictor variables are categorical (from 3 to 8 levels).
Your problem is the number of levels. You get a similar number of
Hi, there:
I am working on a classification problem using
rpart. When my response variable y is binary, the
trees grow very fast, but if I add one more case to y,
that is, making y have 3 cases, the tree growing cannot
be finished.
The command looks like:
x <- rpart(r0$V142 ~ ., data = r0[, 1:141],
On Mon, 17 Jan 2005, Weiwei Shi wrote:
I am working on a classification problem using
rpart. When my response variable y is binary, the
trees grow very fast, but if I add one more case to y,
that is, making y have 3 cases,
Do you mean 3 classes?: you have many more than 3 cases below.
the tree
Dear all,
I am having some trouble with getting the rpart function to work as expected.
I am trying to use rpart to combine levels of a factor to reduce the number
of levels of that factor. In exploring the code I have noticed that it is
possible for chisq.test to return a statistically
I think you are confusing the purpose of rpart, which is prediction.
You want to predict `mysuccess'.
One group has 90% success, so the best prediction is `success'.
The other group has 60% success, so the best prediction is `success'.
So there is no point in splitting into groups. Replace 60%
in tree(
)?
Thanks,
Auston
Liaw, Andy [EMAIL PROTECTED]
07/16/2004 02:04 PM
To:
'[EMAIL PROTECTED]' [EMAIL PROTECTED]
cc:
Subject:
RE: [R] rpart and TREE, can be the same?
Auston,
tree() does not use Gini as splitting criterion, AFAIK. It uses deviance.
You can try
Auston,
I see that now. Have you tried setting mindev=0 in tree() and cp=0 in
rpart(), to see if the unpruned trees are identical? If so, you can
probably try pruning
Hi, all,
I am wondering if it is possible to set parameters of 'rpart' and 'tree'
such that they will produce the exact same tree? Thanks.
Auston Wei
Statistical Analyst
Department of Biostatistics and Applied Mathematics
The University of Texas MD Anderson Cancer Center
Tel: 713-563-4281
I guess if you define the splitting criterion in rpart so that it matches
the one used in tree(), that's possible. However, I believe the two also
differ in how they handle NAs.
Andy
From: [EMAIL PROTECTED]
Hi, all,
I am wondering if it is possible to set parameters of 'rpart'
and
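Along the lines Andy suggests, a sketch of growing both essentially unpruned with deviance/information splitting (this assumes the 'tree' package is installed, and the two may still disagree on surrogate/NA handling):

```r
library(rpart)
library(tree)
# rpart: information (deviance-like) splits, no complexity penalty
fit.rpart <- rpart(Species ~ ., data = iris, method = "class",
                   parms = list(split = "information"),
                   control = rpart.control(cp = 0, minsplit = 2, minbucket = 1))
# tree: deviance splits, effectively no size/deviance restrictions
fit.tree <- tree(Species ~ ., data = iris,
                 control = tree.control(nrow(iris), mincut = 1,
                                        minsize = 2, mindev = 0))
```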
Hello everyone,
I'm a newbie to R and to CART, so I hope my questions don't seem too stupid.
1.)
My first question concerns the rpart() method. Which method does rpart use in
order to get the best split - entropy impurity, Bayes error (min. error), or Gini
index? Is there a way to make it use the
To: [EMAIL PROTECTED]
Sent: Friday, June 04, 2004 9:59 PM
Subject: [R] rpart
Hello everyone,
I'm a newbie to R and to CART so I hope my questions don't seem too
stupid.
1.)
My first question concerns the rpart() method. Which method does rpart use
in
order to get the best split - entropy
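For the first question: rpart's default splitting criterion for classification is the Gini index, and `parms = list(split = "information")` switches to entropy; a quick sketch of both:

```r
library(rpart)
fit.gini <- rpart(Species ~ ., data = iris, method = "class",
                  parms = list(split = "gini"))         # the default
fit.info <- rpart(Species ~ ., data = iris, method = "class",
                  parms = list(split = "information"))  # entropy splitting
```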
Hi,
I have a technical question about rpart:
according to Breiman et al. 1984, different costs for misclassification in
CART can be modelled
either by means of modifying the loss matrix or by means of using different
prior probabilities for the classes,
which again should have the same effect as
I wondered about the best way to control for input variables that have a
large number of levels in 'rpart' models. I understand the algorithm
searches through all possible splits (2^(k-1) for k levels) and so
variables with more levels are more prone to be good splitters... so I'm
looking for ways
AFAIK rpart does not have a built-in facility for adjusting bias in split
selection. One possibility is to define your own splitting criterion that
does the adjustment in some fashion. I believe the current version of rpart
allows you to define a custom splitting criterion, but I have not tried it
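The scale of the problem is easy to tabulate: an unordered factor with k levels admits 2^(k-1) - 1 distinct binary partitions, so high-cardinality factors get far more chances to look like good splitters than low-cardinality ones:

```r
k <- 2:10
splits <- 2^(k - 1) - 1  # distinct binary partitions of k unordered levels
data.frame(levels = k, candidate.splits = splits)
```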
Hello,
I am using the RPART library to find patterns in HIV mutations related to
drug resistance.
My data consist of amino acids at certain positions and two classes, resistant
and susceptible.
The classification and pruning work fine with Rpart. However, there is a
problem with displaying the
On Thu, 29 Apr 2004, Rob Kamstra wrote:
I am using the RPART library to find patterns in HIV mutations related to
drug resistance.
My data consist of amino acids at certain positions and two classes, resistant
and susceptible.
The classification and pruning work fine with Rpart. However,
Hello again
I've looked through ?rpart, Atkinson & Therneau (1997), Chap. 10 of
Venables and Ripley, Breiman et al., and the r-help archives but
haven't seen the answer to these two questions:
1) How does rpart deal with asymmetric loss matrices? Breiman et al.
suggest some possibilities, but, of
On Tue, 18 Nov 2003, Paul Murrell wrote:
Hi
Kaiser Fung wrote:
I am running R on Mac OS X 10.2x. When I create
postscript graphics of rpart tree objects, a tiny part
of the tree gets trimmed off, even when it has only a
few terminal nodes. This happens even without fancy
but
I am running R on Mac OS X 10.2x. When I create
postscript graphics of rpart tree objects, a tiny part
of the tree gets trimmed off, even when it has only a
few terminal nodes. This happens even without fancy
but worse if fancy=T. (This doesn't happen with
boxplot, scatter plots, etc.) How do
On Thu, 17 Jul 2003, Peter Flom wrote:
I have a tree created with
tr.hh.logcas <- rpart(log(YCASSX + 1) ~ AGE + DRUGUSEY + SEX + OBSXNUM + WINDLE,
xval = 10)
I would like to label the nodes with YCASSX rather than log(YCASSX +
1). But the help file for text in library rpart says that you can only
Anonymous == [EMAIL PROTECTED]
on Sat, 12 Apr 2003 14:41:00 -0700 writes:
Anonymous Greetings. I'm trying to determine whether to use
Anonymous rpart or randomForest for a classification
Anonymous tree. Has anybody tested efficacy formally? I've
Anonymous run both and the
On Tue, 11 Feb 2003, Rolf Turner wrote:
I've been groping my way through a classification/discrimination
problem, from a consulting client. There are 26 observations, with 4
possible categories and 24 (!!!) potential predictor variables.
I tried using lda() on the first 7 predictor
I've been groping my way through a classification/discrimination
problem, from a consulting client. There are 26 observations, with 4
possible categories and 24 (!!!) potential predictor variables.
I tried using lda() on the first 7 predictor variables and got 24 of
the 26 observations