On Mon, 11 Oct 2010, jagdeesh_mn wrote:
Hi,
Being a novice this is my first usage of R.
I am trying to use rpart for building a decision tree in R. And I have the
following dataframe
Outlook Temp Humidity Windy Class
Sunny 75 70 Yes Play
Sunny 80 90 Yes Don't Play
Sunny 85 85 No Don't Play
Sunny 72 95 No Don't Play
Sunny 69 70 No Play
Overcast 72 90 Yes Play
Overcast 83 78 No Play
Overcast 64 65 Yes Play
Overcast 81 75 No Play
The first line is the header. When I use the formula,
"CART<-rpart(Class ~ Outlook + Temp + Humidity + Windy, data=dataframe)"
and try to plot the values of CART using plot(CART), I get the following
error,
"Error in plot.rpart(CART) : fit is not a tree, just a root".
Am I missing something here? Any help would be greatly appreciated. Btw, the
dataframe was obtained by reading a csv which shouldn't be an issue.
The error message says it all: with this tiny data set, rpart() decides
not to split the data at all and thus retains just a root rather than a
tree.
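For context, the defaults behind this behaviour can be inspected directly.
The snippet below is only a small sketch: it prints the two control values
that matter most here, as returned by rpart.control() when called without
arguments.

```r
library("rpart")

## Default control settings used by rpart() when none are supplied
defaults <- rpart.control()

## A node must hold at least this many observations before a split is tried
defaults$minsplit    # 20

## Smallest allowed terminal node, by default round(minsplit / 3)
defaults$minbucket   # 7
```

With fewer than 20 rows, the root node never qualifies for a split under
these defaults.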
If you want to make rpart() split the data, you can modify some of its
hyperparameters, e.g., the minimum number of observations required to
attempt a split.
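As a rough sketch of what that could look like for the nine rows quoted in
the question (typed in by hand here instead of read from the csv), the code
below fits the tree once with the default minsplit and once with
minsplit = 5. Setting xval = 0 is my own assumption, not part of the
question: it simply turns off cross-validation, which has little meaning
with so few rows.

```r
library("rpart")

## The nine rows from the question, entered by hand
dataframe <- data.frame(
  Outlook  = factor(c("Sunny", "Sunny", "Sunny", "Sunny", "Sunny",
                      "Overcast", "Overcast", "Overcast", "Overcast")),
  Temp     = c(75, 80, 85, 72, 69, 72, 83, 64, 81),
  Humidity = c(70, 90, 85, 95, 70, 90, 78, 65, 75),
  Windy    = factor(c("Yes", "Yes", "No", "No", "No",
                      "Yes", "No", "Yes", "No")),
  Class    = factor(c("Play", "Don't Play", "Don't Play", "Don't Play",
                      "Play", "Play", "Play", "Play", "Play"))
)

## minsplit left at its default of 20: 9 rows are too few, so root only
## (xval = 0 is an assumption of this sketch, see above)
root_only <- rpart(Class ~ Outlook + Temp + Humidity + Windy,
                   data = dataframe, method = "class",
                   control = rpart.control(xval = 0))
nrow(root_only$frame)   # number of nodes in the fitted object

## Lower minsplit so that the root node qualifies for a split
CART <- rpart(Class ~ Outlook + Temp + Humidity + Windy,
              data = dataframe, method = "class",
              control = rpart.control(minsplit = 5, xval = 0))
nrow(CART$frame)

## Only plot if there is more than a root, avoiding the error above
if (nrow(CART$frame) > 1) {
  plot(CART, margin = 0.1)
  text(CART)
}
```

Checking nrow(CART$frame) before calling plot() is one simple way to avoid
the "fit is not a tree, just a root" error in scripts.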
The data above are often used in machine learning textbooks to introduce
the concept of recursive partitioning. They are also provided in the
"RWeka" package. However, many (statistical) recursive partitioning
algorithms will by default consider the data too small to attempt
splitting.
## load RWeka and data
library("RWeka")
weather <- read.arff(system.file("arff", "weather.arff",
  package = "RWeka"))
## J4.8 tree (Java implementation of C4.5, revision 8)
j48 <- J48(play ~ ., data = weather)
j48
## RPart tree (R implementation of CART)
library("rpart")
rp <- rpart(play ~ ., data = weather, minsplit = 5)
plot(rp)
text(rp)
## Conditional inference tree
library("party")
ct <- ctree(play ~ ., data = weather,
  control = ctree_control(minsplit = 5, mincriterion = 0.3))
plot(ct)
As you see, all trees have different opinions about how the data should be
split. However, in this tiny data set, nothing could be considered
statistically significant.
I would recommend using a larger data set to understand how the
different algorithms work.
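One possible starting point, offered only as a suggestion, is the kyphosis
data that ships with rpart (81 observations). The choice of data set is
mine, purely for illustration; any moderately sized data set would do.
With the default settings, rpart() has enough observations here to attempt
splits.

```r
library("rpart")

## kyphosis is included in the rpart package: 81 rows, response Kyphosis
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

## Text summary of the fitted tree
print(fit)

## Plot only if the fit has grown beyond the root node
if (nrow(fit$frame) > 1) {
  plot(fit, margin = 0.1)
  text(fit, use.n = TRUE)
}
```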
hth,
Z
-Jagdeesh
--
View this message in context:
http://r.789695.n4.nabble.com/Rpart-query-tp2991198p2991198.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.