On Mon, 11 Oct 2010, jagdeesh_mn wrote:


Hi,

Being a novice this is my first usage of R.

I am trying to use rpart for building a decision tree in R. And I have the
following dataframe


Outlook   Temp  Humidity  Windy  Class
Sunny     75    70        Yes    Play
Sunny     80    90        Yes    Don't Play
Sunny     85    85        No     Don't Play
Sunny     72    95        No     Don't Play
Sunny     69    70        No     Play
Overcast  72    90        Yes    Play
Overcast  83    78        No     Play
Overcast  64    65        Yes    Play
Overcast  81    75        No     Play

The first line is the header. When I fit the model with

"CART<-rpart(Class ~ Outlook + Temp + Humidity + Windy, data=dataframe)"

and then try to plot the fitted tree with plot(CART), I get the following
error:

"Error in plot.rpart(CART) : fit is not a tree, just a root".

Am I missing something here? Any help would be greatly appreciated. Btw, the
dataframe was obtained by reading a csv which shouldn't be an issue.

The error message says it all: on this tiny data set rpart() decides not to split the data at all, so the fit consists of just a root node rather than a tree.

If you want to make rpart() split the data, you can modify some of its hyperparameters, e.g., minsplit, the minimum number of observations that must be in a node for a split to be attempted (default: 20).
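A minimal sketch of this on the nine rows quoted in the question. The data frame name `golf` and the control values (minsplit = 4, minbucket = 2) are illustrative assumptions, not recommended settings:

```r
library("rpart")

## the nine rows from the question, typed in by hand
## (`golf` is a made-up name for this sketch)
golf <- data.frame(
  Outlook  = factor(c(rep("Sunny", 5), rep("Overcast", 4))),
  Temp     = c(75, 80, 85, 72, 69, 72, 83, 64, 81),
  Humidity = c(70, 90, 85, 95, 70, 90, 78, 65, 75),
  Windy    = factor(c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No")),
  Class    = factor(c("Play", "Don't Play", "Don't Play", "Don't Play", "Play",
                      "Play", "Play", "Play", "Play"))
)

## default minimum node size for attempting a split
rpart.control()$minsplit

## defaults: fewer observations than minsplit, so only the root is kept
root <- rpart(Class ~ Outlook + Temp + Humidity + Windy, data = golf)
nrow(root$frame)   # number of nodes in the fit

## relaxed (illustrative) settings so that small nodes may be split
small <- rpart(Class ~ Outlook + Temp + Humidity + Windy, data = golf,
  control = rpart.control(minsplit = 4, minbucket = 2))
nrow(small$frame)  # more than one node means plot(small) has a tree to draw
```

Counting the rows of the `frame` component is a quick way to check whether a fit is "just a root" before calling plot().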

The data above are often used in machine learning textbooks to introduce the concept of recursive partitioning. They are also provided in the "RWeka" package. However, many (statistical) recursive partitioning algorithms will by default consider the data too small to attempt splitting.

## load RWeka and data
library("RWeka")
weather <- read.arff(system.file("arff", "weather.arff",
  package = "RWeka"))

## J4.8 tree (Java implementation of C4.5, revision 8)
j48 <- J48(play ~ ., data = weather)
j48

## RPart tree (R implementation of CART)
library("rpart")
rp <- rpart(play ~ ., data = weather, minsplit = 5)
plot(rp)
text(rp)

## Conditional inference tree
library("party")
ct <- ctree(play ~ ., data = weather,
  control = ctree_control(minsplit = 5, mincriterion = 0.3))
plot(ct)

As you can see, the three trees disagree about how the data should be split. However, in this tiny data set, none of the splits could be considered statistically significant.
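To get a rough feeling for how little evidence there is, here is a small sketch for one candidate split only (Outlook) on the nine rows quoted in the question. Fisher's exact test is used purely as a simple stand-in; it is not the permutation test that ctree() performs, and the counts are tallied by hand from the quoted table:

```r
## Outlook vs. Class counts from the nine quoted rows
## (Sunny: 2 Play, 3 Don't Play; Overcast: 4 Play, 0 Don't Play)
tab <- matrix(c(2, 4, 3, 0), nrow = 2,
  dimnames = list(Outlook = c("Sunny", "Overcast"),
                  Class   = c("Play", "Don't Play")))
tab

## exact test of association between Outlook and Class
ft <- fisher.test(tab)
ft$p.value
```

The p-value is well above the usual 0.05 level even though Overcast days are all "Play". A proper analysis would also have to adjust for searching over several variables and cutpoints.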

I would recommend using a larger data set to understand how the different algorithms work.

hth,
Z

-Jagdeesh


--
View this message in context: 
http://r.789695.n4.nabble.com/Rpart-query-tp2991198p2991198.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

