Bhupendrasinh, thanks very much! I ran J48 on a respondent-level data set and got a 61.75% correct classification rate:

Correctly Classified Instances         988               61.75   %
Incorrectly Classified Instances       612               38.25   %
Kappa statistic                          0.5651
Mean absolute error                      0.0432
Root mean squared error                  0.1469
Relative absolute error                 52.7086 %
Root relative squared error             72.6299 %
Coverage of cases (0.95 level)          99.6875 %
Mean rel. region size (0.95 level)      15.4915 %
Total Number of Instances             1600

When I plot it I get an enormous chart. Running:

> respLevelTree = J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE
>     + SPED + REVW, data = respLevel)
> respLevelTree

...reports:

J48 pruned tree
------------------

Is there a way to further prune the tree so that I can present a chart that would fit on a single page or two?

Thanks very much in advance for any thoughts.

-Vik

On Sep 20, 2012, at 8:37 PM, Bhupendrasinh Thakre wrote:

> I'm not sure what the problem is, as I was not able to run your data.
> You might want to use the dput() command to present the data.
>
> Now on the programming side: as we can see, there are more than 2 levels
> for the brands, and hence method = "class" is not able to understand
> what you actually want from it.
>
> Suggestion: for predictions having more than 2 levels I would go for Weka,
> and specifically the C4.5 algorithm. The RWeka package provides it.
>
> Best Regards,
>
> Bhupendrasinh Thakre
> Sent from my iPhone
>
> On Sep 20, 2012, at 9:47 PM, Vik Rubenfeld <v...@mindspring.com> wrote:
>
>> I'm working with some data from which a client would like to make a decision
>> tree predicting brand preference based on inputs such as price, speed, etc.
>> After running the decision tree analysis using rpart, it appears that this
>> data is not capable of predicting brand preference.
>>
>> Here's the data set:
>>
>> BRND      PRI     PROM    FORM    FAMI    DRRE    FREC    MODE    SPED    REVW
>> Brand 1   0.6989  0.4731  0.7849  0.6989  0.7419  0.6022  0.8817  0.9032  0.6452
>> Brand 2   0.8621  0.3793  0.8621  0.931   0.7586  0.6897  0.8966  0.9655  0.8276
>> Brand 3   0.6     0.1     0.6     0.7     0.9     0.7     0.7     0.8     0.6
>> Brand 4   0.6429  0.25    0.5714  0.5     0.6071  0.5     0.75    0.8214  0.5
>> Brand 5   0.7586  0.4224  0.7328  0.6638  0.7328  0.6379  0.8621  0.8621  0.6897
>> Brand 6   0.75    0.0833  0.5833  0.4167  0.5     0.4167  0.75    0.6667  0.5
>> Brand 7   0.7742  0.4839  0.6129  0.5161  0.8065  0.6452  0.7742  0.9032  0.6129
>> Brand 8   0.6429  0.2679  0.6964  0.7143  0.875   0.5536  0.8036  0.9464  0.6607
>> Brand 9   0.575   0.175   0.65    0.55    0.625   0.375   0.825   0.85    0.475
>> Brand 10  0.8095  0.5238  0.6667  0.6429  0.6667  0.5952  0.8571  0.8095  0.5714
>> Brand 11  0.6308  0.3     0.6077  0.5846  0.6769  0.5231  0.7462  0.8846  0.6
>> Brand 12  0.7212  0.3152  0.7152  0.6545  0.6606  0.503   0.8061  0.8909  0.6
>> Brand 13  0.7419  0.2258  0.6129  0.5806  0.7097  0.6129  0.871   0.9677  0.3226
>> Brand 14  0.7176  0.2706  0.6353  0.5647  0.6941  0.4471  0.7176  0.9412  0.5176
>> Brand 15  0.7287  0.3437  0.5995  0.5788  0.8527  0.5478  0.8217  0.8941  0.6227
>> Brand 16  0.7     0.4     0.6     0.4     1       0.4     0.9     0.9     0.5
>> Brand 17  0.7193  0.3333  0.6667  0.6667  0.7018  0.5263  0.7719  0.8596  0.7018
>> Brand 18  0.7778  0.4127  0.6508  0.6349  0.7937  0.6032  0.8571  0.9206  0.619
>> Brand 19  0.8028  0.2817  0.6197  0.4366  0.7042  0.4366  0.7183  0.9155  0.5634
>> Brand 20  0.7736  0.2453  0.6226  0.3774  0.5849  0.3019  0.717   0.8679  0.4717
>> Brand 21  0.8481  0.2152  0.6329  0.4051  0.6329  0.4557  0.6962  0.8481  0.3418
>> Brand 22  0.75    0.3333  0.6667  0.5     0.6667  0.5833  0.9167  0.9167  0.4167
>>
>> Here are my R commands:
>>
>> > test.df = read.csv("test.csv")
>> > head(test.df)
>>      BRND    PRI   PROM   FORM   FAMI   DRRE   FREC   MODE   SPED   REVW
>> 1 Brand 1 0.6989 0.4731 0.7849 0.6989 0.7419 0.6022 0.8817 0.9032 0.6452
>> 2 Brand 2 0.8621 0.3793 0.8621 0.9310 0.7586 0.6897 0.8966 0.9655 0.8276
>> 3 Brand 3 0.6000 0.1000 0.6000 0.7000 0.9000 0.7000 0.7000 0.8000 0.6000
>> 4 Brand 4 0.6429 0.2500 0.5714 0.5000 0.6071 0.5000 0.7500 0.8214 0.5000
>> 5 Brand 5 0.7586 0.4224 0.7328 0.6638 0.7328 0.6379 0.8621 0.8621 0.6897
>> 6 Brand 6 0.7500 0.0833 0.5833 0.4167 0.5000 0.4167 0.7500 0.6667 0.5000
>>
>> > testTree = rpart(BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE
>> >     + SPED + REVW, method="class", data=test.df)
>>
>> > printcp(testTree)
>>
>> Classification tree:
>> rpart(formula = BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC +
>>     MODE + SPED + REVW, data = test.df, method = "class")
>>
>> Variables actually used in tree construction:
>> [1] FORM
>>
>> Root node error: 21/22 = 0.95455
>>
>> n= 22
>>
>>         CP nsplit rel error xerror xstd
>> 1 0.047619      0   1.00000 1.0476    0
>> 2 0.010000      1   0.95238 1.0476    0
>>
>> I note that only one variable (FORM) was actually used in tree construction.
>> When I run a plot using:
>>
>> > plot(testTree)
>> > text(testTree)
>>
>> ...I get a tree with one branch.
>>
>> It looks to me like I'm doing everything right, and this data is just not
>> capable of predicting brand preference.
>>
>> Am I missing anything?
>>
>> Thanks very much in advance for any thoughts!
>>
>> -Vik
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
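
On the pruning question, one possible approach is to pass stricter pruning options to J48 through RWeka's Weka_control(). The sketch below is only an illustration, not a tuned solution: it uses the built-in iris data as a stand-in because respLevel is not available here, and the values C = 0.05 and M = 20 are arbitrary assumptions. C is J48's pruning confidence factor (default 0.25; smaller values prune more) and M is the minimum number of instances per leaf (default 2; larger values give fewer leaves).

```r
# Sketch only: iris stands in for the respondent-level data (respLevel is
# not available here). The C and M values are illustrative, not tuned.
library(RWeka)

# Default J48 settings: C = 0.25 (pruning confidence), M = 2 (min. per leaf)
default_tree <- J48(Species ~ ., data = iris)

# Heavier pruning: lower confidence factor, larger minimum leaf size
small_tree <- J48(Species ~ ., data = iris,
                  control = Weka_control(C = 0.05, M = 20))

# Printing each model shows the tree plus its number of leaves and size,
# which makes it easy to compare the two settings
print(default_tree)
print(small_tree)

# WOW() lists every option Weka exposes for this learner
WOW("J48")
```

For the respondent-level data, the same control argument could be added to the original J48 call, raising M (or lowering C) until the printed tree is small enough to chart. The classification rate will usually drop as the tree shrinks, so it is worth re-checking the summary after each change.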