Re: [R] Decision Tree: Am I Missing Anything?

Vik Rubenfeld Thu, 20 Sep 2012 22:18:31 -0700

Bhupendrasinh, thanks very much! I ran J48 on a respondent-level data set and
got a 61.75% correct classification rate!


Correctly Classified Instances         988               61.75   %
Incorrectly Classified Instances       612               38.25   %
Kappa statistic                          0.5651
Mean absolute error                      0.0432
Root mean squared error                  0.1469
Relative absolute error                 52.7086 %
Root relative squared error             72.6299 %
Coverage of cases (0.95 level)          99.6875 %
Mean rel. region size (0.95 level)      15.4915 %
Total Number of Instances             1600     
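As far as I know, Weka's "Kappa statistic" line is Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e). Here p_o is the observed accuracy and p_e is the agreement expected by chance from the row and column totals of the confusion matrix. The minimal R sketch below assumes that definition. The `cohen_kappa` helper and the 2x2 toy matrix are hypothetical illustrations, not Vik's data. Backing p_e out of the reported figures implies chance agreement of roughly 12%, so 61.75% is well above what guessing the class mix would give.

```r
# Cohen's kappa from a square confusion matrix.
# Orientation (rows = predicted or rows = actual) does not change the result.
cohen_kappa <- function(cm) {
  n  <- sum(cm)
  po <- sum(diag(cm)) / n                      # observed agreement (accuracy)
  pe <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
  (po - pe) / (1 - pe)
}

# Hypothetical 2-class confusion matrix, for illustration only.
toy <- matrix(c(20, 5, 10, 15), nrow = 2)
toy_kappa <- cohen_kappa(toy)
print(toy_kappa)

# Chance agreement implied by the Weka summary above,
# assuming the reported Kappa is Cohen's kappa.
po_weka    <- 988 / 1600
kappa_weka <- 0.5651
pe_implied <- (po_weka - kappa_weka) / (1 - kappa_weka)
print(c(accuracy = po_weka, implied_chance = pe_implied))
```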

When I plot it, I get an enormous chart. Running:

> respLevelTree = J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE +
+     SPED + REVW, data = respLevel)
> respLevelTree

...reports:

J48 pruned tree
------------------

Is there a way to further prune the tree so that I can present a chart that 
would fit on a single page or two?
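One hedged option, not tested on this data: RWeka's J48() takes a control argument built with Weka_control(). Weka's J48 options include C, the pruning confidence factor (default 0.25; smaller values prune harder), and M, the minimum number of instances per leaf (default 2). Adding something like control = Weka_control(C = 0.1, M = 50) to the J48() call above should give a smaller tree, though those particular values are only a starting guess to tune. RWeka needs Java, so the sketch below shows the same idea with rpart on R's built-in iris data instead. It grows a deliberately large tree, then cuts it back with prune() or caps its depth up front with maxdepth. The `n_leaves` helper is hypothetical, written for this example.

```r
library(rpart)  # ships with R as a recommended package

# Count terminal nodes (leaves) in an rpart fit.
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

# Grow a deliberately large tree: no complexity penalty, tiny nodes allowed.
big <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(cp = 0, minsplit = 2, minbucket = 1))

# Option 1: prune back, snipping off splits whose complexity parameter is below cp.
pruned <- prune(big, cp = 0.05)

# Option 2: limit depth and leaf size while growing.
shallow <- rpart(Species ~ ., data = iris, method = "class",
                 control = rpart.control(maxdepth = 2, minbucket = 20))

print(c(big = n_leaves(big), pruned = n_leaves(pruned),
        shallow = n_leaves(shallow)))

# Draw the small tree; it should fit comfortably on one page.
plot(pruned, uniform = TRUE, margin = 0.1)
text(pruned, use.n = TRUE)
```

With J48, C and M play roughly the role that cp and minbucket play here.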

Thanks very much in advance for any thoughts.


-Vik




On Sep 20, 2012, at 8:37 PM, Bhupendrasinh Thakre wrote:

> I'm not sure what the problem is, as I was not able to load your data to
> run it. You might want to use the dput() command to share the data.
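For readers unfamiliar with dput(): it prints R code that recreates an object, so the output can be pasted into an email and rebuilt on the other end. The minimal sketch below uses the first two rows and three columns of the posted table, typed in by hand for illustration rather than read from test.csv.

```r
# First two rows of the posted table, entered by hand for illustration.
df <- data.frame(BRND = c("Brand 1", "Brand 2"),
                 PRI  = c(0.6989, 0.8621),
                 PROM = c(0.4731, 0.3793))

# dput() writes code that reconstructs df; this is what goes into the email.
dput(df)

# The recipient rebuilds the data frame by evaluating that text.
txt  <- paste(capture.output(dput(df)), collapse = "\n")
copy <- eval(parse(text = txt))
print(copy)
```

On the real file, dput(test.df) or dput(head(test.df, 10)) would produce the text to paste.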
> 
> Now, on the programming side: as we can see, there are more than 2 levels
> for the brands, and hence method = "class" is not able to understand
> what you actually want from it.
> 
> Suggestion: for predictions with more than 2 levels, I would go for Weka,
> specifically the C4.5 algorithm (implemented in Weka as J48). The RWeka
> package gives you access to it from R.
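A hedged aside on the multi-level point: as far as I know, rpart's method = "class" does fit factor responses with more than two levels. The minimal sketch below uses R's built-in iris data, which has three species and is unrelated to the brand data.

```r
library(rpart)

# Species is a factor with three levels: setosa, versicolor, virginica.
fit <- rpart(Species ~ ., data = iris, method = "class")

# Predicted class labels and a confusion table on the training data.
pred <- predict(fit, iris, type = "class")
print(table(predicted = pred, actual = iris$Species))

# Training accuracy (optimistic, since it is measured on the fitting data).
acc <- mean(pred == iris$Species)
print(acc)
```

So the number of brand levels by itself is probably not what limited the original tree.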
> 
> Best Regards,
> 
> Bhupendrasinh Thakre
> Sent from my iPhone
> 
> On Sep 20, 2012, at 9:47 PM, Vik Rubenfeld <v...@mindspring.com> wrote:
> 
>> I'm working with some data from which a client would like to make a decision 
>> tree predicting brand preference based on inputs such as price, speed, etc.  
>> After running the decision tree analysis using rpart, it appears that this 
>> data is not capable of predicting brand preference.  
>> 
>> Here's the data set:
>> 
>> BRND           PRI    PROM    FORM    FAMI    DRRE    FREC    MODE    SPED    REVW
>> Brand 1     0.6989  0.4731  0.7849  0.6989  0.7419  0.6022  0.8817  0.9032  0.6452
>> Brand 2     0.8621  0.3793  0.8621   0.931  0.7586  0.6897  0.8966  0.9655  0.8276
>> Brand 3        0.6     0.1     0.6     0.7     0.9     0.7     0.7     0.8     0.6
>> Brand 4     0.6429    0.25  0.5714     0.5  0.6071     0.5    0.75  0.8214     0.5
>> Brand 5     0.7586  0.4224  0.7328  0.6638  0.7328  0.6379  0.8621  0.8621  0.6897
>> Brand 6       0.75  0.0833  0.5833  0.4167     0.5  0.4167    0.75  0.6667     0.5
>> Brand 7     0.7742  0.4839  0.6129  0.5161  0.8065  0.6452  0.7742  0.9032  0.6129
>> Brand 8     0.6429  0.2679  0.6964  0.7143   0.875  0.5536  0.8036  0.9464  0.6607
>> Brand 9      0.575   0.175    0.65    0.55   0.625   0.375   0.825    0.85   0.475
>> Brand 10    0.8095  0.5238  0.6667  0.6429  0.6667  0.5952  0.8571  0.8095  0.5714
>> Brand 11    0.6308     0.3  0.6077  0.5846  0.6769  0.5231  0.7462  0.8846     0.6
>> Brand 12    0.7212  0.3152  0.7152  0.6545  0.6606   0.503  0.8061  0.8909     0.6
>> Brand 13    0.7419  0.2258  0.6129  0.5806  0.7097  0.6129   0.871  0.9677  0.3226
>> Brand 14    0.7176  0.2706  0.6353  0.5647  0.6941  0.4471  0.7176  0.9412  0.5176
>> Brand 15    0.7287  0.3437  0.5995  0.5788  0.8527  0.5478  0.8217  0.8941  0.6227
>> Brand 16       0.7     0.4     0.6     0.4       1     0.4     0.9     0.9     0.5
>> Brand 17    0.7193  0.3333  0.6667  0.6667  0.7018  0.5263  0.7719  0.8596  0.7018
>> Brand 18    0.7778  0.4127  0.6508  0.6349  0.7937  0.6032  0.8571  0.9206   0.619
>> Brand 19    0.8028  0.2817  0.6197  0.4366  0.7042  0.4366  0.7183  0.9155  0.5634
>> Brand 20    0.7736  0.2453  0.6226  0.3774  0.5849  0.3019   0.717  0.8679  0.4717
>> Brand 21    0.8481  0.2152  0.6329  0.4051  0.6329  0.4557  0.6962  0.8481  0.3418
>> Brand 22      0.75  0.3333  0.6667     0.5  0.6667  0.5833  0.9167  0.9167  0.4167
>> 
>> Here are my R commands:
>> 
>>> test.df = read.csv("test.csv")
>>> head(test.df)
>>    BRND    PRI   PROM   FORM   FAMI   DRRE   FREC   MODE   SPED   REVW
>> 1 Brand 1 0.6989 0.4731 0.7849 0.6989 0.7419 0.6022 0.8817 0.9032 0.6452
>> 2 Brand 2 0.8621 0.3793 0.8621 0.9310 0.7586 0.6897 0.8966 0.9655 0.8276
>> 3 Brand 3 0.6000 0.1000 0.6000 0.7000 0.9000 0.7000 0.7000 0.8000 0.6000
>> 4 Brand 4 0.6429 0.2500 0.5714 0.5000 0.6071 0.5000 0.7500 0.8214 0.5000
>> 5 Brand 5 0.7586 0.4224 0.7328 0.6638 0.7328 0.6379 0.8621 0.8621 0.6897
>> 6 Brand 6 0.7500 0.0833 0.5833 0.4167 0.5000 0.4167 0.7500 0.6667 0.5000
>> 
>>> testTree = rpart(BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE +
>> +     SPED + REVW, method = "class", data = test.df)
>> 
>>> printcp(testTree)
>> 
>> Classification tree:
>> rpart(formula = BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC + 
>>   MODE + SPED + REVW, data = test.df, method = "class")
>> 
>> Variables actually used in tree construction:
>> [1] FORM
>> 
>> Root node error: 21/22 = 0.95455
>> 
>> n= 22 
>> 
>>       CP nsplit rel error xerror xstd
>> 1 0.047619      0   1.00000 1.0476    0
>> 2 0.010000      1   0.95238 1.0476    0
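The figures in this printcp table can be reproduced by hand, assuming that each of the 22 brands appears exactly once, as the root node error of 21/22 indicates. The root predicts a single brand and gets 1 of 22 right. After one split, each of the two leaves gets one brand right, leaving 20 errors. "rel error" is the error count relative to the root's 21, and CP is the drop in rel error bought by that split. A minimal sketch of that arithmetic:

```r
# One row per brand, as in the posted table.
brands <- factor(paste("Brand", 1:22), levels = paste("Brand", 1:22))
counts <- table(brands)

n <- sum(counts)
root_errors  <- n - max(counts)   # root predicts one brand: 21 errors
split_errors <- n - 2             # two leaves, each right on one brand: 20 errors

root_node_error <- root_errors / n              # 21/22
rel_error_split <- split_errors / root_errors   # 20/21
cp_first_split  <- 1 - rel_error_split          # 1/21

print(round(c(root_node_error, rel_error_split, cp_first_split), 5))
```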
>> 
>> I note that only one variable (FORM) was actually used in tree construction. 
>> When I run a plot using:
>> 
>>> plot(testTree)
>>> text(testTree)
>> 
>> ...I get a tree with one branch.  
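A likely reason for the single split, assuming rpart's defaults were used: rpart.control() defaults to minsplit = 20, meaning a node needs at least 20 observations before a split is attempted. It also defaults to minbucket = round(minsplit/3) = 7, meaning each child must keep at least 7 observations. With n = 22, the root can be split once. Each child then holds at most 22 - 7 = 15 rows, which is below 20, so growth stops. A minimal check of those defaults:

```r
library(rpart)

ctrl <- rpart.control()   # default settings
n <- 22                   # rows in test.df

# After one split, the larger child has at most n - minbucket rows.
largest_child   <- n - ctrl$minbucket
can_split_again <- largest_child >= ctrl$minsplit

print(c(minsplit = ctrl$minsplit, minbucket = ctrl$minbucket,
        largest_child = largest_child))
print(can_split_again)
```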
>> 
>> It looks to me like I'm doing everything right, and this data is just not 
>> capable of predicting brand preference. 
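That conclusion is probably right for this particular table, but the reason is structural rather than anything about the predictors. With exactly one row per brand, any brand held out for validation never appears in the remaining training rows. A classifier that only predicts labels present in its training data therefore cannot get it right. The minimal leave-one-out sketch below checks this using only the brand labels.

```r
brands <- paste("Brand", 1:22)   # one row per brand, as in the posted table

# For each held-out row, is its brand present among the remaining (training) rows?
seen_in_training <- vapply(seq_along(brands),
                           function(i) brands[i] %in% brands[-i],
                           logical(1))

# A classifier limited to labels it has seen can be right on a held-out row
# only if that row's label was seen, which bounds leave-one-out accuracy.
best_loo_accuracy <- mean(seen_in_training)
print(best_loo_accuracy)
```

This fits with the later move to respondent-level data (1,600 rows), where brands can recur across many rows.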
>> 
>> Am I missing anything?
>> 
>> Thanks very much in advance for any thoughts!
>> 
>> -Vik
>> 
>> 
>> 
>> 
>> 
>>   [[alternative HTML version deleted]]
>> 
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

