Re: [R] [OT] 1 vs 2-way anova technical question

Giovanni Azua Mon, 21 Nov 2011 11:41:42 -0800

Hello Rob,

Thank you for your suggestions. I tried glm too, without success. In any case, I 
include all the information below in case someone with good knowledge can give 
me a hand with this. I take the log of the response variable because: 
- its values span multiple orders of magnitude 
- the diagnostic plots (e.g. QQ, residuals vs fitted) improve with it.
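
A minimal sketch of the two reasons above, on synthetic data because the original data set is not attached here. The data frame, the factor `g`, and the response `y` are all made up for illustration; only the idea (compare residual diagnostics with and without the log) comes from the text.

```r
# Synthetic, right-skewed response. This is NOT the throughput data.
set.seed(1)
g <- factor(rep(c("1", "4"), each = 100))
y <- exp(rnorm(200, mean = ifelse(g == "1", 4, 6.5), sd = 1))

# How many orders of magnitude does the response span?
orders <- log10(max(y) / min(y))

# Same one-factor model on the raw and on the log scale
fit_raw <- aov(y ~ g)
fit_log <- aov(log(y) ~ g)

# Normal QQ plots of the residuals, side by side
op <- par(mfrow = c(1, 2))
qqnorm(residuals(fit_raw), main = "raw response")
qqline(residuals(fit_raw))
qqnorm(residuals(fit_log), main = "log response")
qqline(residuals(fit_log))
par(op)

# Shapiro-Wilk as a rough numeric companion to the QQ plots
sw_raw <- shapiro.test(residuals(fit_raw))
sw_log <- shapiro.test(residuals(fit_log))
```

On data like this the W statistic is closer to 1 for the log model, which is the numeric counterpart of the QQ plot straightening out.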


Below I include:
1) general summary of my data
2) 1-way anova and summary of the model
3) 4-way anova and summary of the model  

Attached:
a) Overview of the data (showing where the main interactions occur, i.e. 
No_databases and No_middlewares)
b) Diagnostic plots for 2). Here the normality assumption of the residuals looks 
reasonable.
c) Diagnostic plots for 3). Here the normality assumption of the residuals does 
not seem to hold. Does that invalidate the 4-way aov model?

I tried glm and it gives results similar to 3).
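
One possible reason for the similarity: with the default gaussian family and identity link, glm fits the same least-squares model as lm/aov. The family used in the run above is not stated, so this is an assumption, and the sketch below uses synthetic data with made-up factors `A` and `B`.

```r
# Synthetic two-factor data. This is NOT the throughput data.
set.seed(7)
d <- expand.grid(A = factor(1:2), B = factor(1:3), rep = 1:15)
d$y <- exp(1 + 0.8 * (d$A == "2") + 0.3 * as.integer(d$B) +
           rnorm(nrow(d), sd = 0.5))

# Same formula, fitted once with lm and once with a gaussian glm
fit_lm  <- lm(log(y) ~ A * B, data = d)
fit_glm <- glm(log(y) ~ A * B, family = gaussian(), data = d)

# Coefficients agree, and the glm deviance is the lm residual sum of squares
same_coef <- all.equal(coef(fit_lm), coef(fit_glm))
same_rss  <- all.equal(sum(residuals(fit_lm)^2), deviance(fit_glm))
```

If a different family or link was used, the fits would no longer be identical and the comparison would need to be redone for that family.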

My impression is that my system is heavily polluted with outliers: plot a) 
shows how much the mean and the median differ because of them. That's just the 
way the system I implemented behaves. Btw, the system is a multi-tiered 
architecture that I developed in Java from scratch; it includes XA and 
different data access and partitioning patterns. I need to quantitatively 
analyze this system and draw conclusions from it. Most of my classmates keep 
it really simple: run 2^k experiments, take one grand mean out of each 
experiment, do the ANOVA on those means (i.e. 1 repetition), compute the 
fraction of variation, and that's it. I am trying to model it more deeply by 
checking model assumptions, etc. 
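
For reference, the simpler "one mean per experiment" approach described above could look roughly like this. It is a sketch on a synthetic 2^3 design; the factors `A`, `B`, `C` and the effect sizes are invented, and the three-way interaction is used as the error term because there is only one value per cell.

```r
# Synthetic 2^3 design with 20 raw measurements per cell (hypothetical data)
set.seed(42)
design <- expand.grid(A = factor(c("lo", "hi")),
                      B = factor(c("lo", "hi")),
                      C = factor(c("lo", "hi")),
                      rep = 1:20)
design$y <- with(design, 5 + 2 * (A == "hi") + 0.5 * (B == "hi") +
                         1 * (A == "hi") * (B == "hi")) + rnorm(nrow(design))

# Collapse each experiment (cell) to a single grand mean
cell_means <- aggregate(y ~ A + B + C, data = design, FUN = mean)

# Main effects plus two-way interactions; the 3-way term is left as residual
fit <- aov(y ~ (A + B + C)^2, data = cell_means)
tab <- summary(fit)[[1]]

# Fraction of variation: each term's sum of squares over the total
frac <- tab[["Sum Sq"]] / sum(tab[["Sum Sq"]])
names(frac) <- trimws(rownames(tab))
round(100 * frac, 1)
```

Averaging first hides the within-experiment spread, so this gives no handle on outliers or on the residual diagnostics discussed above.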

Many thanks in advance,
Best regards,
Giovanni

> str(throughput)
'data.frame':   479 obs. of  9 variables:
 $ Time              : num  7 8 9 10 11 12 13 14 15 16 ...
 $ Throughput        : int  155 155 154 157 155 214 4631 2118 136 132 ...
 $ Workload          : chr  "All" "All" "All" "All" ...
 $ No_databases      : Factor w/ 2 levels "1","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ Partitioning      : Factor w/ 2 levels "sharding","replication": 1 1 1 1 1 1 1 1 1 1 ...
 $ No_middlewares    : Factor w/ 3 levels "1","2","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ Queue_size        : Factor w/ 2 levels "40","100": 1 1 1 1 1 1 1 1 1 1 ...
 $ No_clients        : Factor w/ 1 level "64": 1 1 1 1 1 1 1 1 1 1 ...
 $ Experimental_error: Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...

> summary(throughput)
      Time         Throughput       Workload         No_databases      Partitioning No_middlewares
 Min.   : 7.00   Min.   :  35.0   Length:479         1:239        sharding   :240   1:160
 1st Qu.:11.50   1st Qu.:  50.5   Class :character   4:240        replication:239   2:159
 Median :16.00   Median : 744.0   Mode  :character                                  4:160
 Mean   :16.48   Mean   : 830.3
 3rd Qu.:21.00   3rd Qu.:1205.5
 Max.   :26.00   Max.   :4631.0
 Queue_size No_clients Experimental_error
 40 :240    64:479     1:479
 100:239

## #######################################################
##
##  ANOVA "one-way" interaction
##
## #######################################################
> throughput.aov <- aov(log(Throughput)~No_databases+Partitioning+No_middlewares+Queue_size,data=throughput)
> throughput.aov
Call:
   aov(formula = log(Throughput) ~ No_databases + Partitioning + 
    No_middlewares + Queue_size, data = throughput)

Terms:
                No_databases Partitioning No_middlewares Queue_size Residuals
Sum of Squares      521.5264       5.6971        50.5814     0.4628  476.6826
Deg. of Freedom            1            1              2          1       473

Residual standard error: 1.003885 
Estimated effects may be unbalanced
> summary(throughput.aov)
                Df Sum Sq Mean Sq  F value    Pr(>F)    
No_databases     1 521.53  521.53 517.4974 < 2.2e-16 ***
Partitioning     1   5.70    5.70   5.6530   0.01782 *  
No_middlewares   2  50.58   25.29  25.0953 4.381e-11 ***
Queue_size       1   0.46    0.46   0.4592   0.49833    
Residuals      473 476.68    1.01                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> 
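
The "fraction of variation" for this additive model can be read directly off the sums of squares printed above. A small sketch, with the values copied from the Terms output (the vector name `ss` is just for illustration):

```r
# Sums of squares copied from the additive-model output above
ss <- c(No_databases   = 521.5264,
        Partitioning   = 5.6971,
        No_middlewares = 50.5814,
        Queue_size     = 0.4628,
        Residuals      = 476.6826)

# Percentage of the total variation in log(Throughput) attributed to each term
pct <- 100 * ss / sum(ss)
round(pct, 1)
```

Because the design is slightly unbalanced, these are sequential sums of squares and the split depends to some degree on the order of the terms.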

## #######################################################
##
##  ANOVA 4-way interaction
##
## #######################################################

> throughput.aov <- aov(log(Throughput)~No_databases*Partitioning*No_middlewares*Queue_size,data=throughput)
> throughput.aov
Call:
   aov(formula = log(Throughput) ~ No_databases * Partitioning * 
    No_middlewares * Queue_size, data = throughput)

Terms:
                No_databases Partitioning No_middlewares Queue_size No_databases:Partitioning
Sum of Squares      521.5264       5.6971        50.5814     0.4628                   96.9198
Deg. of Freedom            1            1              2          1                         1
                No_databases:No_middlewares Partitioning:No_middlewares No_databases:Queue_size
Sum of Squares                     110.4102                      8.4819                  0.0916
Deg. of Freedom                           2                           2                       1
                Partitioning:Queue_size No_middlewares:Queue_size
Sum of Squares                   0.0015                    0.2254
Deg. of Freedom                       1                         2
                No_databases:Partitioning:No_middlewares No_databases:Partitioning:Queue_size
Sum of Squares                                   23.6400                               0.0512
Deg. of Freedom                                        2                                    1
                No_databases:No_middlewares:Queue_size Partitioning:No_middlewares:Queue_size
Sum of Squares                                  0.1247                                 0.1511
Deg. of Freedom                                      2                                      2
                No_databases:Partitioning:No_middlewares:Queue_size Residuals
Sum of Squares                                               0.7391  235.8461
Deg. of Freedom                                                   2       455

Residual standard error: 0.7199605 
Estimated effects may be unbalanced
> summary(throughput.aov)
                                                     Df Sum Sq Mean Sq   F value    Pr(>F)    
No_databases                                          1 521.53  521.53 1006.1413 < 2.2e-16 ***
Partitioning                                          1   5.70    5.70   10.9909 0.0009888 ***
No_middlewares                                        2  50.58   25.29   48.7914 < 2.2e-16 ***
Queue_size                                            1   0.46    0.46    0.8928 0.3452201    
No_databases:Partitioning                             1  96.92   96.92  186.9800 < 2.2e-16 ***
No_databases:No_middlewares                           2 110.41   55.21  106.5030 < 2.2e-16 ***
Partitioning:No_middlewares                           2   8.48    4.24    8.1818 0.0003229 ***
No_databases:Queue_size                               1   0.09    0.09    0.1766 0.6744713    
Partitioning:Queue_size                               1   0.00    0.00    0.0028 0.9576692    
No_middlewares:Queue_size                             2   0.23    0.11    0.2174 0.8046764    
No_databases:Partitioning:No_middlewares              2  23.64   11.82   22.8034 3.648e-10 ***
No_databases:Partitioning:Queue_size                  1   0.05    0.05    0.0988 0.7534090    
No_databases:No_middlewares:Queue_size                2   0.12    0.06    0.1203 0.8866605    
Partitioning:No_middlewares:Queue_size                2   0.15    0.08    0.1457 0.8644517    
No_databases:Partitioning:No_middlewares:Queue_size   2   0.74    0.37    0.7129 0.4907654    
Residuals                                           455 235.85    0.52                        
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
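
Since the additive model is nested in the full factorial one, the two fits can be compared with an extra-sum-of-squares F test. A sketch using only the residual sums of squares and degrees of freedom printed in the two outputs above (variable names are illustrative):

```r
# Residual SS and df copied from the two aov outputs above
rss_add  <- 476.6826; df_add  <- 473   # main effects only
rss_full <- 235.8461; df_full <- 455   # all interactions included

# Extra sum of squares explained by the 18 interaction degrees of freedom
df_extra <- df_add - df_full
F_stat   <- ((rss_add - rss_full) / df_extra) / (rss_full / df_full)
p_val    <- pf(F_stat, df_extra, df_full, lower.tail = FALSE)

c(F = F_stat, df1 = df_extra, df2 = df_full, p = p_val)
```

With both fitted objects at hand (stored under different names), `anova(fit_additive, fit_full)` should report the same comparison; the p-value still relies on the residual normality assumption questioned in c).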


Thanks in advance,
Best regards,
Giovanni





______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
