Hello Rob,

Thank you for your suggestions. I tried glm too, without success. Anyhow, I include all the information in case someone with good knowledge can give me a hand with this.

I take the log of the response variable because:
- its values span several orders of magnitude;
- the diagnostic plots (e.g. Q-Q, residuals vs. fitted) improve with it.
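One way to check the second point is to fit the same additive model to the raw and to the log-transformed response and compare the residuals. The R sketch below is only an illustration under assumptions: it uses simulated log-normal data (all values and effect sizes are made up; only the column names No_databases, No_middlewares and Throughput mirror the real data frame), and it adds shapiro.test() as a formal normality check next to qqnorm().

```r
# ASSUMPTION: simulated data, not the real measurements. The response is
# right-skewed (log-normal) and spans a wide range, as Throughput does.
set.seed(1)
n <- 120
sim <- data.frame(
  No_databases   = factor(rep(c("1", "4"), each = n / 2)),
  No_middlewares = factor(rep(c("1", "2", "4"), times = n / 3))
)
# Multiplicative (hypothetical) effects plus log-normal noise
sim$Throughput <- exp(4 + 2.5 * (sim$No_databases == "4") +
                      0.5 * as.integer(sim$No_middlewares) +
                      rnorm(n, sd = 1))

# Same additive model on the raw and on the log scale
fit_raw <- aov(Throughput ~ No_databases + No_middlewares, data = sim)
fit_log <- aov(log(Throughput) ~ No_databases + No_middlewares, data = sim)

# Shapiro-Wilk test on the residuals: a small p-value signals non-normality
p_raw <- shapiro.test(residuals(fit_raw))$p.value
p_log <- shapiro.test(residuals(fit_log))$p.value
c(raw = p_raw, log = p_log)

# Side-by-side normal Q-Q plots of the residuals
op <- par(mfrow = c(1, 2))
qqnorm(residuals(fit_raw), main = "Raw response"); qqline(residuals(fit_raw))
qqnorm(residuals(fit_log), main = "log response"); qqline(residuals(fit_log))
par(op)
```

On data like these the raw-scale residuals give a far smaller p-value than the log-scale ones. With several hundred observations the Shapiro-Wilk test flags even mild departures from normality, so the Q-Q plot is usually the more useful guide.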
Below I include:
1) a general summary of my data;
2) the 1-way ANOVA (main effects only, no interactions) and the summary of the model;
3) the 4-way ANOVA (all interactions up to 4-way) and the summary of the model.

Attached:
a) an overview of the data (showing where the main interactions occur, i.e. No_databases and No_middlewares);
b) diagnostic plots for 2). Here the normality assumption of the residuals looks reasonable;
c) diagnostic plots for 3). Here the normality assumption of the residuals does not seem to hold. Does this invalidate the 4-way aov model?

I tried glm and it delivers results similar to 3).

My impression is that my system is heavily polluted with outliers: one can see from plot a) how much the mean and the median differ because of them. That is simply how the system I implemented behaves. By the way, the system is a multi-tiered architecture that I developed in Java from scratch; it includes XA transactions and different data access and partitioning patterns. I need to analyze this system quantitatively and draw conclusions from it.

Most of my classmates keep it very simple: run 2^k experiments, take one grand mean out of each experiment, do the ANOVA on those means (i.e. one repetition), compute the fraction of variation, and that's it. I am trying to model the system more deeply by checking the model assumptions, etc.

Many thanks in advance,
Best regards,
Giovanni

> str(throughput)
'data.frame':   479 obs. of  9 variables:
 $ Time              : num  7 8 9 10 11 12 13 14 15 16 ...
 $ Throughput        : int  155 155 154 157 155 214 4631 2118 136 132 ...
 $ Workload          : chr  "All" "All" "All" "All" ...
 $ No_databases      : Factor w/ 2 levels "1","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ Partitioning      : Factor w/ 2 levels "sharding","replication": 1 1 1 1 1 1 1 1 1 1 ...
 $ No_middlewares    : Factor w/ 3 levels "1","2","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ Queue_size        : Factor w/ 2 levels "40","100": 1 1 1 1 1 1 1 1 1 1 ...
 $ No_clients        : Factor w/ 1 level "64": 1 1 1 1 1 1 1 1 1 1 ...
 $ Experimental_error: Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
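The mean-versus-median comparison that plot a) is used for can also be tabulated per No_databases x No_middlewares cell: a mean/median ratio well above 1 points to a few large values pulling the mean up. The sketch below runs on a simulated stand-in for the throughput data frame (same column names, but made-up values and hypothetical injected outliers); on the real data one would pass data = throughput instead.

```r
# ASSUMPTION: simulated stand-in for 'throughput' (same column names,
# made-up values): moderate throughput plus a few large spikes.
set.seed(2)
sim <- expand.grid(No_databases   = factor(c("1", "4")),
                   No_middlewares = factor(c("1", "2", "4")),
                   rep            = 1:20)
sim$Throughput <- round(rlnorm(nrow(sim), meanlog = 5, sdlog = 0.3))
spikes <- sample(nrow(sim), 6)
sim$Throughput[spikes] <- sim$Throughput[spikes] * 30  # hypothetical outliers

# Mean and median of Throughput in each No_databases x No_middlewares cell
cell_mean   <- aggregate(Throughput ~ No_databases + No_middlewares,
                         data = sim, FUN = mean)
cell_median <- aggregate(Throughput ~ No_databases + No_middlewares,
                         data = sim, FUN = median)

# Put both statistics side by side and compute their ratio
cells <- merge(cell_mean, cell_median,
               by = c("No_databases", "No_middlewares"),
               suffixes = c(".mean", ".median"))
cells$ratio <- cells$Throughput.mean / cells$Throughput.median
cells
```

aggregate() with a formula and merge() are base R, so no extra packages are needed. Cells that received a spike show a ratio clearly above 1, while their medians barely move.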
> summary(throughput)
      Time         Throughput       Workload         No_databases     Partitioning  No_middlewares
 Min.   : 7.00   Min.   :  35.0   Length:479         1:239        sharding   :240   1:160
 1st Qu.:11.50   1st Qu.:  50.5   Class :character   4:240        replication:239   2:159
 Median :16.00   Median : 744.0   Mode  :character                                  4:160
 Mean   :16.48   Mean   : 830.3
 3rd Qu.:21.00   3rd Qu.:1205.5
 Max.   :26.00   Max.   :4631.0
 Queue_size No_clients Experimental_error
 40 :240    64:479     1:479
 100:239

## #######################################################
## ## ANOVA "one-way" interaction ##
## #######################################################
> throughput.aov <- aov(log(Throughput)~No_databases+Partitioning+No_middlewares+Queue_size,data=throughput)
> throughput.aov
Call:
   aov(formula = log(Throughput) ~ No_databases + Partitioning + No_middlewares + Queue_size, data = throughput)

Terms:
                No_databases Partitioning No_middlewares Queue_size Residuals
Sum of Squares      521.5264       5.6971        50.5814     0.4628  476.6826
Deg. of Freedom            1            1              2          1       473

Residual standard error: 1.003885
Estimated effects may be unbalanced
> summary(throughput.aov)
                Df Sum Sq Mean Sq  F value    Pr(>F)
No_databases     1 521.53  521.53 517.4974 < 2.2e-16 ***
Partitioning     1   5.70    5.70   5.6530   0.01782 *
No_middlewares   2  50.58   25.29  25.0953 4.381e-11 ***
Queue_size       1   0.46    0.46   0.4592   0.49833
Residuals      473 476.68    1.01
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

## #######################################################
## ## ANOVA 4-way interaction ##
## #######################################################
> throughput.aov <- aov(log(Throughput)~No_databases*Partitioning*No_middlewares*Queue_size,data=throughput)
> throughput.aov
Call:
   aov(formula = log(Throughput) ~ No_databases * Partitioning * No_middlewares * Queue_size, data = throughput)

Terms:
                No_databases Partitioning No_middlewares Queue_size No_databases:Partitioning
Sum of Squares      521.5264       5.6971        50.5814     0.4628                   96.9198
Deg. of Freedom            1            1              2          1                         1
                No_databases:No_middlewares Partitioning:No_middlewares No_databases:Queue_size
Sum of Squares                     110.4102                      8.4819                  0.0916
Deg. of Freedom                           2                           2                       1
                Partitioning:Queue_size No_middlewares:Queue_size
Sum of Squares                   0.0015                    0.2254
Deg. of Freedom                       1                         2
                No_databases:Partitioning:No_middlewares No_databases:Partitioning:Queue_size
Sum of Squares                                   23.6400                               0.0512
Deg. of Freedom                                        2                                    1
                No_databases:No_middlewares:Queue_size Partitioning:No_middlewares:Queue_size
Sum of Squares                                  0.1247                                 0.1511
Deg. of Freedom                                      2                                      2
                No_databases:Partitioning:No_middlewares:Queue_size Residuals
Sum of Squares                                               0.7391  235.8461
Deg. of Freedom                                                   2       455

Residual standard error: 0.7199605
Estimated effects may be unbalanced
> summary(throughput.aov)
                                                     Df Sum Sq Mean Sq   F value    Pr(>F)
No_databases                                          1 521.53  521.53 1006.1413 < 2.2e-16 ***
Partitioning                                          1   5.70    5.70   10.9909 0.0009888 ***
No_middlewares                                        2  50.58   25.29   48.7914 < 2.2e-16 ***
Queue_size                                            1   0.46    0.46    0.8928 0.3452201
No_databases:Partitioning                             1  96.92   96.92  186.9800 < 2.2e-16 ***
No_databases:No_middlewares                           2 110.41   55.21  106.5030 < 2.2e-16 ***
Partitioning:No_middlewares                           2   8.48    4.24    8.1818 0.0003229 ***
No_databases:Queue_size                               1   0.09    0.09    0.1766 0.6744713
Partitioning:Queue_size                               1   0.00    0.00    0.0028 0.9576692
No_middlewares:Queue_size                             2   0.23    0.11    0.2174 0.8046764
No_databases:Partitioning:No_middlewares              2  23.64   11.82   22.8034 3.648e-10 ***
No_databases:Partitioning:Queue_size                  1   0.05    0.05    0.0988 0.7534090
No_databases:No_middlewares:Queue_size                2   0.12    0.06    0.1203 0.8866605
Partitioning:No_middlewares:Queue_size                2   0.15    0.08    0.1457 0.8644517
No_databases:Partitioning:No_middlewares:Queue_size   2   0.74    0.37    0.7129 0.4907654
Residuals                                           455 235.85    0.52
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Thanks in advance,
Best regards,
Giovanni

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
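The fraction of variation that the classmates' 2^k approach reports can also be read off the 4-way table above: each term's sum of squares divided by the total sum of squares. The R sketch below simply copies the reported sums of squares. Note that aov() uses sequential (Type I) sums of squares and warns "Estimated effects may be unbalanced"; since the counts in summary(throughput) (e.g. 239 vs. 240) suggest the design is not perfectly balanced, the split between terms may depend slightly on the order of the terms in the formula.

```r
# Sums of squares copied from the 4-way aov() output (sequential, Type I)
ss <- c(
  No_databases                                          = 521.5264,
  Partitioning                                          =   5.6971,
  No_middlewares                                        =  50.5814,
  Queue_size                                            =   0.4628,
  `No_databases:Partitioning`                           =  96.9198,
  `No_databases:No_middlewares`                         = 110.4102,
  `Partitioning:No_middlewares`                         =   8.4819,
  `No_databases:Queue_size`                             =   0.0916,
  `Partitioning:Queue_size`                             =   0.0015,
  `No_middlewares:Queue_size`                           =   0.2254,
  `No_databases:Partitioning:No_middlewares`            =  23.6400,
  `No_databases:Partitioning:Queue_size`                =   0.0512,
  `No_databases:No_middlewares:Queue_size`              =   0.1247,
  `Partitioning:No_middlewares:Queue_size`              =   0.1511,
  `No_databases:Partitioning:No_middlewares:Queue_size` =   0.7391,
  Residuals                                             = 235.8461
)

# Fraction of variation explained by each term: SS_term / SS_total
frac <- ss / sum(ss)
round(sort(frac, decreasing = TRUE), 4)

# Share of variation left unexplained: 4-way model vs. main-effects model
# (both models have the same total SS, since the response is the same)
c(four_way = frac[["Residuals"]], main_effects = 476.6826 / sum(ss))
```

With these numbers No_databases alone accounts for roughly 49% of the total variation in log(Throughput), and the 4-way model leaves about 22% as residual, compared with about 45% for the main-effects model. On the fitted object, the same vector should be obtainable as summary(throughput.aov)[[1]][["Sum Sq"]].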