Re: [R] [OT] 1 vs 2-way anova technical question
On Mon, Nov 21, 2011 at 1:28 PM, Giovanni Azua brave...@gmail.com wrote:
> On Nov 21, 2011, at 8:31 PM, Bert Gunter wrote:
>> we disagree is that I think data analysts with limited statistical backgrounds should consult with local statisticians instead of trying to muddle through on their own thru lists like this. This is not meant
>
> I think that people lacking reading skills should not be subscribed in lists like this one, bullying and creating confusion around.

I agree with you that emails lists are not the place for those who cannot read.

> I will asks as many times as I want/need and the way I use lists if none of your f. b.

It is true the way you use general lists is not our business, but the R-help list is a community and there are community rules. One of those is not to ask questions that are primarily about a lack of statistical understanding (although they are not strictly prohibited). Your original post suggests that you knew this,

> I know there is plenty of people in this group who can give me a good answer :)

but chose to ignore it. Despite this, Bert was generous enough to give you some suggestions, perhaps not what you wanted but useful tips nonetheless. You may ask many times, but failing to follow guidelines and thinly veiled profanity are unlikely to endear you to the people here who can offer useful suggestions. Further, to me, your comment below falls under the,

"Ad hominem comments are absolutely out of place."

Note that no loop hole or pass is given for the behavior of other parties. That is, ad hominem comments do not become in place if someone else is rude enough or makes them first.

Regarding your suggestion that the list be split into a beginner and advanced list, while that is one option, your original question was appropriate for neither. It was, however, very appropriate for a statistics list (e.g., http://stats.stackexchange.com/).

>> to be arrogance on my part -- though it may seem to come across that way -- but rather a plea for good science. I believe that bad statistics -- bad science, a problem that I see as pervasive and inimical to scientific progress, especially in today's data saturated world. But enough of my off topic B.S. Please reply privately to not waste yet more space here (positively or negatively -- stone throwers need to catch them, too).
>
> You are actually full of your off topic prime matter, you arrogant prick.

To me, this is extremely offensive. Disagreements are inevitable, but we can strive to keep them civil.

I agree with Rolf Turner's idea that it would be nice if there was a mechanism to limit these sorts of posts.

Sincerely,
Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [OT] 1 vs 2-way anova technical question
On Nov 22, 2011, at 10:35 AM, Joshua Wiley wrote:
> It is true the way you use general lists is not our business, but the R-help list is a community and there are community rules. One of

I meant that my use of the lists is none of __his__ business; I wasn't referring to you nor other people in this list. Ok the reading skills remark starts to get recursive ... and btw the OP, even though marked in the subject as OT, was not entirely so, i.e. use of R formula etc.

> those is not to ask questions that are primarily about a lack of statistical understanding (although they are not strictly prohibited).

The lack of statistical understanding was his own judgmental conclusion which he should have kept to himself; if he starts throwing stones around he should not expect to get flowers back. Previous to this I also received some totally out of place private emails from him and I am not the kind of person that takes B.S. from anyone, he got the wrong guy. And in fact his great fallacious conclusions originated from his lack of reading skills; besides, I don't really think he read it at all but just tried to run me down with his attacks and unwelcome remarks.

> Your original post suggests that you knew this,
>> I know there is plenty of people in this group who can give me a good answer :)
> but chose to ignore it. Despite this, Bert was generous enough to give you some suggestions, perhaps not what you wanted but useful tips nonetheless.

My original post only suggested that I know there are people with knowledge about these practical applied statistics problems, nothing else. Before addressing the list I talked to two TAs and one professor. Their help was generic but helpful nevertheless. I preferred to address the question to people with practical working knowledge of ANOVA (I don't think there is a huge population in this area) and the best place I can think of is the R list, the place where I would be subscribed if I worked on these problems every day.

Statistics lists will be full of college students who will have equivalent knowledge to what I already have, and there they will probably only agree to say "yes, your QQ looks non-normal and heavy tailed", which is what I already knew ... this is a similar answer to the one I got from a TA and a couple of student friends doing the MSc in Statistics track.

Mr. Gunter did not read/understand my problem, and there were no useful tips but only ad hominem attacks. By your side-taking I suspect you are in the same party club; if you want to defend him maybe you should start by tying better your dog so to speak.

> Regarding your suggestion that the list be split into a beginner and advanced list, while that is one option, your original question was appropriate for neither. It was, however, very appropriate for a statistics list (e.g., http://stats.stackexchange.com/).

Thank you for the link, it looks very promising.

Best regards,
Giovanni
Re: [R] [OT] 1 vs 2-way anova technical question
On Tue, Nov 22, 2011 at 2:09 PM, Giovanni Azua brave...@gmail.com wrote:
> Mr. Gunter did not read/understand my problem, and there were no useful tips but only ad hominem attacks. By your side-taking I suspect you are in the same party club if you want to defend him maybe you should start by tying better your dog so to speak.

I believe that most of the readers of this thread got put off by your offensive and misplaced remarks. To echo other posters, it would be nice to get your e-mail address banned from the list.

Regards
Liviu
Re: [R] [OT] 1 vs 2-way anova technical question
On Nov 22, 2011, at 3:52 PM, Liviu Andronic wrote:
> On Tue, Nov 22, 2011 at 2:09 PM, Giovanni Azua brave...@gmail.com wrote:
>> Mr. Gunter did not read/understand my problem, and there were no useful tips but only ad hominem attacks. By your side-taking I suspect you are in the same party club if you want to defend him maybe you should start by tying better your dog so to speak.
>
> I believe that most of the readers of this thread got put off by your offending and misplaced remarks. To echo other posters, it would be nice to get your e-mail address banned from the list.

If needed go ahead and do so, your blockade won't stop my learning efforts. I don't see any concrete reason why I should be taking bullets from random people who fancy themselves with superiority and arrogance. And as usual these bullies always seem to win.
Re: [R] [OT] 1 vs 2-way anova technical question
Hello,

Couple of clarifications:

- A, B, C, D are factors and I am also interested in possible interactions, but the model that comes out from aov R~A*B*C*D violates the model assumptions.
- My 2^k is unbalanced, i.e. missing data and an additional level I also include in one of the factors, i.e. C.
- I was referring in the OP to the 4-way interactions and not 2-way, I'm sorry for my confusion.
- I tried to create an aov model with fewer interactions this way but I get the following error:

  model.aov <- aov(log(R)~A+B+I(A*B)+C+D,data=throughput)
  Error in `contrasts<-`(`*tmp*`, value = contr.treatment) :
    contrasts can be applied only to factors with 2 or more levels
  In addition: Warning message:
  In Ops.factor(A, B) : * not meaningful for factors

Here I was trying to say: do a one-way anova except for the A and B factors, for which I would like to get their 2-way interactions ...

Thanks in advance,
Best regards,
Giovanni

On Nov 21, 2011, at 12:04 PM, Giovanni Azua wrote:
> Hello,
>
> I know there is plenty of people in this group who can give me a good answer :)
>
> I have a 2^k model where k=4 like this:
>
> Model 1) R~A*B*C*D
>
> If I use the * in R among all elements it means to me to explore all interactions and include them in the model, i.e. I think this would be the so called 2-way anova. However, if I do this, it leads to model violations, i.e. the homoscedasticity is violated, the normality assumption of the sample errors (i.e. residuals) is violated etc. I tried correcting the issues using different standard transformations: log, sqrt, Box-Cox forms etc. but none really improve the result. In this case, even though the model assumptions do not hold, some of the interactions are found to significantly influence the response variable. But then shall I trust the results of this Model 1) given that the assumptions do not hold?
>
> Then I try this other model where I exclude the interactions (is this the 1-way anova?):
>
> Model 2) R~A+B+C+D
>
> In this one the model assumptions hold except for the existence of some outliers and a slightly heavy tail in the QQ-plot.
>
> Given that the assumptions for Model 1) do not hold, I assume I should ignore the results altogether for Model 1) or? Or instead can I safely use the Sum Sq. of Model 1) to get my table of percent of variations? This to me was a bit counter-intuitive since I assumed that if there was collinearity among factors (and there is e.g. I(A*B*C)) the Model 1) and I included those interactions, my model would be more accurate ... ok this turned into a brand new topic of model selection but I am mostly interested in the question: if the model assumptions are violated can I or must I not use the results, e.g. Sum Sq., for that model?
>
> Can anyone advise please?
>
> btw I have bought most books on R and statistical analysis. I have researched them all and the ANOVA coverage is very shallow in most of them, especially in the R-sy ones; they just offer a slightly pimped up version of the R-help. I am also unofficially following a course on ANOVA from the university I am registered in and most examples are too simplistic and either the assumptions just hold easily or the assumptions don't hold and nothing happens.
>
> Thanks in advance,
> Best regards,
> Giovanni
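The two formulas in the post above differ only in which interaction terms they request, so one way to see the difference is to ask R to expand them. The following is a minimal sketch, not part of the original thread; it needs no data, and the factor names simply mirror the ones used in the post. As a side note on terminology, "one-way" and "two-way" ANOVA conventionally count the number of factors in the model, not whether interactions are included.

```r
# Expand both formulas from the post and list the terms R would fit.
# No data frame is required because neither formula uses ".".
full_terms     <- terms(R ~ A * B * C * D)
additive_terms <- terms(R ~ A + B + C + D)

full_labels     <- attr(full_terms, "term.labels")
additive_labels <- attr(additive_terms, "term.labels")

print(full_labels)      # main effects plus every interaction up to A:B:C:D
print(additive_labels)  # main effects only

# Count the terms of each interaction order in the full model.
order_counts <- table(attr(full_terms, "order"))
print(order_counts)
```

Both models have four factors; the first adds all two-, three- and four-way interactions on top of the main effects of the second.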
Re: [R] [OT] 1 vs 2-way anova technical question
Giovanni:

1. Please read ?formula and/or An Introduction to R for how to specify linear models in R.

2. Correct specification of what you want (if I understand correctly) is log(R) ~ A*B + C + D

3. ... which presumably will also fail because some of your factors have only one level, which means that you cannot use them in your model.

4. ... which, in turn, suggests you don't know what you're doing statistically and should seek local assistance, especially in trying to interpret a fit to an unbalanced model (you can't do it as you probably think you can).

I should say in your defense that posts on this list indicate that point 4 is a widely shared problem among posters here.

Cheers,
Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
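To make the suggested specification `log(R) ~ A*B + C + D` concrete, here is a minimal sketch fitted to simulated data. The `throughput` data frame below is a hypothetical stand-in (the poster's data does not appear in the thread), and the level names and response distribution are assumptions chosen only so the example runs. It also counts the levels of each factor, since the error message in the thread complains about factors with fewer than two levels.

```r
set.seed(1)

# Hypothetical stand-in for the poster's `throughput` data: four factors,
# with C given a third level as described in the thread, three replicates.
throughput <- expand.grid(
  A = c("a1", "a2"), B = c("b1", "b2"),
  C = c("c1", "c2", "c3"), D = c("d1", "d2"),
  rep = 1:3
)
throughput$R <- exp(rnorm(nrow(throughput), mean = 2, sd = 0.3))

# Every factor used in the model needs at least two levels.
factor_cols <- c("A", "B", "C", "D")
throughput[factor_cols] <- lapply(throughput[factor_cols], factor)
levels_per_factor <- sapply(throughput[factor_cols], nlevels)
print(levels_per_factor)

# Main effects for all four factors plus the A:B interaction only.
fit <- aov(log(R) ~ A * B + C + D, data = throughput)
print(summary(fit))
```

This simulated design is balanced. With unbalanced data, as in the thread, the sequential sums of squares that `summary()` reports depend on the order of the terms, which is one reason interpreting such a fit needs care.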
Re: [R] [OT] 1 vs 2-way anova technical question
Hello Bert,

Thank you for taking the time to try to answer.

1) I know this, however if one is interested in only interaction between two specific factors then in R one uses I(A*B*C) meaning 3-way anova for that and not the implicit 2-ways that would otherwise be computed.

2) True, but it fails.

3) No, I don't have any factors with one level, I never said that. It would not be a 2^k experiment otherwise, my OP states this clearly, this is a 2^k experimental design ___2___

4) This is only your judgmental attitude that many people unfortunately have in some of these lists, focussing on ad-hominem judgements or even attacks to try to prove their superiority without actually answering or adding any value to the question at hand.

I have taken many graduate courses in subjects that all have Statistics in the title and passed all of them. However, as an experienced Software Engineer working for more than 10 years in the field, I can tell you that there is a huge difference between solving toy problems and implementing real-life complex projects. The same rules apply here: one thing is the toy examples one finds in R books and course exercises, and another totally different story is the real life data I am trying to model.

I'm a student in the quantitative part and learning, so I do have some gaps. I am curious and trying to learn and I think there is no shame in that. If this makes you upset maybe you should ask to split the list in two or more: Advanced-PhD-black-belt-10th-dan-in-Statistics-and-R level list and newbies list.

Best regards,
Giovanni
Re: [R] [OT] 1 vs 2-way anova technical question
Giovanni,

Have you tried Bert's suggestion 2)? Because his

log(R) ~ A*B + C + D

is NOT the same as your

log(R)~A+B+I(A*B)+C+D

Note that I(A * B) means: create a new variable that is the product of A and B. That is not meaningful if A and B are factors (hence the warning you got). So I(A * B) is not the interaction between A and B. You need A:B if you want the interaction.

Thierry
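The distinction between `A*B`, `A:B` and `I(A*B)` can be checked directly. The sketch below is illustrative only; the two small factors are made up for the demonstration and are not the poster's data.

```r
# In a model formula, A*B is shorthand for A + B + A:B.
star_labels  <- attr(terms(y ~ A * B + C + D), "term.labels")
colon_labels <- attr(terms(y ~ A + B + A:B + C + D), "term.labels")
same_terms <- setequal(star_labels, colon_labels)
print(same_terms)

# Outside a formula (or inside I()), * is ordinary arithmetic.
# Two hypothetical factors for illustration:
A <- factor(c("lo", "hi", "lo", "hi"))
B <- factor(c("x", "x", "y", "y"))

# Arithmetic on factors is not defined: R warns and returns NA for every
# element, so I(A*B) contributes only missing values to the model frame.
product_AB <- suppressWarnings(A * B)
print(product_AB)

# The cell structure behind the A:B term corresponds to the crossing of levels.
cells_AB <- interaction(A, B)
print(levels(cells_AB))
```

Because the product is entirely missing, a model that includes `I(A*B)` for factors has no complete rows left to fit, which is consistent with the contrasts error quoted in the thread.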
Re: [R] [OT] 1 vs 2-way anova technical question
The way I interpret the problem (and I may be wrong here, I don't think you have been particularly clear with your question) is that you are trying to make a factorial anova where you are trying to explain R as a result of A, B, C and D, and their interaction terms, so using A*B*C*D.

What you should consider is the error family of your data (poisson, binomial...) and use:

model <- glm(R~A*B*C*D)

and then simplify your model.

I suggest reading chapter 7 in Crawley's Statistics: An Introduction Using R. Combined with the statistical knowledge you have learnt on one of your courses, you should hopefully find the answer.

Perhaps you could also speak to someone within the course you are registered on and some statistics focussed forums - it tends to annoy some people on here when they find a stats question on their R mailing list, obviously they don't have a delete button...

Good luck.

Rob
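As a rough illustration of the "choose an error family, fit with glm, then simplify" advice, here is a sketch on simulated data. Everything about the data is an assumption: the data frame `d`, the Poisson counts and the effect of A are invented for the example, and a Poisson family may well be the wrong choice for a continuous response such as throughput (a Gaussian or Gamma family might be considered instead).

```r
set.seed(42)

# Hypothetical balanced 2^4 design with 5 replicates per cell.
d <- expand.grid(
  A = c("a1", "a2"), B = c("b1", "b2"),
  C = c("c1", "c2"), D = c("d1", "d2"),
  rep = 1:5
)
# Simulated counts in which only factor A has an effect (an assumption).
d$R <- rpois(nrow(d), lambda = ifelse(d$A == "a2", 12, 8))

# Full factorial model with an explicit error family.
full_model <- glm(R ~ A * B * C * D, family = poisson, data = d)

# One simplification step: drop the four-way interaction, then compare
# the nested models with a likelihood-ratio (chi-squared) test.
reduced_model <- update(full_model, . ~ . - A:B:C:D)
comparison <- anova(reduced_model, full_model, test = "Chisq")
print(comparison)
```

If the comparison gives no evidence for the dropped term, the same step can be repeated on the next highest-order interactions; `drop1()` is another standard tool for this kind of stepwise simplification.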
Re: [R] [OT] 1 vs 2-way anova technical question
Thanks Thierry: I had missed that the OP's failure to read the formula docs and use of I(A*B) was what caused the error. Mea Culpa. However, I actually agree with Giovanni's remarks about the difference between what is typically taught and what one faces in practice. Where we disagree is that I think data analysts with limited statistical backgrounds should consult with local statisticians instead of trying to muddle through on their own thru lists like this. This is not meant to be arrogance on my part -- though it may seem to come across that way -- but rather a plea for good science. I believe that bad statistics -- bad science, a problem that I see as pervasive and inimical to scientific progress, especially in today's data saturated world. But enough of my off topic B.S. Please reply privately to not waste yet more space here (positively or negatively -- stone throwers need to catch them, too). Cheers, -- Bert On Mon, Nov 21, 2011 at 8:20 AM, ONKELINX, Thierry thierry.onkel...@inbo.be wrote: Giovanni, Have you tried Bert suggestion 2)? Because his log(R) ~ A*B + C + D is NOT the same as your log(R)~A+B+I(A*B)+C+D Note that I(A * B) means: create a new variable that is the product of A and B. Which is not meaningfull if A and B are factors (hence the warning you got). So I(A * B) is not the interaction between A and B. You need A:B if you want the interaction. Thierry -Oorspronkelijk bericht- Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens Giovanni Azua Verzonden: maandag 21 november 2011 17:00 Aan: r-help@r-project.org Onderwerp: Re: [R] [OT] 1 vs 2-way anova technical question Hello Bert, Thank you for taking the time to try to answer. 1) I know this, however if one is interested in only interaction between two specific factors then in R one uses I(A*B*C) meaning 3-way anova for that and not the implicit 2-ways that would otherwise be computed. 2) True, but it fails. 
3) No, I don't have any factors with one level, I never said that. It would not be a 2^k experiment otherwise, my OP states this clearly, this is a 2^k experimental design ___2___
4) this is only your judgmental attitude that many people unfortunately have in some of these lists, focussing on ad-hominem judgements or even attacks to try to prove their superiority without actually answering nor adding any value to the question at hand.

I have taken many graduate courses in subjects that have all Statistics in the title and passed all of them. However, as an experienced Software Engineer working for more than 10 years in the field, I can tell you that there is a huge difference between solving toy problems and implementing real-life complex projects. Same rules apply here, one thing is the toy examples one finds in R books and course exercises and another totally different story is the real life data I am trying to model.

I'm a student in the quantitative part and learning, so I do have some gaps, I am curious and trying to learn and I think there is no shame in that. If this makes you upset maybe you should ask to split the list in two or more: Advanced-PhD-black-belt-10th-dan-in-Statistics-and-R level list and newbies list.

Best regards,
Giovanni

On Nov 21, 2011, at 3:55 PM, Bert Gunter wrote:

Giovanni:

1. Please read ?formula and/or An Introduction to R for how to specify linear models in R.
2. Correct specification of what you want (if I understand correctly) is log(R) ~ A*B + C + D
3. ... which presumably will also fail because some of your factors have only one level, which means that you cannot use them in your model.
4. ... which, in turn, suggests you don't know what you're doing statistically and should seek local assistance, especially in trying to interpret a fit to an unbalanced model (you can't do it as you probably think you can). I should say in your defense that posts on this list indicate that point 4 is a widely shared problem among posters here.
Cheers,
Bert

On Mon, Nov 21, 2011 at 5:02 AM, Giovanni Azua brave...@gmail.com wrote:

Hello,

Couple of clarifications:
- A,B,C,D are factors and I am also interested in possible interactions but the model that comes out from aov R~A*B*C*D violates the model assumptions
- My 2^k is unbalanced i.e. missing data and an additional level I also include in one of the factors i.e. C
- I was referring in the OP to the 4-way interactions and not 2-way, I'm sorry for my confusion.
- I tried to create an aov model with less interactions this way but I get the following error:

model.aov <- aov(log(R)~A+B+I(A*B)+C+D,data=throughput)
Error in `contrasts<-`(`*tmp*`, value = contr.treatment) :
  contrasts can be applied only to factors with 2 or more levels
In addition: Warning message:
In Ops.factor(A, B) : * not meaningful for factors

Here I was trying to say: do a one
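For anyone following the formula discussion in this message, here is a minimal sketch of the point Thierry makes about A*B versus A:B. It assumes a made-up, balanced data set: the data frame d, its factor levels and the simulated response are hypothetical and are not the poster's throughput data. terms() is used only to list what each formula expands to.

```r
# Hypothetical toy data; NOT the poster's 'throughput' data frame.
set.seed(1)
d <- expand.grid(A = factor(c("lo", "hi")), B = factor(c("lo", "hi")),
                 C = factor(c("lo", "hi")), D = factor(c("lo", "hi")),
                 rep = 1:3)
d$R <- exp(rnorm(nrow(d)))  # positive response, so log(R) is defined

# A*B is shorthand for A + B + A:B, so this adds only the A:B interaction.
f_star  <- log(R) ~ A * B + C + D
f_colon <- log(R) ~ A + B + A:B + C + D

labs_star  <- attr(terms(f_star), "term.labels")
labs_colon <- attr(terms(f_colon), "term.labels")
print(labs_star)
print(labs_colon)

# Fit main effects plus the single A:B interaction.
fit <- aov(f_star, data = d)
print(summary(fit))
```

Both formulas should list the same terms, with A:B as the only interaction. I(A*B) is different: it asks R to multiply the two columns arithmetically, which is not defined for factors.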
Re: [R] [OT] 1 vs 2-way anova technical question
Hello Rob,

Thank you for your suggestions. I tried glm too without success. Anyhow I include all the information just in case someone with good knowledge can give me a hand with this.

I take log of the response variable because:
- its values span across multiple orders of magnitudes
- the diagnostic plots e.g. QQ, residuals vs fitted etc do improve with that.

Below I include:
1) general summary of my data
2) 1-way anova and summary of the model
3) 4-way anova and summary of the model

Attached:
a) Overview of the data (where main interactions occur i.e. No_databases and No_middlewares)
b) diagnostic plots for 2) Here the Normality assumption of the residuals looks reasonable
c) diagnostic plots for 3) Here the Normality assumption of the residuals does not seem to hold so it invalidates the 4-way aov model? I tried glm and it delivers similar results as 3)

My impression is that my system is heavily polluted with outliers one can see that from plot a) how much the mean and the median differ due to the outliers. That's just the way the system I implemented behaves. Btw the system is a multi-tiered architecture that I developed in Java from scratch that includes XA and different data access and partitioning patterns. I need to quantitatively analyze and draw conclusion from this system.

Most of my class mates just make it real simple: make 2^k experiments take one grand mean out of each experiment and do the ANOVA on those means i.e. 1-repetition, compute the fraction of variation and that's it. I am trying to model it more deeply by checking model assumptions, etc.

Many thanks in advance,
Best regards,
Giovanni

str(throughput)
'data.frame':   479 obs. of  9 variables:
 $ Time              : num  7 8 9 10 11 12 13 14 15 16 ...
 $ Throughput        : int  155 155 154 157 155 214 4631 2118 136 132 ...
 $ Workload          : chr  "All" "All" "All" "All" ...
 $ No_databases      : Factor w/ 2 levels "1","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ Partitioning      : Factor w/ 2 levels "sharding","replication": 1 1 1 1 1 1 1 1 1 1 ...
 $ No_middlewares    : Factor w/ 3 levels "1","2","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ Queue_size        : Factor w/ 2 levels "40","100": 1 1 1 1 1 1 1 1 1 1 ...
 $ No_clients        : Factor w/ 1 level "64": 1 1 1 1 1 1 1 1 1 1 ...
 $ Experimental_error: Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...

summary(throughput)
      Time         Throughput       Workload          No_databases      Partitioning   No_middlewares
 Min.   : 7.00   Min.   :  35.0   Length:479          1:239          sharding   :240   1:160
 1st Qu.:11.50   1st Qu.:  50.5   Class :character    4:240          replication:239   2:159
 Median :16.00   Median : 744.0   Mode  :character                                     4:160
 Mean   :16.48   Mean   : 830.3
 3rd Qu.:21.00   3rd Qu.:1205.5
 Max.   :26.00   Max.   :4631.0
 Queue_size   No_clients   Experimental_error
 40 :240      64:479       1:479
 100:239

## ANOVA one-way interaction ##

throughput.aov <- aov(log(Throughput)~No_databases+Partitioning+No_middlewares+Queue_size,data=throughput)
throughput.aov
Call:
   aov(formula = log(Throughput) ~ No_databases + Partitioning + No_middlewares + Queue_size, data = throughput)

Terms:
                No_databases Partitioning No_middlewares Queue_size Residuals
Sum of Squares      521.5264       5.6971        50.5814     0.4628  476.6826
Deg. of Freedom            1            1              2          1       473

Residual standard error: 1.003885
Estimated effects may be unbalanced

summary(throughput.aov)
                Df Sum Sq Mean Sq  F value    Pr(>F)
No_databases     1 521.53  521.53 517.4974 < 2.2e-16 ***
Partitioning     1   5.70    5.70   5.6530   0.01782 *
No_middlewares   2  50.58   25.29  25.0953 4.381e-11 ***
Queue_size       1   0.46    0.46   0.4592   0.49833
Residuals      473 476.68    1.01
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

## ANOVA 4-way interaction ##

throughput.aov <- aov(log(Throughput)~No_databases*Partitioning*No_middlewares*Queue_size,data=throughput)
throughput.aov
Call:
   aov(formula = log(Throughput) ~ No_databases * Partitioning * No_middlewares * Queue_size, data = throughput)

Terms:
                No_databases Partitioning No_middlewares Queue_size No_databases:Partitioning
Sum of Squares      521.5264       5.6971        50.5814     0.4628                   96.9198
Deg. of Freedom            1            1              2          1
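The str() output above lists two factors with a single level (No_clients and Experimental_error). As a rough sketch, assuming a small stand-in data frame (d below is hypothetical and only mimics three of the columns), one way to list factors that have fewer than two levels before they end up in a model formula is:

```r
# Hypothetical stand-in mimicking three columns of the str() output above.
d <- data.frame(
  No_databases = factor(c(1, 4, 1, 4)),
  Queue_size   = factor(c(40, 100, 100, 40)),
  No_clients   = factor(c(64, 64, 64, 64))
)

# Count levels for every factor column and keep those with fewer than two.
is_fac   <- vapply(d, is.factor, logical(1))
n_levels <- vapply(d[is_fac], nlevels, integer(1))
single   <- names(n_levels)[n_levels < 2]
print(n_levels)
print(single)
```

A factor flagged this way cannot be given contrasts, so it would have to stay out of the aov formula.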
Re: [R] [OT] 1 vs 2-way anova technical question
On Nov 21, 2011, at 8:31 PM, Bert Gunter wrote:

> we disagree is that I think data analysts with limited statistical backgrounds should consult with local statisticians instead of trying to muddle through on their own thru lists like this. This is not meant

I think that people lacking reading skills should not be subscribed in lists like this one, bullying and creating confusion around. I will asks as many times as I want/need and the way I use lists if none of your f. b.

> to be arrogance on my part -- though it may seem to come across that way -- but rather a plea for good science. I believe that bad statistics -- bad science, a problem that I see as pervasive and inimical to scientific progress, especially in today's data saturated world.
>
> But enough of my off topic B.S. Please reply privately to not waste yet more space here (positively or negatively -- stone throwers need to catch them, too).

You are actually full of your off topic prime matter, you arrogant prick.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] [OT] 1 vs 2-way anova technical question
This sort of post seems to me to be completely unacceptable. Is there a mechanism by which the list manager can unsubscribe Mr. Azua and keep him unsubscribed until he learns some manners?

cheers,
Rolf Turner

On 22/11/11 10:28, Giovanni Azua wrote:

> On Nov 21, 2011, at 8:31 PM, Bert Gunter wrote:
>
>> we disagree is that I think data analysts with limited statistical backgrounds should consult with local statisticians instead of trying to muddle through on their own thru lists like this. This is not meant
>
> I think that people lacking reading skills should not be subscribed in lists like this one, bullying and creating confusion around. I will asks as many times as I want/need and the way I use lists if none of your f. b.
>
>> to be arrogance on my part -- though it may seem to come across that way -- but rather a plea for good science. I believe that bad statistics -- bad science, a problem that I see as pervasive and inimical to scientific progress, especially in today's data saturated world.
>>
>> But enough of my off topic B.S. Please reply privately to not waste yet more space here (positively or negatively -- stone throwers need to catch them, too).
>
> You are actually full of your off topic prime matter, you arrogant prick.