Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-22 Thread Joshua Wiley
On Mon, Nov 21, 2011 at 1:28 PM, Giovanni Azua brave...@gmail.com wrote:

 On Nov 21, 2011, at 8:31 PM, Bert Gunter wrote:
 we disagree is that I think data analysts with limited statistical
 backgrounds should consult with local statisticians instead of trying
 to muddle through on their own thru lists like this. This is not meant

 I think that people lacking reading skills should not be subscribed to lists 
 like this one, bullying and creating confusion.

I agree with you that email lists are not the place for those who cannot read.


 I will ask as many times as I want/need, and the way I use lists is none of 
 your f. b.

It is true the way you use general lists is not our business, but the
R-help list is a community and there are community rules.  One of
those is not to ask questions that are primarily about a lack of
statistical understanding (although they are not strictly prohibited).
Your original post suggests that you knew this ("I know there is
plenty of people in this group who can give me a good answer :)") but
chose to ignore it.  Despite this, Bert was generous enough to give
you some suggestions, perhaps not what you wanted but useful tips
nonetheless.

You may ask many times, but failing to follow guidelines and thinly
veiled profanity are unlikely to endear you to the people here who can
offer useful suggestions.  Further, to me, your comment below falls
under the rule "Ad hominem comments are absolutely out of place."  Note
that no loophole or pass is given for the behavior of other parties.
That is, ad hominem comments do not become "in place" if someone else
is rude enough or makes them first.

Regarding your suggestion that the list be split into a beginner and
advanced list, while that is one option, your original question was
appropriate for neither.  It was, however, very appropriate for a
statistics list (e.g., http://stats.stackexchange.com/).


 to be arrogance on my part -- though it may seem to come across that
 way -- but rather a plea for good science. I believe that bad
 statistics -- bad science, a problem that I see as pervasive and
 inimical to scientific progress, especially in today's data saturated
 world.

 But enough of my off topic B.S. Please reply privately to not waste
 yet more space here (positively or negatively -- stone throwers need
 to catch them, too).


 You are actually full of your off topic prime matter, you arrogant prick.

To me, this is extremely offensive.  Disagreements are inevitable, but
we can strive to keep them civil.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

I agree with Rolf Turner's idea that it would be nice if there was a
mechanism to limit these sorts of posts.

Sincerely,

Josh

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/



Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-22 Thread Giovanni Azua

On Nov 22, 2011, at 10:35 AM, Joshua Wiley wrote:
 It is true the way you use general lists is not our business, but the
 R-help list is a community and there are community rules.  One of

I meant that my use of the lists is none of __his__ business; I wasn't referring 
to you or to other people on this list. OK, the reading skills remark is starting 
to get recursive ... and by the way, the OP, even though marked in the subject as 
OT, was not entirely so, i.e. it involved the use of R formulas etc. 

 those is not to ask questions that are primarily about a lack of
 statistical understanding (although they are not strictly prohibited).

The lack of statistical understanding was his own judgmental conclusion, which 
he should have kept to himself; if he starts throwing stones around, he should 
not expect to get flowers back. Prior to this I also received some totally 
out-of-place private emails from him, and I am not the kind of person who takes 
B.S. from anyone; he got the wrong guy. In fact, his fallacious conclusions 
originated from his lack of reading skills. Besides, I don't really think he 
read it at all; he just tried to run me down with his attacks and unwelcome 
remarks. 

 Your original post suggests that you knew this ("I know there is
 plenty of people in this group who can give me a good answer :)") but
 chose to ignore it.  Despite this, Bert was generous enough to give
 you some suggestions, perhaps not what you wanted but useful tips
 nonetheless.
 

My original post only suggested that I know there are people with knowledge 
of these practical applied statistics problems, nothing else. Before 
addressing the list I talked to two TAs and one professor. Their help was 
generic but helpful nevertheless. I preferred to address the question to people 
with practical working knowledge of ANOVA (I don't think there is a huge 
population in this area), and the best place I can think of is the R list, the 
place where I would be subscribed if I worked on these problems every day. 
Statistics lists will be full of college students who have knowledge 
equivalent to what I already have, and there they will probably only agree that 
yes, your QQ plot looks non-normal and heavy-tailed, which is what I already 
knew ... this is similar to the answer I got from a TA and a couple of student 
friends doing the MSc in Statistics track.   

Mr. Gunter did not read/understand my problem, and there were no useful tips 
but only ad hominem attacks. By your side-taking I suspect you are in the same 
party club if you want to defend him maybe you should start by tying better 
your dog so to speak.

 Regarding your suggestion that the list be split into a beginner and
 advanced list, while that is one option, your original question was
 appropriate for neither.  It was, however, very appropriate for a
 statistics list (e.g., http://stats.stackexchange.com/).


Thank you for the link, it looks very promising. 

Best regards,
Giovanni


Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-22 Thread Liviu Andronic
On Tue, Nov 22, 2011 at 2:09 PM, Giovanni Azua brave...@gmail.com wrote:
 Mr. Gunter did not read/understand my problem, and there were no useful tips 
 but only ad hominem attacks. By your side-taking I suspect you are in the 
 same party club if you want to defend him maybe you should start by tying 
 better your dog so to speak.

I believe that most of the readers of this thread got put off by your
offending and misplaced remarks. To echo other posters, it would be
nice to get your e-mail address banned from the list.

Regards
Liviu



Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-22 Thread Giovanni Azua

On Nov 22, 2011, at 3:52 PM, Liviu Andronic wrote:

 On Tue, Nov 22, 2011 at 2:09 PM, Giovanni Azua brave...@gmail.com wrote:
 Mr. Gunter did not read/understand my problem, and there were no useful tips 
 but only ad hominem attacks. By your side-taking I suspect you are in the 
 same party club if you want to defend him maybe you should start by tying 
 better your dog so to speak.
 
 I believe that most of the readers of this thread got put off by your
 offending and misplaced remarks. To echo other posters, it would be
 nice to get your e-mail address banned from the list.

If needed go ahead and do so, your blockade won't stop my learning efforts. I 
don't see any concrete reason why I should be taking bullets from random people 
who fancy themselves with superiority and arrogance. And as usual these bullies 
always seem to win.



Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-21 Thread Giovanni Azua
Hello,

A couple of clarifications: 
- A,B,C,D are factors and I am also interested in possible interactions, but the 
model that comes out of aov R~A*B*C*D violates the model assumptions
- My 2^k design is unbalanced, i.e. there is missing data, and one of the 
factors (C) includes an additional level
- In the OP I was referring to the 4-way interactions and not 2-way; I'm sorry 
for my confusion.
- I tried to create an aov model with fewer interactions this way, but I get the 
following error:

model.aov <- aov(log(R)~A+B+I(A*B)+C+D,data=throughput)
Error in `contrasts<-`(`*tmp*`, value = contr.treatment) : 
  contrasts can be applied only to factors with 2 or more levels
In addition: Warning message:
In Ops.factor(A, B) : * not meaningful for factors

Here I was trying to say: do a one-way anova except for the A and B factors for 
which I would like to get their 2-way interactions ...

Thanks in advance,
Best regards,
Giovanni

On Nov 21, 2011, at 12:04 PM, Giovanni Azua wrote:

 Hello,
 
 I know there is plenty of people in this group who can give me a good answer 
 :)
 
 I have a 2^k model where k=4 like this:
 Model 1) R~A*B*C*D
 
 If I use the * in R among all elements, it means to me to explore all 
 interactions and include them in the model, i.e. I think this would be the 
 so-called 2-way anova. However, if I do this, it leads to model violations, 
 i.e. homoscedasticity is violated, the normality assumption of the sample 
 errors (i.e. residuals) is violated, etc. I tried correcting the issues using 
 different standard transformations (log, sqrt, Box-Cox forms etc.) but none 
 really improves the result. In this case, even though the model assumptions do 
 not hold, some of the interactions are found to significantly influence the 
 response variable. But then shall I trust the results of this Model 1) given 
 that the assumptions do not hold?
 
 Then I try this other model where I exclude the interactions (is this the 
 1-way anova?):
 Model 2) R~A+B+C+D
 
 In this one the model assumptions hold except the existence of some outliers 
 and a slightly heavy tail in the QQ-plot.
 
 Given that the assumptions for Model 1) do not hold, I assume I should ignore 
 the results for Model 1) altogether? Or can I instead safely use the Sum Sq. 
 of Model 1) to get my table of percent of variation?
 
 This to me was a bit counter-intuitive, since I assumed that if there was 
 collinearity among factors (and there is, e.g. I(A*B*C)) and I included those 
 interactions as in Model 1), my model would be more accurate ... OK, this 
 turned into a brand new topic of model selection, but I am mostly interested 
 in the question: if the model assumptions are violated, can I or must I not 
 use the results, e.g. the Sum Sq., for that model?
 
 Can anyone advise, please?
 
 By the way, I have bought most books on R and statistical analysis. I have 
 researched them all and the ANOVA coverage is very shallow in most of them, 
 especially in the R-specific ones; they just offer a slightly pimped-up 
 version of the R help pages. 
 
 I am also unofficially following a course on ANOVA from the university I am 
 registered in and most examples are too simplistic and either the assumptions 
 just hold easily or the assumptions don't hold and nothing happens.  
 
 Thanks in advance,
 Best regards,
 Giovanni
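
The two checks the original post describes (inspecting residuals, and turning the Sum Sq column into a table of percent of variation) can be sketched as follows. This is only an illustration: the real `throughput` data is not shown in the thread, so the data frame below is simulated, and the effect sizes, replicate count, and seed are all assumptions.

```r
# Simulated stand-in for the poster's data; every value here is hypothetical.
set.seed(1)
throughput <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                          C = c("c1", "c2"), D = c("d1", "d2"),
                          rep = 1:5, stringsAsFactors = TRUE)
throughput$R <- exp(1 + 0.5 * (throughput$A == "a2") +
                      0.3 * (throughput$B == "b2") +
                      rnorm(nrow(throughput), sd = 0.2))

# Main-effects model on the log scale (Model 2 in the post, with a log response)
fit <- aov(log(R) ~ A + B + C + D, data = throughput)

# Normality of the residuals (Shapiro-Wilk test)
shapiro.test(residuals(fit))

# Homogeneity of variance across the design cells (Bartlett test)
cells <- with(throughput, interaction(A, B, C, D))
bartlett.test(residuals(fit), cells)

# Percent of variation: each term's Sum Sq as a share of the total Sum Sq
tab <- summary(fit)[[1]]
pct <- 100 * tab[["Sum Sq"]] / sum(tab[["Sum Sq"]])
names(pct) <- trimws(rownames(tab))
round(pct, 1)
```

Calling plot(fit) would additionally draw the usual residual and QQ plots. Note that the percentages are plain arithmetic on the ANOVA table; whether they can be trusted when the assumptions fail is exactly the statistical question asked above, and the code does not answer it.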
 




Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-21 Thread Bert Gunter
Giovanni:

1. Please read ?formula and/or An Introduction to R for how to specify
linear models in R.

2. Correct specification of what you want (if I understand correctly) is
log(R) ~ A*B + C + D

3. ... which presumably will also fail because some of your factors
have only one level, which means that you cannot use them in your
model.

4. ... which, in turn, suggests you don't know what you're doing
statistically and should seek local assistance, especially in trying
to interpret a fit to an unbalanced model (you can't do it as you
probably think you can).

I should say in your defense that posts on this list indicate that
point 4 is a widely shared problem among posters here.

Cheers,
Bert
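
As a rough illustration of points 2 to 4, the sketch below fits the suggested formula on simulated data, counts the observed levels of each factor, and shows that in an unbalanced design the sequential sums of squares reported by aov depend on the order of the terms. The data frame `d`, its effect sizes, and the helper `ss` are hypothetical; nothing here comes from the poster's actual data.

```r
set.seed(2)
# Hypothetical data; C gets a third level, as the poster describes
d <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                 C = c("c1", "c2", "c3"), D = c("d1", "d2"),
                 rep = 1:4, stringsAsFactors = TRUE)
d$R <- exp(rnorm(nrow(d), mean = 1 + 0.4 * (d$A == "a2"), sd = 0.3))
d <- d[-sample(nrow(d), 15), ]   # drop rows at random to unbalance the design

# Point 3: every factor in the model needs at least two observed levels
n_levels <- sapply(droplevels(d[c("A", "B", "C", "D")]), nlevels)
n_levels

# Point 2: A*B expands to A + B + A:B
fit1 <- aov(log(R) ~ A * B + C + D, data = d)
fit2 <- aov(log(R) ~ B * A + D + C, data = d)   # same terms, different order

# Helper (hypothetical name): Sum Sq per term, with padded row names trimmed
ss <- function(fit) {
  tab <- summary(fit)[[1]]
  setNames(tab[["Sum Sq"]], trimws(rownames(tab)))
}

# Point 4: the Sum Sq for A differs depending on whether A enters before or after B
ss(fit1)["A"]
ss(fit2)["A"]
```

With balanced data the two values would coincide, which is one reason an unbalanced fit cannot be read the same way as a textbook balanced one.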

-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-21 Thread Giovanni Azua
Hello Bert,

Thank you for taking the time to try to answer.

1) I know this, however if one is interested in only interaction between two 
specific factors then in R one uses I(A*B*C) meaning 3-way anova for that and 
not the implicit 2-ways that would otherwise be computed.

2) True, but it fails.

3) No, I don't have any factors with one level, I never said that. It would not 
be a 2^k experiment otherwise, my OP states this clearly, this is a 2^k 
experimental design ___2___

4) This is only your judgmental attitude, which many people unfortunately have 
on some of these lists: focusing on ad hominem judgements or even attacks to try 
to prove their superiority without actually answering or adding any value to 
the question at hand. I have taken many graduate courses in subjects that all 
have "Statistics" in the title and passed all of them. However, as an experienced 
Software Engineer working for more than 10 years in the field, I can tell you 
that there is a huge difference between solving toy problems and implementing 
real-life complex projects.  The same rules apply here: the toy examples one 
finds in R books and course exercises are one thing, and the real-life data I am 
trying to model is a totally different story. I'm a student in the quantitative 
part and learning, so I do have some gaps; I am curious and trying to learn, and 
I think there is no shame in that. If this makes you upset, maybe you should ask 
to split the list in two or more: an 
Advanced-PhD-black-belt-10th-dan-in-Statistics-and-R level list and a newbies list.

Best regards,
Giovanni


Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-21 Thread ONKELINX, Thierry
Giovanni,

Have you tried Bert's suggestion 2)? Because his log(R) ~ A*B + C + D is NOT the 
same as your log(R)~A+B+I(A*B)+C+D

Note that I(A * B) means: create a new variable that is the product of A and B, 
which is not meaningful if A and B are factors (hence the warning you got).
So I(A * B) is not the interaction between A and B. You need A:B if you want 
the interaction.

Thierry
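
The distinction can be checked directly in R. The following is a small sketch on made-up data (the data frame `d`, the response `y`, and the level names are assumptions for illustration only):

```r
set.seed(3)
d <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                 rep = 1:3, stringsAsFactors = TRUE)
d$y <- rnorm(nrow(d))

# Inside I(), * is ordinary arithmetic; on two factors it returns NA
# (with the "not meaningful for factors" warning, suppressed here)
prod_AB <- suppressWarnings(d$A * d$B)
all(is.na(prod_AB))

# In a formula, A*B is shorthand for A + B + A:B
t1 <- attr(terms(y ~ A * B), "term.labels")
t2 <- attr(terms(y ~ A + B + A:B), "term.labels")
identical(t1, t2)

# The interaction column in the design matrix comes from the A:B term
colnames(model.matrix(y ~ A * B, data = d))
```

So a formula such as log(R) ~ A + B + A:B + C + D, or equivalently log(R) ~ A*B + C + D, requests main effects for all four factors plus only the A-by-B interaction.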



Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-21 Thread Rob Griffin
The way I interpret the problem (and I may be wrong here, as I don't think you 
have been particularly clear with your question) is that you are trying to 
fit a factorial anova in which you explain R as a result of 
A, B, C and D and their interaction terms, so using A*B*C*D.
What you should consider is the error family of your data (poisson, 
binomial...) and use: model <- glm(R~A*B*C*D) and then simplify your model.
I suggest reading chapter 7 in Crawley's Statistics: An Introduction Using 
R. Combined with the statistical knowledge you have learnt on one of 
your courses, you should hopefully find the answer. Perhaps you could also 
speak to someone within the course you are registered on and try some 
statistics-focussed forums - it tends to annoy some people on here when they 
find a stats question on their R mailing list; obviously they don't have a 
delete button...


Good luck.
Rob
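
A minimal sketch of the glm-then-simplify approach suggested above, under stated assumptions: the response is simulated as a Poisson count (the thread never says which error family fits the real data), and backward selection by AIC with step() is just one common way to simplify, not necessarily the one the book recommends.

```r
set.seed(4)
# Hypothetical data: a count response driven by A and B only
d <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                 C = c("c1", "c2"), D = c("d1", "d2"),
                 rep = 1:6, stringsAsFactors = TRUE)
d$R <- rpois(nrow(d), lambda = exp(1 + 0.6 * (d$A == "a2") +
                                     0.4 * (d$B == "b2")))

# Full factorial model with an explicit error family
full <- glm(R ~ A * B * C * D, family = poisson, data = d)

# Simplify: backward selection by AIC (trace = 0 keeps it quiet)
reduced <- step(full, direction = "backward", trace = 0)
formula(reduced)

# Compare the simplified model against the full one
anova(reduced, full, test = "Chisq")
```

For a continuous, positive response such as a throughput measurement, a different family (for example Gamma with a log link) may be more plausible than Poisson; that choice is a modelling decision the code cannot make for you.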




Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-21 Thread Bert Gunter
Thanks Thierry:

I had missed that the OP's failure to read the formula docs and use of
I(A*B) was what caused the error. Mea Culpa.

However, I actually agree with Giovanni's remarks about the difference
between what is typically taught and what one faces in practice. Where
we disagree is that I think data analysts with limited statistical
backgrounds should consult with local statisticians instead of trying
to muddle through on their own thru lists like this. This is not meant
to be arrogance on my part -- though it may seem to come across that
way -- but rather a plea for good science. I believe that bad
statistics leads to bad science, a problem that I see as pervasive and
inimical to scientific progress, especially in today's data saturated
world.

But enough of my off topic B.S. Please reply privately to not waste
yet more space here (positively or negatively -- stone throwers need
to catch them, too).

Cheers,

-- Bert

On Mon, Nov 21, 2011 at 8:20 AM, ONKELINX, Thierry
thierry.onkel...@inbo.be wrote:
 Giovanni,

 Have you tried Bert's suggestion 2)? His log(R) ~ A*B + C + D is NOT 
 the same as your log(R)~A+B+I(A*B)+C+D.

 Note that I(A * B) means: create a new variable that is the product of A and 
 B, which is not meaningful if A and B are factors (hence the warning you 
 got).
 So I(A * B) is not the interaction between A and B. You need A:B if you want 
 the interaction.
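
The difference between `A*B`, `A:B` and `I(A*B)` can be checked without any data by asking R which terms it derives from each formula. This is only an illustrative sketch: the two small factors `A` and `B` built at the end are hypothetical and exist only to show what arithmetic on factors returns.

```r
# Term labels R derives from each formula; no data frame is needed for this.
labels_star  <- attr(terms(log(R) ~ A * B + C + D), "term.labels")
labels_colon <- attr(terms(log(R) ~ A + B + A:B + C + D), "term.labels")
labels_I     <- attr(terms(log(R) ~ A + B + I(A * B) + C + D), "term.labels")

print(labels_star)   # main effects plus the A:B interaction
print(labels_colon)  # same set of terms, written out explicitly
print(labels_I)      # I(A * B) appears as a single arithmetic term instead

# Arithmetic on factors is undefined: R warns and returns NA for every row.
A <- factor(c("lo", "hi", "lo", "hi"))
B <- factor(c("x", "x", "y", "y"))
product <- suppressWarnings(A * B)
print(product)
```

Because the product is NA in every row, a model that includes `I(A*B)` on factors has no complete cases left after missing values are dropped, which is consistent with the contrasts error reported in this thread.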

 Thierry


 -----Original Message-----
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On behalf of Giovanni Azua
 Sent: Monday, 21 November 2011 17:00
 To: r-help@r-project.org
 Subject: Re: [R] [OT] 1 vs 2-way anova technical question

 Hello Bert,

 Thank you for taking the time to try to answer.

 1) I know this, however if one is interested in only interaction between two
 specific factors then in R one uses I(A*B*C) meaning 3-way anova for that and
 not the implicit 2-ways that would otherwise be computed.

 2) True, but it fails.

 3) No, I don't have any factors with one level, I never said that. It would
 not be a 2^k experiment otherwise, my OP states this clearly, this is a 2^k
 experimental design ___2___

 4) this is only your judgmental attitude that many people unfortunately have
 in some of these lists, focussing on ad-hominem judgements or even attacks to
 try to prove their superiority without actually answering nor adding any
 value to the question at hand. I have taken many graduate courses in subjects
 that have all Statistics in the title and passed all of them. However, as an
 experienced Software Engineer working for more than 10 years in the field, I
 can tell you that there is a huge difference between solving toy problems to
 implementing real-life complex projects.  Same rules apply here, one thing is
 the toy examples one finds in R books and course exercises and another
 totally different story is the real life data I am trying to model. I'm a
 student in the quantitative part and learning, so I do have some gaps, I am
 curious and trying to learn and I think there is no shame in that. If this
 makes you upset maybe you should ask to split the list in two or more:
 Advanced-PhD-black-belt-10th-dan-in-Statistics-and-R level list and newbies
 list.

 Best regards,
 Giovanni

 On Nov 21, 2011, at 3:55 PM, Bert Gunter wrote:

  Giovanni:
 
  1. Please read ?formula and/or An Introduction to R for how to specify
  linear models in R.
 
  2. Correct specification of what you want (if I understand correctly)
  is
  log(R) ~ A*B + C + D
 
  3. ... which presumably will also fail because some of your factors
  have only one level, which means that you cannot use them in your
  model.
 
  4. ... which, in turn, suggests you don't know what you're doing
  statistically and should seek local assistance, especially in trying
  to interpret a fit to an unbalanced model (you can't do it as you
  probably think you can).
 
  I should say in your defense that posts on this list indicate that
  point 4 is a widely shared problem among posters here.
 
  Cheers,
  Bert
 
  On Mon, Nov 21, 2011 at 5:02 AM, Giovanni Azua brave...@gmail.com
 wrote:
  Hello,
 
  Couple of clarifications:
  - A,B,C,D are factors and I am also interested in possible
  interactions but the model that comes out from aov R~A*B*C*D violates
  the model assumptions
  - My 2^k is unbalanced i.e. missing data and an additional level I
  also include in one of the factors i.e. C
  - I was referring in the OP to the 4-way interactions and not 2-way, I'm
  sorry for my confusion.
  - I tried to create an aov model with fewer interactions this way but I get
  the following error:
 
  model.aov <- aov(log(R)~A+B+I(A*B)+C+D,data=throughput)
  Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
   contrasts can be applied only to factors with 2 or more levels
  In addition: Warning message:
  In Ops.factor(A, B) : * not meaningful for factors
 
  Here I was trying to say: do a one

Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-21 Thread Giovanni Azua
Hello Rob,

Thank you for your suggestions. I tried glm too without success. Anyhow I 
include all the information just in case someone with good knowledge can give 
me a hand with this. I take the log of the response variable because: 
- its values span multiple orders of magnitude 
- the diagnostic plots (e.g. QQ, residuals vs fitted) do improve with that.
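
The effect of a log transform on the residual diagnostics can be illustrated on simulated data. This is a hedged sketch, not a reanalysis of the `throughput` data: the factor `g`, the response `y`, and the effect sizes are invented, and the Shapiro-Wilk statistic stands in for the visual checks mentioned in the message.

```r
set.seed(1)

# Hypothetical data: one two-level factor and a response whose noise is
# multiplicative, so its values spread over orders of magnitude.
g <- factor(rep(c("low", "high"), each = 100))
y <- exp(ifelse(g == "high", 6, 4) + rnorm(200, sd = 1))

fit_raw <- aov(y ~ g)
fit_log <- aov(log(y) ~ g)

# Shapiro-Wilk W statistic of the residuals; values closer to 1 are more
# consistent with normality.
w_raw <- unname(shapiro.test(residuals(fit_raw))$statistic)
w_log <- unname(shapiro.test(residuals(fit_log))$statistic)
print(c(raw = w_raw, log = w_log))

# The usual visual checks (residuals vs fitted, normal Q-Q), if desired:
# par(mfrow = c(1, 2)); plot(fit_log, which = 1:2)
```

A formal test is only one summary of the residuals; the plots remain the more informative check, particularly when outliers are suspected.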

Below I include:
1) general summary of my data
2) 1-way anova and summary of the model
3) 4-way anova and summary of the model  

Attached:
a) Overview of the data (where main interactions occur i.e. No_databases and 
No_middlewares)
b) diagnostic plots for 2) Here the Normality assumption of the residuals looks 
reasonable
c) diagnostic plots for 3) Here the Normality assumption of the residuals does 
not seem to hold so it invalidates the 4-way aov model?

I tried glm and it delivers similar results as 3)

My impression is that my system is heavily polluted with outliers; one can see 
from plot a) how much the mean and the median differ because of them. 
That's just the way the system I implemented behaves. Btw the system is a 
multi-tiered architecture that I developed in Java from scratch that includes 
XA and different data access and partitioning patterns. I need to 
quantitatively analyze and draw conclusions from this system. Most of my 
classmates just make it real simple: run the 2^k experiments, take one grand 
mean out of each experiment and do the ANOVA on those means i.e. 1-repetition, 
compute the fraction of variation and that's it. I am trying to model it more 
deeply by checking model assumptions, etc. 

Many thanks in advance,
Best regards,
Giovanni

> str(throughput)
'data.frame':   479 obs. of  9 variables:
 $ Time              : num  7 8 9 10 11 12 13 14 15 16 ...
 $ Throughput        : int  155 155 154 157 155 214 4631 2118 136 132 ...
 $ Workload          : chr  "All" "All" "All" "All" ...
 $ No_databases      : Factor w/ 2 levels "1","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ Partitioning      : Factor w/ 2 levels "sharding","replication": 1 1 1 1 1 1 1 1 1 1 ...
 $ No_middlewares    : Factor w/ 3 levels "1","2","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ Queue_size        : Factor w/ 2 levels "40","100": 1 1 1 1 1 1 1 1 1 1 ...
 $ No_clients        : Factor w/ 1 level "64": 1 1 1 1 1 1 1 1 1 1 ...
 $ Experimental_error: Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
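
The str() output lists two factors with a single level (No_clients and Experimental_error). Neither is used in the models fitted in this message, but including one in a formula would raise the "2 or more levels" contrasts error. A small sketch of how to spot such columns before fitting, using a hypothetical stand-in data frame `toy` with invented values rather than the real `throughput` data:

```r
# Hypothetical stand-in for a data frame like `throughput`.
toy <- data.frame(
  No_databases = factor(c("1", "4", "1", "4")),
  Queue_size   = factor(c("40", "40", "100", "100")),
  No_clients   = factor(rep("64", 4)),   # constant in every row
  Throughput   = c(155, 744, 132, 1205)
)

# Number of levels for each factor column.
is_fac   <- vapply(toy, is.factor, logical(1))
n_levels <- vapply(toy[is_fac], nlevels, integer(1))
print(n_levels)

# Factors that cannot be given contrasts (fewer than two levels).
unusable <- names(n_levels)[n_levels < 2]
print(unusable)
```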

> summary(throughput)
      Time         Throughput       Workload         No_databases      Partitioning No_middlewares
 Min.   : 7.00   Min.   :  35.0   Length:479         1:239        sharding   :240   1:160
 1st Qu.:11.50   1st Qu.:  50.5   Class :character   4:240        replication:239   2:159
 Median :16.00   Median : 744.0   Mode  :character                                  4:160
 Mean   :16.48   Mean   : 830.3
 3rd Qu.:21.00   3rd Qu.:1205.5
 Max.   :26.00   Max.   :4631.0
 Queue_size No_clients Experimental_error
 40 :240    64:479     1:479
 100:239

## ###
##
##  ANOVA one-way interaction
##
## ###
> throughput.aov <- aov(log(Throughput)~No_databases+Partitioning+No_middlewares+Queue_size,data=throughput)
> throughput.aov
Call:
   aov(formula = log(Throughput) ~ No_databases + Partitioning + 
    No_middlewares + Queue_size, data = throughput)

Terms:
                No_databases Partitioning No_middlewares Queue_size Residuals
Sum of Squares      521.5264       5.6971        50.5814     0.4628  476.6826
Deg. of Freedom            1            1              2          1       473

Residual standard error: 1.003885
Estimated effects may be unbalanced
> summary(throughput.aov)
                Df Sum Sq Mean Sq  F value    Pr(>F)
No_databases     1 521.53  521.53 517.4974 < 2.2e-16 ***
Partitioning     1   5.70    5.70   5.6530   0.01782 *
No_middlewares   2  50.58   25.29  25.0953 4.381e-11 ***
Queue_size       1   0.46    0.46   0.4592   0.49833
Residuals      473 476.68    1.01
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
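
The "fraction of variation" mentioned earlier in this message can be computed directly from the sums of squares of the additive model. The sketch below only does arithmetic on the figures reported in the Terms table; because the design is unbalanced these are sequential sums of squares, so the split between factors would change if the terms were entered in a different order.

```r
# Sums of squares copied from the Terms table of the additive model.
ss <- c(No_databases   = 521.5264,
        Partitioning   = 5.6971,
        No_middlewares = 50.5814,
        Queue_size     = 0.4628,
        Residuals      = 476.6826)

# Fraction of the total variation in log(Throughput) attributed to each row.
fraction <- ss / sum(ss)
print(round(100 * fraction, 1))
```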
 

## ###
##
##  ANOVA 4-way interaction
##
## ###

> throughput.aov <- aov(log(Throughput)~No_databases*Partitioning*No_middlewares*Queue_size,data=throughput)
> throughput.aov
Call:
   aov(formula = log(Throughput) ~ No_databases * Partitioning * 
    No_middlewares * Queue_size, data = throughput)

Terms:
                No_databases Partitioning No_middlewares Queue_size No_databases:Partitioning
Sum of Squares      521.5264       5.6971        50.5814     0.4628                   96.9198
Deg. of Freedom            1            1              2          1

Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-21 Thread Giovanni Azua

On Nov 21, 2011, at 8:31 PM, Bert Gunter wrote:
 we disagree is that I think data analysts with limited statistical
 backgrounds should consult with local statisticians instead of trying
 to muddle through on their own thru lists like this. This is not meant

I think that people lacking reading skills should not be subscribed in lists 
like this one, bullying and creating confusion around. 

I will ask as many times as I want/need and the way I use lists is none of 
your f. b. 

 to be arrogance on my part -- though it may seem to come across that
 way -- but rather a plea for good science. I believe that bad
 statistics -- bad science, a problem that I see as pervasive and
 inimical to scientific progress, especially in today's data saturated
 world.

 But enough of my off topic B.S. Please reply privately to not waste
 yet more space here (positively or negatively -- stone throwers need
 to catch them, too).
 

You are actually full of your off topic prime matter, you arrogant prick.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-21 Thread Rolf Turner


This sort of post seems to me to be completely unacceptable.
Is there a mechanism by which the list manager can unsubscribe
Mr. Azua and keep him unsubscribed until he learns some manners?

cheers,

Rolf Turner

On 22/11/11 10:28, Giovanni Azua wrote:

On Nov 21, 2011, at 8:31 PM, Bert Gunter wrote:

we disagree is that I think data analysts with limited statistical
backgrounds should consult with local statisticians instead of trying
to muddle through on their own thru lists like this. This is not meant

I think that people lacking reading skills should not be subscribed in lists 
like this one, bullying and creating confusion around.

I will ask as many times as I want/need and the way I use lists is none of 
your f. b.


to be arrogance on my part -- though it may seem to come across that
way -- but rather a plea for good science. I believe that bad
statistics --  bad science, a problem that I see as pervasive and
inimical to scientific progress, especially in today's data saturated
world.
But enough of my off topic B.S. Please reply privately to not waste
yet more space here (positively or negatively -- stone throwers need
to catch them, too).


You are actually full of your off topic prime matter, you arrogant prick.