Hi

James M. Clark
Professor & Chair of Psychology
[email protected]
Room 4L41A
204-786-9757
204-774-4134 Fax
Dept of Psychology, U of Winnipeg
515 Portage Ave, Winnipeg, MB R3B 0R4 CANADA
>>> "Mike Palij" <[email protected]> 12-Apr-13 7:00 AM >>> On Thu, 11 Apr 2013 19:50:14 -0700, Jim Clark wrote: Consider Leo Dicara's research and if a meta-analysis were done of his research on operant conditioning of the autonomic nervous system. After the initial positive findings, replications fail and stop being done. But, net-net, there will be some non-zero effect size because (a) of the early effects and (b) the overall large sample size. But Dworkin and Miller show the problems in this: JC: I was simply asking a statistical question about whether aggregating p values would produce different results than a test on the entire sample using exactly the same data. As Mike P has focused on cases where there is no effect, I changed my simulation so that the null was true (i.e., samples drawn from mu = 50 vs pop mu = 50). The aggregate results for 250 samples of 10 produced a p = ~.5 using Fisher's procedure. The overall t was also not even close to significant. Mike P carries on: You consistently avoid the issue of: (1) Making a firm decision of what effect size a researcher thinks is present and whether it is best to view it as a fixed effect or a random effect. JC: Yes, I was not interested in that question, as noted above, except to examine the case of an effect with low power for individual tests. And I'm not aware that fixed vs random factors applies to a single sample test. I thought it had to do with the levels of a factor when multiple conditions were being compared, and across studies the conditions stayed the same (fixed) or varied (random). Mike P: (2) Doing an a priori power analysis in order to determine what is the probability of detecting an effect (i.e., prob of reject a false null hypothesis). If statistical power is less than .50, I think that it is unethical to allow such research to be done -- who wants to do research where the probability of making a Type II error is greater than 50%. In the course of doing an a priori power analysis, one can determine what the number of subjects/participants one will need to detect the effect size one has specified (and the total sample will probably some more people in order to take into account subject loss due to attrition, errors made in procedure, acts of God, etc.). JC: I've never been enamored of the idea of mixing methods and ethics questions, but perhaps you have a more positive view of REBs than I do. And if a researcher recognizes the dangers of weak power and acts accordingly, I'm not sure what the ethical issues would be. If you can't get enough subjects for an acceptable level of power (e.g., power= .80, which I consider to be low because it means that there is a 20% chance of committing a Type II error, a rate 4 times that of making a Type I error -- this makes clear a researcher's bias and costs associated with making errors), one shouldn't do the study. One might consider doing a pilot study to get an estimate of the effect size that one might obtain and if it is too small to be detected given your resources, do a qualitative study instead. Neuroimaging studies are expensive all over the place and it is very bad practice to use them in studies where it is almost impossible to detect a false null hypothesis. They should be used only in studies where firm conclusions can be reached (i..e., high powered, properly conducted studies). This is a waste of precious resources and this is the type of practice the Button et al complain about. 
Using low-power studies and then meta-analyzing them may result in one detecting systematic errors and biases unrelated to the phenomenon being studied (i.e., the "tweaking" that researchers do to get statistically significant results). Meta-analyze DiCara's published studies and tell me what mean effect one obtains. After you do so, I'll explain why it's wrong.

JC: There may very well be biases in the system (researchers, publication practices, press releases, ...), which I acknowledged several times, that make the aggregate approach problematic. My question was simply whether the statistical approach of aggregating ps itself was problematic. Under the idealistic circumstances of a simulation (sketched below my sign-off), it indeed appears to NOT DETECT effects that are NOT there, and to DETECT effects that ARE there, albeit effects rarely significant in the individual samples.

Take care
Jim
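PS: For anyone who wants to reproduce the simulation described above, here is a minimal sketch in Python (assuming numpy and scipy are available; the population SD of 10 is my own arbitrary choice, since only mu = 50 was specified above):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Null is true: 250 samples of n = 10, drawn from the same population
# they are tested against (mu = 50; sd = 10 is an arbitrary assumption).
data = rng.normal(loc=50, scale=10, size=(250, 10))

# One-sample t-test of each of the 250 samples against mu = 50.
t, p = stats.ttest_1samp(data, popmean=50, axis=1)

# Fisher's procedure: chi-square = -2 * sum(ln p), on 2k df.
chi2, p_fisher = stats.combine_pvalues(p, method='fisher')

# The overall t: a single test on the entire pooled sample of 2500 scores.
t_all, p_all = stats.ttest_1samp(data.ravel(), popmean=50)

print(f"Fisher combined p: {p_fisher:.3f}")  # not close to significance under the null
print(f"Pooled-sample p:   {p_all:.3f}")     # likewise far from significant

Changing loc to something like 51 (a small effect relative to sd = 10) shows the other half of the claim: few of the individual tests reach significance, but both the Fisher aggregate and the pooled t readily do.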
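PPS: And for point (2), a minimal sketch of what the a priori power calculation might look like (assuming the statsmodels library; the effect size of d = 0.4 and the one-sample t-test are arbitrary illustrative choices, not values from this thread):

# A priori power analysis for a one-sample t-test (illustrative values only).
from statsmodels.stats.power import TTestPower

analysis = TTestPower()

# How many subjects to detect d = 0.4 with power = .80 at alpha = .05?
n_required = analysis.solve_power(effect_size=0.4, alpha=0.05, power=0.80,
                                  alternative='two-sided')
print(f"n required: {n_required:.1f}")  # roughly 51

# Conversely: with only n = 10 per study, what power does one actually have?
achieved = analysis.solve_power(effect_size=0.4, nobs=10, alpha=0.05,
                                alternative='two-sided')
print(f"power at n = 10: {achieved:.2f}")  # well under .50

On those hypothetical numbers, studies of n = 10 fall far below the .50 threshold, which is exactly the situation in which Mike P argues the study should not be run.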
