Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
"Conclusion: try to be/become a good scientist: with a high prevalence of good ideas." Or, I would say: "try to publish only good and mature ideas". Gauss said it best "pauca sed matura" or "few, but ripe." Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Claudia Beleites Sent: Friday, January 07, 2011 1:46 PM To: r-help@r-project.org Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity On 01/07/2011 06:13 AM, Spencer Graves wrote: > A more insidious problem, that may not affect the work of Jonah > Lehrer, is political corruption in the way research is funded, with > less public and more private funding of research Maybe I'm too pessimistic, but the term _political_ corruption reminds me that I can just as easily imagine a "funding bias"* in public funding. And I'm not sure it is (or would be) less of a problem just because the interests of private funding are easier to spot. * I think of bias on both sides: the funding agency selecting the studies to support and the researcher subconsciously complying to the expectations of the funding agency. On 01/07/2011 08:06 AM, Peter Langfelder wrote: > > From a purely statistical and maybe somewhat naive point of view, > published p-values should be corrected for the multiple testing that > is effectively happening because of the large number of published > studies. My experience is also that people will often try several > statistical methods to get the most significant p-value but neglect to > share that fact with the audience and/or at least attempt to correct > the p-values for the selection bias. Even if the number of all the tests were known, I have the impression that the corrected p-value would be kind of the right answer to the wrong question. I'm not particularly interested in the probability of arriving at the presented findings if the null hypothesis were true. I'd rather know the probability that the conclusions are true. Switching to the language of clinical chemistry, this is: I'm presented with the sensitivity of a test, but I really want to know the positive predictive value. What is still missing with the corrected p-values is the "prevalence of good ideas" of the publishing scientist (not even known for all scientists). And I'm not sure this is not decreasing if the scientist generates and tests more and more ideas. I found my rather hazy thoughts about this much better expressed in the books of Beck-Bornholdt and Dubben (which I'm afraid are only available in German). Conclusion: try to be/become a good scientist: with a high prevalence of good ideas. At least with a high prevalence of good ideas among the tested hypotheses. Including thinking first which hypotheses are the ones to test, and not giving in to the temptation to try out more and more things as one gets more familiar with the experiment/data set/problem. The latter I find very difficult. Including the experience of giving a presentation where I explicitly talked about why I did not do any data-driven optimization of my models. Yet in the discussion I was very prominently told I need to try in addition these other pre-processing techniques and these other modeling techniques - even by people whom I know to be very much aware and concerned about optimistically biased validation results. 
These were of course very valid questions (and easy to comply with), but I conclude it is common/natural/human to have and want to try out more ideas. Also, after several years in the field and with the same kind of samples, I of course run the risk of my ideas being overfit to our kind of samples - this is a cost I have to pay for the gain due to experience/expertise. Some more thoughts: - reproducibility: I'm an analytical chemist. We have huge amounts of work going into round robin trials in order to measure the "natural" variability of different labs on very well-defined systems. - we also have huge amounts of work going into calibration transfer, i.e. making quantitative predictive models work on a different instrument. This is always a whole lot of work, and for some classes of problems it is at the moment considered basically impossible even between two instruments of the same model and manufacturer. The quoted results on the mice are not very astonishing to me... ;-) - Talking about (not so) astonishing differences between replications of experiments: I find myself moving from reporting ± 1 standard deviation to reporting e.g. the 5th to 95th percentiles. Not only because my data distributions are often not symmetric, but also because I find I'm not able to directly perceive the real spread of the data from a standard deviation error bar.
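A small R illustration of that reporting choice, using simulated skewed data as a stand-in for real measurements:

    x <- rexp(200)                        # skewed example data
    mean(x) + c(-1, 1) * sd(x)            # mean +/- 1 SD summary
    quantile(x, probs = c(0.05, 0.95))    # 5th to 95th percentile: shows the asymmetry directly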
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
The issue Spencer brings up is a problem whether the funding is private or public. Just as businesses fund studies that support their goals, government agencies fund studies that justify the need for their services and the expansion of their powers and budgets. In fact, there's a whole field of study, variously called "public choice economics" and "the new institutional economics", that studies these and related issues. On a related note, there is certainly a lot of self-selection bias in what fields of study people choose to enter. For just one example, it isn't too difficult to believe that, of the pool of people talented and interested in statistics, those who choose to enter public health or epidemiology might be more likely to want research that justifies expansion of public health and environmental agencies' regulatory powers, and this might affect the research questions they ask, the ways they design and select their statistical models, and what results they choose to include in and exclude from publications. AFAIK, there is substantial evidence that researchers, especially in non-experimental studies, tend to get results they "expect" or "hope" to find, even if they feel no conscious bias. This is likely one of the reasons observational studies are so frequently overturned by randomized controlled trials. RCTs provide less room for confirmation bias to rear its ugly head. Joel -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Spencer Graves Sent: Thursday, January 06, 2011 9:13 PM To: Carl Witthoft Cc: r-help@r-project.org Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity > A more insidious problem, that may not affect the work of Jonah Lehrer, >is political corruption in the way research is funded, with less public and more private funding of research (http://portal.unesco.org/education/en/ev.php-URL_ID=21052&URL_DO=DO_TOPIC&URL_SECTION=201.html). ...as private funding replaces tax money for basic science, we must expect an increase in research results that match the needs of the funding agency while degrading the quality of published research. This produces more research that cannot be replicated -- effects that get smaller upon replication. (My wife and I routinely avoid certain therapies recommended by physicians, because the physicians get much of their information on recent drugs from the pharmaceuticals, who have a vested interest in presenting their products in the most positive light.) Spencer On 1/6/2011 2:39 PM, Carl Witthoft wrote: > The next week's New Yorker has some decent rebuttal letters. The case > is hardly as clear-cut as the author would like to believe. > > Carl > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
On 01/07/2011 06:13 AM, Spencer Graves wrote: A more insidious problem, that may not affect the work of Jonah Lehrer, is political corruption in the way research is funded, with less public and more private funding of research Maybe I'm too pessimistic, but the term _political_ corruption reminds me that I can just as easily imagine a "funding bias"* in public funding. And I'm not sure it is (or would be) less of a problem just because the interests of private funding are easier to spot. * I think of bias on both sides: the funding agency selecting the studies to support and the researcher subconsciously complying with the expectations of the funding agency. On 01/07/2011 08:06 AM, Peter Langfelder wrote: > From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias. Even if the number of all the tests were known, I have the impression that the corrected p-value would be kind of the right answer to the wrong question. I'm not particularly interested in the probability of arriving at the presented findings if the null hypothesis were true. I'd rather know the probability that the conclusions are true. Switching to the language of clinical chemistry, this is: I'm presented with the sensitivity of a test, but I really want to know the positive predictive value. What is still missing with the corrected p-values is the "prevalence of good ideas" of the publishing scientist (not even known for all scientists). And I'm not sure this prevalence isn't decreasing if the scientist generates and tests more and more ideas. I found my rather hazy thoughts about this much better expressed in the books of Beck-Bornholdt and Dubben (which I'm afraid are only available in German). Conclusion: try to be/become a good scientist: with a high prevalence of good ideas. At least with a high prevalence of good ideas among the tested hypotheses. Including thinking first about which hypotheses are the ones to test, and not giving in to the temptation to try out more and more things as one gets more familiar with the experiment/data set/problem. The latter I find very difficult. Including the experience of giving a presentation where I explicitly talked about why I did not do any data-driven optimization of my models. Yet in the discussion I was very prominently told that I should in addition try these other pre-processing techniques and these other modeling techniques - even by people whom I know to be very much aware of and concerned about optimistically biased validation results. These were of course very valid questions (and easy to comply with), but I conclude it is common/natural/human to have and want to try out more ideas. Also, after several years in the field and with the same kind of samples, I of course run the risk of my ideas being overfit to our kind of samples - this is a cost I have to pay for the gain due to experience/expertise. Some more thoughts: - reproducibility: I'm an analytical chemist. We have huge amounts of work going into round robin trials in order to measure the "natural" variability of different labs on very well-defined systems. - we also have huge amounts of work going into calibration transfer, i.e.
making quantitative predictive models work on a different instrument. This is always a whole lot of work, and for some classes of problems it is at the moment considered basically impossible even between two instruments of the same model and manufacturer. The quoted results on the mice are not very astonishing to me... ;-) - Talking about (not so) astonishing differences between replications of experiments: I find myself moving from reporting ± 1 standard deviation to reporting e.g. the 5th to 95th percentiles. Not only because my data distributions are often not symmetric, but also because I find I'm not able to directly perceive the real spread of the data from a standard deviation error bar. This is all about perception; of course I can reflect on the meaning. Such a reflection also tells me that one student having a really unlikely number of right guesses is unlikely but not impossible. There is no statistical law stating that unlikely events happen only with large sample sizes/numbers of tests. Yet the immediate perception is completely different. - I happily agree with the ideas of publishing findings (conclusions) as well as the data and data analysis code I used to arrive at them. But I'm aware that part of this agreement is due to the fact that I'm quite interested in the data analytical methods (I'd say as well as in the particular chemical-analytical problem at hand, but rat
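Claudia's sensitivity-versus-positive-predictive-value point can be written down directly with Bayes' theorem; a minimal R sketch in which the significance level plays the role of 1 - specificity, power plays the role of sensitivity, and the "prevalence of good ideas" is an assumed, invented number:

    alpha <- 0.05        # type I error rate (1 - specificity)
    power <- 0.80        # sensitivity of the test
    prev  <- 0.10        # assumed prevalence of true hypotheses among those tested
    ppv <- power * prev / (power * prev + alpha * (1 - prev))
    ppv  # about 0.64: even among "significant" results, roughly a third would be false leads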
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
--- On Fri, 1/7/11, Peter Langfelder wrote: > From: Peter Langfelder > Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias, > scientific validity > To: "r-help@r-project.org" > Received: Friday, January 7, 2011, 2:06 AM > >From a purely statistical and > maybe somewhat naive point of view, > published p-values should be corrected for the multiple > testing that > is effectively happening because of the large number of > published > studies. My experience is also that people will often try > several > statistical methods to get the most significant p-value but > neglect to > share that fact with the audience and/or at least attempt > to correct > the p-values for the selection bias. > > That being said, it would seem that biomedical sciences do > make > progress, so some of the published results are presumably > correct :) > Totally a placebo effect :) > Peter > > On Thu, Jan 6, 2011 at 9:13 PM, Spencer Graves > > wrote: > > Part of the phenomenon can be explained by the > natural censorship in > > what is accepted for publication: Stronger results > tend to have less > > difficulty getting published. Therefore, given that > a result is published, > > it is evident that the estimated magnitude of the > effect is in average > > larger than it is in reality, just by the fact that > weaker results are less > > likely to be published. A study of the literature on > this subject might > > yield an interesting and valuable estimate of the > magnitude of this > > selection bias. > > > > > > A more insidious problem, that may not affect > the work of Jonah Lehrer, > > is political corruption in the way research is funded, > with less public and > > more private funding of research > > (http://portal.unesco.org/education/en/ev.php-URL_ID=21052&URL_DO=DO_TOPIC&URL_SECTION=201.html). > > For example, I've heard claims (which I cannot > substantiate right now) that > > cell phone companies allegedly lobbied successfully to > block funding for > > researchers they thought were likely to document > health problems with their > > products. Related claims have been made by > scientists in the US Food and > > Drug Administration that certain therapies were > approved on political > > grounds in spite of substantive questions about the > validity of the research > > backing the request for approval (e.g., > > www.naturalnews.com/025298_the_FDA_scientists.html). > Some of these > > accusations of political corruption may be groundless. > However, as private > > funding replaces tax money for basic science, we must > expect an increase in > > research results that match the needs of the funding > agency while degrading > > the quality of published research. This produces > more research that can not > > be replicated -- effects that get smaller upon > replication. (My wife and I > > routinely avoid certain therapies recommended by > physicians, because the > > physicians get much of their information on recent > drugs from the > > pharmaceuticals, who have a vested interest in > presenting their products in > > the most positive light.) > > > > > > Spencer > > > > > > On 1/6/2011 2:39 PM, Carl Witthoft wrote: > >> > >> The next week's New Yorker has some decent > rebuttal letters. The case is > >> hardly as clear-cut as the author would like to > believe. 
> >> > >> Carl > > __ > R-help@r-project.org > mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, > reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
I think that the strategy of Editors simply telling the authors to share or perish is a bit naïve. There are a number of practical challenges that need to be addressed in order to create a fair and effective open-learning environment. Eysenbach (BMJ 2001) and Vickers (2006) discuss these and some partial solutions. We need more creative thinking that uses both carrot and sticks. We also need more empirical experience with this. Perhaps, we can learn from fields, if there are any, that do a good job of data sharing and open learning. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu -Original Message- From: Spencer Graves [mailto:spencer.gra...@structuremonitoring.com] Sent: Friday, January 07, 2011 1:01 PM To: Ravi Varadhan Cc: 'Mike Marchywka'; r-help@r-project.org Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity I applaud your efforts, Ravi. Regarding "Whose data is it?", I humbly suggest that referees and editorial boards push (demand?) for rules that require the raw data be made available to the referees and concurrent with publication. Spencer On 1/7/2011 8:43 AM, Ravi Varadhan wrote: > I have just recently written about this issue (i.e. open learning and data > sharing) in a manuscript that is currently under review in a clinical > journal. I have argued that data hoarding is unethical. Participants in > research studies give their time, effort, saliva and blood in the altruistic > hope that their sacrifice will benefit humankind. If they were to realize > that the real (ulterior) motive of the study investigators is only to > advance their careers, they would really think hard about participating in > the studies. The study participants should only consent to participate if > they can get a signed assurance from the investigators that the > investigators will make their data available for scrutiny and for public use > (under some reasonable conditions that are fair to the study investigators). > As Vickers (Trials 2006) says, "whose data is it anyway?" I believe that we > can achieve great progress in clinical research if and only if we make a > concerted effort towards open learning. Stakeholders (i.e. patients, > clinicians, policy-makers) should demand that all the data that is > potentially relevant to addressing a critical clinical question should be > made available in an open learning environment. Unless, we can achieve this > we cannot solve the problems of publication bias and inefficient and > sub-optimal use of data. > > Best, > Ravi. > --- > Ravi Varadhan, Ph.D. > Assistant Professor, > Division of Geriatric Medicine and Gerontology School of Medicine Johns > Hopkins University > > Ph. (410) 502-2619 > email: rvarad...@jhmi.edu > > > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On > Behalf Of Spencer Graves > Sent: Friday, January 07, 2011 8:26 AM > To: Mike Marchywka > Cc: r-help@r-project.org > Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, > scientific validity > > I wholeheartedly agree with the trend towards publishing datasets. > One way to do that is as datasets in an R package contributed to CRAN. > > > Beyond this, there seems to be an increasing trend towards journals > requiring authors of scientific research to publish their data as well. 
The > Public Library of Science (PLOS) has such a policy, but it is not enforced: > Savage and Vickers (2010) were able to get the raw data behind only one of > ten published articles they tried, and that one came only after reminding > the author that s/he had agreed to making the data available as a condition > of publishing in PLOS. (Four other authors refused to share their data in > spite of their legal and moral commitment to do so as a condition of > publishing in PLOS.) > > > There are other venues for publishing data. For example, much > astronomical data is now routinely web published so anyone interested can > test their pet algorithm on real data > (http://sites.google.com/site/vousergroup/presentations/publishing-astronomi > cal-data). > > > > Regarding my earlier comment, I just found a Wikipedia article on > "scientific misconduct" that mentioned the tendency to refuse to publish > research that proves your new drug is positively harmful. This is an > extreme version of both types of bias I previously mentioned: (1) only > significant resu
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
I applaud your efforts, Ravi. Regarding "Whose data is it?", I humbly suggest that referees and editorial boards push (demand?) for rules that require the raw data be made available to the referees and concurrent with publication. Spencer On 1/7/2011 8:43 AM, Ravi Varadhan wrote: I have just recently written about this issue (i.e. open learning and data sharing) in a manuscript that is currently under review in a clinical journal. I have argued that data hoarding is unethical. Participants in research studies give their time, effort, saliva and blood in the altruistic hope that their sacrifice will benefit humankind. If they were to realize that the real (ulterior) motive of the study investigators is only to advance their careers, they would really think hard about participating in the studies. The study participants should only consent to participate if they can get a signed assurance from the investigators that the investigators will make their data available for scrutiny and for public use (under some reasonable conditions that are fair to the study investigators). As Vickers (Trials 2006) says, "whose data is it anyway?" I believe that we can achieve great progress in clinical research if and only if we make a concerted effort towards open learning. Stakeholders (i.e. patients, clinicians, policy-makers) should demand that all the data that is potentially relevant to addressing a critical clinical question should be made available in an open learning environment. Unless, we can achieve this we cannot solve the problems of publication bias and inefficient and sub-optimal use of data. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Spencer Graves Sent: Friday, January 07, 2011 8:26 AM To: Mike Marchywka Cc: r-help@r-project.org Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, scientific validity I wholeheartedly agree with the trend towards publishing datasets. One way to do that is as datasets in an R package contributed to CRAN. Beyond this, there seems to be an increasing trend towards journals requiring authors of scientific research to publish their data as well. The Public Library of Science (PLOS) has such a policy, but it is not enforced: Savage and Vickers (2010) were able to get the raw data behind only one of ten published articles they tried, and that one came only after reminding the author that s/he had agreed to making the data available as a condition of publishing in PLOS. (Four other authors refused to share their data in spite of their legal and moral commitment to do so as a condition of publishing in PLOS.) There are other venues for publishing data. For example, much astronomical data is now routinely web published so anyone interested can test their pet algorithm on real data (http://sites.google.com/site/vousergroup/presentations/publishing-astronomi cal-data). Regarding my earlier comment, I just found a Wikipedia article on "scientific misconduct" that mentioned the tendency to refuse to publish research that proves your new drug is positively harmful. This is an extreme version of both types of bias I previously mentioned: (1) only significant results get published. (2) private funding provides its own biases. 
Spencer # Savage and Vickers (2010) "Empirical Study Of Data Sharing By Authors Publishing In PLoS Journals", Scientific Data Sharing, added Apr. 26, 2010 (http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-b y-authors-publishing-in-plos-journals-2 <http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-b y-authors-publishing-in-plos-journals-2/>). On 1/7/2011 4:08 AM, Mike Marchywka wrote: Date: Thu, 6 Jan 2011 23:06:44 -0800 From: peter.langfel...@gmail.com To: r-help@r-project.org Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias. You see this everywhere in one form or another from medical to financial modelling. My solution here is simply to publish more raw data in a computer readable form, in this case of course something easy to get with R, so disinterested or adversa
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
Bert, consider the short rebuttal offered by George Musser in Scientific American: http://www.scientificamerican.com/blog/post.cfm?id=in-praise-of-scientific-error-2010-12-20 Perhaps a more realistic assessment of the (acknowledged) problem. Regards, Alan Kelly Trinity College Dublin On 7 Jan 2011, at 11:00, r-help-requ...@r-project.org wrote: Message: 54 Date: Thu, 6 Jan 2011 10:56:34 -0800 From: Bert Gunter <gunter.ber...@gene.com> To: r-help@r-project.org Subject: [R] Wyy off topic...Statistical methods, pub bias, scientific validity Message-ID: <aanlktinvwp0bm864aedpr=hb-r=e_=b7zgftwdbxn...@mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Folks: The following has NOTHING (obvious) to do with R. But I believe that all on this list would find it relevant and, I hope, informative. It is LONG. I apologize in advance to those who feel I have wasted their time. http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer Best regards to all, Bert -- Bert Gunter Genentech Nonclinical Biostatistics [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
I have just recently written about this issue (i.e. open learning and data sharing) in a manuscript that is currently under review in a clinical journal. I have argued that data hoarding is unethical. Participants in research studies give their time, effort, saliva and blood in the altruistic hope that their sacrifice will benefit humankind. If they were to realize that the real (ulterior) motive of the study investigators is only to advance their careers, they would really think hard about participating in the studies. The study participants should only consent to participate if they can get a signed assurance from the investigators that the investigators will make their data available for scrutiny and for public use (under some reasonable conditions that are fair to the study investigators). As Vickers (Trials 2006) says, "whose data is it anyway?" I believe that we can achieve great progress in clinical research if and only if we make a concerted effort towards open learning. Stakeholders (i.e. patients, clinicians, policy-makers) should demand that all the data that is potentially relevant to addressing a critical clinical question should be made available in an open learning environment. Unless, we can achieve this we cannot solve the problems of publication bias and inefficient and sub-optimal use of data. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Spencer Graves Sent: Friday, January 07, 2011 8:26 AM To: Mike Marchywka Cc: r-help@r-project.org Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity I wholeheartedly agree with the trend towards publishing datasets. One way to do that is as datasets in an R package contributed to CRAN. Beyond this, there seems to be an increasing trend towards journals requiring authors of scientific research to publish their data as well. The Public Library of Science (PLOS) has such a policy, but it is not enforced: Savage and Vickers (2010) were able to get the raw data behind only one of ten published articles they tried, and that one came only after reminding the author that s/he had agreed to making the data available as a condition of publishing in PLOS. (Four other authors refused to share their data in spite of their legal and moral commitment to do so as a condition of publishing in PLOS.) There are other venues for publishing data. For example, much astronomical data is now routinely web published so anyone interested can test their pet algorithm on real data (http://sites.google.com/site/vousergroup/presentations/publishing-astronomi cal-data). Regarding my earlier comment, I just found a Wikipedia article on "scientific misconduct" that mentioned the tendency to refuse to publish research that proves your new drug is positively harmful. This is an extreme version of both types of bias I previously mentioned: (1) only significant results get published. (2) private funding provides its own biases. Spencer # Savage and Vickers (2010) "Empirical Study Of Data Sharing By Authors Publishing In PLoS Journals", Scientific Data Sharing, added Apr. 
26, 2010 (http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-b y-authors-publishing-in-plos-journals-2 <http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-b y-authors-publishing-in-plos-journals-2/>). On 1/7/2011 4:08 AM, Mike Marchywka wrote: > > > > > > >> Date: Thu, 6 Jan 2011 23:06:44 -0800 >> From: peter.langfel...@gmail.com >> To: r-help@r-project.org >> Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, >> scientific validity >> >> > From a purely statistical and maybe somewhat naive point of view, >> published p-values should be corrected for the multiple testing that >> is effectively happening because of the large number of published >> studies. My experience is also that people will often try several >> statistical methods to get the most significant p-value but neglect >> to share that fact with the audience and/or at least attempt to >> correct the p-values for the selection bias. > You see this everywhere in one form or another from medical to > financial modelling. My solution here is simply to publish more raw > data in a computer readable form, in this case of course something > easy to get with R, so disinterested or adversarial parties can run their own "analysis." > I think there was also a push to create a data base for failed drug > trials that
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
I wholeheartedly agree with the trend towards publishing datasets. One way to do that is as datasets in an R package contributed to CRAN. Beyond this, there seems to be an increasing trend towards journals requiring authors of scientific research to publish their data as well. The Public Library of Science (PLOS) has such a policy, but it is not enforced: Savage and Vickers (2010) were able to get the raw data behind only one of ten published articles they tried, and that one came only after reminding the author that s/he had agreed to making the data available as a condition of publishing in PLOS. (Four other authors refused to share their data in spite of their legal and moral commitment to do so as a condition of publishing in PLOS.) There are other venues for publishing data. For example, much astronomical data is now routinely web published so anyone interested can test their pet algorithm on real data (http://sites.google.com/site/vousergroup/presentations/publishing-astronomical-data). Regarding my earlier comment, I just found a Wikipedia article on "scientific misconduct" that mentioned the tendency to refuse to publish research that proves your new drug is positively harmful. This is an extreme version of both types of bias I previously mentioned: (1) only significant results get published. (2) private funding provides its own biases. Spencer # Savage and Vickers (2010) "Empirical Study Of Data Sharing By Authors Publishing In PLoS Journals", Scientific Data Sharing, added Apr. 26, 2010 (http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-by-authors-publishing-in-plos-journals-2 <http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-by-authors-publishing-in-plos-journals-2/>). On 1/7/2011 4:08 AM, Mike Marchywka wrote: > > > > > > >> Date: Thu, 6 Jan 2011 23:06:44 -0800 >> From: peter.langfel...@gmail.com >> To: r-help@r-project.org >> Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, >> scientific validity >> >> > From a purely statistical and maybe somewhat naive point of view, >> published p-values should be corrected for the multiple testing that >> is effectively happening because of the large number of published >> studies. My experience is also that people will often try several >> statistical methods to get the most significant p-value but neglect to >> share that fact with the audience and/or at least attempt to correct >> the p-values for the selection bias. > You see this everywhere in one form or another from medical to financial > modelling. My solution here is simply to publish more raw data in a computer > readable form, in this case of course something easy to get with R, > so disinterested or adversarial parties can run their own "analysis." > I think there was also a push to create a data base for failed drug > trials that may contain data of some value later. The value of R with > easily available data for a large cross section of users could be to moderate > problems like the one cited here. > > I almost > slammed a poster here earlier who wanted a simple rule for "when do I use > this test" with something like " when your mom tells you to" since post > hoc you do just about everything to assume you messed up and missed something > but a priori you hope you have designed a good hypothesis. And at the end of > the day, a given p-value is one piece of evidence in the overall objective > of learning about some system, not appeasing a sponsor. 
Personally I'm a big > fan of post hoc analysis on biotech data in some cases, especially as more > pathway or other theory > is published, but it is easy to become deluded if you have a conclusion that > you > know JUST HAS TO BE RIGHT. > > Also FWIW, in the few cases I've examined with FDA-sponsor rhetoric, the > data I've been able to get tends to make me side with the FDA and I still > hate the > idea of any regulation or access restrictions but it seems to be the only way > to keep sponsors honest to any extent. Your mileage > may vary however, take a look at some rather loud disagreement with FDA > over earlier DNDN panel results, possibly involving threats against critics. > LOL. > > > > > >> That being said, it would seem that biomedical sciences do make >> progress, so some of the published results are presumably correct :) >> >> Peter >> >> On Thu, Jan 6, 2011 at 9:13 PM, Spencer Graves >> wrote: >>> Part of the phenomenon can be explained by the natural censorship in >>> what is accepted for publication: Stronger results tend to
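For the CRAN route Spencer mentions, a minimal sketch using base R's package.skeleton(); the data frame and package name below are only placeholders:

    mydata <- data.frame(id = 1:10, response = rnorm(10))   # stand-in for the real raw data
    package.skeleton(name = "studydata", list = "mydata")   # writes the dataset to studydata/data/mydata.rda
    # after documenting the dataset: R CMD build studydata, then R CMD check, then submit to CRAN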
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
> Date: Thu, 6 Jan 2011 23:06:44 -0800 > From: peter.langfel...@gmail.com > To: r-help@r-project.org > Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, > scientific validity > > >From a purely statistical and maybe somewhat naive point of view, > published p-values should be corrected for the multiple testing that > is effectively happening because of the large number of published > studies. My experience is also that people will often try several > statistical methods to get the most significant p-value but neglect to > share that fact with the audience and/or at least attempt to correct > the p-values for the selection bias. You see this everywhere in one form or another from medical to financial modelling. My solution here is simply to publish more raw data in a computer readable form, in this case of course something easy to get with R, so disinterested or adversarial parties can run their own "analysis." I think there was also a push to create a data base for failed drug trials that may contain data of some value later. The value of R with easily available data for a large cross section of users could be to moderate problems like the one cited here. I almost slammed a poster here earlier who wanted a simple rule for "when do I use this test" with something like " when your mom tells you to" since post hoc you do just about everything to assume you messed up and missed something but a priori you hope you have designed a good hypothesis. And at the end of the day, a given p-value is one piece of evidence in the overall objective of learning about some system, not appeasing a sponsor. Personally I'm a big fan of post hoc analysis on biotech data in some cases, especially as more pathway or other theory is published, but it is easy to become deluded if you have a conclusion that you know JUST HAS TO BE RIGHT. Also FWIW, in the few cases I've examined with FDA-sponsor rhetoric, the data I've been able to get tends to make me side with the FDA and I still hate the idea of any regulation or access restrictions but it seems to be the only way to keep sponsors honest to any extent. Your mileage may vary however, take a look at some rather loud disagreement with FDA over earlier DNDN panel results, possibly involving threats against critics. LOL. > > That being said, it would seem that biomedical sciences do make > progress, so some of the published results are presumably correct :) > > Peter > > On Thu, Jan 6, 2011 at 9:13 PM, Spencer Graves > wrote: > > Part of the phenomenon can be explained by the natural censorship in > > what is accepted for publication: Stronger results tend to have less > > difficulty getting published. Therefore, given that a result is published, > > it is evident that the estimated magnitude of the effect is in average > > larger than it is in reality, just by the fact that weaker results are less > > likely to be published. A study of the literature on this subject might > > yield an interesting and valuable estimate of the magnitude of this > > selection bias. > > > > > > A more insidious problem, that may not affect the work of Jonah Lehrer, > > is political corruption in the way research is funded, with less public and > > more private funding of research > > (http://portal.unesco.org/education/en/ev.php-URL_ID=21052&URL_DO=DO_TOPIC&URL_SECTION=201.html). 
> > For example, I've heard claims (which I cannot substantiate right now) that > > cell phone companies allegedly lobbied successfully to block funding for > > researchers they thought were likely to document health problems with their > > products. Related claims have been made by scientists in the US Food and > > Drug Administration that certain therapies were approved on political > > grounds in spite of substantive questions about the validity of the research > > backing the request for approval (e.g., > > www.naturalnews.com/025298_the_FDA_scientists.html). Some of these > > accusations of political corruption may be groundless. However, as private > > funding replaces tax money for basic science, we must expect an increase in > > research results that match the needs of the funding agency while degrading > > the quality of published research. This produces more research that can not > > be replicated -- effects that get smaller upon replication. (My wife and I > > routinely avoid certain therapies recommended by physicians, because the > > physicians get much of their information on recent drugs from the > > pharmaceuticals, who have a vested interest in presenting their products in > > the most positive light.) > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
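What "easy to get with R" might look like in practice is a two-liner; the URL and column names below are purely hypothetical:

    dat <- read.csv("http://example.org/trial-raw-data.csv")   # hypothetical published raw data
    t.test(outcome ~ group, data = dat)                        # anyone can re-run the reported test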
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
>From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias. That being said, it would seem that biomedical sciences do make progress, so some of the published results are presumably correct :) Peter On Thu, Jan 6, 2011 at 9:13 PM, Spencer Graves wrote: > Part of the phenomenon can be explained by the natural censorship in > what is accepted for publication: Stronger results tend to have less > difficulty getting published. Therefore, given that a result is published, > it is evident that the estimated magnitude of the effect is in average > larger than it is in reality, just by the fact that weaker results are less > likely to be published. A study of the literature on this subject might > yield an interesting and valuable estimate of the magnitude of this > selection bias. > > > A more insidious problem, that may not affect the work of Jonah Lehrer, > is political corruption in the way research is funded, with less public and > more private funding of research > (http://portal.unesco.org/education/en/ev.php-URL_ID=21052&URL_DO=DO_TOPIC&URL_SECTION=201.html). > For example, I've heard claims (which I cannot substantiate right now) that > cell phone companies allegedly lobbied successfully to block funding for > researchers they thought were likely to document health problems with their > products. Related claims have been made by scientists in the US Food and > Drug Administration that certain therapies were approved on political > grounds in spite of substantive questions about the validity of the research > backing the request for approval (e.g., > www.naturalnews.com/025298_the_FDA_scientists.html). Some of these > accusations of political corruption may be groundless. However, as private > funding replaces tax money for basic science, we must expect an increase in > research results that match the needs of the funding agency while degrading > the quality of published research. This produces more research that can not > be replicated -- effects that get smaller upon replication. (My wife and I > routinely avoid certain therapies recommended by physicians, because the > physicians get much of their information on recent drugs from the > pharmaceuticals, who have a vested interest in presenting their products in > the most positive light.) > > > Spencer > > > On 1/6/2011 2:39 PM, Carl Witthoft wrote: >> >> The next week's New Yorker has some decent rebuttal letters. The case is >> hardly as clear-cut as the author would like to believe. >> >> Carl __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
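The within-study version of the correction Peter describes is a one-liner in base R; a minimal sketch with invented p-values from, say, ten analyses tried on the same data:

    p.raw <- c(0.003, 0.04, 0.20, 0.01, 0.65, 0.33, 0.048, 0.07, 0.52, 0.90)  # invented raw p-values
    p.adjust(p.raw, method = "bonferroni")   # conservative family-wise error control
    p.adjust(p.raw, method = "BH")           # Benjamini-Hochberg false discovery rate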
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
Part of the phenomenon can be explained by the natural censorship in what is accepted for publication: Stronger results tend to have less difficulty getting published. Therefore, given that a result is published, it is evident that the estimated magnitude of the effect is on average larger than it is in reality, just by the fact that weaker results are less likely to be published. A study of the literature on this subject might yield an interesting and valuable estimate of the magnitude of this selection bias. A more insidious problem, that may not affect the work of Jonah Lehrer, is political corruption in the way research is funded, with less public and more private funding of research (http://portal.unesco.org/education/en/ev.php-URL_ID=21052&URL_DO=DO_TOPIC&URL_SECTION=201.html). For example, I've heard claims (which I cannot substantiate right now) that cell phone companies allegedly lobbied successfully to block funding for researchers they thought were likely to document health problems with their products. Related claims have been made by scientists in the US Food and Drug Administration that certain therapies were approved on political grounds in spite of substantive questions about the validity of the research backing the request for approval (e.g., www.naturalnews.com/025298_the_FDA_scientists.html). Some of these accusations of political corruption may be groundless. However, as private funding replaces tax money for basic science, we must expect an increase in research results that match the needs of the funding agency while degrading the quality of published research. This produces more research that cannot be replicated -- effects that get smaller upon replication. (My wife and I routinely avoid certain therapies recommended by physicians, because the physicians get much of their information on recent drugs from the pharmaceutical companies, which have a vested interest in presenting their products in the most positive light.) Spencer On 1/6/2011 2:39 PM, Carl Witthoft wrote: The next week's New Yorker has some decent rebuttal letters. The case is hardly as clear-cut as the author would like to believe. Carl __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
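Spencer's censorship argument is easy to check by simulation; a small R sketch under assumed values (true effect 0.2, n = 30 per study, all numbers invented):

    set.seed(1)
    true.effect <- 0.2; n <- 30; n.studies <- 10000
    res <- replicate(n.studies, {
      x <- rnorm(n, mean = true.effect)                        # one simulated study
      c(estimate = mean(x), signif = t.test(x)$p.value < 0.05) # effect estimate and "significance"
    })
    mean(res["estimate", ])                      # close to the true 0.2 across all studies
    mean(res["estimate", res["signif", ] == 1])  # clearly larger among the "publishable" studies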
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
I was very impressed with Lehrer's article. I look forward to seeing what the rebuttals come up with. The picture that Lehrer paints of the quality of scientific publications is very dark, and it seems to me, quite plausible. Note that Lehrer is the author of "Proust Was a Neuroscientist" which is one of the best non-fiction books I've ever come across. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Wyy-off-topic-Statistical-methods-pub-bias-scientific-validity-tp3177982p3178603.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
The next week's New Yorker has some decent rebuttal letters. The case is hardly as clear-cut as the author would like to believe. Carl __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
Folks: The following has NOTHING (obvious) to do with R. But I believe that all on this list would find it relevant and, I hope, informative. It is LONG. I apologize in advance to those who feel I have wasted their time. http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer Best regards to all, Bert -- Bert Gunter Genentech Nonclinical Biostatistics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.