Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

2011-01-07 Thread Ravi Varadhan
"Conclusion: try to be/become a good scientist: with a high prevalence of
good ideas."

Or, I would say: "try to publish only good and mature ideas".  Gauss said it
best: "pauca sed matura", or "few, but ripe."


Ravi.
---
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns
Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Claudia Beleites
Sent: Friday, January 07, 2011 1:46 PM
To: r-help@r-project.org
Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias,
scientific validity

On 01/07/2011 06:13 AM, Spencer Graves wrote:
>   A more insidious problem, that may not affect the work of Jonah 
> Lehrer, is political corruption in the way research is funded, with 
> less public and more private funding of research 
Maybe I'm too pessimistic, but the term _political_ corruption reminds 
me that I can just as easily imagine a "funding bias"* in public 
funding. And I'm not sure it is (or would be) less of a problem just 
because the interests of private funding are easier to spot.

* I think of bias on both sides: the funding agency selecting the 
studies to support and the researcher subconsciously complying to the 
expectations of the funding agency.

On 01/07/2011 08:06 AM, Peter Langfelder wrote:
> > From a purely statistical and maybe somewhat naive point of view,
> published p-values should be corrected for the multiple testing that
> is effectively happening because of the large number of published
> studies. My experience is also that people will often try several
> statistical methods to get the most significant p-value but neglect to
> share that fact with the audience and/or at least attempt to correct
> the p-values for the selection bias.
Even if the number of all the tests were known, I have the impression 
that the corrected p-value would be kind of the right answer to the 
wrong question. I'm not particularly interested in the probability of 
arriving at  the presented findings if the null hypothesis were true. 
I'd rather know the probability that the conclusions are true. Switching 
to the language of clinical chemistry, this is: I'm presented with the 
sensitivity of a test, but I really want to know the positive predictive 
value. What is still missing with the corrected p-values is the 
"prevalence of good ideas" of the publishing scientist (not even known 
for all scientists).  And I suspect that this prevalence may well decrease 
as the scientist generates and tests more and more ideas.
I found my rather hazy thoughts about this much better expressed in the 
books of Beck-Bornholdt and Dubben (which I'm afraid are only available 
in German).

Conclusion: try to be/become a good scientist: with a high prevalence of 
good ideas. At least with a high prevalence of good ideas among the tested 
hypotheses. That includes thinking first about which hypotheses are the ones 
to test, and not giving in to the temptation to try out more and more things 
as one gets more familiar with the experiment/data set/problem.
The latter I find very difficult. I once gave a presentation where I 
explicitly talked about why I did not do any data-driven optimization of my 
models; yet in the discussion I was very prominently told that I should also 
try these other pre-processing techniques and these other modeling 
techniques - even by people whom I know to be very much aware of and 
concerned about optimistically biased validation results. These were of 
course valid suggestions (and easy to comply with), but I conclude it is 
common/natural/human to have, and to want to try out, more ideas.
Also, after several years in the field and with the same kind of samples 
of course I run the risk of my ideas being overfit to our kind of 
samples - this is a cost that I have to pay for the gain due to 
experience/expertise.

Some more thoughts:
- reproducibility: I'm an analytical chemist. We have huge amounts of work 
going into round-robin trials in order to measure the "natural" variability 
of different labs on very well-defined systems.
- we also have huge amounts of work going into calibration transfer, i.e. 
making quantitative predictive models work on a different instrument. This 
is always a whole lot of work, and in some problem areas it is at the moment 
considered basically impossible even between two instruments of the same 
model and manufacturer.
The quoted results on the mice are not very astonishing to me... ;-)

- Talking about (not so) astonishing differences between replications of 
experiments:
I find myself moving from reporting ± 1 standard deviation to reporting 
e.g. the 5th to 95th percentiles. Not onl

Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

2011-01-07 Thread Joel Schwartz
The issue Spencer brings up is a problem whether the funding is private or
public. Just as businesses fund studies that support their goals, government
agencies fund studies that justify the need for their services and expansion
of their powers and budgets. In fact, there's a whole field of study
variously called "public choice economics" and "the new institutional
economics" that study these and related issues. 

On a related note, there is certainly a lot of self-selection bias in what
fields of study people choose to enter. For just one example, it isn't too
difficult to believe that of the pool of people talented and interested in
statistics, those who choose to enter public health or epidemiology might be
more likely to want research that justifies expansion of public health and
environmental agencies' regulatory powers and this might affect the research
questions they ask, the ways they design and select their statistical
models, and what results they choose to include and exclude from
publications. AFAIK, there is substantial evidence that researchers,
especially in non-experimental studies, tend to get results they "expect" or
"hope" to find, even if they feel no conscious bias. This is likely one of
the reasons observational studies are so frequently overturned by randomized
controlled trials. RCTs provide less room for confirmation bias to rear its
ugly head. 

Joel 

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Spencer Graves
Sent: Thursday, January 06, 2011 9:13 PM
To: Carl Witthoft
Cc: r-help@r-project.org
Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias,
scientific validity


   A more insidious problem, that may not affect the work of Jonah
Lehrer, is political corruption in the way research is funded, with less
public and more private funding of research
(http://portal.unesco.org/education/en/ev.php-URL_ID=21052&URL_DO=DO_TOPIC&U
RL_SECTION=201.html).  
...as private funding replaces tax money for basic science, we must expect
an increase in research results that match the needs of the funding agency
while degrading the quality of published research.  This produces more
research that can not be replicated -- effects that get smaller upon
replication.  (My wife and I routinely avoid certain therapies recommended
by physicians, because the physicians get much of their information on
recent drugs from the pharmaceuticals, who have a vested interest in
presenting their products in the most positive light.)


   Spencer


On 1/6/2011 2:39 PM, Carl Witthoft wrote:
> The next week's New Yorker has some decent rebuttal letters.  The case 
> is hardly as clear-cut as the author would like to believe.
>
> Carl
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

2011-01-07 Thread Claudia Beleites

On 01/07/2011 06:13 AM, Spencer Graves wrote:
  A more insidious problem, that may not affect the work of Jonah 
Lehrer, is political corruption in the way research is funded, with 
less public and more private funding of research 
Maybe I'm too pessimistic, but the term _political_ corruption reminds 
me that I can just as easily imagine a "funding bias"* in public 
funding. And I'm not sure it is (or would be) less of a problem just 
because the interests of private funding are easier to spot.


* I think of bias on both sides: the funding agency selecting the 
studies to support and the researcher subconsciously complying to the 
expectations of the funding agency.


On 01/07/2011 08:06 AM, Peter Langfelder wrote:

> From a purely statistical and maybe somewhat naive point of view,
published p-values should be corrected for the multiple testing that
is effectively happening because of the large number of published
studies. My experience is also that people will often try several
statistical methods to get the most significant p-value but neglect to
share that fact with the audience and/or at least attempt to correct
the p-values for the selection bias.
Even if the number of all the tests were known, I have the impression 
that the corrected p-value would be kind of the right answer to the 
wrong question. I'm not particularly interested in the probability of 
arriving at  the presented findings if the null hypothesis were true. 
I'd rather know the probability that the conclusions are true. Switching 
to the language of clinical chemistry, this is: I'm presented with the 
sensitivity of a test, but I really want to know the positive predictive 
value. What is still missing with the corrected p-values is the 
"prevalence of good ideas" of the publishing scientist (not even known 
for all scientists).  And I suspect that this prevalence may well decrease 
as the scientist generates and tests more and more ideas.
I found my rather hazy thoughts about this much better expressed in the 
books of Beck-Bornholdt and Dubben (which I'm afraid are only available 
in German).
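
To make the sensitivity-versus-PPV point concrete, here is a 
back-of-the-envelope sketch in R; the prevalence, significance level and 
power below are made-up numbers, not estimates from any real field:

prev  <- 0.1    # assumed prevalence of "good ideas" among tested hypotheses
alpha <- 0.05   # significance level
pow   <- 0.8    # power to detect a truly good idea
# P(idea is good | result is "significant"), by Bayes' theorem:
ppv <- prev * pow / (prev * pow + (1 - prev) * alpha)
ppv             # about 0.64 with these made-up numbers

With a low prevalence of good ideas, even a well-powered, nominally 
significant finding has only a modest chance of being true.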


Conclusion: try to be/become a good scientist: with a high prevalence of 
good ideas. At least with a high prevalence of good ideas among the tested 
hypotheses. That includes thinking first about which hypotheses are the ones 
to test, and not giving in to the temptation to try out more and more things 
as one gets more familiar with the experiment/data set/problem.
The latter I find very difficult. I once gave a presentation where I 
explicitly talked about why I did not do any data-driven optimization of my 
models; yet in the discussion I was very prominently told that I should also 
try these other pre-processing techniques and these other modeling 
techniques - even by people whom I know to be very much aware of and 
concerned about optimistically biased validation results. These were of 
course valid suggestions (and easy to comply with), but I conclude it is 
common/natural/human to have, and to want to try out, more ideas.
Also, after several years in the field and with the same kind of samples 
of course I run the risk of my ideas being overfit to our kind of 
samples - this is a cost that I have to pay for the gain due to 
experience/expertise.


Some more thoughts:
- reproducibility: I'm an analytical chemist. We have huge amounts of work 
going into round-robin trials in order to measure the "natural" variability 
of different labs on very well-defined systems.
- we also have huge amounts of work going into calibration transfer, i.e. 
making quantitative predictive models work on a different instrument. This 
is always a whole lot of work, and in some problem areas it is at the moment 
considered basically impossible even between two instruments of the same 
model and manufacturer.

The quoted results on the mice are not very astonishing to me... ;-)

- Talking about (not so) astonishing differences between replications of 
experiments:
I find myself moving from reporting ± 1 standard deviation to reporting 
e.g. the 5th to 95th percentiles - not only because my data distributions 
are often not symmetric, but also because I find I'm not able to directly 
perceive the real spread of the data from a standard-deviation error bar. 
This is all about perception; of course I can reflect on the meaning. Such 
reflection also tells me that one student having a really unlikely number 
of right guesses is unlikely but not impossible: there is no statistical 
law stating that unlikely events happen only with large sample 
sizes/numbers of tests. Yet the immediate perception is completely different.
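
For illustration, a minimal R sketch with simulated skewed data (the 
vector x is purely hypothetical):

set.seed(1)
x <- rlnorm(200)                      # skewed, strictly positive "data"
mean(x) + c(-1, 1) * sd(x)            # mean +/- 1 standard deviation
quantile(x, probs = c(0.05, 0.95))    # 5th and 95th percentiles

With data like these the mean - 1 sd limit can even fall below zero although 
every value is positive, while the percentile summary stays inside the data.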


- I happily agree with the ideas of publishing findings (conclusions) as 
well as the data and data analysis code I used to arrive there. But I'm 
aware that part of this agreement is due to the fact that I'm quite 
interested in the data analytical methods (I'd say as well as in the 
particular chemical-analytical problem at hand, but rat

Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

2011-01-07 Thread John Kane


--- On Fri, 1/7/11, Peter Langfelder  wrote:

> From: Peter Langfelder 
> Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias, 
> scientific validity
> To: "r-help@r-project.org" 
> Received: Friday, January 7, 2011, 2:06 AM
> >From a purely statistical and
> maybe somewhat naive point of view,
> published p-values should be corrected for the multiple
> testing that
> is effectively happening because of the large number of
> published
> studies. My experience is also that people will often try
> several
> statistical methods to get the most significant p-value but
> neglect to
> share that fact with the audience and/or at least attempt
> to correct
> the p-values for the selection bias.
> 
> That being said, it would seem that biomedical sciences do
> make
> progress, so some of the published results are presumably
> correct :)
> 


Totally a placebo effect :)

> Peter
> 
> On Thu, Jan 6, 2011 at 9:13 PM, Spencer Graves
> 
> wrote:
> >      Part of the phenomenon can be explained by the
> natural censorship in
> > what is accepted for publication:  Stronger results
> tend to have less
> > difficulty getting published.  Therefore, given that
> a result is published,
> > it is evident that the estimated magnitude of the
> effect is on average
> > larger than it is in reality, just by the fact that
> weaker results are less
> > likely to be published.  A study of the literature on
> this subject might
> > yield an interesting and valuable estimate of the
> magnitude of this
> > selection bias.
> >
> >
> >      A more insidious problem, that may not affect
> the work of Jonah Lehrer,
> > is political corruption in the way research is funded,
> with less public and
> > more private funding of research
> > (http://portal.unesco.org/education/en/ev.php-URL_ID=21052&URL_DO=DO_TOPIC&URL_SECTION=201.html).
> >  For example, I've heard claims (which I cannot
> substantiate right now) that
> > cell phone companies allegedly lobbied successfully to
> block funding for
> > researchers they thought were likely to document
> health problems with their
> > products.  Related claims have been made by
> scientists in the US Food and
> > Drug Administration that certain therapies were
> approved on political
> > grounds in spite of substantive questions about the
> validity of the research
> > backing the request for approval (e.g.,
> > www.naturalnews.com/025298_the_FDA_scientists.html).
>  Some of these
> > accusations of political corruption may be groundless.
>  However, as private
> > funding replaces tax money for basic science, we must
> expect an increase in
> > research results that match the needs of the funding
> agency while degrading
> > the quality of published research.  This produces
> more research that can not
> > be replicated -- effects that get smaller upon
> replication.  (My wife and I
> > routinely avoid certain therapies recommended by
> physicians, because the
> > physicians get much of their information on recent
> drugs from the
> > pharmaceuticals, who have a vested interest in
> presenting their products in
> > the most positive light.)
> >
> >
> >      Spencer
> >
> >
> > On 1/6/2011 2:39 PM, Carl Witthoft wrote:
> >>
> >> The next week's New Yorker has some decent
> rebuttal letters.  The case is
> >> hardly as clear-cut as the author would like to
> believe.
> >>
> >> Carl
> 
> __
> R-help@r-project.org
> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
> 



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

2011-01-07 Thread Ravi Varadhan
I think that the strategy of Editors simply telling the authors to share or
perish is a bit naïve.  There are a number of practical challenges that need
to be addressed in order to create a fair and effective open-learning
environment.  Eysenbach (BMJ 2001) and Vickers (2006) discuss these and some
partial solutions.  We need more creative thinking that uses both carrots
and sticks. We also need more empirical experience with this.  Perhaps we
can learn from fields, if there are any, that do a good job of data sharing and
open learning.

Best,
Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns
Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


-Original Message-
From: Spencer Graves [mailto:spencer.gra...@structuremonitoring.com] 
Sent: Friday, January 07, 2011 1:01 PM
To: Ravi Varadhan
Cc: 'Mike Marchywka'; r-help@r-project.org
Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias,
scientific validity

   I applaud your efforts, Ravi.  Regarding "Whose data is it?", I 
humbly suggest that referees and editorial boards push (demand?) for 
rules that require the raw data be made available to the referees and, 
concurrent with publication, to the public.


   Spencer


On 1/7/2011 8:43 AM, Ravi Varadhan wrote:
> I have just recently written about this issue (i.e. open learning and data
> sharing) in a manuscript that is currently under review in a clinical
> journal.  I have argued that data hoarding is unethical.  Participants in
> research studies give their time, effort, saliva and blood in the
altruistic
> hope that their sacrifice will benefit humankind.  If they were to realize
> that the real (ulterior) motive of the study investigators is only to
> advance their careers, they would really think hard about participating in
> the studies.  The study participants should only consent to participate if
> they can get a signed assurance from the investigators that the
> investigators will make their data available for scrutiny and for public
use
> (under some reasonable conditions that are fair to the study
investigators).
> As Vickers (Trials 2006) says, "whose data is it anyway?"  I believe that
we
> can achieve great progress in clinical research if and only if we make a
> concerted effort towards open learning. Stakeholders (i.e. patients,
> clinicians, policy-makers) should demand that all the data that is
> potentially relevant to addressing a critical clinical question should be
> made available in an open learning environment.  Unless, we can achieve
this
> we cannot solve the problems of publication bias and inefficient and
> sub-optimal use of data.
>
> Best,
> Ravi.
> ---
> Ravi Varadhan, Ph.D.
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology School of Medicine Johns
> Hopkins University
>
> Ph. (410) 502-2619
> email: rvarad...@jhmi.edu
>
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On
> Behalf Of Spencer Graves
> Sent: Friday, January 07, 2011 8:26 AM
> To: Mike Marchywka
> Cc: r-help@r-project.org
> Subject: Re: [R] Wyy off topic...Statistical methods, pub bias,
> scientific validity
>
> I wholeheartedly agree with the trend towards publishing datasets.
> One way to do that is as datasets in an R package contributed to CRAN.
>
>
> Beyond this, there seems to be an increasing trend towards
journals
> requiring authors of scientific research to publish their data as well.
The
> Public Library of Science (PLOS) has such a policy, but it is not
enforced:
> Savage and Vickers (2010) were able to get the raw data behind only one of
> ten published articles they tried, and that one came only after reminding
> the author that s/he had agreed to making the data available as a
condition
> of publishing in PLOS.  (Four other authors refused to share their data in
> spite of their legal and moral commitment to do so as a condition of
> publishing in PLOS.)
>
>
> There are other venues for publishing data.  For example, much
> astronomical data is now routinely web published so anyone interested can
> test their pet algorithm on real data
>
(http://sites.google.com/site/vousergroup/presentations/publishing-astronomi
> cal-data).
>
>
>
> Regarding my earlier comment, I just found a Wikipedia article on
> "scientific misconduct" that mentioned the tendency to refuse to publish
> research that proves your new drug is positively harmful.  This is an
> extreme version of both types of bias I previously mentioned:  (1) only
> significant resu

Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

2011-01-07 Thread Spencer Graves
  I applaud your efforts, Ravi.  Regarding "Whose data is it?", I 
humbly suggest that referees and editorial boards push (demand?) for 
rules that require the raw data be made available to the referees and, 
concurrent with publication, to the public.



  Spencer


On 1/7/2011 8:43 AM, Ravi Varadhan wrote:

I have just recently written about this issue (i.e. open learning and data
sharing) in a manuscript that is currently under review in a clinical
journal.  I have argued that data hoarding is unethical.  Participants in
research studies give their time, effort, saliva and blood in the altruistic
hope that their sacrifice will benefit humankind.  If they were to realize
that the real (ulterior) motive of the study investigators is only to
advance their careers, they would really think hard about participating in
the studies.  The study participants should only consent to participate if
they can get a signed assurance from the investigators that the
investigators will make their data available for scrutiny and for public use
(under some reasonable conditions that are fair to the study investigators).
As Vickers (Trials 2006) says, "whose data is it anyway?"  I believe that we
can achieve great progress in clinical research if and only if we make a
concerted effort towards open learning. Stakeholders (i.e. patients,
clinicians, policy-makers) should demand that all the data that is
potentially relevant to addressing a critical clinical question be made
available in an open learning environment.  Unless we can achieve this,
we cannot solve the problems of publication bias and inefficient and
sub-optimal use of data.

Best,
Ravi.
---
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns
Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Spencer Graves
Sent: Friday, January 07, 2011 8:26 AM
To: Mike Marchywka
Cc: r-help@r-project.org
Subject: Re: [R] Wyy off topic...Statistical methods, pub bias,
scientific validity

I wholeheartedly agree with the trend towards publishing datasets.
One way to do that is as datasets in an R package contributed to CRAN.


Beyond this, there seems to be an increasing trend towards journals
requiring authors of scientific research to publish their data as well.  The
Public Library of Science (PLOS) has such a policy, but it is not enforced:
Savage and Vickers (2010) were able to get the raw data behind only one of
ten published articles they tried, and that one came only after reminding
the author that s/he had agreed to making the data available as a condition
of publishing in PLOS.  (Four other authors refused to share their data in
spite of their legal and moral commitment to do so as a condition of
publishing in PLOS.)


There are other venues for publishing data.  For example, much
astronomical data is now routinely web published so anyone interested can
test their pet algorithm on real data
(http://sites.google.com/site/vousergroup/presentations/publishing-astronomi
cal-data).



Regarding my earlier comment, I just found a Wikipedia article on
"scientific misconduct" that mentioned the tendency to refuse to publish
research that proves your new drug is positively harmful.  This is an
extreme version of both types of bias I previously mentioned:  (1) only
significant results get published.  (2) private funding provides its own
biases.


Spencer


#
Savage and Vickers (2010) "Empirical Study Of Data Sharing By Authors
Publishing In PLoS Journals", Scientific Data Sharing, added Apr. 26, 2010
(http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-b
y-authors-publishing-in-plos-journals-2
<http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-b
y-authors-publishing-in-plos-journals-2/>).




On 1/7/2011 4:08 AM, Mike Marchywka wrote:







Date: Thu, 6 Jan 2011 23:06:44 -0800
From: peter.langfel...@gmail.com
To: r-help@r-project.org
Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias,
scientific validity


 From a purely statistical and maybe somewhat naive point of view,

published p-values should be corrected for the multiple testing that
is effectively happening because of the large number of published
studies. My experience is also that people will often try several
statistical methods to get the most significant p-value but neglect
to share that fact with the audience and/or at least attempt to
correct the p-values for the selection bias.

You see this everywhere in one form or another from medical to
financial modelling. My solution here is simply to publish more raw
data in a computer readable form, in this case of course something
easy to get with R, so disinterested or adversa

Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

2011-01-07 Thread Alan Kelly
Bert, consider the short rebuttal offered by George Musser in Scientific 
American:

http://www.scientificamerican.com/blog/post.cfm?id=in-praise-of-scientific-error-2010-12-20

Perhaps a more realistic assessment of the (acknowledged) problem.

Regards,
Alan Kelly
Trinity College Dublin

On 7 Jan 2011, at 11:00, r-help-requ...@r-project.org wrote:

Message: 54
Date: Thu, 6 Jan 2011 10:56:34 -0800
From: Bert Gunter <gunter.ber...@gene.com>
To: r-help@r-project.org
Subject: [R] Wyy off topic...Statistical methods, pub bias,
    scientific validity
Message-ID: <aanlktinvwp0bm864aedpr=hb-r=e_=b7zgftwdbxn...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Folks:

The following has NOTHING (obvious) to do with R. But I believe that
all on this list would find it relevant and, I hope, informative. It
is LONG. I apologize in advance to those who feel I have wasted their
time.

http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer

Best regards to all,

Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

2011-01-07 Thread Ravi Varadhan
I have just recently written about this issue (i.e. open learning and data
sharing) in a manuscript that is currently under review in a clinical
journal.  I have argued that data hoarding is unethical.  Participants in
research studies give their time, effort, saliva and blood in the altruistic
hope that their sacrifice will benefit humankind.  If they were to realize
that the real (ulterior) motive of the study investigators is only to
advance their careers, they would really think hard about participating in
the studies.  The study participants should only consent to participate if
they can get a signed assurance from the investigators that the
investigators will make their data available for scrutiny and for public use
(under some reasonable conditions that are fair to the study investigators).
As Vickers (Trials 2006) says, "whose data is it anyway?"  I believe that we
can achieve great progress in clinical research if and only if we make a
concerted effort towards open learning. Stakeholders (i.e. patients,
clinicians, policy-makers) should demand that all the data that is
potentially relevant to addressing a critical clinical question be made
available in an open learning environment.  Unless we can achieve this,
we cannot solve the problems of publication bias and inefficient and
sub-optimal use of data.

Best,
Ravi.
---
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns
Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Spencer Graves
Sent: Friday, January 07, 2011 8:26 AM
To: Mike Marchywka
Cc: r-help@r-project.org
Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias,
scientific validity

   I wholeheartedly agree with the trend towards publishing datasets.
One way to do that is as datasets in an R package contributed to CRAN.


   Beyond this, there seems to be an increasing trend towards journals
requiring authors of scientific research to publish their data as well.  The
Public Library of Science (PLOS) has such a policy, but it is not enforced:
Savage and Vickers (2010) were able to get the raw data behind only one of
ten published articles they tried, and that one came only after reminding
the author that s/he had agreed to making the data available as a condition
of publishing in PLOS.  (Four other authors refused to share their data in
spite of their legal and moral commitment to do so as a condition of
publishing in PLOS.)


   There are other venues for publishing data.  For example, much
astronomical data is now routinely web published so anyone interested can
test their pet algorithm on real data
(http://sites.google.com/site/vousergroup/presentations/publishing-astronomi
cal-data). 



   Regarding my earlier comment, I just found a Wikipedia article on
"scientific misconduct" that mentioned the tendency to refuse to publish
research that proves your new drug is positively harmful.  This is an
extreme version of both types of bias I previously mentioned:  (1) only
significant results get published.  (2) private funding provides its own
biases.


   Spencer


#
Savage and Vickers (2010) "Empirical Study Of Data Sharing By Authors
Publishing In PLoS Journals", Scientific Data Sharing, added Apr. 26, 2010
(http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-b
y-authors-publishing-in-plos-journals-2
<http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-b
y-authors-publishing-in-plos-journals-2/>). 




On 1/7/2011 4:08 AM, Mike Marchywka wrote:
>
>
>
>
>
>
>> Date: Thu, 6 Jan 2011 23:06:44 -0800
>> From: peter.langfel...@gmail.com
>> To: r-help@r-project.org
>> Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, 
>> scientific validity
>>
>> > From a purely statistical and maybe somewhat naive point of view,
>> published p-values should be corrected for the multiple testing that 
>> is effectively happening because of the large number of published 
>> studies. My experience is also that people will often try several 
>> statistical methods to get the most significant p-value but neglect 
>> to share that fact with the audience and/or at least attempt to 
>> correct the p-values for the selection bias.
> You see this everywhere in one form or another from medical to 
> financial modelling. My solution here is simply to publish more raw 
> data in a computer readable form, in this case of course something 
> easy to get with R, so disinterested or adversarial parties can run their
own "analysis."
> I think there was also a push to create a data base for failed drug 
> trials that 

Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

2011-01-07 Thread Spencer Graves
   I wholeheartedly agree with the trend towards publishing 
datasets.  One way to do that is as datasets in an R package contributed 
to CRAN.
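
A minimal sketch of that route, using a hypothetical data frame mydata 
and the base-R package.skeleton() helper (the package name is made up):

mydata <- data.frame(id = 1:3, response = c(1.2, 0.7, 1.9))  # hypothetical raw data
package.skeleton(name = "myStudyData", list = "mydata")      # dataset lands in data/
# then edit DESCRIPTION and man/mydata.Rd, and from the shell:
#   R CMD build myStudyData
#   R CMD check myStudyData_1.0.tar.gz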


   Beyond this, there seems to be an increasing trend towards 
journals requiring authors of scientific research to publish their data 
as well.  The Public Library of Science (PLOS) has such a policy, but it 
is not enforced:  Savage and Vickers (2010) were able to get the raw 
data behind only one of ten published articles they tried, and that one 
came only after reminding the author that s/he had agreed to making the 
data available as a condition of publishing in PLOS.  (Four other 
authors refused to share their data in spite of their legal and moral 
commitment to do so as a condition of publishing in PLOS.)


   There are other venues for publishing data.  For example, much 
astronomical data is now routinely web published so anyone interested 
can test their pet algorithm on real data 
(http://sites.google.com/site/vousergroup/presentations/publishing-astronomical-data).
 



   Regarding my earlier comment, I just found a Wikipedia article on 
"scientific misconduct" that mentioned the tendency to refuse to publish 
research that proves your new drug is positively harmful.  This is an 
extreme version of both types of bias I previously mentioned:  (1) only 
significant results get published.  (2) private funding provides its own 
biases.


   Spencer


#
Savage and Vickers (2010) "Empirical Study Of Data Sharing By Authors 
Publishing In PLoS Journals", Scientific Data Sharing, added Apr. 26, 
2010 
(http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-by-authors-publishing-in-plos-journals-2
 
<http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-by-authors-publishing-in-plos-journals-2/>).
 




On 1/7/2011 4:08 AM, Mike Marchywka wrote:
>
>
>
>
>
>
>> Date: Thu, 6 Jan 2011 23:06:44 -0800
>> From: peter.langfel...@gmail.com
>> To: r-help@r-project.org
>> Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, 
>> scientific validity
>>
>> > From a purely statistical and maybe somewhat naive point of view,
>> published p-values should be corrected for the multiple testing that
>> is effectively happening because of the large number of published
>> studies. My experience is also that people will often try several
>> statistical methods to get the most significant p-value but neglect to
>> share that fact with the audience and/or at least attempt to correct
>> the p-values for the selection bias.
> You see this everywhere in one form or another from medical to financial
> modelling. My solution here is simply to publish more raw data in a computer
> readable form, in this case of course something easy to get with R,
> so disinterested or adversarial parties can run their own "analysis."
> I think there was also a push to create a data base for failed drug
> trials that may contain data of some value later. The value of R with
> easily available data for a large cross section of users could be to moderate
> problems like the one cited here.
>
> I almost
> slammed a poster here earlier who wanted a simple rule for "when do I use
> this test" with something like " when your mom tells you to" since post
> hoc you do just about everything to assume you messed up and missed something
> but a priori you hope you have designed a good hypothesis. And at the end of
> the day, a given p-value is one piece of evidence in the overall objective
> of learning about some system, not appeasing a sponsor. Personally I'm a big
> fan of post hoc analysis on biotech data in some cases, especially as more 
> pathway or other theory
> is published, but it is easy to become deluded if you have a conclusion that 
> you
> know JUST HAS TO BE RIGHT.
>
> Also FWIW, in the few cases I've examined with FDA-sponsor rhetoric, the
> data I've been able to get tends to make me side with the FDA and I still 
> hate the
> idea of any regulation or access restrictions but it seems to be the only way
> to keep sponsors honest to any extent. Your mileage
> may vary however, take a look at some rather loud disagreement with FDA
> over earlier DNDN panel results, possibly involving threats against critics. 
> LOL.
>
>
>
>
>
>> That being said, it would seem that biomedical sciences do make
>> progress, so some of the published results are presumably correct :)
>>
>> Peter
>>
>> On Thu, Jan 6, 2011 at 9:13 PM, Spencer Graves
>>   wrote:
>>>   Part of the phenomenon can be explained by the natural censorship in
>>> what is accepted for publication:  Stronger results tend to

Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

2011-01-07 Thread Mike Marchywka







> Date: Thu, 6 Jan 2011 23:06:44 -0800
> From: peter.langfel...@gmail.com
> To: r-help@r-project.org
> Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, 
> scientific validity
>
> From a purely statistical and maybe somewhat naive point of view,
> published p-values should be corrected for the multiple testing that
> is effectively happening because of the large number of published
> studies. My experience is also that people will often try several
> statistical methods to get the most significant p-value but neglect to
> share that fact with the audience and/or at least attempt to correct
> the p-values for the selection bias.

You see this everywhere in one form or another from medical to financial
modelling. My solution here is simply to publish more raw data in a computer
readable form, in this case of course something easy to get with R,
so disinterested or adversarial parties can run their own "analysis."
I think there was also a push to create a data base for failed drug
trials that may contain data of some value later. The value of R with
easily available data for a large cross section of users could be to moderate 
problems like the one cited here. 
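
In R terms that could be as simple as the sketch below; the URL is a 
placeholder, not a real repository:

# placeholder URL -- substitute the journal's or repository's actual link
dat <- read.csv("http://example.org/study123/raw_data.csv")
summary(dat)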

I almost
slammed a poster here earlier who wanted a simple rule for "when do I use
this test" with something like " when your mom tells you to" since post
hoc you do just about everything to assume you messed up and missed something
but a priori you hope you have designed a good hypothesis. And at the end of
the day, a given p-value is one piece of evidence in the overall objective
of learning about some system, not appeasing a sponsor. Personally I'm a big
fan of post hoc analysis on biotech data in some cases, especially as more 
pathway or other theory
is published, but it is easy to become deluded if you have a conclusion that you
know JUST HAS TO BE RIGHT. 

Also FWIW, in the few cases I've examined with FDA-sponsor rhetoric, the
data I've been able to get tends to make me side with the FDA and I still hate 
the
idea of any regulation or access restrictions but it seems to be the only way
to keep sponsors honest to any extent. Your mileage
may vary however, take a look at some rather loud disagreement with FDA
over earlier DNDN panel results, possibly involving threats against critics. 
LOL.





>
> That being said, it would seem that biomedical sciences do make
> progress, so some of the published results are presumably correct :)
>
> Peter
>
> On Thu, Jan 6, 2011 at 9:13 PM, Spencer Graves
>  wrote:
> >  Part of the phenomenon can be explained by the natural censorship in
> > what is accepted for publication:  Stronger results tend to have less
> > difficulty getting published.  Therefore, given that a result is published,
> > it is evident that the estimated magnitude of the effect is on average
> > larger than it is in reality, just by the fact that weaker results are less
> > likely to be published.  A study of the literature on this subject might
> > yield an interesting and valuable estimate of the magnitude of this
> > selection bias.
> >
> >
> >  A more insidious problem, that may not affect the work of Jonah Lehrer,
> > is political corruption in the way research is funded, with less public and
> > more private funding of research
> > (http://portal.unesco.org/education/en/ev.php-URL_ID=21052&URL_DO=DO_TOPIC&URL_SECTION=201.html).
> >  For example, I've heard claims (which I cannot substantiate right now) that
> > cell phone companies allegedly lobbied successfully to block funding for
> > researchers they thought were likely to document health problems with their
> > products.  Related claims have been made by scientists in the US Food and
> > Drug Administration that certain therapies were approved on political
> > grounds in spite of substantive questions about the validity of the research
> > backing the request for approval (e.g.,
> > www.naturalnews.com/025298_the_FDA_scientists.html).  Some of these
> > accusations of political corruption may be groundless.  However, as private
> > funding replaces tax money for basic science, we must expect an increase in
> > research results that match the needs of the funding agency while degrading
> > the quality of published research.  This produces more research that can not
> > be replicated -- effects that get smaller upon replication.  (My wife and I
> > routinely avoid certain therapies recommended by physicians, because the
> > physicians get much of their information on recent drugs from the
> > pharmaceuticals, who have a vested interest in presenting their products in
> > the most positive light.)
> >

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

2011-01-06 Thread Peter Langfelder
From a purely statistical and maybe somewhat naive point of view,
published p-values should be corrected for the multiple testing that
is effectively happening because of the large number of published
studies. My experience is also that people will often try several
statistical methods to get the most significant p-value but neglect to
share that fact with the audience and/or at least attempt to correct
the p-values for the selection bias.
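
For what it's worth, base R already ships the standard corrections; a 
minimal sketch with made-up p-values:

p <- c(0.001, 0.004, 0.03, 0.04, 0.20)   # hypothetical p-values from 5 "studies"
p.adjust(p, method = "bonferroni")       # family-wise error rate control
p.adjust(p, method = "BH")               # Benjamini-Hochberg false discovery rate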

That being said, it would seem that biomedical sciences do make
progress, so some of the published results are presumably correct :)

Peter

On Thu, Jan 6, 2011 at 9:13 PM, Spencer Graves
 wrote:
>      Part of the phenomenon can be explained by the natural censorship in
> what is accepted for publication:  Stronger results tend to have less
> difficulty getting published.  Therefore, given that a result is published,
> it is evident that the estimated magnitude of the effect is on average
> larger than it is in reality, just by the fact that weaker results are less
> likely to be published.  A study of the literature on this subject might
> yield an interesting and valuable estimate of the magnitude of this
> selection bias.
>
>
>      A more insidious problem, that may not affect the work of Jonah Lehrer,
> is political corruption in the way research is funded, with less public and
> more private funding of research
> (http://portal.unesco.org/education/en/ev.php-URL_ID=21052&URL_DO=DO_TOPIC&URL_SECTION=201.html).
>  For example, I've heard claims (which I cannot substantiate right now) that
> cell phone companies allegedly lobbied successfully to block funding for
> researchers they thought were likely to document health problems with their
> products.  Related claims have been made by scientists in the US Food and
> Drug Administration that certain therapies were approved on political
> grounds in spite of substantive questions about the validity of the research
> backing the request for approval (e.g.,
> www.naturalnews.com/025298_the_FDA_scientists.html).  Some of these
> accusations of political corruption may be groundless.  However, as private
> funding replaces tax money for basic science, we must expect an increase in
> research results that match the needs of the funding agency while degrading
> the quality of published research.  This produces more research that can not
> be replicated -- effects that get smaller upon replication.  (My wife and I
> routinely avoid certain therapies recommended by physicians, because the
> physicians get much of their information on recent drugs from the
> pharmaceuticals, who have a vested interest in presenting their products in
> the most positive light.)
>
>
>      Spencer
>
>
> On 1/6/2011 2:39 PM, Carl Witthoft wrote:
>>
>> The next week's New Yorker has some decent rebuttal letters.  The case is
>> hardly as clear-cut as the author would like to believe.
>>
>> Carl

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

2011-01-06 Thread Spencer Graves
  Part of the phenomenon can be explained by the natural censorship 
in what is accepted for publication:  Stronger results tend to have less 
difficulty getting published.  Therefore, given that a result is 
published, it is evident that the estimated magnitude of the effect is 
on average larger than it is in reality, just by the fact that weaker 
results are less likely to be published.  A study of the literature on 
this subject might yield an interesting and valuable estimate of the 
magnitude of this selection bias.
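
A small R simulation of exactly this censorship effect, with an assumed 
true effect of 0.2 and 50 observations per study; only the nominally 
significant studies are treated as "published":

set.seed(42)
true.effect <- 0.2                       # assumed true effect size
sims <- replicate(10000, {
  x <- rnorm(50, mean = true.effect)     # one small study
  c(est = mean(x), p = t.test(x)$p.value)
})
mean(sims["est", ])                      # all studies: close to 0.2
mean(sims["est", sims["p", ] < 0.05])    # "published" studies only: noticeably larger

Averaging only the significant estimates overstates the true effect, which 
is exactly the pattern of effects shrinking upon replication.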



  A more insidious problem, that may not affect the work of Jonah 
Lehrer, is political corruption in the way research is funded, with less 
public and more private funding of research 
(http://portal.unesco.org/education/en/ev.php-URL_ID=21052&URL_DO=DO_TOPIC&URL_SECTION=201.html).  
For example, I've heard claims (which I cannot substantiate right now) 
that cell phone companies allegedly lobbied successfully to block 
funding for researchers they thought were likely to document health 
problems with their products.  Related claims have been made by 
scientists in the US Food and Drug Administration that certain therapies 
were approved on political grounds in spite of substantive questions 
about the validity of the research backing the request for approval 
(e.g., www.naturalnews.com/025298_the_FDA_scientists.html).  Some of 
these accusations of political corruption may be groundless.  However, 
as private funding replaces tax money for basic science, we must expect 
an increase in research results that match the needs of the funding 
agency while degrading the quality of published research.  This produces 
more research that can not be replicated -- effects that get smaller 
upon replication.  (My wife and I routinely avoid certain therapies 
recommended by physicians, because the physicians get much of their 
information on recent drugs from the pharmaceuticals, who have a vested 
interest in presenting their products in the most positive light.)



  Spencer


On 1/6/2011 2:39 PM, Carl Witthoft wrote:
The next week's New Yorker has some decent rebuttal letters.  The case 
is hardly as clear-cut as the author would like to believe.


Carl

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

2011-01-06 Thread Frank Harrell

I was very impressed with Lehrer's article.  I look forward to seeing what
the rebuttals come up with.  The picture that Lehrer paints of the quality
of scientific publications is very dark and, it seems to me, quite 
plausible.  Note that Lehrer is the author of "Proust Was a Neuroscientist"
which is one of the best non-fiction books I've ever come across.

Frank


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Wyy-off-topic-Statistical-methods-pub-bias-scientific-validity-tp3177982p3178603.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

2011-01-06 Thread Carl Witthoft
The next week's New Yorker has some decent rebuttal letters.  The case 
is hardly as clear-cut as the author would like to believe.


Carl

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

2011-01-06 Thread Bert Gunter
Folks:

The following has NOTHING (obvious) to do with R. But I believe that
all on this list would find it relevant and, I hope, informative. It
is LONG. I apologize in advance to those who feel I have wasted their
time.

 http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer

Best regards to all,

Bert

-- 
Bert Gunter
Genentech Nonclinical Biostatistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.