Re: [ccp4bb] The importance of USING our validation tools

2007-08-18 Thread Juergen Bosch

Hi Mischa,

I think you are right about ligand structures: it would be very 
difficult, if not impossible, to distinguish between real measured data 
and faked data. You would just need to run a docking program, dock the 
ligand, calculate new structure factors, add some noise, and combine 
that with your real data from the unliganded structure.
I'm not an expert, but for a molecule on the order of 300-600 Da within 
an average protein of perhaps 40 kDa, how would one be able to tell 
whether the data are genuine or faked plus noise?
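The recipe above can be sketched in a few lines. A toy illustration with numpy — all amplitudes, weights, and noise levels here are invented stand-ins, not real crystallographic data — of why simple overall statistics would barely register the blend:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical amplitudes: F_obs stands in for the measured (unliganded)
# data, F_calc for amplitudes calculated from a model with the docked ligand.
n_refl = 10_000
F_obs = rng.gamma(shape=2.0, scale=50.0, size=n_refl)
F_calc = np.clip(F_obs + rng.normal(0.0, 5.0, size=n_refl), 0.0, None)

# "Faked" data: blend the calculated amplitudes into the measured ones
# and add Gaussian noise scaled to a plausible sigma(F).
weight = 0.3                      # how strongly the ligand model is imposed
sigma = 0.05 * F_obs.mean()
F_fake = (1 - weight) * F_obs + weight * F_calc + rng.normal(0.0, sigma, size=n_refl)
F_fake = np.clip(F_fake, 0.0, None)   # amplitudes cannot be negative

# The point of the email: the fabricated set stays almost perfectly
# correlated with the real one, so bulk statistics give nothing away.
cc = np.corrcoef(F_obs, F_fake)[0, 1]
print(f"CC(F_obs, F_fake) = {cc:.3f}")
```

With a modest blending weight the correlation to the genuine amplitudes stays well above 0.9, which is exactly why per-reflection or per-residue diagnostics, rather than global statistics, would be needed to catch this.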


In Germany we have to keep data (data meaning everything, from clones 
and scans of gels to sizing profiles and X-ray diffraction images) for 
10 years. I am not sure how this is handled in the US.


Juergen

Mischa Machius wrote:

I agree. However, I am personally not so much worried about entire protein 
structures being wrong or fabricated. I am much more worried about 
co-crystal structures. Capturing a binding partner, a reaction 
intermediate or a substrate in an active site is often as spectacular an 
achievement as determining a novel membrane protein structure. The 
threshold for over-interpreting densities for ligands is rather low, and 
wishful thinking can turn into model bias much more easily than for a 
protein structure alone; not to mention making honest mistakes. 

Just for plain and basic scientific purposes, it would be helpful every 
now and then to have access to the original images. 

As to the matter of fabricating ligand densities, I surmise that it is much 
easier than fabricating entire protein structures. The potential rewards 
(in terms of high-profile publications and obtaining grants) are just as 
high, so there is ample incentive to apply lax scientific standards. 

If a simple means exists, beyond what is available today, that can help 
tremendously in identifying honest mistakes, and perhaps the rare 
fabrication, I think it should be seriously considered. 

Best - MM 






Re: [ccp4bb] The importance of USING our validation tools

2007-08-18 Thread Thomas Stout

To complete your analogy to the 'ORTEP of the Year', the summary page could be 
accompanied by a backbone ribbon drawing of the macromolecule, with a red 
sphere at each residue that has an error. You could get fancy and scale the 
spheres according to the severity of the errors.
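A figure like that is easy to script. The sketch below is hypothetical — the per-residue error list is invented, and it emits plausible PyMOL commands rather than the output of any existing wwPDB tool — but it shows how little machinery the idea needs:

```python
# Sketch: turn a per-residue error table into PyMOL commands that draw
# a red sphere at each flagged CA atom, scaled by error severity.
# The flagged-residue list is invented for illustration.
flagged = [("A", 42, 0.9), ("A", 117, 0.4), ("B", 8, 0.7)]  # (chain, resi, severity 0-1)

lines = ["hide everything", "show cartoon", "color grey80"]
for chain, resi, sev in flagged:
    sel = f"chain {chain} and resi {resi} and name CA"
    lines.append(f"show spheres, {sel}")
    lines.append(f"color red, {sel}")
    # Map severity 0-1 onto a sphere radius scale of 0.5-2.0.
    lines.append(f"set sphere_scale, {0.5 + 1.5 * sev:.2f}, {sel}")

script = "\n".join(lines)
print(script)
```

Feeding the resulting script to PyMOL would give a grey cartoon with red spheres whose size tracks how badly each residue fails validation — a macromolecular cousin of the ORTEP-plot sanity check.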

-Tom

-Original Message-
From: CCP4 bulletin board on behalf of George M. Sheldrick
Sent: Sat 8/18/2007 6:26 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] The importance of USING our validation tools
 
There are good reasons for preserving frames, but most of all for the 
crystals that appeared to diffract but did not lead to a successful 
structure solution, publication, and PDB deposition. Maybe in the future 
there will be improved data processing software (for example to integrate 
non-merohedral twins) that will enable good structures to be obtained from 
such data. At the moment most such data is thrown away. However, forcing 
everyone to deposit their frames each time they deposit a structure with 
the PDB would be a thorough nuisance and major logistic hassle.

It is also a complete illusion to believe that the reviewers for Nature 
etc. would process or even look at frames, even if they could download 
them with the manuscript. 

For small molecules, many journals require an 'ORTEP plot' to be submitted 
with the paper. As older readers who have experienced Dick Harlow's 'ORTEP 
of the year' competition at ACA Meetings will remember, even a viewer 
with little experience of small-molecule crystallography can see from the 
ORTEP plot within seconds if something is seriously wrong, and many 
non-crystallographic referees for, e.g., the journal Inorganic Chemistry 
can even make a good guess as to what is wrong (e.g. the wrong element 
assigned to an atom). It would be nice if we could find something similar for 
macromolecules that the author would have to submit with the paper. One 
immediate bonus is that the authors would look at it carefully 
themselves before submitting, which could lead to an improvement in the 
quality of structures being submitted. My suggestion is that the wwPDB 
might provide, say, a one-page diagnostic summary when they allocate each 
PDB ID that could be used for this purpose.

A good first pass at this would be the output that the MolProbity server 
(http://molprobity.biochem.duke.edu/) sends when it is given a PDB file. It 
starts with a few lines of summary in which bad things are marked red 
and the structure is assigned to a percentile: a percentile of 6% means 
that roughly 94% of the structures in the PDB with a similar resolution are 
'better' and 6% are 'worse'. This summary can be understood with very 
little crystallographic background, and a similar summary can 
of course be produced for NMR structures. The summary is followed by 
diagnostics for each residue; normally, if the summary looks good, it 
would not be necessary for the editor or referee to look at the rest.
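The percentile idea itself is simple to state in code. A toy sketch — the clashscores below are invented, not MolProbity's actual reference data, and the convention assumed is the one described above (lower score = better geometry, higher percentile = better structure):

```python
def percentile_rank(score, reference_scores):
    """Percent of reference structures with a worse (higher) score.

    Assumed convention: lower score = better geometry, so a higher
    percentile means a better structure, as in the MolProbity summary.
    """
    worse = sum(1 for s in reference_scores if s > score)
    return 100.0 * worse / len(reference_scores)

# Invented clashscores for 20 hypothetical PDB entries at similar resolution.
reference = [4.2, 5.0, 6.1, 7.3, 8.0, 9.5, 10.2, 11.0, 12.4, 13.1,
             14.0, 15.2, 16.8, 18.0, 19.5, 21.0, 24.3, 28.0, 33.5, 40.1]

print(percentile_rank(25.0, reference))  # a poor structure -> 15.0
print(percentile_rank(4.0, reference))   # an excellent one -> 100.0
```

The virtue of reporting a percentile rather than a raw score is exactly the one argued for here: a referee needs no crystallographic background to read "6th percentile" as "worse than almost everything at this resolution".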

Although this server was intended to help us to improve our structures 
rather than to detect manipulated or fabricated data, I asked it for a 
report on 2HR0 to see what it would do (probably many other people were 
trying to do exactly the same, since the server was slower than usual). 
Although the structure got poor marks on most tests, MolProbity 
generously assigned it overall to the 6th percentile; I suppose that 
this is about par for structures submitted to Nature (!). However, there 
was one feature that was unlike anything I have ever seen before, 
although I have fed the MolProbity server some pretty ropey PDB 
files in the past: EVERY residue, including EVERY WATER molecule, either 
made at least one bad contact, was a Ramachandran outlier, or was a 
rotamer outlier (or more than one of these). This surely would ring 
all the alarm bells!
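That alarm is also trivially mechanizable. A sketch — the per-residue flag tables below are invented, not real MolProbity output — of the check "every single residue carries at least one validation flag":

```python
# Sketch of the alarm described above: in genuine structures, even poor
# ones, some residues come out clean; if literally every residue (waters
# included) carries at least one validation flag, something is very wrong.
def all_residues_flagged(residue_flags):
    """residue_flags: dict mapping residue id -> set of outlier labels."""
    return all(len(flags) > 0 for flags in residue_flags.values())

# Invented flag tables for illustration.
normal = {1: {"clash"}, 2: set(), 3: {"rotamer"}}
suspicious = {1: {"clash"}, 2: {"rama"}, 3: {"clash", "rotamer"}}

print(all_residues_flagged(normal))      # -> False: residue 2 is clean
print(all_residues_flagged(suspicious))  # -> True: ring the alarm bells
```

A one-line boolean like this could sit in an automated summary report without any human needing to read the per-residue diagnostics first.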

So I would suggest that the wwPDB could coordinate, with the help of the 
validation experts, software to produce a short summary report that 
would be provided automatically in the same email that allocates the PDB 
ID. This email could make the strong recommendation that the report file 
be submitted with the publication, and maybe in the fullness of time 
even the editors of high-profile journals would require this report for 
the referees (or even read it themselves!). To gain acceptance for such 
a procedure, the report would have to be short and comprehensible to 
non-crystallographers; the MolProbity summary is an excellent first 
pass in this respect, but (partly with a view to detecting 
manipulation of the data) a couple of tests could be added based on the 
data statistics as reported in the PDB file (or, even better, the 
reflection data, if submitted). Most of the necessary software already 
exists, much of it produced by regular readers of this bb; it just needs 
to be adapted so that the results can be digested by referees and 
editors with little or no crystallographic experience. And most important, 
a PDB ID should always be released only in combination with such a 
report.

Re: [ccp4bb] The importance of USING our validation tools

2007-08-18 Thread Artem Evdokimov
The literature already contains quite a few papers discussing ligand-protein
interactions derived from low-resolution data, noisy data, etc. It is
relatively easy to take a low-quality map, dock the molecule willy-nilly
into a poorly defined 'blobule' of density, and derive spectacular
conclusions. However, in order for such conclusions to be credible, one needs
to support them with orthogonal data such as biological assay results,
mutagenesis, etc. This is not limited to crystallography as such, and it is
the referee's job to be thorough in such cases. To the authors' credit, in
*most* cases the questionable crystallographic data is supported by
biological data of high quality. So, even with the images, etc., it is still
quite possible to be honestly misled. Which is why we value biological
data.

Consequently, if one's conclusions are wrong, this will inevitably show up
later in the results of other experiments (SAR inconsistencies, for
example). Science tends to be self-correcting: our errors (whether honest
or malicious) are not going to withstand the test of time.

Assuming that the proportion of deliberate faking in scientific literature
is quite small (and we really have no reason to think otherwise!), I really
see no reason to worry too much about the ligand-protein interactions. Any
referee evaluating ligand-based structural papers can ask to see an omit map
(or a difference density map before any ligand was built) and a decent
biological data set supporting the structural conclusions. In the case of
*sophisticated deliberate faking*, there is not much a reviewer can do
except trying to actually reproduce the claimed results.

On the other hand, 'wholesale' errors can be harder to catch, since the
dataset and the resulting structure are typically the *only* evidence
available. If both are suspect, the reviewer needs to rely on something else
to make a judgement, which is where a one-page summary would come in handy.

Artem


Re: [ccp4bb] The importance of USING our validation tools

2007-08-18 Thread Lisa A Nagy
Dear all,
I agree with MM about the ligand and complex structures. Even in the
most honest circumstances, it is easy to get carried away with hopes and
excitement. My personal embarrassing experience was some years ago. It
involved a protein that I had crystallized in a different space group in
the presence of inhibitor (2.5 Å data). The MR model had some gaps a
moderate distance from the binding pocket. Lo and behold, some new, very
rough density appeared very, very close to a binding site, close enough
to get my hopes up. I communicated my elation to the PI, handed over
pictures of the rough blobs of density, and started trying to build the
ligand in. 

I should have moderated my emotions in light of the early state of the
refinement. After finding a somewhat plausible fit in the density, I ran
several rounds of the Wonderful Amazing Revealer of Proteindensity
program. By the end I was almost in tears. The difference density began
to take on a helical shape, and then the connections started growing,
leading all the way up to one of the gaps. Side chains too, so I had no
trouble with the register. The R-factors didn't change too much, but the
geometries and maps in the area started looking really nice. Or should I
say, proper.

Very nice silver platter (the one my head was on when it was handed back
to me).

Lisa