Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread George M. Sheldrick

Dominika is entirely correct: the F and (especially) sigma(F) values 
are clearly inconsistent with my naive suggestion that columns could 
have been swapped accidentally in an mtz file. 

George

Prof. George M. Sheldrick FRS
Dept. Structural Chemistry, 
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-2582



Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread William Scott
On Thu, 16 Aug 2007, Clemens Vonrhein wrote:
 
> Maybe we should contact Google to let them do it for us ;-)
 


Better yet, simply download your images to a computer that uses AT&T as an 
internet service provider.  All the information will be automatically 
copied and stored by the NSA.

cf:  http://www.eff.org/legal/cases/att/faq.php


Bill





 


[ccp4bb] possibility of other fabricated structures

2007-08-16 Thread Petr Leiman

A small but very important excerpt from Randy Read's original message:

"... Nature did not allow us to use the word "fabricated". Nor were we 
allowed to discuss other structures from the same group, if they weren't 
published in Nature."


So, are there OTHER SUSPECT STRUCTURES from the same group or same authors 
published elsewhere???


Petr Leiman
Dept. of Biological Sciences
Purdue University
West Lafayette, IN 


Re: [ccp4bb] nature cb3 response

2007-08-16 Thread Marcos Vicente de Albuquerque Salles Navarro
For the purposes of evaluating a manuscript, the editorial policy of NSMB
explicitly states that the atomic coordinates and structure factors files
should be provided to reviewers and editors upon request, if those are not
already freely accessible in a publicly available and recognized database.
http://www.nature.com/nsmb/about/ed_policies/index.html

Maybe this should be a widespread policy in journals that publish
crystallographic structures.

Marcos

[EMAIL PROTECTED] wrote: 
>
>A comment from my collaborator's student suggests a partial 
>answer.  This afternoon he happened to say "but of course the 
>reviewers will look at the model, I just deposited it!".  He was 
>shocked to find that "hold for pub" means that even reviewers can't 
>access the data.  Can that be changed?  It would take a bit of 
>coordination between journals and the PDB, but I think the student is 
>right - it is rather shocking that the data is sitting there nicely 
>deposited but the reviewers can't review it.
> Phoebe Rice

Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread William Scott
> No one knows definitively if this was fabricated.

Well, at least one person does.  

But I agree, it is important to keep in mind that the proper venue for 
determining guilt or innocence in the case of fraud is the court system.

Until fairly recently, at least, the presumption of innocence and the 
right to cross-examine one's accusers and witnesses has been considered 
fundamental to civil society.

The case certainly sounds compelling, but this is all the more reason to 
adhere to these ideals.

Bill Scott


Re: [ccp4bb] nature cb3 response

2007-08-16 Thread Bernhard Rupp
Nature DOES require availability of structure factors and coordinates as
a matter of policy, and also requires that they be made available for
review on demand. If the reviewer does not want them, the editor can't
do anything about it.

One also cannot demand that a biologist reviewer reconstruct
maps, but others long ago - and I recently, in Nature - have suggested
making at least the RSCC mandatory reading for reviewers - a picture
says more than words... 

One way would be to carefully pair reviewers for crystallographic papers: 
a competent biologist and a competent crystallographer. 
Not being a famous biologist, I am generally unimpressed by the 
story and unemotional about the crystallography. The biology reviewer, 
on the other hand, could make the point of how relevant and exciting 
the structure and its biological implications are. The 
proper pairing is something for which I would lay the responsibility 
heavily on the journal editors. That is just a matter of due diligence. 
  
br

-Original Message-
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Thursday, August 16, 2007 5:10 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] nature cb3 response

A comment from my collaborator's student suggests a partial answer.  This
afternoon he happened to say "but of course the reviewers will look at the
model, I just deposited it!".  He was shocked to find that "hold for pub"
means that even reviewers can't access the data.  Can that be changed?  It
would take a bit of coordination between journals and the PDB, but I think
the student is right - it is rather shocking that the data is sitting there
nicely deposited but the reviewers can't review it.
 Phoebe Rice


Re: [ccp4bb] nature cb3 response

2007-08-16 Thread price
A comment from my collaborator's student suggests a partial 
answer.  This afternoon he happened to say "but of course the 
reviewers will look at the model, I just deposited it!".  He was 
shocked to find that "hold for pub" means that even reviewers can't 
access the data.  Can that be changed?  It would take a bit of 
coordination between journals and the PDB, but I think the student is 
right - it is rather shocking that the data is sitting there nicely 
deposited but the reviewers can't review it.

Phoebe Rice



---
Phoebe A. Rice
Assoc. Prof., Dept. of Biochemistry & Molecular Biology
The University of Chicago
phone 773 834 1723
fax 773 702 0439
http://bmb.bsd.uchicago.edu/index.html
http://www.nasa.gov/mission_pages/cassini/multimedia/pia06064.html 


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Dima Klenchin

I'd like to emphasize that the infamous Table 1 alone should
have immediately tipped off any competent reviewer.
The last-shell I/sigma(I) is 1.3 and R-merge 0.11 (!).


And keep in mind that these statistics come from
merging data from FOUR different crystals! (That's
clearly and unambiguously stated in the Methods section.)

Dima


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Mischa Machius
Given these recent, highly publicized irregularities and the ample  
(snide) remarks I hear about them from non-crystallographers, I  
wonder whether trust in macromolecular crystallography is beginning  
to erode. It is often very difficult even for experts to distinguish  
fakery or wishful thinking from reality. Non-crystallographers will  
have no chance at all and will consequently not rely on our results  
as much as we are convinced they could and should. If that is indeed  
the case, something needs to be done, and sooner rather than later.  
Best - MM


 


Mischa Machius, PhD
Associate Professor
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.; ND10.214A
Dallas, TX 75390-8816; U.S.A.
Tel: +1 214 645 6381
Fax: +1 214 645 6353


[ccp4bb] nature cb3 response

2007-08-16 Thread Bernhard Rupp
Ok, enough political (in)correctness. Irrespective of fabricated or not,
I think this points to a general problem of commercial journals and 
their review process, as it seems that selling (.com) hot stuff 
induces an extraordinary capability of denial.

The comment, as someone noted, does not address the allegations
at all. This is reminiscent of my dealings with Nature in two
related cases: they ignore or stonewall until the dispute is ended
with an irrelevant comment. In one case, Axel B later proved,
with the correct structure, that what we had commented on earlier
was entirely correct.
In the second case, the comment (by some of the leading experts,
not just by a nobody like me) was rejected with no recourse, based
on another non-fact-addressing author comment, and not published at all.

Compare this to a similar case, when the JACS editor (.org <--)
contacted me of his own accord to check for a related problem,
leading to retraction of the paper after the editor (a scientist
himself) evaluated the facts and the response.

It also seems to depend on the handling Nature editor. I have made
maps of several structures from data unhesitatingly provided by the
editor when I had reason to ask for them during review. Those editors
were also responsive to a mini-table-1 comment I sent on cb3, but I
did not hear from the editor assigned to cb3.

This time the review completely failed (table 1 and comment issues),
and the editorial process failed as well, because the response is not
adequate. If someone - however tentatively and tactfully it may have
been phrased - accused me of faking data, they'd eat shit until hell
freezes over.

It is as simple as that: an extraordinary claim (super structure,
bizarre stats and properties) requires extraordinary proof. This rule
has not been followed, which reflects poorly on the scientific process
in this case.

I also note that in no case known to me have persons involved in
irregularities ever appeared as frequent (or any) contributors on
the ccp4bb. 

As long as grant review and tenure committees rely on automated
bibliometrics and impact factors (and who knows who) to decide
academic careers and funding, the big journals will remain the
winners. The system has become self-perpetuating.

Back to grant writing now.
Need to get that paper out to nature...

Cheers, br

PS: it is pointless flaming me. I am the messenger only.

-Original Message-
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
Bernhard Rupp
Sent: Thursday, August 16, 2007 2:03 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] nature cb3 comment pdf

thxthxthx to all the day and night owls for the many copies 
The winners have been selected, no more entries needed.
thx again br

-Original Message-
From: Miriam Hirshberg [mailto:[EMAIL PROTECTED] On Behalf Of Miriam
Hirshberg
Sent: Thursday, August 16, 2007 1:58 PM
To: Bernhard Rupp
Subject: Re: [ccp4bb] nature cb3 comment pdf


attached, Miri


On Thu, 16 Aug 2007, Bernhard Rupp wrote:

> my nature web connection just died for good (probably a preventive
> measure..)
> Could someone kindly email me the pdfs of the comment and response?
> Thx br
> -
> Bernhard Rupp
> 001 (925) 209-7429
> +43 (676) 571-0536
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> http://www.ruppweb.org/
> -
> People can be divided in three classes:
> The few who make things happen
> The many who watch things happen
> And the overwhelming majority
> who have no idea what is happening.
> -
>


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Dunten, Pete W.
A few thoughts following on Richard Baxter and George Sheldrick . . .

Re: gaps in the lattice – see the Tyr-tRNA synthetase structures (1tya for 
example).  Fersht has written a whole book full of insights from these 
structures.  

Re: Phaser Z scores.  For some MR work with two xtal forms of a structure, I 
got Z scores of 4.0 and 4.3 for the rotation and translation searches in one 
form, and 8.7 and 3.5 for the other, using a model with 18% sequence identity.  
So you don't need great Z scores for the solution to be right.  The map 
calculated with MR phases had a correlation coefficient of 0.22 with the final 
model.

Re: confusing columns in an mtz file.  I had the same thought.  If the column 
types were different for experimental versus calculated F's, and refmac only 
allowed you to refine against an experimental F, could this kind of trouble be 
avoided?  Of course you'd want an option to override the default, for people 
doing weird things.  Dunno about cns or phenix, but didn't we recently see 
messages about how hard it was to work with cns reflection files, leading to a 
new conversion program from Kevin?  It seems possible to get the wrong column 
there as well.
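Pete's column-type idea could be prototyped as a simple label screen. The sketch below is entirely my own illustration: labels such as FC or FCALC are conventional for calculated amplitudes, but the MTZ column type codes themselves do not distinguish observed from calculated F (both are type 'F'), which is exactly the gap discussed above.

```python
def flag_refinement_input(columns):
    """Given MTZ (label, type) pairs, flag amplitude columns whose label
    suggests calculated rather than observed data.  Purely heuristic:
    both kinds carry MTZ column type 'F', so only the label can warn us."""
    calc_prefixes = ("FC", "FCALC", "F-CALC", "FMODEL")  # conventional labels
    warnings = []
    for label, ctype in columns:
        if ctype == "F" and label.upper().startswith(calc_prefixes):
            warnings.append(f"column '{label}' looks calculated, "
                            "not experimental - check before refining")
    return warnings

# Hypothetical column list from a typical refinement mtz:
cols = [("FP", "F"), ("SIGFP", "Q"), ("FC", "F"), ("PHIC", "P")]
print(flag_refinement_input(cols))  # flags 'FC'
```

A refinement program could run such a screen on its input and demand an explicit override flag, which is essentially the default-plus-override behaviour Pete suggests.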

Re: images.  Be careful what you sign - the user agreements with synchrotron 
facilities in the USA may state that the data are public rather than private 
(as the funding is public). 

Pete


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Dominika Borek
There are several issues under current discussion. We 
outline a few of these below, in order of importance.


The structure 2hr0 is unambiguously fake. Valid arguments 
have already been published in a Brief Communication by 
Janssen et al. (Nature 448:E1-E2, 9 August 2007). However, 
the published response from the authors of the questioned 
deposit may sound, to an unfamiliar reader, like a genuine 
scientific controversy. There are many additional 
independent signs of intentional data fabrication in this 
case, above and beyond those already mentioned.


One diagnostic relates to the fact that fabricated data 
will not reproduce the disorder features of real proteins. 
The reported case has a very high ratio of “Fobs” to atomic 
parameters, so the phase uncertainty is small. In real 
structures, fully solvent-exposed chains without stabilizing 
interactions display intrinsically high disorder, yet in 
this structure these residues (e.g., Arg932B, Met1325B, 
Glu1138B, Arg459A, etc.) are impossibly well ordered.


The second set of diagnostics is the observation of perfect 
electron density around impossible geometries. For example, 
the electron density is perfect (visible even at the 4 sigma 
level in a 2Fo-Fc map), with no significant negative peaks 
in an Fo-Fc map, around the guanidinium group of Arg1112B, 
which is in outrageously close contact with carbon atoms of 
Lys1117B. The same is seen in many other places in the map. 
The issue is not the presence of bad contacts, but the lack 
of disorder (high B-factors) or of negative peaks in an 
Fo-Fc map in this region that could explain why the bad 
contacts remain in the model.


The third set of diagnostics consists of statistics that do 
not occur in real structures. The ones mentioned previously 
are already very convincing (moments, B-factor plots, bulk 
solvent issues, etc.). We can add more evidence from a round 
of Refmac refinement of the deposited model against the 
deposited structure factors. The anisotropic scaling tensor 
obtained is unreasonable for a structure in a low-symmetry 
space group such as C2, which inherently lacks constraints 
on packing symmetry (particularly in view of the problems 
with lattice contacts already mentioned). The values from a 
Refmac refinement for a typical structure in space group C2 
are: B11 = 0.72, B22 = 1.15, B33 = -2.12, B12 = 0.00, 
B13 = -1.40, B23 = 0.00 (B12 and B23 are zero by C2 space 
group symmetry). For structure 2hr0: B11 = -0.02, 
B22 = 0.00, B33 = 0.02, B12 = 0.00, B13 = 0.01, B23 = 0.00. 
Statistical reasoning leads to P-values on the order of 
10^-6 for such values arising by chance in a real structure, 
whereas they are highly likely in a fabricated case.
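The symmetry argument here can be sketched as a quick consistency check. In a b-unique monoclinic space group such as C2, symmetry forces B12 = B23 = 0 in the anisotropic scale tensor, while the remaining components of a real low-symmetry crystal are normally well away from zero. The following is an illustrative script with a made-up tolerance, not part of the original analysis:

```python
def check_aniso_b_c2(b, tol=0.05):
    """Check an anisotropic scaling tensor refined in a b-unique
    monoclinic space group (e.g. C2).

    b: dict with keys B11, B22, B33, B12, B13, B23 (in A^2).
    Returns a list of diagnostic strings (empty list = nothing odd).
    """
    notes = []
    # Symmetry constraint: B12 and B23 must vanish in b-unique monoclinic.
    for key in ("B12", "B23"):
        if abs(b[key]) > 1e-6:
            notes.append(f"{key} must be 0 by symmetry, got {b[key]}")
    # Symmetry-free components: in real low-symmetry crystals these are
    # typically far from zero, so an all-near-zero tensor is suspicious.
    free = ("B11", "B22", "B33", "B13")
    if all(abs(b[k]) < tol for k in free):
        notes.append("all free components ~0: unusually isotropic for C2")
    return notes

# The two tensors quoted in the post:
typical = dict(B11=0.72, B22=1.15, B33=-2.12, B12=0.0, B13=-1.40, B23=0.0)
entry_2hr0 = dict(B11=-0.02, B22=0.00, B33=0.02, B12=0.0, B13=0.01, B23=0.0)

print(check_aniso_b_c2(typical))     # []
print(check_aniso_b_c2(entry_2hr0))  # flags the near-zero tensor
```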


The fourth set of diagnostics consists of significant 
inconsistencies in the published methods. For example, the 
authors claim that they collected data from four crystals, 
yet their merging statistics show an R-merge of 0.11 in the 
last resolution shell. It is simply impossible to get such 
a value, particularly when I/sigma(I) for the last 
resolution shell was stated as 1.32. Moreover, the overall 
I/sigma(I) for all data is 5.36 and the overall R-merge is 
0.07 - values highly inconsistent with the reported data 
resolution, the quality of the map, and the high data 
completeness (97.3%).
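The quantitative gap can be illustrated with the standard counting-statistics rule of thumb (my own sketch, not part of the original message): for purely random errors, R-merge in a shell is at best roughly sqrt(2/pi)/<I/sigma(I)> ≈ 0.8/<I/sigma(I)>; systematic error and merging several crystals only push it higher.

```python
import math

def expected_rmerge(i_over_sigma):
    """Approximate lower bound on R-merge from counting statistics alone:
    R-merge ~ sqrt(2/pi) / <I/sigma(I)> for normally distributed errors
    (sqrt(2/pi) ~ 0.8 is the mean absolute deviation of a unit normal).
    Systematic errors and multi-crystal merging only increase it."""
    return math.sqrt(2.0 / math.pi) / i_over_sigma

# Outer-shell figure quoted in the post: I/sigma(I) = 1.32
print(round(expected_rmerge(1.32), 2))  # ~0.60, versus the reported 0.11
```

So an outer shell at I/sigma(I) = 1.32 should show an R-merge several times larger than the published 0.11, which is the inconsistency being pointed out.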


Overall, this is just a short list of problems; the 
indicators of data fabrication/falsification are plentiful 
and can easily be provided to interested parties if needed.


We fully support Randy Read's excellent comments with our 
view of retraction and public discussion of this problem:


“Originally I expected that the publication of our Brief 
Communication in Nature would stimulate a lot of discussion 
on the bulletin board, but clearly it hasn't. One reason is 
probably that we couldn't be as forthright as we wished to 
be. For its own good reasons, Nature did not allow us to use 
the word "fabricated". Nor were we allowed to discuss other 
structures from the same group, if they weren't published in 
Nature.”


This policy needs to be addressed with publishers in cases 
of intentional fraud that can be proven simply by an 
analysis of the published results. At this point the 
article needs to be retracted by Nature after Nature's 
internal investigation, with input from the 
crystallographic community, rather than after the results 
of any potential administrative investigation of fraud 
become available.


“Another reason is an understandable reluctance to make 
allegations in public, and the CCP4 bulletin board probably 
isn't the best place to do that.”


The discussion of the fraud allegation was initiated by a 
public reply to a question addressed to a single person, so 
it happened by chance rather than by intention - but with 
no complaint from our side.


On a different aspect of the discussion – namely, data 
preservation—currently, funding agencies as well as 
scientific responsibility requires authors of any 
publication to preserve and ke

Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread James Whisstock
Dear all

Personally I feel that we really have an obligation to make the raw images 
available, even if it is painful to do so in terms of data storage.  It is the 
only way of truly allowing others to reproduce our experiments, which is a 
basic requirement for all published scientific work, crystallographic or not.  
I also hope it would allow structures to be improved as future tools develop, 
errors to be corrected, and perhaps one day new methods to be applied (for 
example, to deal more appropriately with twinned structures).  Now that people 
have developed ways to routinely share vast quantities of data (music, movies, 
etc.), it should not be beyond our community to release image data.  Helen 
Berman from the PDB mentioned that the PDB could store unmerged reflection 
data, which is great; however, I still feel that we should make our images 
available so everything can be reprocessed if someone wishes to.  Regarding 
the 2hr0 structure, Piet Gros and colleagues made the comment that the issue 
could be resolved by release of the images - no mention of that request is 
made in the reply.

James 

Richard Baxter <[EMAIL PROTECTED]> wrote:> 
> Dear All,
> 
> Without passing any judgement on the veracity of C3b structure 2hr0, I
> note that the Ca RMSD of this structure with C3 structure 2a73 was
> unusually low, compared to the RMSD of 2a73 to the related entries 2a74
> and 2i07 by the same group, bovine C3 structure 2b39 and C3b and C3c
> structures 2ice and 2icef. If one took a high resolution structure as a
> molecular replacement solution of a new structure at lower resolution
> this might be expected, but not vice versa?
> 
> As to whether the structure's problems arise from malfeasance or neglect,
> I do not understand why the journal did not require that the raw images be
> made available given the evidence presented against the published data -
> isn't that what is done in other fields when such issues are raised?
> Isn't making the availability of raw data upon request a requirement of
> publication more practical than trying to set up a vast repository of
> images, when submission to that repository is still a matter of choice?
> 
> I have several questions regarding the reply that I would like to hear
> an answer to, perhaps Todd can help obtain them:
> 
> 1. Could the statement "Statistical disorder resulting in apparent
> 'gaps' in the lattice has been observed for other proteins" not be
> referenced by citation to numerous deposited structures if they indeed
> exist?
> 
> 2. I was not convinced that the Z-scores of the PHASER solutions were
> significant, shouldn't they be greater than 6.0? It didn't look like
> density at 0.7 sigma was contiguous over the main chain.
> 
> 3. Can the domain suggested to fill the void in the asymmetric unit be a
> "contaminant" when it must be present in stoichiometric ratio in order
> to provide lattice contacts? Why not present a SDS/PAGE gel of a
> redissolved crystal, surely that domain would show up.
> 
> 4. I don't understand why the statement "Bulk-solvent modelling is
> contentious, making many refinements necessary to constrain parameters
> to obtain acceptable values" was considered an acceptable response to
> the question of the low resolution data. Whether one chooses to include
> low-resolution data with bulk solvent modelling or to truncate the low
> res data is a separate issue from the physical effect of solvent on
> intensities at low resolution.
> 
> One point in the reply that seemed reasonable is the issue of B-factor
> variation, because the deposited C3 structures do exhibit a wide range
> in the average B-factor, as well as in resolution, whether TLS
> refinement was used, and how heavily restraints were set. However, that
> does not really address the issue of seemingly random coil without
> other contacts having such strong density contours at 2.5 sigma.
> 
> I would look forward to learning from people with more experience on
> these matters.
> 
> sincerely,
> 
> Richard Baxter
> 
> On Thu, 2007-08-16 at 10:11, Green, Todd wrote:
>> Hello all,
>>
>> I started to write a response to this thread yesterday. I thought the
>> title was great, and the content of Eleanor's email was very helpful.
>> What I didn't like was the indictment in the next to last paragraph.
>> This has been followed up with the word fabrication by others. No one
>> knows definitively if this was fabricated. You have your suspicions,
>> but you don't "know." Fabrication suggests malicious wrong-doing. I
>> actually don't think this was the case. I'm probably a bit biased
>> because the work comes from an office down the hall from my own. I'd
>> like to think that if the structure is wrong that it could be chalked
>> up to inexperience rather than malice. To me, this scenario of
>> inexperience seems like one that could become more and more prevalent
>> as our field opens up to more and more scientists doing structural
>> work who are not dedicated crys

Re: [ccp4bb] nature cb3 comment pdf

2007-08-16 Thread Bernhard Rupp
thxthxthx to all the day and night owls for the many copies 
The winners have been selected, no more entries needed.
thx again br

-Original Message-
From: Miriam Hirshberg [mailto:[EMAIL PROTECTED] On Behalf Of Miriam
Hirshberg
Sent: Thursday, August 16, 2007 1:58 PM
To: Bernhard Rupp
Subject: Re: [ccp4bb] nature cb3 comment pdf


attached, Miri


On Thu, 16 Aug 2007, Bernhard Rupp wrote:

> my nature web connection just died for good (probably a preventive
> measure..)
> Could someone kindly email me the pdfs of the comment and response?
> Thx br
> -
> Bernhard Rupp
> 001 (925) 209-7429
> +43 (676) 571-0536
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> http://www.ruppweb.org/
> -
> People can be divided in three classes:
> The few who make things happen
> The many who watch things happen
> And the overwhelming majority
> who have no idea what is happening.
> -
>


[ccp4bb] nature cb3 comment pdf

2007-08-16 Thread Bernhard Rupp
my nature web connection just died for good (probably a preventive
measure..)
Could someone kindly email me the pdfs of the comment and response?
Thx br
-
Bernhard Rupp
001 (925) 209-7429
+43 (676) 571-0536
[EMAIL PROTECTED]
[EMAIL PROTECTED] 
http://www.ruppweb.org/ 
-
People can be divided in three classes:
The few who make things happen
The many who watch things happen
And the overwhelming majority 
who have no idea what is happening.
-


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread George M. Sheldrick
The deposited structure 2HR0 shows all the signs of having been refined, 
deliberately or accidentally, against 'calculated' data. The model used 
to 'calculate' the data had (almost) constant B-values in a rather empty 
cell containing no solvent. For example, it could have been a (partial?)
molecular replacement solution obtained using real data. It seems to me 
that it is perfectly possible that two reflection files (or two columns 
in an mtz file) were carelessly exchanged by a crystallographically 
inexperienced researcher. This even explains the low CA RMSD to the 2A73 
structure, if that had been used as a search fragment; even the 
suspiciously poor Phaser Z scores can be explained (maybe it was only a 
partially correct MR solution against the real data). So although my 
first reaction was that there was overwhelming evidence of fraud, on 
reflection a relatively benign explanation is still possible.

The situation could be clarified fairly quickly if the frames or a crystal 
or even the original HKL2000 .sca file could be found. What I really don't 
understand is how the Editors of the revered journal Nature allowed a 
'reply' to be printed which made no reference to the request for the 
essential experimental evidence, i.e. the raw diffraction data, to be 
produced. Protein crystallography is an experimental science just like any 
other, even if the results it produces usually stand the test of time 
better.

George  

Prof. George M. Sheldrick FRS
Dept. Structural Chemistry, 
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-2582


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Bernhard Rupp
I'd like to emphasize that the infamous Table 1 alone should
have immediately tipped off any competent reviewer.
The last-shell I/sigI is 1.3 and Rmerge 0.11 (!). Rfree and R
have an extraordinarily low gap. And all that for a
large, purportedly flexible multidomain molecule.
Enough to ask more questions, even without initially having model,
data, or frames available.
 
Maybe the infamous Table 1 is still good for something after all.
Hiding it in supplemental material does not promote
reading it.
 
br


From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
Anastassis Perrakis
Sent: Thursday, August 16, 2007 8:13 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] The importance of USING our validation tools


1. Make the images available and demand a public apology for spoiling their
name.
2. Shut up, retract the paper, buy property in Alaska and disappear.


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Richard Baxter
Dear All,

Without passing any judgement on the veracity of C3b structure 2hr0, I
note that the Ca RMSD of this structure with C3 structure 2a73 was
unusually low, compared to the RMSD of 2a73 to the related entries 2a74
and 2i07 by the same group, bovine C3 structure 2b39 and C3b and C3c
structures 2ice and 2icf. If one took a high resolution structure as a
molecular replacement solution of a new structure at lower resolution
this might be expected, but not vice versa?
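The comparison Richard describes can be made concrete with a few lines of code. This is a minimal sketch, not the tool he used: it assumes the two Cα coordinate sets are already paired and superposed (e.g. by a Kabsch alignment), and the coordinates below are made up purely for illustration.

```python
# Sketch: Calpha RMSD between two already-superposed coordinate sets.
# Assumes paired (x, y, z) tuples; illustrative numbers only.
import math

def ca_rmsd(xyz_a, xyz_b):
    """Root-mean-square deviation over paired (x, y, z) coordinates."""
    assert len(xyz_a) == len(xyz_b)
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(xyz_a, xyz_b))
    return math.sqrt(sq / len(xyz_a))

a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
b = [(0.0, 0.0, 0.3), (1.5, 0.4, 0.0)]
print(round(ca_rmsd(a, b), 2))  # -> 0.35
```

An unusually low RMSD between a 2.26 Å structure and a 4 Å search model is the anomaly at issue: noise and genuine conformational differences should normally push the deviation up, not down.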

As to whether the structure's problems arise from malfeasance or neglect,
I do not understand why the journal did not require the raw images be
made available given the evidence presented against the published data,
isn't that what is done in other fields when such issues are raised?
Isn't making the availability of raw data upon request a requirement of
publication more practical than trying to set up a vast repository of
images when submission to that repository is still a matter of choice?

I have several questions regarding the reply that I would like to hear
an answer to, perhaps Todd can help obtain them:

1. Could the statement "Statistical disorder resulting in apparent
'gaps' in the lattice has been observed for other proteins" not be
referenced by citation to numerous deposited structures if they indeed
exist?

2. I was not convinced that the Z-scores of the PHASER solutions were
significant, shouldn't they be greater than 6.0? It didn't look like
density at 0.7 sigma was contiguous over the main chain.

3. Can the domain suggested to fill the void in the asymmetric unit be a
"contaminant" when it must be present in stoichiometric ratio in order
to provide lattice contacts? Why not present a SDS/PAGE gel of a
redissolved crystal, surely that domain would show up.

4. I don't understand why the statement "Bulk-solvent modelling is
contentious, making many refinements necessary to constrain parameters
to obtain acceptable values" was considered an acceptable response to
the question of the low resolution data. Whether one chooses to include
low-resolution data with bulk solvent modelling or to truncate the low
res data is a separate issue from the physical effect of solvent on
intensities at low resolution. 

One point in the reply that seemed reasonable is the issue of B-factor
variation, because the deposited C3 structures do exhibit a wide range
in the average B-factor, as well as in resolution, whether TLS
refinement was used, and how heavily restraints were set. However, that
does not really address the issue of seemingly random coil without
other contacts having such strong density contours at 2.5 sigma.

I would look forward to learning from people with more experience on
these matters.

sincerely,

Richard Baxter

On Thu, 2007-08-16 at 10:11, Green, Todd wrote:
> Hello all,
> 
> I started to write a response to this thread yesterday. I thought the
> title was great, and the content of Eleanor's email was very helpful.
> What I didn't like was the indictment in the next to last paragraph.
> This has been followed up with the word fabrication by others. No one
> knows definitively if this was fabricated. You have your suspicions,
> but you don't "know." Fabrication suggests malicious wrong-doing. I
> actually don't think this was the case. I'm probably a bit biased
> because the work comes from an office down the hall from my own. I'd
> like to think that if the structure is wrong that it could be chalked
> up to inexperience rather than malice. To me, this scenario of
> inexperience seems like one that could become more and more prevalent
> as our field opens up to more and more scientists doing structural
> work who are not dedicated crystallographers.
> 
> Having said that, I think Eleanor started an extremely useful thread
> as a way of avoiding the pitfalls of crystallography whether you are a
> novice or an expert. There's no question that this board is the best
> way to advance one's knowledge of crystallography. I actually gave a
> homework assignment that was simply to sign up for the ccp4bb.
> 
> In reference to the previously mentioned work, I'd also like to hear
> discussion concurring or not with the response letter, some of which
> seems plausible to me.
> 
> I hope I don't ruffle anyone's feathers by my email, but I just thought
> that it should be said.
> 
> Cheers-
> Todd
> 
> 
> -Original Message-
> From: CCP4 bulletin board on behalf of Randy J. Read
> Sent: Thu 8/16/2007 8:22 AM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] The importance of USING our validation tools
> 
> On Aug 16 2007, Eleanor Dodson wrote:
> 
> >The weighting in REFMAC is a function of SigmA (plotted in the log
> >file). For this example it will be nearly 1 for all resolution
> >ranges, so the weights are pretty constant. There is also a
> >contribution from the "experimental" sigma, which in this case seems
> >to be proportional to |F|
> 
> Originally I expected that the publication of our Brief Communication
> in
> Nature would sti

[ccp4bb] Best Practices in Virtual Screening

2007-08-16 Thread Barry Hardy
Perhaps the most frequent topic arising in my recent conversations with
drug discovery researchers is Virtual Screening and its complexities,
confusions, and varying validity and reliability. John Irwin and I
initiated the idea of a best practice initiative last Autumn 
(http://barryhardy.blogs.com/cheminfostream/2006/10/could_we_take_a.html).  
We realise this will take time but I believe it is an endeavour worth 
undertaking that will be of significant benefit to both industry and 
academic researchers.  To this end we are supporting workshop and wiki 
activity this Autumn to initiate such a program.


The Virtual Screening Community of Practice Workshop and Forum will take 
place 15-16 October at Bryn Mawr, Philadelphia to further the above 
goals.  This activity will consist of the following components:


1. Workshop to share experiences on current practices in virtual 
screening and to collaboratively develop best practices for comparison 
studies. (morning/afternoon of October 15).
2. Conference session on latest method developments with presentations 
and panel discussion. (October 16)
3. Poster Session (evening of October 16). NOTE: If interested in 
presenting a poster, please send an abstract (ca. 300-500 words) for 
review to eCheminfo (-at-) douglasconnect.com We have also left space on 
the program schedule to feature a selection of the abstracts submitted 
as oral presentations.
4. Virtual communication and collaboration approaches will be used pre- 
and post-event to maximise the benefit of the workshop activity. In 
particular a wiki will be opened prior to the workshop to commence 
documentation of supporting materials and to start to populate the area 
with initial suggestions, ideas, practices and methods. The wiki will 
also support subsequent practice group activities and development 
initiatives, including future ongoing meetings and workshops and 
research and development projects. (Realising this activity needs to be 
in progress for quite some time.)


The agenda of the workshop will be designed so as to maximise interaction, 
discussion, issue resolution, and action plans for cooperation. Workshop 
activities will address the specific challenges:
* statistically significant relationships between docking scores and 
ligand affinity
* practices and procedures for the operation of community-based 
screening and docking comparisons including tests and interpretation of 
results, in a way that everyone can agree is fair.

* peer review, data compilation, running of programs, judgement of results
* workflow descriptions for comparisons
* beyond conformational energetics in the rank ordering of diverse 
compounds in high throughput virtual screening

* measurement and benchmarking
* binding mode prediction, virtual screening for lead identification, 
rank-ordering by affinity for lead optimization
* atom typing, ligand preparation (ionic forms, tautomers, ...), ligand 
conformer generation, protein preparation (protonation, residue 
orientation, ...), ligand placement (top-down, bottom-up, fragment 
based, group based, ...), energy calculation (force field type, grid 
type, algorithm, ...), constraint handling (global and local 
optimization strategy? process to escape local minima?), scoring 
(single-objective, multi-objective, consensus, ...)

* separation of test set information from model development
* validation datasets, results and applicability domains
* objective comparisons of standardized test datasets
* extraction of data from the scientific literature
* methods and procedures for secure testing of commercial data that 
could be acceptable to industry

* frameworks for computational model testing and validation
* impact of knowledge management approaches
* collaboration and community support structures and environments

We welcome the collaboration and participation of all academic, 
government and industry practitioners in drug discovery in strengthening 
the scientific foundations of this valuable set of cheminformatics 
techniques.


More Information
Website: http://www.echeminfo.com/COMTY_screeningforumbm07
Pdf Download: 
http://barryhardy.blogs.com/cheminfostream/files/eChemProgramBrynMawr07-web1.PDF



best regards
Barry Hardy
eCheminfo Community of Practice

Barry Hardy, PhD
Douglas Connect
Zeiningen, CH-4314
Switzerland
Tel: +41 61 851 0170
Blog: http://barryhardy.blogs.com/cheminfostream/




[ccp4bb] Correct H-bond length in CYS.cif ?

2007-08-16 Thread Juergen J. Mueller

Dear all,
Using refmac5 to add H atoms to a protein structure, the distance
between CYS SG and HG is defined as 1.34 Ang. in CYS.cif.
That distance has been flagged by a non-CCP4 program as
* Poor covalent bond length of 1.33954 for hydrogen atom

In another library file, CSH.cif, the same distance is defined as 1.1 Ang.
What is the correct one?
Could you comment on this?
(Of course I know hydrogens will not be refined ...).
Thank you,
Juergen


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Mark J. van Raaij
PS I wasn't aware Nature now requires structure factors to be
submitted - which breaks down one of my arguments...
I still hope the authors provide the images though, otherwise I will
start suspecting many more structures than I do now.
In the meantime, I am all for required submission of raw images, both to
prevent misconduct and to help program developers.

Mark
Mark J. van Raaij
Unidad de Bioquímica Estructural
Dpto de Bioquímica, Facultad de Farmacia
and
Unidad de Rayos X, Edificio CACTUS
Universidad de Santiago
15782 Santiago de Compostela
Spain
http://web.usc.es/~vanraaij/


On 16 Aug 2007, at 17:59, Eleanor Dodson wrote:

This structure (1h6w) provides an interesting comparison; it looks
just as I would expect for such an interesting extended fold.
There are big peaks on the 3-fold axis; there is wispy density
which would be very hard to model - I found an ILE in the wrong
rotamer (341A) - (there is ALWAYS something you can improve) - in
other words it looks like a real map.


And the intensity plots look as expected too..
Eleanor



Mark J. van Raaij wrote:

Dear all,

With regard to the possible "fabrication" of the 2hr0 structure,
why would the authors have deposited the structure factors if this
is not required by the journal? Also, why would they have
"fabricated" a structure with gaps along c if they could have done
so without the gap?


A few years ago, I had to cope with two structures with gaps along
c, pdb codes 1h6w and 1ocy. For those of you who are interested,
structure factors are available from the pdb; unmerged intensities/
raw images I will look for and provide if requested...


Without further evidence, I suspect their structure is real,
perhaps not optimally refined and treated, but then again,
this seems commonplace in "Nature" structures, perhaps due to lack
of time/experience and, in some cases, putting too much pressure
on the PhD students/postdocs involved instead of mentoring and
checking them. I hope the authors provide the raw diffraction
images to dispel any doubts, and would be curious to learn about
the other structures of the same group - does anyone have a
comprehensive, annotated list of them?


Greetings,

Mark J. van Raaij
Unidad de Bioquímica Estructural
Dpto de Bioquímica, Facultad de Farmacia
and
Unidad de Rayos X, Edificio CACTUS
Universidad de Santiago
15782 Santiago de Compostela
Spain
http://web.usc.es/~vanraaij/ 


On 16 Aug 2007, at 15:22, Randy J. Read wrote:


On Aug 16 2007, Eleanor Dodson wrote:

The weighting in REFMAC is a function of SigmA ( plotted in log  
file).
For this example it will be nearly 1 for all resolution ranges  
so the weights are pretty constant. There is also a contribution  
from the "experimental" sigma, which in this case seems to be  
proportional to |F|


Originally I expected that the publication of our Brief  
Communication in Nature would stimulate a lot of discussion on  
the bulletin board, but clearly it hasn't. One reason is probably  
that we couldn't be as forthright as we wished to be. For its own  
good reasons, Nature did not allow us to use the word  
"fabricated". Nor were we allowed to discuss other structures  
from the same group, if they weren't published in Nature.


Another reason is an understandable reluctance to make  
allegations in public, and the CCP4 bulletin board probably isn't  
the best place to do that.


But I think the case raises essential topics for the community to  
discuss, and this is a good forum for those discussions. We need  
to consider how to ensure the integrity of the structural  
databases and the associated publications.


So here are some questions to start a discussion, with some  
suggestions of partial answers.


1. How many structures in the PDB are fabricated?

I don't know, but I think (or at least hope) that the number is  
very small.


2. How easy is it to fabricate a structure?

It's very easy, if no-one will be examining it with a suspicious  
mind, but it's extremely difficult to do well. No matter how well  
a structure is fabricated, it will violate something that is  
known now or learned later about the properties of real  
macromolecules and their diffraction data. If you're clever  
enough to do this really well, then you should be clever enough  
to determine the real structure of an interesting protein.


3. How can we tell whether structures in the PDB are fabricated,  
or just poorly refined?


The current standard validation tools are aimed at detecting  
errors in structure determination or the effects of poor  
refinement practice. None of them are aimed at detecting specific  
signs of fabrication because we assume (almost always correctly)  
that others are acting in good faith.


The more information that is available, the easier it will be to  
detect fabrication (because it is harder to make up more  
information convincingly). For instance, if the diffraction data  
are de

Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Jacob Keller
Hello All,

This debacle is actually quite reminiscent of a similar incident that Wayne 
Hendrickson caught in
the 1970's concerning purported "tRNA crystals." Turned out to be completely 
fabricated, and the
guy's career went down the drain, I think. A good example to tell your trainees.

Jacob Keller

The ref's:


1.   True identity of a diffraction pattern attributed to valyl tRNA

WAYNE A. HENDRICKSON, BROR E. STRANDBERG, ANDERS LILJAS, L. MARIO AMZEL, EATON 
E. LATTMAN

CONTEXT: SIR - We have examined in detail several publications by H.H. 
Paradies. One is a report in 
Nature on 11 April 1970 about single crystals of a valine-specific tRNA from 
yeast1. We find that
the diffraction pattern attributed to valyl tRNA...

Nature 303, 195 (19 May 1983), Correspondence


2.   A reply from Paradies

H.H. PARADIES

Nature 303, 196 (19 May 1983), Correspondence


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Peter Keller

On Thu, 16 Aug 2007, Kay Diederichs wrote:


Date: Thu, 16 Aug 2007 17:16:54 +0200
From: Kay Diederichs <[EMAIL PROTECTED]>
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] The importance of USING our validation tools

I'm glad that the discussion has finally set in, and would only like to 
comment on the practicability of storing images.


Disciplines such as astronomy have management and processing requirements for 
image data that make our diffraction images look like pretty minor stuff. Data 
rates of ~1Tb/day don't make these guys wonder whether it can be done or not. 
Come 2013, the Large Synoptic Survey Telescope is expected to be producing 
115Tb (yes: Tb) of data per day, and that is each and every day, not just now 
and again.


I got these figures from:

http://www.ast.cam.ac.uk/~wfcam/docs/papers/JRL_ADASS_paper.pdf
http://www.aspbooks.org/a/volumes/article_details/?paper_id=771
http://www.gridtoday.com/grid/803617.html
http://www.lsst.org/Project/docs/lsst_data_man_prospects.pdf

A more general article on "The Data Deluge" covering a number of fields 
including our own is at: 
http://users.ecs.soton.ac.uk/ajgh/DataDeluge(final).pdf


If comprehensive archiving of diffraction data seems daunting to some, it is only 
in comparison with what we have been doing up to now in our own field of 
macromolecular crystallography. In comparison to what other people are doing, 
it doesn't seem that bad to me!


Regards,
Peter.


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread James Whisstock
The pdb will give the depositor the results of their validation runs and 
identify problems - however they cannot force depositors to address those 
problems...

J
Gina Clayton <[EMAIL PROTECTED]> wrote:> 
> I thought that when a structure is deposited the databank does run its
> own refinement validation and geometry checks and gives you back what
> it finds, i.e. distance problems etc. and the R-factor?
> 
> 
> Quoting Eleanor Dodson <[EMAIL PROTECTED]>:
> 
>> The weighting in REFMAC is a function of SigmA ( plotted in log file).
>> For this example it will be nearly 1 for all resolutions ranges so
>> the weights are pretty constant. There is also a contribution from
>> the "experimental" sigma, which in this case seems to be proportional
>> to |F|
>>
>> Yesterday I attached the wrong TRUNCATE log file - here is the
>> correct one, and if you look at the plot
>> "Amplitude Analysis against resolution" it also includes a plot of
>>  
>>
>> Eleanor
>>
>> Dominika Borek wrote:
>>> There are many more interesting things about this structure -
>>> obvious fake - refined against fabricated data.
>>>
>>> After running refmac I have noticed discrepancies between R and
>>> weighted R-factors. However, I do not know how the weights are
>>> calculated and applied - it could maybe help to find out how these
>>> data were created. Could you help?
>>>
>>> M(4SSQ/LL) NR_used %_obs M(Fo_used) M(Fc_used) Rf_used WR_used NR_free M(Fo_free) M(Fc_free) Rf_free WR_free $$
>>> $$
>>> 0.005  2205 98.77 3800.5 3687.2 0.12 0.30 121 4133.9 4042.7 0.12 0.28
>>> 0.015  3952 99.90 1932.9 1858.7 0.20 0.60 197 2010.5 1880.5 0.21 0.40
>>> 0.025  5026 99.81 1577.9 1512.3 0.23 0.62 283 1565.0 1484.6 0.26 0.54
>>> 0.034  5988 99.76 1598.0 1541.5 0.23 0.61 307 1625.7 1555.6 0.23 0.42
>>> 0.044  6751 99.79 1521.2 1481.6 0.18 0.41 338 1550.3 1523.8 0.18 0.61
>>> 0.054  7469 99.81 1314.5 1291.2 0.14 0.29 391 1348.3 1337.7 0.15 0.27
>>> 0.064  8078 99.87    .5  1089.1 0.16 0.36 465 1096.1 1077.9 0.18 0.42
>>> 0.073  8642 99.84  976.7  959.2 0.15 0.32 488  995.3  988.4 0.16 0.50
>>> 0.083  9255 99.88  866.4  848.0 0.16 0.36 490  856.8  846.0 0.17 0.38
>>> 0.093  9778 99.88  747.6  731.4 0.16 0.36 515  772.8  747.3 0.18 0.38
>>> 0.103 10225 99.86  662.6  649.1 0.17 0.38 547  658.9  643.6 0.20 0.36
>>> 0.113 10768 99.83  597.2  584.7 0.18 0.42 538  593.4  590.0 0.20 0.49
>>> 0.122 11121 99.86  535.5  521.9 0.19 0.48 607  556.2  542.0 0.20 0.47
>>> 0.132 11692 99.85  489.3  479.2 0.19 0.46 607  476.4  467.3 0.23 0.42
>>> 0.142 11999 99.83  453.9  443.1 0.19 0.48 621  455.3  440.6 0.22 0.55
>>> 0.152 12463 99.79  419.2  407.3 0.19 0.44 655  435.3  424.3 0.22 0.53
>>> 0.162 12885 99.78  384.0  373.9 0.20 0.53 632  384.1  376.1 0.22 0.43
>>> 0.171 12698 95.96  357.2  348.5 0.21 0.57 686  353.9  338.6 0.24 0.51
>>> 0.181 11926 87.78  332.0  323.3 0.21 0.66 590  333.4  322.6 0.24 0.57
>>> 0.191 11204 80.39  309.9  299.6 0.22 0.59 600  302.1  296.3 0.26 0.77
>>> $$
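[Editorial sketch: the conventional definitions behind the Rf and WR columns in the table above can be written down in a few lines. This is an assumption-laden illustration, not REFMAC's actual code: unit weights are used by default here, whereas REFMAC derives its weights from SigmA and the experimental sigmas, which is precisely the part Dominika is asking about.]

```python
# Sketch of conventional and weighted R-factors from observed/calculated
# amplitudes. The weighting scheme (unit weights by default) is an
# assumption; REFMAC's actual weights come from SigmA and sigma(F).

def r_factor(fo, fc):
    """Conventional R = sum|Fo - Fc| / sum(Fo)."""
    return sum(abs(o - c) for o, c in zip(fo, fc)) / sum(fo)

def weighted_r(fo, fc, w=None):
    """Weighted R = sqrt( sum w*(Fo - Fc)^2 / sum w*Fo^2 )."""
    if w is None:
        w = [1.0] * len(fo)
    num = sum(wi * (o - c) ** 2 for wi, o, c in zip(w, fo, fc))
    den = sum(wi * o * o for wi, o in zip(w, fo))
    return (num / den) ** 0.5

# First three M(Fo_used)/M(Fc_used) shell means from the table above:
fo = [3800.5, 1932.9, 1577.9]
fc = [3687.2, 1858.7, 1512.3]
print(round(r_factor(fo, fc), 3))  # -> 0.035
```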
>>>
>>>
>>>
>>>
>>> Eleanor Dodson wrote:
 There is a correspondence in last weeks Nature commenting on the
 disparities between  three C3B structures. These are:
 2icf   solved at 4.0A resolution, 2i07 at 4.1A resolution, and 2hr0
 at 2.26A resolution.

 The A chains of all 3 structures agree closely, with each other and
 other deposited structures.
 The B chains of 2icf and 2i07 are in reasonable agreement, but
 there are enormous differences to the B chain of 2hr0.
 This structure is surprisingly out of step, and by many criteria
 likely to be wrong.

 There have been many articles written on validation, and it seems
 worth reminding crystallographers of some of the tests that make
 2hr0 suspect.

 1) The cell content analysis suggests there is 80% solvent in the
 asymmetric unit.
 Such crystals have been observed but they rarely diffract to 2.26A.

 2) Data Analysis:
 The reflection data have been deposited, so they can be analysed.
 The plots provided by TRUNCATE showing intensity statistics are not
 compatible with such a high solvent ratio. They are too perfect; the
 moments are perfectly linear, which is unlikely with such large
 volumes of the crystal containing solvent, and there is absolutely
 no evidence of anisotropy, again unlikely with high solvent content.

 3)  Structure analysis
 a) The Ramachandran plot is very poor ( 84% allowed) with many
 residues in disallowed regions.
 b) The distribution of residue B values is quite unrealistic. There
 is a very low spread,  which is most unusual for a structure with
>>
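[Editorial sketch: the cell-content check in Eleanor's point 1 is a one-line calculation. This is a hedged illustration using the standard Matthews-coefficient relation; the volume and molecular weight below are invented round numbers chosen only to land near the 80% solvent figure quoted above, not values taken from the 2hr0 entry.]

```python
# Matthews-coefficient solvent-fraction check. The 1.230 constant is the
# standard protein partial-specific-volume approximation (Matthews, 1968);
# the input numbers are illustrative only, NOT taken from 2hr0.

def solvent_fraction(v_asu_A3, mw_da, n_mol=1):
    """Return (Vm in A^3/Da, estimated solvent fraction)."""
    vm = v_asu_A3 / (mw_da * n_mol)
    return vm, 1.0 - 1.230 / vm

vm, solv = solvent_fraction(v_asu_A3=1.1e6, mw_da=180000.0)
print(f"Vm = {vm:.2f} A^3/Da, solvent ~ {solv:.0%}")  # -> solvent ~ 80%
```

A solvent fraction this high normally goes with weak, anisotropic diffraction, which is why 80% solvent at 2.26 Å resolution is flagged as suspicious.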

Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Clemens Vonrhein
On Thu, Aug 16, 2007 at 03:13:29PM +0100, Phil Evans wrote:
> What do you count as raw data? Rawest are the images - everything
> beyond that is modelling - but archiving images is _expensive_!

Maybe we should contact Google to let them do it for us ;-)

  http://news.bbc.co.uk/2/hi/technology/6425975.stm

I doubt every crystallographer would want access to all raw datasets -
but for developers it would be ABSOLUTELY FANTASTIC (similar to things
like the JCSG archive). And just imagine all those well collected
datasets of > 10 years ago and what we could learn from those (and the
better structures we could determine) with the modern tools and
programs ...

Clemens

-- 

***
* Clemens Vonrhein, Ph.D. vonrhein AT GlobalPhasing DOT com
*
*  Global Phasing Ltd.
*  Sheraton House, Castle Park 
*  Cambridge CB3 0AX, UK
*--
* BUSTER Development Group  (http://www.globalphasing.com)
***


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Juergen Bosch
I think the average structure is much less than 20 GB, since most data
seem to be collected as SAD. I quickly looked at my data: ~20
structures (3 MAD, 9 SAD, 3 MIR, 4 MR; 150-9600 amino acids per asu).
The average was closer to 3 GB (compressed); the largest dataset was
24 GB (compressed), the smallest 300 MB (compressed).


Juergen

Mischa Machius wrote:

Hmm - I think I miscalculated, by a factor of 100 even!... need more  
coffee. In any case, I still think it would be doable. Best - MM



On Aug 16, 2007, at 9:30 AM, Mischa Machius wrote:

I don't think archiving images would be that expensive. For one, I
have found that most formats can be compressed quite substantially
using simple, standard procedures like bzip2. If optimized, raw
images won't take up that much space. Also, initially, only those
images that have been used to obtain phases and to refine finally
deposited structures could be archived. If the average structure
takes up 20GB of space, 5,000 structures would be 1TB, which fits on
a single hard drive for less than $400. If the community thinks this
is a worthwhile endeavor, money should be available from granting
agencies to establish a central repository (e.g., at the RCSB).
Imagine what could be done with as little as $50,000. For large
detectors, binning could be used, but given current hard drive
prices and future developments, that won't be necessary. Best - MM
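[Editorial sketch: the arithmetic in this exchange is easy to check. The per-structure sizes below are simply the figures quoted in this thread (Mischa's 20 GB, the one he notes was mis-multiplied, and Juergen's ~3 GB compressed average); decimal units are assumed, 1 TB = 1000 GB.]

```python
# Back-of-envelope archive sizes for N deposited structures.
# 20 GB/structure is the figure behind the factor-of-100 slip
# acknowledged above; ~3 GB is the compressed average quoted later.

def archive_tb(n_structures, gb_each):
    """Total archive size in TB, decimal units (1 TB = 1000 GB)."""
    return n_structures * gb_each / 1000.0

print(archive_tb(5000, 20))  # -> 100.0 TB, not 1 TB
print(archive_tb(5000, 3))   # -> 15.0 TB at the compressed average
```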



On Aug 16, 2007, at 9:13 AM, Phil Evans wrote:

What do you count as raw data? Rawest are the images - everything
beyond that is modelling - but archiving images is _expensive_!
Unmerged intensities are probably more manageable


Phil


On  16 Aug 2007, at 15:05, Ashley Buckle wrote:


Dear Randy

These are very valid points, and I'm so glad you've taken the  
important step of initiating this. For now I'd like to respond to  
one of them, as it concerns something I and colleagues in  
Australia are doing:




The more information that is available, the easier it will be to  
detect fabrication (because it is harder to make up more  
information convincingly). For instance, if the diffraction data  
are deposited, we can check for consistency with the known  
properties of real macromolecular crystals, e.g. that they  
contain disordered solvent and not vacuum. As Tassos Perrakis  has 
discovered, there are characteristic ways in which the  standard 
deviations depend on the intensities and the  resolution. If 
unmerged data are deposited, there will probably  be evidence of 
radiation damage, weak effects from intrinsic  anomalous 
scatterers, etc. Raw images are probably even harder  to simulate 
convincingly.



After the recent Science retractions we realised that it's about time
raw data was made available. So, we have set about creating the
necessary IT and software to do this for our diffraction data, and are
encouraging Australian colleagues to do the same. We are about a week
away from launching a web-accessible repository for our recently
published (eg deposited in PDB) data, and this should coincide with an
upcoming publication describing a new structure from our labs. The aim
is that publication occurs simultaneously with release in the PDB as
well as raw diffraction data on our website. We hope to house as much
of our data as possible, as well as data from other Australian labs,
but obviously the potential dataset will be huge, so we are trying to
develop, and make available freely to the community, software tools
that allow others to easily set up their own repositories. After brief
discussion with the PDB, the plan is that the PDB include links from
coordinates/SFs to the raw data using a simple handle that can be
incorporated into a URL. We would hope that we can convince the
journals that raw data must be made available at the time of
publication, in the same way as coordinates and structure factors. Of
course, we realise that there will be many hurdles along the way, but
we are convinced that simply making the raw data available ASAP is a
'good thing'.


We are happy to share more details of our IT plans with the  
CCP4BB, such that they can be improved, and look forward to  
hearing feedback


cheers





-- 
--

Mischa Machius, PhD
Associate Professor
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.; ND10.214A
Dallas, TX 75390-8816; U.S.A.
Tel: +1 214 645 6381
Fax: +1 214 645 6353




 






--
Jürgen Bosch
University of Washington
Dept. of Biochemistry, K-426
1705 NE Pacific Street
Seattle, WA 98195
Box 357742
Phone:   +1-206-616-4510
FAX: +1-206-685-7002


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Eleanor Dodson
This structure (1h6w) provides an interesting comparison; it looks just
as I would expect, even for such an interesting extended fold.
There are big peaks on the 3-fold axis; there is wispy density which
would be very hard to model - I found an ILE in the wrong rotamer (341A)
- (there is ALWAYS something you can improve) - in other words, it looks
like a real map.


And the intensity plots look as expected too..
Eleanor



Mark J. van Raaij wrote:

Dear all,

With regards to the possible "fabrication" of the 2hr0 structure, why 
would the authors have deposited the structure factors if this is not 
required by the journal? Also, why would they have "fabricated" a 
structure with gaps along c if they could have done so without the gap?


A few years ago, I had to cope with two structures with gaps along c, 
PDB codes 1h6w and 1ocy. For those of you who are interested, structure 
factors are available from the PDB; unmerged intensities/raw images I 
will look for and provide if requested...


Without further evidence, I suspect their structure is real, perhaps 
not optimally refined and treated though, but then again, this seems 
commonplace in "Nature" structures, perhaps due to lack of 
time/experience and, in some cases, putting too much pressure on the 
PhD students/postdocs involved instead of mentoring and checking them. 
I hope the authors provide the raw diffraction images to dispel any 
doubts and would be curious to learn about the other structures of the 
same group - anyone has a comprehensive, annotated list of them?


Greetings,

Mark J. van Raaij
Unidad de Bioquímica Estructural
Dpto de Bioquímica, Facultad de Farmacia
and
Unidad de Rayos X, Edificio CACTUS
Universidad de Santiago
15782 Santiago de Compostela
Spain
http://web.usc.es/~vanraaij/ 



Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Andrew Raine

Ashley Buckle wrote:

By raw data I mean images. We think this is only manageable using a 
distributed data grid model (eg Universities/institutions setup their 
own repositories using open standards, and PDB aggregate the links to 
them. URL persistence will be a hurdle I admit).


This reminded me of the LOCKSS project (Lots Of Copies Keeps Stuff Safe
- http://www.lockss.org) which claims to do just this.  I've not used
it, but it looks interesting, and rather than re-inventing the wheel...

Regards,

Andrew

--
Dr. Andrew Raine, Head of IT, MRC Dunn Human Nutrition Unit,
Wellcome Trust/MRC Building, Hills Road, Cambridge, CB2 2XY, UK
phone: +44 (0)1223 252830   fax: +44 (0)1223 252835
web: www.mrc-dunn.cam.ac.uk email: [EMAIL PROTECTED]


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Edward A Berry

For nice crystals data processing is straightforward. For crystals with
large unit cells, high mosaicity, and diffuse scattering, processing
can be critical. It may be that future advances in integration
software will allow one to extract far better data from such a
diffraction dataset than can be obtained now. Even short of that,
systematic optimization of things like assumed mosaicity, integration
box parameters, and which parameters to fix or refine for the crystal
or refine for each frame can make a big difference. With high
profile structures, the rush to publish as soon as possible does
not often permit this kind of refinement.

Obviously beamline personnel should strive to get the correct
values in the header, but if they are even close enough to
allow indexing, beam center and distance can be refined
together with crystal parameters.

And if we consider fabrication, obviously it would be trivial
to take Fcalcs, add random noise and call them Fobs, whereas
generating a convincing diffraction pattern, one that will give
the stated Fobs when integrated, would take a lot more work.
(I think James Holton has such a program, but presumably
it leaves obvious tracks that would allow one to detect
its misuse?)
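The "trivial" route Ed describes can be sketched in a few lines, and the sketch also shows why it leaves tracks. Everything here is an illustrative assumption (synthetic amplitudes, an invented noise model), not anyone's actual procedure:

```python
import random
import statistics

random.seed(0)
# Stand-in "calculated" amplitudes (synthetic, for illustration only)
f_calc = [random.gammavariate(2.0, 100.0) for _ in range(5000)]

# The trivial fabrication: perturb Fcalc slightly, call the result
# "Fobs", and invent sigmas as a fixed fraction of each F.
f_obs = [f * (1 + 0.02 * random.gauss(0, 1)) for f in f_calc]
sig_f = [0.05 * f for f in f_obs]

# The track it leaves: in real data sigma(F) is not simply
# proportional to |F|, so a perfectly linear sigma-vs-F relation
# (zero spread in sigma/F) is itself a red flag.
spread = statistics.pstdev(s / f for s, f in zip(sig_f, f_obs))
print(spread)  # essentially zero
```

This connects to the observation elsewhere in the thread that the "experimental" sigmas of the questioned deposit appear proportional to |F|.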

So I think it is each PI's responsibility to keep the images
from any structure that was published, maybe not publicly
accessible on a web site but at least on DVD's in a drawer
somewhere, so it could be pulled out for reprocessing if an
improved algorithm appears, or for presentation to a funding
agency if a question of scientific misconduct is being investigated.
Saving the crystal in liquid N2 might be good also, if it is not
already burned up by radiation damage.

Ed

Santarsiero, Bernard D. wrote:

Sorry, I think it's a waste of resources to store the raw images. I think
we should trust people to be able to at least process their own data set.
Besides, you would need to include beamline parameters, beam position,
detector distances, etc. that may or may not be correct in the image
headers. I'm all for storage and retrieval of a primary intensity data
file (I or F^2 with esds).

Bernie Santarsiero



Re: [ccp4bb] Problem with VDW restraints in Refmac

2007-08-16 Thread Eleanor Dodson
Refmac will not introduce a repulsion unless the sum of the occupancies 
of the two neighbouring atoms is > 1.00. Is that the case for you? (It 
might list the close contacts - but shouldn't use them.)
If you want a link between the ligand and something else, though, you 
must label them both A or B.
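The occupancy rule Eleanor describes can be stated as a one-line check (a sketch of the rule as described, not Refmac's actual source code):

```python
def vdw_repulsion_applies(occ_a, occ_b):
    # A close contact is restrained only if the two neighbouring atoms
    # can be present at the same time, i.e. occupancies sum to > 1.0.
    return occ_a + occ_b > 1.0

# A 0.5/0.5 ligand/side-chain pair models alternatives: no repulsion
print(vdw_repulsion_applies(0.5, 0.5))  # False
print(vdw_repulsion_applies(0.7, 0.6))  # True
```

So for the case above, setting the partially occupied ligand and the 'B' conformer occupancies so they sum to at most 1.0 should suppress the clash term without turning off VDW restraints globally.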

Eleanor



Alastair McEwen wrote:

Dear all,

I am refining a structure with a partially occupied ligand. The 
binding site contains a Glutamine residue with a dual conformation 
with the ‘B’ conformation overlapping with the ligand. I have named 
the ligand ‘ALIG’ but when refining, Refmac notes a number of VDW 
deviations and the ligand and side-chain are moved away from each 
other and out of the density. When I refine without VDW restraints the 
ligand and side-chain refine well in the density but I would like to 
be able to use these restraints for the rest of the structure. Is it 
possible to have Refmac ignore these restraints just for this ligand 
and residue?


Many thanks,

Alastair



Dr. Alastair McEwen
Département de Biologie et Génomique Structurales
IGBMC, 1 rue Laurent Fries, BP10142
67404 ILLKIRCH, France
Tel: +33 (0)3 88 65 57 73
Fax: +33 (0)3 88 65 32 76
email: [EMAIL PROTECTED]





Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Kay Diederichs
I'm glad that the discussion has finally set in, and would only like to 
comment on the practicability of storing images.


Mischa Machius wrote:
I don't think archiving images would be that expensive. For one, I have 
found that most formats can be compressed quite substantially using 
simple, standard procedures like bzip2. If optimized, raw images won't 
take up that much space. Also, initially, only those images that have 
been used to obtain phases and to refine finally deposited structures 
could be archived. If the average structure takes up 20GB of space, 


that's on the high side I'd say; I would have estimated 1.5 GB (native 
alone) to 5 GB for e.g. a native and 3 wavelengths (after bzip2).
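The compressibility claim is easy to illustrate: detector frames are dominated by low, slowly varying background counts, which bzip2-style compressors handle well. A toy stand-in follows (synthetic low-entropy bytes, not a real image format; real frames are far more structured):

```python
import bz2
import random

random.seed(1)
# Mostly low "background" values with occasional strong "pixels"
frame = bytes(random.choice((0, 0, 0, 1, 1, 2, 255))
              for _ in range(500_000))

packed = bz2.compress(frame, compresslevel=9)
print(f"{len(packed) / len(frame):.2f}")  # well under 1.0
```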



5,000 structures would be 1TB, which fits on a single hard drive for


5,000 structures of 20GB would be 100 TB

If the PDB would require all images of a _single_ dataset for 
molecular-replacement structures or mutant studies, and all images of 
all wavelengths/derivatives for experimentally phased structures, that 
would come to roughly (40,000 X-ray structures) * (on average 2 GB per 
structure) = 80 TB of data. At €250 per TB, that would be €20,000 - an 
estimate of what it takes to store all the raw data for _all_ the X-ray 
structures in the PDB - less than what a single protein 
cloning/purification/crystallization/structure project costs per year.



less than $400. If the community thinks this is a worthwhile endeavor, 
money should be available from granting agencies to establish a central 
repository (e.g., at the RCSB). Imagine what could be done with as 
little as $50,000. For large detectors, binning could be used, but 
given current hard drive prices and future developments, that won't be 
necessary. Best - MM




Archiving images is quite practical even for those data that do not 
directly correspond to deposited PDB entries.
In 1999 we abandoned tape storage of raw data in favor of disk storage. 
Everything we collected at synchrotrons since then still fits on two 
750GB disks. In 2000 we also needed two disks, and have been upgrading 
the disks when the old ones were full. To have these data online means 
that one can easily look at them again, for testing data reduction and 
phasing programs, and for trying to solve, using new programs, those 
structures where crystals could never be reproduced.


just my 2 cents -

Kay Diederichs
--
Kay Diederichs http://strucbio.biologie.uni-konstanz.de
email: [EMAIL PROTECTED] Tel +49 7531 88 4049 Fax 3183
Fachbereich Biologie, Universitaet Konstanz, Box M647, D-78457 Konstanz





Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Ashley Buckle
Validation aside, access to raw data is also helpful for method  
development (eg integration and scaling algorithms), on which we all  
rely.

Ashley

On 17/08/2007, at 1:04 AM, Santarsiero, Bernard D. wrote:

Sorry, I think it's a waste of resources to store the raw images. I  
think
we should trust people to be able to at least process their own  
data set.

Besides, you would need to include beamline parameters, beam position,
detector distances, etc. that may or may not be correct in the image
headers. I'm all for storage and retrieval of a primary intensity data
file (I or F^2 with esds).

Bernie Santarsiero



Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Clemens Vonrhein
On Thu, Aug 16, 2007 at 03:13:29PM +0100, Phil Evans wrote:
> What do you count as raw data? Rawest are the images - everything
> beyond that is modelling - but archiving images is _expensive_!

Hmmm - not sure: let's say that a typical dataset requires about 180
images at 10 MB per image. With the current count of roughly 40,000
X-ray structures in the PDB this is:

  40,000 * 180 * 10 MB = ~70 TB of data

With a simple 1 TB external disk at about GBP 200 we get a price of GBP
14,000, i.e. 35 pence per dataset.

Ok, this is not a proper calculation (more data collected, fine-phi
slicing, MAD datasets etc etc), so let's apply a 'safety factor' of 10:
but even then I think this is easily doable.

As Tassos remarked as well: if we could store/deposit and manage PDB
files in the 70s we should be able to do the same now (30 years
later!) with images ... easily.
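The back-of-envelope numbers above check out, assuming the roughly 40,000 X-ray entries in the PDB implied by the ~70 TB total (the per-dataset figures are the ones stated in the post):

```python
n_structures = 40_000   # ~X-ray entries in the PDB (assumed, 2007)
images_per_set = 180
mb_per_image = 10
gbp_per_tb = 200        # ~price of a 1 TB external disk

total_tb = n_structures * images_per_set * mb_per_image / 1_000_000
cost_gbp = total_tb * gbp_per_tb
pence_per_dataset = 100 * cost_gbp / n_structures
print(total_tb, cost_gbp, pence_per_dataset)  # 72.0 14400.0 36.0
```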

Cheers

Clemens


-- 

***
* Clemens Vonrhein, Ph.D. vonrhein AT GlobalPhasing DOT com
*
*  Global Phasing Ltd.
*  Sheraton House, Castle Park 
*  Cambridge CB3 0AX, UK
*--
* BUSTER Development Group  (http://www.globalphasing.com)
***


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Anastassis Perrakis


On Aug 16, 2007, at 15:22, Randy J. Read wrote:


Raw images are probably even harder to simulate convincingly.


If I were to fabricate a structure, I would first get 'Fobs', then
expand, then get the images (I am sure one can hack 'strategy' or
'predict' or even 'mosflm' to tell you in which image every reflection
is) and then add noise to the images themselves. Then process the
images and go on from there ;-)


The thing that is certainly stopping me is that it's much more
difficult to do that than solving the structure ... but it would
admittedly be quite some fun doing it right, if one were to ignore the
tiny issue of the ethical side of such activity.


About archiving images, I have a feeling that the cost per GB is the
same as it was for structure factors in the early 90s.


Last but not least, some EDS data mining we did here agrees with
Randy: very, very few other structures, if any, appear to have really
strange statistics in the subset of the PDB with structure factors
(aka EDS...). That is a relief.


As for the Nature debate, I am only disappointed and confused by one
thing: Randy et al. ask for the images, just as one could ask for the
dated logbook in any other scientific discipline. For me that leaves
only two acceptable reactions from the group of Murthy:


1. Make the images available and demand a public apology for spoiling  
their name.

2. Shut up, retract the paper, buy property in Alaska and disappear.

The mumbo jumbo of the reply is so tragically irrelevant that I fail  
to understand how Nature tolerated it.


Tassos

PS The algorithm for the calculation of the sigmas (assuming they
were calculated) does not look that naive actually. Far from a simple
linear relationship. They put some thought into it, but let's say that
if you want to apply a 2D function to simulate noise, don't do it
along the principal axes ;-)




Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Mark J. van Raaij

Dear all,

With regards to the possible "fabrication" of the 2hr0 structure, why  
would the authors have deposited the structure factors if this is not  
required by the journal? Also, why would they have "fabricated" a  
structure with gaps along c if they could have done so without the gap?


A few years ago, I had to cope with two structures with gaps along c,  
PDB codes 1h6w and 1ocy. For those of you who are interested, structure  
factors are available from the PDB; unmerged intensities/raw images I  
will look for and provide if requested...


Without further evidence, I suspect their structure is real, perhaps  
not optimally refined and treated though, but then again, this seems  
commonplace in "Nature" structures, perhaps due to lack of time/ 
experience and, in some cases, putting too much pressure on the PhD  
students/postdocs involved instead of mentoring and checking them. I  
hope the authors provide the raw diffraction images to dispel any  
doubts and would be curious to learn about the other structures of  
the same group - anyone has a comprehensive, annotated list of them?


Greetings,

Mark J. van Raaij
Unidad de Bioquímica Estructural
Dpto de Bioquímica, Facultad de Farmacia
and
Unidad de Rayos X, Edificio CACTUS
Universidad de Santiago
15782 Santiago de Compostela
Spain
http://web.usc.es/~vanraaij/


On 16 Aug 2007, at 15:22, Randy J. Read wrote:


On Aug 16 2007, Eleanor Dodson wrote:

The weighting in REFMAC is a function of SigmA ( plotted in log  
file).
For this example it will be nearly 1 for all resolutions ranges so  
the weights are pretty constant. There is also a contribution from  
the "experimental" sigma, which in this case seems to be  
proportional to |F|


Originally I expected that the publication of our Brief  
Communication in Nature would stimulate a lot of discussion on the  
bulletin board, but clearly it hasn't. One reason is probably that  
we couldn't be as forthright as we wished to be. For its own good  
reasons, Nature did not allow us to use the word "fabricated". Nor  
were we allowed to discuss other structures from the same group, if  
they weren't published in Nature.


Another reason is an understandable reluctance to make allegations  
in public, and the CCP4 bulletin board probably isn't the best  
place to do that.


But I think the case raises essential topics for the community to  
discuss, and this is a good forum for those discussions. We need to  
consider how to ensure the integrity of the structural databases  
and the associated publications.


So here are some questions to start a discussion, with some  
suggestions of partial answers.


1. How many structures in the PDB are fabricated?

I don't know, but I think (or at least hope) that the number is  
very small.


2. How easy is it to fabricate a structure?

It's very easy, if no-one will be examining it with a suspicious  
mind, but it's extremely difficult to do well. No matter how well a  
structure is fabricated, it will violate something that is known  
now or learned later about the properties of real macromolecules  
and their diffraction data. If you're clever enough to do this  
really well, then you should be clever enough to determine the real  
structure of an interesting protein.


3. How can we tell whether structures in the PDB are fabricated, or  
just poorly refined?


The current standard validation tools are aimed at detecting errors  
in structure determination or the effects of poor refinement  
practice. None of them are aimed at detecting specific signs of  
fabrication because we assume (almost always correctly) that others  
are acting in good faith.


The more information that is available, the easier it will be to  
detect fabrication (because it is harder to make up more  
information convincingly). For instance, if the diffraction data  
are deposited, we can check for consistency with the known  
properties of real macromolecular crystals, e.g. that they contain  
disordered solvent and not vacuum. As Tassos Perrakis has  
discovered, there are characteristic ways in which the standard  
deviations depend on the intensities and the resolution. If  
unmerged data are deposited, there will probably be evidence of  
radiation damage, weak effects from intrinsic anomalous scatterers,  
etc. Raw images are probably even harder to simulate convincingly.
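Tassos Perrakis's actual analysis isn't described here, but the general idea of checking how standard deviations behave can be illustrated with a toy sketch (synthetic numbers and invented error models, none of which come from the thread): a crude fabrication that sets sigma(F) to a fixed fraction of F gives a log-log slope of exactly 1, which realistic counting-statistics-like errors do not.

```python
import numpy as np

def sigf_vs_f_slope(F, sigF):
    """Fit log(sigF) ~ a*log(F) + b and return the slope a.

    A slope very close to 1 means sigma(F) is essentially
    proportional to F across the whole data set, which measured
    amplitudes with counting-statistics errors do not show."""
    slope, _ = np.polyfit(np.log(F), np.log(sigF), 1)
    return slope

rng = np.random.default_rng(0)
F = rng.uniform(50.0, 4000.0, 5000)

# Crude "fabricated" errors: sigma is simply 5% of F.
sig_fake = 0.05 * F

# Crude "measured-like" errors: a constant noise floor plus a
# sqrt(F) counting term (an invented model for this toy only).
sig_real = 10.0 + 2.0 * np.sqrt(F)

print(sigf_vs_f_slope(F, sig_fake))  # slope is 1: sigF exactly proportional to F
print(sigf_vs_f_slope(F, sig_real))  # slope well below 1
```

The point is not the specific model but that proportionality of sigma(F) to |F|, as noted in Eleanor's log-file observation, is itself a checkable red flag.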


If a structure is fabricated by making up a new crystal form,  
perhaps a complex of previously-known components, then the crystal  
packing interactions should look like the interactions seen in real  
crystals. If it's fabricated by homology modelling, then the  
internal packing is likely to be suboptimal. I'm told by David  
Baker (who knows a thing or two about this) that it is extremely  
difficult to make a homology model that both obeys what we know  
about torsion angle preferences and is packed as well as a real  
protein structure.


I'm very interested in hearing about new ideas along these lines.  

Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Green, Todd
Hello all,

I started to write a response to this thread yesterday. I thought the title was 
great, and the content of Eleanor's email was very helpful. What I didn't like 
was the indictment in the next-to-last paragraph. This has been followed up 
with the word "fabrication" by others. No one knows definitively whether this was 
fabricated. You have your suspicions, but you don't "know." Fabrication 
suggests malicious wrong-doing. I actually don't think this was the case. I'm
probably a bit biased because the work comes from an office down the hall from 
my own. I'd like to think that if the structure is wrong that it could be 
chalked up to inexperience rather than malice. To me, this scenario of 
inexperience seems like one that could become more and more prevalent as our 
field opens up to more and more scientists doing structural work who are not 
dedicated crystallographers.

Having said that, I think Eleanor started an extremely useful thread as a way 
of avoiding the pitfalls of crystallography whether you are a novice or an 
expert. There's no question that this board is the best way to advance one's 
knowledge of crystallography. I actually gave a homework assignment that was 
simply to sign up for the ccp4bb. 

In reference to the previously mentioned work, I'd also like to hear discussion 
of whether or not the response letter is convincing; some of it seems plausible to me.

I hope I don't ruffle anyone's feathers by my email, but I just thought that it 
should be said.

Cheers-
Todd


-Original Message-
From: CCP4 bulletin board on behalf of Randy J. Read
Sent: Thu 8/16/2007 8:22 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] The importance of USING our validation tools
 
On Aug 16 2007, Eleanor Dodson wrote:

>The weighting in REFMAC is a function of SigmA (plotted in the log file).
>For this example it will be nearly 1 for all resolution ranges, so the 
>weights are pretty constant. There is also a contribution from the 
>"experimental" sigma, which in this case seems to be proportional to |F|

Originally I expected that the publication of our Brief Communication in 
Nature would stimulate a lot of discussion on the bulletin board, but 
clearly it hasn't. One reason is probably that we couldn't be as forthright 
as we wished to be. For its own good reasons, Nature did not allow us to 
use the word "fabricated". Nor were we allowed to discuss other structures 
from the same group, if they weren't published in Nature.



[ccp4bb] Problem with VDW restraints in Refmac

2007-08-16 Thread Alastair McEwen

Dear all,

I am refining a structure with a partially 
occupied ligand. The binding site contains a 
Glutamine residue with a dual conformation with 
the ‘B’ conformation overlapping with the ligand. 
I have named the ligand ‘ALIG’ but when refining, 
Refmac notes a number of VDW deviations and the 
ligand and side-chain are moved away from each 
other and out of the density. When I refine 
without VDW restraints the ligand and side-chain 
refine well in the density but I would like to be 
able to use these restraints for the rest of the 
structure. Is it possible to have Refmac ignore 
these restraints just for this ligand and residue?
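One workaround often used for this kind of clash (my suggestion, not from the thread, so verify it against the Refmac documentation for your version) is to give the ligand and the overlapping side-chain conformer different altLoc codes: atoms in different alternate-conformation groups are treated as mutually exclusive, so VDW restraints are not applied between them, while the rest of the structure keeps its restraints. A sketch that stamps a hypothetical ligand's altLoc field (PDB column 17, index 16); the residue name 'LIG' and the record below are invented for illustration:

```python
def set_altloc(pdb_lines, resname, altloc):
    """Return a copy of pdb_lines with the altLoc column (index 16)
    set for every ATOM/HETATM record of the given residue name."""
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and line[17:20].strip() == resname:
            line = line[:16] + altloc + line[17:]
        out.append(line)
    return out

# Hypothetical HETATM record for the ligand 'LIG' (all values invented).
lig = "HETATM 2001  C1  LIG A 501      11.104  22.543   8.062  0.50 30.00           C"
fixed = set_altloc([lig], "LIG", "A")
print(fixed[0][16])
```

Here the ligand gets altLoc 'A' so it clashes only on paper with the Gln conformer labelled 'B'; occupancies should still sum sensibly across the alternates.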


Many thanks,

Alastair



Dr. Alastair McEwen
Département de Biologie et Génomique Structurales
IGBMC, 1 rue Laurent Fries, BP10142
67404 ILLKIRCH, France
Tel:  +33 (0)3 88 65 57 73
Fax: +33 (0)3 88 65 32 76
email: [EMAIL PROTECTED]


[ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Santarsiero, Bernard D.
Sorry, I think it's a waste of resources to store the raw images. I think
we should trust people to be able to at least process their own data set.
Besides, you would need to include beamline parameters, beam position,
detector distances, etc. that may or may not be correct in the image
headers. I'm all for storage and retrieval of a primary intensity data
file (I or F^2 with esds).

Bernie Santarsiero


On Thu, August 16, 2007 9:46 am, Mischa Machius wrote:
> Hmm - I think I miscalculated, by a factor of 100 even!... need more
> coffee. In any case, I still think it would be doable. Best - MM
>
>
> On Aug 16, 2007, at 9:30 AM, Mischa Machius wrote:
>
>> I don't think archiving images would be that expensive. For one, I
>> have found that most formats can be compressed quite substantially
>> using simple, standard procedures like bzip2. If optimized, raw
>> images won't take up that much space. Also, initially, only those
>> images that have been used to obtain phases and to refine finally
>> deposited structures could be archived. If the average structure
>> takes up 20GB of space, 5,000 structures would be 1TB, which fits
>> on a single hard drive for less than $400. If the community thinks
>> this is a worthwhile endeavor, money should be available from
>> granting agencies to establish a central repository (e.g., at the
>> RCSB). Imagine what could be done with as little as $50,000. For
>> large detectors, binning could be used, but given current hard
>> drive prices and future developments, that won't be necessary. Best
>> - MM
>>
>>
>> On Aug 16, 2007, at 9:13 AM, Phil Evans wrote:
>>
>>> What do you count as raw data? Rawest are the images - everything
>>> beyond that is modelling - but archiving images is _expensive_!
>>> Unmerged intensities are probably more manageable
>>>
>>> Phil
>>>
>>>
>>> On  16 Aug 2007, at 15:05, Ashley Buckle wrote:
>>>
 Dear Randy

 These are very valid points, and I'm so glad you've taken the
 important step of initiating this. For now I'd like to respond to
 one of them, as it concerns something I and colleagues in
 Australia are doing:
>
> The more information that is available, the easier it will be to
> detect fabrication (because it is harder to make up more
> information convincingly). For instance, if the diffraction data
> are deposited, we can check for consistency with the known
> properties of real macromolecular crystals, e.g. that they
> contain disordered solvent and not vacuum. As Tassos Perrakis
> has discovered, there are characteristic ways in which the
> standard deviations depend on the intensities and the
> resolution. If unmerged data are deposited, there will probably
> be evidence of radiation damage, weak effects from intrinsic
> anomalous scatterers, etc. Raw images are probably even harder
> to simulate convincingly.

 After the recent Science retractions we realised that it's about
 time raw data was made available. So, we have set about creating
 the necessary IT and software to do this for our diffraction
 data, and are encouraging Australian colleagues to do the same.
 We are about a week away from launching a web-accessible
 repository for our recently published (e.g. deposited in the PDB)
 data, and this should coincide with an upcoming publication
 describing a new structure from our labs. The aim is that
 publication occurs simultaneously with release in the PDB as
 well as of the raw diffraction data on our website. We hope to
 house as much of our data as possible, as well as data from
 other Australian labs, but obviously the potential dataset will
 be huge, so we are trying to develop, and make freely available
 to the community, software tools that allow others to easily set
 up their own repositories. After brief discussion with the PDB,
 the plan is that the PDB include links from coordinates/SFs to
 the raw data using a simple handle that can be incorporated into
 a URL. We would hope that we can convince the journals that raw
 data must be made available at the time of publication, in the
 same way as coordinates and structure factors. Of course, we
 realise that there will be many hurdles along the way, but we
 are convinced that simply making the raw data available ASAP is
 a 'good thing'.

 We are happy to share more details of our IT plans with the
 CCP4BB, such that they can be improved, and look forward to
 hearing feedback

 cheers
>>
>>
>> --
>> --
>> Mischa Machius, PhD
>> Associate Professor
>> UT Southwestern Medical Center at Dallas
>> 5323 Harry Hines Blvd.; ND10.214A
>> Dallas, TX 75390-8816; U.S.A.
>> Tel: +1 214 645 6381
>> Fax: +1 214 645 6353
>
>

Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Mischa Machius
Hmm - I think I miscalculated, by a factor of 100 even!... need more  
coffee. In any case, I still think it would be doable. Best - MM
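The slip being corrected here is just arithmetic: at 20 GB per structure, 5,000 structures come to 100 TB, not 1 TB.

```python
# Checking the storage estimate from the message above
# (20 GB per structure and 5,000 structures are Mischa's figures).
gb_per_structure = 20
n_structures = 5_000

total_gb = gb_per_structure * n_structures
total_tb = total_gb / 1_000  # decimal terabytes

print(total_gb, "GB =", total_tb, "TB")  # 100000 GB = 100.0 TB
```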



On Aug 16, 2007, at 9:30 AM, Mischa Machius wrote:

I don't think archiving images would be that expensive. For one, I  
have found that most formats can be compressed quite substantially  
using simple, standard procedures like bzip2. If optimized, raw  
images won't take up that much space. Also, initially, only those  
images that have been used to obtain phases and to refine finally  
deposited structures could be archived. If the average structure  
takes up 20GB of space, 5,000 structures would be 1TB, which fits  
on a single hard drive for less than $400. If the community thinks  
this is a worthwhile endeavor, money should be available from  
granting agencies to establish a central repository (e.g., at the  
RCSB). Imagine what could be done with as little as $50,000. For  
large detectors, binning could be used, but given current hard  
drive prices and future developments, that won't be necessary. Best  
- MM



On Aug 16, 2007, at 9:13 AM, Phil Evans wrote:

What do you count as raw data? Rawest are the images - everything  
beyond that is modelling - but archiving images is _expensive_!  
Unmerged intensities are probably more manageable


Phil


On  16 Aug 2007, at 15:05, Ashley Buckle wrote:


Dear Randy

These are very valid points, and I'm so glad you've taken the  
important step of initiating this. For now I'd like to respond to  
one of them, as it concerns something I and colleagues in  
Australia are doing:


The more information that is available, the easier it will be to  
detect fabrication (because it is harder to make up more  
information convincingly). For instance, if the diffraction data  
are deposited, we can check for consistency with the known  
properties of real macromolecular crystals, e.g. that they  
contain disordered solvent and not vacuum. As Tassos Perrakis  
has discovered, there are characteristic ways in which the  
standard deviations depend on the intensities and the  
resolution. If unmerged data are deposited, there will probably  
be evidence of radiation damage, weak effects from intrinsic  
anomalous scatterers, etc. Raw images are probably even harder  
to simulate convincingly.


After the recent Science retractions we realised that it's about  
time raw data was made available. So, we have set about creating  
the necessary IT and software to do this for our diffraction  
data, and are encouraging Australian colleagues to do the same.  
We are about a week away from launching a web-accessible  
repository for our recently published (e.g. deposited in the PDB)  
data, and this should coincide with an upcoming publication  
describing a new structure from our labs. The aim is that  
publication occurs simultaneously with release in the PDB as well  
as of the raw diffraction data on our website. We hope to house  
as much of our data as possible, as well as data from other  
Australian labs, but obviously the potential dataset will be  
huge, so we are trying to develop, and make freely available to  
the community, software tools that allow others to easily set up  
their own repositories. After brief discussion with the PDB, the  
plan is that the PDB include links from coordinates/SFs to the  
raw data using a simple handle that can be incorporated into a  
URL. We would hope that we can convince the journals that raw  
data must be made available at the time of publication, in the  
same way as coordinates and structure factors. Of course, we  
realise that there will be many hurdles along the way, but we  
are convinced that simply making the raw data available ASAP is  
a 'good thing'.


We are happy to share more details of our IT plans with the  
CCP4BB, such that they can be improved, and look forward to  
hearing feedback


cheers



-- 
--

Mischa Machius, PhD
Associate Professor
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.; ND10.214A
Dallas, TX 75390-8816; U.S.A.
Tel: +1 214 645 6381
Fax: +1 214 645 6353



 




Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Mischa Machius
I don't think archiving images would be that expensive. For one, I  
have found that most formats can be compressed quite substantially  
using simple, standard procedures like bzip2. If optimized, raw  
images won't take up that much space. Also, initially, only those  
images that have been used to obtain phases and to refine finally  
deposited structures could be archived. If the average structure  
takes up 20GB of space, 5,000 structures would be 1TB, which fits on  
a single hard drive for less than $400. If the community thinks this  
is a worthwhile endeavor, money should be available from granting  
agencies to establish a central repository (e.g., at the RCSB).  
Imagine what could be done with as little as $50,000. For large  
detectors, binning could be used, but given current hard drive  
prices and future developments, that won't be necessary. Best - MM
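bzip2's effectiveness on diffraction-like data is easy to demonstrate with the standard library; the synthetic "frame" below (mostly low-count background with sparse strong pixels) is an invented stand-in for a real detector image, so real compression ratios will differ:

```python
import bz2
import random

random.seed(42)

# Synthetic 16-bit "detector frame": mostly low-count background
# with a few strong pixels, mimicking the sparse spots of a
# diffraction image (an assumption for the demo, not real data).
width = height = 512
pixels = []
for _ in range(width * height):
    if random.random() < 0.001:           # rare strong "spot" pixel
        pixels.append(random.randint(1000, 60000))
    else:                                 # low-count background
        pixels.append(random.randint(0, 3))

raw = b"".join(v.to_bytes(2, "little") for v in pixels)
compressed = bz2.compress(raw, compresslevel=9)

ratio = len(compressed) / len(raw)
print(f"{len(raw)} -> {len(compressed)} bytes (ratio {ratio:.2f})")
```

Because most of the entropy sits in a handful of spot pixels, the low-entropy background compresses to a small fraction of its raw size, which is the effect Mischa describes.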



On Aug 16, 2007, at 9:13 AM, Phil Evans wrote:

What do you count as raw data? Rawest are the images - everything  
beyond that is modelling - but archiving images is _expensive_!  
Unmerged intensities are probably more manageable


Phil


On  16 Aug 2007, at 15:05, Ashley Buckle wrote:


Dear Randy

These are very valid points, and I'm so glad you've taken the  
important step of initiating this. For now I'd like to respond to  
one of them, as it concerns something I and colleagues in  
Australia are doing:


The more information that is available, the easier it will be to  
detect fabrication (because it is harder to make up more  
information convincingly). For instance, if the diffraction data  
are deposited, we can check for consistency with the known  
properties of real macromolecular crystals, e.g. that they  
contain disordered solvent and not vacuum. As Tassos Perrakis has  
discovered, there are characteristic ways in which the standard  
deviations depend on the intensities and the resolution. If  
unmerged data are deposited, there will probably be evidence of  
radiation damage, weak effects from intrinsic anomalous  
scatterers, etc. Raw images are probably even harder to simulate  
convincingly.


After the recent Science retractions we realised that it's about  
time raw data was made available. So, we have set about creating  
the necessary IT and software to do this for our diffraction  
data, and are encouraging Australian colleagues to do the same.  
We are about a week away from launching a web-accessible  
repository for our recently published (e.g. deposited in the PDB)  
data, and this should coincide with an upcoming publication  
describing a new structure from our labs. The aim is that  
publication occurs simultaneously with release in the PDB as well  
as of the raw diffraction data on our website. We hope to house  
as much of our data as possible, as well as data from other  
Australian labs, but obviously the potential dataset will be  
huge, so we are trying to develop, and make freely available to  
the community, software tools that allow others to easily set up  
their own repositories. After brief discussion with the PDB, the  
plan is that the PDB include links from coordinates/SFs to the  
raw data using a simple handle that can be incorporated into a  
URL. We would hope that we can convince the journals that raw  
data must be made available at the time of publication, in the  
same way as coordinates and structure factors. Of course, we  
realise that there will be many hurdles along the way, but we  
are convinced that simply making the raw data available ASAP is  
a 'good thing'.


We are happy to share more details of our IT plans with the  
CCP4BB, such that they can be improved, and look forward to  
hearing feedback


cheers



 


Mischa Machius, PhD
Associate Professor
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.; ND10.214A
Dallas, TX 75390-8816; U.S.A.
Tel: +1 214 645 6381
Fax: +1 214 645 6353


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Ashley Buckle
By raw data I mean images. We think this is only manageable using a  
distributed data-grid model (e.g. universities/institutions set up  
their own repositories using open standards, and the PDB aggregates  
the links to them; URL persistence will be a hurdle, I admit). You  
are right that a single-repository solution would be impractical.  
We would hope that the PDB could store the unmerged intensities.

cheers
ashley

On 17/08/2007, at 12:13 AM, Phil Evans wrote:

What do you count as raw data? Rawest are the images - everything  
beyond that is modelling - but archiving images is _expensive_!  
Unmerged intensities are probably more manageable


Phil


On  16 Aug 2007, at 15:05, Ashley Buckle wrote:


Dear Randy

These are very valid points, and I'm so glad you've taken the  
important step of initiating this. For now I'd like to respond to  
one of them, as it concerns something I and colleagues in  
Australia are doing:


The more information that is available, the easier it will be to  
detect fabrication (because it is harder to make up more  
information convincingly). For instance, if the diffraction data  
are deposited, we can check for consistency with the known  
properties of real macromolecular crystals, e.g. that they  
contain disordered solvent and not vacuum. As Tassos Perrakis has  
discovered, there are characteristic ways in which the standard  
deviations depend on the intensities and the resolution. If  
unmerged data are deposited, there will probably be evidence of  
radiation damage, weak effects from intrinsic anomalous  
scatterers, etc. Raw images are probably even harder to simulate  
convincingly.


After the recent Science retractions we realised that it's about  
time raw data was made available. So, we have set about creating  
the necessary IT and software to do this for our diffraction  
data, and are encouraging Australian colleagues to do the same.  
We are about a week away from launching a web-accessible  
repository for our recently published (e.g. deposited in the PDB)  
data, and this should coincide with an upcoming publication  
describing a new structure from our labs. The aim is that  
publication occurs simultaneously with release in the PDB as well  
as of the raw diffraction data on our website. We hope to house  
as much of our data as possible, as well as data from other  
Australian labs, but obviously the potential dataset will be  
huge, so we are trying to develop, and make freely available to  
the community, software tools that allow others to easily set up  
their own repositories. After brief discussion with the PDB, the  
plan is that the PDB include links from coordinates/SFs to the  
raw data using a simple handle that can be incorporated into a  
URL. We would hope that we can convince the journals that raw  
data must be made available at the time of publication, in the  
same way as coordinates and structure factors. Of course, we  
realise that there will be many hurdles along the way, but we  
are convinced that simply making the raw data available ASAP is  
a 'good thing'.


We are happy to share more details of our IT plans with the  
CCP4BB, such that they can be improved, and look forward to  
hearing feedback


cheers


*NOTE* My new tel. no: (03) 9902 0269

Ashley Buckle Ph.D
NHMRC Senior Research Fellow
The Department of Biochemistry and Molecular Biology
School of Biomedical Sciences, Faculty of Medicine &
Victorian Bioinformatics Consortium (VBC)
Monash University, Clayton, Vic 3800
Australia

http://www.med.monash.edu.au/biochem/staff/abuckle.html
iChat/AIM: blindcaptaincat
skype: ashley.buckle
Tel: (613) 9902 0269 (office)
Tel: (613) 9905 1653 (lab)

Fax : (613) 9905 4699





Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Phil Evans
What do you count as raw data? Rawest are the images - everything  
beyond that is modelling - but archiving images is _expensive_!  
Unmerged intensities are probably more manageable


Phil


On  16 Aug 2007, at 15:05, Ashley Buckle wrote:


Dear Randy

These are very valid points, and I'm so glad you've taken the  
important step of initiating this. For now I'd like to respond to  
one of them, as it concerns something I and colleagues in Australia  
are doing:


The more information that is available, the easier it will be to  
detect fabrication (because it is harder to make up more  
information convincingly). For instance, if the diffraction data  
are deposited, we can check for consistency with the known  
properties of real macromolecular crystals, e.g. that they contain  
disordered solvent and not vacuum. As Tassos Perrakis has  
discovered, there are characteristic ways in which the standard  
deviations depend on the intensities and the resolution. If  
unmerged data are deposited, there will probably be evidence of  
radiation damage, weak effects from intrinsic anomalous  
scatterers, etc. Raw images are probably even harder to simulate  
convincingly.


After the recent Science retractions we realised that it's about  
time raw data was made available. So, we have set about creating  
the necessary IT and software to do this for our diffraction  
data, and are encouraging Australian colleagues to do the same.  
We are about a week away from launching a web-accessible  
repository for our recently published (e.g. deposited in the PDB)  
data, and this should coincide with an upcoming publication  
describing a new structure from our labs. The aim is that  
publication occurs simultaneously with release in the PDB as well  
as of the raw diffraction data on our website. We hope to house  
as much of our data as possible, as well as data from other  
Australian labs, but obviously the potential dataset will be  
huge, so we are trying to develop, and make freely available to  
the community, software tools that allow others to easily set up  
their own repositories. After brief discussion with the PDB, the  
plan is that the PDB include links from coordinates/SFs to the  
raw data using a simple handle that can be incorporated into a  
URL. We would hope that we can convince the journals that raw  
data must be made available at the time of publication, in the  
same way as coordinates and structure factors. Of course, we  
realise that there will be many hurdles along the way, but we  
are convinced that simply making the raw data available ASAP is  
a 'good thing'.


We are happy to share more details of our IT plans with the CCP4BB,  
such that they can be improved, and look forward to hearing feedback


cheers


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Ashley Buckle

Dear Randy

These are very valid points, and I'm so glad you've taken the  
important step of initiating this. For now I'd like to respond to one  
of them, as it concerns something I and colleagues in Australia are  
doing:


The more information that is available, the easier it will be to  
detect fabrication (because it is harder to make up more  
information convincingly). For instance, if the diffraction data  
are deposited, we can check for consistency with the known  
properties of real macromolecular crystals, e.g. that they contain  
disordered solvent and not vacuum. As Tassos Perrakis has  
discovered, there are characteristic ways in which the standard  
deviations depend on the intensities and the resolution. If  
unmerged data are deposited, there will probably be evidence of  
radiation damage, weak effects from intrinsic anomalous scatterers,  
etc. Raw images are probably even harder to simulate convincingly.


After the recent Science retractions we realised that it's about  
time raw data was made available. So, we have set about creating  
the necessary IT and software to do this for our diffraction  
data, and are encouraging Australian colleagues to do the same.  
We are about a week away from launching a web-accessible  
repository for our recently published (e.g. deposited in the PDB)  
data, and this should coincide with an upcoming publication  
describing a new structure from our labs. The aim is that  
publication occurs simultaneously with release in the PDB as well  
as of the raw diffraction data on our website. We hope to house  
as much of our data as possible, as well as data from other  
Australian labs, but obviously the potential dataset will be  
huge, so we are trying to develop, and make freely available to  
the community, software tools that allow others to easily set up  
their own repositories. After brief discussion with the PDB, the  
plan is that the PDB include links from coordinates/SFs to the  
raw data using a simple handle that can be incorporated into a  
URL. We would hope that we can convince the journals that raw  
data must be made available at the time of publication, in the  
same way as coordinates and structure factors. Of course, we  
realise that there will be many hurdles along the way, but we  
are convinced that simply making the raw data available ASAP is  
a 'good thing'.


We are happy to share more details of our IT plans with the CCP4BB,  
such that they can be improved, and look forward to hearing feedback


cheers
Ashley Buckle and James Whisstock







If a structure is fabricated by making up a new crystal form,  
perhaps a complex of previously-known components, then the crystal  
packing interactions should look like the interactions seen in real  
crystals. If it's fabricated by homology modelling, then the  
internal packing is likely to be suboptimal. I'm told by David  
Baker (who knows a thing or two about this) that it is extremely  
difficult to make a homology model that both obeys what we know  
about torsion angle preferences and is packed as well as a real  
protein structure.


I'm very interested in hearing about new ideas along these lines.  
The wwPDB has agreed to sponsor a workshop next year where we will  
propose and test new validation criteria.


4. If new validation criteria are applied at the PDB, won't someone  
who wants to fabricate a structure just keep improving their  
fabricated model until it passes all the tests?


That's a possibility, but I think the deterrence effect of knowing  
that there are measures to detect fabrication will outweigh this.  
And it isn't enough for a fabricated structure to pass today's  
tests; it has to pass all the new tests devised for the rest of the  
person's life, or at least their career.


5. What should we do if tests suggest that a structure may be  
fabricated?


I think we need to be extremely careful. Conclusions should not be  
drawn on the basis of a few numbers. The tests can just point up  
which structures should be examined closely. Close examination  
would then involve less automated criteria, such as whether the  
structure agrees with all the biochemical data about the system. As  
in the process followed by Nature, you also have to start by giving  
the people who deposited the structure an opportunity to explain  
the anomalies.


Randy Read


*NOTE* My new tel. no: (03) 9902 0269

Ashley Buckle Ph.D
NHMRC Senior Research Fellow
The Department of Biochemistry and Molecular Biology
School of Biomedical Sciences, Faculty of Medicine &
Victorian Bioinformatics Consortium (VBC)
Monash University, Clayton, Vic 3800
Australia

http://www.med.monash.edu.au/biochem/staff/abuckle.html
iChat/AIM: blindcaptaincat
skype: ashley.buckle
Tel: (613) 9902 0269 (office)
Tel: (613) 9905 1653 (lab)

Fax : (613) 9905 4699





[ccp4bb] NSCM keyword in MolRep

2007-08-16 Thread Cynthia Czyrphony
Dear all,
I've got a question regarding NSCM keyword usage in MolRep. If I am
searching for a trimer in the ASU, but I use a monomer as model, should I
put NSCM=3 or =1?
Thanks,

Cynthia.


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Eleanor Dodson

I believe that is so.
In this case the R factor against the deposited data is low. The question 
to be addressed is whether the deposited data are of acceptable quality.


There are some poor distances, but not many - the asymmetric unit is very 
empty.
The Ramachandran plot is not good, and an author would be queried about 
that. However, authors can choose to ignore those warnings.


Eleanor
Gina Clayton wrote:
I thought that when a structure is deposited the databank does run its 
own refinement validation and geometry checks and gives you back what 
it finds, i.e. distance problems etc. and R factor?

Quoting Eleanor Dodson <[EMAIL PROTECTED]>:


The weighting in REFMAC is a function of SigmA (plotted in the log file).
For this example it will be nearly 1 for all resolution ranges, so 
the weights are pretty constant. There is also a contribution from 
the "experimental" sigma, which in this case seems to be proportional 
to |F|


Yesterday I attached the wrong TRUNCATE log file - here is the 
correct one, and if you look at the plot
"Amplitude Analysis against resolution" it also includes a plot of 
 


Eleanor

Dominika Borek wrote:
There are many more interesting things about this structure - 
obvious fake - refined against fabricated data.


After running Refmac I have noticed discrepancies between the R and 
weighted R factors. However, I do not know how the weights are 
calculated and applied - knowing this might help to find out how 
these data were created. Could you help?


M(4SSQ/LL) NR_used %_obs M(Fo_used) M(Fc_used) Rf_used WR_used NR_free M(Fo_free) M(Fc_free) Rf_free WR_free $$
$$
 0.005   2205  98.77  3800.5  3687.2  0.12  0.30  121  4133.9  4042.7  0.12  0.28
 0.015   3952  99.90  1932.9  1858.7  0.20  0.60  197  2010.5  1880.5  0.21  0.40
 0.025   5026  99.81  1577.9  1512.3  0.23  0.62  283  1565.0  1484.6  0.26  0.54
 0.034   5988  99.76  1598.0  1541.5  0.23  0.61  307  1625.7  1555.6  0.23  0.42
 0.044   6751  99.79  1521.2  1481.6  0.18  0.41  338  1550.3  1523.8  0.18  0.61
 0.054   7469  99.81  1314.5  1291.2  0.14  0.29  391  1348.3  1337.7  0.15  0.27
 0.064   8078  99.87     .5  1089.1  0.16  0.36  465  1096.1  1077.9  0.18  0.42
 0.073   8642  99.84   976.7   959.2  0.15  0.32  488   995.3   988.4  0.16  0.50
 0.083   9255  99.88   866.4   848.0  0.16  0.36  490   856.8   846.0  0.17  0.38
 0.093   9778  99.88   747.6   731.4  0.16  0.36  515   772.8   747.3  0.18  0.38
 0.103  10225  99.86   662.6   649.1  0.17  0.38  547   658.9   643.6  0.20  0.36
 0.113  10768  99.83   597.2   584.7  0.18  0.42  538   593.4   590.0  0.20  0.49
 0.122  11121  99.86   535.5   521.9  0.19  0.48  607   556.2   542.0  0.20  0.47
 0.132  11692  99.85   489.3   479.2  0.19  0.46  607   476.4   467.3  0.23  0.42
 0.142  11999  99.83   453.9   443.1  0.19  0.48  621   455.3   440.6  0.22  0.55
 0.152  12463  99.79   419.2   407.3  0.19  0.44  655   435.3   424.3  0.22  0.53
 0.162  12885  99.78   384.0   373.9  0.20  0.53  632   384.1   376.1  0.22  0.43
 0.171  12698  95.96   357.2   348.5  0.21  0.57  686   353.9   338.6  0.24  0.51
 0.181  11926  87.78   332.0   323.3  0.21  0.66  590   333.4   322.6  0.24  0.57
 0.191  11204  80.39   309.9   299.6  0.22  0.59  600   302.1   296.3  0.26  0.77

$$
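To Dominika's question about the discrepancy between R and weighted R: a generic sketch of the two statistics is below. REFMAC's actual weights combine SigmaA and the experimental sigmas, so the simple per-reflection weight here is an assumption for illustration, not REFMAC's scheme:

```python
import numpy as np

def r_factor(Fo, Fc):
    # Conventional R = sum |Fo - Fc| / sum |Fo|
    Fo = np.abs(np.asarray(Fo, dtype=float))
    Fc = np.abs(np.asarray(Fc, dtype=float))
    return float(np.sum(np.abs(Fo - Fc)) / np.sum(Fo))

def weighted_r(Fo, Fc, w):
    # One common weighted form: wR = sqrt( sum w(Fo-Fc)^2 / sum w Fo^2 ).
    # With strongly non-uniform weights (e.g. weights derived from sigmas
    # that track |F|), wR can diverge markedly from R.
    Fo = np.abs(np.asarray(Fo, dtype=float))
    Fc = np.abs(np.asarray(Fc, dtype=float))
    w = np.asarray(w, dtype=float)
    return float(np.sqrt(np.sum(w * (Fo - Fc) ** 2) / np.sum(w * Fo ** 2)))
```

Because wR squares the residuals, a few reflections with large weighted misfits dominate it, which is one generic way the two columns can drift apart even when R itself looks respectable.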




Eleanor Dodson wrote:
There is a correspondence in last week's Nature commenting on the 
disparities between three C3b structures. These are:
2icf, solved at 4.0 A resolution; 2i07, at 4.1 A resolution; and 2hr0, 
at 2.26 A resolution.


The A chains of all three structures agree closely, with each other and 
with other deposited structures.
The B chains of 2icf and 2i07 are in reasonable agreement, but 
there are enormous differences in the B chain of 2hr0.
This structure is surprisingly out of step and, by many criteria, 
likely to be wrong.


There have been many articles written on validation, and it seems 
worth reminding crystallographers of some of the tests which make 
2hr0 suspect.

1) The cell content analysis suggests there is 80% solvent in the 
asymmetric unit.

Such crystals have been observed but they rarely diffract to 2.26A.

2) Data Analysis:
The reflection data has been deposited so it can be analysed.
The plots provided by TRUNCATE showing intensity statistics are not 
compatible with such a high solvent ratio. They are too perfect: the 
moments are perfectly linear, unlikely with such large volumes of the 
crystal containing solvent, and there is absolutely no evidence of 
anisotropy, again unlikely with high solvent content.


3)  Structure analysis
a) The Ramachandran plot is very poor (84% allowed) with many 
residues in disallowed regions.
b) The distribution of residue B values is quite unrealistic. There 
is a very low spread, which is most unusual for a structure with 
long stretches of exposed chain. The baverage log file is attached.


c) There do not seem to be enough contacts to maintain the 
crystalline state.
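The cell-content arithmetic behind point 1) can be sketched as follows. This is a hedged illustration of the standard Matthews estimate (the 1 - 1.23/VM solvent approximation); the function name is illustrative, and this is not the CCP4 cell-content program itself:

```python
def matthews(cell_volume_A3, z, mw_da):
    """Matthews coefficient VM (A^3/Da) and estimated solvent fraction.

    z is the number of molecules in the unit cell (molecules per
    asymmetric unit times the number of asymmetric units).
    Illustrative sketch only.
    """
    vm = cell_volume_A3 / (z * mw_da)
    solvent = 1.0 - 1.23 / vm  # standard Matthews approximation
    return vm, solvent
```

For instance, a VM of 6.15 A^3/Da gives 1 - 1.23/6.15 = 80% solvent, the sort of figure quoted above; such high values are seen occasionally, but rarely together with 2.26 A diffraction.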

Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Randy J. Read

On Aug 16 2007, Eleanor Dodson wrote:


The weighting in REFMAC is a function of SigmaA (plotted in the log file).
For this example it will be nearly 1 for all resolution ranges, so the 
weights are pretty constant. There is also a contribution from the 
"experimental" sigma, which in this case seems to be proportional to |F|.


Originally I expected that the publication of our Brief Communication in 
Nature would stimulate a lot of discussion on the bulletin board, but 
clearly it hasn't. One reason is probably that we couldn't be as forthright 
as we wished to be. For its own good reasons, Nature did not allow us to 
use the word "fabricated". Nor were we allowed to discuss other structures 
from the same group, if they weren't published in Nature.


Another reason is an understandable reluctance to make allegations in 
public, and the CCP4 bulletin board probably isn't the best place to do 
that.


But I think the case raises essential topics for the community to discuss, 
and this is a good forum for those discussions. We need to consider how to 
ensure the integrity of the structural databases and the associated 
publications.


So here are some questions to start a discussion, with some suggestions of 
partial answers.


1. How many structures in the PDB are fabricated?

I don't know, but I think (or at least hope) that the number is very small.

2. How easy is it to fabricate a structure?

It's very easy, if no-one will be examining it with a suspicious mind, but 
it's extremely difficult to do well. No matter how well a structure is 
fabricated, it will violate something that is known now or learned later 
about the properties of real macromolecules and their diffraction data. If 
you're clever enough to do this really well, then you should be clever 
enough to determine the real structure of an interesting protein.


3. How can we tell whether structures in the PDB are fabricated, or just 
poorly refined?


The current standard validation tools are aimed at detecting errors in 
structure determination or the effects of poor refinement practice. None of 
them are aimed at detecting specific signs of fabrication because we assume 
(almost always correctly) that others are acting in good faith.


The more information that is available, the easier it will be to detect 
fabrication (because it is harder to make up more information 
convincingly). For instance, if the diffraction data are deposited, we can 
check for consistency with the known properties of real macromolecular 
crystals, e.g. that they contain disordered solvent and not vacuum. As 
Tassos Perrakis has discovered, there are characteristic ways in which the 
standard deviations depend on the intensities and the resolution. If 
unmerged data are deposited, there will probably be evidence of radiation 
damage, weak effects from intrinsic anomalous scatterers, etc. Raw images 
are probably even harder to simulate convincingly.


If a structure is fabricated by making up a new crystal form, perhaps a 
complex of previously-known components, then the crystal packing 
interactions should look like the interactions seen in real crystals. If 
it's fabricated by homology modelling, then the internal packing is likely 
to be suboptimal. I'm told by David Baker (who knows a thing or two about 
this) that it is extremely difficult to make a homology model that both 
obeys what we know about torsion angle preferences and is packed as well as 
a real protein structure.


I'm very interested in hearing about new ideas along these lines. The wwPDB 
has agreed to sponsor a workshop next year where we will propose and test 
new validation criteria.


4. If new validation criteria are applied at the PDB, won't someone who 
wants to fabricate a structure just keep improving their fabricated model 
until it passes all the tests?


That's a possibility, but I think the deterrence effect of knowing that 
there are measures to detect fabrication will outweigh this. And it isn't 
enough for a fabricated structure to pass today's tests; it has to pass all 
the new tests devised for the rest of the person's life, or at least their 
career.


5. What should we do if tests suggest that a structure may be fabricated?

I think we need to be extremely careful. Conclusions should not be drawn on 
the basis of a few numbers. The tests can just point up which structures 
should be examined closely. Close examination would then involve less 
automated criteria, such as whether the structure agrees with all the 
biochemical data about the system. As in the process followed by Nature, 
you also have to start by giving the people who deposited the structure an 
opportunity to explain the anomalies.


Randy Read


Re: [ccp4bb] The importance of USING our validation tools

2007-08-16 Thread Gina Clayton

I thought that when a structure is deposited the databank does run its own
refinement validation and geometry checks and gives you back what it finds,
i.e. distance problems etc. and R factor?


Quoting Eleanor Dodson <[EMAIL PROTECTED]>:


The weighting in REFMAC is a function of SigmaA (plotted in the log file).
For this example it will be nearly 1 for all resolution ranges, so 
the weights are pretty constant. There is also a contribution from 
the "experimental" sigma, which in this case seems to be proportional 
to |F|.


Yesterday I attached the wrong TRUNCATE log file - here is the correct one;
if you look at the plot "Amplitude Analysis against resolution" it also
includes a plot of

Eleanor








Re: [ccp4bb] Structure help

2007-08-16 Thread Eleanor Dodson

Not enough information but some suggestions.

Are you sure the data is OK? Any sign of twinning? It is suspicious 
that you can't decide between P43 and P43212.


(Run SFCHECK on the amplitudes and try to understand output! )
Or send it and I will provide commentary.
Eleanor



Yanming Zhang wrote:

Hi ,
Please help me with this structure:
Data to 1.8 A: cell 84.892 84.892 172.580 90 90 90, tetragonal indexing.
Space group: P43212 or P43. The final refinement Rfree indicates P43 is 
better, so most likely P43 (the systematic absences also indicate P43).

Matthews coefficient indicates: P43, 3 mol/a.u. (2.58, 52% solvent) or 2 mol/a.u. (3.87, 68% solvent).

MR: Both MOLREP and Phaser initially find 2 mol/a.u. Refinement finally 
reaches Rfree 30%, R 27%. We think that, for 1.8 A resolution, the R and 
Rfree are too high. Also, considering the 1.8 A resolution, there should 
be 3 molecules in total, so we set out to find the 3rd molecule.


Phased MR with MOLREP did find the 3rd, and so did Phaser (fixing two 
and searching for the 3rd). Both solutions gave relatively poorer 
statistics, but the packing is perfect. The 3rd molecule from phased MR 
in MOLREP and in Phaser cannot be superposed: the two solutions show 
slightly different orientations, though similar positions. Refinement 
indicates the phased MOLREP MR gives the lower Rfree.


Using all 3 molecules: when I restrain or constrain NCS during 
refinement (with TLS), Rfree goes way up (R ~30%, Rfree > 36%). If I 
refine without NCS, the R factors drop right away to 27/30%, but this 
strange thing happens: the two copies from the original solution refine 
very well - low B factors, a very good-looking map. For the 3rd copy, 
however, the B factors are high and the map is bad, with no good density 
matching the model. And the 3rd molecule did not help Rfree or R.


The story ends with my questions:
1. It seems the 3rd copy is absent or globally disordered; in that case, 
can I say that Rfree and R stay high because the 3rd molecule is globally 
disordered (2 mol/a.u. with Rfree 30%, R 27%, 1.8 A data)?

2. Is there anything I should worry about in the process of finding the 3rd?
3. Are there suggestions for lowering Rfree further (a reasonable 
Rfree should be around 22%)?


THANKS
Yanming
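On the P43 versus P43212 ambiguity above: both space groups extinguish 00l reflections with l not a multiple of 4, but only P43212 additionally extinguishes h00 with odd h (and 0k0 with odd k), so the axial intensities can settle it. A hedged sketch over a hypothetical list of merged reflections (h, k, l, I); a real test should of course also weight by sigma(I):

```python
def screw_axis_means(reflections):
    """Mean intensity of odd-h versus even-h axial h00 reflections.

    In P43212 the 2-fold screws along a and b should make the odd-h mean
    essentially zero; significant odd-h intensity argues for P43.
    Illustrative only.
    """
    odd = [i for (h, k, l, i) in reflections if k == 0 and l == 0 and h % 2]
    even = [i for (h, k, l, i) in reflections
            if k == 0 and l == 0 and h != 0 and h % 2 == 0]
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(odd), mean(even)
```

Note that twinning or pseudo-symmetry can make weak "absent" reflections appear, which is why Eleanor's suggestion to check for twinning first is the right order of operations.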




Re: [ccp4bb] "Dry" structures

2007-08-16 Thread Benini, Stefano
Dear Elisabetta,


from the statistics you attached, it seems that your low-resolution limit 
extends only to 6.72 A.

My guess is that including as much low-resolution data as possible (e.g. 
up to 20 A or more) will help a lot in getting better maps etc.

I hope this is the answer you are looking for!!
Please let me know what happens next!

ciao ciao

Stefano

***
Stefano Benini PhD
Structural Biology - DECS
Mereside 50S38
Alderley Park
Phone:+44-1625-518293 (ext.: 28293)
http://structuralbiology-ap.rd.astrazeneca.net/people/Stefano-Benini/Stefano-Benini.html
**


-Original Message-
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] Behalf Of
Sabini, Elisabetta
Sent: 15 August 2007 18:35
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] "Dry" structures


Dear all,

what does it mean when a structure doesn't have many water molecules?

I have a 2.3A data set, 2 molecules in the AU (260 residues each), space
group C2221 - I haven't finished my water search but I don't seem to have
more than 20-30 obvious/good water molecules. The Rfac/Rfree are 22/30%,
the protein is modeled and the ligands are in.


I collected the data at APS SERCAT ID-22. Did I have a dry crystal (!!) or
did the beamline dry it up?!

How many water molecules should I expect at this resolution?

Also, what does it mean when a strong peak in the Fobs-Fc map doesn't have
corresponding 2Fobs-Fc density covering it, even at very low sigma?

Thank you!

Eli :o)

PS: I have attached the statistics for the data set from XDS

-- 
Elisabetta Sabini, Ph.D.
Research Assistant Professor
University of Illinois at Chicago
Department of Biochemistry and Molecular Genetics
Molecular Biology Research Building, Rm. 1108
900 South Ashland Avenue
Chicago, IL 60607
U.S.A.

Tel: (312) 996-6299
Fax: (312) 355-4535
E-mail: [EMAIL PROTECTED]


Re: [ccp4bb] "Dry" structures

2007-08-16 Thread Eleanor Dodson
What does the Matthews coefficient indicate? You would expect more water 
than that, but maybe you have a low Matthews coefficient, indicating 
little solvent? Maybe you have lost the low-resolution data, which makes 
it harder to find water? Maybe you have refined with bulk-solvent 
scaling - sometimes that also masks the water, and you have to look at 
lower sigma levels.

Your strong Fo-Fc peak might indicate a rogue bit of data.
Eleanor


