Re: [ccp4bb] an over refined structure

2008-02-12 Thread Dale Tronrud

Edward Berry wrote:

Dirk Kostrewa wrote:

Dear Dean and others,

Peter Zwart gave me a similar reply. This is a very interesting 
discussion, and I would like to have a somewhat closer look at this to 
maybe make things a little bit clearer (please excuse the general 
explanations - this might be interesting for beginners as well):


1). Crystallographic symmetry can be applied to the whole crystal and 
results in symmetry-equivalent intensities in reciprocal space. If you 
refine your model in a lower space group, there will be reflections in 
the test-set that are symmetry-equivalent in the higher space group to 
reflections in the working set. If you refine the 
(symmetry-equivalent) copies in your crystal independently, they will 
diverge due to resolution and data quality, and R-work and R-free will 
diverge to some extent due to this. If you force the copies to be 
identical, the R-work and R-free will still be different due to 
observational errors. In both cases, however, the R-free will be very 
close to the R-work.



Ah - that's going way too fast for the beginners, at least one of them!
Can someone explain why the R-free will be very close to the R-work,
preferably in simple concrete terms like Fo and Fc at sym-related
reflections, and the change in the Fc resulting from a step of refinement?

Ed


Dear Ed,

   Some years ago I was castigated in group meeting for stating that
the question posed by a post-doc was a bad question.  I gather this
is considered rude behavior.  My belief is that if you say "good
question" to all questions you degrade the value of those truly good
questions when they come along.  Yours is a good question and
demands a proper answer.  Like all good questions, however, the answer
is neither easy nor short.  I'm going to make a stab at it, and I may
end up far from the mark, but I'm sure someone will point out my
failings in follow-up letters.  At least I'll get these ideas out
of my head so I can get back to my real work.

   The other attempts to answer this question, including my own, have
included terms such as "error" and "bias" and, without definitions for
these terms, are ultimately unsatisfying.

   It seems to me that the whole point of refinement is to bias the
model to the observations, so the real matter is inappropriate bias.
This brings up the question of what a model is intended to fit and
what it is not.  When I first implemented an overall anisotropic B
correction in TNT I noticed that the correction for a given model
would grow larger as more refinement cycles were run.  It appears
that a model consisting of only atomic positions and isotropic B's
can be created where the Fc's have an anisotropic fall off in
resolution.  When the isotropic model was refined with the anisotropy
uncorrected the parameters managed to find a way to fit that anisotropy.
When the anisotropy was properly modeled the positions and isotropic
B's could go back to their job of fitting the signal they were designed
to fit.
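For concreteness, here is a minimal sketch of the kind of overall
anisotropic scale factor being described, assuming an orthorhombic cell
and the usual exp(-(s^T B s)/4) convention (an illustration only, not
TNT's actual code):

import numpy as np

def aniso_scale(hkl, cell, B):
    """Overall anisotropic B correction factor for each reflection.

    hkl  : (N,3) integer Miller indices
    cell : (a, b, c) in Angstrom -- orthogonal cell assumed for simplicity
    B    : symmetric 3x3 anisotropic B tensor in Angstrom^2
    """
    a, b, c = cell
    s = hkl * np.array([1.0/a, 1.0/b, 1.0/c])   # (h a*, k b*, l c*)
    expo = np.einsum('ni,ij,nj->n', s, B, s)    # s^T B s per reflection
    return np.exp(-expo / 4.0)

# Fc_corrected = overall_scale * aniso_scale(hkl, cell, B) * Fc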

   This is what I would define as inappropriate bias.  The parameters
of the model are attempting to fit a signal they were not designed to
fit.  In this example, the distortion of the parameters is distributed
over a large number of parameters and each one is changed by a small amount;
an amount usually considered too small to be significant, but in
aggregate they produce a significant signal (the anisotropic falloff
of the model's Fc's).  A more trivial example would be the location
of the side chains of amino acids near the density of an unmodeled
ligand.  Refinement will tend to move the side chains away from the
center of their own density toward the unfilled density, perhaps
even inappropriately placing a side chain in the ligand density
instead of its own.  Again, the fit of the parameters to the signal
they were designed to fit has been degraded by the attempt to fit a
signal they were not, and could never, fit properly.

   When well designed parameters fit the signal they were designed to
fit, the model has predictive power.  I guess that is what "designed"
is defined to mean in this case.  A model that can't predict things
is useless, and that is why the free R is such a good test of a model.
If the parameters of a model are fitting signal in the data that they
were not designed to fit, all bets are off.  There is no reason to
expect that they will have the same predictive power, except by
happenstance or (bad) luck.  Placing the end of an arginine residue
in the density of a ligand does, at least, put a few atoms in places
where atoms should be, and that will tend to lower the free R, but
the requirement that there be bridging atoms linking those atoms
to the main chain of the protein will cause the parameters of the
middle atoms to engage in contortions to try to fit the data, and
those contortions will harm the ability of the model to make correct
predictions.  Going back to the first example, there is also no
reason to expect that the small perturbations in an 

Re: [ccp4bb] an over refined structure

2008-02-12 Thread Edward Berry

Dale Tronrud wrote:





   In summary, this argument depends on two assertions that you can
argue with me about:

   1) When a parameter is being used to fit the signal it was designed
for, the resulting model develops predictive power and can lower
both the working and free R.  When a signal is perturbing the value
of a parameter for which it was not designed, it is unlikely to improve
its predictive power and the working R will tend to drop, but the free
R will not (and may rise).

   2) If the unmodeled signal in the data set is a property in real
space and has the same symmetry as the molecule in the unit cell,
the inappropriate fitting of parameters will be systematic with
respect to that symmetry and the presence of a reflection in the
working set will tend to cause its symmetry mate in the test set
to be better predicted despite the fact that this predictive power
does not extend to reflections that are unrelated by symmetry.
This bias will occur for any kind of error as long as that
error obeys the symmetry of the unit cell in real space.



Dear Dale,
Thanks for taking the time to think about my problem and for
composing what is obviously a well-thought-out explanation.
I am a little over my head here, but I think I see your point.

Inappropriate fitting of this residual error has poor predictive
power, so it does not reduce |Fo-Fc| for general free reflections.
However, the error is symmetrical, so attempts to fit it will
result in symmetrical changes which reduce |Fo-Fc| for those
free reflections that are related to working reflections.

I need to read the references that were mentioned in this
discussion, and think about it a little more in order to
resolve some remaining conflicts in my thinking.
But I don't need to bother everyone else with my
struggles, unless I come up with something useful.
Thanks for the guidance!

Ed


Re: [ccp4bb] an over refined structure

2008-02-08 Thread Sue Roberts
Back in the old days, when I worked on crystal structures with 15 or  
20 atoms or so, the symptoms of missed crystallographic symmetry  
included instability of the refinement, high correlations between  
parameters, and (relatively) large deviations between equivalent bond  
distances and bond angles.  There can be real consequences of missing  
symmetry and divergences between copies of molecules, even when  
resolution and data quality are not an issue, because the refinement  
can become unstable.  Hence, I'm always skeptical of the assumption  
that structures can be safely refined in space groups of too low  
symmetry.  I've assumed that, when people choose to (or accidentally)  
refine protein structures in lower symmetry space groups, geometrical  
and NCS restraints keep the refinement under control.  Is there a  
publication somewhere that has looked at the effect of deliberate  
refinement in space groups of lower than correct symmetry?


Sue






Sue Roberts
Biochemistry & Biophysics
University of Arizona

[EMAIL PROTECTED]


Re: [ccp4bb] an over refined structure

2008-02-08 Thread price
Rotational near-crystallographic ncs is easy to handle this way, but 
what about translational pseudo-symmetry (or should that be 
pseudo-translational symmetry)? In such cases one whole set of spots 
is systematically weaker than the other set.  Then what is the 
theoretically correct way to calculate Rfree?  Write one's own code 
to sort the spots into two piles?
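For what it's worth, the sorting itself is only a few lines once the
pseudo-translation vector is known; a sketch, assuming a hypothetical
pseudo-centering vector of (1/2, 1/2, 0) so that reflections with h+k
odd form the systematically weak class:

import numpy as np

# hypothetical pseudo-translation vector (fractional coordinates);
# use the vector actually observed, e.g. from a native Patterson peak
t = np.array([0.5, 0.5, 0.0])

def split_strong_weak(hkl, t):
    """Reflections with h.t (nearly) integer are systematically strong;
    the rest form the systematically weak class."""
    frac = (hkl @ t) % 1.0
    strong = np.isclose(frac, 0.0) | np.isclose(frac, 1.0)
    return hkl[strong], hkl[~strong]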

Phoebe

At 01:05 PM 2/8/2008, Axel Brunger wrote:

In such cases, we always define the test set first in the high-symmetry
space group choice.  Then, if it is warranted to lower the crystallographic
symmetry and replace with NCS symmetry, we expand the test set
to the lower symmetry space group.  In other words, the test set itself
will be invariant upon applying any of the crystallographic or NCS operators,
so will be maximally free in these cases.   It is then also possible to
directly compare the free R between the high and low crystallographic
space group choices.

Our recent Neuroligin structure is such an example (Arac et al., 
Neuron 56, 992-, 2007).



Axel





---
Phoebe A. Rice
Assoc. Prof., Dept. of Biochemistry & Molecular Biology
The University of Chicago
phone 773 834 1723
fax 773 702 0439
http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123
http://www.nasa.gov/mission_pages/cassini/multimedia/pia06064.html 

Re: [ccp4bb] an over refined structure

2008-02-08 Thread Dale Tronrud

[EMAIL PROTECTED] wrote:
 Rotational near-crystallographic ncs is easy to handle this way, but
 what about translational pseudo-symmetry (or should that be
 pseudo-translational symmetry)? In such cases one whole set of spots is
 systematically weaker than the other set.  Then what is the
 theoretically correct way to calculate Rfree?  Write one's own code to
 sort the spots into two piles?
 Phoebe


Dear Phoebe,

   I've always been a fan of splitting the test set in these situations.
The weak set of reflections provides information about the differences
between the ncs mates (and the deviation of the ncs operator from a
true crystallographic operator) while the strong reflections provide
information about the average of the ncs mates.  If you mix the two
sets in your Rfree calculation the strong set will tend to dominate
and will obscure the consequences of allowing your ncs mates too much
freedom to differ.
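A minimal sketch of what that split monitoring might look like, assuming
fo and fc are amplitude arrays for the test-set reflections and
weak_mask marks the systematically weak class (e.g. from a parity test
on h.t as above):

import numpy as np

def r_factor(fo, fc):
    # conventional R = sum|Fo - Fc| / sum Fo over the selected reflections
    return np.sum(np.abs(fo - fc)) / np.sum(fo)

def split_rfree(fo, fc, weak_mask):
    # report Rfree(strong) and Rfree(weak) separately
    return (r_factor(fo[~weak_mask], fc[~weak_mask]),
            r_factor(fo[weak_mask],  fc[weak_mask]))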

   Let's say you have a pseudo C2 crystal with the dimer as the ncs
pair and you are starting with a perfect C2 symmetry model.  The
initial rigid body refinement will cause the Rfree(weak) to drop
because the initial model had Fc's equal to zero for all these
reflections and the deviation from crystal symmetry allows nonzero
values to arise.

   Now you want to test if there are real differences between the
two copies.  If you allow variation between the two copies but
monitor the Rfree(strong) you are actually monitoring the quality
of the average of the two copies, and you basically have a two-fold
multimodel.  It is the same as putting two molecules at each site
in the crystal and forcing both models to have perfect ncs.

   Axel Brunger's Methods in Enzymology chapter indicates that a
two-fold multimodel is expected to have a lower Rfree than a single
model and we would expect in our imaginary crystal that the
Rfree(strong) will drop even if there is no real difference between
the ncs mates.  When you allow differences between the ncs mates
the Rfree(strong) will tend to drop even if those differences are
not real.

   The Rfree(weak) is a different story, however.  It is controlled
specifically by the differences between the two ncs mates and will
drop only if the refinement creates differences that are significant.
This is the statistic that can be used to determine the ncs weight.
(Or probably the log likelihood gain (weak))

   If you insist on mixing the strong and weak reflections in your
test set you have to design your null hypothesis test differently.
First you should do a refinement where you have two models at
each site, with exact ncs imposed.  Then you do a refinement with one
copy at each site but allow differences between the ncs mates.
Compare the Rfree of each model to decide which is the better model.
There are exactly the same number of parameters in each model but
one allows the ncs to be violated and the other does not.

   Even so, the signal in the Rfree is mixed unless you split the
systematically weak from the systematically strong.

   If you have a general ncs and don't have weak and strong subsets
of reflections you still have to worry about the multimodel effect.
If a refinement that allows ncs violations does not drop the Rfree
by more than a two-fold multimodel with perfect ncs does, you cannot
justify breaking your ncs.  A drop in Rfree when you break ncs does
not necessarily mean that breaking ncs is a good idea.  You always
have to perform the proper null hypothesis test.

Dale Tronrud



Re: [ccp4bb] an over refined structure

2008-02-08 Thread Axel Brunger

In such cases, we always define the test set first in the high-symmetry
space group choice.  Then, if it is warranted to lower the
crystallographic symmetry and replace it with NCS symmetry, we expand
the test set to the lower symmetry space group.  In other words, the
test set itself will be invariant upon applying any of the
crystallographic or NCS operators, so will be maximally free in these
cases.  It is then also possible to directly compare the free R between
the high and low crystallographic space group choices.
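As an illustration of the bookkeeping involved (a sketch under assumed
conventions, not the actual procedure used), one can expand a test set
chosen in the higher-symmetry setting so that it stays closed under the
symmetry that is being demoted to NCS:

import numpy as np

def expand_test_set(test_hkl, rotations):
    """test_hkl: (N,3) Miller indices of the free reflections chosen in
    the high-symmetry setting; rotations: 3x3 integer rotation parts of
    the high-symmetry operators.  Indices transform as h' = h R for an
    operator acting on fractional coordinates as x -> R x + t."""
    expanded = set()
    for R in rotations:
        for h in test_hkl @ np.asarray(R):
            expanded.add(tuple(int(i) for i in h))
    return expanded

# e.g. true symmetry P2 (unique axis b) refined as P1: the 2-fold sends
# (h,k,l) to (-h,k,-l), so both mates end up in the expanded free set
ops = [np.eye(3, dtype=int), np.diag([-1, 1, -1])]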

Our recent Neuroligin structure is such an example (Arac et al.,  
Neuron 56, 992-, 2007).



Axel




On Feb 8, 2008, at 10:48 AM, Ronald E Stenkamp wrote:

I've looked at about 10 cases where structures have been refined in lower
symmetry space groups.  When you make the NCS operators into
crystallographic operators, you don't change the refinement much, at
least in terms of structural changes.  That's the case whether NCS
restraints have been applied or not.  In the cases I've re-done, changing
the refinement program and dealing with test set choices makes some
difference in the R and Rfree values.  One effect of changing the space
group is whether you realize the copies of the molecule in the lower
symmetry asymmetric unit are identical or not.  (Where identical means
crystallographically identical, i.e., in the same packing environments,
subject to all the caveats about accuracy, precision, thermal motion,
etc.)  Another effect of going to higher symmetry space groups of course
has to do with explaining the experimental data with simpler and smaller
mathematical models (Occam's razor or the Principle of Parsimony).

Ron



Axel T. Brunger
Investigator,  Howard Hughes Medical Institute
Professor of Molecular and Cellular Physiology
Stanford University

Web:    http://atb.slac.stanford.edu
Email:  [EMAIL PROTECTED]
Phone:  +1 650-736-1031
Fax:    +1 650-745-1463





Re: [ccp4bb] an over refined structure

2008-02-08 Thread Dale Tronrud

Bart Hazes wrote:


I haven't had to deal with this situation but my first impression is to 
use the strong reflections for Rfree. For the strong reflections, and 
any normal data, Rwork & Rfree are dominated by model errors and not 
measurement errors. For the weak reflections measurement errors become 
more significant if not dominant. In that case Rwork & Rfree will not be 
a sensitive measure to judge model improvement and refinement strategy.


A second and possibly more important issue arises with determination of 
Sigmaa values for maximum likelihood refinement. Sigmaa values are 
related to the correlation between Fc and Fo amplitudes. When half of 
your observed data is systematically weakened then this correlation is 
going to be very high, even if the model is poor or completely wrong, as 
long as it obeys the same pseudo-translation. If you only use the strong 
reflections for Rfree I expect that should get around some of the issue.


Of course it can be valuable to also monitor the weak reflections to 
optimize NCS restraints but probably not to drive maximum likelihood 
refinement or to make general refinement strategy choices.


Bart


Dear Bart,

   I agree that the way one uses the test set depends critically on the
question you are asking.  In my letter I was focusing on that aspect
of the pseudo centered crystal problem where the strong/weak divide can
be used to particular advantage.

   I have not thought as much about the matter of using the test set
to estimate the level of uncertainty in the parameters of a given model.
My gut response is that the strong/weak distinction is still significant.
Since the weak reflections contain information about the differences
between the two ncs-related copies, I suspect that a great many systematic
errors are subtracted out.

   For example, if your model contains isotropic B's when, of course,
the atoms move anisotropically, your maps will contain difference features
due to these unmodeled motions.  Since the anisotropic motions are
probably common to the two molecules, these features will be present in
the average structure described by the strong reflections but will be
subtracted out in the difference structure described by the weak
reflections.  This argument implies to me that the strong reflections
need to be judged by the Sigma A derived from the strong test set and
the weak reflections judged by the weak test set.
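A crude way to look at this, following Bart's point that sigmaA tracks
the Fo/Fc amplitude correlation, is to compute that correlation per
subset in resolution bins (only a rough proxy; real programs work with
normalized amplitudes and proper sigmaA estimators):

import numpy as np

def fo_fc_correlation(fo, fc, d, mask, n_bins=10):
    """Correlation of Fo and Fc amplitudes in resolution bins, restricted
    to the reflections selected by mask (e.g. the strong or the weak
    subset of the test set)."""
    s2 = 1.0 / d[mask] ** 2
    edges = np.linspace(s2.min(), s2.max(), n_bins + 1)
    bin_id = np.clip(np.digitize(s2, edges) - 1, 0, n_bins - 1)
    cc = []
    for b in range(n_bins):
        sel = bin_id == b
        cc.append(np.corrcoef(fo[mask][sel], fc[mask][sel])[0, 1]
                  if sel.sum() > 2 else np.nan)
    return np.array(cc)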

Dale Tronrud


Re: [ccp4bb] an over refined structure

2008-02-07 Thread Anastassis Perrakis
Bottom line: thin shells are not a perfect solution, but if NCS is  
present, choosing the free set randomly is *never* a better choice,  
and almost always significantly worse.


hmmm ... I wonder if that is true. For low-order NCS (two-, three-,  
even five-fold) I don't believe that thin shells are better, since  
they are a systematic omission of data (which can affect maps) and in  
my experience they do not add much. I have only limited experience  
with this, but I have tried both and seem to have settled on a random  
Rfree set. With an NCS axis parallel to a crystallographic one (or  
when translational NCS is present) it might be a whole different ball  
game though ... not sure.


A.


Together with multicopy refinement, randomly chosen test sets were  
almost certainly a major contributor to the spuriously good Rfree  
values associated with the retracted MsbA and EmrE structures.


ehm ... I think 16 models systematically displaced along a direction  
parallel to helix axes contributed much more to that ... as the  
authors basically said in the original publication if my recollection  
is not bad.


A.


Re: [ccp4bb] an over refined structure

2008-02-07 Thread Dean Madden

Hi Dirk,

I disagree with your final sentence. Even if you don't apply NCS 
restraints/constraints during refinement, there is a serious risk of NCS 
contaminating your Rfree. Consider the limiting case in which the 
NCS is produced simply by working in an artificially low symmetry 
space-group (e.g. P1, when the true symmetry is P2): in this case, 
putting one symmetry mate in the Rfree set, and one in the Rwork set 
will guarantee that Rfree tracks Rwork. The same effect applies to a 
large extent even if the NCS is not crystallographic.
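The numbers in that limiting case are easy to check with a toy
calculation (made-up index range and test-set size): assuming a true
2-fold along b, every reflection has a mate at (-h,k,-l), and with a
randomly chosen 5% test set that mate is almost always in the working
set:

import numpy as np

rng = np.random.default_rng(0)

# toy P1 index list (roughly one Friedel hemisphere)
hkl = np.array([(h, k, l) for h in range(-10, 11)
                           for k in range(0, 11)
                           for l in range(-10, 11)])

free = rng.random(len(hkl)) < 0.05            # random 5% test set
free_set = {tuple(x) for x in hkl[free]}
work_set = {tuple(x) for x in hkl[~free]}

# if the true symmetry is P2 (unique axis b), (h,k,l) and (-h,k,-l) are
# equivalent: count free reflections whose mate sits in the working set
n_coupled = sum((-h, k, -l) in work_set for h, k, l in free_set)
print(n_coupled / len(free_set))              # ~0.95 for a random selection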


Bottom line: thin shells are not a perfect solution, but if NCS is 
present, choosing the free set randomly is *never* a better choice, and 
almost always significantly worse. Together with multicopy refinement, 
randomly chosen test sets were almost certainly a major contributor to 
the spuriously good Rfree values associated with the retracted MsbA and 
EmrE structures.
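For reference, the thin-shell selection itself is simple to sketch (the
shell count and spacing below are arbitrary choices):

import numpy as np

def thin_shell_free_flags(d, n_shells=100, free_every=20):
    """Assign each reflection to a thin shell in 1/d^2 and flag every
    free_every-th shell as the test set.  Rotational NCS relates
    reflections of (essentially) the same resolution, so NCS mates fall
    in the same shell and stay on the same side of the work/free split."""
    s2 = 1.0 / d ** 2
    edges = np.linspace(s2.min(), s2.max(), n_shells + 1)
    shell = np.clip(np.digitize(s2, edges) - 1, 0, n_shells - 1)
    return shell % free_every == 0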


Best wishes,
Dean

Dirk Kostrewa wrote:

Dear CCP4ers,

I'm not convinced that thin shells are sufficient: I think, in 
principle, one should omit thick shells (thicker than the diameter of 
the G-function of the molecule/assembly that is used to describe 
NCS-interactions in reciprocal space), and use only the inner thin layer of 
these thick shells, because only those should be completely independent 
of any working set reflections. But this would be too expensive given 
the low number of observed reflections that one usually has ...
However, if you don't apply NCS restraints/constraints, there is no need 
for any such precautions.


Best regards,

Dirk.

Am 07.02.2008 um 16:35 schrieb Doug Ohlendorf:


It is important when using NCS that the Rfree reflections be selected
in thin resolution shells distributed across the resolution range. That
way application of NCS should not mix the Rwork and Rfree sets.  Normal
random selection of Rfree + NCS (especially 4x or higher) will drive
Rfree down unfairly.

Doug Ohlendorf

-Original Message-
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
Eleanor Dodson
Sent: Tuesday, February 05, 2008 3:38 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] an over refined structure

I agree that the difference in Rwork to Rfree is quite acceptable at 
your resolution. You cannot/should not use R-factors as a criterion for 
structure correctness.
As Ian points out, choosing a different Rfree set of reflections can 
change Rfree a good deal.
Certain NCS operators can relate reflections exactly, making it hard to 
get a truly independent free-R set, and there are other reasons that 
make it a blunt-edged tool.


The map is the best validator - are there blobs still not fitted? (maybe 
side chains you have placed wrongly..) Are there many positive or 
negative peaks in the difference map? How well does the NCS match the 2 
molecules?


etc etc.
Eleanor

George M. Sheldrick wrote:

Dear Sun,

If we take Ian's formula for the ratio of R(free) to R(work) from his 
paper Acta D56 (2000) 442-450 and make some reasonable approximations,

we can reformulate it as:

R(free)/R(work) = sqrt[(1+Q)/(1-Q)]  with  Q = 0.025pd^3(1-s)

where s is the fractional solvent content, d is the resolution, p is
the effective number of parameters refined per atom after allowing for
the restraints applied, d^3 means d cubed and sqrt means square root.

The difficult number to estimate is p. It would be 4 for an isotropic 
refinement without any restraints. I guess that p=1.5 might be an 
appropriate value for a typical protein refinement (giving an R-factor
ratio of about 1.4 for s=0.6 and d=2.8). In that case, your R-factor 
ratio of 0.277/0.215 = 1.29 is well within the allowed range!
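Plugging George's example numbers into the formula (just the arithmetic
above, for checking):

from math import sqrt

def rfree_over_rwork(p, d, s):
    # Q = 0.025 * p * d^3 * (1 - s);  ratio = sqrt((1 + Q) / (1 - Q))
    q = 0.025 * p * d**3 * (1.0 - s)
    return sqrt((1.0 + q) / (1.0 - q))

print(rfree_over_rwork(p=1.5, d=2.8, s=0.6))   # ~1.41, the "about 1.4" above
print(0.277 / 0.215)                           # observed ratio, ~1.29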


However it should be added that this formula is almost a 
self-fulfilling prophecy. If we relax the geometric restraints we 
increase p, which then leads to a larger 'allowed' R-factor ratio!

Best wishes, George


Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-2582






***
Dirk Kostrewa
Gene Center, A 5.07
Ludwig-Maximilians-University
Feodor-Lynen-Str. 25
81377 Munich
Germany
Phone:  +49-89-2180-76845
Fax:  +49-89-2180-76999
E-mail: [EMAIL PROTECTED]
***




--
Dean R. Madden, Ph.D.
Department of Biochemistry
Dartmouth Medical School
7200 Vail Building
Hanover, NH 03755-3844 USA

tel: +1 (603) 650-1164
fax: +1 (603) 650-1128
e-mail: [EMAIL PROTECTED]


Re: [ccp4bb] an over refined structure

2008-02-07 Thread Edward Berry

Actually the bottom lines below were my argument in the case
that you DO apply strict NCS (although the argument runs into
some questionable points if you follow it out).

In the case that you DO NOT apply NCS, there is a second
decoupling mechanism:
Not only the error in Fo may be opposite for the two reflections,
but also the change in Fc upon applying a non-symmetrical
modification to the structure is likely to be opposite. So there
is no way of predicting whether |Fo-Fc| will move in the same
direction for the two reflections. I completely agree with Dirk
(although I am willing to listen to anyone explain why I am wrong).

Ed


Edward Berry wrote:

Dean Madden wrote:

Hi Dirk,

I disagree with your final sentence. Even if you don't apply NCS 
restraints/constraints during refinement, there is a serious risk of 
NCS contaminating your Rfree. Consider the limiting case in which 
the NCS is produced simply by working in an artificially low 
symmetry space-group (e.g. P1, when the true symmetry is P2): in this 
case, putting one symmetry mate in the Rfree set, and one in the Rwork 
set will guarantee that Rfree tracks Rwork.


I don't think this is right- remember Rfree is not just based on Fc
but Fo-Fc. Working in your lower symmetry space group you will have
separate values for the Fo at the two ncs-related reflections.
Each observation will have its own random error, and like as not
the error will be in the opposite direction for the two reflections.

Hence a structural modification that improves Fo-Fc at one reflection
is equally likely to improve or worsen the fit at the related reflection.
The only way they are coupled is through the basic tenet of R-free:
If it makes the structure better, it is likely to improve the fit
at all reflections.

For sure R-free will go down when you apply NCS- but this is because
you drastically improve your data/parameters ratio.

Best,
Ed


Re: [ccp4bb] an over refined structure

2008-02-07 Thread Phil Jeffrey
Here I will disagree.  R-free rewards you for putting an atom in density 
in which an atom belongs.  It doesn't necessarily reward you for putting 
the *right* atom in that density, but it does become difficult to do 
that under normal circumstances unless you have approximately the right 
structure.


However in the case of multi-copy refinement at low resolution, the 
refinement is perfectly capable of shoving any old atom in density 
corresponding to any other old atom if you give it enough leeway. 
Remember that there's a big difference between R-free for a single copy 
(45%) and a 16-fold multicopy (38%) in MsbA's P1 form, and almost the 
same amount (41% vs 33%) with MsbA's P21 form.  (These are E.coli and 
V.cholerae respectively).  Both single copy and multicopy refinements 
were NCS-restrained, as far as I know.


So there's evidence, w/o simulation, that the 12-fold or 16-fold 
multicopy refinements are worth 7-8% in R-free, and I'm doubtful that 
NCS can generate that sort of gain in either crystal form.  I've 
certainly never seen that in my own experience at low resolution.


I've been meaning to put online the Powerpoint from the CCP4 talk with 
all these numbers in it, but I regret it's sitting on my iBook at home 
as of writing.


Phil Jeffrey

Dean Madden wrote:
It is true that multicopy refinement was essential for the suppression 
of Rwork. However, the whole point of the Rfree is that it is supposed 
to be independent of the number of parameters you're refining. Simply 
throwing multiple copies of the model into the refinement shouldn't have 
affected Rfree, IF IT WERE TRULY FREE.


It was almost certainly NCS-mediated spillover that allowed the 
multicopy, parameter-driven reduction in Rwork to pull down the Rfree 
values as well. The experiment is probably not worth the time it would 
take to do, but I suspect that if MsbA and EmrE test sets had been 
chosen in thin shells, then Rfree wouldn't have shown nearly the 
improvement it did.


Dean


Re: [ccp4bb] an over refined structure

2008-02-07 Thread Phil Jeffrey
While NCS probably played a role in the first crystal form of MsbA (P1, 
8 monomers), this is also the one that showed the greatest improvement 
in R-free once the structure was correctly redetermined (7% or 14% 
depending on which refinement protocols you compare).


The other crystal form of MsbA and the crystal forms of EmrE didn't have 
particularly high-copy NCS (2 dimers, 4 monomers, dimer, 2 tetramers) 
and the R-frees were somewhat comparable in all cases (31-36% for the 
redetermined structures).


The *major* source of the R-free suppression in all these cases was the 
inappropriate use of multi-copy refinement at low resolution.


Phil Jeffrey
Princeton


Dean Madden wrote:

Hi Dirk,

I disagree with your final sentence. Even if you don't apply NCS 
restraints/constraints during refinement, there is a serious risk of NCS 
contaminating your Rfree. Consider the limiting case in which the 
NCS is produced simply by working in an artificially low symmetry 
space-group (e.g. P1, when the true symmetry is P2): in this case, 
putting one symmetry mate in the Rfree set, and one in the Rwork set 
will guarantee that Rfree tracks Rwork. The same effect applies to a 
large extent even if the NCS is not crystallographic.


Bottom line: thin shells are not a perfect solution, but if NCS is 
present, choosing the free set randomly is *never* a better choice, and 
almost always significantly worse. Together with multicopy refinement, 
randomly chosen test sets were almost certainly a major contributor to 
the spuriously good Rfree values associated with the retracted MsbA and 
EmrE structures.


Best wishes,
Dean


Re: [ccp4bb] an over refined structure

2008-02-07 Thread Dean Madden

Hi Ed,

This is an intriguing argument, but I know (having caught such a case as 
a reviewer) that even in cases of low NCS symmetry, Rfree can be 
significantly biased. I think the reason is that the discrepancy between 
pairs of NCS-related reflections (i.e. Fo-Fo') is generally 
significantly smaller than |Fo-Fc|. (In general, Rsym (on F) is lower 
than Rfree.) Thus, moving Fc closer to Fo will also move its NCS partner 
Fc' closer to Fo' *on average*, if they are coupled.
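A toy simulation makes the point (numbers entirely made up): if the
measurement errors on Fo and Fo' are small compared with the model error
in Fc, the residuals Fo-Fc and Fo'-Fc are almost the same quantity, so a
change that reduces one tends to reduce the other:

import numpy as np

rng = np.random.default_rng(1)
n = 100000

f_true = rng.gamma(2.0, 50.0, n)           # "true" amplitudes (arbitrary)
fo  = f_true + rng.normal(0.0,  3.0, n)    # working-set measurement
fop = f_true + rng.normal(0.0,  3.0, n)    # its NCS mate Fo' (independent error)
fc  = f_true + rng.normal(0.0, 30.0, n)    # model amplitudes, larger error

print(np.corrcoef(fo - fc, fop - fc)[0, 1])   # ~0.99: the residuals are coupled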


Dean

Edward Berry wrote:

Actually the bottom lines below were my argument in the case
that you DO apply strict NCS (although the argument runs into
some questionable points if you follow it out).

In the case that you DO NOT apply NCS, there is a second
decoupling mechanism:
Not only the error in Fo may be opposite for the two reflections,
but also the change in Fc upon applying a non-symmetrical
modification to the structure is likely to be opposite. So there
is no way of predicting whether |Fo-Fc| will move in the same
direction for the two reflections. I completely agree with Dirk
(although I am willing to listen to anyone explain why I am wrong).

Ed




Re: [ccp4bb] an over refined structure

2008-02-07 Thread Jon Wright

Dear Ed,

I don't see how you decouple symmetry mates in the case of a wrong 
space group. Symmetry mates should agree with each other typically 
within R_sym or R_merge, e.g. about 2-5%. Observed and 
calculated reflections agree within R_factor of each other, so about 
20-30%. The experimental errors are pretty much negligible, and 
overfitting is not a question of error bars; it is a question of how 
hard you push a round peg into a square hole.


Cheers,

Jon

Edward Berry wrote:

Actually the bottom lines below were my argument in the case
that you DO apply strict NCS (although the argument runs into
some questionable points if you follow it out).

In the case that you DO NOT apply NCS, there is a second
decoupling mechanism:
Not only the error in Fo may be opposite for the two reflections,
but also the change in Fc upon applying a non-symmetrical
modification to the structure is likely to be opposite. So there
is no way of predicting whether |Fo-Fc| will move in the same
direction for the two reflections. I completely agree with Dirk
(although I am willing to listen to anyone explain why I am wrong).

Ed


Edward Berry wrote:

Dean Madden wrote:

Hi Dirk,

I disagree with your final sentence. Even if you don't apply NCS 
restraints/constraints during refinement, there is a serious risk of 
NCS contaminating your Rfree. Consider the limiting case in which 
the NCS is produced simply by working in an artificially low 
symmetry space-group (e.g. P1, when the true symmetry is P2): in this 
case, putting one symmetry mate in the Rfree set, and one in the 
Rwork set will guarantee that Rfree tracks Rwork.


I don't think this is right- remember Rfree is not just based on Fc
but Fo-Fc. Working in your lower symmetry space group you will have
separate values for the Fo at the two ncs-related reflections.
Each observation will have its own random error, and like as not
the error will be in the opposite direction for the two reflections.

Hence a structural modification that improves Fo-Fc at one reflection
is equally likely to improve or worsen the fit at the related reflection.
The only way they are coupled is through the basic tenet of R-free:
If it makes the structure better, it is likely to improve the fit
at all reflections.

For sure R-free will go down when you apply NCS- but this is because
you drastically improve your data/parameters ratio.

Best,
Ed


Re: [ccp4bb] an over refined structure

2008-02-07 Thread Edward Berry

Dean Madden wrote:

Hi Dirk,

I disagree with your final sentence. Even if you don't apply NCS 
restraints/constraints during refinement, there is a serious risk of NCS 
contaminating your Rfree. Consider the limiting case in which the 
NCS is produced simply by working in an artificially low symmetry 
space-group (e.g. P1, when the true symmetry is P2): in this case, 
putting one symmetry mate in the Rfree set, and one in the Rwork set 
will guarantee that Rfree tracks Rwork. 



I don't think this is right- remember Rfree is not just based on Fc
but Fo-Fc. Working in your lower symmetry space group you will have
separate values for the Fo at the two ncs-related reflections.
Each observation will have its own random error, and like as not
the error will be in the opposite direction for the two reflections.

Hence a structural modification that improves Fo-Fc at one reflection
is equally likely to improve or worsen the fit at the related reflection.
The only way they are coupled is through the basic tenet of R-free:
If it makes the structure better, it is likely to improve the fit
at all reflections.

For sure R-free will go down when you apply NCS- but this is because
you drastically improve your data/parameters ratio.

Best,
Ed


Re: [ccp4bb] an over refined structure

2008-02-07 Thread Edward Berry

Agreed, and this is even more true if you consider that R-merge is
calculated on I's and Rfree on F's: an Rmerge of 5% should contribute
about 2.5% to Rfree; and furthermore errors add vectorially, so it
would be more like 0.025/sqrt(2).
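A quick numerical sketch of that estimate (Python; the only assumption
beyond the text is that I = F^2, so a small relative error on I
corresponds to roughly half that on F, and the 1/sqrt(2) is Ed's
"errors add vectorially" factor):

    # Back-of-envelope version of the estimate above (not from the original post).
    import math

    r_merge_on_I = 0.05                     # Rmerge quoted on intensities
    r_on_F = 0.5 * r_merge_on_I             # I = F**2, so dF/F ~ 0.5*dI/I -> 0.025
    coupled_pair = r_on_F / math.sqrt(2)    # the "add vectorially" factor

    print(round(r_on_F, 3), round(coupled_pair, 3))   # 0.025 0.018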

I guess I have to take all those other errors that have to do with
the inability of a simple atomic model to account for the diffraction
of a crystal, lump them together and assume they have nothing to do
with NCS and are not affected by the simple modification under
consideration.

I am thinking about the CHANGE in |Fo-Fc| at two sym-related reflections
when the refinement program moves a single atom from position 1 to
position 2. If we do not apply NCS, this is the only atom that
will move, and for Fc we can definitely say there is no reason
to expect the two Fc's to move in the same direction, therefore
there is no coupling in the case we do not apply NCS.

If we apply strict NCS then granted the sym related Fc's are equal
before and after the change, so they move in the same direction.
As I said, the argument is weaker now. If there are systematic
errors contributing to the gap between Rfree and 0.5*Rmerge/sqrt(2),
and if these systematic errors follow the NCS, then the initial Fo-Fc
is likely to be of the same sign at the related reflections and
larger than the change in Fc, so |Fo-Fc| would go in the same
direction.  But to justify this you would have to explain why
the systematic errors follow the NCS. Crystal morphology related
to the NCS resulting in similar absorption errors? But how large
are absorption errors, and is there any reason for morphology
to follow the NCS?

After reading Dean Madden's latest-
We might need some assumption here that we are reasonably close
to the refined structure. If we start with random atoms then
shoving the atoms around in a way that fits the density better
might be seen as improving the structure from the point of
modeling the density, but not from the point of approximating the
real structure. But in this case the change in sign of Fc is
completely decoupled between sym-related reflections, and if you
enforce symmetry you will be enforcing the wrong symmetry and
worsening both the structure and the fit to the density.

I think Gerard Kleywegt has an example of enforcing NCS on an
erroneous structure, and it was not very effective at reducing
Rfree? And in that case the structure may have had some resemblance
to the density at low resolution, and the NCS may have been
somewhat correct.

I guess there are two questions, depending on whether you are at the
beginning of a refinement and may have a completely wrong structure,
or whether refinement is nearly complete and you want to know whether
the further improvement you get on applying NCS is real.

Jon Wright wrote:

Dear Ed,

I don't see how you decouple symmetry mates in the case of a wrong 
space group. Symmetry mates should agree with each other typically 
within R_sym or R_merge percent, e.g. about 2-5%. Observed and 
calculated reflections agree within R_factor of each other, so about 
20-30%. The experimental errors are pretty much negligible, and 
overfitting is not a question about error bars; it is a question of 
how hard you push a round peg into a square hole.


Cheers,

Jon

Edward Berry wrote:

Actually the bottom lines below were my argument in the case
that you DO apply strict NCS (although the argument runs into
some questionable points if you follow it out).

In the case that you DO NOT apply NCS, there is a second
decoupling mechanism:
Not only the error in Fo may be opposite for the two reflections,
but also the change in Fc upon applying a non-symmetrical
modification to the structure is likely to be opposite. So there
is no way of predicting whether |Fo-Fc| will move in the same
direction for the two reflections. I completely agree with Dirk
(although I am willing to listen to anyone explain why I am wrong).

Ed



Re: [ccp4bb] an over refined structure

2008-02-07 Thread Phil Jeffrey
If you think about it, there is an analogy to relaxing geometrical 
constraints, which also allows the refinement to put atoms into 
density. The reason it usually doesn't help Rfree is that the density 
is spurious. At least some of the incorrect structure determinations of 
the early 90's (that spurred the introduction of Rfree etc.) had high 
rms deviations, suggesting that this is how the overfitting occurred. 
Nevertheless, once hit with a bit of simulated annealing, the Rfree 
values of such models deteriorated significantly.


If memory serves, the incorrect structures of the 1990's would have had 
relaxed geometry precisely because they needed to do that to reduce R, 
and R used to be the primary indicator of structure quality in the days 
before R-free was introduced.  There's quite a big difference between 
the latitude afforded by relaxing geometry and the degree of freedom 
allowed by multicopy refinement.  Simply increasing the RMS bond length 
deviations from 0.012 to 0.035 Angstrom would move atoms on average by 
only a fraction of a bond length, which is not really enough to jump 
between different atom locations.


In any event, the MsbA statistics can be simply explained from an 
expectation of what happens if you overfit your (wrong) structure using 
techniques inappropriate for the resolution:


R-work goes down
R-free goes down less
(R-free - R-work) goes up

and this happens in general with use of multicopy refinement at anything 
less than quite high resolution - I'm thinking in particular of a 
comment in Chen & Chapman (2001) Biophys J vol. 8, 1466-1472.  So I see 
no reason to suggest NCS is having a particularly extreme, perhaps 
unprecedented, effect.



Phil Jeffrey
(still working on converting Micro$loth Powerpoint to html)


Re: [ccp4bb] an over refined structure

2008-02-07 Thread Dirk Kostrewa

Dear CCP4ers,

I'm not convinced that thin shells are sufficient: I think, in  
principle, one should omit thick shells (greater than the diameter of  
the G-function of the molecule/assembly that is used to describe NCS- 
interactions in reciprocal space), and use the inner thin layer of  
these thick shells, because only those should be completely  
independent of any working set reflections. But this would be too  
expensive given the low number of observed reflections that one  
usually has ...
However, if you don't apply NCS restraints/constraints, there is no  
need for any such precautions.


Best regards,

Dirk.
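For beginners, a minimal sketch of what choosing the test set in thin
resolution shells can look like in practice (pure Python; the
orthorhombic cell is assumed only to keep the d-spacing formula short,
and the shell count and spacing are arbitrary illustration values, not
a recommendation from the thread):

    import math

    def d_spacing(hkl, cell):
        """Resolution of a reflection for an orthorhombic cell (a, b, c in Angstrom).
        Do not pass (0,0,0)."""
        h, k, l = hkl
        a, b, c = cell
        return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)

    def thin_shell_free_flags(hkls, cell, n_shells=100, every=20):
        """Flag every `every`-th thin shell in 1/d as the free (test) set."""
        s = [1.0 / d_spacing(hkl, cell) for hkl in hkls]
        s_min, s_max = min(s), max(s)
        width = (s_max - s_min) / n_shells or 1.0
        flags = {}
        for hkl, si in zip(hkls, s):
            shell = min(int((si - s_min) / width), n_shells - 1)
            flags[hkl] = (shell % every == 0)        # True -> free set
        return flags

NCS relations in reciprocal space couple reflections that are close
together (within roughly the width of the G-function Dirk mentions), so
keeping whole shells together in either the work or the free set is
what breaks the work/free coupling that a random selection allows.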

Am 07.02.2008 um 16:35 schrieb Doug Ohlendorf:

It is important when using NCS that the Rfree reflections be
selected in distributed thin resolution shells. That way application
of NCS should not mix the Rwork and Rfree sets.  Normal random
selection of Rfree + NCS (especially 4x or higher) will drive Rfree
down unfairly.

Doug Ohlendorf

-Original Message-
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
Eleanor Dodson
Sent: Tuesday, February 05, 2008 3:38 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] an over refined structure

I agree that the difference in Rwork to Rfree is quite acceptable at
your resolution. You cannot/should not use R-factors as a criterion
for structure correctness.
As Ian points out - choosing a different Rfree set of reflections can
change Rfree a good deal.
Certain NCS operators can relate reflections exactly, making it hard
to get a truly independent Free R set, and there are other reasons to
make it a blunt-edged tool.

The map is the best validator - are there blobs still not fitted?
(maybe side chains you have placed wrongly..) Are there many positive
or negative peaks in the difference map? How well does the NCS match
the 2 molecules?

etc etc.
Eleanor

George M. Sheldrick wrote:

Dear Sun,

If we take Ian's formula for the ratio of R(free) to R(work) from his
paper Acta D56 (2000) 442-450 and make some reasonable approximations,
we can reformulate it as:

R(free)/R(work) = sqrt[(1+Q)/(1-Q)]  with  Q = 0.025pd^3(1-s)

where s is the fractional solvent content, d is the resolution, p is
the effective number of parameters refined per atom after allowing for
the restraints applied, d^3 means d cubed and sqrt means square root.

The difficult number to estimate is p. It would be 4 for an isotropic
refinement without any restraints. I guess that p=1.5 might be an
appropriate value for a typical protein refinement (giving an R-factor
ratio of about 1.4 for s=0.6 and d=2.8). In that case, your R-factor
ratio of 0.277/0.215 = 1.29 is well within the allowed range!
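A quick numerical check of the formula as quoted above (Python; p=1.5,
s=0.60 and d=2.8 A are just George's example values, and the 0.025
constant is taken from the text):

    import math

    def rfree_over_rwork(p, d, s):
        """Sheldrick's reformulation of Tickle's R(free)/R(work) ratio, as given above."""
        Q = 0.025 * p * d ** 3 * (1.0 - s)
        return math.sqrt((1.0 + Q) / (1.0 - Q))

    print(round(rfree_over_rwork(1.5, 2.8, 0.60), 2))   # ~1.41, vs the observed 1.29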

However it should be added that this formula is almost a
self-fulfilling prophecy. If we relax the geometric restraints we
increase p, which then leads to a larger 'allowed' R-factor ratio!

Best wishes, George


Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-2582






***
Dirk Kostrewa
Gene Center, A 5.07
Ludwig-Maximilians-University
Feodor-Lynen-Str. 25
81377 Munich
Germany
Phone:  +49-89-2180-76845
Fax:+49-89-2180-76999
E-mail: [EMAIL PROTECTED]
***




Re: [ccp4bb] an over refined structure

2008-02-07 Thread Dean Madden

Hi Phil,

Here I will disagree.  R-free rewards you for putting an atom in density 
which an atom belongs in.  It doesn't necessarily reward you for putting 
the *right* atom in that density, but it does become difficult to do 
that under normal circumstances unless you have approximately the right 
structure.


However in the case of multi-copy refinement at low resolution, the 
refinement is perfectly capable of shoving any old atom in density 
corresponding to any other old atom if you give it enough leeway. 

...


So there's evidence, w/o simulation, that the 12-fold or 16-fold 
multicopy refinements are worth 7-8% in R-free, and I'm doubtful that 
NCS can generate that sort of gain in either crystal form.  I've 
certainly never seen that in my own experience at low resolution.


Remember that there are two things at work here: putting atoms into real 
density (which does reduce Rfree) and putting atoms into noise 
(overfitting, which shouldn't help Rfree). At low res, there's a lot of 
noise.


If you think about it, there is an analogy to relaxing geometrical 
constraints, which also allows the refinement to put atoms into 
density. The reason it usually doesn't help Rfree is that the density 
is spurious. At least some of the incorrect structure determinations of 
the early 90's (that spurred the introduction of Rfree etc.) had high 
rms deviations, suggesting that this is how the overfitting occurred. 
Nevertheless, once hit with a bit of simulated annealing, the Rfree 
values of such models deteriorated significantly.


I would argue that 12-fold or 16-fold multicopy refinements simply 
permitted overfitting of noise. In other words, it is worth 7-8% in 
R*work*, but not Rfree. In this case, the main reason Rfree also dropped 
is because the test set was coupled *by NCS* to the overfit working set. 
Use of a random test set in the presence of NCS could easily prevent the 
Rfree value from serving as a warning of overfitting.


Of course, to be absolutely sure, one would have to repeat the multicopy 
refinements of the inverted structures with a test set chosen in thin 
shells, and then see if Rfree dropped as before. I think only the 
original authors would be in a position to do that properly.


Dean

--
Dean R. Madden, Ph.D.
Department of Biochemistry
Dartmouth Medical School
7200 Vail Building
Hanover, NH 03755-3844 USA

tel: +1 (603) 650-1164
fax: +1 (603) 650-1128
e-mail: [EMAIL PROTECTED]


Re: [ccp4bb] an over refined structure

2008-02-07 Thread Edward Berry

Dean Madden wrote:

Hi Ed,

This is an intriguing argument, but I know (having caught such a case as 
a reviewer) that even in cases of low NCS symmetry, Rfree can be 
significantly biased. I think the reason is that the discrepancy between 
pairs of NCS-related reflections (i.e. Fo-Fo') is generally 
significantly smaller than |Fo-Fc|. (In general, Rsym (on F) is lower 
than Rfree.) Thus, moving Fc closer to Fo will also move its NCS partner 
Fc' closer to Fo' *on average*, if they are coupled.
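A toy illustration of that coupling (Python; the numbers are invented,
chosen only so that |Fo-Fo'| is Rsym-sized while |Fo-Fc| is
R-factor-sized, and Fc = Fc' stands for a model in which the NCS mates
are effectively identical):

    Fo, Fo_prime = 100.0, 102.0              # NCS-related observations, ~2% apart
    for Fc in (110.0, 105.0):                # a refinement step moves the common Fc
        print(abs(Fo - Fc), abs(Fo_prime - Fc))
    # residuals go 10.0, 8.0 -> 5.0, 3.0: improving one also improves its mate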


OK, I see that now: the systematic errors must be related to the NCS
in this case, because we know that if we reduced the data in the higher
space group our Rsyms would be OK. I stand educated. But it is
difficult to go from there to real NCS, where the large unaccounted
errors may not be related to the NCS. Furthermore, if you don't enforce
NCS the structural changes are asymmetric and there is no reason to
believe Fc will move in the same direction, even in this artificial
case. So Dirk's assertion still stands, I believe.









Re: [ccp4bb] an over refined structure

2008-02-07 Thread Axel Brunger

A few comments that you might find useful:

1. yes, even if you don't apply NCS restraints/constraints there will  
be correlations between
reflections in cases of NCS symmetry or pseudo-crystallographic NCS  
symmetry.


2. Fabiola, Chapman, et al., published a very nice paper on the topic  
in Acta D. 62, 227-238, 2006.


3. From my experience, the effects for low NCS symmetry are usually
small, except in cases of pseudo-symmetry, which can easily be
addressed by defining the test set in the high-symmetry setting (a
small sketch of this follows after point 5 below). For high NCS
symmetry, the effects are more significant, but then the structure
is usually much better determined anyway, due to averaging.

4. At least the first one of the mentioned MsbA and EmrE structures   
had a very high Rfree in the
absence of multi-copy refinement ( ~ 45%)!   So, the Rfree indicated  
that there was a major

problem.

5. The Rfree should vary relatively little among test sets (see my  
Acta D 49, 24-36, 1993 paper)
- if there are large variations for different test set choices then  
the test set may be too small

or there may be systematic problems with some of the reflections causing
them to dominate the R factors (outliers at low resolution, for  
example).
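A hypothetical sketch of the "high-symmetry setting" idea in point 3
(pure Python; the twofold (h,k,l) -> (-h,k,-l) is just an example
operator, and the function name and 5% fraction are illustrative, not
any real program's API):

    import random

    def orbit(hkl):
        """Canonical representative of a reflection and its (pseudo-)symmetry mate."""
        h, k, l = hkl
        return min(hkl, (-h, k, -l))

    def free_flags_high_symmetry(hkls, fraction=0.05, seed=0):
        """Pick the test set on orbits, then expand to the low-symmetry indices."""
        rng = random.Random(seed)
        reps = sorted({orbit(hkl) for hkl in hkls})
        free = {rep for rep in reps if rng.random() < fraction}
        return {hkl: (orbit(hkl) in free) for hkl in hkls}

Because both members of each orbit carry the same flag, pseudo-symmetry
mates can never end up straddling the work and free sets.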


Axel Brunger


On Feb 7, 2008, at 9:57 AM, Dean Madden wrote:


Hi Dirk,

I disagree with your final sentence. Even if you don't apply NCS  
restraints/constraints during refinement, there is a serious risk of  
NCS contaminating your Rfree. Consider the limiting case in which  
the NCS is produced simply by working in an artificially low  
symmetry space-group (e.g. P1, when the true symmetry is P2): in  
this case, putting one symmetry mate in the Rfree set, and one in  
the Rwork set will guarantee that Rfree tracks Rwork. The same  
effect applies to a large extent even if the NCS is not  
crystallographic.


Bottom line: thin shells are not a perfect solution, but if NCS is  
present, choosing the free set randomly is *never* a better choice,  
and almost always significantly worse. Together with multicopy  
refinement, randomly chosen test sets were almost certainly a major  
contributor to the spuriously good Rfree values associated with the  
retracted MsbA and EmrE structures.


Best wishes,
Dean

Dirk Kostrewa wrote:

Dear CCP4ers,
I'm not convinced that thin shells are sufficient: I think, in  
principle, one should omit thick shells (greater than the diameter  
of the G-function of the molecule/assembly that is used to describe  
NCS-interactions in reciprocal space), and use the inner thin layer  
of these thick shells, because only those should be completely  
independent of any working set reflections. But this would be too  
expensive given the low number of observed reflections that one  
usually has ...
However, if you don't apply NCS restraints/constraints, there is no  
need for any such precautions.

Best regards,
Dirk.

Re: [ccp4bb] an over refined structure

2008-02-05 Thread Clemens Vonrhein
Hi Sun,

On Mon, Feb 04, 2008 at 02:15:05PM -0800, Sun Tang wrote:
 I used NCS before rigid body refinement. After that I did not put
 NCS restraints in the restrained refinement and TLS+restrained
 refinement because it raised the R/Rfree quite a lot.

Use NCS. Really!

There is never a reason for switching off NCS restraints (ok _maybe_
at real atomic or ultra-high resolution ...). Obviously, you'll need
to change the way you apply NCS restraints: from a simple per-chain
definition to maybe a per-domain definition, taking out residues in
crystal contacts, allowing for a different base B-factor of different
chains/domains etc. Some programs do these things fairly automatically
for you.

This might make it awkward to use NCS sometimes, but at 2.8A it is a
must (I think). And if your use of NCS increases Rfree, then there is
a problem in the setup of the NCS restraints, not in the principle of
using them.

Note: re-introducing NCS restraints might increase the R, but if the
Rfree stays similar: who cares?

 --

See also:

  G J Kleywegt, Use of non-crystallographic symmetry in protein
  structure refinement, Acta Crystallographica, D52, 842-857 (1996).

According to

  http://xray.bmc.uu.se/~gerard/citation.html

a "classic!" paper [gosh ... I really like Gerard's style ;-)].

Cheers

Clemens

-- 

***
* Clemens Vonrhein, Ph.D. vonrhein AT GlobalPhasing DOT com
*
*  Global Phasing Ltd.
*  Sheraton House, Castle Park 
*  Cambridge CB3 0AX, UK
*--
* BUSTER Development Group  (http://www.globalphasing.com)
***


Re: [ccp4bb] an over refined structure

2008-02-05 Thread Eleanor Dodson
I agree that the difference in Rwork to Rfree is quite acceptable at 
your resolution. You cannot/should not use R-factors as a criterion for 
structure correctness.
As Ian points out - choosing a different Rfree set of reflections can 
change Rfree a good deal.
Certain NCS operators can relate reflections exactly, making it hard to 
get a truly independent Free R set, and there are other reasons to make 
it a blunt-edged tool.


The map is the best validator - are there blobs still not fitted? (maybe 
side chains you have placed wrongly..) Are there many positive or 
negative peaks in the difference map? How well does the NCS match the 2 
molecules?


etc etc.
Eleanor

George M. Sheldrick wrote:

Dear Sun,

If we take Ian's formula for the ratio of R(free) to R(work) from his 
paper Acta D56 (2000) 442-450 and make some reasonable approximations,

we can reformulate it as:

R(free)/R(work) = sqrt[(1+Q)/(1-Q)]  with  Q = 0.025pd^3(1-s)

where s is the fractional solvent content, d is the resolution, p is
the effective number of parameters refined per atom after allowing for
the restraints applied, d^3 means d cubed and sqrt means square root.

The difficult number to estimate is p. It would be 4 for an isotropic 
refinement without any restraints. I guess that p=1.5 might be an 
appropriate value for a typical protein refinement (giving an R-factor
ratio of about 1.4 for s=0.6 and d=2.8). In that case, your R-factor 
ratio of 0.277/0.215 = 1.29 is well within the allowed range!


However it should be added that this formula is almost a 
self-fulfilling prophecy. If we relax the geometric restraints we

increase p, which then leads to a larger 'allowed' R-factor ratio!

Best wishes, George


Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-2582


  


Re: [ccp4bb] an over refined structure

2008-02-05 Thread Ian Tickle
Hi Sun

Your bond length & angle RMSD's look suspiciously high for a 2.8 Ang
structure; this usually means that some weighting parameter(s) is/are
not optimal.  2.8 Ang is not that far from the point where the optimal
choice of structure parameters may be torsion angles instead of
Cartesian co-ordinates, in which case the optimal RMSD's for bond
lengths & angles would be exactly zero.

You should be optimising all weights that have been set arbitrarily
by the program (i.e. not obtained from independent experimental
sources); this includes not just the X-ray weight but also the B-factor
restraint weight(s) (the usual culprit) and the NCS restraint weight(s),
as Clemens suggests.  I now use the free log(likelihood) to optimise
the weights rather than Rfree; this is now printed by newer versions
of Refmac, but it's up to you which you believe (the difference in the
results may not be significant anyway).  Alternatively CNS &
phenix.refine have scripts which automatically optimise the weights
(against Rfree) for you (maybe one day CCP4/Refmac will have this very
useful capability ;-) ).
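To make the weight scan concrete, here is Sun's own wa series (the
numbers are copied from his message quoted below), tabulated the way
such a scan is usually judged, i.e. by Rfree and by the Rfree-Rwork gap
rather than by Rwork alone (Python):

    scan = {0.01: (0.225, 0.277), 0.05: (0.204, 0.269),
            0.10: (0.195, 0.268), 0.20: (0.186, 0.267)}   # wa: (Rwork, Rfree)

    best_wa = min(scan, key=lambda wa: scan[wa][1])       # lowest Rfree
    gaps = {wa: round(rfree - rwork, 3) for wa, (rwork, rfree) in scan.items()}
    print(best_wa, gaps)   # Rfree barely moves while the Rfree-Rwork gap widens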

Also I would check your waters manually - don't believe everything
the auto-water placing software tells you, i.e. does the density
look sensible (at least roughly spherical shaped), is it possible
they are something other than water (check for excess density
and/or suspiciously low B factor), do they all H-bond to protein
and/or other waters you are confident about.  I had a structure at
2.9 Ang where I found only 10 good waters and that was for 900 residues
in the a.u. (maybe it had something to do with the fact that the solvent
content and average B were quite high and the data was partially twinned
so the map quality was poor). I'm sure others have opinions on how many
waters you expect to find at various resolutions.

HTH

-- Ian

 -Original Message-
 From: Sun Tang [mailto:[EMAIL PROTECTED] 
 Sent: 04 February 2008 22:32
 To: Ian Tickle
 Cc: CCP4BB@JISCMAIL.AC.UK
 Subject: RE: [ccp4bb] an over refined structure
 
 Hi Ian,
 
 Thank you very much for your detailed information.
 
 I checked the effect of the weighting term (wa) in CCP4i on the 
 R/Rfree. When I used wa=0.01, the values were 0.225/0.277 
 (FOM=0.799). They changed to 0.204/0.269 (FOM=0.806) for wa=0.05, 
 0.195/0.268 (FOM=0.807) for wa=0.1, and 0.186/0.267 (FOM=0.807) for 
 wa=0.2, respectively. It seemed that an increase in wa decreases 
 both R and Rfree, with R more than Rfree. 
 
 Which wa value is the best one in this case?
 
 Thank you very much for your valuable help.
 
 Best,
 
 Sun
 

Re: [ccp4bb] an over refined structure

2008-02-04 Thread Anastassis Perrakis

Hi -

I don't think there is something necessarily wrong with the values  
you report.


A few questions to see *if* something is wrong are:

- as you wrote to Tim you have NCS: do you use NCS restraints ?
- what is the resolution / B factor of the data ?
- have the data been checked for twinning? (phenix.xtriage)
- is the N-term domain of one copy really invisible (then indeed do  
remove ...!)

- has TLS been used ?
- did you add waters ? (too many?)

I guess then we can make better suggestions if something is wrong and  
if so how its best to fix.


A.

 I refined a structure with Refmac in CCP4i and the R/Rfree is  
0.215/0.277. The difference between R and Rfree is too much even  
though I used 0.01 for weighting term in the refinement (the  
default value is 0.3). The RMSD for bond length and bond angle is  
0.016 A and 1.7 degree.


Re: [ccp4bb] an over refined structure

2008-02-04 Thread Tim Gruene
I would agree that the difference is suspiciously high. I. Tickle and 
others have published analytical expressions for how to estimate the 
ratio between R and Rfree; just google for "tickle rfree" to find the 
references.


You easily achieve a large difference by adding too many waters which just 
model noise. There may be other reasons for which more knowledge about the 
structure is required. Do you have large unmodelled regions, like loops 
that do not show in the density map?


Tim


--
Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A


On Mon, 4 Feb 2008, Sun Tang wrote:


Hello All,

I refined a structure with Refmac in CCP4i and the R/Rfree is 0.215/0.277. The 
difference between R and Rfree is too much even though I used 0.01 for 
weighting term in the refinement (the default value is 0.3). The RMSD for bond 
length and bond angle is 0.016 A and 1.7 degree.

What may be wrong with the over-refined structure? What is the reason for 
leading to an over-refined structure? How to avoid it?

Best wishes,

Sun Tang




Re: [ccp4bb] an over refined structure

2008-02-04 Thread Ian Tickle
Hi Sun Tang

Unfortunately there's no such thing as a fixed value for the maximum acceptable 
Rfree-Rwork difference that applies in all circumstances, because the 'normal' 
difference depends on a number of factors, mainly the observation/parameter 
ratio, which depends in turn on the resolution and the solvent content (a 
greater solvent content means a bigger cell volume which means more reflections 
for a given number of ordered atoms in the a.u. and hence a bigger obs/param 
ratio).  The Rfree-Rwork difference also depends on Rwork itself (i.e. you tend 
to get higher values of Rfree-Rwork for higher values of Rwork), so it's better 
to think in terms of the Rfree/Rwork ratio (which is independent of Rwork).

So for example at very high resolution a 'normal' value for Rfree-Rwork might 
be only 0.02 (so 0.05 which is what many people consider acceptable would 
actually be unacceptably high), whereas at low resolution it might be 0.1 (so 
0.05 would be unacceptably low).  Also you need to bear in mind that Rfree 
tends to have a quite high uncertainty, particularly at low resolution (because 
it's usually based on a relatively small number of observations), so the 
deviation has to be quite big (e.g. > 3 SU) before it can be considered to be 
statistically significant.
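A rough feel for the size of that uncertainty (Python; the rule of
thumb sigma(Rfree) ~ Rfree/sqrt(N_free) and the N_free = 1000 test-set
size are assumptions for illustration, not figures from Ian's papers):

    import math

    def rfree_sigma(rfree, n_free):
        """Crude standard uncertainty of Rfree from the test-set size alone."""
        return rfree / math.sqrt(n_free)

    sigma = rfree_sigma(0.277, 1000)
    print(round(sigma, 3), round(3 * sigma, 3))   # ~0.009, so 3 SU is ~0.026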

So Rfree needs to be compared not with Rwork at all but with the value of the 
optimal Rfree/Rwork expected on the basis that the model parameterisation and 
weighting of X-ray terms and restraints are optimal and the errors in the model 
have the same effect as the random experimental errors in the data (i.e. a 
statistical 'null hypothesis').  As Tim just pointed out we tried to do this in 
our Acta D (1998) papers: there you can compare your observed Rfree/Rwork ratio 
either with the theoretical value or with the value found for 'typical' 
structures in the PDB at the same resolution.

An abnormal Rfree/Rwork ratio could arise from a number of causes, not just 
over-fitting (I assume that's what you mean by 'over-refinement' - it's not 
clear to me how a structure can be 'over-refined' since a fundamental 
requirement of the maximum likelihood method is that the structure is always 
refined to convergence, and refining beyond that will by definition produce no 
further statistically significant changes in the parameters).

For example the number of parameters being refined may be either too low, or 
too high (over-fitting), or the values of the weighting parameters may not be 
appropriate, or there may be something badly wrong with the atomic model (e.g. 
mistraced chain).  Given the values you are reporting I think the latter is 
very unlikely, possibly you just need to tweak the X-ray and/or restraint 
weights.

HTH

Cheers

-- Ian

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Sun Tang
 Sent: 04 February 2008 16:56
 To: Boaz Shaanan
 Cc: CCP4BB@JISCMAIL.AC.UK
 Subject: Re: [ccp4bb] an over refined structure
 
 Hi Boaz, 
 
 Thank you for your opinions. The resolution is 2.8 A and I 
 remember some people may think the structure is 
 over-refined when the difference between Rfree and Rwork is 
 greater than 6%. 
 
 What do you think the greatest acceptable difference between the two?
 
 Best,
 
 Sun 
 
 Boaz Shaanan [EMAIL PROTECTED] wrote:
 
   Hi,

Why do you think this structure is over-refined ? The 
 Rfree/Rwork difference of 6.2% seems fine, although you 
 didn't mention resolution. If anything, an over-refined 
 structure would show a smaller difference, as far as I know. 
 If all the other criteria (Ramachandran outliers, etc., map) 
 are OK you should just be happy with your structure.

Cheers,

Boaz
   
   - Original Message -
   From: Sun Tang [EMAIL PROTECTED]
   Date: Monday, February 4, 2008 18:41
   Subject: [ccp4bb] an over refined structure
   To: CCP4BB@JISCMAIL.AC.UK
   
Hello All,

I refined a structure with Refmac in CCP4i and the R/Rfree is 
0.215/0.277. The difference between R and Rfree is 
 too much even 
though I used 0.01 for weighting term in the refinement (the 
default value is 0.3). The RMSD for bond length and 
 bond angle 
is 0.016 A and 1.7 degree. 

What may be wrong with the over-refined structure? 
 What is the 
reason for leading to an over-refined structure? How 
 to avoid it?

Best wishes,

Sun Tang

   
 
 
   Boaz Shaanan, Ph.D. 
   Dept. of Life Sciences 
   Ben-Gurion University of the Negev 
   Beer-Sheva 84105 
   Israel 
   Phone: 972-8-647-2220 ; Fax: 646-1710 
   Skype: boaz.shaanan
   

Re: [ccp4bb] an over refined structure

2008-02-04 Thread Sun Tang
Hi Tim,

Thank you for your information and suggestions. There are two independent 
molecules in the asymmetric unit and one molecule does not have very good 
density, especially in the N-terminus. 

Do you think that I should remove the region in the refinement?

Best,

Sun


Re: [ccp4bb] an over refined structure

2008-02-04 Thread Sun Tang
Hi Ian,

Thank you very much for your detailed information.

 I checked the effect of the weighting term (wa) in CCP4i on the R/Rfree. When I 
used wa=0.01, the values were 0.225/0.277 (FOM=0.799). They changed to 
0.204/0.269 (FOM=0.806) for wa=0.05, 0.195/0.268 (FOM=0.807) for wa=0.1 and 
0.186/0.267 (FOM=0.807) for wa=0.2, respectively. It seemed that an increase in 
wa decreases both R and Rfree, with R more than Rfree. 

Which wa value is the best one in this case?

Thank you very much for your valuable help.

Best,

Sun


Re: [ccp4bb] an over refined structure

2008-02-04 Thread Sun Tang
Hi Anastassis,

Thank you very much for your suggestions.  I answered the questions as follows.
I used NCS before rigid body refinement. After that I did not put NCS 
restraints in the restrained refinement and TLS+restrained refinement because 
it raised the R/Rfree quite a lot.
The resolution is 2.8 A.
I did not check twinning. I will do that soon.
I used PHASER to solve the structure and the density of the N-domain (~ 50 a.a) 
in one molecule is not good, with a lot of broken density for the backbone.
I used TLS in the refinement. I usually used the initial TLS parameters 
(with only the residues in each group, no coordinates for the center) for all 
the TLS refinements. When I used the refined TLS parameters, the refinement 
would diverge.
I only added about 120 water molecules for the whole structure.
I will update the information after I try further refinement.

Best wishes,

Sun

Anastassis Perrakis [EMAIL PROTECTED] wrote: Hi -

I don't think there is something necessarily wrong with the values  
you report.

A few questions to see *if* something is wrong are:

- as you wrote to Tim you have NCS: do you use NCS restraints ?
- what is the resolution / B factor of the data ?
- have the data been checked for twinning? (phenix.xtriage)
- is the N-term domain of one copy really invisible (then indeed do  
remove ...!)
- has TLS been used ?
- did you add waters ? (too many?)

I guess then we can make better suggestions if something is wrong and  
if so how its best to fix.

A.

  I refined a structure with Refmac in CCP4i and the R/Rfree is  
 0.215/0.277. The difference between R and Rfree is too much even  
 though I used 0.01 for weighting term in the refinement (the  
 default value is 0.3). The RMSD for bond length and bond angle is  
 0.016 A and 1.7 degree.


   