Re: [ccp4bb] an over refined structure
Edward Berry wrote:

Dirk Kostrewa wrote: Dear Dean and others, Peter Zwart gave me a similar reply. This is a very interesting discussion, and I would like to take a somewhat closer look at this, to maybe make things a little bit clearer (please excuse the general explanations; they might be interesting for beginners as well). 1) Crystallographic symmetry applies to the whole crystal and results in symmetry-equivalent intensities in reciprocal space. If you refine your model in a lower space group, there will be reflections in the test set that are symmetry-equivalent, in the higher space group, to reflections in the working set. If you refine the (symmetry-equivalent) copies in your crystal independently, they will diverge due to resolution and data quality, and R-work and R-free will diverge to some extent because of this. If you force the copies to be identical, R-work and R-free will still differ due to observational errors. In both cases, however, R-free will be very close to R-work.

Ah, that's going way too fast for the beginners, at least one of them! Can someone explain why the R-free will be very close to the R-work, preferably in simple concrete terms like Fo and Fc at symmetry-related reflections, and the change in the Fc resulting from a step of refinement? Ed

Dear Ed,

Some years ago I was castigated in group meeting for stating that the question posed by a post-doc was a bad question. I gather this is considered rude behavior. My belief is that if you say "good question" to all questions, you degrade the value of those truly good questions when they come along. Yours is a good question and demands a proper answer. Like all good questions, however, the answer is neither easy nor short. I'm going to make a stab at it, and I may end up far from the mark, but I'm sure someone will point out my failings in follow-up letters. At least I'll get these ideas out of my head so I can get back to my real work.
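Dirk's point 1) can be made concrete with a toy calculation. The sketch below is an illustration, not from the thread: it assumes the true symmetry is P2 (so (h,k,l) and (-h,k,-l) are equivalent) and asks how often a randomly chosen test set, drawn as if the data were P1, contains a reflection whose symmetry mate sits in the working set.

```python
import random

def sym_mates(hkl):
    """Equivalents of (h, k, l) under a 2-fold along b, plus Friedel pairs."""
    h, k, l = hkl
    return {(h, k, l), (-h, k, -l), (-h, -k, -l), (h, -k, l)}

random.seed(0)
hkls = [(h, k, l) for h in range(-8, 9) for k in range(0, 9)
        for l in range(-8, 9) if (h, k, l) != (0, 0, 0)]
free = set(random.sample(hkls, len(hkls) // 20))   # ~5% random test set
work = set(hkls) - free

# Fraction of "free" reflections with at least one symmetry mate in the
# working set: with a random selection, almost all of them.
contaminated = sum(1 for hkl in free if (sym_mates(hkl) - {hkl}) & work)
print(f"{contaminated / len(free):.0%} of the free set has a working-set mate")
```

With a 5% random test set, a free reflection's mate is itself free only about 5% of the time, so nearly the entire "free" set has a working-set partner.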
The other attempts to answer this question, including my own, have included terms such as "error" and "bias" and, without definitions for these terms, are ultimately unsatisfying. It seems to me that the whole point of refinement is to bias the model to the observations, so the real matter is inappropriate bias. This brings up the question of what a model is intended to fit and what it is not. When I first implemented an overall anisotropic B correction in TNT, I noticed that the correction for a given model would grow larger as more refinement cycles were run. It appears that a model consisting of only atomic positions and isotropic B's can be created where the Fc's have an anisotropic fall-off in resolution. When the isotropic model was refined with the anisotropy uncorrected, the parameters managed to find a way to fit that anisotropy. When the anisotropy was properly modeled, the positions and isotropic B's could go back to their job of fitting the signal they were designed to fit.

This is what I would define as inappropriate bias: the parameters of the model are attempting to fit a signal they were not designed to fit. In this example, the distortion is distributed over a large number of parameters and each parameter is changed by a small amount, an amount usually considered too small to be significant, but in aggregate they produce a significant signal (the anisotropic fall-off of the model's Fc's). A more trivial example would be the location of the side chains of amino acids near the density of an unmodeled ligand. Refinement will tend to move the side chains away from the center of their own density toward the unfilled density, perhaps even inappropriately placing a side chain in the ligand density instead of its own. Again, the fit of the parameters to the signal they were designed to fit has been degraded by the attempt to fit a signal they were not, and could never, fit properly.
When well-designed parameters fit the signal they were designed to fit, the model has predictive power. I guess that is what "designed" is defined to mean in this case. A model that can't predict things is useless, and that is why the free R is such a good test of a model. If the parameters of a model are fitting signal in the data that they were not designed to fit, all bets are off. There is no reason to expect that they will have the same predictive power, except by happenstance or (bad) luck. Placing the end of an arginine residue in the density of a ligand does, at least, put a few atoms in places where atoms should be, and that will tend to lower the free R, but the requirement that there be bridging atoms linking those atoms to the main chain of the protein will cause the parameters of the middle atoms to engage in contortions to try to fit the data, and those contortions will harm the ability of the model to make correct predictions. Going back to the first example, there is also no reason to expect that the small perturbations in an
Re: [ccp4bb] an over refined structure
Dale Tronrud wrote: In summary, this argument depends on two assertions that you can argue with me about:

1) When a parameter is being used to fit the signal it was designed for, the resulting model develops predictive power and can lower both the working and free R. When a signal is perturbing the value of a parameter for which it was not designed, it is unlikely to improve its predictive power; the working R will tend to drop, but the free R will not (and may rise).

2) If the unmodeled signal in the data set is a property in real space and has the same symmetry as the molecule in the unit cell, the inappropriate fitting of parameters will be systematic with respect to that symmetry, and the presence of a reflection in the working set will tend to cause its symmetry mate in the test set to be better predicted, despite the fact that this predictive power does not extend to reflections that are unrelated by symmetry. This bias will occur for any kind of error, as long as that error obeys the symmetry of the unit cell in real space.

Dear Dale,

Thanks for taking the time to think about my problem and for composing what is obviously a well-thought-out explanation. I am a little over my head here, but I think I see your point. Inappropriate fitting of this residual error has poor predictive power, so it does not reduce |Fo-Fc| for general free reflections. However, the error is symmetrical, so attempts to fit it will result in symmetrical changes which reduce |Fo-Fc| for those free reflections that are related to working reflections. I need to read the references that were mentioned in this discussion, and think about it a little more, in order to resolve some remaining conflicts in my thinking. But I don't need to bother everyone else with my struggles, unless I come up with something useful. Thanks for the guidance! Ed
Re: [ccp4bb] an over refined structure
down unfairly. Doug Ohlendorf

-----Original Message-----
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED]] On Behalf Of Eleanor Dodson
Sent: Tuesday, February 05, 2008 3:38 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] an over refined structure

I agree that the difference in Rwork to Rfree is quite acceptable at your resolution. You cannot / should not use R factors as a criterion for structure correctness. As Ian points out, choosing a different Rfree set of reflections can change Rfree a good deal. Certain NCS operators can relate reflections exactly, making it hard to get a truly independent free-R set, and there are other reasons that make it a blunt-edged tool. The map is the best validator. Are there blobs still not fitted (maybe side chains you have placed wrongly)? Are there many positive or negative peaks in the difference map? How well does the NCS match the 2 molecules? etc. etc. Eleanor

George M. Sheldrick wrote: Dear Sun, If we take Ian's formula for the ratio of R(free) to R(work) from his paper Acta Cryst. D56 (2000) 442-450 and make some reasonable approximations, we can reformulate it as:

R(free)/R(work) = sqrt[(1+Q)/(1-Q)], with Q = 0.025*p*d^3*(1-s)

where s is the fractional solvent content, d is the resolution, p is the effective number of parameters refined per atom after allowing for the restraints applied, d^3 means d cubed and sqrt means square root. The difficult number to estimate is p. It would be 4 for an isotropic refinement without any restraints. I guess that p = 1.5 might be an appropriate value for a typical protein refinement (giving an R-factor ratio of about 1.4 for s = 0.6 and d = 2.8). In that case, your R-factor ratio of 0.277/0.215 = 1.29 is well within the allowed range! However, it should be added that this formula is almost a self-fulfilling prophecy: if we relax the geometric restraints we increase p, which then leads to a larger 'allowed' R-factor ratio! Best wishes, George

Prof. George M. Sheldrick FRS Dept.
Structural Chemistry, University of Goettingen, Tammannstr. 4, D-37077 Goettingen, Germany. Tel. +49-551-39-3021 or -3068, Fax +49-551-39-2582

***
Dirk Kostrewa, Gene Center, A 5.07, Ludwig-Maximilians-University, Feodor-Lynen-Str. 25, 81377 Munich, Germany. Phone: +49-89-2180-76845, Fax: +49-89-2180-76999, E-mail: [EMAIL PROTECTED]
***
Dean R. Madden, Ph.D., Department of Biochemistry, Dartmouth Medical School, 7200 Vail Building, Hanover, NH 03755-3844, USA. Tel: +1 (603) 650-1164, Fax: +1 (603) 650-1128, E-mail: [EMAIL PROTECTED]
***
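The R-factor ratio formula George quotes is easy to evaluate numerically; a minimal sketch (the function name is mine, not from any program):

```python
from math import sqrt

def rfree_rwork_ratio(p, d, s):
    """R(free)/R(work) = sqrt((1+Q)/(1-Q)), with Q = 0.025 * p * d^3 * (1-s).

    p: effective parameters refined per atom, d: resolution in Angstrom,
    s: fractional solvent content.
    """
    q = 0.025 * p * d ** 3 * (1 - s)
    return sqrt((1 + q) / (1 - q))

# Sheldrick's worked example: p = 1.5, s = 0.6, d = 2.8 A gives about 1.4
print(round(rfree_rwork_ratio(1.5, 2.8, 0.6), 2))  # → 1.41
```

Note how steeply the d^3 term acts: the same restrained model at 2.0 A resolution gives a much smaller allowed ratio.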
Re: [ccp4bb] an over refined structure
Hi Dirk, I disagree with your final sentence. Even if you don't apply NCS restraints/constraints during refinement, there is a serious risk of NCS contaminating your Rfree. Consider the limiting case in which the NCS is produced simply by working in an artificially low-symmetry space group (e.g. P1, when the true symmetry is P2): in this case, putting one symmetry mate in the Rfree set and one in the Rwork set will guarantee that Rfree tracks Rwork. The same effect applies to a large extent even if the NCS is not crystallographic.

Bottom line: thin shells are not a perfect solution, but if NCS is present, choosing the free set randomly is *never* a better choice, and almost always significantly worse. Together with multicopy refinement, randomly chosen test sets were almost certainly a major contributor to the spuriously good Rfree values associated with the retracted MsbA and EmrE structures. Best wishes, Dean

Dirk Kostrewa wrote: Dear CCP4ers, I'm not convinced that thin shells are sufficient. I think, in principle, one should omit thick shells (thicker than the diameter of the G-function of the molecule/assembly that is used to describe NCS interactions in reciprocal space), and use only the inner thin layer of these thick shells, because only those reflections should be completely independent of any working-set reflections. But this would be too expensive given the low number of observed reflections that one usually has ... However, if you don't apply NCS restraints/constraints, there is no need for any such precautions. Best regards, Dirk.

Am 07.02.2008 um 16:35 schrieb Doug Ohlendorf: It is important when using NCS that the Rfree reflections be selected in distributed thin resolution shells. That way, application of NCS should not mix the Rwork and Rfree sets. Normal random selection of Rfree + NCS (especially 4x or higher) will drive Rfree down unfairly. Doug Ohlendorf
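The thin-shell selection Doug describes can be sketched as below. This is only an illustration: the binning scheme, the orthorhombic cell used for the d-spacing, and all names are assumptions for the example, not a recipe from any particular program.

```python
# Thin-shell free-flag selection (illustration only): reflections are
# binned into thin shells of 1/d^2 and whole shells are assigned to the
# test set, so symmetry-related reflections, which share a d-spacing,
# fall on the same side of the work/free divide.

def inv_d2(hkl, a, b, c):
    """1/d^2 for an orthorhombic cell with edges a, b, c (Angstrom)."""
    h, k, l = hkl
    return (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2

def thin_shell_flags(hkls, a, b, c, n_shells=100, every=20):
    """Mark every `every`-th thin shell of 1/d^2 as free (True)."""
    s2 = {hkl: inv_d2(hkl, a, b, c) for hkl in hkls}
    s2_max = max(s2.values())
    flags = {}
    for hkl, v in s2.items():
        shell = min(int(n_shells * v / s2_max), n_shells - 1)
        flags[hkl] = (shell % every == 0)
    return flags

hkls = [(h, k, l) for h in range(10) for k in range(10) for l in range(10)][1:]
flags = thin_shell_flags(hkls, 50.0, 60.0, 70.0)
print(f"free fraction: {sum(flags.values()) / len(flags):.3f}")
```

In practice one would of course use the real cell and space group via standard tools rather than this toy binning; the point is only that whole shells, not individual reflections, are flagged.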
Re: [ccp4bb] an over refined structure
regards, Dirk.

Am 07.02.2008 um 18:57 schrieb Dean Madden: Hi Dirk, I disagree with your final sentence. Even if you don't apply NCS restraints/constraints during refinement, there is a serious risk of NCS contaminating your Rfree.
Re: [ccp4bb] an over refined structure
Back in the old days, when I worked on crystal structures with 15 or 20 atoms or so, the symptoms of missed crystallographic symmetry included instability of the refinement, high correlations between parameters, and (relatively) large deviations between equivalent bond distances and bond angles. There can be real consequences of missed symmetry and of divergences between copies of molecules, even when resolution and data quality are not an issue, because the refinement can become unstable. Hence, I'm always skeptical of the assumption that structures can be safely refined in space groups of too low symmetry. I've assumed that, when people chose to (or accidentally) refine protein structures in lower-symmetry space groups, geometrical and NCS restraints keep the refinement under control. Is there a publication somewhere that has looked at the effect of deliberate refinement in space groups of lower than correct symmetry? Sue

On Feb 8, 2008, at 11:07 AM, Edward Berry wrote: Dirk Kostrewa wrote: Dear Dean and others, Peter Zwart gave me a similar reply. ...
Sue Roberts Biochemistry Biophysics University of Arizona [EMAIL PROTECTED]
Re: [ccp4bb] an over refined structure
Rotational near-crystallographic NCS is easy to handle this way, but what about translational pseudo-symmetry (or should that be pseudo-translational symmetry)? In such cases one whole set of spots is systematically weaker than the other set. Then what is the theoretically correct way to calculate Rfree? Write one's own code to sort the spots into two piles? Phoebe

At 01:05 PM 2/8/2008, Axel Brunger wrote: In such cases, we always define the test set first in the high-symmetry space group choice. Then, if it is warranted to lower the crystallographic symmetry and replace it with NCS symmetry, we expand the test set to the lower-symmetry space group. In other words, the test set itself will be invariant upon applying any of the crystallographic or NCS operators, so it will be maximally free in these cases. It is then also possible to directly compare the free R between the high and low crystallographic space group choices. Our recent Neuroligin structure is such an example (Arac et al., Neuron 56, 992-, 2007). Axel

On Feb 8, 2008, at 10:48 AM, Ronald E Stenkamp wrote: I've looked at about 10 cases where structures have been refined in lower-symmetry space groups. When you make the NCS operators into crystallographic operators, you don't change the refinement much, at least in terms of structural changes. That's the case whether NCS restraints have been applied or not. In the cases I've re-done, changing the refinement program and dealing with test-set choices makes some difference in the R and Rfree values. One effect of changing the space group is whether you realize the copies of the molecule in the lower-symmetry asymmetric unit are identical or not (where identical means crystallographically identical, i.e., in the same packing environments, subject to all the caveats about accuracy, precision, thermal motion, etc.).
Another effect of going to higher-symmetry space groups, of course, has to do with explaining the experimental data with simpler and smaller mathematical models (Occam's razor, or the Principle of Parsimony). Ron

Axel T. Brunger, Investigator, Howard Hughes Medical Institute; Professor of Molecular and Cellular Physiology, Stanford University. Web: http://atb.slac.stanford.edu, Email: [EMAIL PROTECTED], Phone: +1 650-736-1031, Fax: +1 650-745-1463

---
Phoebe A. Rice, Assoc. Prof., Dept. of Biochemistry & Molecular Biology, The University of Chicago. Phone 773 834 1723, fax 773 702 0439
http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123
http://www.nasa.gov/mission_pages/cassini/multimedia/pia06064.html
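Axel's recipe for expanding the test set can be sketched in a few lines. The illustration below is an assumption-laden example (the names and the choice of a 2-fold along b are mine): a free set defined in the high-symmetry setting is closed under the operator that becomes NCS in the lower space group, so the expanded set is invariant under that operator.

```python
import random

def expand_free_set(free_high, operators):
    """Close a set of Miller indices under the given symmetry operators."""
    free = set(free_high)
    for op in operators:
        free |= {op(hkl) for hkl in free_high}
    return free

# The operator that is crystallographic in the high-symmetry setting and
# becomes NCS in the low one: here a 2-fold along b, (h,k,l) -> (-h,k,-l).
twofold_b = lambda hkl: (-hkl[0], hkl[1], -hkl[2])

random.seed(1)
high_asu = [(h, k, l) for h in range(0, 6) for k in range(-5, 6)
            for l in range(-5, 6)]                      # h >= 0 half-set
free_high = set(random.sample(high_asu, len(high_asu) // 20))
free_low = expand_free_set(free_high, [twofold_b])

# The expanded set is invariant under the operator, as Axel describes
assert all(twofold_b(hkl) in free_low for hkl in free_low)
print(len(free_high), "->", len(free_low))
```

Because applying the full operator list of a group to the original set already yields a closed set, a single pass suffices here; a 2-fold is its own inverse.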
Re: [ccp4bb] an over refined structure
[EMAIL PROTECTED] wrote: ... what about translational pseudo-symmetry? In such cases one whole set of spots is systematically weaker than the other set. Then what is the theoretically correct way to calculate Rfree? ... Phoebe

Dear Phoebe,

I've always been a fan of splitting the test set in these situations. The weak set of reflections provides information about the differences between the NCS mates (and the deviation of the NCS operator from a true crystallographic operator), while the strong reflections provide information about the average of the NCS mates. If you mix the two sets in your Rfree calculation, the strong set will tend to dominate and will obscure the consequences of allowing your NCS mates too much freedom to differ.

Let's say you have a pseudo-C2 crystal with the dimer as the NCS pair, and you are starting with a model with perfect C2 symmetry. The initial rigid-body refinement will cause the Rfree(weak) to drop, because the initial model had Fc's equal to zero for all these reflections and the deviation from crystal symmetry allows non-zero values to arise. Now you want to test whether there are real differences between the two copies. If you allow variation between the two copies but monitor the Rfree(strong), you are actually monitoring the quality of the average of the two copies, and you basically have a two-fold multi-model. It is the same as putting two molecules at each site in the crystal and forcing both models to have perfect NCS. Axel Brunger's Methods in Enzymology chapter indicates that a two-fold multi-model is expected to have a lower Rfree than a single model, and we would expect in our imaginary crystal that the Rfree(strong) will drop even if there is no real difference between the NCS mates.
When you allow differences between the NCS mates, the Rfree(strong) will tend to drop even if those differences are not real. The Rfree(weak) is a different story, however: it is controlled specifically by the differences between the two NCS mates, and will drop only if the refinement creates differences that are significant. This is the statistic that can be used to determine the NCS weight (or, probably, the log-likelihood gain over the weak set).

If you insist on mixing the strong and weak reflections in your test set, you have to design your null-hypothesis test differently. First you should do a refinement where you have two models at each site, with exact NCS imposed. Then you do a refinement with one copy at each site, but allow differences between the NCS mates. Compare the Rfree of each model to decide which is the better model. There are exactly the same number of parameters in each model, but one allows the NCS to be violated and the other does not. Even so, the signal in the Rfree is mixed unless you split the systematically weak from the systematically strong.

If you have a general NCS and don't have weak and strong subsets of reflections, you still have to worry about the multi-model effect. If a refinement that allows NCS violations does not drop the Rfree by more than a two-fold multi-model with perfect NCS does, you cannot justify breaking your NCS. A drop in Rfree when you break NCS does not necessarily mean that breaking NCS is a good idea. You always have to perform the proper null-hypothesis test.

Dale Tronrud
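Sorting reflections into Phoebe's two piles is a parity test once the pseudo-translation is identified. A sketch, assuming pseudo C-centering (translation near (1/2, 1/2, 0)), for which the systematically weak class is h + k odd; the amplitudes are made up for illustration:

```python
def split_by_pseudo_centering(reflections):
    """reflections: dict {(h, k, l): F}. Returns (strong, weak) dicts."""
    strong, weak = {}, {}
    for hkl, f in reflections.items():
        h, k, _l = hkl
        (weak if (h + k) % 2 else strong)[hkl] = f
    return strong, weak

def r_factor(fo, fc):
    """Conventional R = sum|Fo - Fc| / sum|Fo| over the reflections in fo."""
    return sum(abs(fo[h] - fc[h]) for h in fo) / sum(abs(f) for f in fo.values())

# Made-up amplitudes: strong (h+k even) vs systematically weak (h+k odd)
data = {(1, 1, 0): 100.0, (2, 1, 0): 8.0, (2, 2, 1): 90.0, (3, 2, 2): 6.0}
strong, weak = split_by_pseudo_centering(data)

# A uniformly 10%-off model gives R = 0.10 in both classes
fc = {hkl: 1.1 * f for hkl, f in data.items()}
print(f"R(strong) = {r_factor(strong, fc):.2f}, R(weak) = {r_factor(weak, fc):.2f}")
```

Monitoring Rfree(strong) and Rfree(weak) separately is then just a matter of applying the same split to the test set.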
Re: [ccp4bb] an over refined structure
Bart Hazes wrote:

Dale Tronrud wrote: ... I've always been a fan of splitting the test set in these situations. The weak set of reflections provides information about the differences between the NCS mates, while the strong reflections provide information about the average of the NCS mates. ...

I haven't had to deal with this situation, but my first impression is to use the strong reflections for Rfree. For the strong reflections, and for any normal data, Rwork and Rfree are dominated by model errors and not measurement errors. For the weak reflections, measurement errors become more significant, if not dominant. In that case Rwork and Rfree will not be a sensitive measure by which to judge model improvement and refinement strategy.

A second, and possibly more important, issue arises with the determination of sigmaA values for maximum-likelihood refinement. SigmaA values are related to the correlation between Fc and Fo amplitudes. When half of your observed data is systematically weakened, this correlation is going to be very high, even if the model is poor or completely wrong, as long as it obeys the same pseudo-translation. If you only use the strong reflections for Rfree, I expect that should get around some of the issue.
Of course it can be valuable to also monitor the weak reflections to optimize NCS restraints but probably not to drive maximum likelihood refinement or to make general refinement strategy choices. Bart Dear Bart, I agree that the way one uses the test set depends critically on the question you are asking. In my letter I was focusing on that aspect of the pseudo centered crystal problem where the strong/weak divide can be used to particular advantage. I have not thought as much about the matter of using the test set to estimate the level of uncertainty in the parameters of a given model. My gut response is that the strong/weak distinction is still significant. Since the weak reflections contain information about the differences between the two, ncs related, copies I suspect that a great many systematic errors are subtracted out. For example, if your model contains isotropic B's when, of course, the atoms move anisotropically, your maps will contain difference features due to these unmodeled motions. Since the anisotropic motions are probably common to the two molecules, these features will be present in the average structure described by the strong reflections but will be subtracted out in the difference structure described by the weak reflections. This argument implies to me that the strong reflections need to be judged by the Sigma A derived from the strong test set and the weak reflections judged by the weak test set. Dale Tronrud
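The strong/weak split Dale and Bart discuss can be sketched like this (Python; a pseudo-translation of roughly 1/2 along a is assumed for illustration, so reflections with h even carry the average of the two copies and h odd their difference — real data need the actual pseudo-translation vector):

```python
import random

def split_strong_weak(refl, axis=0):
    """Partition reflections by index parity along the pseudo-translation
    axis: even = strong class (average structure), odd = weak class
    (difference structure).  Illustrative sketch only."""
    strong = {h: f for h, f in refl.items() if h[axis] % 2 == 0}
    weak = {h: f for h, f in refl.items() if h[axis] % 2 == 1}
    return strong, weak

def r_factor(fo, fc):
    """R = sum|Fo - Fc| / sum Fo over one class of reflections."""
    return sum(abs(fo[h] - fc[h]) for h in fo) / sum(fo[h] for h in fo)

# Toy data: the weak class is systematically ~10x weaker, as in Phoebe's case.
random.seed(0)
fo = {(h, k, l): (100.0 if h % 2 == 0 else 10.0) * random.uniform(0.8, 1.2)
      for h in range(6) for k in range(3) for l in range(3)}
fc = {h: f * random.uniform(0.8, 1.2) for h, f in fo.items()}
strong, weak = split_strong_weak(fo)
# Monitor R (and Rfree, sigmaA) separately so the strong class cannot
# swamp the weak one in a mixed sum.
print(r_factor(strong, fc), r_factor(weak, fc))
```

Keeping the two classes separate is exactly what prevents the strong set from dominating a mixed Rfree, per Dale's point.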
Re: [ccp4bb] an over refined structure
Bottom line: thin shells are not a perfect solution, but if NCS is present, choosing the free set randomly is *never* a better choice, and almost always significantly worse. hmmm ... I wonder if that is true. For low order NCS (two-, three-, even five-fold) I don't believe that thin shells are better, since they are a systematic omission of data (which can affect maps) and in my experience they do not add much. I have only limited experience on this but I somehow tried both and I seem to have settled with random Rfree. With an NCS axis parallel to a crystallographic one (or when translational NCS is there) that might be a whole different ball game though ... not sure. A. Together with multicopy refinement, randomly chosen test sets were almost certainly a major contributor to the spuriously good Rfree values associated with the retracted MsbA and EmrE structures. ehm ... I think 16 models systematically displaced along a direction parallel to helix axes contributed much more to that ... as the authors basically said in the original publication, if my recollection is not bad. A.
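For reference, the thin-shell assignment being debated can be sketched as follows (Python; the orthorhombic reciprocal cell, shell width, and free-shell frequency are illustrative assumptions). The underlying point: any rotation, crystallographic or NCS, preserves resolution, so both members of a related pair land in the same shell and hence on the same side of the work/free divide:

```python
def thin_shell_free_flags(hkls, astar, bstar, cstar,
                          shell_width=0.002, free_every=4):
    """Assign free flags in thin shells of 1/d^2 (orthorhombic reciprocal
    cell for simplicity; shell_width and free_every are arbitrary
    illustration values).  Every free_every-th shell goes to the test set."""
    flags = {}
    for (h, k, l) in hkls:
        inv_d2 = (h * astar) ** 2 + (k * bstar) ** 2 + (l * cstar) ** 2
        flags[(h, k, l)] = (int(inv_d2 / shell_width) % free_every == 0)
    return flags

hkls = [(h, k, l) for h in range(-5, 6)
        for k in range(6) for l in range(-5, 6)]
flags = thin_shell_free_flags(hkls, 0.01, 0.02, 0.015)
# A 2-fold parallel to b maps (h,k,l) to (-h,k,-l): same shell, same flag.
assert all(flags[(h, k, l)] == flags[(-h, k, -l)] for (h, k, l) in hkls)
```

The trade-off raised above is visible in the construction: whole resolution shells are systematically withheld, which is what can affect maps.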
Re: [ccp4bb] an over refined structure
Hi Dirk, I disagree with your final sentence. Even if you don't apply NCS restraints/constraints during refinement, there is a serious risk of NCS contaminating your Rfree. Consider the limiting case in which the NCS is produced simply by working in an artificially low symmetry space-group (e.g. P1, when the true symmetry is P2): in this case, putting one symmetry mate in the Rfree set, and one in the Rwork set will guarantee that Rfree tracks Rwork. The same effect applies to a large extent even if the NCS is not crystallographic. Bottom line: thin shells are not a perfect solution, but if NCS is present, choosing the free set randomly is *never* a better choice, and almost always significantly worse. Together with multicopy refinement, randomly chosen test sets were almost certainly a major contributor to the spuriously good Rfree values associated with the retracted MsbA and EmrE structures. Best wishes, Dean Dirk Kostrewa wrote: Dear CCP4ers, I'm not convinced that thin shells are sufficient: I think, in principle, one should omit thick shells (greater than the diameter of the G-function of the molecule/assembly that is used to describe NCS-interactions in reciprocal space), and use the inner thin layer of these thick shells, because only those should be completely independent of any working set reflections. But this would be too expensive given the low number of observed reflections that one usually has ... However, if you don't apply NCS restraints/constraints, there is no need for any such precautions. Best regards, Dirk. On 07.02.2008 at 16:35, Doug Ohlendorf wrote: It is important when using NCS that the Rfree reflections be selected distributed in thin resolution shells. That way application of NCS should not mix Rwork and Rfree sets. Normal random selection of Rfree + NCS (especially 4x or higher) will drive Rfree down unfairly. 
Doug Ohlendorf -Original Message- From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of Eleanor Dodson Sent: Tuesday, February 05, 2008 3:38 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] an over refined structure I agree that the difference in Rwork to Rfree is quite acceptable at your resolution. You cannot/should not use Rfactors as a criterion of structure correctness. As Ian points out - choosing a different Rfree set of reflections can change Rfree a good deal. Certain NCS operators can relate reflections exactly, making it hard to get a truly independent Free R set, and there are other reasons to make it a blunt-edged tool. The map is the best validator - are there blobs still not fitted? (maybe side chains you have placed wrongly..) Are there many positive or negative peaks in the difference map? How well does the NCS match the 2 molecules? etc etc. Eleanor George M. Sheldrick wrote: Dear Sun, If we take Ian's formula for the ratio of R(free) to R(work) from his paper Acta D56 (2000) 442-450 and make some reasonable approximations, we can reformulate it as: R(free)/R(work) = sqrt[(1+Q)/(1-Q)] with Q = 0.025pd^3(1-s) where s is the fractional solvent content, d is the resolution, p is the effective number of parameters refined per atom after allowing for the restraints applied, d^3 means d cubed and sqrt means square root. The difficult number to estimate is p. It would be 4 for an isotropic refinement without any restraints. I guess that p=1.5 might be an appropriate value for a typical protein refinement (giving an R-factor ratio of about 1.4 for s=0.6 and d=2.8). In that case, your R-factor ratio of 0.277/0.215 = 1.29 is well within the allowed range! However it should be added that this formula is almost a self-fulfilling prophecy. If we relax the geometric restraints we increase p, which then leads to a larger 'allowed' R-factor ratio! Best wishes, George Prof. George M. Sheldrick FRS Dept. 
Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-2582 *** Dirk Kostrewa Gene Center, A 5.07 Ludwig-Maximilians-University Feodor-Lynen-Str. 25 81377 Munich Germany Phone: +49-89-2180-76845 Fax: +49-89-2180-76999 E-mail: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] *** -- Dean R. Madden, Ph.D. Department of Biochemistry Dartmouth Medical School 7200 Vail Building Hanover, NH 03755-3844 USA tel: +1 (603) 650-1164 fax: +1 (603) 650-1128 e-mail: [EMAIL PROTECTED]
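George's reformulation of Ian Tickle's ratio is easy to check numerically (a direct transcription of the formula quoted above, nothing added):

```python
import math

def rfree_over_rwork(p, d, s):
    """R(free)/R(work) = sqrt((1+Q)/(1-Q)) with Q = 0.025*p*d^3*(1-s),
    where p = effective refined parameters per atom, d = resolution (A),
    s = fractional solvent content."""
    q = 0.025 * p * d ** 3 * (1 - s)
    return math.sqrt((1 + q) / (1 - q))

# George's example: p = 1.5, s = 0.6, d = 2.8 A gives a ratio of about 1.4,
# so the reported 0.277/0.215 = 1.29 is indeed within the "allowed" range.
print(round(rfree_over_rwork(1.5, 2.8, 0.6), 2))
```

Note how the self-fulfilling-prophecy caveat shows up directly: raising p raises Q and hence the allowed ratio.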
Re: [ccp4bb] an over refined structure
Actually the bottom lines below were my argument in the case that you DO apply strict NCS (although the argument runs into some questionable points if you follow it out). In the case that you DO NOT apply NCS, there is a second decoupling mechanism: Not only the error in Fo may be opposite for the two reflections, but also the change in Fc upon applying a non-symmetrical modification to the structure is likely to be opposite. So there is no way of predicting whether |Fo-Fc| will move in the same direction for the two reflections. I completely agree with Dirk (although I am willing to listen to anyone explain why I am wrong). Ed Edward Berry wrote: Dean Madden wrote: Hi Dirk, I disagree with your final sentence. Even if you don't apply NCS restraints/constraints during refinement, there is a serious risk of NCS contaminating your Rfree. Consider the limiting case in which the NCS is produced simply by working in an artificially low symmetry space-group (e.g. P1, when the true symmetry is P2): in this case, putting one symmetry mate in the Rfree set, and one in the Rwork set will guarantee that Rfree tracks Rwork. I don't think this is right- remember Rfree is not just based on Fc but Fo-Fc. Working in your lower symmetry space group you will have separate values for the Fo at the two ncs-related reflections. Each observation will have its own random error, and like as not the error will be in the opposite direction for the two reflections. Hence a structural modification that improves Fo-Fc at one reflection is equally likely to improve or worsen the fit at the related reflection. The only way they are coupled is through the basic tenet of R-free: If it makes the structure better, it is likely to improve the fit at all reflections. For sure R-free will go down when you apply NCS- but this is because you drastically improve your data/parameters ratio. Best, Ed
Re: [ccp4bb] an over refined structure
Here I will disagree. R-free rewards you for putting an atom in density in which an atom belongs. It doesn't necessarily reward you for putting the *right* atom in that density, but it does become difficult to do that under normal circumstances unless you have approximately the right structure. However in the case of multi-copy refinement at low resolution, the refinement is perfectly capable of shoving any old atom in density corresponding to any other old atom if you give it enough leeway. Remember that there's a big difference between R-free for a single copy (45%) and a 16-fold multicopy (38%) in MsbA's P1 form, and almost the same amount (41% vs 33%) with MsbA's P21 form. (These are E.coli and V.cholerae respectively). Both single copy and multicopy refinements were NCS-restrained, as far as I know. So there's evidence, w/o simulation, that the 12-fold or 16-fold multicopy refinements are worth 7-8% in R-free, and I'm doubtful that NCS can generate that sort of gain in either crystal form. I've certainly never seen that in my own experience at low resolution. I've been meaning to put online the Powerpoint from the CCP4 talk with all these numbers in it, but I regret it's sitting on my iBook at home as of writing. Phil Jeffrey Dean Madden wrote: It is true that multicopy refinement was essential for the suppression of Rwork. However, the whole point of the Rfree is that it is supposed to be independent of the number of parameters you're refining. Simply throwing multiple copies of the model into the refinement shouldn't have affected Rfree, IF IT WERE TRULY FREE. It was almost certainly NCS-mediated spillover that allowed the multicopy, parameter-driven reduction in Rwork to pull down the Rfree values as well. The experiment is probably not worth the time it would take to do, but I suspect that if MsbA and EmrE test sets had been chosen in thin shells, then Rfree wouldn't have shown nearly the improvement it did. 
Dean
Re: [ccp4bb] an over refined structure
While NCS probably played a role in the first crystal form of MsbA (P1, 8 monomers), this is also the one that showed the greatest improvement in R-free once the structure was correctly redetermined (7% or 14% depending on which refinement protocols you compare). The other crystal form of MsbA and the crystal forms of EmrE didn't have particularly high-copy NCS (2 dimers, 4 monomers, dimer, 2 tetramers) and the R-frees were somewhat comparable in all cases (31-36% for the redetermined structures). The *major* source of the R-free suppression in all these cases was the inappropriate use of multi-copy refinement at low resolution. Phil Jeffrey Princeton Dean Madden wrote: Hi Dirk, I disagree with your final sentence. Even if you don't apply NCS restraints/constraints during refinement, there is a serious risk of NCS contaminating your Rfree. Consider the limiting case in which the NCS is produced simply by working in an artificially low symmetry space-group (e.g. P1, when the true symmetry is P2): in this case, putting one symmetry mate in the Rfree set, and one in the Rwork set will guarantee that Rfree tracks Rwork. The same effect applies to a large extent even if the NCS is not crystallographic. Bottom line: thin shells are not a perfect solution, but if NCS is present, choosing the free set randomly is *never* a better choice, and almost always significantly worse. Together with multicopy refinement, randomly chosen test sets were almost certainly a major contributor to the spuriously good Rfree values associated with the retracted MsbA and EmrE structures. 
Best wishes, Dean
Re: [ccp4bb] an over refined structure
Hi Ed, This is an intriguing argument, but I know (having caught such a case as a reviewer) that even in cases of low NCS symmetry, Rfree can be significantly biased. I think the reason is that the discrepancy between pairs of NCS-related reflections (i.e. Fo-Fo') is generally significantly smaller than |Fo-Fc|. (In general, Rsym (on F) is lower than Rfree.) Thus, moving Fc closer to Fo will also move its NCS partner Fc' closer to Fo' *on average*, if they are coupled. Dean
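The disagreement between Ed and Dean comes down to which error dominates, and a toy Monte Carlo makes that concrete (Python; the Gaussian error model, the half-step toward Fo1, and the sigma values are all illustrative assumptions, not anything from the thread):

```python
import random

def coupling(model_sigma, meas_sigma, trials=20000, seed=0):
    """Fraction of trials in which a refinement step against one of two
    NCS-related reflections also improves the fit to the other.
    Toy model: both reflections share a true F; the Fo's carry independent
    Gaussian measurement errors; Fc starts with a Gaussian model error and
    is moved halfway toward Fo1."""
    rng = random.Random(seed)
    improved_both = 0
    for _ in range(trials):
        f_true = 100.0
        fo1 = f_true + rng.gauss(0, meas_sigma)
        fo2 = f_true + rng.gauss(0, meas_sigma)
        fc = f_true + rng.gauss(0, model_sigma)
        fc_new = fc + 0.5 * (fo1 - fc)      # refine against fo1 only
        if abs(fo2 - fc_new) < abs(fo2 - fc):
            improved_both += 1
    return improved_both / trials

# Model error dominant (Dean's and Jon's regime): the pair is tightly coupled.
print(coupling(model_sigma=20, meas_sigma=2))
# Measurement error dominant (Ed's regime): not far from a coin flip.
print(coupling(model_sigma=2, meas_sigma=20))
```

When |Fo-Fc| is much larger than the measurement scatter, improving the fit at one mate almost always improves the other, which is Dean's "on average" coupling; only when measurement error dominates does Ed's decoupling take over.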
Re: [ccp4bb] an over refined structure
Dear Ed, I don't see how you decouple symmetry mates in the case of a wrong space group. Symmetry mates should agree with each other typically within R_sym or R_merge percent, e.g. about 2-5%. Observed and calculated reflections agree within the R-factor of each other, so about 20-30%. The experimental errors are pretty much negligible, and overfitting is not a question about error bars; it is a question of how hard to push a round peg into a square hole. Cheers, Jon
Re: [ccp4bb] an over refined structure
Dean Madden wrote: Hi Dirk, I disagree with your final sentence. Even if you don't apply NCS restraints/constraints during refinement, there is a serious risk of NCS contaminating your Rfree. Consider the limiting case in which the NCS is produced simply by working in an artificially low symmetry space-group (e.g. P1, when the true symmetry is P2): in this case, putting one symmetry mate in the Rfree set, and one in the Rwork set will guarantee that Rfree tracks Rwork. I don't think this is right- remember Rfree is not just based on Fc but Fo-Fc. Working in your lower symmetry space group you will have separate values for the Fo at the two ncs-related reflections. Each observation will have its own random error, and like as not the error will be in the opposite direction for the two reflections. Hence a structural modification that improves Fo-Fc at one reflection is equally likely to improve or worsen the fit at the related reflection. The only way they are coupled is through the basic tenet of R-free: If it makes the structure better, it is likely to improve the fit at all reflections. For sure R-free will go down when you apply NCS- but this is because you drastically improve your data/parameters ratio. Best, Ed
Re: [ccp4bb] an over refined structure
Agreed, and this is even more true if you consider that R-merge is calculated on I's and Rfree on F's: an Rmerge of 5% should contribute 2.5% to Rfree; and furthermore errors add vectorially, so it would be more like 0.025/sqrt(2). I guess I have to take all those other errors that have to do with the inability of a simple atomic model to account for the diffraction of a crystal, lump them together and assume they have nothing to do with NCS and are not affected by the simple modification under consideration. I am thinking about the CHANGE in |Fo-Fc| at two sym-related reflections when the refinement program moves a single atom from position 1 to position 2. If we do not apply NCS, this is the only atom that will move, and for Fc we can definitely say there is no reason to expect the two Fc's to move in the same direction; therefore there is no coupling in the case where we do not apply NCS. If we apply strict NCS then granted the sym-related Fc's are equal before and after the change, so they move in the same direction. As I said, the argument is weaker now. If there are systematic errors contributing to the gap between Rfree and 0.5*Rmerge/sqrt(2), and if these systematic errors follow the NCS, then the initial Fo-Fc is likely to be of the same sign at the related reflections and larger than the change in Fc, so |Fo-Fc| would go in the same direction. But to justify this you would have to explain why the systematic errors follow the ncs. Crystal morphology related to ncs, resulting in similar absorption errors? But how large are absorption errors, and is there any reason for morphology to follow NCS? After reading Dean Madden's latest - we might need some assumption here that we are reasonably close to the refined structure. If we start with random atoms then shoving the atoms around in a way that fits the density better might be seen as improving the structure from the point of modeling the density, but not from the point of approximating the real structure. 
But in this case the change in sign of Fc is completely decoupled between sym-related reflections, and if you enforce symmetry you will be enforcing the wrong symmetry and worsening both the structure and the fit to the density. I think Gerard Kleywegt has an example of enforcing NCS on an erroneous structure, and it was not very effective at reducing Rfree? And in that case the structure may have had some resemblance to the density at low resolution, and the NCS may have been somewhat correct. I guess there are two questions, depending on whether you are at the beginning of a refinement and may have a completely wrong structure, or whether refinement is nearly complete and you want to know whether the further improvement you get on applying NCS is real.
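Ed's back-of-the-envelope error budget at the top of his message can be written out explicitly (a sketch of his stated assumptions only: I ~ F^2 so a fractional intensity error halves on conversion to F, and a further 1/sqrt(2) for combining the two mates):

```python
import math

rmerge_on_I = 0.05                      # a typical merging R on intensities
rmerge_on_F = 0.5 * rmerge_on_I         # fractional error halves from I to F
effective = rmerge_on_F / math.sqrt(2)  # Ed's vectorial-addition factor
print(rmerge_on_F, round(effective, 4))
```

This is the ~1.8% measurement floor Ed compares against the 20-30% model-error scale, which is why he concedes the decoupling argument is weaker than it first appears.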
Re: [ccp4bb] an over refined structure
If you think about it, there is an analogy to relaxing geometrical constraints, which also allows the refinement to put atoms into density. The reason it usually doesn't help Rfree is that the density is spurious. At least some of the incorrect structure determinations of the early 90's (that spurred the introduction of Rfree etc.) had high rms deviations, suggesting that this is how the overfitting occurred. Nevertheless, once hit with a bit of simulated annealing, the Rfree values of such models deteriorated significantly. If memory serves, the incorrect structures of the 1990's would have had relaxed geometry precisely because they needed to do that to reduce R, and R used to be the primary indicator of structure quality in the days before R-free was introduced. There's quite a big difference between the latitude afforded by relaxing geometry and the degree of freedom allowed by multicopy refinement. Simply increasing the RMS bond length deviations from 0.012 to 0.035 Angstrom would move atoms on average by only a fraction of a bond length, which is not really enough to jump between different atom locations. In any event, the MsbA statistics can be simply explained from an expectation of what happens if you overfit your (wrong) structure using techniques inappropriate for the resolution: R-work goes down, R-free goes down less, and (R-free - R-work) goes up - and this happens in general with use of multicopy refinement at anything less than quite high resolution - I'm thinking in particular of a comment in Chen & Chapman (2001) Biophys J vol. 8, 1466-1472. So I see no reason to suggest NCS is having a particularly extreme, perhaps unprecedented, effect. Phil Jeffrey (still working on converting Micro$loth Powerpoint to html)
Re: [ccp4bb] an over refined structure
Dear CCP4ers, I'm not convinced that thin shells are sufficient: I think, in principle, one should omit thick shells (greater than the diameter of the G-function of the molecule/assembly that is used to describe NCS-interactions in reciprocal space), and use the inner thin layer of these thick shells, because only those should be completely independent of any working set reflections. But this would be too expensive given the low number of observed reflections that one usually has ... However, if you don't apply NCS restraints/constraints, there is no need for any such precautions. Best regards, Dirk.
Sheldrick wrote: Dear Sun, If we take Ian's formula for the ratio of R(free) to R(work) from his paper Acta D56 (2000) 442-450 and make some reasonable approximations, we can reformulate it as: R(free)/R(work) = sqrt[(1+Q)/(1-Q)] with Q = 0.025pd^3(1-s) where s is the fractional solvent content, d is the resolution, p is the effective number of parameters refined per atom after allowing for the restraints applied, d^3 means d cubed and sqrt means square root. The difficult number to estimate is p. It would be 4 for an isotropic refinement without any restraints. I guess that p=1.5 might be an appropriate value for a typical protein refinement (giving an R-factor ratio of about 1.4 for s=0.6 and d=2.8). In that case, your R-factor ratio of 0.277/0.215 = 1.29 is well within the allowed range! However it should be added that this formula is almost a self-fulfilling prophecy. If we relax the geometric restraints we increase p, which then leads to a larger 'allowed' R-factor ratio! Best wishes, George Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-2582 *** Dirk Kostrewa Gene Center, A 5.07 Ludwig-Maximilians-University Feodor-Lynen-Str. 25 81377 Munich Germany Phone: +49-89-2180-76845 Fax: +49-89-2180-76999 E-mail: [EMAIL PROTECTED] ***
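[As a quick sanity check of the numbers quoted above, Sheldrick's reformulation of Tickle's expression can be evaluated directly; the helper function below is hypothetical, just a transcription of the formula in the message:]

```python
import math

def expected_rfree_rwork_ratio(p, d, s):
    """Sheldrick's reformulation of Tickle's expected R(free)/R(work) ratio.

    p: effective number of refined parameters per atom (after restraints)
    d: resolution in Angstrom
    s: fractional solvent content
    """
    q = 0.025 * p * d**3 * (1.0 - s)
    return math.sqrt((1.0 + q) / (1.0 - q))

# The example quoted in the thread: p = 1.5, d = 2.8 A, s = 0.6
ratio = expected_rfree_rwork_ratio(p=1.5, d=2.8, s=0.6)
print(round(ratio, 2))  # about 1.4, as stated

# Sun's observed ratio, 0.277/0.215 = 1.29, is below the expected value
print(round(0.277 / 0.215, 2))
```

[Note also Sheldrick's caveat: since p itself depends on how tightly the restraints are applied, relaxing the geometry raises the 'allowed' ratio too.]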
Re: [ccp4bb] an over refined structure
Hi Phil, Here I will disagree. R-free rewards you for putting an atom into density where an atom belongs. It doesn't necessarily reward you for putting the *right* atom in that density, but it does become difficult to do that under normal circumstances unless you have approximately the right structure. However, in the case of multi-copy refinement at low resolution, the refinement is perfectly capable of shoving any old atom into density corresponding to any other old atom if you give it enough leeway. ... So there's evidence, w/o simulation, that the 12-fold or 16-fold multicopy refinements are worth 7-8% in R-free, and I'm doubtful that NCS can generate that sort of gain in either crystal form. I've certainly never seen that in my own experience at low resolution. Remember that there are two things at work here: putting atoms into real density (which does reduce Rfree) and putting atoms into noise (overfitting, which shouldn't help Rfree). At low resolution, there's a lot of noise. If you think about it, there is an analogy to relaxing geometrical constraints, which also allows the refinement to put atoms into density. The reason it usually doesn't help Rfree is that the density is spurious. At least some of the incorrect structure determinations of the early 90's (that spurred the introduction of Rfree etc.) had high rms deviations, suggesting that this is how the overfitting occurred. Nevertheless, once hit with a bit of simulated annealing, the Rfree values of such models deteriorated significantly. I would argue that 12-fold or 16-fold multicopy refinements simply permitted overfitting of noise. In other words, it is worth 7-8% in R*work*, but not Rfree. In this case, the main reason Rfree also dropped is because the test set was coupled *by NCS* to the overfit working set. Use of a random test set in the presence of NCS could easily prevent the Rfree value from serving as a warning of overfitting.
Of course, to be absolutely sure, one would have to repeat the multicopy refinements of the inverted structures with a test set chosen in thin shells, and then see if Rfree dropped as before. I think only the original authors would be in a position to do that properly. Dean -- Dean R. Madden, Ph.D. Department of Biochemistry Dartmouth Medical School 7200 Vail Building Hanover, NH 03755-3844 USA tel: +1 (603) 650-1164 fax: +1 (603) 650-1128 e-mail: [EMAIL PROTECTED]
Re: [ccp4bb] an over refined structure
Dean Madden wrote: Hi Ed, This is an intriguing argument, but I know (having caught such a case as a reviewer) that even in cases of low NCS symmetry, Rfree can be significantly biased. I think the reason is that the discrepancy between pairs of NCS-related reflections (i.e. Fo-Fo') is generally significantly smaller than |Fo-Fc|. (In general, Rsym (on F) is lower than Rfree.) Thus, moving Fc closer to Fo will also move its NCS partner Fc' closer to Fo' *on average*, if they are coupled. OK, I see that now: the systematic errors must be related to NCS in this case, because we know that if we reduced the data in the higher space group, our Rsyms would be OK. I stand educated. But it is difficult to go from there to real NCS, where the large unaccounted errors may not be related to the NCS. Furthermore, if you don't enforce NCS, the structural changes are asymmetric and there is no reason to believe Fc will move in the same direction, even in this artificial case. So Dirk's assertion still stands, I believe. Dean Edward Berry wrote: Actually the bottom lines below were my argument in the case that you DO apply strict NCS (although the argument runs into some questionable points if you follow it out). In the case that you DO NOT apply NCS, there is a second decoupling mechanism: not only may the error in Fo be opposite for the two reflections, but also the change in Fc upon applying a non-symmetrical modification to the structure is likely to be opposite. So there is no way of predicting whether |Fo-Fc| will move in the same direction for the two reflections. I completely agree with Dirk (although I am willing to listen to anyone explain why I am wrong). Ed Edward Berry wrote: Dean Madden wrote: Hi Dirk, I disagree with your final sentence. Even if you don't apply NCS restraints/constraints during refinement, there is a serious risk of NCS contaminating your Rfree.
Consider the limiting case in which the NCS is produced simply by working in an artificially low-symmetry space group (e.g. P1, when the true symmetry is P2): in this case, putting one symmetry mate in the Rfree set and one in the Rwork set will guarantee that Rfree tracks Rwork. I don't think this is right: remember Rfree is not just based on Fc but on Fo-Fc. Working in your lower-symmetry space group you will have separate values of Fo for the two NCS-related reflections. Each observation will have its own random error, and like as not the error will be in the opposite direction for the two reflections. Hence a structural modification that improves Fo-Fc at one reflection is equally likely to improve or worsen the fit at the related reflection. The only way they are coupled is through the basic tenet of R-free: if it makes the structure better, it is likely to improve the fit at all reflections. For sure R-free will go down when you apply NCS, but this is because you drastically improve your data/parameters ratio. Best, Ed
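[Dean's point that the coupling survives observational noise *on average* can be checked with a toy Monte Carlo simulation. All the numbers below are hypothetical, chosen only so that the model error is much larger than the observational error, as Dean's Rsym-vs-Rfree comparison suggests; this is a sketch of the statistical argument, not a refinement:]

```python
import random

random.seed(0)

# Toy model of exact NCS in an artificially low-symmetry space group:
# Fo and Fo' are two independently noisy observations of the same true
# amplitude. A "refinement step" moves Fc part-way towards Fo (the
# working-set copy). We ask how often this also improves the fit to Fo'
# (the free-set copy).
n = 100_000
improved = 0
for _ in range(n):
    f_true = 100.0
    fo = f_true + random.gauss(0, 5)      # observational error, sigma = 5
    fo_prime = f_true + random.gauss(0, 5)
    fc = f_true + random.gauss(0, 20)     # model error >> observational error
    fc_new = fc + 0.5 * (fo - fc)         # step of fitting against Fo only
    if abs(fo_prime - fc_new) < abs(fo_prime - fc):
        improved += 1

print(improved / n)  # well above 0.5: fitting Fo drags Fc towards Fo' too
```

[With the opposite regime (model error smaller than the observational noise) the fraction drops towards 0.5, which is Ed's decoupling argument; the coupling is strong precisely while the model is still poor.]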
Re: [ccp4bb] an over refined structure
A few comments that you might find useful: 1. yes, even if you don't apply NCS restraints/constraints there will be correlations between reflections in cases of NCS symmetry or pseudo-crystallographic NCS symmetry. 2. Fabiola, Chapman, et al., published a very nice paper on the topic in Acta D. 62, 227-238, 2006. 3. From my experience, the effects for low NCS symmetry are usually small, except cases of pseudo-symmetry which can be easily addressed by defining the test set in the high-symmetry setting. For high NCS symmetry, the effects are more significant, but then the structure is usually much better determined, anyway, due to averaging. 4. At least the first one of the mentioned MsbA and EmrE structures had a very high Rfree in the absence of multi-copy refinement ( ~ 45%)! So, the Rfree indicated that there was a major problem. 5. The Rfree should vary relatively little among test sets (see my Acta D 49, 24-36, 1993 paper) - if there are large variations for different test set choices then the test set may be too small or there may be systematic problems with some of the reflections causing them to dominate the R factors (outliers at low resolution, for example). Axel Brunger On Feb 7, 2008, at 9:57 AM, Dean Madden wrote: Hi Dirk, I disagree with your final sentence. Even if you don't apply NCS restraints/constraints during refinement, there is a serious risk of NCS contaminating your Rfree. Consider the limiting case in which the NCS is produced simply by working in an artificially low symmetry space-group (e.g. P1, when the true symmetry is P2): in this case, putting one symmetry mate in the Rfree set, and one in the Rwork set will guarantee that Rfree tracks Rwork. The same effect applies to a large extent even if the NCS is not crystallographic. Bottom line: thin shells are not a perfect solution, but if NCS is present, choosing the free set randomly is *never* a better choice, and almost always significantly worse. 
Together with multicopy refinement, randomly chosen test sets were almost certainly a major contributor to the spuriously good Rfree values associated with the retracted MsbA and EmrE structures. Best wishes, Dean
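[As a concrete illustration of the thin-shell idea discussed in this thread, here is a minimal sketch (a hypothetical helper, not taken from any of the programs mentioned) of assigning free-set flags in thin resolution shells rather than at random, so that reflections related by NCS or pseudo-symmetry, which lie at essentially the same resolution, end up on the same side of the work/free split:]

```python
import numpy as np

def thin_shell_free_flags(d_spacings, n_shells=200, free_every=20):
    """Flag reflections for the free set in thin resolution shells.

    Sorts reflections by d-spacing, cuts them into n_shells thin shells of
    (nearly) equal population, and sends every free_every-th shell to the
    free set, for a free fraction of ~1/free_every.
    """
    d = np.asarray(d_spacings, dtype=float)
    order = np.argsort(-d)                       # low resolution (high d) first
    shell = np.empty(len(d), dtype=int)
    shell[order] = np.arange(len(d)) * n_shells // len(d)
    return shell % free_every == 0               # True = free set

# Toy usage: 10000 random d-spacings between 2.8 and 50 A
rng = np.random.default_rng(1)
flags = thin_shell_free_flags(rng.uniform(2.8, 50.0, 10000))
print(flags.mean())  # ~0.05 of the reflections land in the free set
```

[Per Dirk's caveat above, even this is only an approximation: shells thinner than the width of the G-function interaction are not fully independent of the working set.]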
Re: [ccp4bb] an over refined structure
Hi Sun, On Mon, Feb 04, 2008 at 02:15:05PM -0800, Sun Tang wrote: I used NCS before rigid body refinement. After that I did not put NCS restraints in the restrained refinement and TLS+restrained refinement because it raised the R/Rfree quite a lot. Use NCS. Really! There is never a reason for switching off NCS restraints (ok _maybe_ at real atomic or ultra-high resolution ...). Obviously, you'll need to change the way you apply NCS restraints: from a simple per-chain definition to maybe a per-domain definition, taking out residues in crystal contacts, allowing for a different base B-factor of different chains/domains etc. Some programs do these things fairly automatically for you. This might make it awkward to use NCS sometimes, but at 2.8A it is a must (I think). And if your use of NCS increases Rfree, then there is a problem in the setup of the NCS restraints, not in their use in principle. Note: re-introducing NCS restraints might increase the R, but if the Rfree stays similar: who cares? -- See also: G J Kleywegt, Use of non-crystallographic symmetry in protein structure refinement, Acta Crystallographica, D52, 842-857 (1996). According to http://xray.bmc.uu.se/~gerard/citation.html a classic! paper [gosh ... I really like Gerard's style ;-)]. Cheers Clemens -- *** * Clemens Vonrhein, Ph.D. vonrhein AT GlobalPhasing DOT com * * Global Phasing Ltd. * Sheraton House, Castle Park * Cambridge CB3 0AX, UK *-- * BUSTER Development Group (http://www.globalphasing.com) ***
Re: [ccp4bb] an over refined structure
I agree that the difference in Rwork to Rfree is quite acceptable at your resolution. You cannot/should not use R-factors as a criterion for structure correctness. As Ian points out, choosing a different Rfree set of reflections can change Rfree a good deal. Certain NCS operators can relate reflections exactly, making it hard to get a truly independent Free R set, and there are other reasons that make it a blunt-edged tool. The map is the best validator: are there blobs still not fitted? (maybe side chains you have placed wrongly..) Are there many positive or negative peaks in the difference map? How well does the NCS match the 2 molecules? etc etc. Eleanor
Re: [ccp4bb] an over refined structure
Hi Sun Your bond length and angle RMSDs look suspiciously high for a 2.8 Ang structure; this usually means that some weighting parameter(s) is/are not optimal. 2.8 Ang is not that far from the point where the optimal choice of structure parameters may be torsion angles instead of Cartesian co-ordinates, in which case the optimal RMSDs for bond lengths and angles would be exactly zero. You should be optimising all weights that have been set arbitrarily by the program (i.e. not obtained from independent experimental sources); this includes not just the X-ray weight but also the B-factor restraint weight(s) (the usual culprit) and the NCS restraint weight(s), as Clemens suggests. I now use the free log(likelihood) to optimise the weights rather than Rfree; this is now printed by newer versions of Refmac, but it's up to you which you believe (the difference in the results may not be significant anyway). Alternatively CNS and phenix.refine have scripts which automatically optimise the weights (against Rfree) for you (maybe one day CCP4/Refmac will have this very useful capability ;-) ). Also I would check your waters manually: don't believe everything the auto-water placing software tells you, i.e. does the density look sensible (at least roughly spherical), is it possible they are something other than water (check for excess density and/or a suspiciously low B factor), do they all H-bond to protein and/or other waters you are confident about? I had a structure at 2.9 Ang where I found only 10 good waters, and that was for 900 residues in the a.u. (maybe it had something to do with the fact that the solvent content and average B were quite high and the data were partially twinned, so the map quality was poor). I'm sure others have opinions on how many waters you expect to find at various resolutions.
HTH -- Ian -Original Message- From: Sun Tang [mailto:[EMAIL PROTECTED] Sent: 04 February 2008 22:32 To: Ian Tickle Cc: CCP4BB@JISCMAIL.AC.UK Subject: RE: [ccp4bb] an over refined structure Hi Ian, Thank you very much for your detailed information. I checked the effect of the weighting term (wa) in CCP4i on R/Rfree. When I used wa=0.01, the values were 0.225/0.277 (FOM=0.799). The values changed to 0.204/0.269 (FOM=0.806) for wa=0.05, 0.195/0.268 (FOM=0.807) for wa=0.1 and 0.186/0.267 (FOM=0.807) for wa=0.2, respectively. It seems that increasing wa decreases both R and Rfree, with R decreasing more than Rfree. Which wa value is the best one in this case? Thank you very much for your valuable help. Best, Sun
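[Taking the wa scan that Sun reports at face value, a quick tabulation (numbers copied from the message; nothing here comes from Refmac itself) shows why chasing the lowest R is misleading: the Rfree-Rwork gap widens steadily as wa increases, even though both R values fall:]

```python
# (R-work, R-free) at each X-ray weight wa, as reported in the thread
scan = {0.01: (0.225, 0.277),
        0.05: (0.204, 0.269),
        0.10: (0.195, 0.268),
        0.20: (0.186, 0.267)}

gaps = []
for wa, (r_work, r_free) in sorted(scan.items()):
    gap = r_free - r_work
    gaps.append(gap)
    print(f"wa={wa:.2f}  Rwork={r_work:.3f}  Rfree={r_free:.3f}  gap={gap:.3f}")

# The gap grows monotonically with wa: the extra freedom mostly fits noise,
# so Rfree (the cross-validated figure) barely moves while Rwork drops.
assert gaps == sorted(gaps)
```

[This is exactly the pattern Ian describes: the cross-validated Rfree, or better the free log-likelihood, should be the quantity used to pick wa, not Rwork.]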
Re: [ccp4bb] an over refined structure
Hi - I don't think there is something necessarily wrong with the values you report. A few questions to see *if* something is wrong are: - as you wrote to Tim you have NCS: do you use NCS restraints? - what is the resolution / B factor of the data? - have the data been checked for twinning? (phenix.xtriage) - is the N-term domain of one copy really invisible (then indeed do remove ...!) - has TLS been used? - did you add waters? (too many?) I guess then we can make better suggestions if something is wrong and, if so, how it's best to fix. A. I refined a structure with Refmac in CCP4i and the R/Rfree is 0.215/0.277. The difference between R and Rfree is too much even though I used 0.01 for the weighting term in the refinement (the default value is 0.3). The RMSD for bond length and bond angle is 0.016 A and 1.7 degrees.
Re: [ccp4bb] an over refined structure
I would agree that the difference is suspiciously high. I. Tickle and others have published analytical expressions for how to estimate the ratio between R and Rfree; just google for tickle rfree to find the references. You can easily achieve a large difference by adding too many waters which just model noise. There may be other reasons, for which more knowledge about the structure is required. Do you have large unmodelled regions, like loops that do not show in the density map? Tim -- Tim Gruene Institut fuer anorganische Chemie Tammannstr. 4 D-37077 Goettingen GPG Key ID = A46BEE1A On Mon, 4 Feb 2008, Sun Tang wrote: Hello All, I refined a structure with Refmac in CCP4i and the R/Rfree is 0.215/0.277. The difference between R and Rfree is too much even though I used 0.01 for the weighting term in the refinement (the default value is 0.3). The RMSD for bond length and bond angle is 0.016 A and 1.7 degrees. What may be wrong with the over-refined structure? What is the reason leading to an over-refined structure? How to avoid it? Best wishes, Sun Tang
Re: [ccp4bb] an over refined structure
Hi Sun Tang Unfortunately there's no such thing as a fixed value for the maximum acceptable Rfree-Rwork difference that applies in all circumstances, because the 'normal' difference depends on a number of factors, mainly the observation/parameter ratio, which depends in turn on the resolution and the solvent content (a greater solvent content means a bigger cell volume which means more reflections for a given number of ordered atoms in the a.u. and hence a bigger obs/param ratio). The Rfree-Rwork difference also depends on Rwork itself (i.e. you tend to get higher values of Rfree-Rwork for higher values of Rwork), so it's better to think in terms of the Rfree/Rwork ratio (which is independent of Rwork). So for example at very high resolution a 'normal' value for Rfree-Rwork might be only 0.02 (so 0.05 which is what many people consider acceptable would actually be unacceptably high), whereas at low resolution it might be 0.1 (so 0.05 would be unacceptably low). Also you need to bear in mind that Rfree tends to have a quite high uncertainty, particularly at low resolution (because it's usually based on a relatively small number of observations), so the deviation has to be quite big (e.g. 3 SU) before it can be considered to be statistically significant. So Rfree needs to be compared not with Rwork at all but with the value of the optimal Rfree/Rwork expected on the basis that the model parameterisation and weighting of X-ray terms and restraints are optimal and the errors in the model have the same effect as the random experimental errors in the data (i.e. a statistical 'null hypothesis'). As Tim just pointed out we tried to do this in our Acta D (1998) papers: there you can compare your observed Rfree/Rwork ratio either with the theoretical value or with the value found for 'typical' structures in the PDB at the same resolution. 
An abnormal Rfree/Rwork ratio could arise from a number of causes, not just over-fitting (I assume that's what you mean by 'over-refinement' - it's not clear to me how a structure can be 'over-refined', since a fundamental requirement of the maximum likelihood method is that the structure is always refined to convergence, and refining beyond that will by definition produce no further statistically significant changes in the parameters). For example the number of parameters being refined may be either too low, or too high (over-fitting), or the values of the weighting parameters may not be appropriate, or there may be something badly wrong with the atomic model (e.g. mistraced chain). Given the values you are reporting I think the latter is very unlikely; possibly you just need to tweak the X-ray and/or restraint weights. HTH Cheers -- Ian -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Sun Tang Sent: 04 February 2008 16:56 To: Boaz Shaanan Cc: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] an over refined structure Hi Boaz, Thank you for your opinions. The resolution is 2.8A and I remember some people may think the structure is over-refined when the difference between Rfree/Rwork is greater than 6%. What do you think the greatest acceptable difference between the two is? Best, Sun Boaz Shaanan [EMAIL PROTECTED] wrote: Hi, Why do you think this structure is over-refined? The Rfree/Rwork difference of 6.2% seems fine, although you didn't mention resolution. If anything, an over-refined structure would show a smaller difference, as far as I know. If all the other criteria (Ramachandran outliers, etc., map) are OK you should just be happy with your structure. Cheers, Boaz - Original Message - From: Sun Tang [EMAIL PROTECTED] Date: Monday, February 4, 2008 18:41 Subject: [ccp4bb] an over refined structure To: CCP4BB@JISCMAIL.AC.UK Hello All, I refined a structure with Refmac in CCP4i and the R/Rfree is 0.215/0.277.
The difference between R and Rfree is too much even though I used 0.01 for the weighting term in the refinement (the default value is 0.3). The RMSD for bond length and bond angle is 0.016 A and 1.7 degrees. What may be wrong with the over-refined structure? What is the reason leading to an over-refined structure? How to avoid it? Best wishes, Sun Tang Boaz Shaanan, Ph.D. Dept. of Life Sciences Ben-Gurion University of the Negev Beer-Sheva 84105 Israel Phone: 972-8-647-2220 ; Fax: 646-1710 Skype: boaz.shaanan
Re: [ccp4bb] an over refined structure
Hi Tim, Thank you for your information and suggestions. There are two independent molecules in the asymmetric unit and one molecule does not have very good density, especially in the N-terminus. Do you think that I should remove that region in the refinement? Best, Sun
Re: [ccp4bb] an over refined structure
Hi Ian, Thank you very much for your detailed information. I checked the effect of the weighting term (wa) in CCP4i on R/Rfree. When I used wa=0.01, the values were 0.225/0.277 (FOM=0.799). The values changed to 0.204/0.269 (FOM=0.806) for wa=0.05, 0.195/0.268 (FOM=0.807) for wa=0.1 and 0.186/0.267 (FOM=0.807) for wa=0.2, respectively. It seems that increasing wa decreases both R and Rfree, with R decreasing more than Rfree. Which wa value is the best one in this case? Thank you very much for your valuable help. Best, Sun
Re: [ccp4bb] an over refined structure
Hi Anastassis, Thank you very much for your suggestions. I answered the questions as follows. I used NCS before rigid body refinement. After that I did not put NCS restraints in the restrained refinement and TLS+restrained refinement because it raised the R/Rfree quite a lot. The resolution is 2.8 A. I did not check twinning. I will do that soon. I used PHASER to solve the structure and the density of the N-domain (~50 a.a.) in one molecule is not good, with a lot of broken density for the backbone. I used TLS in the refinement. I usually used the initial TLS parameters (with only residues in the group, no coordinates for the center) for all the TLS refinement. When I used the refined TLS parameters, the refinement would diverge. I only added about 120 water molecules for the whole structure. I will update the information after I try further refinement. Best wishes, Sun