Re: [ccp4bb] crystallographic confusion
Dear Dale, dear Kay, last year we discussed this kind of problem (Urzhumtseva et al., 2013, Acta Cryst. D69, 1921-1934). Our approach does not tell you where to cut your data or which reflections to accept / reject, but as soon as you have your set of reflections, you can calculate, very formally and very strictly, the effective resolution of ANY diffraction data set, with ANY completeness and ANY composition of measured / missing reflections. For a complete data set, d_effective coincides with the d_high value, but the two differ for incomplete data sets. The article contains a number of examples. With this approach, the discussion of the completeness of the highest-resolution shell becomes irrelevant; one can simply cite the effective resolution. I hope this can help. With best regards, Sacha Urzhumtsev
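[For readers who want the flavour of a count-matching "effective resolution", here is a toy sketch. This is only an illustration of the general idea, not the formula of Urzhumtseva et al. (2013), which should be taken from the paper itself; the function name and the scaling assumption N(d) proportional to 1/d^3 are the editor's.]

```python
# Toy illustration of an "effective resolution" for incomplete data.
# Hypothetical definition (NOT the exact recipe of Urzhumtseva et al., 2013):
# d_eff is the resolution at which a *complete* data set would contain as
# many reflections as were actually observed.  Since the number of
# reflections to resolution d scales roughly as N(d) ~ 1/d**3, matching
# counts gives  d_eff = d_high * (N_possible / N_observed) ** (1/3).

def effective_resolution(d_high, n_observed, n_possible):
    """d_high: nominal high-resolution limit (Angstrom);
    n_observed / n_possible: reflections measured vs. theoretically
    possible out to d_high."""
    if n_observed <= 0 or n_observed > n_possible:
        raise ValueError("need 0 < n_observed <= n_possible")
    completeness = n_observed / n_possible
    return d_high * completeness ** (-1.0 / 3.0)

# A 100% complete 2.0 A set keeps d_eff = 2.0 A:
print(round(effective_resolution(2.0, 50000, 50000), 2))  # 2.0
# Severe incompleteness pushes d_eff well below the nominal limit:
print(round(effective_resolution(2.0, 10000, 50000), 2))  # 3.42
```

Note how this behaves exactly as described above: for complete data d_eff coincides with d_high, and it degrades smoothly as reflections go missing.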
Re: [ccp4bb] crystallographic confusion [SEC=UNCLASSIFIED]
I thought we had a definition for reportable resolution: the resolution at which I/sig(I) = 2 and completeness = 50%. This reported resolution is not to be confused with the data cutoff. We give the software all the scaled and merged data and let it down-weight the weak data. At the edge we might happily have Rmerge = 50%, multiplicity = 1.1, I/sig(I) = 1; the resolution of the edge data should not be reported as the resolution of the data. This reportable resolution is actually useful in refinement: very roughly, you are done when the R-factor equals the reportable resolution divided by 10, i.e. 25% for 2.5 A data, 15% for 1.5 A data. Anthony Duff

-----Original Message----- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Tom Peat Sent: Saturday, 19 April 2014 6:03 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] crystallographic confusion

As has been alluded to, people (and not just crystallographers) are looking for a single number that indicates the quality of a structure. Unfortunately no such number exists, but that doesn't keep people from wanting one. Most crystallographers (I think) now agree that throwing data away is a bad idea and will make maps worse. The real question is not whether to throw data away, but what to call the resolution of a map/structure. A structure refined against data that are ~90% complete at 3.6 Angstrom resolution but only 2% complete at 2.8 Angstrom would be considered to be what? (Just to pull one instance from the PDB.) If we as crystallographers could agree on a definition of our arbitrary resolution number, life would probably be easier for the non-crystallographers (as well as for the crystallographers in some instances, particularly when reviewing papers). cheers, tom

Tom Peat, Biophysics Group, CSIRO, CMSE, 343 Royal Parade, Parkville, VIC, 3052; +613 9662 7304; +614 57 539 419; tom.p...@csiro.au
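[Duff's two cutoffs can be applied mechanically to per-shell statistics. Below is an editorial sketch only; the shell numbers are invented for illustration and the function is not taken from any named program.]

```python
# Sketch of the "reportable resolution" rule quoted above: the finest
# resolution shell that still has I/sig(I) >= 2 AND completeness >= 50%.
# Shell statistics below are invented for illustration.

def reportable_resolution(shells, min_i_over_sig=2.0, min_completeness=0.5):
    """shells: list of (d_min, i_over_sig, completeness) tuples ordered
    from low to high resolution (decreasing d_min).  Returns the d_min
    of the finest shell passing both criteria, or None."""
    best = None
    for d_min, i_sig, comp in shells:
        if i_sig >= min_i_over_sig and comp >= min_completeness:
            best = d_min
        else:
            break  # stop at the first shell that fails either cutoff
    return best

shells = [
    (3.0, 15.2, 0.99),
    (2.5,  8.1, 0.97),
    (2.2,  3.4, 0.90),
    (2.0,  2.1, 0.60),  # last shell passing both criteria
    (1.9,  1.0, 0.20),  # I/sig(I) < 2: edge data, kept but not reported
]
print(reportable_resolution(shells))  # 2.0
```

In Duff's scheme the 1.9 A edge data would still be given to the refinement program and simply down-weighted; only the reported number stops at 2.0 A.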
Re: [ccp4bb] crystallographic confusion
Dear Alexandre, I read your paper and it seems very relevant to the present discussion (and to future referee comments). Have the criteria that you propose for determining the effective resolution been implemented in any program or crystallographic suite, in a way that lets us read in a data set and get out the effective resolution based on your criteria? Cheers, Boaz

Boaz Shaanan, Ph.D., Dept. of Life Sciences, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel. E-mail: bshaa...@bgu.ac.il; Phone: 972-8-647-2220; Skype: boaz.shaanan; Fax: 972-8-647-2992 or 972-8-646-1710
Re: [ccp4bb] crystallographic confusion [SEC=UNCLASSIFIED]
There are three places in a PDB file where resolution is defined. Unfortunately, by current conventions I believe they are all required to show the same value. If one of them could be redefined to hold the effective resolution, with a comment explaining how it was arrived at, it would take the pressure off the resolution cutoff, which currently serves double duty as the principal indicator of quality. I guess you could entitle your paper "2.2 A structure of XYZ" even if the PDB file shows the resolution to be 1.92 with 22% completeness in the last shell; that might appease some reviewers but make problems with others. eab
Re: [ccp4bb] crystallographic confusion
Hello, "I read your paper and it seems very relevant to the present discussion (and future referee comments). Have the criteria that you propose for determining the effective resolution been implemented in any program or crystallographic suite in a way that we can read in a data set and get out the effective resolution based on your criteria?" I'm happy to add it as something like phenix.effective_resolution. I'll see what it takes (I should re-read the paper!). Pavel
Re: [ccp4bb] crystallographic confusion
On Saturday, 19 April 2014 02:52:38 PM, Zbyszek Otwinowski wrote: Why not improve effective resolution to include consideration of solvent content? Due to the constant packing density of proteins, it would become a synonym (via an appropriate transformation) for the number of observations per modelled atom.

Following that line of thought, perhaps reporting the observation/parameter ratio would provide a more informative number than resolution. Of course that leads to a morass of argumentation about whether to modify it by the number and class of restraints used during refinement. Ethan

-- Biomolecular Structure Center, K-428 Health Sciences Bldg, MS 357742, University of Washington, Seattle 98195-7742
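[The observation/parameter ratio suggested above is easy to sketch. The assumptions below are illustrative editorial choices, and are exactly the debatable points mentioned in the message: 4 parameters per atom (x, y, z and an isotropic B), and restraints simply added to the observation count.]

```python
# Sketch of an observation-to-parameter ratio as a quality indicator.
# Assumptions (illustrative only): 4 refined parameters per atom
# (x, y, z, isotropic B); restraints counted as extra "observations",
# which is one of the contested modelling choices noted in the thread.

def obs_per_parameter(n_reflections, n_atoms, n_restraints=0,
                      params_per_atom=4):
    observations = n_reflections + n_restraints
    parameters = n_atoms * params_per_atom
    return observations / parameters

# A ~2 A data set on a small protein, without and with restraints counted:
print(round(obs_per_parameter(25000, 2000), 2))                    # 3.12
print(round(obs_per_parameter(25000, 2000, n_restraints=8000), 2)) # 4.12
```

The gap between the two numbers illustrates the "morass": whether the ratio looks comfortable or marginal depends heavily on how restraints are counted.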
Re: [ccp4bb] crystallographic confusion
Dear Kay, Arguably, the resolution of a structure is the most important number to look at; it is definitely the first to be examined, and often the only one examined by non-structural biologists. Since this number conveys so much about the quality/reliability of the structure, it is not surprising that we need to get this one parameter right. Let us examine a hypothetical situation in which a data set has 20% completeness in the 2.2-2.0 A resolution shell. Is this a 2.0 A resolution structure? While you make a sound argument that including those data may result in a better refined model (more observations, more restraints), I would not consider that model the same quality as one refined against a data set that has 90% completeness in that resolution shell. As I see it, there are two issues here: first, whether to include such data in refinement. I am not sure whether low completeness (especially if not random) can be detrimental to a correct model, but I will let others weigh in on that. The second question is where to declare the resolution limit of a particular data set. To my mind, high completeness (the term "high" needs a precise definition) better describes the true resolution limit of the diffraction, and with it what I can conclude about the quality of the refined model. My two cents. Arnon Lavie

On Fri, April 18, 2014, Kay Diederichs wrote: Hi everybody, since we seem to be having a little Easter discussion about crystallographic statistics anyway, I would like to bring up one more topic. A recent email sent to me said: "Another referee complained that the completeness in that bin was too low at 85%". My answer was that I consider the referee's assertion as indicating a (unfortunately not untypical) case of severe statistical confusion. Actually, there is no reason at all to discard a resolution shell just because it is not complete; and what would the cutoff be, if there were one? What constitutes "too low"? The benefit of also including incomplete resolution shells is that every reflection constitutes a restraint in refinement (and thus reduces overfitting) and contributes its little bit of detail to the electron density map. Some people may be misled by a wrong understanding of the cats-and-ducks examples by Kevin Cowtan: omitting further data from maps makes Fourier ripples/artifacts worse, not better. The unfortunate consequence of the referee's opinion (and its enforcement in papers) is that structures re-refined against the truncated data are _worse_ than those refined against the original data that included the incomplete resolution shells. So could we as a community please abandon this inappropriate and unjustified practice, of course after proper discussion here? Kay
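[Kevin Cowtan's duck-and-cat pictures make this point in 2D. A minimal 1D analogue (toy setup, numpy only, editorial illustration) shows that a brick-wall truncation of the Fourier coefficients introduces ripples around sharp features rather than cleanly removing detail:]

```python
import numpy as np

# 1D analogue of resolution truncation: a sharp "density" feature
# synthesised from its Fourier coefficients, with and without the
# high-frequency terms.  Toy illustration only.

n = 256
density = np.zeros(n)
density[120:137] = 1.0  # a box-shaped "atom"

coeffs = np.fft.fft(density)

# Brick-wall cutoff: zero every coefficient above 1/8 of the band.
cut = coeffs.copy()
freq = np.fft.fftfreq(n)
cut[np.abs(freq) > 0.0625] = 0.0

full = np.fft.ifft(coeffs).real
truncated = np.fft.ifft(cut).real

# The full synthesis reproduces the density; the truncated one
# oscillates (Gibbs ripples) around the edges of the box.
print(np.allclose(full, density))        # True
ripple = np.max(np.abs(truncated - density))
print(ripple > 0.05)                     # True: clearly visible ripples
```

The same mechanism operates in 3D electron density maps: the sharper the truncation in reciprocal space, the stronger the series-termination ripples around the features one actually cares about.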
Re: [ccp4bb] crystallographic confusion
I see no problem with saying that the model was refined against every spot on the detector that the data reduction program said was observed (and I realize there is argument about this) but declaring that the resolution of the model is a number based on the traditional criteria. This solution allows the best possible model to be constructed, and the buyer is still allowed to make quality judgements the same way as always. Dale Tronrud
Re: [ccp4bb] crystallographic confusion
Dear Arnon et al: My understanding of the Shannon/Nyquist sampling theorem is admittedly extremely rudimentary, but I think aliasing can result if an arbitrary brick-wall resolution cut-off is applied to the data. So let's say the real data extend to 2.0 Å resolution. Applying a 2.2 Å cutoff will result in aliasing artifacts in the electron density map corresponding to an outer-shell reciprocal-space volume equal but opposite to the cut-out data. The alternative, which is to process and keep all the measured reflections, should help to minimize this; an effective resolution can then be calculated and quoted. This becomes a significant problem with nucleic acids and their complexes, which often diffract with significant anisotropy. The idea that 85% completeness in the outer shell should dictate its rejection seems rather surprising and arbitrary; the aliasing artifacts in that case would probably be significant. The map image quality, after all, is what we are after, not beautiful Table 1 statistics. Bill

William G. Scott, Professor, Department of Chemistry and Biochemistry and The Center for the Molecular Biology of RNA, University of California at Santa Cruz, Santa Cruz, California 95064 USA. http://scottlab.ucsc.edu/scottlab/