Re: [ccp4bb] I/sigmaI of 3.0 rule
Dear Boaz, You are quite correct, 'latter' and 'former' need to be switched in my email. Apologies to CCP4bb for the confusion caused! Best wishes, George Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-22582 On Sun, 13 Mar 2011, Boaz Shaanan wrote: Dear George, While I agree with you I wonder whether in this statement: ...The practice of quoting R-values both for all data and for F > 4sigma(F) seems to me to be useful. For example if the latter is much larger than the former, maybe you are including a lot of weak data... Shouldn't it be: ...former (i.e. R for all data) is much larger than the latter (i.e. R for F > 4sigma(F))... ? Just wondering, although it could be my late night misunderstanding. Best regards, Boaz Boaz Shaanan, Ph.D. Dept. of Life Sciences Ben-Gurion University of the Negev Beer-Sheva 84105 Israel Phone: 972-8-647-2220 Skype: boaz.shaanan Fax: 972-8-647-2992 or 972-8-646-1710 From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] On Behalf Of George M. Sheldrick [gshe...@shelx.uni-ac.gwdg.de] Sent: Sunday, March 13, 2011 12:11 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule Dear James, I'm a bit puzzled by your negative R-values and unstable behavior. In practice, whether we refine against intensity or against |F|, it is traditional to quote an R-factor (called R1 in small molecule crystallography) R = Sum||Fo|-|Fc|| / Sum|Fo|. Reflections that have negative measured intensities are either given F=0 or (e.g. using TRUNCATE) F is set to a small positive value, both of which avoid having to take the square root of a negative number, which most computers don't like doing. Then the 'divide by zero' catastrophe and negative R-values cannot happen because Sum|Fo| is always significantly greater than zero, and in my experience there is no problem in calculating an R-value even if the data are complete noise. The practice of quoting R-values both for all data and for F > 4sigma(F) seems to me to be useful. For example if the latter is much larger than the former, maybe you are including a lot of weak data. Similarly in calculating merging R-values, most programs replace negative intensities by zero, again avoiding the problems you describe. Best wishes, George Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-22582 On Sat, 12 Mar 2011, James Holton wrote: The fundamental mathematical problem of using an R statistic on data with I/sigma(I) < 3 is that the assumption that the fractional deviates (I - <I>)/<I> obey a Gaussian distribution breaks down. And when that happens, the R calculation itself becomes unstable, giving essentially random R values. Therefore, including weak data in R calculations is equivalent to calculating R with a 3-sigma cutoff, and then adding a random number to the R value. Now, random data is one thing, but if the statistic used to evaluate the data quality is itself random, then it is not what I would call useful. Since I am not very good at math, I always find myself approaching statistics by generating long lists of random numbers, manipulating them in some way, and then graphing the results. For graphing Rmerge vs I/sigma(I), one does find that Bernhard's rule of Rmerge = 0.8/(I/sigma(I)) generally applies, but only for I/sigma(I) that is >= 3.
It gets better with high multiplicity, but even with m=100, the Rmerge values for the I/sigma(I) < 1 points are all over the place. This is true even if you average the value of Rmerge over a million random number seeds. In fact, one must do so much averaging, that I start to worry about the low-order bits of common random number generators. I have attached images of these Rmerge vs I/sigma graphs. The error bars reflect the rms deviation from the average of a large number of Rmerge values (different random number seeds). The missing values are actually points where the average Rmerge in 60 trials (m=3) was still negative. The reason for this noisy R factor problem becomes clear if you consider the limiting case where the true intensity is zero, and make a histogram of (I - <I>)/<I>. It is not a Gaussian. Rather, it is the Gaussian's evil stepsister: the Lorentzian (or Cauchy distribution). This distribution may look a lot like a Gaussian, but it has longer tails, and these tails give it the weird statistical property of having an undefined mean value. This is counterintuitive! Because you can clearly just look at the histogram and see that it has a central peak (at zero), but if you generate a million
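A minimal sketch of the conventional R1 George describes above, assuming merged intensities are already at hand as NumPy arrays (the names i_obs and f_calc are illustrative, not taken from any particular program); negative measured intensities are clamped to zero before taking the square root, so the denominator stays safely positive even for pure-noise data:

import numpy as np

def r1(i_obs, f_calc):
    # Conventional R1 = Sum||Fo|-|Fc|| / Sum|Fo|, with I < 0 treated as F = 0.
    f_obs = np.sqrt(np.clip(i_obs, 0.0, None))
    return np.abs(f_obs - np.abs(f_calc)).sum() / f_obs.sum()

# Even complete noise gives a finite (if large) R1, because Sum|Fo| > 0.
rng = np.random.default_rng(0)
i_noise = rng.normal(0.0, 1.0, 10000)   # "measured" intensities centred on zero
f_model = np.full(10000, 1.0)           # arbitrary model amplitudes
print(r1(i_noise, f_model))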
Re: [ccp4bb] I/sigmaI of 3.0 rule
You also always have to consider why you are doing this calculation - usually to satisfy a sceptical and possibly ill-informed referee. A major reason for doing this is to justify including an outer resolution shell of data (see this BB passim), and for this I have come to prefer the random half-dataset correlation coefficient in shells. A CC has a more straightforward distribution than an R-factor (though not entirely without problems). It is independent of the SD estimates, and easy to understand. Phil The fundamental mathematical problem of using an R statistic on data with I/sigma(I) < 3 is that the assumption that the fractional deviates (I - <I>)/<I> obey a Gaussian distribution breaks down. And when that happens, the R calculation itself becomes unstable, giving essentially random R values. Therefore, including weak data in R calculations is equivalent to calculating R with a 3-sigma cutoff, and then adding a random number to the R value. Now, random data is one thing, but if the statistic used to evaluate the data quality is itself random, then it is not what I would call useful. Since I am not very good at math, I always find myself approaching statistics by generating long lists of random numbers, manipulating them in some way, and then graphing the results. For graphing Rmerge vs I/sigma(I), one does find that Bernhard's rule of Rmerge = 0.8/(I/sigma(I)) generally applies, but only for I/sigma(I) that is >= 3. It gets better with high multiplicity, but even with m=100, the Rmerge values for the I/sigma(I) < 1 points are all over the place. This is true even if you average the value of Rmerge over a million random number seeds. In fact, one must do so much averaging, that I start to worry about the low-order bits of common random number generators. I have attached images of these Rmerge vs I/sigma graphs. The error bars reflect the rms deviation from the average of a large number of Rmerge values (different random number seeds). The missing values are actually points where the average Rmerge in 60 trials (m=3) was still negative. The reason for this noisy R factor problem becomes clear if you consider the limiting case where the true intensity is zero, and make a histogram of (I - <I>)/<I>. It is not a Gaussian. Rather, it is the Gaussian's evil stepsister: the Lorentzian (or Cauchy distribution). This distribution may look a lot like a Gaussian, but it has longer tails, and these tails give it the weird statistical property of having an undefined mean value. This is counterintuitive! Because you can clearly just look at the histogram and see that it has a central peak (at zero), but if you generate a million Lorentzian-distributed random numbers and take the average value, you will not get anything close to zero. Try it! You can generate a Lorentzian deviate from a uniform deviate like this: tan(pi*(rand()-0.5)), where rand() makes a random number from 0 to 1. Now, it is not too hard to understand how R could blow up when the true spot intensities are all zero. After all, as <I> approaches zero, the ratio (I - <I>) / <I> approaches a divide-by-zero problem. But what about when I/sigma(I) = 1? Or 2? If you look at these histograms, you find that they are a cross between a Gaussian and a Lorentzian (the so-called Voigt function), and the histogram does not become truly Gaussian-looking until I/sigma(I) = 3. At this point, the R factor follows Bernhard's rule quite well, even with multiplicities as low as 2 or 3.
This was the moment when I realized that the early crystallographers who first decided to use this 3-sigma cutoff were smarter than I am. Now, you can make a Voigt function (or even a Lorentzian) look more like a Gaussian by doing something called outlier rejection, but it is hard to rationalize why the outliers are being rejected. Especially in a simulation! Then again, the silly part of all this is that all we really want is the middle of the histogram of (I - <I>)/<I>. In fact, if you just pick the most common Rmerge, you would get a much better estimate of the true Rmerge in a given resolution bin than you would by averaging a hundred times more data. Such procedures are called robust estimators in statistics, and the robust estimator equivalents to the average and the rms deviation from the average are the median and the median absolute deviation from the median. If you make a list of Lorentzian-random numbers as above, and compute the median, you will get a value very close to zero, even with modest multiplicity! And the median absolute deviation from the median rapidly converges to 1, which matches the half width at half maximum of the histogram quite nicely. So, what are the practical implications of this? Perhaps instead of the average Rmerge in each bin we should be looking at the median Rmerge? This will be the same as the average for the cases where I/sigma(I) > 3, but still be well
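The robust-estimator behaviour is easy to check with a few lines of NumPy, using exactly the tan(pi*(rand()-0.5)) recipe quoted above (an illustrative sketch, not part of any crystallographic package):

import numpy as np

rng = np.random.default_rng(1)
x = np.tan(np.pi * (rng.random(1_000_000) - 0.5))   # Lorentzian (Cauchy) deviates

print("mean  :", x.mean())              # does not settle near zero, however many deviates
print("median:", np.median(x))          # very close to zero
mad = np.median(np.abs(x - np.median(x)))
print("MAD   :", mad)                   # converges to ~1, the half width at half maximum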
Re: [ccp4bb] I/sigmaI of 3.0 rule
Dear James, I'm a bit puzzled by your negative R-values and unstable behavior. In practice, whether we refine against intensity or against |F|, it is traditional to quote an R-factor (called R1 in small molecule crystallography) R = Sum||Fo|-|Fc|| / Sum|Fo|. Reflections that have negative measured intensities are either given F=0 or (e.g. using TRUNCATE) F is set to a small positive value, both of which avoid having to take the square root of a negative number, which most computers don't like doing. Then the 'divide by zero' catastrophe and negative R-values cannot happen because Sum|Fo| is always significantly greater than zero, and in my experience there is no problem in calculating an R-value even if the data are complete noise. The practice of quoting R-values both for all data and for F > 4sigma(F) seems to me to be useful. For example if the latter is much larger than the former, maybe you are including a lot of weak data. Similarly in calculating merging R-values, most programs replace negative intensities by zero, again avoiding the problems you describe. Best wishes, George Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-22582 On Sat, 12 Mar 2011, James Holton wrote: The fundamental mathematical problem of using an R statistic on data with I/sigma(I) < 3 is that the assumption that the fractional deviates (I - <I>)/<I> obey a Gaussian distribution breaks down. And when that happens, the R calculation itself becomes unstable, giving essentially random R values. Therefore, including weak data in R calculations is equivalent to calculating R with a 3-sigma cutoff, and then adding a random number to the R value. Now, random data is one thing, but if the statistic used to evaluate the data quality is itself random, then it is not what I would call useful. Since I am not very good at math, I always find myself approaching statistics by generating long lists of random numbers, manipulating them in some way, and then graphing the results. For graphing Rmerge vs I/sigma(I), one does find that Bernhard's rule of Rmerge = 0.8/(I/sigma(I)) generally applies, but only for I/sigma(I) that is >= 3. It gets better with high multiplicity, but even with m=100, the Rmerge values for the I/sigma(I) < 1 points are all over the place. This is true even if you average the value of Rmerge over a million random number seeds. In fact, one must do so much averaging, that I start to worry about the low-order bits of common random number generators. I have attached images of these Rmerge vs I/sigma graphs. The error bars reflect the rms deviation from the average of a large number of Rmerge values (different random number seeds). The missing values are actually points where the average Rmerge in 60 trials (m=3) was still negative. The reason for this noisy R factor problem becomes clear if you consider the limiting case where the true intensity is zero, and make a histogram of (I - <I>)/<I>. It is not a Gaussian. Rather, it is the Gaussian's evil stepsister: the Lorentzian (or Cauchy distribution). This distribution may look a lot like a Gaussian, but it has longer tails, and these tails give it the weird statistical property of having an undefined mean value. This is counterintuitive!
Because you can clearly just look at the histogram and see that it has a central peak (at zero), but if you generate a million Lorentzian-distributed random numbers and take the average value, you will not get anything close to zero. Try it! You can generate a Lorentzian deviate from a uniform deviate like this: tan(pi*(rand()-0.5)), where rand() makes a random number from 0 to 1. Now, it is not too hard to understand how R could blow up when the true spot intensities are all zero. After all, as <I> approaches zero, the ratio (I - <I>) / <I> approaches a divide-by-zero problem. But what about when I/sigma(I) = 1? Or 2? If you look at these histograms, you find that they are a cross between a Gaussian and a Lorentzian (the so-called Voigt function), and the histogram does not become truly Gaussian-looking until I/sigma(I) = 3. At this point, the R factor follows Bernhard's rule quite well, even with multiplicities as low as 2 or 3. This was the moment when I realized that the early crystallographers who first decided to use this 3-sigma cutoff were smarter than I am. Now, you can make a Voigt function (or even a Lorentzian) look more like a Gaussian by doing something called outlier rejection, but it is hard to rationalize why the outliers are being rejected. Especially in a simulation! Then again, the silly part of all this is that all we really want is the middle of the histogram of (I - <I>)/<I>. In fact, if you just pick the most common Rmerge, you would get a much better estimate of the true Rmerge in a given resolution bin than
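The heavy-tailed behaviour described above is straightforward to reproduce with a toy simulation (my own sketch, assuming Gaussian measurement errors of unit sigma and a multiplicity of 4; it is not derived from the attached figures): the fraction of extreme fractional deviates collapses as the true I/sigma(I) rises towards 3.

import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 200_000    # multiplicity and number of simulated reflections

def frac_dev(true_i_over_sigma):
    # Fractional deviates (I - <I>)/<I> for Gaussian errors of unit sigma.
    obs = rng.normal(true_i_over_sigma, 1.0, size=(n, m))
    mean = obs.mean(axis=1, keepdims=True)
    return ((obs - mean) / mean).ravel()

for snr in (0.0, 1.0, 3.0):
    d = frac_dev(snr)
    tails = np.mean(np.abs(d) > 5)   # fraction of deviates beyond +/- 5
    print(f"true I/sigma = {snr}: {100 * tails:.2f}% of deviates beyond |5|")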
Re: [ccp4bb] I/sigmaI of 3.0 rule
Hi James, May I just offer a short counter-argument to your case for not including weak reflections in the merging residuals? Unlike many people I rather like Rmerge, not because it tells you how good the data are, but because it gives you a clue as to how well the unmerged measurements agree with one another. It's already been mentioned on this thread that Rmerge is ~ 0.8 / <I/sigma>, which means that the inverse is also true - an Rmerge of 0.8 indicates that the average measurement in the shell has an I/sigma of ~ 1 (presuming there are sufficient multiple measurements - if the multiplicity is 3 or so this can be nonsense). This does not depend on the error model or the multiplicity. It just talks about the average. Now, if we exclude all measurements with an I/sigma of less than three we have no idea of how strong the reflections in the shell are on average. We're just top-slicing the good reflections and asking how well they agree. Well, with an I/sigma > 3 I would hope they agree rather well if your error model is reasonable. It would suddenly become rare to see an Rmerge > 0.3 in the outer shell. I like Rpim. It tells you how good the average measurement should be provided you have not too much radiation damage. However, without Rmerge I can't get a real handle on how well the measurements agree. Personally, what I would like to see is the full contents of the Scala log file available as graphs along with Rd from xdsstat and some other choice statistics so you can get a relatively complete picture, however I appreciate that this is unrealistic :o) Just my 2c. Cheerio, Graeme On 8 March 2011 20:07, James Holton jmhol...@lbl.gov wrote: Although George does not mention anything about data reduction programs, I take from his description that common small-molecule data processing packages (SAINT, others?), have also been modernized to record all data (no I/sigmaI < 2 or 3 cutoff). I agree with him that this is a good thing! And it is also a good thing that small-molecule refinement programs use all data. I just don't think it is a good idea to use all data in R factor calculations. Like Ron, I will probably be dating myself when I say that when I first got into the macromolecular crystallography business, it was still commonplace to use a 2-3 sigma spot intensity cutoff. In fact, this is the reason why the PDB wants to know your completeness in the outermost resolution shell (in those days, the outer resolution was defined by where completeness drops to ~80% after the 3 sigma spot cutoff). My experience with this, however, was brief, as the maximum-likelihood revolution was just starting to take hold, and the denzo manual specifically stated that only bad people use sigma cutoffs > -3.0. Nevertheless, like many crystallographers from this era, I have fond memories of the REALLY low R factors you can get by using this arcane and now reviled practice. Rsym values of 1-2% were common. It was only recently that I learned enough about statistics to understand the wisdom of my ancestors and that a 3-sigma cutoff is actually the right thing to do if you want to measure a fractional error (like an R factor). That is all I'm saying. -James Holton MAD Scientist On 3/6/2011 2:50 PM, Ronald E Stenkamp wrote: My small molecule experience is old enough (maybe 20 years) that I doubt if it's even close to representing current practices (best or otherwise).
Given George's comments, I suspect (and hope) that less-than cutoffs are historical artifacts at this point, kept around in software for making comparisons with older structure determinations. But a bit of scanning of Acta papers and others might be necessary to confirm that. Ron On Sun, 6 Mar 2011, James Holton wrote: Yes, I would classify anything with I/sigmaI 3 as weak. And yes, of course it is possible to get weak spots from small molecule crystals. After all, there is no spot so strong that it cannot be defeated by a sufficient amount of background! I just meant that, relatively speaking, the intensities diffracted from a small molecule crystal are orders of magnitude brighter than those from a macromolecular crystal of the same size, and even the same quality (the 1/Vcell^2 term in Darwin's formula). I find it interesting that you point out the use of a 2 sigma(I) intensity cutoff for small molecule data sets! Is this still common practice? I am not a card-carrying small molecule crystallographer, so I'm not sure. However, if that is the case, then by definition there are no weak intensities in the data set. And this is exactly the kind of data you want for least-squares refinement targets and computing % error quality metrics like R factors. For likelihood targets, however, the weak data are actually a powerful restraint. -James Holton MAD Scientist On 3/6/2011 11:22 AM, Ronald E Stenkamp wrote: Could you please expand on your statement that small-molecule data has
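Graeme's reading of the Rmerge ~ 0.8/<I/sigma> rule of thumb can be checked with a small simulation (a toy model with Gaussian errors and generous multiplicity, not output from Scala or any other data-reduction program):

import numpy as np

rng = np.random.default_rng(3)

def rmerge(true_i, sigma=1.0, m=10, n=100_000):
    # Rmerge = Sum|I_j - <I>| / Sum I_j for n reflections each measured m times.
    obs = rng.normal(true_i, sigma, size=(n, m))
    mean = obs.mean(axis=1, keepdims=True)
    return np.abs(obs - mean).sum() / obs.sum()

for i_over_sigma in (20, 10, 5, 3, 1):
    print(i_over_sigma, round(rmerge(i_over_sigma), 3), "vs 0.8/(I/sigma) =",
          round(0.8 / i_over_sigma, 3))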
Re: [ccp4bb] I/sigmaI of 3.0 rule
Although George does not mention anything about data reduction programs, I take from his description that common small-molecule data processing packages (SAINT, others?), have also been modernized to record all data (no I/sigmaI 2 or 3 cutoff). I agree with him that this is a good thing! And it is also a good thing that small-molecule refinement programs use all data. I just don't think it is a good idea to use all data in R factor calculations. Like Ron, I will probably be dating myself when I say that when I first got into the macromolecular crystallography business, it was still commonplace to use a 2-3 sigma spot intensity cutoff. In fact, this is the reason why the PDB wants to know your completeness in the outermost resolution shell (in those days, the outer resolution was defined by where completeness drops to ~80% after the 3 sigma spot cutoff). My experience with this, however, was brief, as the maximum-likelihood revolution was just starting to take hold, and the denzo manual specifically stated that only bad people use sigma cutoffs -3.0. Nevertheless, like many crystallographers from this era, I have fond memories of the REALLY low R factors you can get by using this arcane and now reviled practice. Rsym values of 1-2% were common. It was only recently that I learned enough about statistics to understand the wisdom of my ancestors and that a 3-sigma cutoff is actually the right thing to do if you want to measure a fractional error (like an R factor). That is all I'm saying. -James Holton MAD Scientist On 3/6/2011 2:50 PM, Ronald E Stenkamp wrote: My small molecule experience is old enough (maybe 20 years) that I doubt if it's even close to representing current practices (best or otherwise). Given George's comments, I suspect (and hope) that less-than cutoffs are historical artifacts at this point, kept around in software for making comparisons with older structure determinations. But a bit of scanning of Acta papers and others might be necessary to confirm that. Ron On Sun, 6 Mar 2011, James Holton wrote: Yes, I would classify anything with I/sigmaI 3 as weak. And yes, of course it is possible to get weak spots from small molecule crystals. After all, there is no spot so strong that it cannot be defeated by a sufficient amount of background! I just meant that, relatively speaking, the intensities diffracted from a small molecule crystal are orders of magnitude brighter than those from a macromolecular crystal of the same size, and even the same quality (the 1/Vcell^2 term in Darwin's formula). I find it interesting that you point out the use of a 2 sigma(I) intensity cutoff for small molecule data sets! Is this still common practice? I am not a card-carrying small molecule crystallographer, so I'm not sure. However, if that is the case, then by definition there are no weak intensities in the data set. And this is exactly the kind of data you want for least-squares refinement targets and computing % error quality metrics like R factors. For likelihood targets, however, the weak data are actually a powerful restraint. -James Holton MAD Scientist On 3/6/2011 11:22 AM, Ronald E Stenkamp wrote: Could you please expand on your statement that small-molecule data has essentially no weak spots.? The small molecule data sets I've worked with have had large numbers of unobserved reflections where I used 2 sigma(I) cutoffs (maybe 15-30% of the reflections). Would you consider those weak spots or not? 
Ron On Sun, 6 Mar 2011, James Holton wrote: I should probably admit that I might be indirectly responsible for the resurgence of this I/sigma 3 idea, but I never intended this in the way described by the original poster's reviewer! What I have been trying to encourage people to do is calculate R factors using only hkls for which the signal-to-noise ratio is 3. Not refinement! Refinement should be done against all data. I merely propose that weak data be excluded from R-factor calculations after the refinement/scaling/mergeing/etc. is done. This is because R factors are a metric of the FRACTIONAL error in something (aka a % difference), but a % error is only meaningful when the thing being measured is not zero. However, in macromolecular crystallography, we tend to measure a lot of zeroes. There is nothing wrong with measuring zero! An excellent example of this is confirming that a systematic absence is in fact absent. The sigma on the intensity assigned to an absent spot is still a useful quantity, because it reflects how confident you are in the measurement. I.E. a sigma of 10 vs 100 means you are more sure that the intensity is zero. However, there is no R factor for systematic absences. How could there be! This is because the definition of % error starts to break down as the true spot intensity gets weaker, and it becomes completely meaningless when the true intensity reaches zero.
Re: [ccp4bb] I/sigmaI of 3.0 rule
I should probably admit that I might be indirectly responsible for the resurgence of this I/sigma > 3 idea, but I never intended this in the way described by the original poster's reviewer! What I have been trying to encourage people to do is calculate R factors using only hkls for which the signal-to-noise ratio is > 3. Not refinement! Refinement should be done against all data. I merely propose that weak data be excluded from R-factor calculations after the refinement/scaling/merging/etc. is done. This is because R factors are a metric of the FRACTIONAL error in something (aka a % difference), but a % error is only meaningful when the thing being measured is not zero. However, in macromolecular crystallography, we tend to measure a lot of zeroes. There is nothing wrong with measuring zero! An excellent example of this is confirming that a systematic absence is in fact absent. The sigma on the intensity assigned to an absent spot is still a useful quantity, because it reflects how confident you are in the measurement. I.e. a sigma of 10 vs 100 means you are more sure that the intensity is zero. However, there is no R factor for systematic absences. How could there be! This is because the definition of % error starts to break down as the true spot intensity gets weaker, and it becomes completely meaningless when the true intensity reaches zero. Historically, I believe the widespread use of R factors came about because small-molecule data has essentially no weak spots. With the exception of absences (which are not used in refinement), spots from salt crystals are strong all the way out to the edge of the detector (even out to the limiting sphere, which is defined by the x-ray wavelength). So, when all the data are strong, a % error is an easy-to-calculate quantity that actually describes the sigmas of the data very well. That is, sigma(I) of strong spots tends to be dominated by things like beam flicker, spindle stability, shutter accuracy, etc. All these usually add up to ~5% error, and indeed even the Braggs could typically get +/-5% for the intensity of the diffracted rays they were measuring. Things like Rsym were therefore created to check that nothing funny happened in the measurement. For similar reasons, the quality of a model refined against all-strong data is described very well by a % error, and this is why the refinement R factors rapidly became popular. Most people intuitively know what you mean if you say that your model fits the data to within 5%. In fact, a widely used criterion for the correctness of a small molecule structure is that the refinement R factor must be LOWER than Rsym. This is equivalent to saying that your curve (model) fit your data to within experimental error. Unfortunately, this has never been the case for macromolecular structures! The problem with protein crystals, of course, is that we have lots of weak data. And by weak, I don't mean bad! Yes, it is always nicer to have more intense spots, but there is nothing shameful about knowing that certain intensities are actually very close to zero. In fact, from the point of view of the refinement program, isn't describing some high-angle spot as "zero, plus or minus 10" better than "I have no idea"? Indeed, several works mentioned already, as well as the free lunch algorithm, have demonstrated that these zero data can actually be useful, even if it is well beyond the resolution limit. So, what do we do?
I see no reason to abandon R factors, since they have such a long history and give us continuity of criteria going back almost a century. However, I also see no reason to punish ourselves by including lots of zeroes in the denominator. In fact, using weak data in an R factor calculation defeats their best feature. R factors are a very good estimate of the fractional component of the total error, provided they are calculated with strong data only. Of course, with strong and weak data, the best thing to do is compare the model-data disagreement with the magnitude of the error. That is, compare |Fobs-Fcalc| to sigma(Fobs), not Fobs itself. Modern refinement programs do this! And I say the more data the merrier. -James Holton MAD Scientist On 3/4/2011 5:15 AM, Marjolein Thunnissen wrote: hi Recently on a paper I submitted, it was the editor of the journal who wanted exactly the same thing. I never argued with the editor about this (should have maybe), but it could be one cause of the epidemic that Bart Hazes saw best regards Marjolein On Mar 3, 2011, at 12:29 PM, Roberto Battistutta wrote: Dear all, I got a reviewer comment that indicates the need to refine the structures at an appropriate resolution (I/sigmaI of > 3.0), and re-submit the revised coordinate files to the PDB for validation. In the manuscript I present some crystal structures determined by molecular replacement using the same protein in a different space
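As a toy illustration of that last point, here is a sketch (made-up arrays, not the output of any refinement program) contrasting an R factor, which inflates once weak reflections enter the denominator, with a sigma-scaled residual, which stays comparable for strong and weak data:

import numpy as np

rng = np.random.default_rng(4)
f_true = np.concatenate([rng.uniform(5, 50, 5000),   # "strong" amplitudes
                         rng.uniform(0, 1, 5000)])   # "weak" amplitudes near zero
sigma = np.ones_like(f_true)
f_obs = np.abs(f_true + rng.normal(0, sigma))        # noisy measurements
f_calc = f_true                                      # pretend the model is perfect

def r_factor(sel):
    return np.abs(f_obs[sel] - f_calc[sel]).sum() / f_obs[sel].sum()

strong = f_obs > 3 * sigma
print("R, strong only:", round(r_factor(strong), 3))
print("R, all data   :", round(r_factor(slice(None)), 3))
z = np.abs(f_obs - f_calc) / sigma                   # sigma-scaled residual
print("<|Fo-Fc|/sigma> strong:", round(z[strong].mean(), 3), " all:", round(z.mean(), 3))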
Re: [ccp4bb] I/sigmaI of 3.0 rule
Could you please expand on your statement that small-molecule data has essentially no weak spots.? The small molecule data sets I've worked with have had large numbers of unobserved reflections where I used 2 sigma(I) cutoffs (maybe 15-30% of the reflections). Would you consider those weak spots or not? Ron On Sun, 6 Mar 2011, James Holton wrote: I should probably admit that I might be indirectly responsible for the resurgence of this I/sigma 3 idea, but I never intended this in the way described by the original poster's reviewer! What I have been trying to encourage people to do is calculate R factors using only hkls for which the signal-to-noise ratio is 3. Not refinement! Refinement should be done against all data. I merely propose that weak data be excluded from R-factor calculations after the refinement/scaling/mergeing/etc. is done. This is because R factors are a metric of the FRACTIONAL error in something (aka a % difference), but a % error is only meaningful when the thing being measured is not zero. However, in macromolecular crystallography, we tend to measure a lot of zeroes. There is nothing wrong with measuring zero! An excellent example of this is confirming that a systematic absence is in fact absent. The sigma on the intensity assigned to an absent spot is still a useful quantity, because it reflects how confident you are in the measurement. I.E. a sigma of 10 vs 100 means you are more sure that the intensity is zero. However, there is no R factor for systematic absences. How could there be! This is because the definition of % error starts to break down as the true spot intensity gets weaker, and it becomes completely meaningless when the true intensity reaches zero. Historically, I believe the widespread use of R factors came about because small-molecule data has essentially no weak spots. With the exception of absences (which are not used in refinement), spots from salt crystals are strong all the way out to edge of the detector, (even out to the limiting sphere, which is defined by the x-ray wavelength). So, when all the data are strong, a % error is an easy-to-calculate quantity that actually describes the sigmas of the data very well. That is, sigma(I) of strong spots tends to be dominated by things like beam flicker, spindle stability, shutter accuracy, etc. All these usually add up to ~5% error, and indeed even the Braggs could typically get +/-5% for the intensity of the diffracted rays they were measuring. Things like Rsym were therefore created to check that nothing funny happened in the measurement. For similar reasons, the quality of a model refined against all-strong data is described very well by a % error, and this is why the refinement R factors rapidly became popular. Most people intuitively know what you mean if you say that your model fits the data to within 5%. In fact, a widely used criterion for the correctness of a small molecule structure is that the refinement R factor must be LOWER than Rsym. This is equivalent to saying that your curve (model) fit your data to within experimental error. Unfortunately, this has never been the case for macromolecular structures! The problem with protein crystals, of course, is that we have lots of weak data. And by weak, I don't mean bad! Yes, it is always nicer to have more intense spots, but there is nothing shameful about knowing that certain intensities are actually very close to zero. 
In fact, from the point of view of the refinement program, isn't describing some high-angle spot as: zero, plus or minus 10, better than I have no idea? Indeed, several works mentioned already as well as the free lunch algorithm have demonstrated that these zero data can actually be useful, even if it is well beyond the resolution limit. So, what do we do? I see no reason to abandon R factors, since they have such a long history and give us continuity of criteria going back almost a century. However, I also see no reason to punish ourselves by including lots of zeroes in the denominator. In fact, using weak data in an R factor calculation defeats their best feature. R factors are a very good estimate of the fractional component of the total error, provided they are calculated with strong data only. Of course, with strong and weak data, the best thing to do is compare the model-data disagreement with the magnitude of the error. That is, compare |Fobs-Fcalc| to sigma(Fobs), not Fobs itself. Modern refinement programs do this! And I say the more data the merrier. -James Holton MAD Scientist On 3/4/2011 5:15 AM, Marjolein Thunnissen wrote: hi Recently on a paper I submitted, it was the editor of the journal who wanted exactly the same thing. I never argued with the editor about this (should have maybe), but it could be one cause of the epidemic that Bart Hazes saw best regards Marjolein On Mar 3, 2011, at 12:29 PM, Roberto
Re: [ccp4bb] I/sigmaI of 3.0 rule
Yes, I would classify anything with I/sigmaI 3 as weak. And yes, of course it is possible to get weak spots from small molecule crystals. After all, there is no spot so strong that it cannot be defeated by a sufficient amount of background! I just meant that, relatively speaking, the intensities diffracted from a small molecule crystal are orders of magnitude brighter than those from a macromolecular crystal of the same size, and even the same quality (the 1/Vcell^2 term in Darwin's formula). I find it interesting that you point out the use of a 2 sigma(I) intensity cutoff for small molecule data sets! Is this still common practice? I am not a card-carrying small molecule crystallographer, so I'm not sure. However, if that is the case, then by definition there are no weak intensities in the data set. And this is exactly the kind of data you want for least-squares refinement targets and computing % error quality metrics like R factors. For likelihood targets, however, the weak data are actually a powerful restraint. -James Holton MAD Scientist On 3/6/2011 11:22 AM, Ronald E Stenkamp wrote: Could you please expand on your statement that small-molecule data has essentially no weak spots.? The small molecule data sets I've worked with have had large numbers of unobserved reflections where I used 2 sigma(I) cutoffs (maybe 15-30% of the reflections). Would you consider those weak spots or not? Ron On Sun, 6 Mar 2011, James Holton wrote: I should probably admit that I might be indirectly responsible for the resurgence of this I/sigma 3 idea, but I never intended this in the way described by the original poster's reviewer! What I have been trying to encourage people to do is calculate R factors using only hkls for which the signal-to-noise ratio is 3. Not refinement! Refinement should be done against all data. I merely propose that weak data be excluded from R-factor calculations after the refinement/scaling/mergeing/etc. is done. This is because R factors are a metric of the FRACTIONAL error in something (aka a % difference), but a % error is only meaningful when the thing being measured is not zero. However, in macromolecular crystallography, we tend to measure a lot of zeroes. There is nothing wrong with measuring zero! An excellent example of this is confirming that a systematic absence is in fact absent. The sigma on the intensity assigned to an absent spot is still a useful quantity, because it reflects how confident you are in the measurement. I.E. a sigma of 10 vs 100 means you are more sure that the intensity is zero. However, there is no R factor for systematic absences. How could there be! This is because the definition of % error starts to break down as the true spot intensity gets weaker, and it becomes completely meaningless when the true intensity reaches zero. Historically, I believe the widespread use of R factors came about because small-molecule data has essentially no weak spots. With the exception of absences (which are not used in refinement), spots from salt crystals are strong all the way out to edge of the detector, (even out to the limiting sphere, which is defined by the x-ray wavelength). So, when all the data are strong, a % error is an easy-to-calculate quantity that actually describes the sigmas of the data very well. That is, sigma(I) of strong spots tends to be dominated by things like beam flicker, spindle stability, shutter accuracy, etc. 
All these usually add up to ~5% error, and indeed even the Braggs could typically get +/-5% for the intensity of the diffracted rays they were measuring. Things like Rsym were therefore created to check that nothing funny happened in the measurement. For similar reasons, the quality of a model refined against all-strong data is described very well by a % error, and this is why the refinement R factors rapidly became popular. Most people intuitively know what you mean if you say that your model fits the data to within 5%. In fact, a widely used criterion for the correctness of a small molecule structure is that the refinement R factor must be LOWER than Rsym. This is equivalent to saying that your curve (model) fit your data to within experimental error. Unfortunately, this has never been the case for macromolecular structures! The problem with protein crystals, of course, is that we have lots of weak data. And by weak, I don't mean bad! Yes, it is always nicer to have more intense spots, but there is nothing shameful about knowing that certain intensities are actually very close to zero. In fact, from the point of view of the refinement program, isn't describing some high-angle spot as: zero, plus or minus 10, better than I have no idea? Indeed, several works mentioned already as well as the free lunch algorithm have demonstrated that these zero data can actually be useful, even if it is well beyond the resolution limit.
Re: [ccp4bb] I/sigmaI of 3.0 rule
Since small molecules are being discussed maybe I should comment. A widely used small molecule program that I don't need to advertise here refines against all measured intensities unless the user has imposed a resolution cutoff. It prints R values for all data and for I > 2sig(I) [F > 4sig(F)]. The user can of course improve these by cutting back the resolution but if he or she oversteps 0.84A he/she will be caught by the CIF police. This works like a radar trap so weak datasets are usually truncated to 0.84A whether or not there are significant data to that resolution. It is always instructive to compare the R-values for all data and I > 2sig(I); if the former is substantially larger, a lot of noisy outer data have been included. It is not true that small molecule datasets do not contain weak reflections. One should remember that the intensity statistics are different for centrosymmetric space groups: very weak AND very strong reflections (relative to the average in a resolution shell) are much more common! George Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-22582 On Sun, 6 Mar 2011, James Holton wrote: Yes, I would classify anything with I/sigmaI < 3 as weak. And yes, of course it is possible to get weak spots from small molecule crystals. After all, there is no spot so strong that it cannot be defeated by a sufficient amount of background! I just meant that, relatively speaking, the intensities diffracted from a small molecule crystal are orders of magnitude brighter than those from a macromolecular crystal of the same size, and even the same quality (the 1/Vcell^2 term in Darwin's formula). I find it interesting that you point out the use of a 2 sigma(I) intensity cutoff for small molecule data sets! Is this still common practice? I am not a card-carrying small molecule crystallographer, so I'm not sure. However, if that is the case, then by definition there are no weak intensities in the data set. And this is exactly the kind of data you want for least-squares refinement targets and computing % error quality metrics like R factors. For likelihood targets, however, the weak data are actually a powerful restraint. -James Holton MAD Scientist On 3/6/2011 11:22 AM, Ronald E Stenkamp wrote: Could you please expand on your statement that small-molecule data has essentially no weak spots? The small molecule data sets I've worked with have had large numbers of unobserved reflections where I used 2 sigma(I) cutoffs (maybe 15-30% of the reflections). Would you consider those weak spots or not? Ron On Sun, 6 Mar 2011, James Holton wrote: I should probably admit that I might be indirectly responsible for the resurgence of this I/sigma > 3 idea, but I never intended this in the way described by the original poster's reviewer! What I have been trying to encourage people to do is calculate R factors using only hkls for which the signal-to-noise ratio is > 3. Not refinement! Refinement should be done against all data. I merely propose that weak data be excluded from R-factor calculations after the refinement/scaling/merging/etc. is done. This is because R factors are a metric of the FRACTIONAL error in something (aka a % difference), but a % error is only meaningful when the thing being measured is not zero. However, in macromolecular crystallography, we tend to measure a lot of zeroes. There is nothing wrong with measuring zero!
An excellent example of this is confirming that a systematic absence is in fact absent. The sigma on the intensity assigned to an absent spot is still a useful quantity, because it reflects how confident you are in the measurement. I.E. a sigma of 10 vs 100 means you are more sure that the intensity is zero. However, there is no R factor for systematic absences. How could there be! This is because the definition of % error starts to break down as the true spot intensity gets weaker, and it becomes completely meaningless when the true intensity reaches zero. Historically, I believe the widespread use of R factors came about because small-molecule data has essentially no weak spots. With the exception of absences (which are not used in refinement), spots from salt crystals are strong all the way out to edge of the detector, (even out to the limiting sphere, which is defined by the x-ray wavelength). So, when all the data are strong, a % error is an easy-to-calculate quantity that actually describes the sigmas of the data very well. That is, sigma(I) of strong spots tends to be dominated by things like beam flicker, spindle stability, shutter accuracy, etc. All these usually add up to ~5% error, and indeed even the Braggs could typically get +/-5% for the intensity of the diffracted rays they
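George's point about centric intensity statistics can be visualised with a quick Wilson-statistics sketch (ideal random structure factors, purely illustrative): the centric distribution puts noticeably more reflections in both the very weak and the very strong tails.

import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
# Acentric: F has independent Gaussian real and imaginary parts.
i_acentric = rng.normal(size=n) ** 2 + rng.normal(size=n) ** 2
# Centric: F is constrained to be real; scaled so both sets have <I> = 2.
i_centric = 2 * rng.normal(size=n) ** 2

for name, i in (("acentric", i_acentric), ("centric", i_centric)):
    weak = np.mean(i < 0.1 * i.mean())
    strong = np.mean(i > 3 * i.mean())
    print(f"{name:8s} fraction I < 0.1<I>: {weak:.3f}   fraction I > 3<I>: {strong:.3f}")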
[ccp4bb] Philosophy and Re: [ccp4bb] I/sigmaI of 3.0 rule
Dear Colleagues, Agreed! There is a wider point though, which is that the 3D structure and data form a potential for further analysis, and thus the data and the structure can ideally be more than the current paper's contents. Obviously artificially high I/sigI cutoffs are unfortunate both for the current article and for such future analyses. In chemical crystallography this potential for further analyses is widely recognised. E.g. a crystal structure should have all static disorder sorted, methyl rotor groups correctly positioned etc even if not directly relevant to an article. Such rigour is the requirement for Acta Cryst C, for example, in chemical crystallography. Best wishes, John Prof John R Helliwell DSc On 4 Mar 2011, at 20:36, Roberto Battistutta roberto.battistu...@unipd.it wrote: Dear Phil, I completely agree with you, your words seem to me the best philosophical outcome of the discussion and indicate the right perspective to tackle this topic. In particular you write "In the end, the important question as ever is does the experimental data support the conclusions drawn from it? and that will depend on local information about particular atoms and groups, not on global indicators." Exactly, in my case, all the discussion of the structures was absolutely independent from having 1.9, 2.0 or 2.1 A nominal resolution, or to cut at 1.5 or 2.0 or 3.0 I/sigma. This makes the unjustified (as this two-day discussion has clearly pointed out) technical criticism of the reviewer even more upsetting. Ciao, Roberto
Re: [ccp4bb] [Fwd: Re: [ccp4bb] I/sigmaI of 3.0 rule]
Maia, provided radiation damage is not a major detrimental factor, your data are just fine, and useful also in the high resolution shell (which still has I/sigma of 2.84, so you could probably process a bit beyond 2.25A). There is nothing wrong with R_meas of 147.1% since, as others have said, R_meas is not limited to 59% (or similar) as a refinement R-factor is. Rather, R_meas is computed from a formula that has a denominator which in the asymptotic limit (noise) approaches zero - because there will be (almost) as many negative observations as positive ones! (The numerator however does not go to zero.) Concerning radiation damage: First, take a look at your frames - but make sure you have the same crystal orientation, as anisotropy may mask radiation damage! Then, you can check (using CCP4's loggraph) the R_d plot provided by XDSSTAT (for a single dataset; works best for high-symmetry spacegroups), and you should also check ISa (printed in CORRECT.LP and XSCALE.LP). HTH, Kay P.S. I see one potential problem: XSCALE (VERSION December 6, 2007) when the calculation was done 28-Aug-2009. There were quite a number of improvements in XDS/XSCALE since that version. The reason may be that a licensed, non-expiring version was used - make sure you always rather use the latest version available! Original Message Subject: [Fwd: Re: [ccp4bb] I/sigmaI of 3.0 rule] Date: Thu, 3 Mar 2011 10:45:03 -0700 From: Maia Cherney ch...@ualberta.ca Original Message Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule Date: Thu, 03 Mar 2011 10:43:23 -0700 From: Maia Cherney ch...@ualberta.ca To: Oganesyan, Vaheh oganesy...@medimmune.com Vaheh, The problem was with Rmerge. As you can see at I/sigma=2.84, the Rmerge (R-factor) was 143%. I am asking this question because B. Rupp wrote "However, there is a simple relation between I/sigI and R-merge (provided no other indecency has been done to the data). It simply is (BMC) Rm = 0.8/<I/sigI>." Maybe my data are indecent? This is the whole LP file. Maia MMC741_scale-2.25.LP ** XSCALE (VERSION December 6, 2007) 28-Aug-2009 ** Author: Wolfgang Kabsch Copy licensed until (unlimited) to Canadian Light Source, Saskatoon, Canada. No redistribution. -- Kay Diederichs http://strucbio.biologie.uni-konstanz.de email: kay.diederi...@uni-konstanz.de Tel +49 7531 88 4049 Fax 3183 Fachbereich Biologie, Universität Konstanz, Box M647, D-78457 Konstanz
Re: [ccp4bb] I/sigmaI of 3.0 rule
Dear Roberto, Overnight I recall an additional point:- In chemical crystallography, where standard uncertainties are routinely available for the molecular model from the full matrix inversion in the model refinement, it is of course possible to keep extending your resolution until your bond distance and angle su values go up. Thus if you distrust or do not wish to slavishly follow a Journal's Notes for Authors, such as for Acta Crystallographica Section C to which I referred yesterday, you can, in this way, check yourself the good sense of the data quality criteria required. [This is a similar test to the one that Phil mentioned yesterday ie with respect to scrutinising electron density maps for your protein ie do they show more detail by adding more diffraction data.] Best wishes, John On Thu, Mar 3, 2011 at 11:29 AM, Roberto Battistutta roberto.battistu...@unipd.it wrote: Dear all, I got a reviewer comment that indicates the need to refine the structures at an appropriate resolution (I/sigmaI of > 3.0), and re-submit the revised coordinate files to the PDB for validation. In the manuscript I present some crystal structures determined by molecular replacement using the same protein in a different space group as search model. Does anyone know the origin or the theoretical basis of this I/sigmaI > 3.0 rule for an appropriate resolution? Thanks, Bye, Roberto. Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it -- Professor John R Helliwell DSc
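For readers less familiar with where such su values come from, a toy weighted least-squares fit shows the idea: the standard uncertainties are the square roots of the diagonal of the inverted normal matrix (a two-parameter line fit, deliberately nothing like a real crystallographic refinement):

import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0, 10, 50)
sigma = np.full_like(x, 0.5)
y = 2.0 * x + 1.0 + rng.normal(0, sigma)    # synthetic observations with known sigmas

A = np.column_stack([x, np.ones_like(x)])   # design matrix for y = a*x + b
W = np.diag(1.0 / sigma**2)                 # weights = 1/sigma^2
N = A.T @ W @ A                             # normal matrix
params = np.linalg.solve(N, A.T @ W @ y)    # refined parameters a, b
su = np.sqrt(np.diag(np.linalg.inv(N)))     # standard uncertainties from full-matrix inversion
print("a, b     :", params)
print("su(a, b) :", su)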
Re: [ccp4bb] I/sigmaI of 3.0 rule
Dear all, just to say that I really appreciate and thank the many people who spent time responding to my issue. I have read with much interest (and sometimes with fun) all comments and suggestions, very interesting and useful. Thanks a lot, Bye, Roberto. Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it Il giorno 03/mar/2011, alle ore 12.29, Roberto Battistutta ha scritto: Dear all, I got a reviewer comment that indicate the need to refine the structures at an appropriate resolution (I/sigmaI of 3.0), and re-submit the revised coordinate files to the PDB for validation.. In the manuscript I present some crystal structures determined by molecular replacement using the same protein in a different space group as search model. Does anyone know the origin or the theoretical basis of this I/sigmaI 3.0 rule for an appropriate resolution? Thanks, Bye, Roberto. Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it
Re: [ccp4bb] I/sigmaI of 3.0 rule
This is very closely related to the way in which I would like to think about this: if you consider adding another thin shell of data, are you adding any significant information? Unfortunately, as Garib Murshudov has pointed out, we don't have any reliable way of estimating the information content of data. (Also it should be considered anisotropically.) Another way of thinking about this is to consider that if we had perfect error models and weighted the data perfectly, then adding a shell of data with essentially no useful information should at least do no harm (ie weights are close to zero). But we do not have perfect error models, so adding too much data may in the end degrade our structural model. Much of the problem arises from our addiction to R-factors as a measure of quality, when they are unweighted and therefore very misleading. We are also too quick to judge the quality of a structure by its nominal resolution, whatever that means. In the end, the important question as ever is does the experimental data support the conclusions drawn from it? and that will depend on local information about particular atoms and groups, not on global indicators. Phil On 4 Mar 2011, at 10:35, John R Helliwell wrote: Dear Roberto, Overnight I recall an additional point:- In chemical crystallography, where standard uncertainties are routinely available for the molecular model from the full matrix inversion in the model refinement, it is of course possible to keep extending your resolution until your bond distance and angle su values go up. Thus if you distrust or do not wish to slavishly follow a Journal's Notes for Authors, such as for Acta Crystallographica Section C to which I referred yesterday, you can, in this way, check yourself the good sense of the data quality criteria required. [This is a similar test to the one that Phil mentioned yesterday ie with respect to scrutinising electron density maps for your protein ie do they show more detail by adding more diffraction data.] Best wishes, John On Thu, Mar 3, 2011 at 11:29 AM, Roberto Battistutta roberto.battistu...@unipd.it wrote: Dear all, I got a reviewer comment that indicates the need to refine the structures at an appropriate resolution (I/sigmaI of > 3.0), and re-submit the revised coordinate files to the PDB for validation. In the manuscript I present some crystal structures determined by molecular replacement using the same protein in a different space group as search model. Does anyone know the origin or the theoretical basis of this I/sigmaI > 3.0 rule for an appropriate resolution? Thanks, Bye, Roberto. Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it -- Professor John R Helliwell DSc
Re: [ccp4bb] I/sigmaI of 3.0 rule
hi Recently on a paper I submitted, it was the editor of the journal who wanted exactly the same thing. I never argued with the editor about this (should have maybe), but it could be one cause of the epidemic that Bart Hazes saw best regards Marjolein On Mar 3, 2011, at 12:29 PM, Roberto Battistutta wrote: Dear all, I got a reviewer comment that indicate the need to refine the structures at an appropriate resolution (I/sigmaI of 3.0), and re-submit the revised coordinate files to the PDB for validation.. In the manuscript I present some crystal structures determined by molecular replacement using the same protein in a different space group as search model. Does anyone know the origin or the theoretical basis of this I/sigmaI 3.0 rule for an appropriate resolution? Thanks, Bye, Roberto. Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it
Re: [ccp4bb] [Fwd: Re: [ccp4bb] I/sigmaI of 3.0 rule]
Kay, Thank you for your explanation. The radiation damage was not the factor, but there was something strange about this crystal (actually two crystals had the same strange behavior). I could not process them in HKL2000, but it showed the problem (see pictures in the attachment). The processing in XDS was done at the CLS (Canadian Light Source). I know they always have the latest version of XDS. Maia Kay Diederichs wrote: Maia, provided radiation damage is not a major detrimental factor, your data are just fine, and useful also in the high resolution shell (which still has I/sigma of 2.84, so you could probably process a bit beyond 2.25A). There is nothing wrong with R_meas of 147.1% since, as others have said, R_meas is not limited to 59% (or similar) as a refinement R-factor is. Rather, R_meas is computed from a formula that has a denominator which in the asymptotic limit (noise) approaches zero - because there will be (almost) as many negative observations as positive ones! (The numerator however does not go to zero.) Concerning radiation damage: First, take a look at your frames - but make sure you have the same crystal orientation, as anisotropy may mask radiation damage! Then, you can check (using CCP4's loggraph) the R_d plot provided by XDSSTAT (for a single dataset; works best for high-symmetry spacegroups), and you should also check ISa (printed in CORRECT.LP and XSCALE.LP). HTH, Kay P.S. I see one potential problem: XSCALE (VERSION December 6, 2007) when the calculation was done 28-Aug-2009. There were quite a number of improvements in XDS/XSCALE since that version. The reason may be that a licensed, non-expiring version was used - make sure you always rather use the latest version available! Original Message Subject: [Fwd: Re: [ccp4bb] I/sigmaI of 3.0 rule] Date: Thu, 3 Mar 2011 10:45:03 -0700 From: Maia Cherney ch...@ualberta.ca Original Message Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule Date: Thu, 03 Mar 2011 10:43:23 -0700 From: Maia Cherney ch...@ualberta.ca To: Oganesyan, Vaheh oganesy...@medimmune.com Vaheh, The problem was with Rmerge. As you can see at I/sigma=2.84, the Rmerge (R-factor) was 143%. I am asking this question because B. Rupp wrote "However, there is a simple relation between I/sigI and R-merge (provided no other indecency has been done to the data). It simply is (BMC) Rm = 0.8/<I/sigI>." Maybe my data are indecent? This is the whole LP file. Maia MMC741_scale-2.25.LP ** XSCALE (VERSION December 6, 2007) 28-Aug-2009 ** Author: Wolfgang Kabsch Copy licensed until (unlimited) to Canadian Light Source, Saskatoon, Canada. No redistribution.
Re: [ccp4bb] [Fwd: Re: [ccp4bb] I/sigmaI of 3.0 rule]
On 04.03.2011 11:11, Kay Diederichs wrote: There is nothing wrong with R_meas of 147.1% since, as others have said, R_meas is not limited to 59% (or similar) as a refinement R-factor is. Rather, R_meas is computed from a formula that has a denominator which in the asymptotic limit (noise) approaches zero - because there will be (almost) as many negative observations as positive ones! (The numerator however does not go to zero) Upon second thought, this explanation is wrong since the absolute value is taken in the formula for the denominator. A better explanation is: in the noise limit the numerator is (apart from a factor >1, which is why R_meas is > R_sym) a sum over absolute values of differences of random numbers. The denominator is a sum over absolute values of random numbers. If the random values are drawn from a Gaussian distribution then the numerator contributions are bigger by square-root-of-two than the denominator contributions. Thus, R_meas can be 150-200%. Kay
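For what it is worth, this noise-limit behaviour is easy to reproduce numerically. A minimal Python/NumPy sketch (the merging_r function and the toy Gaussian error model are illustrative only, not how XDS or any scaling program actually computes these numbers):

    import numpy as np

    rng = np.random.default_rng(0)

    def merging_r(obs, rim=False, signed_denominator=False):
        # obs: (n_unique, n_obs) simulated intensities, one row per unique reflection
        n = obs.shape[1]
        dev = np.abs(obs - obs.mean(axis=1, keepdims=True)).sum(axis=1)
        if rim:                        # sqrt(n/(n-1)) redundancy factor turns Rmerge into R_meas
            dev = dev * np.sqrt(n / (n - 1))
        denom = obs.sum() if signed_denominator else np.abs(obs).sum()
        return dev.sum() / denom

    weak   = rng.normal(0.5, 1.0, size=(100_000, 4))   # essentially noise, <I/sigma> ~ 0.5
    strong = rng.normal(20.0, 1.0, size=(100_000, 4))  # strong data, <I/sigma> ~ 20

    for label, o in (("weak  ", weak), ("strong", strong)):
        print(label,
              "Rmerge %.2f" % merging_r(o),
              "R_meas %.2f" % merging_r(o, rim=True),
              "R_meas, signed denominator %.2f" % merging_r(o, rim=True, signed_denominator=True))

With the absolute-value denominator the weak "data" land near 80-100%, and with a signed denominator (which nearly cancels because negatives balance positives) the same numbers give an R_meas of roughly 160%, while the strong data stay at a few percent - so a merging R simply has no ceiling the way a refinement R-factor does.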
Re: [ccp4bb] I/sigmaI of 3.0 rule
Dear Phil, I completely agree with you; your words seem to me the best philosophical outcome of the discussion and indicate the right perspective to tackle this topic. In particular you write: "In the end, the important question as ever is: does the experimental data support the conclusions drawn from it? and that will depend on local information about particular atoms and groups, not on global indicators." Exactly: in my case, all the discussion of the structures was absolutely independent of having 1.9, 2.0 or 2.1 A nominal resolution, or of cutting at 1.5, 2.0 or 3.0 I/sigma. This makes the unjustified (as this two-day discussion has clearly pointed out) technical criticism of the reviewer even more upsetting. Ciao, Roberto
Re: [ccp4bb] I/sigmaI of 3.0 rule
No - and I don't think it is accepted practice now either. I often use I/SigI > 1.5 for refinement. Look at your Rfactor plots from REFMAC - if they look reasonable at higher resolution, use the data. Eleanor On 03/03/2011 11:29 AM, Roberto Battistutta wrote: Dear all, I got a reviewer comment that indicates the need to refine the structures at an appropriate resolution (I/sigmaI of >3.0), and re-submit the revised coordinate files to the PDB for validation. In the manuscript I present some crystal structures determined by molecular replacement using the same protein in a different space group as search model. Does anyone know the origin or the theoretical basis of this I/sigmaI > 3.0 rule for an appropriate resolution? Thanks, Bye, Roberto. Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it
Re: [ccp4bb] I/sigmaI of 3.0 rule
Roberto, The reviewer's request is complete nonsense. The problem is how to best and politely respond so as not to prevent the paper from being accepted. Best would be to have educated editors who could simply tell you to ignore that request. Since this issue comes up quite often still, I think we all should come up with a canned response to such a request. One way to approach this issue is to avoid saying something like "the structure has been refined to 2.2Å resolution", but instead say "has been refined using data to a resolution of 2.2Å", or even "has been refined using data with an I/sigmaI > 1.5" (or whatever). Next could be to point out that even data with an I/sigmaI of 1 can contain information (I actually don't have a good reference for this, but I'm sure someone else can provide one), and inclusion of such data can improve refinement stability and speed of convergence (not really important in a scientific sense, though). The point is that all of your data combined result in a structure with a certain resolution, pretty much no matter what high-resolution limits you choose (I/sigmaI of 0.5, 1.0, or 1.5). As long as you don't portray your structure as having a resolution corresponding to the high-resolution limit of your data, you should be fine. Now, requesting to toss out data with I/sigmaI of <3 simply reduces the resolution of your structure. You could calculate two electron density maps and show that your structure does indeed improve when including data with I/sigmaI of <3. One criterion could be to use the optical resolution of the structure. Hope that helps. Best, MM On Mar 3, 2011, at 6:29 AM, Roberto Battistutta wrote: Dear all, I got a reviewer comment that indicates the need to refine the structures at an appropriate resolution (I/sigmaI of >3.0), and re-submit the revised coordinate files to the PDB for validation. In the manuscript I present some crystal structures determined by molecular replacement using the same protein in a different space group as search model. Does anyone know the origin or the theoretical basis of this I/sigmaI > 3.0 rule for an appropriate resolution? Thanks, Bye, Roberto. Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it
Re: [ccp4bb] I/sigmaI of 3.0 rule
Dear Roberto, As indicated by others in reply to you the current best practice in protein crystallography is not a rigid application of such a cut off criterion. This is because there is such a diverse range of crystal qualities. However in chemical crystallography where the data quality from such crystals is more homogeneous such a rule is more often required notably as a guard against 'fast and loose' data collection which may occur (to achieve a very high throughput). As an Editor myself, whilst usually allowing the authors' chosen resolution cut off, I will insist on the data table saying in a footnote the diffraction resolution where I/sig(I) crosses 2.0 and/or, if relevant, where DeltaAnom/sig(DeltaAnom) crosses 1.0. A remaining possible contentious point with a submitting author is where the title of a paper may claim a diffraction resolution that in fact cannot really be substantiated. Best wishes, Yours sincerely, John On Thu, Mar 3, 2011 at 11:29 AM, Roberto Battistutta roberto.battistu...@unipd.it wrote: Dear all, I got a reviewer comment that indicate the need to refine the structures at an appropriate resolution (I/sigmaI of 3.0), and re-submit the revised coordinate files to the PDB for validation.. In the manuscript I present some crystal structures determined by molecular replacement using the same protein in a different space group as search model. Does anyone know the origin or the theoretical basis of this I/sigmaI 3.0 rule for an appropriate resolution? Thanks, Bye, Roberto. Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it -- Professor John R Helliwell DSc
Re: [ccp4bb] I/sigmaI of 3.0 rule
On Thu, 2011-03-03 at 12:29 +0100, Roberto Battistutta wrote: Does anyone know the origin or the theoretical basis of this I/sigmaI > 3.0 rule for an appropriate resolution? There is none. Did the editor ask you to follow this suggestion? I wonder if there is anyone among the subscribers of this bb who would come forward and support this I/sigmaI > 3.0 claim. What was your I/sigma, by the way? I almost always collect data to I/sigma=1, which has the downside of generating somewhat higher R-values. Shall I, according to this reviewer, retract/amend every single one of them? What a mess. Cheers, Ed. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] I/sigmaI of 3.0 rule
There seem to be quite a few rule followers out there regarding resolution cutoffs. One that I have encountered several times is reviewers objecting to high Rsym values (say 60-80% in the last shell), which may be even worse than using some fixed value of I/sigI. On 3/3/11 9:55 AM, Ed Pozharski epozh...@umaryland.edu wrote: On Thu, 2011-03-03 at 12:29 +0100, Roberto Battistutta wrote: Does anyone know the origin or the theoretical basis of this I/sigmaI 3.0 rule for an appropriate resolution? There is none. Did editor ask you to follow this suggestion? I wonder if there is anyone among the subscribers of this bb who would come forward and support this I/sigmaI 3.0 claim. What was your I/sigma, by the way? I almost always collect data to I/sigma=1, which has the downside of generating somewhat higher R-values. Shall I, according to this reviewer, retract/amend every single one of them? What a mess. Cheers, Ed. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] I/sigmaI of 3.0 rule
For myself, I decide on the high resolution cutoff by looking at the Rsym vs resolution curve. The curve rises, and for all data sets I have processed (so far) there is a break in the curve and the curve shoots up. To near vertical. This inflexion point is where I decide to place the high resolution cutoff, I never look at the I/sigma(I) values nor at the Rsym in the high resolution shell. As a reviewer, when I have to evaluate a manuscript where very high Rsym values are quoted, I have no way of knowing how the high resolution cutoff was set. So I simply suggest to the authors to double check this cutoff, in order to ensure that the high resolution limit really corresponds to high resolution data and not to noise. But I certainly do not make statements such as this one. I have seen cases where, using this rule to decide on the high resolution limit, the Rsym in the high resolution bin is well below 50% and cases where it is much higher. Like 65%, 70% (0.65, 0.7 if you prefer). So, in my opinion, there is no fixed rule as to what the acceptable Rsym value in the highest resolution shell should be. Fred. Van Den Berg, Bert wrote: There seem to be quite a few “rule” followers out there regarding resolution cutoffs. One that I have encountered several times is reviewers objecting to high Rsym values (say 60-80% in the last shell), which may be even worse than using some fixed value of I/sigI. On 3/3/11 9:55 AM, Ed Pozharski epozh...@umaryland.edu wrote: On Thu, 2011-03-03 at 12:29 +0100, Roberto Battistutta wrote: Does anyone know the origin or the theoretical basis of this I/sigmaI 3.0 rule for an appropriate resolution? There is none. Did editor ask you to follow this suggestion? I wonder if there is anyone among the subscribers of this bb who would come forward and support this I/sigmaI 3.0 claim. What was your I/sigma, by the way? I almost always collect data to I/sigma=1, which has the downside of generating somewhat higher R-values. Shall I, according to this reviewer, retract/amend every single one of them? What a mess. Cheers, Ed. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] I/sigmaI of 3.0 rule
As mentioned, there is no <I/sigmaI> rule. Also you need to specify (and correctly calculate) <I/sigmaI> and not <I>/<sigmaI>. A review of similar articles in the same journal will show what is typical for the journal. I think you will find that the <I/sigmaI> cutoff varies. This information can be used in your response to the reviewer, as in: "A review of actual published articles in the Journal shows that 75% (60 out of 80) used an <I/sigmaI> cutoff of 2 for the resolution of the diffraction data used in refinement. We respectfully believe that our cutoff of 2 should be acceptable." -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Roberto Battistutta Sent: Thursday, March 03, 2011 5:30 AM To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] I/sigmaI of 3.0 rule Dear all, I got a reviewer comment that indicates the need to refine the structures at an appropriate resolution (I/sigmaI of >3.0), and re-submit the revised coordinate files to the PDB for validation. In the manuscript I present some crystal structures determined by molecular replacement using the same protein in a different space group as search model. Does anyone know the origin or the theoretical basis of this I/sigmaI > 3.0 rule for an appropriate resolution? Thanks, Bye, Roberto. Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it
Re: [ccp4bb] I/sigmaI of 3.0 rule
My preferred criterion is the half-dataset correlation coefficient output by Scala (an idea stolen from the EM guys): I tend to cut my data where this falls to not less than 0.5. The good thing about this is that it is independent of the vagaries of I/sigma (or rather of the SD estimation) and has a more intuitive cutoff point than Rmeas (let alone Rmerge). It probably doesn't work well at low multiplicity and there is always a problem with anisotropy (I intend to do anisotropic analysis in future) That said, the exact resolution cut-off is not really important: if you refine look at maps at say 2.6A vs. 2.5A (if that's around the potential cutoff), there is probably little significant difference Phil On 3 Mar 2011, at 15:34, Jim Pflugrath wrote: As mentioned there is no I/sigmaI rule. Also you need to specify (and correctly calculate) I/sigmaI and not I/sigmaI. A review of similar articles in the same journal will show what is typical for the journal. I think you will find that the I/sigmaI cutoff varies. This information can be used in your response to the reviewer as in, A review of actual published articles in the Journal shows that 75% (60 out of 80) used an I/sigmaI cutoff of 2 for the resolution of the diffraction data used in refinement. We respectfully believe that our cutoff of 2 should be acceptable. -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Roberto Battistutta Sent: Thursday, March 03, 2011 5:30 AM To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] I/sigmaI of 3.0 rule Dear all, I got a reviewer comment that indicate the need to refine the structures at an appropriate resolution (I/sigmaI of 3.0), and re-submit the revised coordinate files to the PDB for validation.. In the manuscript I present some crystal structures determined by molecular replacement using the same protein in a different space group as search model. Does anyone know the origin or the theoretical basis of this I/sigmaI 3.0 rule for an appropriate resolution? Thanks, Bye, Roberto. Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it
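A minimal sketch of the half-dataset correlation idea (Python/NumPy; the random splitting and the toy Wilson-like intensity model are assumptions made here for illustration - this is not Scala's implementation):

    import numpy as np

    rng = np.random.default_rng(1)

    def half_dataset_cc(obs):
        # split the observations of each unique reflection into two random halves,
        # average each half, and correlate the two sets of means (Pearson CC)
        n_ref, n_obs = obs.shape
        half1, half2 = [], []
        for refl in obs:
            order = rng.permutation(n_obs)
            half1.append(refl[order[: n_obs // 2]].mean())
            half2.append(refl[order[n_obs // 2:]].mean())
        return np.corrcoef(half1, half2)[0, 1]

    # toy resolution shells: exponential (Wilson-like) true intensities, with the
    # signal-to-noise falling off towards high resolution, multiplicity 4
    for shell, mean_i in (("low res ", 10.0), ("mid res ", 2.0), ("high res", 0.5)):
        i_true = rng.exponential(scale=mean_i, size=20_000)
        obs = i_true[:, None] + rng.normal(0.0, 1.0, size=(20_000, 4))
        print(shell, "half-dataset CC = %.2f" % half_dataset_cc(obs))

In this toy model only the noisiest shell falls below the 0.5 mark mentioned above, while the well-measured shells sit close to 1, which is what makes the statistic easy to read as a cutoff criterion.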
Re: [ccp4bb] I/sigmaI of 3.0 rule
I think this suppression of high resolution shells via I/sigI cutoffs is partially attributable to a conceptual misunderstanding of what these (darn) R-values mean in refinement versus data merging. In refinement, even a random atom structure follows the Wilson distribution, and therefore, even a completely wrong non-centrosymmetric structure will not - given proper scaling - give an Rf of more than 59%. There is no such limit for the basic linear merging R. However, there is a simple relation between <I/sigI> and R-merge (provided no other indecency has been done to the data). It simply is (BMC) Rm = 0.8/<I/sigI>. I.e. for <I/sigI> ~0.8 you get 100%, for 2 we obtain 40%, which, interpreted as Rf, would be dreadful, but for <I/sigI> = 3 we get Rm = 0.27, and that looks acceptable for an Rf (or to an uninformed reviewer). Btw, I also wish to point out that the I/sig cutoffs are not exactly the cutoff criterion for anomalous phasing; a more direct measure is a signal cutoff such as delF/sig(delF); George I believe uses 1.3 for SAD. Interestingly, in almost all structures I played with, delF/sig(delF) for both noise in the anomalous data and no anomalous scatterer present, the anomalous signal was ~0.8. I haven't figured out yet or proved the statistics and whether this is generally true or just numerology... And, the usual biased rant - irrespective of Hamilton tests, nobody really needs these popular unweighted linear residuals which shall not be named, particularly on F. They only cause trouble. Best regards, BR - Bernhard Rupp 001 (925) 209-7429 +43 (676) 571-0536 b...@ruppweb.org hofkristall...@gmail.com http://www.ruppweb.org/ - Structural Biology is the practice of crystallography without a license. - -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Bart Hazes Sent: Thursday, March 03, 2011 7:08 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule There seems to be an epidemic of papers with I/Sigma > 3 (sometimes much larger). In fact such cases have become so frequent that I fear some people start to believe that this is the proper procedure. I don't know where that has come from as the I/Sigma ~ 2 criterion has been established long ago and many consider that even a tad conservative. It simply pains me to see people going to the most advanced synchrotrons to boost their highest resolution data and then simply throw away much of it. I don't know what has caused this wave of high I/Sigma threshold use but here are some ideas - High I/Sigma cutoffs are normal for (S/M)AD data sets where a more strict focus on data quality is needed. Perhaps some people have started to think this is the norm. - For some datasets Rsym goes up strongly while I/SigI is still reasonable. I personally believe this is due to radiation damage which affects Rsym (which compares reflections taken after different amounts of exposure) much more than I/SigI which is based on individual reflections. A good test would be to see if processing only the first half of the dataset improves Rsym (or better, Rrim) - Most detectors are square and if the detector is too far from the crystal then the highest resolution data falls beyond the edges of the detector. In this case one could, and should, still process data into the corners of the detector. Data completeness at higher resolution may suffer but each additional reflection still represents an extra restraint in refinement and a Fourier term in the map. Due to crystal symmetry the effect on completeness may even be less than expected.
Bart On 11-03-03 04:29 AM, Roberto Battistutta wrote: Dear all, I got a reviewer comment that indicates the need to refine the structures at an appropriate resolution (I/sigmaI of >3.0), and re-submit the revised coordinate files to the PDB for validation. In the manuscript I present some crystal structures determined by molecular replacement using the same protein in a different space group as search model. Does anyone know the origin or the theoretical basis of this I/sigmaI > 3.0 rule for an appropriate resolution? Thanks, Bye, Roberto. Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it -- Bart Hazes (Associate Professor) Dept. of Medical Microbiology & Immunology University of Alberta 1-15 Medical Sciences Building Edmonton, Alberta Canada, T6G 2H7 phone: 1-780-492-0042 fax:1-780
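The Rm = 0.8/<I/sigI> rule of thumb quoted above is easy to check under an idealised constant-sigma Gaussian error model (a Python/NumPy sketch for illustration, not taken from BMC or any scaling program; real data with a spread of intensities and outlier rejection will deviate):

    import numpy as np

    rng = np.random.default_rng(2)

    def rmerge(obs):
        # linear merging R: sum |I_i - <I>| / sum I_i over all unique reflections
        dev = np.abs(obs - obs.mean(axis=1, keepdims=True)).sum(axis=1)
        return dev.sum() / obs.sum()

    sigma = 1.0
    for i_over_sig in (20.0, 10.0, 5.0, 3.0, 2.0, 1.0):
        obs = rng.normal(i_over_sig * sigma, sigma, size=(50_000, 8))  # multiplicity 8
        print("<I/sigma> = %4.1f   Rmerge = %.3f   0.8/<I/sigma> = %.3f"
              % (i_over_sig, rmerge(obs), 0.8 / i_over_sig))

Under this toy model the simulated Rmerge tracks 0.8/<I/sigma> to within roughly ten percent over the whole range, including the ~0.27 value quoted for <I/sigI> = 3.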
Re: [ccp4bb] I/sigmaI of 3.0 rule
On Thu, 2011-03-03 at 16:02 +0100, Vellieux Frederic wrote: For myself, I decide on the high resolution cutoff by looking at the Rsym vs resolution curve. The curve rises, and for all data sets I have processed (so far) there is a break in the curve and the curve shoots up. To near vertical. This inflexion point is where I decide to place the high resolution cutoff, I never look at the I/sigma(I) values nor at the Rsym in the high resolution shell. Fred, while your procedure is definitely more sophisticated than what I do, let me point out that Rsym is genuinely a bad measure for this, as it depends strongly on redundancy. Do more robust measures (e.g. Rpim) show a similar inflexion? I suspect it will at least shift towards higher resolution. Cheers, Ed. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] I/sigmaI of 3.0 rule
On Thu, 2011-03-03 at 08:08 -0700, Bart Hazes wrote: I don't know what has caused this wave of high I/Sigma threshold use but here are some ideas It may also be related to what I feel is recent revival of the significance of the R-values in general. Lower resolution cutoffs in this context improve the R-values, which is (incorrectly) perceived as model improvement. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] I/sigmaI of 3.0 rule
Does the position of this inflection point depend on the redundancy? Maybe it does not; for high-redundancy data one would simply get a much higher corresponding Rsym. On 3/3/11 11:13 AM, Ed Pozharski epozh...@umaryland.edu wrote: On Thu, 2011-03-03 at 16:02 +0100, Vellieux Frederic wrote: For myself, I decide on the high resolution cutoff by looking at the Rsym vs resolution curve. The curve rises, and for all data sets I have processed (so far) there is a break in the curve and the curve shoots up. To near vertical. This inflexion point is where I decide to place the high resolution cutoff, I never look at the I/sigma(I) values nor at the Rsym in the high resolution shell. Fred, while your procedure is definitely more sophisticated than what I do, let me point out that the Rsym is genuinely a bad measure for this, as it depends strongly on redundancy. Does more robust measures (e.g. Rpim) show similar inflexion? I suspect it will at least shift towards higher resolution. Cheers, Ed. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] I/sigmaI of 3.0 rule
On Thu, 2011-03-03 at 09:34 -0600, Jim Pflugrath wrote: As mentioned, there is no <I/sigmaI> rule. Also you need to specify (and correctly calculate) <I/sigmaI> and not <I>/<sigmaI>. A review of similar articles in the same journal will show what is typical for the journal. I think you will find that the <I/sigmaI> cutoff varies. This information can be used in your response to the reviewer, as in: "A review of actual published articles in the Journal shows that 75% (60 out of 80) used an <I/sigmaI> cutoff of 2 for the resolution of the diffraction data used in refinement. We respectfully believe that our cutoff of 2 should be acceptable." Jim, Excellent point. Such statistics would be somewhat tedious to gather though; does anyone know if I/sigma stats are available for the whole PDB somewhere? On your first point though - why is one better than the other? My experimental observation is that while the two differ significantly at low resolution (what matters, of course, is I/sigma itself and not the resolution per se), at high resolution, where the cutoff is chosen, they are not that different. And since the cutoff value itself is rather arbitrarily chosen, why is <I/sigma> better than <I>/<sigma>? Cheers, Ed. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
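To make the two definitions concrete, a small Python/NumPy sketch with a made-up error model in which sigma grows with intensity (all the numbers and the model are illustrative assumptions, not from any processing program); the only point is that the mean of the ratios and the ratio of the means are different quantities:

    import numpy as np

    rng = np.random.default_rng(3)

    # toy high-resolution shell: Wilson-like (exponential) intensities with a
    # sigma that grows with intensity, roughly as counting statistics would give
    i_true = rng.exponential(scale=2.0, size=50_000)
    sigma = np.sqrt(1.0 + i_true)
    i_obs = i_true + rng.normal(0.0, 1.0, size=i_true.size) * sigma

    mean_of_ratios = np.mean(i_obs / sigma)            # <I/sigma(I)>
    ratio_of_means = np.mean(i_obs) / np.mean(sigma)   # <I>/<sigma(I)>
    print("<I/sigma> = %.2f   <I>/<sigma> = %.2f" % (mean_of_ratios, ratio_of_means))

As soon as sigma correlates with I, the two come out noticeably different, even though in many real high-resolution shells (where most reflections are comparably weak) they happen to land close together, which is presumably the experimental observation described above.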
Re: [ccp4bb] I/sigmaI of 3.0 rule
related to what I feel is recent revival of the significance of the R-values because it's so handy to have one single number to judge a highly complex nonlinear multivariate barely determined regularized problem! Just as easy as running a gel! Best BR -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ed Pozharski Sent: Thursday, March 03, 2011 8:19 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule On Thu, 2011-03-03 at 08:08 -0700, Bart Hazes wrote: I don't know what has caused this wave of high I/Sigma threshold use but here are some ideas It may also be related to what I feel is recent revival of the significance of the R-values in general. Lower resolution cutoffs in this context improve the R-values, which is (incorrectly) perceived as model improvement. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] I/sigmaI of 3.0 rule
Hi, I don't think XDS generates an Rpim value, does it? The XDS CORRECT step provides the old-fashioned Rsym (R-FACTOR) plus R-meas and Rmrgd-F. The curves all look the same though. Fred. Ed Pozharski wrote: On Thu, 2011-03-03 at 16:02 +0100, Vellieux Frederic wrote: For myself, I decide on the high resolution cutoff by looking at the Rsym vs resolution curve. The curve rises, and for all data sets I have processed (so far) there is a break in the curve and the curve shoots up. To near vertical. This inflexion point is where I decide to place the high resolution cutoff, I never look at the I/sigma(I) values nor at the Rsym in the high resolution shell. Fred, while your procedure is definitely more sophisticated than what I do, let me point out that Rsym is genuinely a bad measure for this, as it depends strongly on redundancy. Do more robust measures (e.g. Rpim) show a similar inflexion? I suspect it will at least shift towards higher resolution. Cheers, Ed.
Re: [ccp4bb] I/sigmaI of 3.0 rule
Discussions of I/sigma(I) or less-than cutoffs have been going on for at least 35 years. For example, see Acta Cryst. (1975) B31, 1507-1509. I was taught by my elders (mainly Lyle Jensen) that less-than cutoffs came into use when diffractometers replaced film methods for small molecule work, i.e., 1960s. To compare new and old structures, they needed some criterion for the electronic measurements that would correspond to the fog level on their films. People settled on 2 sigma cutoffs (on I which mean 4 sigma on F), but subsequently, the cutoffs got higher and higher, as people realized they could get lower and lower R values by throwing away the weak reflections. I'm unaware of any statistical justification for any cutoff. The approach I like the most is to refine on Fsquared and use every reflection. Error estimates and weighting schemes should take care of the noise. Ron On Thu, 3 Mar 2011, Ed Pozharski wrote: On Thu, 2011-03-03 at 09:34 -0600, Jim Pflugrath wrote: As mentioned there is no I/sigmaI rule. Also you need to specify (and correctly calculate) I/sigmaI and not I/sigmaI. A review of similar articles in the same journal will show what is typical for the journal. I think you will find that the I/sigmaI cutoff varies. This information can be used in your response to the reviewer as in, A review of actual published articles in the Journal shows that 75% (60 out of 80) used an I/sigmaI cutoff of 2 for the resolution of the diffraction data used in refinement. We respectfully believe that our cutoff of 2 should be acceptable. Jim, Excellent point. Such statistics would be somewhat tedious to gather though, does anyone know if I/sigma stats are available for the whole PDB somewhere? On your first point though - why is one better than the other? My experimental observation is while the two differ significantly at low resolution (what matters, of course, is I/sigma itself and not the resolution per se), at high resolution where the cutoff is chosen they are not that different. And since the cutoff value itself is rather arbitrarily chosen, then why I/sigma is better than I/sigma? Cheers, Ed. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] I/sigmaI of 3.0 rule- do not underestimate gels
Well BR, do not underestimate complexity of running a gel! There are even more harsh referees comments on gel appearance and quality than comments on cutting data based on R,RF and sigmaI :-) Especially when one is trying to penetrate into prestigious journals... Dr Felix Frolow Professor of Structural Biology and Biotechnology Department of Molecular Microbiology and Biotechnology Tel Aviv University 69978, Israel Acta Crystallographica F, co-editor e-mail: mbfro...@post.tau.ac.il Tel: ++972-3640-8723 Fax: ++972-3640-9407 Cellular: 0547 459 608 On Mar 3, 2011, at 18:38 , Bernhard Rupp (Hofkristallrat a.D.) wrote: related to what I feel is recent revival of the significance of the R-values because it's so handy to have one single number to judge a highly complex nonlinear multivariate barely determined regularized problem! Just as easy as running a gel! Best BR -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ed Pozharski Sent: Thursday, March 03, 2011 8:19 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule On Thu, 2011-03-03 at 08:08 -0700, Bart Hazes wrote: I don't know what has caused this wave of high I/Sigma threshold use but here are some ideas It may also be related to what I feel is recent revival of the significance of the R-values in general. Lower resolution cutoffs in this context improve the R-values, which is (incorrectly) perceived as model improvement. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] I/sigmaI of 3.0 rule
Dear Bernhard, I am wondering where I should cut my data off. Here are the statistics from XDS processing. Maia

SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION

 RESOLUTION   NUMBER OF REFLECTIONS        COMPLETENESS  R-FACTOR  R-FACTOR  COMPARED  I/SIGMA  R-meas  Rmrgd-F  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA      observed  expected                                       Corr
   10.06        5509     304      364        83.5%         3.0%      4.4%      5509    63.83     3.1%    1.0%      11%   0.652    173
    7.12       11785     595      595       100.0%         3.5%      4.8%     11785    59.14     3.6%    1.4%     -10%   0.696    414
    5.81       15168     736      736       100.0%         5.0%      5.6%     15168    51.88     5.1%    1.8%      -9%   0.692    561
    5.03       17803     854      854       100.0%         5.5%      5.7%     17803    50.02     5.6%    2.2%     -10%   0.738    675
    4.50       20258     964      964       100.0%         5.1%      5.4%     20258    52.61     5.3%    2.1%     -16%   0.710    782
    4.11       22333    1054     1054       100.0%         5.6%      5.7%     22333    50.89     5.8%    2.0%     -16%   0.705    878
    3.80       23312    1137     1137       100.0%         7.0%      6.6%     23312    42.95     7.1%    3.0%     -13%   0.770    952
    3.56       25374    1207     1208        99.9%         7.6%      7.3%     25374    40.56     7.8%    3.4%     -18%   0.739   1033
    3.35       27033    1291     1293        99.8%         9.7%      9.2%     27033    33.73    10.0%    4.1%     -12%   0.765   1107
    3.18       29488    1353     1353       100.0%        11.6%     11.6%     29488    28.16    11.9%    4.4%      -7%   0.750   1176
    3.03       31054    1419     1419       100.0%        15.7%     15.9%     31054    21.77    16.0%    6.9%      -9%   0.741   1243
    2.90       32288    1478     1478       100.0%        21.1%     21.6%     32288    16.99    21.6%    9.2%      -6%   0.745   1296
    2.79       33807    1542     1542       100.0%        28.1%     28.8%     33807    13.07    28.8%   12.9%      -2%   0.783   1361
    2.69       34983    1604     1604       100.0%        37.4%     38.7%     34983     9.95    38.3%   17.2%      -2%   0.743   1422
    2.60       35163    1653     1653       100.0%        48.8%     48.0%     35163     8.03    50.0%   21.9%      -6%   0.754   1475
    2.52       36690    1699     1699       100.0%        54.0%     56.0%     36690     6.98    55.3%   25.9%       0%   0.745   1517
    2.44       37751    1757     1757       100.0%        67.9%     70.4%     37751     5.61    69.5%   32.5%      -5%   0.733   1577
    2.37       38484    1798     1799        99.9%        82.2%     84.5%     38484     4.72    84.2%   36.5%       2%   0.753   1620
    2.31       39098    1842     1842       100.0%        91.4%     94.3%     39098     4.19    93.7%   43.7%      -3%   0.744   1661
    2.25       38809    1873     1923        97.4%       143.4%    139.3%     38809     2.84   147.1%   69.8%      -2%   0.693   1696
   total      556190   26160    26274        99.6%        11.9%     12.2%    556190    21.71    12.2%    9.7%      -5%   0.739  22619

Bernhard Rupp (Hofkristallrat a.D.) wrote: I think this suppression of high resolution shells via I/sigI cutoffs is partially attributable to a conceptual misunderstanding of what these (darn) R-values mean in refinement versus data merging. In refinement, even a random atom structure follows the Wilson distribution, and therefore, even a completely wrong non-centrosymmetric structure will not - given proper scaling - give an Rf of more than 59%. There is no such limit for the basic linear merging R. However, there is a simple relation between <I/sigI> and R-merge (provided no other indecency has been done to the data). It simply is (BMC) Rm = 0.8/<I/sigI>. I.e. for <I/sigI> ~0.8 you get 100%, for 2 we obtain 40%, which, interpreted as Rf, would be dreadful, but for <I/sigI> = 3 we get Rm = 0.27, and that looks acceptable for an Rf (or to an uninformed reviewer). Btw, I also wish to point out that the I/sig cutoffs are not exactly the cutoff criterion for anomalous phasing; a more direct measure is a signal cutoff such as delF/sig(delF); George I believe uses 1.3 for SAD. Interestingly, in almost all structures I played with, delF/sig(delF) for both noise in the anomalous data and no anomalous scatterer present, the anomalous signal was ~0.8. I haven't figured out yet or proved the statistics and whether this is generally true or just numerology... And, the usual biased rant - irrespective of Hamilton tests, nobody really needs these popular unweighted linear residuals which shall not be named, particularly on F. They only cause trouble.
Best regards, BR - Bernhard Rupp 001 (925) 209-7429 +43 (676) 571-0536 b...@ruppweb.org hofkristall...@gmail.com http://www.ruppweb.org/ - Structural Biology is the practice of crystallography without a license. - -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Bart Hazes Sent: Thursday, March 03, 2011 7:08 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule There seems to be an epidemic of papers with I/Sigma 3 (sometime much larger). In fact such cases have become so frequent that I fear some people start to believe that this is the proper procedure. I don't know where that has come from as the I/Sigma ~ 2 criterion has been established long ago and many consider that even a tad conservative. It simply pains me to see people going to the most advanced synchrotrons to boost their highest resolution data and then simply throw away much of it. I don't know what has caused this wave of high I/Sigma threshold use but here are some ideas - High I/Sigma cutoffs are normal for (S/M)AD data sets where a more strict focus on data quality is needed. Perhaps some people have started to think this is the norm. - For some dataset Rsym goes up strongly while I/SigI
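For what it is worth, reading a cutoff criterion off shell statistics like those posted above is trivial. A plain-Python sketch using the <I/sigma(I)> values of the last six shells copied from the table (the 2.0 threshold is the footnote criterion John Helliwell mentioned, not a recommendation):

    # outer d-limit (A) and <I/sigma(I)> of the last few shells, copied from the table above
    shells = [(2.60, 8.03), (2.52, 6.98), (2.44, 5.61),
              (2.37, 4.72), (2.31, 4.19), (2.25, 2.84)]

    threshold = 2.0   # swap in 3.0, 1.5, ... to taste
    cut = next((d for d, i_sig in shells if i_sig < threshold), None)
    if cut is None:
        print("<I/sigma> stays above %.1f out to the edge of the processed data (2.25 A)" % threshold)
    else:
        print("<I/sigma> first drops below %.1f in the shell ending at %.2f A" % (threshold, cut))

With these particular numbers the 2.0 criterion is never reached within the processed range, which is consistent with the earlier comment that the data could probably be processed a bit beyond 2.25 A.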
Re: [ccp4bb] I/sigmaI of 3.0 rule
I take the point about a tendency in those days to apply sigma cutoffs to get lower R values, which were erroneously expected to indicate better structures. I wonder how many of us remember this paper by Arnberg et al (1979) Acta Cryst A35, 497-499, where it is shown for (small molecule) structures that had been refined with only reflections I > 3*sigma(I) that the models were degraded by leaving out weak data (although the R factors looked better of course). Arnberg et al took published structures and showed the refined models got better when the weak data were included. The best bit, I think, was when they went on to demonstrate successful refinement of a structure using ONLY the weak data where I < 3*sigma(I) and ignoring all the strong ones. This shows, as was alluded to earlier in the discussion, that a weak reflection puts a powerful constraint on a refinement, especially if there are other stronger reflections in the same resolution range. --- | Simon E.V. Phillips | --- | Director, Research Complex at Harwell (RCaH) | | Rutherford Appleton Laboratory | | Harwell Science and Innovation Campus | | Didcot | | Oxon OX11 0FA | | United Kingdom | | Email: simon.phill...@rc-harwell.ac.uk | | Tel: +44 (0)1235 567701 | | +44 (0)1235 567700 (sec) | | +44 (0)7884 436011 (mobile) | | www.rc-harwell.ac.uk | --- | Astbury Centre for Structural Molecular Biology | | Institute of Molecular and Cellular Biology | | University of LEEDS | | LEEDS LS2 9JT | | United Kingdom | | Email: s.e.v.phill...@leeds.ac.uk | | Tel: +44 (0)113 343 3027 | | WWW: http://www.astbury.leeds.ac.uk/People/staffpage.php?StaffID=SEVP | ---
Re: [ccp4bb] I/sigmaI of 3.0 rule
I have to resend my statistics. Maia Cherney wrote: Dear Bernhard, I am wondering where I should cut my data off. Here are the statistics from XDS processing. Maia On 11-03-03 04:29 AM, Roberto Battistutta wrote: Dear all, I got a reviewer comment that indicates the need to refine the structures at an appropriate resolution (I/sigmaI of >3.0), and re-submit the revised coordinate files to the PDB for validation. In the manuscript I present some crystal structures determined by molecular replacement using the same protein in a different space group as search model. Does anyone know the origin or the theoretical basis of this I/sigmaI > 3.0 rule for an appropriate resolution? Thanks, Bye, Roberto. Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it DETECTOR_SU

SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION

 RESOLUTION   NUMBER OF REFLECTIONS        COMPLETENESS  R-FACTOR  R-FACTOR  COMPARED  I/SIGMA  R-meas  Rmrgd-F  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA      observed  expected                                       Corr
   10.06        5509     304      364        83.5%         3.0%      4.4%      5509    63.83     3.1%    1.0%      11%   0.652    173
    7.12       11785     595      595       100.0%         3.5%      4.8%     11785    59.14     3.6%    1.4%     -10%   0.696    414
    5.81       15168     736      736       100.0%         5.0%      5.6%     15168    51.88     5.1%    1.8%      -9%   0.692    561
    5.03       17803     854      854       100.0%         5.5%      5.7%     17803    50.02     5.6%    2.2%     -10%   0.738    675
    4.50       20258     964      964       100.0%         5.1%      5.4%     20258    52.61     5.3%    2.1%     -16%   0.710    782
    4.11       22333    1054     1054       100.0%         5.6%      5.7%     22333    50.89     5.8%    2.0%     -16%   0.705    878
    3.80       23312    1137     1137       100.0%         7.0%      6.6%     23312    42.95     7.1%    3.0%     -13%   0.770    952
    3.56       25374    1207     1208        99.9%         7.6%      7.3%     25374    40.56     7.8%    3.4%     -18%   0.739   1033
    3.35       27033    1291     1293        99.8%         9.7%      9.2%     27033    33.73    10.0%    4.1%     -12%   0.765   1107
    3.18       29488    1353     1353       100.0%        11.6%     11.6%     29488    28.16    11.9%    4.4%      -7%   0.750   1176
    3.03       31054    1419     1419       100.0%        15.7%     15.9%     31054    21.77    16.0%    6.9%      -9%   0.741   1243
    2.90       32288    1478     1478       100.0%        21.1%     21.6%     32288    16.99    21.6%    9.2%      -6%   0.745   1296
    2.79       33807    1542     1542       100.0%        28.1%     28.8%     33807    13.07    28.8%   12.9%      -2%   0.783   1361
    2.69       34983    1604     1604       100.0%        37.4%     38.7%     34983     9.95    38.3%   17.2%      -2%   0.743   1422
    2.60       35163    1653     1653       100.0%        48.8%     48.0%     35163     8.03    50.0%   21.9%      -6%   0.754   1475
    2.52       36690    1699     1699       100.0%        54.0%     56.0%     36690     6.98    55.3%   25.9%       0%   0.745   1517
    2.44       37751    1757     1757       100.0%        67.9%     70.4%     37751     5.61    69.5%   32.5%      -5%   0.733   1577
    2.37       38484    1798     1799        99.9%        82.2%     84.5%     38484     4.72    84.2%   36.5%       2%   0.753   1620
    2.31       39098    1842     1842       100.0%        91.4%     94.3%     39098     4.19    93.7%   43.7%      -3%   0.744   1661
    2.25       38809    1873     1923        97.4%       143.4%    139.3%     38809     2.84   147.1%   69.8%      -2%   0.693   1696
   total      556190   26160    26274        99.6%        11.9%     12.2%    556190    21.71    12.2%    9.7%      -5%   0.739  22619
Re: [ccp4bb] I/sigmaI of 3.0 rule- do not underestimate gels
there are even more harsh referees comments on gel appearance and quality than comments on cutting data based on R,RF and sigmaI :-) Especially when one is trying to penetrate into prestigious journals... Ok I repent. For improving gels there is the same excellent program, also useful for density modification - Photoshop ;-) Best, BR Dr Felix Frolow Professor of Structural Biology and Biotechnology Department of Molecular Microbiology and Biotechnology Tel Aviv University 69978, Israel Acta Crystallographica F, co-editor e-mail: mbfro...@post.tau.ac.il Tel: ++972-3640-8723 Fax: ++972-3640-9407 Cellular: 0547 459 608 On Mar 3, 2011, at 18:38 , Bernhard Rupp (Hofkristallrat a.D.) wrote: related to what I feel is recent revival of the significance of the R-values because it's so handy to have one single number to judge a highly complex nonlinear multivariate barely determined regularized problem! Just as easy as running a gel! Best BR -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ed Pozharski Sent: Thursday, March 03, 2011 8:19 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule On Thu, 2011-03-03 at 08:08 -0700, Bart Hazes wrote: I don't know what has caused this wave of high I/Sigma threshold use but here are some ideas It may also be related to what I feel is recent revival of the significance of the R-values in general. Lower resolution cutoffs in this context improve the R-values, which is (incorrectly) perceived as model improvement. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] I/sigmaI of 3.0 rule
We should compile this discussion and send it as compulsive reading to journal editors...;-) Bert On 3/3/11 12:07 PM, Simon Phillips s.e.v.phill...@leeds.ac.uk wrote: I take the point about a tendency in those days to apply sigma cutoffs to get lower R values, which were erroneously expected to indicate better structures. I wonder how many of us remember this paper by Arnberg et al (1979) Acta Cryst A35, 497-499, where it is shown for (small molecule) structures that had been refined with only reflections I3*sigma(I) that the models were degraded by leaving out weak data (although the R factors looked better of course). Arnberg et al took published structures and showed the refined models got better when the weak data were included. The best bit, I think, was when they went on to demonstrate successful refinement of a structure using ONLY the weak data where I3*sigma(I) and ignoring all the strong ones. This shows, as was alluded to earlier in the discussion, that a weak reflection puts a powerful constraint on a refinement, especially if there are other stronger reflections in the same resolution range. --- | Simon E.V. Phillips | --- | Director, Research Complex at Harwell (RCaH)| | Rutherford Appleton Laboratory | | Harwell Science and Innovation Campus | | Didcot | | Oxon OX11 0FA | | United Kingdom | | Email: simon.phill...@rc-harwell.ac.uk | | Tel: +44 (0)1235 567701 | |+44 (0)1235 567700 (sec) | |+44 (0)7884 436011 (mobile) | | www.rc-harwell.ac.uk| --- | Astbury Centre for Structural Molecular Biology | | Institute of Molecular and Cellular Biology | | University of LEEDS | | LEEDS LS2 9JT | | United Kingdom | | Email: s.e.v.phill...@leeds.ac.uk | | Tel: +44 (0)113 343 3027 | | WWW: http://www.astbury.leeds.ac.uk/People/staffpage.php?StaffID=SEVP http://www.astbury.leeds.ac.uk/People/staffpage.php?StaffID=SEVP | ---
Re: [ccp4bb] I/sigmaI of 3.0 rule
When will we finally jettison Rsym/Rcryst/Rmerge? 1. Perhaps software developers should either not even calculate the number, or hide it somewhere obscure, and of course replacing it with a better R flavor? 2. Maybe reviewers should insist on other R's (Rpim etc) instead of Rmerge? JPK PS is this as quixotic as chucking the QWERTY keyboard, or using Esperanto? I don't think so! On Thu, Mar 3, 2011 at 11:07 AM, Simon Phillips s.e.v.phill...@leeds.ac.uk wrote: I take the point about a tendency in those days to apply sigma cutoffs to get lower R values, which were erroneously expected to indicate better structures. I wonder how many of us remember this paper by Arnberg et al (1979) Acta Cryst A35, 497-499, where it is shown for (small molecule) structures that had been refined with only reflections I3*sigma(I) that the models were degraded by leaving out weak data (although the R factors looked better of course). Arnberg et al took published structures and showed the refined models got better when the weak data were included. The best bit, I think, was when they went on to demonstrate successful refinement of a structure using ONLY the weak data where I3*sigma(I) and ignoring all the strong ones. This shows, as was alluded to earlier in the discussion, that a weak reflection puts a powerful constraint on a refinement, especially if there are other stronger reflections in the same resolution range. --- | Simon E.V. Phillips | --- | Director, Research Complex at Harwell (RCaH) | | Rutherford Appleton Laboratory | | Harwell Science and Innovation Campus | | Didcot | | Oxon OX11 0FA | | United Kingdom | | Email: simon.phill...@rc-harwell.ac.uk | | Tel: +44 (0)1235 567701 | | +44 (0)1235 567700 (sec) | | +44 (0)7884 436011 (mobile) | | www.rc-harwell.ac.uk | --- | Astbury Centre for Structural Molecular Biology | | Institute of Molecular and Cellular Biology | | University of LEEDS | | LEEDS LS2 9JT | | United Kingdom | | Email: s.e.v.phill...@leeds.ac.uk | | Tel: +44 (0)113 343 3027 | | WWW: http://www.astbury.leeds.ac.uk/People/staffpage.php?StaffID=SEVP | --- -- *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program cel: 773.608.9185 email: j-kell...@northwestern.edu ***
Re: [ccp4bb] I/sigmaI of 3.0 rule
First of all I would ask a XDS expert for that because I don't know exactly what stats the XDS program reports (shame on me, ok) nor what the quality of your error model is, or what you want to use the data for (I guess refinement - see Eleanor's response for that, and use all data). There is one point I'd like to make re cutoff: If one gets greedy and collects too much noise in high resolution shells (like way below I/sigI = 0.8 or so) the scaling/integration may suffer from an overabundance of nonsense data, and here I believe it makes sense to select a higher cutoff (like what exactly?) and reprocess the data. Maybe one of our data collection specialist should comment on that. BR -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Maia Cherney Sent: Thursday, March 03, 2011 9:13 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule I have to resend my statistics. Maia Cherney wrote: Dear Bernhard I am wondering where I should cut my data off. Here is the statistics from XDS processing. Maia On 11-03-03 04:29 AM, Roberto Battistutta wrote: Dear all, I got a reviewer comment that indicate the need to refine the structures at an appropriate resolution (I/sigmaI of3.0), and re-submit the revised coordinate files to the PDB for validation.. In the manuscript I present some crystal structures determined by molecular replacement using the same protein in a different space group as search model. Does anyone know the origin or the theoretical basis of this I/sigmaI3.0 rule for an appropriate resolution? Thanks, Bye, Roberto. Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it
[ccp4bb] [Fwd: Re: [ccp4bb] I/sigmaI of 3.0 rule]
Original Message Subject:Re: [ccp4bb] I/sigmaI of 3.0 rule Date: Thu, 03 Mar 2011 10:43:23 -0700 From: Maia Cherney ch...@ualberta.ca To: Oganesyan, Vaheh oganesy...@medimmune.com References: 2ba9ce2f-c299-4ca9-a36a-99065d1b3...@unipd.it 4d6faed8.7040...@ualberta.ca 021001cbd9bc$f0ecc940$d2c65bc0$@gmail.com 4d6fcab6.3090...@ualberta.ca 4d6fcbff.2010...@ualberta.ca 73e543de77290c409c9bed6fa4ca34bb0173a...@md1ev002.medimmune.com Vaheh, The problem was with Rmerg. As you can see at I/sigma=2.84, the Rmerge (R-factor) was 143%. I am asking this question because B. Rupp wrote However, there is a simple relation between I/sigI and R-merge (provided no other indecency has been done to the data). It simply is (BMC) Rm=0.8/I/sigI. Maybe my data are indecent? This is the whole LP file. Maia ** XSCALE (VERSION December 6, 2007)28-Aug-2009 ** Author: Wolfgang Kabsch Copy licensed until (unlimited) to Canadian Light Source, Saskatoon, Canada. No redistribution. ** CONTROL CARDS ** MAXIMUM_NUMBER_OF_PROCESSORS=8 SPACE_GROUP_NUMBER=180 UNIT_CELL_CONSTANTS= 150.1 150.1 81.8 90.0 90.0 120.0 OUTPUT_FILE=XSCALE.HKL FRIEDEL'S_LAW=TRUE INPUT_FILE= XDS_ASCII.HKL XDS_ASCII INCLUDE_RESOLUTION_RANGE= 40 2.25 THE DATA COLLECTION STATISTICS REPORTED BELOW ASSUMES: SPACE_GROUP_NUMBER= 180 UNIT_CELL_CONSTANTS= 150.10 150.1081.80 90.000 90.000 120.000 * 12 EQUIVALENT POSITIONS IN SPACE GROUP #180 * If x',y',z' is an equivalent position to x,y,z, then x'=x*ML(1)+y*ML( 2)+z*ML( 3)+ML( 4)/12.0 y'=x*ML(5)+y*ML( 6)+z*ML( 7)+ML( 8)/12.0 z'=x*ML(9)+y*ML(10)+z*ML(11)+ML(12)/12.0 #1 2 3 45 6 7 89 10 11 12 11 0 0 00 1 0 00 0 1 0 20 -1 0 01 -1 0 00 0 1 8 3 -1 1 0 0 -1 0 0 00 0 1 4 4 -1 0 0 00 -1 0 00 0 1 0 50 1 0 0 -1 1 0 00 0 1 8 61 -1 0 01 0 0 00 0 1 4 70 1 0 01 0 0 00 0 -1 8 8 -1 0 0 0 -1 1 0 00 0 -1 4 91 -1 0 00 -1 0 00 0 -1 0 100 -1 0 0 -1 0 0 00 0 -1 8 111 0 0 01 -1 0 00 0 -1 4 12 -1 1 0 00 1 0 00 0 -1 0 ALL DATA SETS WILL BE SCALED TO XDS_ASCII.HKL ** READING INPUT REFLECTION DATA FILES ** DATAMEAN REFLECTIONSINPUT FILE NAME SET# INTENSITY ACCEPTED REJECTED 1 0.6203E+03 557303 0 XDS_ASCII.HKL ** CORRECTION FACTORS AS FUNCTION OF IMAGE NUMBER RESOLUTION ** RECIPROCAL CORRECTION FACTORS FOR INPUT DATA SETS MERGED TO OUTPUT FILE: XSCALE.HKL THE CALCULATIONS ASSUME FRIEDEL'S_LAW= TRUE TOTAL NUMBER OF CORRECTION FACTORS DEFINED 720 DEGREES OF FREEDOM OF CHI^2 FIT140494.9 CHI^2-VALUE OF FIT OF CORRECTION FACTORS 1.037 NUMBER OF CYCLES CARRIED OUT 3 CORRECTION FACTORS for visual inspection with VIEW DECAY_001.pck INPUT_FILE=XDS_ASCII.HKL XMIN= 0.1 XMAX= 179.9 NXBIN= 36 YMIN= 0.00257 YMAX= 0.19752 NYBIN= 20 NUMBER OF REFLECTIONS USED FOR DETERMINING CORRECTION FACTORS 238321 ** CORRECTION FACTORS AS FUNCTION OF X (fast) Y(slow) IN THE DETECTOR PLANE ** RECIPROCAL CORRECTION FACTORS FOR INPUT DATA SETS MERGED TO OUTPUT FILE: XSCALE.HKL THE CALCULATIONS ASSUME FRIEDEL'S_LAW= TRUE TOTAL NUMBER OF CORRECTION FACTORS DEFINED 4760 DEGREES OF FREEDOM OF CHI^2 FIT186486.8 CHI^2-VALUE OF FIT OF CORRECTION
Re: [ccp4bb] I/sigmaI of 3.0 rule
just to clarify that, at least in my case, my impression is that the editor was fair, I was referring only to the comment of one reviewer. Roberto Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it Il giorno 03/mar/2011, alle ore 18.16, Van Den Berg, Bert ha scritto: We should compile this discussion and send it as compulsive reading to journal editors...;-) Bert
Re: [ccp4bb] I/sigmaI of 3.0 rule
I see, there is no consensus about my data. Some people say 2.4A, other say all. Well, I chose 2.3 A. My rule was to be a little bit below Rmerg 100%. At 2.3A Rmerg was 98.7% Actually, I have published my paper in JMB. Yes, reviewers did not like that and even made me give Rrim and Rpim etc. Maia Bernhard Rupp (Hofkristallrat a.D.) wrote: First of all I would ask a XDS expert for that because I don't know exactly what stats the XDS program reports (shame on me, ok) nor what the quality of your error model is, or what you want to use the data for (I guess refinement - see Eleanor's response for that, and use all data). There is one point I'd like to make re cutoff: If one gets greedy and collects too much noise in high resolution shells (like way below I/sigI = 0.8 or so) the scaling/integration may suffer from an overabundance of nonsense data, and here I believe it makes sense to select a higher cutoff (like what exactly?) and reprocess the data. Maybe one of our data collection specialist should comment on that. BR -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Maia Cherney Sent: Thursday, March 03, 2011 9:13 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule I have to resend my statistics. Maia Cherney wrote: Dear Bernhard I am wondering where I should cut my data off. Here is the statistics from XDS processing. Maia On 11-03-03 04:29 AM, Roberto Battistutta wrote: Dear all, I got a reviewer comment that indicate the need to refine the structures at an appropriate resolution (I/sigmaI of3.0), and re-submit the revised coordinate files to the PDB for validation.. In the manuscript I present some crystal structures determined by molecular replacement using the same protein in a different space group as search model. Does anyone know the origin or the theoretical basis of this I/sigmaI3.0 rule for an appropriate resolution? Thanks, Bye, Roberto. Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it
Re: [ccp4bb] I/sigmaI of 3.0 rule
Dear All, Relatively recent statistics on I/sigmaI and Rmerge in PDB deposits are presented in two following publications: 1. Benefits of structural genomics for drug discovery research. Grabowski M, Chruszcz M, Zimmerman MD, Kirillova O, Minor W. Infect Disord Drug Targets. 2009 Nov;9(5):459-74. PMID: 19594422 2. X-ray diffraction experiment - the last experiment in the structure elucidation process. Chruszcz M, Borek D, Domagalski M, Otwinowski Z, Minor W. Adv Protein Chem Struct Biol. 2009;77:23-40. PMID: 20663480 Best regards, Maks (attachment: I_over_sigma_I.png)
Re: [ccp4bb] I/sigmaI of 3.0 rule
Hello Maia, Rmerge is obsolete, so the reviewers had a good point to make you publish Rmeas instead. Rmeas should replace Rmerge in my opinion. The data statistics you sent show a mulltiplicity of about 20! Did you check your data for radiation damage? That might explain why your Rmeas is so utterly high while your I/sigI is still above 2 (You should not cut your data but include more!) What do the statistics look like if you process just about enough frames so that you get a reasonable mulltiplicity, 3-4, say? Cheers, Tim On Thu, Mar 03, 2011 at 10:57:37AM -0700, Maia Cherney wrote: I see, there is no consensus about my data. Some people say 2.4A, other say all. Well, I chose 2.3 A. My rule was to be a little bit below Rmerg 100%. At 2.3A Rmerg was 98.7% Actually, I have published my paper in JMB. Yes, reviewers did not like that and even made me give Rrim and Rpim etc. Maia Bernhard Rupp (Hofkristallrat a.D.) wrote: First of all I would ask a XDS expert for that because I don't know exactly what stats the XDS program reports (shame on me, ok) nor what the quality of your error model is, or what you want to use the data for (I guess refinement - see Eleanor's response for that, and use all data). There is one point I'd like to make re cutoff: If one gets greedy and collects too much noise in high resolution shells (like way below I/sigI = 0.8 or so) the scaling/integration may suffer from an overabundance of nonsense data, and here I believe it makes sense to select a higher cutoff (like what exactly?) and reprocess the data. Maybe one of our data collection specialist should comment on that. BR -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Maia Cherney Sent: Thursday, March 03, 2011 9:13 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule I have to resend my statistics. Maia Cherney wrote: Dear Bernhard I am wondering where I should cut my data off. Here is the statistics from XDS processing. Maia On 11-03-03 04:29 AM, Roberto Battistutta wrote: Dear all, I got a reviewer comment that indicate the need to refine the structures at an appropriate resolution (I/sigmaI of3.0), and re-submit the revised coordinate files to the PDB for validation.. In the manuscript I present some crystal structures determined by molecular replacement using the same protein in a different space group as search model. Does anyone know the origin or the theoretical basis of this I/sigmaI3.0 rule for an appropriate resolution? Thanks, Bye, Roberto. Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it -- -- Tim Gruene Institut fuer anorganische Chemie Tammannstr. 4 D-37077 Goettingen phone: +49 (0)551 39 22149 GPG Key ID = A46BEE1A signature.asc Description: Digital signature
Re: [ccp4bb] I/sigmaI of 3.0 rule
The data statistics you sent show a mulltiplicity of about 20! Did you check your data for radiation damage? That might explain why your Rmeas is so utterly high while your I/sigI is still above 2 (You should not cut your data but include more!) So then I got that wrong - with that *high* a redundancy, the preceding term becomes ~1 and linear Rmerge and Rmeas asymptotically become the same? BR Cheers, Tim On Thu, Mar 03, 2011 at 10:57:37AM -0700, Maia Cherney wrote: I see, there is no consensus about my data. Some people say 2.4A, other say all. Well, I chose 2.3 A. My rule was to be a little bit below Rmerg 100%. At 2.3A Rmerg was 98.7% Actually, I have published my paper in JMB. Yes, reviewers did not like that and even made me give Rrim and Rpim etc. Maia Bernhard Rupp (Hofkristallrat a.D.) wrote: First of all I would ask a XDS expert for that because I don't know exactly what stats the XDS program reports (shame on me, ok) nor what the quality of your error model is, or what you want to use the data for (I guess refinement - see Eleanor's response for that, and use all data). There is one point I'd like to make re cutoff: If one gets greedy and collects too much noise in high resolution shells (like way below I/sigI = 0.8 or so) the scaling/integration may suffer from an overabundance of nonsense data, and here I believe it makes sense to select a higher cutoff (like what exactly?) and reprocess the data. Maybe one of our data collection specialist should comment on that. BR -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Maia Cherney Sent: Thursday, March 03, 2011 9:13 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule I have to resend my statistics. Maia Cherney wrote: Dear Bernhard I am wondering where I should cut my data off. Here is the statistics from XDS processing. Maia On 11-03-03 04:29 AM, Roberto Battistutta wrote: Dear all, I got a reviewer comment that indicate the need to refine the structures at an appropriate resolution (I/sigmaI of3.0), and re-submit the revised coordinate files to the PDB for validation.. In the manuscript I present some crystal structures determined by molecular replacement using the same protein in a different space group as search model. Does anyone know the origin or the theoretical basis of this I/sigmaI3.0 rule for an appropriate resolution? Thanks, Bye, Roberto. Roberto Battistutta Associate Professor Department of Chemistry University of Padua via Marzolo 1, 35131 Padova - ITALY tel. +39.049.8275265/67 fax. +39.049.8275239 roberto.battistu...@unipd.it www.chimica.unipd.it/roberto.battistutta/ VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it -- -- Tim Gruene Institut fuer anorganische Chemie Tammannstr. 4 D-37077 Goettingen phone: +49 (0)551 39 22149 GPG Key ID = A46BEE1A
Re: [ccp4bb] I/sigmaI of 3.0 rule
Maia Cherney wrote: Rmeas is always higher than Rmerge, so if my Rmerge is high I don't like Rmeas either.

But that makes perfect sense now, per Tim: the linear Rmerge always gives lower values for small N (lower redundancy) and rises with redundancy to approach Rmeas/Rrim at high redundancy. I like the idea of just looking at the I/sigI and including more data. Lucky me for suggesting you use all your present data for refinement... ;-)

BR
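To make that concrete, a minimal sketch (not from the thread) that just tabulates the multiplicity weights applied to the per-reflection deviations:

```python
# Minimal sketch: how the Rmeas and Rpim multiplicity weights behave as the
# redundancy n grows. The Rmeas weight tends to 1, which is why the linear
# Rmerge climbs toward Rmeas; the Rpim weight keeps shrinking.
from math import sqrt

for n in (2, 3, 4, 10, 20, 100):
    rmeas_weight = sqrt(n / (n - 1))   # multiplies |I_i - <I>| in Rmeas
    rpim_weight = sqrt(1 / (n - 1))    # multiplies |I_i - <I>| in Rpim
    print(f"n = {n:3d}   Rmeas weight = {rmeas_weight:.3f}   "
          f"Rpim weight = {rpim_weight:.3f}")
```

At n = 2 the Rmeas weight is 1.41, so Rmerge sits well below Rmeas; by n = 20 it is 1.03 and the two are practically indistinguishable.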
Re: [ccp4bb] I/sigmaI of 3.0 rule
Maia Cherney wrote: I don't like Rmeas either.

Given the Angst caused by actually useful redundancy, would it not be more reasonable then to report Rpim, which decreases with redundancy? Maybe Rpim in an additional column would help to reduce the Angst?

BR
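In the spirit of the random-number experiments earlier in this thread, a hedged simulation (Gaussian errors around a single true intensity; no scaling, partiality or damage modelled; all numbers arbitrary) shows what an extra Rpim column would convey: Rmerge rises toward Rmeas with multiplicity, while Rpim, which tracks the precision of the merged mean, keeps falling.

```python
# Hedged sketch: merge simulated observations of one reflection at different
# multiplicities and compare Rmerge, Rmeas and Rpim. Synthetic Gaussian noise
# only; intended to illustrate the redundancy dependence, nothing more.
import numpy as np

rng = np.random.default_rng(0)
I_true, sigma = 100.0, 10.0      # arbitrary true intensity and per-obs error
n_refl = 10000                   # number of simulated unique reflections

for n in (2, 4, 10, 20):
    obs = rng.normal(I_true, sigma, size=(n_refl, n))
    dev = np.abs(obs - obs.mean(axis=1, keepdims=True)).sum(axis=1)
    denom = obs.sum()
    rmerge = dev.sum() / denom
    rmeas = np.sqrt(n / (n - 1)) * dev.sum() / denom
    rpim = np.sqrt(1 / (n - 1)) * dev.sum() / denom
    print(f"n = {n:2d}   Rmerge = {rmerge:.3f}   "
          f"Rmeas = {rmeas:.3f}   Rpim = {rpim:.3f}")
```

In this toy setup Rmeas stays roughly constant, Rmerge approaches it from below, and Rpim falls by roughly a factor of three between n = 2 and n = 20.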
Re: [ccp4bb] I/sigmaI of 3.0 rule
Higher redundancy lowers Rpim because it increases precision. However, it need not increase accuracy if the observations are not drawn from the true distribution. If pathological behaviour of the R-factor statistics is due to radiation damage, as I believe is often the case, we are combining observations that are no longer equivalent. If you used long exposures per image and collected just enough data for a complete data set, you are out of luck. If you used shorter exposures and opted for a high-redundancy set, then you have the option to toss out the last N images to get rid of the most damaged data, or you can try to compensate for the damage with zero-dose extrapolation (or whatever the name of that program was; I think it is from Wolfgang Kabsch). Rejecting data is never desirable, but I think it may be better than merging non-equivalent data that cannot be properly modeled by a single structure.

Bart

--
Bart Hazes (Associate Professor)
Dept. of Medical Microbiology & Immunology
University of Alberta
1-15 Medical Sciences Building
Edmonton, Alberta
Canada, T6G 2H7
phone: 1-780-492-0042
fax: 1-780-492-7521
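A hedged sketch of the precision-versus-accuracy point above: a made-up 2% intensity loss per frame stands in for radiation damage (all numbers are illustrative, not from any real data set). Merging more frames keeps improving counting statistics, yet Rmeas inflates and the merged mean drifts further from the undamaged value, which is what motivates tossing out the most damaged images.

```python
# Hedged illustration: systematic decay makes observations non-equivalent, so
# extra redundancy buys precision but not accuracy. The 2% decay per frame and
# all other numbers are arbitrary assumptions for this sketch.
import numpy as np

rng = np.random.default_rng(1)
I0, sigma = 100.0, 8.0           # undamaged intensity and per-obs error
decay = 0.02                     # assumed fractional intensity loss per frame
n_refl = 5000

for n_frames in (4, 10, 20, 40):
    true_I = I0 * (1.0 - decay * np.arange(n_frames))    # damaged 'truth'
    obs = rng.normal(true_I, sigma, size=(n_refl, n_frames))
    dev = np.abs(obs - obs.mean(axis=1, keepdims=True)).sum(axis=1)
    rmeas = np.sqrt(n_frames / (n_frames - 1)) * dev.sum() / obs.sum()
    bias = abs(obs.mean() - I0)                          # accuracy, not precision
    print(f"frames merged = {n_frames:2d}   Rmeas = {rmeas:.3f}   "
          f"|<I> - I0| = {bias:.1f}")
```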
Re: [ccp4bb] I/sigmaI of 3.0 rule
Not sure whether this option has been mentioned before... I think what we really would like to do is decide by the quality of the density. I see that this is difficult. So, short of that... how about the figure of merit in refinement? Wouldn't the FOM reflect how useful our data really are?

Ingo
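For reference, the figure of merit attached to each reflection during phasing and refinement is usually defined as the expected cosine of its phase error,

\[
m_{hkl} = \bigl\langle \cos\Delta\varphi_{hkl} \bigr\rangle ,
\]

so the mean FOM summarises how well the phases are believed to be determined under the current model and its error estimates, rather than being a property of the measured intensities alone.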