Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-13 Thread George M. Sheldrick
Dear Boaz,

You are quite correct, 'latter' and 'former' need to be switched in my 
email. Apologies to CCP4bb for the confusion caused!

Best wishes, George

Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582


On Sun, 13 Mar 2011, Boaz Shaanan wrote:

 Dear George,
 
 While I agree with you I wonder whether in this statement:
 
 ...The practice of quoting R-values both
 for all data and for F > 4sigma(F) seems to me to be useful. For example
 if the latter is much larger than the former, maybe you are including a
 lot of weak data...
 
 Shouldn't it be: ...former (i.e. R for all data) is much larger than the 
 latter (i.e. R for F > 4sigma(F))... ?
 Just wondering, although it could be my late night misunderstanding.
 
Best regards,
 
 Boaz
 
 Boaz Shaanan, Ph.D.
 Dept. of Life Sciences
 Ben-Gurion University of the Negev
 Beer-Sheva 84105
 Israel
 
 Phone: 972-8-647-2220  Skype: boaz.shaanan
 Fax:   972-8-647-2992 or 972-8-646-1710
 
 
 
 
 From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] On Behalf Of George M. 
 Sheldrick [gshe...@shelx.uni-ac.gwdg.de]
 Sent: Sunday, March 13, 2011 12:11 AM
 To: CCP4BB@JISCMAIL.AC.UK
 Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule
 
 Dear James,
 
 I'm a bit puzzled by your negative R-values and unstable behavior. In
 practice, whether we refine against intensity or against |F|, it is
 traditional to quote an R-factor (called R1 in small molecule
 crystallography) R = Sum||Fo|-|Fc|| / Sum|Fo|. Reflections that have
 negative measured intensities are either given F=0 or (e.g. using
 TRUNCATE) F is set to a small positive value, both of which avoid having
 to take the square root of a negative number which most computers don't
like doing. Then the 'divide by zero' catastrophe and negative R-values 
 cannot happen because Sum|Fo| is always significantly greater than zero,
 and in my experience there is no problem in calculating an R-value even
 if the data are complete noise. The practice of quoting R-values both
for all data and for F > 4sigma(F) seems to me to be useful. For example 
 if the latter is much larger than the former, maybe you are including a
 lot of weak data. Similarly in calculating merging R-values, most
 programs replace negative intensities by zero, again avoiding the
 problems you describe.
 
 Best wishes, George
 
 Prof. George M. Sheldrick FRS
 Dept. Structural Chemistry,
 University of Goettingen,
 Tammannstr. 4,
 D37077 Goettingen, Germany
 Tel. +49-551-39-3021 or -3068
 Fax. +49-551-39-22582
 
 
 On Sat, 12 Mar 2011, James Holton wrote:
 
 
  The fundamental mathematical problem of using an R statistic on data with
  I/sigma(I) < 3 is that the assumption that the fractional deviates
  (I - <I>)/<I>
  obey a Gaussian distribution breaks down.  And when that happens, the R
  calculation itself becomes unstable, giving essentially random R values.
  Therefore, including weak data in R calculations is equivalent to 
  calculating
  R with a 3-sigma cutoff, and then adding a random number to the R value.  
  Now,
  random data is one thing, but if the statistic used to evaluate the data
  quality is itself random, then it is not what I would call useful.
 
  Since I am not very good at math, I always find myself approaching 
  statistics
  by generating long lists of random numbers, manipulating them in some way, 
  and
  then graphing the results.  For graphing Rmerge vs I/sigma(I), one does find
  that Bernhard's rule of Rmerge = 0.8/( <I/sigma(I)> ) generally applies, but
  only for I/sigma(I) that is >= 3.  It gets better with high multiplicity, but
  even with m=100, the Rmerge values for the I/sigma(I) < 1 points are all over
  the place.  This is true even if you average the value of Rmerge over a
  million random number seeds.  In fact, one must do so much averaging, 
  that I
  start to worry about the low-order bits of common random number generators. 
   I
  have attached images of these Rmerge vs I/sigma graphs.  The error bars
  reflect the rms deviation from the average of a large number of Rmerge
  values (different random number seeds).  The missing values are actually
  points where the average Rmerge in 60 trials (m=3) was still negative.
 
  The reason for this noisy R factor problem becomes clear if you consider 
  the
  limiting case where the true intensity is zero, and make a histogram of
  ( I - <I> )/<I>.  It is not a Gaussian.  Rather, it is the Gaussian's evil
  stepsister: the Lorentzian (or Cauchy distribution).  This distribution 
  may
  look a lot like a Gaussian, but it has longer tails, and these tails give it
  the weird statistical property of having an undefined mean value.  This is
  counterintuitive!  Because you can clearly just look at the histogram and 
  see
  that it has a central peak (at zero), but if you generate a million

Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-13 Thread Phil Evans
You do also have always to consider why you are doing this calculation -
usually to satisfy a sceptical and possibly ill-informed referee. A major
reason for doing this is to justify including an outer resolution shell of
data (see this BB passim), and for this I have come to prefer the random
half-dataset correlation coefficient in shells. A CC has a more
straightforward distribution than an R-factor (though not entirely without
problems). It is independent of the SD estimates, and easy to understand.
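
As a rough illustration of the idea (a toy split-and-correlate in Python with NumPy, on made-up measurements; this is not the Scala/Aimless implementation):

import numpy as np

def half_dataset_cc(per_hkl_measurements, seed=0):
    # Toy CC(1/2): randomly split each reflection's measurements into two halves,
    # average each half, and correlate the two sets of half-averages.
    rng = np.random.default_rng(seed)
    half1, half2 = [], []
    for obs in per_hkl_measurements:
        if len(obs) < 2:
            continue                      # need at least two measurements to split
        obs = rng.permutation(obs)
        n = len(obs) // 2
        half1.append(obs[:n].mean())
        half2.append(obs[n:2 * n].mean())
    return np.corrcoef(half1, half2)[0, 1]

# made-up data: 500 unique reflections, Wilson-like true intensities,
# four measurements each with Gaussian noise of sigma = 1
rng = np.random.default_rng(1)
true_I = rng.exponential(scale=2.0, size=500)
obs = [I + rng.normal(0.0, 1.0, size=4) for I in true_I]
print("half-dataset CC ~", round(half_dataset_cc(obs), 2))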

Phil


 The fundamental mathematical problem of using an R statistic on data
 with I/sigma(I) < 3 is that the assumption that the fractional deviates
 (I - <I>)/<I> obey a Gaussian distribution breaks down.  And when that
 happens, the R calculation itself becomes unstable, giving essentially
 random R values.  Therefore, including weak data in R calculations is
 equivalent to calculating R with a 3-sigma cutoff, and then adding a
 random number to the R value.  Now, random data is one thing, but if
 the statistic used to evaluate the data quality is itself random, then
 it is not what I would call useful.

 Since I am not very good at math, I always find myself approaching
 statistics by generating long lists of random numbers, manipulating them
 in some way, and then graphing the results.  For graphing Rmerge vs
 I/sigma(I), one does find that Bernhard's rule of Rmerge = 0.8/(
 <I/sigma(I)> ) generally applies, but only for I/sigma(I) that is >= 3.
 It gets better with high multiplicity, but even with m=100, the Rmerge
 values for the I/sigma(I) < 1 points are all over the place.  This is
 true even if you average the value of Rmerge over a million random
 number seeds.  In fact, one must do so much averaging, that I start to
 worry about the low-order bits of common random number generators.  I
 have attached images of these Rmerge vs I/sigma graphs.  The error
 bars reflect the rms deviation from the average of a large number of
 Rmerge values (different random number seeds).  The missing values
 are actually points where the average Rmerge in 60 trials (m=3) was
 still negative.

 The reason for this noisy R factor problem becomes clear if you
 consider the limiting case where the true intensity is zero, and make
 a histogram of ( I - <I> )/<I>.  It is not a Gaussian.  Rather, it is
 the Gaussian's evil stepsister: the Lorentzian (or Cauchy
 distribution).  This distribution may look a lot like a Gaussian, but
 it has longer tails, and these tails give it the weird statistical
 property of having an undefined mean value.  This is counterintuitive!
 Because you can clearly just look at the histogram and see that it has a
 central peak (at zero), but if you generate a million
 Lorentzian-distributed random numbers and take the average value, you
 will not get anything close to zero.  Try it!  You can generate a
 Lorentzian deviate from a uniform deviate like this:
 tan(pi*(rand()-0.5)), where rand() makes a random number from 0 to 1.
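
 For anyone who wants to try it, a minimal NumPy sketch of that experiment (illustrative only) shows the running mean of such deviates never settling down, while their median sits at zero:

import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=1_000_000)
cauchy = np.tan(np.pi * (u - 0.5))        # the tan(pi*(rand()-0.5)) recipe above

# The Cauchy distribution has no defined mean, so the running average wanders;
# the median of the same numbers converges to zero.
for n in (10**3, 10**4, 10**5, 10**6):
    print(n, "deviates:  mean =", round(float(cauchy[:n].mean()), 3),
          "  median =", round(float(np.median(cauchy[:n])), 3))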

 Now, it is not too hard to understand how R could blow up when the
 true spot intensities are all zero.  After all, as I approaches
 zero, the ratio ( I - I ) / I approaches a divide-by-zero problem.
 But what about when I/sigma(I) = 1?  Or 2?  If you look at these
 histograms, you find that they are a cross between a Gaussian and a
 Lorentzian (the so-called Voigt function), and the histogram does not
 become truly Gaussian-looking until I/sigma(I) >= 3.  At this point,
 the R factor follows Bernhard's rule quite well, even with
 multiplicities as low as 2 or 3.  This was the moment when I realized
 that the early crystallographers who first decided to use this 3-sigma
 cutoff were smarter than I am.

 Now, you can make a Voigt function (or even a Lorentzian) look more like
 a Gaussian by doing something called outlier rejection, but it is hard
 to rationalize why the outliers are being rejected.  Especially in a
 simulation!  Then again, the silly part of all this is all we really
 want is the middle of the histogram of ( I - <I> )/<I>.  In fact, if
 you just pick the most common Rmerge, you would get a much better
 estimate of the true Rmerge in a given resolution bin than you would
 by averaging a hundred times more data.  Such procedures are called
 robust estimators in statistics, and the robust estimator
 equivalents to the average and the rms deviation from the average are
 the median and the median absolute deviation from the median.  If you
 make a list of Lorentzian-random numbers as above, and compute the
 median, you will get a value very close to zero, even with modest
 multiplicity!  And the median absolute deviation from the median
 rapidly converges to 1, which matches the full width at half maximum
 of the histogram quite nicely.
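
 A short sketch of those robust estimators on the same kind of synthetic Cauchy deviates (the function and variable names are just illustrative):

import numpy as np

def robust_centre_and_spread(x):
    # median and median absolute deviation (MAD): robust analogues of the
    # mean and the rms deviation from the mean
    med = np.median(x)
    return med, np.median(np.abs(x - med))

rng = np.random.default_rng(0)
cauchy = np.tan(np.pi * (rng.uniform(size=100_000) - 0.5))
med, mad = robust_centre_and_spread(cauchy)
print("median =", round(float(med), 3), "  MAD =", round(float(mad), 3))
# median -> 0 and MAD -> 1 for unit-width Cauchy deviates, even though the
# ordinary mean and rms of the very same numbers are useless.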

 So, what are the practical implications of this?  Perhaps instead of the
 average Rmerge in each bin we should be looking at the median Rmerge?
 This will be the same as the average for the cases where I/sigma(I) > 3,
 but still be well 

Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-12 Thread George M. Sheldrick
Dear James,

I'm a bit puzzled by your negative R-values and unstable behavior. In 
practice, whether we refine against intensity or against |F|, it is 
traditional to quote an R-factor (called R1 in small molecule 
crystallography) R = Sum||Fo|-|Fc|| / Sum|Fo|. Reflections that have
negative measured intensities are either given F=0 or (e.g. using 
TRUNCATE) F is set to a small positive value, both of which avoid having 
to take the square root of a negative number which most computers don't 
like doing. Then the 'divide by zero' catastrophe and negative R-values 
cannot happen because Sum|Fo| is always significantly greater than zero, 
and in my experience there is no problem in calculating an R-value even 
if the data are complete noise. The practice of quoting R-values both 
for all data and for F > 4sigma(F) seems to me to be useful. For example 
if the latter is much larger than the former, maybe you are including a 
lot of weak data. Similarly in calculating merging R-values, most 
programs replace negative intensities by zero, again avoiding the 
problems you describe.
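
As an illustration of the two R values being compared (synthetic amplitudes and a deliberately crude, made-up error model; this is not SHELXL output):

import numpy as np

def r1(fobs, fcalc):
    # R1 = Sum||Fo|-|Fc|| / Sum|Fo|
    fobs, fcalc = np.abs(fobs), np.abs(fcalc)
    return np.sum(np.abs(fobs - fcalc)) / np.sum(fobs)

rng = np.random.default_rng(0)
fcalc = rng.exponential(10.0, size=2000)                   # made-up "true" amplitudes
sig_f = np.full_like(fcalc, 2.0)                           # constant, made-up sigma(F)
fobs = np.clip(fcalc + rng.normal(0.0, sig_f), 0.0, None)  # negative F set to zero

strong = fobs > 4.0 * sig_f
print("R1, all data:     ", round(r1(fobs, fcalc), 3))
print("R1, F > 4sigma(F):", round(r1(fobs[strong], fcalc[strong]), 3))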

Best wishes, George  

Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582


On Sat, 12 Mar 2011, James Holton wrote:

 
 The fundamental mathematical problem of using an R statistic on data with
 I/sigma(I) < 3 is that the assumption that the fractional deviates (I - <I>)/<I>
 obey a Gaussian distribution breaks down.  And when that happens, the R
 calculation itself becomes unstable, giving essentially random R values.
 Therefore, including weak data in R calculations is equivalent to calculating
 R with a 3-sigma cutoff, and then adding a random number to the R value.  Now,
 random data is one thing, but if the statistic used to evaluate the data
 quality is itself random, then it is not what I would call useful.
 
 Since I am not very good at math, I always find myself approaching statistics
 by generating long lists of random numbers, manipulating them in some way, and
 then graphing the results.  For graphing Rmerge vs I/sigma(I), one does find
 that Bernhard's rule of Rmerge = 0.8/( <I/sigma(I)> ) generally applies, but
 only for I/sigma(I) that is >= 3.  It gets better with high multiplicity, but
 even with m=100, the Rmerge values for the I/sigma(I) < 1 points are all over
 the place.  This is true even if you average the value of Rmerge over a
 million random number seeds.  In fact, one must do so much averaging, that I
 start to worry about the low-order bits of common random number generators.  I
 have attached images of these Rmerge vs I/sigma graphs.  The error bars
 reflect the rms deviation from the average of a large number of Rmerge
 values (different random number seeds).  The missing values are actually
 points where the average Rmerge in 60 trials (m=3) was still negative.
 
 The reason for this noisy R factor problem becomes clear if you consider the
 limiting case where the true intensity is zero, and make a histogram of
 ( I - <I> )/<I>.  It is not a Gaussian.  Rather, it is the Gaussian's evil
 stepsister: the Lorentzian (or Cauchy distribution).  This distribution may
 look a lot like a Gaussian, but it has longer tails, and these tails give it
 the weird statistical property of having an undefined mean value.  This is
 counterintuitive!  Because you can clearly just look at the histogram and see
 that it has a central peak (at zero), but if you generate a million
 Lorentzian-distributed random numbers and take the average value, you will not
 get anything close to zero.  Try it!  You can generate a Lorentzian deviate
 from a uniform deviate like this: tan(pi*(rand()-0.5)), where rand() makes a
 random number from 0 to 1.
 
 Now, it is not too hard to understand how R could blow up when the true
 spot intensities are all zero.  After all, as <I> approaches zero, the ratio
 ( I - <I> ) / <I> approaches a divide-by-zero problem.  But what about when
 I/sigma(I) = 1?  Or 2?  If you look at these histograms, you find that they
 are a cross between a Gaussian and a Lorentzian (the so-called Voigt
 function), and the histogram does not become truly Gaussian-looking until
 I/sigma(I) >= 3.  At this point, the R factor follows Bernhard's rule quite
 well, even with multiplicities as low as 2 or 3.  This was the moment when I
 realized that the early crystallographers who first decided to use this
 3-sigma cutoff were smarter than I am.
 
 Now, you can make a Voigt function (or even a Lorentzian) look more like a
 Gaussian by doing something called outlier rejection, but it is hard to
 rationalize why the outliers are being rejected.  Especially in a
 simulation!  Then again, the silly part of all this is all we really want is
 the middle of the histogram of ( I - <I> )/<I>.  In fact, if you just pick
 the most common Rmerge, you would get a much better estimate of the true
 Rmerge in a given resolution bin than 

Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-09 Thread Graeme Winter
Hi James,

May I just offer a short counter-argument to your case for not including
weak reflections in the merging residuals?

Unlike many people I rather like Rmerge, not because it tells you how good
the data are, but because it gives you a clue as to how well the unmerged
measurements agree with one another. It's already been mentioned on this
thread that Rmerge is ~ 0.8 / <I/sigma> which means that the inverse is also
true - an Rmerge of 0.8 indicates that the average measurement in the shell
has an I/sigma of ~ 1 (presuming there are sufficient multiple measurements
- if the multiplicity is < 3 or so this can be nonsense).

This does not depend on the error model or the multiplicity. It just talks
about the average. Now, if we exclude all measurements with an I/sigma of
less than three we have no idea of how strong the reflections in the shell
are on average. We're just top-slicing the good reflections and asking how
well they agree. Well, with an I/sigma > 3 I would hope they agree rather
well if your error model is reasonable. It would suddenly become rare to see
an Rmerge > 0.3 in the outer shell.
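
The 0.8/<I/sigma> rule of thumb is easy to check with a toy simulation (one Gaussian error model, made-up numbers, nothing like a real scaling program):

import numpy as np

def rmerge(per_hkl):
    # Rmerge = sum_h sum_i |I_i - <I>_h|  /  sum_h sum_i I_i
    num = sum(np.sum(np.abs(g - g.mean())) for g in per_hkl)
    den = sum(np.sum(g) for g in per_hkl)
    return num / den

rng = np.random.default_rng(0)
m = 10                                    # multiplicity
for i_over_sig in (1.0, 2.0, 3.0, 5.0, 10.0):
    # 5000 reflections, each with true intensity i_over_sig and sigma = 1
    data = [rng.normal(i_over_sig, 1.0, size=m) for _ in range(5000)]
    print(f"<I/sig> = {i_over_sig:4.1f}   Rmerge = {rmerge(data):5.2f}"
          f"   0.8/<I/sig> = {0.8 / i_over_sig:5.2f}")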

I like Rpim. It tells you how good the average measurement should be
provided you have not too much radiation damage. However, without Rmerge I
can't get a real handle on how well the measurements agree.

Personally, what I would like to see is the full contents of the Scala log
file available as graphs along with R_d from xdsstat and some other choice
statistics so you can get a relatively complete picture; however, I
appreciate that this is unrealistic :o)

Just my 2c.

Cheerio,

Graeme

On 8 March 2011 20:07, James Holton jmhol...@lbl.gov wrote:

 Although George does not mention anything about data reduction programs, I
 take from his description that common small-molecule data processing
 packages (SAINT, others?), have also been modernized to record all data (no
 I/sigmaI < 2 or 3 cutoff).  I agree with him that this is a good thing!  And
 it is also a good thing that small-molecule refinement programs use all
 data.  I just don't think it is a good idea to use all data in R factor
 calculations.

 Like Ron, I will probably be dating myself when I say that when I first got
 into the macromolecular crystallography business, it was still commonplace
 to use a 2-3 sigma spot intensity cutoff.  In fact, this is the reason why
 the PDB wants to know your completeness in the outermost resolution shell
 (in those days, the outer resolution was defined by where completeness drops
 to ~80% after the 3 sigma spot cutoff).  My experience with this, however,
 was brief, as the maximum-likelihood revolution was just starting to take
 hold, and the denzo manual specifically stated that only bad people use
 sigma cutoffs > -3.0.  Nevertheless, like many crystallographers from this
 era, I have fond memories of the REALLY low R factors you can get by using
 this arcane and now reviled practice.  Rsym values of 1-2% were common.

 It was only recently that I learned enough about statistics to understand
 the wisdom of my ancestors and that a 3-sigma cutoff is actually the right
 thing to do if you want to measure a fractional error (like an R factor).
  That is all I'm saying.

 -James Holton
 MAD Scientist


 On 3/6/2011 2:50 PM, Ronald E Stenkamp wrote:

 My small molecule experience is old enough (maybe 20 years) that I doubt
 if it's even close to representing current practices (best or otherwise).
  Given George's comments, I suspect (and hope) that less-than cutoffs are
 historical artifacts at this point, kept around in software for making
 comparisons with older structure determinations.  But a bit of scanning of
 Acta papers and others might be necessary to confirm that.  Ron


 On Sun, 6 Mar 2011, James Holton wrote:


 Yes, I would classify anything with I/sigmaI < 3 as weak.  And yes, of
 course it is possible to get weak spots from small molecule crystals.
 After all, there is no spot so strong that it cannot be defeated by a
 sufficient amount of background!  I just meant that, relatively speaking,
 the intensities diffracted from a small molecule crystal are orders of
 magnitude brighter than those from a macromolecular crystal of the same
 size, and even the same quality (the 1/Vcell^2 term in Darwin's formula).

 I find it interesting that you point out the use of a 2 sigma(I)
 intensity cutoff for small molecule data sets!  Is this still common
 practice?  I am not a card-carrying small molecule crystallographer, so
 I'm not sure. However, if that is the case, then by definition there are no
 weak intensities in the data set.  And this is exactly the kind of data
 you want for least-squares refinement targets and computing % error
 quality metrics like R factors.  For likelihood targets, however, the weak
 data are actually a powerful restraint.

 -James Holton
 MAD Scientist

 On 3/6/2011 11:22 AM, Ronald E Stenkamp wrote:

 Could you please expand on your statement that small-molecule data has
 

Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-08 Thread James Holton
Although George does not mention anything about data reduction programs, 
I take from his description that common small-molecule data processing 
packages (SAINT, others?), have also been modernized to record all data 
(no I/sigmaI < 2 or 3 cutoff).  I agree with him that this is a good 
thing!  And it is also a good thing that small-molecule refinement 
programs use all data.  I just don't think it is a good idea to use all 
data in R factor calculations.


Like Ron, I will probably be dating myself when I say that when I first 
got into the macromolecular crystallography business, it was still 
commonplace to use a 2-3 sigma spot intensity cutoff.  In fact, this is 
the reason why the PDB wants to know your completeness in the 
outermost resolution shell (in those days, the outer resolution was 
defined by where completeness drops to ~80% after the 3 sigma spot 
cutoff).  My experience with this, however, was brief, as the 
maximum-likelihood revolution was just starting to take hold, and the 
denzo manual specifically stated that only bad people use sigma cutoffs > -3.0.
Nevertheless, like many crystallographers from this era, I 
have fond memories of the REALLY low R factors you can get by using this 
arcane and now reviled practice.  Rsym values of 1-2% were common.


It was only recently that I learned enough about statistics to 
understand the wisdom of my ancestors and that a 3-sigma cutoff is 
actually the right thing to do if you want to measure a fractional 
error (like an R factor).  That is all I'm saying.


-James Holton
MAD Scientist


On 3/6/2011 2:50 PM, Ronald E Stenkamp wrote:
My small molecule experience is old enough (maybe 20 years) that I 
doubt if it's even close to representing current practices (best or 
otherwise).  Given George's comments, I suspect (and hope) that 
less-than cutoffs are historical artifacts at this point, kept around 
in software for making comparisons with older structure 
determinations.  But a bit of scanning of Acta papers and others might 
be necessary to confirm that.  Ron


On Sun, 6 Mar 2011, James Holton wrote:



Yes, I would classify anything with I/sigmaI < 3 as weak.  And yes, 
of course it is possible to get weak spots from small molecule 
crystals. After all, there is no spot so strong that it cannot be 
defeated by a sufficient amount of background!  I just meant that, 
relatively speaking, the intensities diffracted from a small molecule 
crystal are orders of magnitude brighter than those from a 
macromolecular crystal of the same size, and even the same quality 
(the 1/Vcell^2 term in Darwin's formula).


I find it interesting that you point out the use of a 2 sigma(I) 
intensity cutoff for small molecule data sets!  Is this still common 
practice?  I am not a card-carrying small molecule 
crystallographer, so I'm not sure. However, if that is the case, 
then by definition there are no weak intensities in the data set.  
And this is exactly the kind of data you want for least-squares 
refinement targets and computing % error quality metrics like R 
factors.  For likelihood targets, however, the weak data are 
actually a powerful restraint.


-James Holton
MAD Scientist

On 3/6/2011 11:22 AM, Ronald E Stenkamp wrote:
Could you please expand on your statement that small-molecule data 
has "essentially no weak spots"?  The small molecule data sets I've 
worked with have had large numbers of unobserved reflections where 
I used 2 sigma(I) cutoffs (maybe 15-30% of the reflections).  Would 
you consider those weak spots or not?  Ron


On Sun, 6 Mar 2011, James Holton wrote:

I should probably admit that I might be indirectly responsible for 
the resurgence of this I/sigma > 3 idea, but I never intended this 
in the way described by the original poster's reviewer!


What I have been trying to encourage people to do is calculate R 
factors using only hkls for which the signal-to-noise ratio is > 3.
Not refinement! Refinement should be done against all data.  I 
merely propose that weak data be excluded from R-factor 
calculations after the refinement/scaling/merging/etc. is done.


This is because R factors are a metric of the FRACTIONAL error in 
something (aka a % difference), but a % error is only 
meaningful when the thing being measured is not zero.  However, in 
macromolecular crystallography, we tend to measure a lot of 
zeroes.  There is nothing wrong with measuring zero!  An 
excellent example of this is confirming that a systematic absence 
is in fact absent.  The sigma on the intensity assigned to an 
absent spot is still a useful quantity, because it reflects how 
confident you are in the measurement.  I.E.  a sigma of 10 vs 
100 means you are more sure that the intensity is zero. However, 
there is no R factor for systematic absences. How could there 
be!  This is because the definition of % error starts to break 
down as the true spot intensity gets weaker, and it becomes 
completely meaningless when the true intensity reaches zero.



Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-06 Thread James Holton
I should probably admit that I might be indirectly responsible for the 
resurgence of this I/sigma > 3 idea, but I never intended this in the 
way described by the original poster's reviewer!


What I have been trying to encourage people to do is calculate R factors 
using only hkls for which the signal-to-noise ratio is > 3.  Not 
refinement!  Refinement should be done against all data.  I merely 
propose that weak data be excluded from R-factor calculations after the 
refinement/scaling/merging/etc. is done.


This is because R factors are a metric of the FRACTIONAL error in 
something (aka a "% difference"), but a "% error" is only meaningful 
when the thing being measured is not zero.  However, in macromolecular 
crystallography, we tend to measure a lot of zeroes.  There is nothing 
wrong with measuring zero!  An excellent example of this is confirming 
that a systematic absence is in fact absent.  The sigma on the 
intensity assigned to an absent spot is still a useful quantity, because 
it reflects how confident you are in the measurement.  I.E.  a sigma of 
10 vs 100 means you are more sure that the intensity is zero.  
However, there is no R factor for systematic absences.  How could 
there be!  This is because the definition of % error starts to break 
down as the true spot intensity gets weaker, and it becomes completely 
meaningless when the true intensity reaches zero.


Historically, I believe the widespread use of R factors came about 
because small-molecule data has essentially no weak spots.  With the 
exception of absences (which are not used in refinement), spots from 
salt crystals are strong all the way out to the edge of the detector, 
(even out to the limiting sphere, which is defined by the x-ray 
wavelength).  So, when all the data are strong, a % error is an 
easy-to-calculate quantity that actually describes the sigmas of the 
data very well.  That is, sigma(I) of strong spots tends to be dominated 
by things like beam flicker, spindle stability, shutter accuracy, etc.  
All these usually add up to ~5% error, and indeed even the Braggs could 
typically get +/-5% for the intensity of the diffracted rays they were 
measuring.  Things like Rsym were therefore created to check that 
nothing funny happened in the measurement.


For similar reasons, the quality of a model refined against all-strong 
data is described very well by a % error, and this is why the 
refinement R factors rapidly became popular.  Most people intuitively 
know what you mean if you say that your model fits the data to within 
5%.  In fact, a widely used criterion for the correctness of a small 
molecule structure is that the refinement R factor must be LOWER than 
Rsym.  This is equivalent to saying that your curve (model) fit your 
data to within experimental error.  Unfortunately, this has never been 
the case for macromolecular structures!


The problem with protein crystals, of course, is that we have lots of 
weak data.  And by weak, I don't mean bad!  Yes, it is always 
nicer to have more intense spots, but there is nothing shameful about 
knowing that certain intensities are actually very close to zero.  In 
fact, from the point of view of the refinement program, isn't describing 
some high-angle spot as: "zero, plus or minus 10", better than "I have 
no idea"?   Indeed, several works mentioned already as well as the "free 
lunch" algorithm have demonstrated that these "zero" data can actually 
be useful, even if they are well beyond the resolution limit.


So, what do we do?  I see no reason to abandon R factors, since they 
have such a long history and give us continuity of criteria going back 
almost a century.  However, I also see no reason to punish ourselves by 
including lots of zeroes in the denominator.  In fact, using weak data 
in an R factor calculation defeats their best feature.  R factors are a 
very good estimate of the fractional component of the total error, 
provided they are calculated with strong data only.


Of course, with strong and weak data, the best thing to do is compare 
the model-data disagreement with the magnitude of the error.  That is, 
compare |Fobs-Fcalc| to sigma(Fobs), not Fobs itself.  Modern refinement 
programs do this!  And I say the more data the merrier.
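
A hedged sketch of that contrast, on synthetic data (the amplitudes and error model below are invented purely for illustration):

import numpy as np

def r_factor(fobs, fcalc):
    # unweighted R = sum|Fo - Fc| / sum Fo: a fractional error, dominated by strong data
    return np.sum(np.abs(fobs - fcalc)) / np.sum(fobs)

def rms_weighted_residual(fobs, fcalc, sig_f):
    # rms of (Fo - Fc)/sigma(Fo): compares the misfit to the measurement error instead
    return np.sqrt(np.mean(((fobs - fcalc) / sig_f) ** 2))

rng = np.random.default_rng(0)
fcalc = rng.exponential(5.0, size=5000)          # made-up model amplitudes
sig_f = 0.5 + 0.05 * fcalc                       # made-up error model
fobs = fcalc + rng.normal(0.0, sig_f)            # "data" that fit the model to within error

print("R (all data)            =", round(r_factor(np.abs(fobs), fcalc), 3))
print("rms (Fo - Fc)/sigma(Fo) =", round(rms_weighted_residual(fobs, fcalc, sig_f), 3))
# the rms weighted residual is ~1 when the model fits to within the errors,
# whatever the nominal R value happens to be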



-James Holton
MAD Scientist


On 3/4/2011 5:15 AM, Marjolein Thunnissen wrote:

hi

Recently on a paper I submitted, it was the editor of the journal who wanted 
exactly the same thing. I never argued with the editor about this (should have 
maybe), but it could be one cause of the epidemic that Bart Hazes saw


best regards

Marjolein

On Mar 3, 2011, at 12:29 PM, Roberto Battistutta wrote:


Dear all,
I got a reviewer comment that indicates the need to "refine the structures at an appropriate 
resolution (I/sigmaI of >3.0), and re-submit the revised coordinate files to the PDB for 
validation". In the manuscript I present some crystal structures determined by molecular 
replacement using the same protein in a different space 

Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-06 Thread Ronald E Stenkamp

Could you please expand on your statement that small-molecule data has "essentially no weak 
spots"?  The small molecule data sets I've worked with have had large numbers of unobserved 
reflections where I used 2 sigma(I) cutoffs (maybe 15-30% of the reflections).  Would you consider those 
weak spots or not?  Ron

On Sun, 6 Mar 2011, James Holton wrote:

I should probably admit that I might be indirectly responsible for the 
resurgence of this I/sigma > 3 idea, but I never intended this in the way 
described by the original poster's reviewer!


What I have been trying to encourage people to do is calculate R factors 
using only hkls for which the signal-to-noise ratio is > 3.  Not refinement! 
Refinement should be done against all data.  I merely propose that weak data 
be excluded from R-factor calculations after the 
refinement/scaling/merging/etc. is done.


This is because R factors are a metric of the FRACTIONAL error in something 
(aka a % difference), but a % error is only meaningful when the thing 
being measured is not zero.  However, in macromolecular crystallography, we 
tend to measure a lot of zeroes.  There is nothing wrong with measuring 
zero!  An excellent example of this is confirming that a systematic absence 
is in fact absent.  The sigma on the intensity assigned to an absent spot 
is still a useful quantity, because it reflects how confident you are in the 
measurement.  I.E.  a sigma of 10 vs 100 means you are more sure that the 
intensity is zero.  However, there is no R factor for systematic absences. 
How could there be!  This is because the definition of % error starts to 
break down as the true spot intensity gets weaker, and it becomes 
completely meaningless when the true intensity reaches zero.


Historically, I believe the widespread use of R factors came about because 
small-molecule data has essentially no weak spots.  With the exception of 
absences (which are not used in refinement), spots from salt crystals are 
strong all the way out to edge of the detector, (even out to the limiting 
sphere, which is defined by the x-ray wavelength).  So, when all the data 
are strong, a % error is an easy-to-calculate quantity that actually 
describes the sigmas of the data very well.  That is, sigma(I) of strong 
spots tends to be dominated by things like beam flicker, spindle stability, 
shutter accuracy, etc.  All these usually add up to ~5% error, and indeed 
even the Braggs could typically get +/-5% for the intensity of the diffracted 
rays they were measuring.  Things like Rsym were therefore created to check 
that nothing funny happened in the measurement.


For similar reasons, the quality of a model refined against all-strong data 
is described very well by a % error, and this is why the refinement R 
factors rapidly became popular.  Most people intuitively know what you mean 
if you say that your model fits the data to within 5%.  In fact, a widely 
used criterion for the correctness of a small molecule structure is that 
the refinement R factor must be LOWER than Rsym.  This is equivalent to 
saying that your curve (model) fit your data to within experimental error. 
Unfortunately, this has never been the case for macromolecular structures!


The problem with protein crystals, of course, is that we have lots of weak 
data.  And by weak, I don't mean bad!  Yes, it is always nicer to have 
more intense spots, but there is nothing shameful about knowing that certain 
intensities are actually very close to zero.  In fact, from the point of view 
of the refinement program, isn't describing some high-angle spot as: zero, 
plus or minus 10, better than I have no idea?   Indeed, several works 
mentioned already as well as the free lunch algorithm have demonstrated 
that these zero data can actually be useful, even if it is well beyond the 
resolution limit.


So, what do we do?  I see no reason to abandon R factors, since they have 
such a long history and give us continuity of criteria going back almost a 
century.  However, I also see no reason to punish ourselves by including lots 
of zeroes in the denominator.  In fact, using weak data in an R factor 
calculation defeats their best feature.  R factors are a very good estimate 
of the fractional component of the total error, provided they are calculated 
with strong data only.


Of course, with strong and weak data, the best thing to do is compare the 
model-data disagreement with the magnitude of the error.  That is, compare 
|Fobs-Fcalc| to sigma(Fobs), not Fobs itself.  Modern refinement programs do 
this!  And I say the more data the merrier.



-James Holton
MAD Scientist


On 3/4/2011 5:15 AM, Marjolein Thunnissen wrote:

hi

Recently on a paper I submitted, it was the editor of the journal who 
wanted exactly the same thing. I never argued with the editor about this 
(should have maybe), but it could be one cause of the epidemic that Bart 
Hazes saw



best regards

Marjolein

On Mar 3, 2011, at 12:29 PM, Roberto 

Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-06 Thread James Holton
Yes, I would classify anything with I/sigmaI < 3 as weak.  And yes, of 
course it is possible to get weak spots from small molecule crystals.  
After all, there is no spot so strong that it cannot be defeated by a 
sufficient amount of background!  I just meant that, relatively 
speaking, the intensities diffracted from a small molecule crystal are 
orders of magnitude brighter than those from a macromolecular crystal of 
the same size, and even the same quality (the 1/Vcell^2 term in Darwin's 
formula).


I find it interesting that you point out the use of a 2 sigma(I) 
intensity cutoff for small molecule data sets!  Is this still common 
practice?  I am not a card-carrying small molecule crystallographer, 
so I'm not sure.  However, if that is the case, then by definition there 
are no weak intensities in the data set.  And this is exactly the kind 
of data you want for least-squares refinement targets and computing % 
error quality metrics like R factors.  For likelihood targets, however, 
the weak data are actually a powerful restraint.


-James Holton
MAD Scientist

On 3/6/2011 11:22 AM, Ronald E Stenkamp wrote:
Could you please expand on your statement that small-molecule data 
has "essentially no weak spots"?  The small molecule data sets I've 
worked with have had large numbers of unobserved reflections where I 
used 2 sigma(I) cutoffs (maybe 15-30% of the reflections).  Would you 
consider those weak spots or not?  Ron


On Sun, 6 Mar 2011, James Holton wrote:

I should probably admit that I might be indirectly responsible for 
the resurgence of this I/sigma > 3 idea, but I never intended this in 
the way described by the original poster's reviewer!


What I have been trying to encourage people to do is calculate R 
factors using only hkls for which the signal-to-noise ratio is > 3.  
Not refinement! Refinement should be done against all data.  I merely 
propose that weak data be excluded from R-factor calculations after 
the refinement/scaling/merging/etc. is done.


This is because R factors are a metric of the FRACTIONAL error in 
something (aka a % difference), but a % error is only meaningful 
when the thing being measured is not zero.  However, in 
macromolecular crystallography, we tend to measure a lot of 
zeroes.  There is nothing wrong with measuring zero!  An excellent 
example of this is confirming that a systematic absence is in fact 
absent.  The sigma on the intensity assigned to an absent spot is 
still a useful quantity, because it reflects how confident you are in 
the measurement.  I.E.  a sigma of 10 vs 100 means you are more 
sure that the intensity is zero.  However, there is no R factor for 
systematic absences. How could there be!  This is because the 
definition of % error starts to break down as the true spot 
intensity gets weaker, and it becomes completely meaningless when the 
true intensity reaches zero.


Historically, I believe the widespread use of R factors came about 
because small-molecule data has essentially no weak spots.  With the 
exception of absences (which are not used in refinement), spots from 
salt crystals are strong all the way out to edge of the detector, 
(even out to the limiting sphere, which is defined by the x-ray 
wavelength).  So, when all the data are strong, a % error is an 
easy-to-calculate quantity that actually describes the sigmas of 
the data very well.  That is, sigma(I) of strong spots tends to be 
dominated by things like beam flicker, spindle stability, shutter 
accuracy, etc.  All these usually add up to ~5% error, and indeed 
even the Braggs could typically get +/-5% for the intensity of the 
diffracted rays they were measuring.  Things like Rsym were therefore 
created to check that nothing funny happened in the measurement.


For similar reasons, the quality of a model refined against 
all-strong data is described very well by a % error, and this is 
why the refinement R factors rapidly became popular.  Most people 
intuitively know what you mean if you say that your model fits the 
data to within 5%.  In fact, a widely used criterion for the 
correctness of a small molecule structure is that the refinement R 
factor must be LOWER than Rsym.  This is equivalent to saying that 
your curve (model) fit your data to within experimental error. 
Unfortunately, this has never been the case for macromolecular 
structures!


The problem with protein crystals, of course, is that we have lots of 
weak data.  And by weak, I don't mean bad!  Yes, it is always 
nicer to have more intense spots, but there is nothing shameful about 
knowing that certain intensities are actually very close to zero.  In 
fact, from the point of view of the refinement program, isn't 
describing some high-angle spot as: zero, plus or minus 10, better 
than I have no idea?   Indeed, several works mentioned already as 
well as the free lunch algorithm have demonstrated that these 
zero data can actually be useful, even if it is well beyond the 
resolution limit.



Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-06 Thread George M. Sheldrick
Since small molecules are being discussed maybe I should comment. A widely 
used small molecule program that I don't need to advertise here refines 
against all measured intensities unless the user has imposed a resolution 
cutoff. It prints R values for all data and for I > 2sig(I) [F > 4sig(F)]. 
The user can of course improve these by cutting back the resolution but
if he or she oversteps 0.84A he/she will be caught by the CIF police.
This works like a radar trap so weak datasets are usually truncated to 
0.84A whether or not there are significant data to that resolution. It is 
always instructive to compare the R-values for all data and I > 2sig(I); 
if the former is substantially larger, a lot of noisy outer data have been
included. 

It is not true that small molecule datasets do not contain weak 
reflections. One should remember that the intensity statistics are 
different for centrosymmetric space groups: very weak AND very strong 
reflections (relative to the average in a resolution shell) are much more 
common!
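
That difference is just Wilson statistics: for acentric reflections z = I/<I> is exponentially distributed, while for centric reflections it follows a chi-squared distribution with one degree of freedom, which puts more weight at both very low and very high z. A small numerical check (illustrative only):

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
z_acentric = rng.exponential(1.0, size=n)        # Wilson: z = I/<I> ~ exponential
z_centric = rng.normal(0.0, 1.0, size=n) ** 2    # Wilson: z ~ chi-squared with 1 dof

for name, z in (("acentric", z_acentric), ("centric", z_centric)):
    print(f"{name:8s}  fraction z < 0.1: {np.mean(z < 0.1):.3f}"
          f"   fraction z > 3: {np.mean(z > 3.0):.3f}")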

George

Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582


On Sun, 6 Mar 2011, James Holton wrote:

 Yes, I would classify anything with I/sigmaI < 3 as weak.  And yes, of
 course it is possible to get weak spots from small molecule crystals.  After
 all, there is no spot so strong that it cannot be defeated by a sufficient
 amount of background!  I just meant that, relatively speaking, the intensities
 diffracted from a small molecule crystal are orders of magnitude brighter than
 those from a macromolecular crystal of the same size, and even the same
 quality (the 1/Vcell^2 term in Darwin's formula).
 
 I find it interesting that you point out the use of a 2 sigma(I) intensity
 cutoff for small molecule data sets!  Is this still common practice?  I am not
 a card-carrying small molecule crystallographer, so I'm not sure.  However,
 if that is the case, then by definition there are no weak intensities in the
 data set.  And this is exactly the kind of data you want for least-squares
 refinement targets and computing % error quality metrics like R factors.
 For likelihood targets, however, the weak data are actually a powerful
 restraint.
 
 -James Holton
 MAD Scientist
 
 On 3/6/2011 11:22 AM, Ronald E Stenkamp wrote:
  Could you please expand on your statement that small-molecule data has
  "essentially no weak spots"?  The small molecule data sets I've worked with
  have had large numbers of unobserved reflections where I used 2 sigma(I)
  cutoffs (maybe 15-30% of the reflections).  Would you consider those weak
  spots or not?  Ron
 
  On Sun, 6 Mar 2011, James Holton wrote:
 
   I should probably admit that I might be indirectly responsible for the
   resurgence of this I/sigma > 3 idea, but I never intended this in the way
   described by the original poster's reviewer!
  
   What I have been trying to encourage people to do is calculate R factors
   using only hkls for which the signal-to-noise ratio is > 3.  Not
   refinement! Refinement should be done against all data.  I merely propose
   that weak data be excluded from R-factor calculations after the
   refinement/scaling/merging/etc. is done.
  
   This is because R factors are a metric of the FRACTIONAL error in
   something (aka a % difference), but a % error is only meaningful when
   the thing being measured is not zero.  However, in macromolecular
   crystallography, we tend to measure a lot of zeroes.  There is nothing
   wrong with measuring zero!  An excellent example of this is confirming
   that a systematic absence is in fact absent.  The sigma on the
   intensity assigned to an absent spot is still a useful quantity, because
   it reflects how confident you are in the measurement.  I.E.  a sigma of
   10 vs 100 means you are more sure that the intensity is zero.
   However, there is no R factor for systematic absences. How could there
   be!  This is because the definition of % error starts to break down as
   the true spot intensity gets weaker, and it becomes completely
   meaningless when the true intensity reaches zero.
  
   Historically, I believe the widespread use of R factors came about because
   small-molecule data has essentially no weak spots.  With the exception of
   absences (which are not used in refinement), spots from salt crystals
   are strong all the way out to edge of the detector, (even out to the
   limiting sphere, which is defined by the x-ray wavelength).  So, when
   all the data are strong, a % error is an easy-to-calculate quantity that
   actually describes the sigmas of the data very well.  That is, sigma(I)
   of strong spots tends to be dominated by things like beam flicker, spindle
   stability, shutter accuracy, etc.  All these usually add up to ~5% error,
   and indeed even the Braggs could typically get +/-5% for the intensity of
   the diffracted rays they 

[ccp4bb] Philosophy and Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-05 Thread Jrh
Dear Colleagues,
Agreed!  There is a wider point though which is that the 3D structure and data 
can form a potential for further analysis and thus the data and the structure 
can ideally be more than the current paper's contents. Obviously artificially 
high I/ sig I  cut offs are both unfortunate for the current article and such 
future analyses. In chemical crystallography this potential for further 
analyses is widely recognised. Eg a crystal structure should have all static 
disorder sorted, methyl rotor groups correctly positioned etc even if not 
directly relevant to an article. Such rigour is the requirement for Acta Cryst 
C , for example, in chemical crystallography. 
Best wishes,
John


Prof John R Helliwell DSc


On 4 Mar 2011, at 20:36, Roberto Battistutta roberto.battistu...@unipd.it 
wrote:

 Dear Phil,
 I completely agree with you, your words seem to me the best
 philosophical outcome of the discussion and indicate the right
 perspective to tackle this topic. In particular you write "In the end, the
 important question as ever is does the experimental data support the
 conclusions drawn from it? and that will depend on local information
 about particular atoms and groups, not on global indicators." Exactly, in
 my case, all the discussion of the structures was absolutely independent
 from having 1.9, 2.0 or 2.1 A nominal resolution, or to cut at 1.5 or 2.0
 or 3.0 I/sigma. This makes the unjustified (as this two-day discussion has
 clearly pointed out) technical criticism of the reviewer even more
 upsetting.
 Ciao,
 Roberto


Re: [ccp4bb] [Fwd: Re: [ccp4bb] I/sigmaI of 3.0 rule]

2011-03-04 Thread Kay Diederichs

Maia,

provided radiation damage is not a major detrimental factor, your data 
are just fine, and useful also in the high resolution shell (which still 
has I/sigma of 2.84 so you could probably process a bit beyond 2.25A).


There is nothing wrong with R_meas of 147.1% since, as others have said, 
R_meas is not limited to 59% (or similar) as a refinement R-factor is. 
Rather, R_meas is computed from a formula that has a denominator which 
in the asymptotic limit (noise) approaches zero - because there will be 
(almost) as many negative observations as positive ones! (The numerator 
however does not go to zero)
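
A toy demonstration of that denominator effect (synthetic Gaussian measurements, not the output of any real scaling program):

import numpy as np

def r_meas(per_hkl):
    # R_meas = sum_h sqrt(n_h/(n_h-1)) * sum_i |I_i - <I>_h|  /  sum_h sum_i I_i
    num = sum(np.sqrt(len(g) / (len(g) - 1)) * np.sum(np.abs(g - g.mean()))
              for g in per_hkl)
    den = sum(np.sum(g) for g in per_hkl)
    return num / den

rng = np.random.default_rng(0)
n_refl, mult, sigma = 2000, 4, 1.0
for true_i in (10.0, 2.0, 0.5, 0.0):     # true intensity in units of sigma; 0.0 = pure noise
    groups = [rng.normal(true_i, sigma, size=mult) for _ in range(n_refl)]
    print(f"<I>/sigma = {true_i:4.1f}   R_meas = {r_meas(groups):10.3f}")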


Concerning radiation damage: First, take a look at your frames - but 
make sure you have the same crystal orientation, as anisotropy may mask 
radiation damage! Then, you can check (using CCP4's loggraph) the R_d 
plot provided by XDSSTAT (for a single dataset; works best for 
high-symmetry spacegroups), and you should also check ISa (printed in 
CORRECT.LP and XSCALE.LP).


HTH,

Kay

P.S. I see one potential problem: XSCALE (VERSION  December 6, 2007) 
when the calculation was done 28-Aug-2009. There were quite a number of 
improvements in XDS/XSCALE since that version. The reason may be that a 
licensed, non-expiring version was used - make sure you always rather 
use the latest version available!



  Original Message 
 Subject: [Fwd: Re: [ccp4bb] I/sigmaI of 3.0 rule]
 Date: Thu, 3 Mar 2011 10:45:03 -0700
 From: Maia Cherney ch...@ualberta.ca



  Original Message 
 Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule
 Date: Thu, 03 Mar 2011 10:43:23 -0700
 From: Maia Cherney ch...@ualberta.ca
 To: Oganesyan, Vaheh oganesy...@medimmune.com



 Vaheh,

 The problem was with Rmerge. As you can see at I/sigma=2.84, the Rmerge
 (R-factor) was 143%. I am asking this question because B. Rupp wrote
 "However, there is a simple relation between I/sigI and R-merge
 (provided no other indecency has been done to the data). It simply is
 (BMC) Rm = 0.8/<I/sigI>."
 Maybe my data are indecent? This is the whole LP file.

 Maia

 MMC741_scale-2.25.LP


**
XSCALE (VERSION  December 6, 2007)  28-Aug-2009
**

 Author: Wolfgang Kabsch
 Copy licensed until (unlimited) to
  Canadian Light Source, Saskatoon, Canada.
 No redistribution.


--
Kay Diederichshttp://strucbio.biologie.uni-konstanz.de
email: kay.diederi...@uni-konstanz.deTel +49 7531 88 4049 Fax 3183
Fachbereich Biologie, Universität Konstanz, Box M647, D-78457 Konstanz

This e-mail is digitally signed. If your e-mail client does not have the
necessary capabilities, just ignore the attached signature smime.p7s.





Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-04 Thread John R Helliwell
Dear Roberto,
Overnight I recall an additional point:-
In chemical crystallography, where standard uncertainties are
routinely available for the molecular model from the full matrix
inversion in the model refinement, it is of course possible to keep
extending your resolution until your bond distance and angle s.u.
values go up. Thus if you distrust or do not wish to slavishly follow
a Journal's Notes for Authors, such as for Acta Crystallographica
Section C to which I referred yesterday, you can, in this way, check
yourself the good sense of the data quality criteria required. [This
is a similar test to the one that Phil mentioned yesterday ie with
respect to scrutinising electron density maps for your protein ie do
they show more detail by adding more diffraction data.]
Best wishes,
John

On Thu, Mar 3, 2011 at 11:29 AM, Roberto Battistutta
roberto.battistu...@unipd.it wrote:
 Dear all,
 I got a reviewer comment that indicates the need to "refine the structures at 
 an appropriate resolution (I/sigmaI of >3.0), and re-submit the revised 
 coordinate files to the PDB for validation". In the manuscript I present 
 some crystal structures determined by molecular replacement using the same 
 protein in a different space group as search model. Does anyone know the 
 origin or the theoretical basis of this I/sigmaI >3.0 rule for an 
 appropriate resolution?
 Thanks,
 Bye,
 Roberto.


 Roberto Battistutta
 Associate Professor
 Department of Chemistry
 University of Padua
 via Marzolo 1, 35131 Padova - ITALY
 tel. +39.049.8275265/67
 fax. +39.049.8275239
 roberto.battistu...@unipd.it
 www.chimica.unipd.it/roberto.battistutta/
 VIMM (Venetian Institute of Molecular Medicine)
 via Orus 2, 35129 Padova - ITALY
 tel. +39.049.7923236
 fax +39.049.7923250
 www.vimm.it




-- 
Professor John R Helliwell DSc


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-04 Thread Roberto Battistutta
Dear all,
just to say that I really appreciate and thank the many people who spent time 
responding to my issue. I have read with much interest (and sometimes with fun) 
all comments and suggestions, very interesting and useful.
Thanks a lot,
Bye,
Roberto.


Roberto Battistutta
Associate Professor
Department of Chemistry
University of Padua
via Marzolo 1, 35131 Padova - ITALY
tel. +39.049.8275265/67
fax. +39.049.8275239
roberto.battistu...@unipd.it
www.chimica.unipd.it/roberto.battistutta/
VIMM (Venetian Institute of Molecular Medicine)
via Orus 2, 35129 Padova - ITALY
tel. +39.049.7923236
fax +39.049.7923250
www.vimm.it

Il giorno 03/mar/2011, alle ore 12.29, Roberto Battistutta ha scritto:

 Dear all,
 I got a reviewer comment that indicates the need to "refine the structures at 
 an appropriate resolution (I/sigmaI of >3.0), and re-submit the revised 
 coordinate files to the PDB for validation". In the manuscript I present 
 some crystal structures determined by molecular replacement using the same 
 protein in a different space group as search model. Does anyone know the 
 origin or the theoretical basis of this I/sigmaI >3.0 rule for an 
 appropriate resolution?
 Thanks,
 Bye,
 Roberto.
 
 
 Roberto Battistutta
 Associate Professor
 Department of Chemistry
 University of Padua
 via Marzolo 1, 35131 Padova - ITALY
 tel. +39.049.8275265/67
 fax. +39.049.8275239
 roberto.battistu...@unipd.it
 www.chimica.unipd.it/roberto.battistutta/
 VIMM (Venetian Institute of Molecular Medicine)
 via Orus 2, 35129 Padova - ITALY
 tel. +39.049.7923236
 fax +39.049.7923250
 www.vimm.it
 


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-04 Thread Phil Evans
This is very closely related to the way in which I would like to think about 
this: if you consider adding another thin shell of data, are you adding any 
significant information? Unfortunately as Garib Murshudov has pointed out, we 
don't have any reliable way of estimating the information content of data.  
(Also it should be considered anisotropically.)

Another way of thinking about this is to consider that if we had perfect error 
models and weighted the data perfectly, then adding a shell of data with 
essentially no useful information should at least do no harm (ie weights are 
close to zero).  But we do not have perfect error models, so adding too much 
data may in the end degrade our structural model.

Much of the problem arises from our addiction to R-factors as a measure of 
quality, when they are unweighted and therefore very misleading. We are 
also too quick to judge the quality of a structure by its nominal resolution, 
whatever that means. In the end, the important question as ever is "does the 
experimental data support the conclusions drawn from it?" and that will depend 
on local information about particular atoms and groups, not on global indicators.

Phil


On 4 Mar 2011, at 10:35, John R Helliwell wrote:

 Dear Roberto,
 Overnight I recall an additional point:-
 In chemical crystallography, where standard uncertainties are
 routinely available for the molecular model from the full matrix
 inversion in the model refinement, it is of course possible to keep
 extending your resolution until your bond distance and angle s.u.
 values go up. Thus if you distrust or do not wish to slavishly follow
 a Journal's Notes for Authors, such as for Acta Crystallographica
 Section C to which I referred yesterday, you can, in this way, check
 yourself the good sense of the data quality criteria required. [This
 is a similar test to the one that Phil mentioned yesterday ie with
 respect to scrutinising electron density maps for your protein ie do
 they show more detail by adding more diffraction data.]
 Best wishes,
 John
 
 On Thu, Mar 3, 2011 at 11:29 AM, Roberto Battistutta
 roberto.battistu...@unipd.it wrote:
 Dear all,
 I got a reviewer comment that indicates the need to "refine the structures at 
 an appropriate resolution (I/sigmaI of >3.0), and re-submit the revised 
 coordinate files to the PDB for validation". In the manuscript I present 
 some crystal structures determined by molecular replacement using the same 
 protein in a different space group as search model. Does anyone know the 
 origin or the theoretical basis of this I/sigmaI >3.0 rule for an 
 appropriate resolution?
 Thanks,
 Bye,
 Roberto.
 
 
 Roberto Battistutta
 Associate Professor
 Department of Chemistry
 University of Padua
 via Marzolo 1, 35131 Padova - ITALY
 tel. +39.049.8275265/67
 fax. +39.049.8275239
 roberto.battistu...@unipd.it
 www.chimica.unipd.it/roberto.battistutta/
 VIMM (Venetian Institute of Molecular Medicine)
 via Orus 2, 35129 Padova - ITALY
 tel. +39.049.7923236
 fax +39.049.7923250
 www.vimm.it
 
 
 
 
 -- 
 Professor John R Helliwell DSc


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-04 Thread Marjolein Thunnissen
hi

Recently, on a paper I submitted, it was the editor of the journal who wanted 
exactly the same thing. I never argued with the editor about this (maybe I 
should have), but it could be one cause of the epidemic that Bart Hazes saw.


best regards

Marjolein

On Mar 3, 2011, at 12:29 PM, Roberto Battistutta wrote:

 Dear all,
 I got a reviewer comment that indicates the need to refine the structures at 
 an appropriate resolution (I/sigmaI of >3.0), and re-submit the revised 
 coordinate files to the PDB for validation. In the manuscript I present 
 some crystal structures determined by molecular replacement using the same 
 protein in a different space group as search model. Does anyone know the 
 origin or the theoretical basis of this I/sigmaI > 3.0 rule for an 
 appropriate resolution?
 Thanks,
 Bye,
 Roberto.
 
 
 Roberto Battistutta
 Associate Professor
 Department of Chemistry
 University of Padua
 via Marzolo 1, 35131 Padova - ITALY
 tel. +39.049.8275265/67
 fax. +39.049.8275239
 roberto.battistu...@unipd.it
 www.chimica.unipd.it/roberto.battistutta/
 VIMM (Venetian Institute of Molecular Medicine)
 via Orus 2, 35129 Padova - ITALY
 tel. +39.049.7923236
 fax +39.049.7923250
 www.vimm.it


Re: [ccp4bb] [Fwd: Re: [ccp4bb] I/sigmaI of 3.0 rule]

2011-03-04 Thread Maia Cherney

Kay,

Thank you for your explanation. The radiation damage was not the factor, 
but there was something strange about this crystal (actually two 
crystals had the same strange behavior). I could not process them in 
HKL2000, but it showed the problem (see pictures in the attachment). The 
processing in XDS was done at the CLS (Canadian Light Source). I know 
they always have the latest version of XDS.


Maia

Kay Diederichs wrote:

Maia,

provided radiation damage is not a major detrimental factor, your data 
are just fine, and useful also in the high resolution shell (which 
still has I/sigma of 2.84 so you could probably process a bit beyond 
2.25A).


There is nothing wrong with R_meas of 147.1% since, as others have 
said, R_meas is not limited to 59% (or similar) as a refinement 
R-factor is. Rather, R_meas is computed from a formula that has a 
denominator which in the asymptotic limit (noise) approaches zero - 
because there will be (almost) as many negative observations as 
positive ones! (The numerator however does not go to zero)


Concerning radiation damage: First, take a look at your frames - but 
make sure you have the same crystal orientation, as anisotropy may 
mask radiation damage! Then, you can check (using CCP4's loggraph) the 
R_d plot provided by XDSSTAT (for a single dataset; works best for 
high-symmetry spacegroups), and you should also check ISa (printed in 
CORRECT.LP and XSCALE.LP).


HTH,

Kay

P.S. I see one potential problem: XSCALE (VERSION December 6, 2007) was used, 
although the calculation was done 28-Aug-2009. There have been quite a number 
of improvements in XDS/XSCALE since that version. The reason may be 
that a licensed, non-expiring version was used - make sure you always 
use the latest version available!



  Original Message 
 Subject: [Fwd: Re: [ccp4bb] I/sigmaI of 3.0 rule]
 Date: Thu, 3 Mar 2011 10:45:03 -0700
 From: Maia Cherney ch...@ualberta.ca



  Original Message 
 Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule
 Date: Thu, 03 Mar 2011 10:43:23 -0700
 From: Maia Cherney ch...@ualberta.ca
 To: Oganesyan, Vaheh oganesy...@medimmune.com
 References: 2ba9ce2f-c299-4ca9-a36a-99065d1b3...@unipd.it
 4d6faed8.7040...@ualberta.ca
 021001cbd9bc$f0ecc940$d2c65bc0$@gmail.com
 4d6fcab6.3090...@ualberta.ca 4d6fcbff.2010...@ualberta.ca
 73e543de77290c409c9bed6fa4ca34bb0173a...@md1ev002.medimmune.com



 Vaheh,

 The problem was with Rmerge. As you can see, at I/sigma=2.84 the Rmerge
 (R-factor) was 143%. I am asking this question because B. Rupp wrote:
 "However, there is a simple relation between <I/sigI> and R-merge
 (provided no other indecency has been done to the data). It simply is
 (BMC) Rm = 0.8/<I/sigI>."
 Maybe my data are indecent? This is the whole LP file.

 Maia

 MMC741_scale-2.25.LP


** 


XSCALE (VERSION  December 6, 2007)  28-Aug-2009
** 



 Author: Wolfgang Kabsch
 Copy licensed until (unlimited) to
  Canadian Light Source, Saskatoon, Canada.
 No redistribution.



Re: [ccp4bb] [Fwd: Re: [ccp4bb] I/sigmaI of 3.0 rule]

2011-03-04 Thread Kay Diederichs

Am 04.03.2011 11:11, schrieb Kay Diederichs:


There is nothing wrong with R_meas of 147.1% since, as others have said,
R_meas is not limited to 59% (or similar) as a refinement R-factor is.
Rather, R_meas is computed from a formula that has a denominator which
in the asymptotic limit (noise) approaches zero - because there will be
(almost) as many negative observations as positive ones! (The numerator
however does not go to zero)



upon second thought, this explanation is wrong since the absolute value 
is taken in the formula for the denominator.


A better explanation is: in the noise limit the numerator is (apart 
from a factor >1, which is why R_meas is > R_sym) a sum over absolute 
values of differences of random numbers. The denominator is a sum over 
absolute values of random numbers. If the random values are drawn from a 
Gaussian distribution, then the numerator contributions are bigger by a 
factor of square root of two than the denominator contributions. Thus, 
R_meas can be 150-200%.
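
[Illustrative aside, not part of the original post: a quick numerical check of 
the square-root-of-two argument, using made-up Gaussian noise in Python.]

  import numpy as np

  # pure-noise "intensities": numerator terms behave like |difference of two
  # noise values|, denominator terms like |a single noise value|
  rng = np.random.default_rng(1)
  x1, x2 = rng.standard_normal((2, 1_000_000))
  print(np.abs(x1 - x2).mean() / np.abs(x1).mean(), np.sqrt(2))
  # -> ~1.414 vs 1.414..., i.e. a merging R well above 100% for pure noise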


Kay


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-04 Thread Roberto Battistutta
Dear Phil,
I completely agree with you, your words seem to me the best
philosophical outcome of the discussion and indicate the right
perspective to tackle this topic. In particular you write: "In the end, the
important question as ever is does the experimental data support the
conclusions drawn from it? and that will depend on local information
about particular atoms and groups, not on global indicators". Exactly: in
my case, all the discussion of the structures was absolutely independent
of whether the nominal resolution was 1.9, 2.0 or 2.1 A, or whether the cut
was at I/sigma of 1.5, 2.0 or 3.0. This makes the unjustified (as this
two-day discussion has clearly pointed out) technical criticism of the
reviewer even more upsetting.
Ciao,
Roberto


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Eleanor Dodson

No - and I don't think it is accepted practice now either.

I often use I/SigI > 1.5 for refinement.

Look at your Rfactor plots from REFMAC - if they look reasonable at 
higher resolution, use the data.

Eleanor



On 03/03/2011 11:29 AM, Roberto Battistutta wrote:

Dear all,
I got a reviewer comment that indicates the need to refine the structures at an appropriate 
resolution (I/sigmaI of >3.0), and re-submit the revised coordinate files to the PDB for 
validation. In the manuscript I present some crystal structures determined by molecular 
replacement using the same protein in a different space group as search model. Does anyone know the 
origin or the theoretical basis of this I/sigmaI > 3.0 rule for an appropriate resolution?
Thanks,
Bye,
Roberto.


Roberto Battistutta
Associate Professor
Department of Chemistry
University of Padua
via Marzolo 1, 35131 Padova - ITALY
tel. +39.049.8275265/67
fax. +39.049.8275239
roberto.battistu...@unipd.it
www.chimica.unipd.it/roberto.battistutta/
VIMM (Venetian Institute of Molecular Medicine)
via Orus 2, 35129 Padova - ITALY
tel. +39.049.7923236
fax +39.049.7923250
www.vimm.it


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Mischa Machius
Roberto,

The reviewer's request is complete nonsense. The problem is how to best and 
politely respond so as not to prevent the paper from being accepted. Best would 
be to have educated editors who could simply tell you to ignore that request.

Since this issue comes up quite often still, I think we all should come up with 
a canned response to such a request.

One way to approach this issue is to avoid saying something like "the structure 
has been refined to 2.2Å resolution", but instead say "has been refined using 
data to a resolution of 2.2Å", or even "has been refined using data with an 
I/sigmaI > 1.5" (or whatever). Next could be to point out that even data with 
an I/sigmaI of 1 can contain information (I actually don't have a good 
reference for this, but I'm sure someone else can provide one), and inclusion 
of such data can improve refinement stability and speed of convergence (not 
really important in a scientific sense, though).

The point is that all of your data combined result in a structure with a 
certain resolution, pretty much no matter what high-resolution limits you 
choose (I/sigmaI of 0.5, 1.0, or 1.5). As long as you don't portray your 
structure as having a resolution corresponding to the high-resolution limit 
of your data, you should be fine.

Now, requesting to toss out data with I/sigmaI of <3 simply reduces the 
resolution of your structure. You could calculate two electron density maps and 
show that your structure does indeed improve when including data with I/sigmaI 
of <3. One criterion could be to use the optical resolution of the structure.

Hope that helps.

Best,
MM

On Mar 3, 2011, at 6:29 AM, Roberto Battistutta wrote:

 Dear all,
 I got a reviewer comment that indicates the need to refine the structures at 
 an appropriate resolution (I/sigmaI of >3.0), and re-submit the revised 
 coordinate files to the PDB for validation. In the manuscript I present 
 some crystal structures determined by molecular replacement using the same 
 protein in a different space group as search model. Does anyone know the 
 origin or the theoretical basis of this I/sigmaI > 3.0 rule for an 
 appropriate resolution?
 Thanks,
 Bye,
 Roberto.
 
 
 Roberto Battistutta
 Associate Professor
 Department of Chemistry
 University of Padua
 via Marzolo 1, 35131 Padova - ITALY
 tel. +39.049.8275265/67
 fax. +39.049.8275239
 roberto.battistu...@unipd.it
 www.chimica.unipd.it/roberto.battistutta/
 VIMM (Venetian Institute of Molecular Medicine)
 via Orus 2, 35129 Padova - ITALY
 tel. +39.049.7923236
 fax +39.049.7923250
 www.vimm.it


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread John R Helliwell
Dear Roberto,
As indicated by others in reply to you, the current best practice in
protein crystallography is not a rigid application of such a cut-off
criterion. This is because there is such a diverse range of crystal
qualities. However, in chemical crystallography, where the data quality
from such crystals is more homogeneous, such a rule is more often
required, notably as a guard against 'fast and loose' data collection
which may occur (to achieve a very high throughput).

As an Editor myself, whilst usually allowing the authors' chosen
resolution cut off, I will insist on the data table saying in a
footnote the diffraction resolution where I/sig(I) crosses 2.0 and/or,
if relevant, where DeltaAnom/sig(DeltaAnom) crosses 1.0.

A remaining possible contentious point with a submitting author is
where the title of a paper may claim a diffraction resolution that in
fact cannot really be substantiated.

Best wishes,
Yours sincerely,
John



On Thu, Mar 3, 2011 at 11:29 AM, Roberto Battistutta
roberto.battistu...@unipd.it wrote:
 Dear all,
 I got a reviewer comment that indicates the need to refine the structures at 
 an appropriate resolution (I/sigmaI of >3.0), and re-submit the revised 
 coordinate files to the PDB for validation. In the manuscript I present 
 some crystal structures determined by molecular replacement using the same 
 protein in a different space group as search model. Does anyone know the 
 origin or the theoretical basis of this I/sigmaI > 3.0 rule for an 
 appropriate resolution?
 Thanks,
 Bye,
 Roberto.


 Roberto Battistutta
 Associate Professor
 Department of Chemistry
 University of Padua
 via Marzolo 1, 35131 Padova - ITALY
 tel. +39.049.8275265/67
 fax. +39.049.8275239
 roberto.battistu...@unipd.it
 www.chimica.unipd.it/roberto.battistutta/
 VIMM (Venetian Institute of Molecular Medicine)
 via Orus 2, 35129 Padova - ITALY
 tel. +39.049.7923236
 fax +39.049.7923250
 www.vimm.it




-- 
Professor John R Helliwell DSc


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Ed Pozharski
On Thu, 2011-03-03 at 12:29 +0100, Roberto Battistutta wrote:
 Does anyone know the origin or the theoretical basis of this I/sigmaI
 > 3.0 rule for an appropriate resolution?

There is none.  Did the editor ask you to follow this suggestion?  I
wonder if there is anyone among the subscribers of this bb who would
come forward and support this I/sigmaI > 3.0 claim.

What was your I/sigma, by the way?  I almost always collect data to
I/sigma=1, which has the downside of generating somewhat higher
R-values.  Shall I, according to this reviewer, retract/amend every
single one of them?  What a mess.

Cheers,

Ed.

-- 
I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Van Den Berg, Bert
There seem to be quite a few rule followers out there regarding resolution 
cutoffs. One that I have encountered several times is reviewers objecting to 
high Rsym values (say 60-80% in the last shell), which may be even worse than 
using some fixed value of I/sigI.


On 3/3/11 9:55 AM, Ed Pozharski epozh...@umaryland.edu wrote:

On Thu, 2011-03-03 at 12:29 +0100, Roberto Battistutta wrote:
 Does anyone know the origin or the theoretical basis of this I/sigmaI
 3.0 rule for an appropriate resolution?

There is none.  Did editor ask you to follow this suggestion?  I
wonder if there is anyone among the subscribers of this bb who would
come forward and support this I/sigmaI 3.0 claim.

What was your I/sigma, by the way?  I almost always collect data to
I/sigma=1, which has the downside of generating somewhat higher
R-values.  Shall I, according to this reviewer, retract/amend every
single one of them?  What a mess.

Cheers,

Ed.

--
I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs




Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Vellieux Frederic
For myself, I decide on the high resolution cutoff by looking at the 
Rsym vs resolution curve. The curve rises, and for all data sets I have 
processed (so far) there is a break in the curve where it shoots 
up, to near vertical. This inflexion point is where I decide to place 
the high resolution cutoff; I never look at the I/sigma(I) values nor at 
the Rsym in the high resolution shell.
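
[Illustrative aside: one crude way to locate such a break programmatically. 
The shell values are invented and the "largest relative jump" criterion is an 
assumption made for the sketch, not Fred's actual procedure.]

  import numpy as np

  res  = np.array([3.5, 3.2, 3.0, 2.8, 2.6, 2.4, 2.3, 2.2])          # shell limits (A), made up
  rsym = np.array([0.06, 0.08, 0.10, 0.14, 0.21, 0.37, 0.55, 0.95])  # Rsym per shell, made up

  jump = np.diff(rsym) / rsym[:-1]     # relative increase from shell to shell
  cut  = res[1:][np.argmax(jump)]      # shell where the curve "shoots up" most
  print("suggested cutoff near", cut, "A")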


As a reviewer, when I have to evaluate a manuscript where very high Rsym 
values are quoted, I have no way of knowing how the high resolution 
cutoff was set. So I simply suggest to the authors to double check this 
cutoff, in order to ensure that the high resolution limit really 
corresponds to high resolution data and not to noise. But I certainly do 
not make statements such as this one.


I have seen cases where, using this rule to decide on the high 
resolution limit, the Rsym in the high resolution bin is well below 50% 
and cases where it is much higher. Like 65%, 70% (0.65, 0.7 if you 
prefer). So, in my opinion, there is no fixed rule as to what the 
acceptable Rsym value in the highest resolution shell should be.


Fred.

Van Den Berg, Bert wrote:
There seem to be quite a few “rule” followers out there regarding 
resolution cutoffs. One that I have encountered several times is 
reviewers objecting to high Rsym values (say 60-80% in the last 
shell), which may be even worse than using some fixed value of I/sigI.



On 3/3/11 9:55 AM, Ed Pozharski epozh...@umaryland.edu wrote:

On Thu, 2011-03-03 at 12:29 +0100, Roberto Battistutta wrote:
 Does anyone know the origin or the theoretical basis of this
I/sigmaI
 3.0 rule for an appropriate resolution?

There is none. Did editor ask you to follow this suggestion? I
wonder if there is anyone among the subscribers of this bb who would
come forward and support this I/sigmaI 3.0 claim.

What was your I/sigma, by the way? I almost always collect data to
I/sigma=1, which has the downside of generating somewhat higher
R-values. Shall I, according to this reviewer, retract/amend every
single one of them? What a mess.

Cheers,

Ed.

--
I'd jump in myself, if I weren't so good at whistling.
Julian, King of Lemurs




Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Jim Pflugrath
As mentioned, there is no I/sigmaI rule.  Also you need to specify (and
correctly calculate) <I/sigmaI> and not <I>/<sigmaI>.

A review of similar articles in the same journal will show what is typical
for the journal.  I think you will find that the I/sigmaI cutoff varies.
This information can be used in your response to the reviewer, as in: "A
review of actual published articles in the Journal shows that 75% (60 out of
80) used an I/sigmaI cutoff of 2 for the resolution of the diffraction
data used in refinement.  We respectfully believe that our cutoff of 2
should be acceptable."

-Original Message-
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
Roberto Battistutta
Sent: Thursday, March 03, 2011 5:30 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] I/sigmaI of 3.0 rule

Dear all,
I got a reviewer comment that indicates the need to refine the structures at
an appropriate resolution (I/sigmaI of >3.0), and re-submit the revised
coordinate files to the PDB for validation. In the manuscript I present
some crystal structures determined by molecular replacement using the same
protein in a different space group as search model. Does anyone know the
origin or the theoretical basis of this I/sigmaI > 3.0 rule for an
appropriate resolution?
Thanks,
Bye,
Roberto.


Roberto Battistutta
Associate Professor
Department of Chemistry
University of Padua
via Marzolo 1, 35131 Padova - ITALY
tel. +39.049.8275265/67
fax. +39.049.8275239
roberto.battistu...@unipd.it
www.chimica.unipd.it/roberto.battistutta/
VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova -
ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Phil Evans
My preferred criterion is the half-dataset correlation coefficient output by 
Scala (an idea stolen from the EM guys): I tend to cut my data where this falls 
to not less than 0.5.

The good thing about this is that it is independent of the vagaries of 
I/sigma (or rather of the SD estimation) and has a more intuitive cutoff 
point than Rmeas (let alone Rmerge). It probably doesn't work well at low 
multiplicity, and there is always a problem with anisotropy (I intend to do 
anisotropic analysis in future).
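
[Illustrative aside: a minimal sketch of the half-dataset correlation idea in 
Python. This is a toy on simulated data, not Scala's actual algorithm; the 
random split, merging by plain averaging and all parameters are assumptions.]

  import numpy as np

  def half_dataset_cc(obs_by_refl, rng):
      # obs_by_refl: one array of unmerged intensities per unique reflection
      half1, half2 = [], []
      for obs in obs_by_refl:
          if len(obs) < 2:
              continue                          # need at least one obs per half
          obs = rng.permutation(obs)
          half1.append(obs[: len(obs) // 2].mean())
          half2.append(obs[len(obs) // 2 :].mean())
      return np.corrcoef(half1, half2)[0, 1]    # correlate the two half-merges

  rng = np.random.default_rng(0)
  true_I = rng.exponential(1.0, 500)                   # toy Wilson-ish intensities
  data = [t + rng.standard_normal(8) for t in true_I]  # multiplicity 8, sigma = 1
  print(half_dataset_cc(data, rng))                    # falls as the data get noisier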

That said, the exact resolution cut-off is not really important: if you refine 
and look at maps at, say, 2.6A vs. 2.5A (if that's around the potential cutoff), 
there is probably little significant difference.

Phil

On 3 Mar 2011, at 15:34, Jim Pflugrath wrote:

 As mentioned, there is no I/sigmaI rule.  Also you need to specify (and
 correctly calculate) <I/sigmaI> and not <I>/<sigmaI>.
 
 A review of similar articles in the same journal will show what is typical
 for the journal.  I think you will find that the I/sigmaI cutoff varies.
 This information can be used in your response to the reviewer as in, A
 review of actual published articles in the Journal shows that 75% (60 out of
 80) used an I/sigmaI cutoff of 2 for the resolution of the diffraction
 data used in refinement.  We respectfully believe that our cutoff of 2
 should be acceptable. 
 
 -Original Message-
 From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
 Roberto Battistutta
 Sent: Thursday, March 03, 2011 5:30 AM
 To: CCP4BB@JISCMAIL.AC.UK
 Subject: [ccp4bb] I/sigmaI of 3.0 rule
 
 Dear all,
 I got a reviewer comment that indicate the need to refine the structures at
 an appropriate resolution (I/sigmaI of 3.0), and re-submit the revised
 coordinate files to the PDB for validation.. In the manuscript I present
 some crystal structures determined by molecular replacement using the same
 protein in a different space group as search model. Does anyone know the
 origin or the theoretical basis of this I/sigmaI 3.0 rule for an
 appropriate resolution?
 Thanks,
 Bye,
 Roberto.
 
 
 Roberto Battistutta
 Associate Professor
 Department of Chemistry
 University of Padua
 via Marzolo 1, 35131 Padova - ITALY
 tel. +39.049.8275265/67
 fax. +39.049.8275239
 roberto.battistu...@unipd.it
 www.chimica.unipd.it/roberto.battistutta/
 VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 Padova -
 ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Bernhard Rupp (Hofkristallrat a.D.)
I think this suppression of high resolution shells via I/sigI cutoffs is
partially attributable to a conceptual misunderstanding of what these (darn)
R-values mean in refinement versus data merging. 

In refinement, even a random atom structure follows the Wilson distribution,
and therefore, even a completely wrong non-centrosymmetric structure will
not  - given proper scaling - give an Rf of more than 59%. 
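
[Illustrative aside: a quick Monte-Carlo check of that ~59% figure, assuming 
ideal acentric Wilson statistics, equal scales and no refinement.]

  import numpy as np

  rng = np.random.default_rng(0)
  n = 1_000_000
  # acentric case: |F| is the amplitude of a complex Gaussian structure factor
  Fo = np.abs(rng.standard_normal(n) + 1j * rng.standard_normal(n))
  Fc = np.abs(rng.standard_normal(n) + 1j * rng.standard_normal(n))
  print(np.abs(Fo - Fc).sum() / Fo.sum())   # -> ~0.586, the familiar ~59% limit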

There is no such limit for the basic linear merging R. However, there is a
simple relation between <I/sigI> and R-merge (provided no other indecency
has been done to the data). It simply is (BMC) Rm = 0.8/<I/sigI>. I.e. for
<I/sigI> = 0.8 you get 100%, for 2 we obtain 40%, which, interpreted as an Rf,
would be dreadful, but for <I/sigI> = 3 we get Rm = 0.27, and that looks
acceptable for an Rf (or to an uninformed reviewer).
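
[Illustrative aside: a small simulation that roughly reproduces this rule of 
thumb for idealized Gaussian errors. The multiplicity and signal levels are 
invented for the example, and <I/sigI> here means the per-observation value.]

  import numpy as np

  rng = np.random.default_rng(0)
  n_refl, mult = 10000, 6
  for i_over_sig in (0.8, 2.0, 3.0, 10.0):      # I/sigma of a single observation
      true_I = np.full(n_refl, i_over_sig)      # sigma = 1, so I equals I/sigma
      obs = true_I[:, None] + rng.standard_normal((n_refl, mult))
      mean_I = obs.mean(axis=1, keepdims=True)
      rmerge = np.abs(obs - mean_I).sum() / obs.sum()
      print(i_over_sig, round(float(rmerge), 2), round(0.8 / i_over_sig, 2))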

Btw, I also wish to point out that the I/sig cutoffs are not exactly the
cutoff criterion for anomalous phasing; a more direct measure is a signal
cutoff such as delF/sig(delF); George, I believe, uses 1.3 for SAD.
Interestingly, in almost all structures I played with - for both noise in
anomalous data and no anomalous scatterer present - the anomalous
signal was 0.8. I haven't figured out yet, or proved the statistics, whether
this is generally true or just numerology...

And, the usual biased rant - irrespective of Hamilton tests, nobody really
needs these popular unweighted linear residuals which shall not be named,
particularly on F. They only cause trouble.  

Best regards, BR
-
Bernhard Rupp
001 (925) 209-7429
+43 (676) 571-0536
b...@ruppweb.org
hofkristall...@gmail.com
http://www.ruppweb.org/
-
Structural Biology is the practice of
crystallography without a license.
-

-Original Message-
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Bart
Hazes
Sent: Thursday, March 03, 2011 7:08 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule

There seems to be an epidemic of papers with I/Sigma > 3 (sometimes much
larger). In fact such cases have become so frequent that I fear some people
start to believe that this is the proper procedure. I don't know where that
has come from, as the I/Sigma ~ 2 criterion was established long ago and
many consider even that a tad conservative. It simply pains me to see people
going to the most advanced synchrotrons to boost their highest resolution
data and then simply throw away much of it.

I don't know what has caused this wave of high I/Sigma threshold use but
here are some ideas

- High I/Sigma cutoffs are normal for (S/M)AD data sets where a more strict
focus on data quality is needed.
Perhaps some people have started to think this is the norm.

- For some datasets Rsym goes up strongly while I/SigI is still reasonable. I
personally believe this is due to radiation damage, which affects Rsym (which
compares reflections taken after different amounts of exposure) much more
than I/SigI, which is based on individual reflections. A good test would be
to see if processing only the first half of the dataset improves Rsym (or,
better, Rrim).

- Most detectors are square and if the detector is too far from the crystal
then the highest resolution data falls beyond the edges of the detector. In
this case one could, and should, still process data into the corners of the
detector. Data completeness at higher resolution may suffer but each
additional reflection still represents an extra restraint in refinement and
a Fourier term in the map. Due to crystal symmetry the effect on
completeness may even be less than expected.

Bart


On 11-03-03 04:29 AM, Roberto Battistutta wrote:
 Dear all,
 I got a reviewer comment that indicate the need to refine the structures
at an appropriate resolution (I/sigmaI of3.0), and re-submit the revised
coordinate files to the PDB for validation.. In the manuscript I present
some crystal structures determined by molecular replacement using the same
protein in a different space group as search model. Does anyone know the
origin or the theoretical basis of this I/sigmaI3.0 rule for an
appropriate resolution?
 Thanks,
 Bye,
 Roberto.


 Roberto Battistutta
 Associate Professor
 Department of Chemistry
 University of Padua
 via Marzolo 1, 35131 Padova - ITALY
 tel. +39.049.8275265/67
 fax. +39.049.8275239
 roberto.battistu...@unipd.it
 www.chimica.unipd.it/roberto.battistutta/
 VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 
 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it


-- 



Bart Hazes (Associate Professor)
Dept. of Medical Microbiology  Immunology University of Alberta
1-15 Medical Sciences Building
Edmonton, Alberta
Canada, T6G 2H7
phone:  1-780-492-0042
fax:1-780

Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Ed Pozharski
On Thu, 2011-03-03 at 16:02 +0100, Vellieux Frederic wrote:
 For myself, I decide on the high resolution cutoff by looking at the 
 Rsym vs resolution curve. The curve rises, and for all data sets I
 have 
 processed (so far) there is a break in the curve and the curve shoots 
 up. To near vertical. This inflexion point is where I decide to
 place 
 the high resolution cutoff, I never look at the I/sigma(I) values nor
 at 
 the Rsym in the high resolution shell.
 

Fred,

while your procedure is definitely more sophisticated than what I do,
let me point out that Rsym is genuinely a bad measure for this, as
it depends strongly on redundancy.  Do more robust measures (e.g.
Rpim) show a similar inflexion?  I suspect it will at least shift
towards higher resolution.
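
[Illustrative aside: a toy simulation, using the standard textbook definitions 
of Rmerge, Rmeas and Rpim with uniform multiplicity (so the sqrt factors can be 
pulled out of the sums), to show how the three respond to redundancy for the 
same underlying data quality. All numbers are made up.]

  import numpy as np

  def r_factors(obs):                  # obs: (n_refl, multiplicity) unmerged I's
      n = obs.shape[1]
      dev = np.abs(obs - obs.mean(axis=1, keepdims=True)).sum()
      den = obs.sum()
      return (dev / den,                          # Rmerge / Rsym
              np.sqrt(n / (n - 1)) * dev / den,   # Rmeas
              np.sqrt(1 / (n - 1)) * dev / den)   # Rpim

  rng = np.random.default_rng(0)
  true_I = rng.exponential(5.0, 20000)            # sigma = 1 per observation
  for mult in (2, 4, 8, 16):
      obs = true_I[:, None] + rng.standard_normal((true_I.size, mult))
      print(mult, [round(float(r), 3) for r in r_factors(obs)])
  # Rmerge creeps up with multiplicity, Rmeas stays roughly put, Rpim drops.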

Cheers,

Ed.

-- 
I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Ed Pozharski
On Thu, 2011-03-03 at 08:08 -0700, Bart Hazes wrote:
 I don't know what has caused this wave of high I/Sigma threshold use
 but 
 here are some ideas
 

It may also be related to what I feel is recent revival of the
significance of the R-values in general.  Lower resolution cutoffs in
this context improve the R-values, which is (incorrectly) perceived as
model improvement.

-- 
I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Van Den Berg, Bert
Does the position of this inflection point depend on the redundancy? Maybe it 
does not; for high-redundancy data one would simply get a much higher 
corresponding Rsym.


On 3/3/11 11:13 AM, Ed Pozharski epozh...@umaryland.edu wrote:

On Thu, 2011-03-03 at 16:02 +0100, Vellieux Frederic wrote:
 For myself, I decide on the high resolution cutoff by looking at the
 Rsym vs resolution curve. The curve rises, and for all data sets I
 have
 processed (so far) there is a break in the curve and the curve shoots
 up. To near vertical. This inflexion point is where I decide to
 place
 the high resolution cutoff, I never look at the I/sigma(I) values nor
 at
 the Rsym in the high resolution shell.


Fred,

while your procedure is definitely more sophisticated than what I do,
let me point out that the Rsym is genuinely a bad measure for this, as
it depends strongly on redundancy.  Does more robust measures (e.g.
Rpim) show similar inflexion?  I suspect it will at least shift
towards higher resolution.

Cheers,

Ed.

--
I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs




Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Ed Pozharski
On Thu, 2011-03-03 at 09:34 -0600, Jim Pflugrath wrote:
 As mentioned, there is no I/sigmaI rule.  Also you need to specify (and
 correctly calculate) <I/sigmaI> and not <I>/<sigmaI>.
 
 A review of similar articles in the same journal will show what is
 typical
 for the journal.  I think you will find that the I/sigmaI cutoff
 varies.
 This information can be used in your response to the reviewer as in,
 A
 review of actual published articles in the Journal shows that 75% (60
 out of
 80) used an I/sigmaI cutoff of 2 for the resolution of the
 diffraction
 data used in refinement.  We respectfully believe that our cutoff of 2
 should be acceptable. 
 

Jim,

Excellent point.  Such statistics would be somewhat tedious to gather
though, does anyone know if I/sigma stats are available for the whole
PDB somewhere?

On your first point though - why is one better than the other?  My
experimental observation is that while the two differ significantly at low
resolution (what matters, of course, is I/sigma itself and not the
resolution per se), at high resolution, where the cutoff is chosen, they
are not that different.  And since the cutoff value itself is rather
arbitrarily chosen, why is <I/sigma> better than <I>/<sigma>?
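
[Illustrative aside: a toy comparison of the two conventions. The distributions 
are invented; the point is only that the mean of the per-reflection ratios and 
the ratio of the means need not agree, especially for weak data.]

  import numpy as np

  rng = np.random.default_rng(2)
  sig = rng.uniform(0.5, 2.0, 50000)              # per-reflection sigmas
  true_I = rng.exponential(1.0, 50000)            # a weak shell: <I> ~ <sigma>
  I = true_I + sig * rng.standard_normal(50000)   # measured I, can go negative

  print(np.mean(I / sig))        # <I/sigI>: mean of the per-reflection ratios
  print(I.mean() / sig.mean())   # <I>/<sigI>: ratio of the means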

Cheers,

Ed.


-- 
I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Bernhard Rupp (Hofkristallrat a.D.)
 related to what I feel is recent revival of the significance of the R-values

because it's so handy to have one single number to judge a highly complex 
nonlinear multivariate barely determined regularized problem! Just as easy as 
running a gel!

Best BR

-Original Message-
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ed 
Pozharski
Sent: Thursday, March 03, 2011 8:19 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule

On Thu, 2011-03-03 at 08:08 -0700, Bart Hazes wrote:
 I don't know what has caused this wave of high I/Sigma threshold use 
 but here are some ideas
 

It may also be related to what I feel is recent revival of the significance of 
the R-values in general.  Lower resolution cutoffs in this context improve the 
R-values, which is (incorrectly) perceived as model improvement.

--
I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Vellieux Frederic

Hi,

I don't think XDS generates an Rpim value, does it? The XDS CORRECT 
step provides the old-fashioned Rsym (R-FACTOR) plus R-meas and Rmrgd-F.

The curves all look the same, though.

Fred.

Ed Pozharski wrote:

On Thu, 2011-03-03 at 16:02 +0100, Vellieux Frederic wrote:
  
For myself, I decide on the high resolution cutoff by looking at the 
Rsym vs resolution curve. The curve rises, and for all data sets I
have 
processed (so far) there is a break in the curve and the curve shoots 
up. To near vertical. This inflexion point is where I decide to
place 
the high resolution cutoff, I never look at the I/sigma(I) values nor
at 
the Rsym in the high resolution shell.





Fred,

while your procedure is definitely more sophisticated than what I do,
let me point out that the Rsym is genuinely a bad measure for this, as
it depends strongly on redundancy.  Does more robust measures (e.g.
Rpim) show similar inflexion?  I suspect it will at least shift
towards higher resolution.

Cheers,

Ed.

  


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Ronald E Stenkamp

Discussions of I/sigma(I) or less-than cutoffs have been going on for at least 
35 years.  For example, see Acta Cryst. (1975) B31, 1507-1509.  I was taught by 
my elders (mainly Lyle Jensen) that less-than cutoffs came into use when 
diffractometers replaced film methods for small molecule work, i.e., 1960s.  To 
compare new and old structures, they needed some criterion for the electronic 
measurements that would correspond to the fog level on their films.  People 
settled on 2 sigma cutoffs (on I, which means 4 sigma on F), but subsequently 
the cutoffs got higher and higher, as people realized they could get lower and 
lower R values by throwing away the weak reflections.  I'm unaware of any 
statistical justification for any cutoff.  The approach I like the most is to 
refine on F-squared and use every reflection.  Error estimates and weighting 
schemes should take care of the noise.

Ron

On Thu, 3 Mar 2011, Ed Pozharski wrote:


On Thu, 2011-03-03 at 09:34 -0600, Jim Pflugrath wrote:

As mentioned, there is no I/sigmaI rule.  Also you need to specify (and
correctly calculate) <I/sigmaI> and not <I>/<sigmaI>.

A review of similar articles in the same journal will show what is
typical
for the journal.  I think you will find that the I/sigmaI cutoff
varies.
This information can be used in your response to the reviewer as in,
A
review of actual published articles in the Journal shows that 75% (60
out of
80) used an I/sigmaI cutoff of 2 for the resolution of the
diffraction
data used in refinement.  We respectfully believe that our cutoff of 2
should be acceptable.



Jim,

Excellent point.  Such statistics would be somewhat tedious to gather
though, does anyone know if I/sigma stats are available for the whole
PDB somewhere?

On your first point though - why is one better than the other?  My
experimental observation is that while the two differ significantly at low
resolution (what matters, of course, is I/sigma itself and not the
resolution per se), at high resolution, where the cutoff is chosen, they
are not that different.  And since the cutoff value itself is rather
arbitrarily chosen, why is <I/sigma> better than <I>/<sigma>?

Cheers,

Ed.


--
I'd jump in myself, if I weren't so good at whistling.
  Julian, King of Lemurs



Re: [ccp4bb] I/sigmaI of 3.0 rule- do not underestimate gels

2011-03-03 Thread Felix Frolow
Well BR, do not underestimate the complexity of running a gel! There are even more 
harsh referees' comments on gel appearance and quality 
than comments on cutting data based on R, Rf and sigmaI :-)
Especially when one is trying to penetrate into prestigious journals...
Dr Felix Frolow   
Professor of Structural Biology and Biotechnology
Department of Molecular Microbiology
and Biotechnology
Tel Aviv University 69978, Israel

Acta Crystallographica F, co-editor

e-mail: mbfro...@post.tau.ac.il
Tel:  ++972-3640-8723
Fax: ++972-3640-9407
Cellular: 0547 459 608

On Mar 3, 2011, at 18:38 , Bernhard Rupp (Hofkristallrat a.D.) wrote:

 related to what I feel is recent revival of the significance of the R-values
 
 because it's so handy to have one single number to judge a highly complex 
 nonlinear multivariate barely determined regularized problem! Just as easy as 
 running a gel!
 
 Best BR
 
 -Original Message-
 From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ed 
 Pozharski
 Sent: Thursday, March 03, 2011 8:19 AM
 To: CCP4BB@JISCMAIL.AC.UK
 Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule
 
 On Thu, 2011-03-03 at 08:08 -0700, Bart Hazes wrote:
 I don't know what has caused this wave of high I/Sigma threshold use 
 but here are some ideas
 
 
 It may also be related to what I feel is recent revival of the significance 
 of the R-values in general.  Lower resolution cutoffs in this context improve 
 the R-values, which is (incorrectly) perceived as model improvement.
 
 --
 I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Maia Cherney

Dear Bernhard

I am wondering where I should cut my data off. Here are the statistics 
from XDS processing.


Maia

SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION

 RESOLUTION  NUMBER OF REFLECTIONS     COMPLET.  R-FACTOR  R-FACTOR  COMPARED  I/SIGMA   R-meas  Rmrgd-F  Anomal  SigAno   Nano
   LIMIT    OBSERVED  UNIQUE POSSIBLE  OF DATA   observed  expected                                        Corr

   10.06      5509     304     364      83.5%      3.0%      4.4%      5509     63.83     3.1%     1.0%     11%   0.652    173
    7.12     11785     595     595     100.0%      3.5%      4.8%     11785     59.14     3.6%     1.4%    -10%   0.696    414
    5.81     15168     736     736     100.0%      5.0%      5.6%     15168     51.88     5.1%     1.8%     -9%   0.692    561
    5.03     17803     854     854     100.0%      5.5%      5.7%     17803     50.02     5.6%     2.2%    -10%   0.738    675
    4.50     20258     964     964     100.0%      5.1%      5.4%     20258     52.61     5.3%     2.1%    -16%   0.710    782
    4.11     22333    1054    1054     100.0%      5.6%      5.7%     22333     50.89     5.8%     2.0%    -16%   0.705    878
    3.80     23312    1137    1137     100.0%      7.0%      6.6%     23312     42.95     7.1%     3.0%    -13%   0.770    952
    3.56     25374    1207    1208      99.9%      7.6%      7.3%     25374     40.56     7.8%     3.4%    -18%   0.739   1033
    3.35     27033    1291    1293      99.8%      9.7%      9.2%     27033     33.73    10.0%     4.1%    -12%   0.765   1107
    3.18     29488    1353    1353     100.0%     11.6%     11.6%     29488     28.16    11.9%     4.4%     -7%   0.750   1176
    3.03     31054    1419    1419     100.0%     15.7%     15.9%     31054     21.77    16.0%     6.9%     -9%   0.741   1243
    2.90     32288    1478    1478     100.0%     21.1%     21.6%     32288     16.99    21.6%     9.2%     -6%   0.745   1296
    2.79     33807    1542    1542     100.0%     28.1%     28.8%     33807     13.07    28.8%    12.9%     -2%   0.783   1361
    2.69     34983    1604    1604     100.0%     37.4%     38.7%     34983      9.95    38.3%    17.2%     -2%   0.743   1422
    2.60     35163    1653    1653     100.0%     48.8%     48.0%     35163      8.03    50.0%    21.9%     -6%   0.754   1475
    2.52     36690    1699    1699     100.0%     54.0%     56.0%     36690      6.98    55.3%    25.9%      0%   0.745   1517
    2.44     37751    1757    1757     100.0%     67.9%     70.4%     37751      5.61    69.5%    32.5%     -5%   0.733   1577
    2.37     38484    1798    1799      99.9%     82.2%     84.5%     38484      4.72    84.2%    36.5%      2%   0.753   1620
    2.31     39098    1842    1842     100.0%     91.4%     94.3%     39098      4.19    93.7%    43.7%     -3%   0.744   1661
    2.25     38809    1873    1923      97.4%     143.4%    139.3%    38809      2.84   147.1%    69.8%     -2%   0.693   1696
   total    556190   26160   26274      99.6%     11.9%     12.2%    556190     21.71    12.2%     9.7%     -5%   0.739  22619




Bernhard Rupp (Hofkristallrat a.D.) wrote:

I think this suppression of high resolution shells via I/sigI cutoffs is
partially attributable to a conceptual misunderstanding of what these (darn)
R-values mean in refinement versus data merging. 


In refinement, even a random atom structure follows the Wilson distribution,
and therefore, even a completely wrong non-centrosymmetric structure will
not  - given proper scaling - give an Rf of more than 59%. 


There is no such limit for the basic linear merging R. However, there is a
simple relation between <I/sigI> and R-merge (provided no other indecency
has been done to the data). It simply is (BMC) Rm = 0.8/<I/sigI>. I.e. for
<I/sigI> = 0.8 you get 100%, for 2 we obtain 40%, which, interpreted as an Rf,
would be dreadful, but for <I/sigI> = 3 we get Rm = 0.27, and that looks
acceptable for an Rf (or to an uninformed reviewer).


Btw, I also wish to point out that the I/sig cutoffs are not exactly the
cutoff criterion for anomalous phasing, a more direct measure is a signal
cutoff such as delF/sig(delF); George I believe uses 1.3 for SAD.
Interestingly, in almost all structures I played with, delF/sig(delF) for
both, noise in anomalous data or no anomalous scatterer present, the
anomalous signal was 0.8. I haven’t figured out yet or proved the statistics
and whether this is generally true or just numerology...

And, the usual biased rant - irrespective of Hamilton tests, nobody really
needs these popular unweighted linear residuals which shall not be named,
particularly on F. They only cause trouble.  


Best regards, BR
-
Bernhard Rupp
001 (925) 209-7429
+43 (676) 571-0536
b...@ruppweb.org
hofkristall...@gmail.com
http://www.ruppweb.org/
-

Structural Biology is the practice of
crystallography without a license.
-

-Original Message-
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Bart
Hazes
Sent: Thursday, March 03, 2011 7:08 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule

There seems to be an epidemic of papers with I/Sigma  3 (sometime much
larger). In fact such cases have become so frequent that I fear some people
start to believe that this is the proper procedure. I don't know where that
has come from as the I/Sigma ~ 2 criterion has been established long ago and
many consider that even a tad conservative. It simply pains me to see people
going to the most advanced synchrotrons to boost their highest resolution
data and then simply throw away much of it.

I don't know what has caused this wave of high I/Sigma threshold use but
here are some ideas

- High I/Sigma cutoffs are normal for (S/M)AD data sets where a more strict
focus on data quality is needed.
Perhaps some people have started to think this is the norm.

- For some dataset Rsym goes up strongly while I/SigI

Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Simon Phillips

I take the point about a tendency in those days to apply sigma cutoffs to get 
lower R values, which were erroneously expected to indicate better structures.  
I wonder how many of us remember this paper by Arnberg et al (1979) Acta Cryst 
A35, 497-499, where it is shown for (small molecule) structures that had been 
refined with only reflections I > 3*sigma(I) that the models were degraded by 
leaving out weak data (although the R factors looked better of course).

Arnberg et al took published structures and showed the refined models got 
better when the weak data were included.  The best bit, I think, was when they 
went on to demonstrate successful refinement of a structure using ONLY the weak 
data where I < 3*sigma(I) and ignoring all the strong ones.  This shows, as was 
alluded to earlier in the discussion, that a weak reflection puts a powerful 
constraint on a refinement, especially if there are other stronger reflections 
in the same resolution range.

---
| Simon E.V. Phillips |
---
| Director, Research Complex at Harwell (RCaH)|
| Rutherford Appleton Laboratory  |
| Harwell Science and Innovation Campus   |
| Didcot  |
| Oxon OX11 0FA   |
| United Kingdom  |
| Email: simon.phill...@rc-harwell.ac.uk  |
| Tel:   +44 (0)1235 567701   |
|+44 (0)1235 567700 (sec) |
|+44 (0)7884 436011 (mobile)  |
| www.rc-harwell.ac.uk|
---
| Astbury Centre for Structural Molecular Biology |
| Institute of Molecular and Cellular Biology |
| University of LEEDS |
| LEEDS LS2 9JT   |
| United Kingdom  |
| Email: s.e.v.phill...@leeds.ac.uk   |
| Tel:   +44 (0)113 343 3027  |
| WWW:   http://www.astbury.leeds.ac.uk/People/staffpage.php?StaffID=SEVP |
---


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Maia Cherney

I have to resend my statistics.

Maia Cherney wrote:

Dear Bernhard

I am wondering where I should cut my data off. Here is the statistics 
from XDS processing.


Maia





On 11-03-03 04:29 AM, Roberto Battistutta wrote:
 

Dear all,
I got a reviewer comment that indicate the need to refine the 
structures

at an appropriate resolution (I/sigmaI of3.0), and re-submit the 
revised
coordinate files to the PDB for validation.. In the manuscript I 
present
some crystal structures determined by molecular replacement using the 
same

protein in a different space group as search model. Does anyone know the
origin or the theoretical basis of this I/sigmaI3.0 rule for an
appropriate resolution?
 

Thanks,
Bye,
Roberto.


Roberto Battistutta
Associate Professor
Department of Chemistry
University of Padua
via Marzolo 1, 35131 Padova - ITALY
tel. +39.049.8275265/67
fax. +39.049.8275239
roberto.battistu...@unipd.it
www.chimica.unipd.it/roberto.battistutta/
VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 
Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it





  




 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION

 RESOLUTION  NUMBER OF REFLECTIONS     COMPLET.  R-FACTOR  R-FACTOR  COMPARED  I/SIGMA   R-meas  Rmrgd-F  Anomal  SigAno   Nano
   LIMIT    OBSERVED  UNIQUE POSSIBLE  OF DATA   observed  expected                                        Corr

   10.06      5509     304     364      83.5%      3.0%      4.4%      5509     63.83     3.1%     1.0%     11%   0.652    173
    7.12     11785     595     595     100.0%      3.5%      4.8%     11785     59.14     3.6%     1.4%    -10%   0.696    414
    5.81     15168     736     736     100.0%      5.0%      5.6%     15168     51.88     5.1%     1.8%     -9%   0.692    561
    5.03     17803     854     854     100.0%      5.5%      5.7%     17803     50.02     5.6%     2.2%    -10%   0.738    675
    4.50     20258     964     964     100.0%      5.1%      5.4%     20258     52.61     5.3%     2.1%    -16%   0.710    782
    4.11     22333    1054    1054     100.0%      5.6%      5.7%     22333     50.89     5.8%     2.0%    -16%   0.705    878
    3.80     23312    1137    1137     100.0%      7.0%      6.6%     23312     42.95     7.1%     3.0%    -13%   0.770    952
    3.56     25374    1207    1208      99.9%      7.6%      7.3%     25374     40.56     7.8%     3.4%    -18%   0.739   1033
    3.35     27033    1291    1293      99.8%      9.7%      9.2%     27033     33.73    10.0%     4.1%    -12%   0.765   1107
    3.18     29488    1353    1353     100.0%     11.6%     11.6%     29488     28.16    11.9%     4.4%     -7%   0.750   1176
    3.03     31054    1419    1419     100.0%     15.7%     15.9%     31054     21.77    16.0%     6.9%     -9%   0.741   1243
    2.90     32288    1478    1478     100.0%     21.1%     21.6%     32288     16.99    21.6%     9.2%     -6%   0.745   1296
    2.79     33807    1542    1542     100.0%     28.1%     28.8%     33807     13.07    28.8%    12.9%     -2%   0.783   1361
    2.69     34983    1604    1604     100.0%     37.4%     38.7%     34983      9.95    38.3%    17.2%     -2%   0.743   1422
    2.60     35163    1653    1653     100.0%     48.8%     48.0%     35163      8.03    50.0%    21.9%     -6%   0.754   1475
    2.52     36690    1699    1699     100.0%     54.0%     56.0%     36690      6.98    55.3%    25.9%      0%   0.745   1517
    2.44     37751    1757    1757     100.0%     67.9%     70.4%     37751      5.61    69.5%    32.5%     -5%   0.733   1577
    2.37     38484    1798    1799      99.9%     82.2%     84.5%     38484      4.72    84.2%    36.5%      2%   0.753   1620
    2.31     39098    1842    1842     100.0%     91.4%     94.3%     39098      4.19    93.7%    43.7%     -3%   0.744   1661
    2.25     38809    1873    1923      97.4%     143.4%    139.3%    38809      2.84   147.1%    69.8%     -2%   0.693   1696
   total    556190   26160   26274      99.6%     11.9%     12.2%    556190     21.71    12.2%     9.7%     -5%   0.739  22619



Re: [ccp4bb] I/sigmaI of 3.0 rule- do not underestimate gels

2011-03-03 Thread Bernhard Rupp (Hofkristallrat a.D.)
there are even more harsh referees comments on gel appearance and quality
than comments on cutting data based on R,RF and sigmaI :-)  Especially when
one is trying to penetrate into prestigious journals...

Ok I repent. For improving gels there is the same excellent program, also
useful for density modification - Photoshop ;-)

Best, BR

Dr Felix Frolow   
Professor of Structural Biology and Biotechnology Department of Molecular
Microbiology and Biotechnology Tel Aviv University 69978, Israel

Acta Crystallographica F, co-editor

e-mail: mbfro...@post.tau.ac.il
Tel:  ++972-3640-8723
Fax: ++972-3640-9407
Cellular: 0547 459 608

On Mar 3, 2011, at 18:38 , Bernhard Rupp (Hofkristallrat a.D.) wrote:

 related to what I feel is recent revival of the significance of the 
 R-values
 
 because it's so handy to have one single number to judge a highly
complex nonlinear multivariate barely determined regularized problem! Just
as easy as running a gel!
 
 Best BR
 
 -Original Message-
 From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of 
 Ed Pozharski
 Sent: Thursday, March 03, 2011 8:19 AM
 To: CCP4BB@JISCMAIL.AC.UK
 Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule
 
 On Thu, 2011-03-03 at 08:08 -0700, Bart Hazes wrote:
 I don't know what has caused this wave of high I/Sigma threshold use 
 but here are some ideas
 
 
 It may also be related to what I feel is recent revival of the
significance of the R-values in general.  Lower resolution cutoffs in this
context improve the R-values, which is (incorrectly) perceived as model
improvement.
 
 --
 I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Van Den Berg, Bert
We should compile this discussion and send it as compulsive reading to journal 
editors...;-)

Bert


On 3/3/11 12:07 PM, Simon Phillips s.e.v.phill...@leeds.ac.uk wrote:



I take the point about a tendency in those days to apply sigma cutoffs to get 
lower R values, which were erroneously expected to indicate better structures.  
I wonder how many of us remember this paper by Arnberg et al (1979) Acta Cryst 
A35, 497-499, where it is shown for (small molecule) structures that had been 
refined with only reflections I3*sigma(I) that the models were degraded by 
leaving out weak data (although the R factors looked better of course).

Arnberg et al took published structures and showed the refined models got 
better when the weak data were included.  The best bit, I think, was when they 
went on to demonstrate successful refinement of a structure using ONLY the weak 
data where I3*sigma(I) and ignoring all the strong ones.  This shows, as was 
alluded to earlier in the discussion, that a weak reflection puts a powerful 
constraint on a refinement, especially if there are other stronger reflections 
in the same resolution range.

---
| Simon E.V. Phillips |
---
| Director, Research Complex at Harwell (RCaH)|
| Rutherford Appleton Laboratory  |
| Harwell Science and Innovation Campus   |
| Didcot  |
| Oxon OX11 0FA   |
| United Kingdom  |
| Email: simon.phill...@rc-harwell.ac.uk  |
| Tel:   +44 (0)1235 567701   |
|+44 (0)1235 567700 (sec) |
|+44 (0)7884 436011 (mobile)  |
| www.rc-harwell.ac.uk|
---
| Astbury Centre for Structural Molecular Biology |
| Institute of Molecular and Cellular Biology |
| University of LEEDS |
| LEEDS LS2 9JT   |
| United Kingdom  |
| Email: s.e.v.phill...@leeds.ac.uk   |
| Tel:   +44 (0)113 343 3027  |
| WWW:   http://www.astbury.leeds.ac.uk/People/staffpage.php?StaffID=SEVP |
---



Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Jacob Keller
When will we finally jettison Rsym/Rcryst/Rmerge?

1. Perhaps software developers should either not even calculate the
number, or hide it somewhere obscure, and of course replace it with
a better R flavor?

2. Maybe reviewers should insist on other R's (Rpim etc) instead of Rmerge?

JPK

PS is this as quixotic as chucking the QWERTY keyboard, or using
Esperanto? I don't think so!


On Thu, Mar 3, 2011 at 11:07 AM, Simon Phillips
s.e.v.phill...@leeds.ac.uk wrote:

 I take the point about a tendency in those days to apply sigma cutoffs to
 get lower R values, which were erroneously expected to indicate better
 structures.  I wonder how many of us remember this paper by Arnberg et al
 (1979) Acta Cryst A35, 497-499, where it is shown for (small molecule)
 structures that had been refined with only reflections I3*sigma(I) that the
 models were degraded by leaving out weak data (although the R factors looked
 better of course).

 Arnberg et al took published structures and showed the refined models got
 better when the weak data were included.  The best bit, I think, was when
 they went on to demonstrate successful refinement of a structure using ONLY
 the weak data where I3*sigma(I) and ignoring all the strong ones.  This
 shows, as was alluded to earlier in the discussion, that a weak reflection
 puts a powerful constraint on a refinement, especially if there are other
 stronger reflections in the same resolution range.

 ---
 | Simon E.V. Phillips |
 ---
 | Director, Research Complex at Harwell (RCaH)    |
 | Rutherford Appleton Laboratory  |
 | Harwell Science and Innovation Campus   |
 | Didcot  |
 | Oxon OX11 0FA   |
 | United Kingdom  |
 | Email: simon.phill...@rc-harwell.ac.uk  |
 | Tel:   +44 (0)1235 567701   |
 |    +44 (0)1235 567700 (sec) |
 |    +44 (0)7884 436011 (mobile)  |
 | www.rc-harwell.ac.uk    |
 ---
 | Astbury Centre for Structural Molecular Biology |
 | Institute of Molecular and Cellular Biology |
 | University of LEEDS |
 | LEEDS LS2 9JT   |
 | United Kingdom  |
 | Email: s.e.v.phill...@leeds.ac.uk   |
 | Tel:   +44 (0)113 343 3027  |
 | WWW:   http://www.astbury.leeds.ac.uk/People/staffpage.php?StaffID=SEVP |
 ---



-- 
***
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
cel: 773.608.9185
email: j-kell...@northwestern.edu
***


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Bernhard Rupp (Hofkristallrat a.D.)
First of all I would ask an XDS expert for that, because I don't know exactly
what stats the XDS program reports (shame on me, ok), nor what the quality of
your error model is, or what you want to use the data for (I guess
refinement - see Eleanor's response for that, and use all data).

There is one point I'd like to make re cutoff: If one gets greedy and
collects too much noise in high resolution shells (like way below I/sigI =
0.8 or so), the scaling/integration may suffer from an overabundance of
nonsense data, and here I believe it makes sense to select a higher cutoff
(like what exactly?) and reprocess the data. Maybe one of our data
collection specialists should comment on that.

BR

-Original Message-
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Maia
Cherney
Sent: Thursday, March 03, 2011 9:13 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule

I have to resend my statistics.

Maia Cherney wrote:
 Dear Bernhard

 I am wondering where I should cut my data off. Here is the statistics 
 from XDS processing.

 Maia




 On 11-03-03 04:29 AM, Roberto Battistutta wrote:
  
 Dear all,
 I got a reviewer comment that indicate the need to refine the 
 structures
 
 at an appropriate resolution (I/sigmaI of3.0), and re-submit the 
 revised coordinate files to the PDB for validation.. In the 
 manuscript I present some crystal structures determined by molecular 
 replacement using the same protein in a different space group as 
 search model. Does anyone know the origin or the theoretical basis of 
 this I/sigmaI3.0 rule for an appropriate resolution?
  
 Thanks,
 Bye,
 Roberto.


 Roberto Battistutta
 Associate Professor
 Department of Chemistry
 University of Padua
 via Marzolo 1, 35131 Padova - ITALY
 tel. +39.049.8275265/67
 fax. +39.049.8275239
 roberto.battistu...@unipd.it
 www.chimica.unipd.it/roberto.battistutta/
 VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 
 Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it

 

   




[ccp4bb] [Fwd: Re: [ccp4bb] I/sigmaI of 3.0 rule]

2011-03-03 Thread Maia Cherney



 Original Message 
Subject:Re: [ccp4bb] I/sigmaI of 3.0 rule
Date:   Thu, 03 Mar 2011 10:43:23 -0700
From:   Maia Cherney ch...@ualberta.ca
To: Oganesyan, Vaheh oganesy...@medimmune.com
References: 	2ba9ce2f-c299-4ca9-a36a-99065d1b3...@unipd.it 
4d6faed8.7040...@ualberta.ca 
021001cbd9bc$f0ecc940$d2c65bc0$@gmail.com 
4d6fcab6.3090...@ualberta.ca 4d6fcbff.2010...@ualberta.ca 
73e543de77290c409c9bed6fa4ca34bb0173a...@md1ev002.medimmune.com




Vaheh,

The problem was with Rmerge. As you can see, at I/sigma=2.84 the Rmerge 
(R-factor) was 143%. I am asking this question because B. Rupp wrote:
"However, there is a simple relation between <I/sigI> and R-merge 
(provided no other indecency has been done to the data). It simply is 
(BMC) Rm = 0.8/<I/sigI>."

Maybe my data are indecent? This is the whole LP file.

Maia





 **
XSCALE (VERSION  December 6, 2007)28-Aug-2009
 **

 Author: Wolfgang Kabsch
 Copy licensed until (unlimited) to
  Canadian Light Source, Saskatoon, Canada.
 No redistribution.


 **
  CONTROL CARDS
 **

 MAXIMUM_NUMBER_OF_PROCESSORS=8
 SPACE_GROUP_NUMBER=180
 UNIT_CELL_CONSTANTS= 150.1 150.1  81.8  90.0  90.0 120.0  
 OUTPUT_FILE=XSCALE.HKL
 FRIEDEL'S_LAW=TRUE
 INPUT_FILE= XDS_ASCII.HKL  XDS_ASCII  
 INCLUDE_RESOLUTION_RANGE= 40  2.25

 THE DATA COLLECTION STATISTICS REPORTED BELOW ASSUMES:
 SPACE_GROUP_NUMBER=  180
 UNIT_CELL_CONSTANTS=   150.10   150.1081.80  90.000  90.000 120.000

 * 12 EQUIVALENT POSITIONS IN SPACE GROUP #180 *

If x',y',z' is an equivalent position to x,y,z, then
x'=x*ML(1)+y*ML( 2)+z*ML( 3)+ML( 4)/12.0
y'=x*ML(5)+y*ML( 6)+z*ML( 7)+ML( 8)/12.0
z'=x*ML(9)+y*ML(10)+z*ML(11)+ML(12)/12.0

    #     1  2  3  4     5  6  7  8     9 10 11 12
    1     1  0  0  0     0  1  0  0     0  0  1  0
    2     0 -1  0  0     1 -1  0  0     0  0  1  8
    3    -1  1  0  0    -1  0  0  0     0  0  1  4
    4    -1  0  0  0     0 -1  0  0     0  0  1  0
    5     0  1  0  0    -1  1  0  0     0  0  1  8
    6     1 -1  0  0     1  0  0  0     0  0  1  4
    7     0  1  0  0     1  0  0  0     0  0 -1  8
    8    -1  0  0  0    -1  1  0  0     0  0 -1  4
    9     1 -1  0  0     0 -1  0  0     0  0 -1  0
   10     0 -1  0  0    -1  0  0  0     0  0 -1  8
   11     1  0  0  0     1 -1  0  0     0  0 -1  4
   12    -1  1  0  0     0  1  0  0     0  0 -1  0
 

 ALL DATA SETS WILL BE SCALED TO XDS_ASCII.HKL  
   


 **
READING INPUT REFLECTION DATA FILES
 **


 DATAMEAN   REFLECTIONSINPUT FILE NAME
 SET# INTENSITY  ACCEPTED REJECTED
   1  0.6203E+03   557303  0  XDS_ASCII.HKL 


 **
   CORRECTION FACTORS AS FUNCTION OF IMAGE NUMBER  RESOLUTION
 **

 RECIPROCAL CORRECTION FACTORS FOR INPUT DATA SETS MERGED TO
 OUTPUT FILE: XSCALE.HKL

 THE CALCULATIONS ASSUME FRIEDEL'S_LAW= TRUE
 TOTAL NUMBER OF CORRECTION FACTORS DEFINED  720
 DEGREES OF FREEDOM OF CHI^2 FIT140494.9
 CHI^2-VALUE OF FIT OF CORRECTION FACTORS  1.037
 NUMBER OF CYCLES CARRIED OUT  3

 CORRECTION FACTORS for visual inspection with VIEW DECAY_001.pck   
 INPUT_FILE=XDS_ASCII.HKL 
 XMIN= 0.1 XMAX=   179.9 NXBIN=   36
 YMIN= 0.00257 YMAX= 0.19752 NYBIN=   20
 NUMBER OF REFLECTIONS USED FOR DETERMINING CORRECTION FACTORS 238321


 **
  CORRECTION FACTORS AS FUNCTION OF X (fast)  Y(slow) IN THE DETECTOR PLANE
 **

 RECIPROCAL CORRECTION FACTORS FOR INPUT DATA SETS MERGED TO
 OUTPUT FILE: XSCALE.HKL

 THE CALCULATIONS ASSUME FRIEDEL'S_LAW= TRUE
 TOTAL NUMBER OF CORRECTION FACTORS DEFINED 4760
 DEGREES OF FREEDOM OF CHI^2 FIT186486.8
 CHI^2-VALUE OF FIT OF CORRECTION

Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Roberto Battistutta
Just to clarify: at least in my case my impression is that the editor was 
fair; I was referring only to the comment of one reviewer.

Roberto


Roberto Battistutta
Associate Professor
Department of Chemistry
University of Padua
via Marzolo 1, 35131 Padova - ITALY
tel. +39.049.8275265/67
fax. +39.049.8275239
roberto.battistu...@unipd.it
www.chimica.unipd.it/roberto.battistutta/
VIMM (Venetian Institute of Molecular Medicine)
via Orus 2, 35129 Padova - ITALY
tel. +39.049.7923236
fax +39.049.7923250
www.vimm.it

Il giorno 03/mar/2011, alle ore 18.16, Van Den Berg, Bert ha scritto:

 We should compile this discussion and send it as compulsory reading to 
 journal editors... ;-)
 
 Bert


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Maia Cherney
I see there is no consensus about my data. Some people say 2.4 A, others 
say use all of it. Well, I chose 2.3 A. My rule was to stay a little bit 
below Rmerge 100%; at 2.3 A, Rmerge was 98.7%.
Actually, I have published my paper in JMB. Yes, the reviewers did not like 
that and even made me give Rrim and Rpim etc.


Maia



Bernhard Rupp (Hofkristallrat a.D.) wrote:

First of all I would ask an XDS expert for that, because I don't know exactly
what stats the XDS program reports (shame on me, ok), nor what the quality of
your error model is, or what you want to use the data for (I guess
refinement - see Eleanor's response for that, and use all data).

There is one point I'd like to make re cutoff: if one gets greedy and
collects too much noise in the high resolution shells (like way below I/sigI =
0.8 or so), the scaling/integration may suffer from an overabundance of
nonsense data, and here I believe it makes sense to select a higher cutoff
(like what exactly?) and reprocess the data. Maybe one of our data
collection specialists should comment on that.

BR

-Original Message-
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Maia
Cherney
Sent: Thursday, March 03, 2011 9:13 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] I/sigmaI of 3.0 rule

I have to resend my statistics.

Maia Cherney wrote:

Dear Bernhard,

I am wondering where I should cut my data off. Here are the statistics 
from XDS processing.

Maia




On 11-03-03 04:29 AM, Roberto Battistutta wrote:

Dear all,
I got a reviewer comment that indicates the need to refine the structures 
"at an appropriate resolution (I/sigmaI of >3.0), and re-submit the 
revised coordinate files to the PDB for validation." In the manuscript 
I present some crystal structures determined by molecular replacement 
using the same protein in a different space group as search model. 
Does anyone know the origin or the theoretical basis of this 
I/sigmaI >3.0 rule for an appropriate resolution?

Thanks,
Bye,
Roberto.


Roberto Battistutta
Associate Professor
Department of Chemistry
University of Padua
via Marzolo 1, 35131 Padova - ITALY
tel. +39.049.8275265/67
fax. +39.049.8275239
roberto.battistu...@unipd.it
www.chimica.unipd.it/roberto.battistutta/
VIMM (Venetian Institute of Molecular Medicine) via Orus 2, 35129 
Padova - ITALY tel. +39.049.7923236 fax +39.049.7923250 www.vimm.it




  
  




  


Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Maksymilian Chruszcz
Dear All,

Relatively recent statistics on I/sigmaI and Rmerge in PDB deposits are
presented in two following publications:

1.Benefits of structural genomics for drug discovery research.
Grabowski M, Chruszcz M, Zimmerman MD, Kirillova O, Minor W.
Infect Disord Drug Targets. 2009 Nov;9(5):459-74.
PMID: 19594422

2. X-ray diffraction experiment-the last experiment in the structure
elucidation process.
Chruszcz M, Borek D, Domagalski M, Otwinowski Z, Minor W.
Adv Protein Chem Struct Biol. 2009;77:23-40
PMID: 20663480

Best regards,

Maks

[attachment: I_over_sigma_I.png]

Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Tim Gruene
Hello Maia,

Rmerge is obsolete, so the reviewers had a good point in making you publish 
Rmeas instead. Rmeas should replace Rmerge in my opinion.

The data statistics you sent show a multiplicity of about 20! Did you check 
your data for radiation damage? That might explain why your Rmeas is so 
utterly high while your I/sigI is still above 2. (You should not cut your 
data but include more!)

What do the statistics look like if you process just about enough frames to 
get a reasonable multiplicity, say 3-4?

Cheers, Tim
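
For readers following the Rmerge/Rmeas/Rpim exchange: the commonly used 
definitions differ only in a multiplicity-dependent weight on each unique 
reflection's deviations. The sketch below is illustrative Python (assuming 
unmerged intensities grouped by unique reflection; it is not code from any 
of the programs discussed):

from collections import defaultdict
from math import sqrt

def merging_r_values(observations):
    """observations: iterable of (hkl, intensity) tuples for unmerged data.

    Returns (Rmerge, Rmeas, Rpim) using the commonly quoted definitions:
      Rmerge = sum_hkl                sum_i |I_i - <I>| / sum_hkl sum_i I_i
      Rmeas  = sum_hkl sqrt(N/(N-1))  sum_i |I_i - <I>| / sum_hkl sum_i I_i
      Rpim   = sum_hkl sqrt(1/(N-1))  sum_i |I_i - <I>| / sum_hkl sum_i I_i
    where N is the multiplicity of each unique reflection; singly measured
    reflections contribute only to the denominator."""
    groups = defaultdict(list)
    for hkl, i_obs in observations:
        groups[hkl].append(i_obs)
    num_merge = num_meas = num_pim = denom = 0.0
    for intensities in groups.values():
        n = len(intensities)
        mean_i = sum(intensities) / n
        denom += sum(intensities)
        if n < 2:
            continue
        dev = sum(abs(i - mean_i) for i in intensities)
        num_merge += dev
        num_meas += sqrt(n / (n - 1.0)) * dev
        num_pim += sqrt(1.0 / (n - 1.0)) * dev
    return num_merge / denom, num_meas / denom, num_pim / denom

# Toy example (made-up intensities): three measurements of one reflection,
# two of another; returns roughly (0.059, 0.075, 0.045).
print(merging_r_values([((1, 2, 3), 100.0), ((1, 2, 3), 110.0), ((1, 2, 3), 90.0),
                        ((2, 0, 0), 50.0), ((2, 0, 0), 54.0)]))

This makes Tim's point explicit: Rmerge carries no multiplicity correction 
at all, Rmeas weights each reflection by sqrt(N/(N-1)), and Rpim by 
sqrt(1/(N-1)).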


-- 
--
Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

phone: +49 (0)551 39 22149

GPG Key ID = A46BEE1A





Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Bernhard Rupp (Hofkristallrat a.D.)
 The data statistics you sent show a multiplicity of about 20! Did you
check your data for radiation damage? That might explain why your Rmeas is
so utterly high while your I/sigI is still above 2. (You should not cut your
data but include more!)

So then I got that wrong - with that *high* a redundancy, the preceding term
becomes ~1, and the linear Rmerge and Rmeas asymptotically become the same?

BR 
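
Numerically, yes: under the definitions sketched earlier, the factor that 
separates Rmeas from Rmerge for a reflection measured N times is 
sqrt(N/(N-1)), which is 1.41 at N = 2 but only about 1.03 at N = 20, so at 
that redundancy the two statistics are essentially the same. A short check 
(illustrative only):

from math import sqrt
for n in (2, 3, 4, 6, 10, 20, 40):
    # ratio Rmeas/Rmerge if every reflection were measured exactly n times
    print(n, round(sqrt(n / (n - 1.0)), 3))
# prints 1.414, 1.225, 1.155, 1.095, 1.054, 1.026, 1.013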



Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Bernhard Rupp (Hofkristallrat a.D.)
 Rmeas is always higher than Rmerge, so if my Rmerge is high I don't like
Rmeas either.

But that makes perfect sense now, per Tim: the linear Rmerge always gives 
lower values for small N (lower redundancy) and rises with redundancy to 
approach Rmeas/Rrim at high redundancy.

 I like the idea just to look at the I/sigI and include more data.

Lucky me to have suggested using all your present data for refinement... ;-)

BR



Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Bernhard Rupp (Hofkristallrat a.D.)
 I don't like Rmeas either.

Given the Angst caused by actually useful redundancy, would it not be more
reasonable, then, to report Rpim, which decreases with redundancy? Maybe Rpim
in an additional column would help to reduce the Angst?

BR  
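
For comparison, relative to the Rmerge numerator the Rmeas weight is 
sqrt(N/(N-1)) while the Rpim weight is sqrt(1/(N-1)), so Rpim does fall as 
redundancy grows. A short illustrative check under the same standard 
definitions as above:

from math import sqrt
for n in (2, 4, 10, 20):
    print("N=%2d  Rmeas factor %.3f  Rpim factor %.3f"
          % (n, sqrt(n / (n - 1.0)), sqrt(1.0 / (n - 1.0))))
# At N=20, Rmeas is barely above Rmerge, while Rpim is about 4.4 times
# smaller, so an extra Rpim column rewards high multiplicity rather than
# punishing it.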



Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Bart Hazes
Higher redundancy lowers Rpim because it increases precision. However, 
it need not increase accuracy if the observations are not drawn from the 
true distribution. If pathological behaviour of R-factor statistics is 
due to radiation damage, as I believe is often the case, we are 
combining observations that are no longer equivalent. If you used long 
exposures per image and collected just enough data for a complete data 
set, you are out of luck. If you used shorter exposures and opted for a 
high-redundancy set, then you have the option to toss out the last N 
images to get rid of the most damaged data, or you can try to compensate 
for the damage with zerodose (or whatever the name of the program was; 
I think it is from Wolfgang Kabsch).

Rejecting data is never desirable, but I think it may be better than 
merging non-equivalent data that cannot be properly modeled by a single 
structure.


Bart
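
Bart's precision-versus-accuracy point can be made concrete with a toy 
simulation: if intensities decay with dose, a high-multiplicity merge still 
looks precise (small Rpim) while its merged mean drifts away from the true 
value. The sketch below is purely illustrative; the decay model and all 
numbers are invented, not taken from any real data set:

import random
from math import sqrt

random.seed(0)
I_TRUE, SIGMA, DECAY = 1000.0, 50.0, 0.02   # invented values for illustration

def simulate(n_obs):
    """n_obs measurements of one reflection, each later image more damaged."""
    obs = [random.gauss(I_TRUE * (1.0 - DECAY * k), SIGMA) for k in range(n_obs)]
    mean = sum(obs) / n_obs
    dev = sum(abs(i - mean) for i in obs)
    r_pim = sqrt(1.0 / (n_obs - 1.0)) * dev / sum(obs)  # per-reflection Rpim
    bias = (mean - I_TRUE) / I_TRUE
    return r_pim, bias

for n in (4, 20):
    r_pim, bias = simulate(n)
    print("N=%2d  Rpim=%.3f  bias of merged I = %+.1f%%" % (n, r_pim, 100 * bias))
# With N=20 the precision statistic stays small, but the merged intensity is
# systematically low (around -19% with this decay model), because the later,
# damaged observations are averaged in: precision without accuracy.

The point is not the exact numbers but the pattern: the merging statistics 
reward the extra measurements even though the merged intensities are 
systematically biased by the damage.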

On 11-03-03 12:34 PM, Bernhard Rupp (Hofkristallrat a.D.) wrote:

I don't like Rmeas either.

Given the Angst caused by actually useful redundancy, would it not be more
reasonable then to report Rpim which decreases with redundancy? Maybe Rpim
in an additional column would help to reduce the Angst?

BR





--



Bart Hazes (Associate Professor)
Dept. of Medical Microbiology & Immunology
University of Alberta
1-15 Medical Sciences Building
Edmonton, Alberta
Canada, T6G 2H7
phone:  1-780-492-0042
fax:1-780-492-7521




Re: [ccp4bb] I/sigmaI of 3.0 rule

2011-03-03 Thread Ingo P. Korndoerfer
Not sure whether this option has been mentioned before ...

I think what we really would like to do is decide by the quality of the
density. I see that this is difficult.

So, short of that ... how about the figure of merit in refinement?

Wouldn't the FOM reflect how useful our data really are?

Ingo


On 03/03/2011 12:29, Roberto Battistutta wrote:
 Dear all,
 I got a reviewer comment that indicate the need to refine the structures at 
 an appropriate resolution (I/sigmaI of 3.0), and re-submit the revised 
 coordinate files to the PDB for validation.. In the manuscript I present 
 some crystal structures determined by molecular replacement using the same 
 protein in a different space group as search model. Does anyone know the 
 origin or the theoretical basis of this I/sigmaI 3.0 rule for an 
 appropriate resolution?
 Thanks,
 Bye,
 Roberto.


 Roberto Battistutta
 Associate Professor
 Department of Chemistry
 University of Padua
 via Marzolo 1, 35131 Padova - ITALY
 tel. +39.049.8275265/67
 fax. +39.049.8275239
 roberto.battistu...@unipd.it
 www.chimica.unipd.it/roberto.battistutta/
 VIMM (Venetian Institute of Molecular Medicine)
 via Orus 2, 35129 Padova - ITALY
 tel. +39.049.7923236
 fax +39.049.7923250
 www.vimm.it