Re: [ccp4bb] Resolution, R factors and data quality

2013-09-02 Thread Ian Tickle
On 1 September 2013 11:31, Frank von Delft frank.vonde...@sgc.ox.ac.uk wrote:


 2.
 I'm struck by how small the improvements in R/Rfree are in Diederichs &
 Karplus (ActaD 2013, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3689524/);
 the authors don't discuss it, but what's current thinking on how to
 estimate the expected variation in R/Rfree - does the Tickle formalism
 (1998) still apply for ML with very weak data?


Frank, another point just occurred to me: the main reason for using Rfree
as a model selection criterion is to detect overfitting in cases where
you're comparing models with different numbers of parameters.  That doesn't
apply here since you're comparing the same model.  In that case you would
be much better off comparing Rwork since it has a much lower variance than
Rfree (in fact lower by a factor of 19 if you use the usual 5% of
reflections for the test set).
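
A back-of-envelope reading of that factor of 19, assuming only that the
sampling variance of an R factor scales roughly as 1/N with the number N of
reflections it is computed from:

    \mathrm{Var}(R) \propto \frac{1}{N}
    \quad\Longrightarrow\quad
    \frac{\mathrm{Var}(R_{\mathrm{free}})}{\mathrm{Var}(R_{\mathrm{work}})}
    \approx \frac{N_{\mathrm{work}}}{N_{\mathrm{free}}} = \frac{0.95}{0.05} = 19,

so the sampling standard deviation of Rfree is about sqrt(19) ~ 4.4 times
that of Rwork.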

Cheers

-- Ian


Re: [ccp4bb] Resolution, R factors and data quality

2013-09-02 Thread Robbie Joosten
Hi Frank and Ian,

We struggled with the small changes in free R-factors when we were implementing
paired refinement for resolution cut-offs in PDB_REDO. It's not just the
lack of a proper test of significance for (weighted) R-factor changes, it's
also a more philosophical problem. When should you reject a higher
resolution cut-off? 
a) When it gives significantly higher R-factors (lenient)
b) When it gives numerically higher R-factors (less lenient, but takes away
the need for a significance test)
c) When it does not give significantly lower R-factors (very strict; if I
take X*sigma(R-free) as a cut-off, with X > 1.0, in most cases I should
reject the higher cut-off).

PDB_REDO uses b), similar to Karplus and Diederichs in their Science paper.

Then the next question is which metric are you going to use? R-free,
weighted R-free, free log likelihood and CCfree are all written out by
Refmac. At least the latter two have proper significance tests (likelihood
ratios and transformation Z-scores respectively). Note that we use different
models, constructed with different (but very much overlapping) data, but the
metrics are calculated with the same data. The different metrics do not
necessarily move in the same direction when moving to a higher resolution.
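
For reference, the Z-score transformation referred to for CCfree is presumably
Fisher's z: for a correlation r estimated from n reflection pairs,

    z = \tfrac{1}{2}\,\ln\frac{1 + r}{1 - r}, \qquad
    \sigma_z \approx \frac{1}{\sqrt{n - 3}}, \qquad
    Z = \frac{z_1 - z_2}{\sqrt{1/(n_1 - 3) + 1/(n_2 - 3)}},

which is strictly valid only for independent samples, so it is at best
approximate when the test sets largely overlap, as noted above.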

We ended up using all 4 in PDB_REDO. By default a higher resolution cut-off
is rejected if more than 1 metric gets (numerically) worse, but this can be
changed by the user.
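
As a toy illustration of that default rule (not PDB_REDO's actual code; the
metric names, sign conventions and numbers below are made up for the example):

    # Toy illustration of the rule described above: reject the higher-resolution
    # cut-off if more than one of the four metrics gets numerically worse.
    def reject_higher_cutoff(current, higher, max_worse=1):
        lower_is_better = ('rfree', 'weighted_rfree')    # R-factors should go down
        higher_is_better = ('ll_free', 'cc_free')        # LLfree and CCfree should go up
        worse = sum(higher[k] > current[k] for k in lower_is_better)
        worse += sum(higher[k] < current[k] for k in higher_is_better)
        return worse > max_worse

    # Hypothetical numbers, for illustration only:
    at_lower = {'rfree': 0.231, 'weighted_rfree': 0.215, 'll_free': -15321.0, 'cc_free': 0.923}
    at_higher = {'rfree': 0.233, 'weighted_rfree': 0.214, 'll_free': -15310.0, 'cc_free': 0.925}
    print(reject_higher_cutoff(at_lower, at_higher))     # False: only Rfree got worse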

Next question is the size of the resolution steps. How big should those be
and how should they be set up? Karplus and Diederichs used equal steps in
Angstrom, PDB_REDO uses equal steps in number of reflections. That way you
add the same amount of data (but not usable information) with each step.
Anyway, a different choice of steps will give a different final resolution
cut-off. And the exact cut-off doesn't matter that much (see Evans and
Murshudov). Different (versions of) refinement programs will probably also
give somewhat different results. 
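
As a minimal sketch of what equal steps in number of reflections could look
like in practice (toy code of mine, not PDB_REDO's; the synthetic d-spacing
distribution below is only an assumption):

    import numpy as np

    def resolution_steps(d_spacings, n_steps):
        """High-resolution limit reached after adding each equal-count step of data."""
        d = np.sort(np.asarray(d_spacings))[::-1]   # descending: lowest resolution first
        chunks = np.array_split(d, n_steps)         # chunks with (nearly) equal counts
        return [float(c[-1]) for c in chunks]       # d_min after adding each step

    # Toy example: reflections between a 2.0 A cut-off and 1.7 A, with counts
    # growing roughly as 1/d^3 as expected for a 3D reciprocal lattice.
    rng = np.random.default_rng(0)
    d = rng.uniform(1 / 2.0**3, 1 / 1.7**3, 20000) ** (-1 / 3)
    print(resolution_steps(d, 4))   # roughly [1.91, 1.83, 1.76, 1.70]: the steps shrink in Angstrom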

We tested our implementation on a number of structures in the PDB with data
extending to higher resolution than marked in the PDB file and we observed
that quite a lot had very conservative resolution cut-offs. In some cases we
could use so much extra data that we could move to a more complex B-factor
model and seriously improve R-factors.

The best resolution cut-off is unclear and may change over time with
improving methods. So whatever you choose, please deposit all the data that
you can get even if you don't use it yourself. I think that the Karplus and
Diederichs papers show us that you should at least realize that your
resolution cut-off is a methodological choice that you should describe and
should be able to defend if somebody asks you why you made that particular
choice.

Cheers,
Robbie



Re: [ccp4bb] Resolution, R factors and data quality

2013-09-01 Thread Frank von Delft

A bit late to this thread.

1.
Juergen:   Jim was not actually adopting CC*, he was asking how to make 
practical use of it when faced with actual datasets fading into noise.  
If I understand correctly from later responses, paired refinement is 
what KD suggest should be best practice?


2.
I'm struck by how small the improvements in R/Rfree are in Diederichs &
Karplus (ActaD 2013, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3689524/);
the authors
don't discuss it, but what's current thinking on how to estimate the 
expected variation in R/Rfree - does the Tickle formalism (1998) still 
apply for ML with very weak data?


I'm puzzled by Table 4 (and discussion):  do I read correctly that 
discarding negative unique reflections led to higher CCwork/CCfree?  
Wasn't the point of the paper that massaging data always shows up in 
worse refinement stats?  Is this a corner case, and how would one know?


Cheers
phx












Re: [ccp4bb] Resolution, R factors and data quality

2013-09-01 Thread Ian Tickle
On 1 September 2013 11:31, Frank von Delft frank.vonde...@sgc.ox.ac.uk wrote:


 2.
 I'm struck by how small the improvements in R/Rfree are in Diederichs &
 Karplus (ActaD 2013, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3689524/);
 the authors don't discuss it, but what's current thinking on how to
 estimate the expected variation in R/Rfree - does the Tickle formalism
 (1998) still apply for ML with very weak data?


Frank, our paper is still relevant, unfortunately just not to the question
you're trying to answer!  We were trying to answer 2 questions: 1) what
value of Rfree would you expect to get if the structure were free of
systematic error and only random errors were present, so that could be used
as a baseline (assuming a fixed cross-validation test set) to identify
models with gross (e.g. chain-tracing) errors; and 2) how much would you
expect Rfree to vary assuming a fixed starting model but with a different
random sampling of the test set (i.e. the sampling standard deviation).
The latter is relevant if say you want to compare the same structure (at
the same resolution obviously) done independently in 2 labs, since it tells
you how big the difference in Rfree for an arbitrary choice of test set
needs to be before you can claim that it's statistically significant.

In this case the questions are different because you're certainly not
comparing different models using the same test set, neither I suspect are
you comparing the same model with different randomly selected test sets.  I
assume in this case that the test sets for different resolution cut-offs
are highly correlated, which I suspect makes it quite difficult to say what
is a significant difference in Rfree (I have not attempted to do the
algebra!).

Rfree is one of a number of model selection criteria (see
http://en.wikipedia.org/wiki/Model_selection#Criteria_for_model_selection)
whose purpose is to provide a metric for comparison of different models
given specific data, i.e. as for the likelihood function they all take the
form f(model | data), so in all cases you're varying the model with fixed
data.  Its use in the form f(data | model), i.e. where you're varying the
data with a fixed model I would say is somewhat questionable and certainly
requires careful analysis to determine whether the results are
statistically significant.  Even assuming we can argue our way around the
inappropriate application of model selection methodology to a different
problem, unfortunately Rfree is far from an ideal criterion in this
respect; a better one would surely be the free log-likelihood as originally
proposed by Gerard Bricogne.

Cheers

-- Ian


Re: [ccp4bb] Resolution, R factors and data quality

2013-08-29 Thread Robbie Joosten
Hi Bernhard,

snip
 But the real objective is – where do data stop making an improvement to the
 model. The categorical statement that all data is good
 
 is simply not true in practice. It is probably specific to each data set &
 refinement, and as long as we do not always run paired refinement ala KD
 
 or similar in order to find out where that point is, the yearning for a simple
 number will not stop (although I believe automation will make the KD
 approach or similar eventually routine).

For what it is worth: This is already implemented in PDB_REDO.

Cheers,
Robbie


Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Bernhard Rupp
Based on the simulations I've done the data should be cut at CC1/2 = 0. 
Seriously. Problem is figuring out where it hits zero. 

 

But the real objective is – where do data stop making an improvement to the 
model. The categorical statement that all data is good

is simply not true in practice. It is probably specific to each data set &
refinement, and as long as we do not always run paired refinement ala KD

or similar in order to find out where that point is, the yearning for a simple 
number will not stop (although I believe automation will make the KD approach 
or similar eventually routine). 

 

As for the resolution of the structure I'd say call that where |Fo-Fc| 
(error in the map) becomes comparable to Sigma(Fo). This is I/Sigma = 2.5 if 
Rcryst is 20%.  That is: |Fo-Fc| / Fo = 0.2, which implies |Io-Ic|/Io = 0.4 or 
Io/|Io-Ic| = Io/sigma(Io) = 2.5.

 

Makes sense to me...

 

As long as it is understood that this ‘model resolution value’ derived via your 
argument from I/sigI is not the same as an I/sigI data cutoff (and that Rcryst 
and Rmerge have nothing in common)….

 

-James Holton

MAD Scientist

 

Best, BR

 

 





Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Phil Evans
We don't currently have a really good measure of that point where adding the 
extra shell of data adds significant information (whatever that means). 
However, my rough trials (see http://www.ncbi.nlm.nih.gov/pubmed/23793146) 
suggested that the exact cutoff point was not very critical, presumably as the 
information content fades out slowly, so it probably isn't something to 
agonise over too much. K & D's paired refinement may be useful though.

I would again caution against looking too hard at CC* rather than CC1/2: they 
are exactly equivalent, but CC* changes very rapidly at small values, which may 
be misleading. The purpose of CC* is for comparison with CCcryst (i.e. Fo to 
Fc).
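
For reference, the relation behind that equivalence (Karplus & Diederichs,
Science 2012) is

    CC^{*} = \sqrt{\frac{2\,CC_{1/2}}{1 + CC_{1/2}}},

so for small CC1/2, CC* ~ sqrt(2 CC1/2): the slope diverges as CC1/2
approaches 0, which is why small differences near the cut-off look
exaggerated on the CC* scale.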

I would remind any users of Scala who want to look back at old log files to see 
the statistics for the outer shell at the cutoff they used, that CC1/2 has been 
calculated in Scala for many years under the name CC_IMEAN. It's now called 
CC1/2 in Aimless (and Scala) following Kay's excellent suggestion.

Phil




Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Arka Chakraborty
Hi all,
 If I am not wrong, the Karplus & Diederichs paper suggests that data is
generally meaningful up to a CC1/2 value of 0.20, but they suggest a paired
refinement technique (pretty easy to perform) to actually decide on the
resolution at which to cut the data. This will be the most prudent thing to
do, I guess, rather than following any arbitrary value, as each data set is
different. But the fact remains that even where I/sigma(I) falls to 0.5,
useful information remains which will improve the quality of the maps, and
when discarded just leads us a bit further away from the truth. However, as
always, Dr. Diederichs and Dr. Karplus will be the best persons to comment on
that (as they have already done in the paper :) )

best,

Arka Chakraborty

p.s. Aimless seems to suggest a resolution limit based on a CC1/2 = 0.5
criterion (which I guess is done to be on the safe side - Dr. Phil Evans
can explain if there are other reasons, or an entirely different one!).
But if we want to squeeze the most from our data set, I guess we need to
push a bit further sometimes :)




-- 
Arka Chakraborty
ibmb (Institut de Biologia Molecular de Barcelona)
BARCELONA, SPAIN


Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Phil Evans
Aimless does indeed calculate the point at which CC1/2 falls below 0.5, but I 
would not necessarily suggest that as the best cutoff point. Personally I 
would also look at I/sigI, anisotropy and completeness, but as I said, at that 
point I don't think it makes a huge difference.

Phil



Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Bernhard Rupp
 We don't currently have a really good measure of that point where adding
the extra shell of data adds significant information 
  so it probably isn't something to agonise over too much. K & D's paired
refinement may be useful though.

That seems to be a correct assessment of the situation and a forceful
argument to eliminate the review nonsense of nitpicking on I/sigI values,
associated R-merges, and other pseudo-statistics once and for all. We can
now, thanks to data deposition, at any time generate or download the maps
and the models and judge for ourselves even minute details of local model
quality from there.
As far as use and interpretation goes, where the model meets the map is
where the rubber meets the road.
I therefore make the heretical statement that the entire Table 1 of data
collection statistics, justifiable in pre-deposition times as some means to
guess structure quality, can go the way of X-ray film and be almost always
eliminated from papers.
There is nothing really useful in Table 1, and all its data items and more
are in the PDB header anyhow.
Availability of maps for review and for users is the key point.
Cheers, BR


Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Bosch, Juergen
What a statement!
Give reviewers maps - I agree. However, what if the reviewer has no clue about 
these things we call structures? I think for those people Table 1 might still 
provide some justification. I would argue it should go into the supplement at 
least.

Jürgen 

Sent from my iPad



Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Pavel Afonine
Hi,

a random thought: the data resolution, d_min_actual, can be thought of as
the value that maximizes the correlation (*) between the synthesis calculated
using your data and an equivalent Fmodel synthesis calculated using the
complete set of Miller indices in the d_min_actual-inf resolution range, where
d_min <= d_min_actual and d_min is the highest resolution of the data set in
question. Makes sense to me..

(*) or any other more appropriate similarity measure: usual map CC may not
be the best one in this context.

Pavel




Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Bernhard Rupp
 what if the reviewer has no clue of these things we call structures ? I think 
 for those people table 1 might still provide some justification.

Someone who knows little about structures probably won’t appreciate the 
technical details in Table 1 either.



Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Stefan Gajewski
Jim,

This is coming from someone who just got enlightened a few weeks ago on 
resolution cut-offs.

I am asked often:  What value of CC1/2 should I cut my resolution at? 

The KD paper mentioned that the CC(1/2) criterion loses its significance at ~9 
according to a Student's t-test.
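
For reference, the standard significance test for a correlation coefficient r
computed from n half-dataset pairs (presumably the test meant here) is

    t = r\,\sqrt{\frac{n - 2}{1 - r^{2}}},

compared against Student's t with n-2 degrees of freedom, so how small a
CC(1/2) remains significant depends strongly on the number of reflections in
the shell.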

I doubt that this can be a generally true guideline for a resolution cut-off. 
The structures I am doing right now were cut off at ~20 to ~80 CC(1/2).

You probably do not want to make the same mistake again that we all made before 
when cutting resolution based on Rmerge/Rmeas, do you?


 What should I tell my students?  I've got a course coming up and I am sure 
 they will ask me again.

This is actually the more valuable insight I got from the KD paper. You don't 
use the CC(1/2) as an absolute indicator but rather as a suggestion. The 
resolution limit is determined by the refinement, not by the data processing.

I think I will handle my data in future as follows:

Bins with CC(1/2) less than 9 should be initially excluded.

The structure is then refined against all reflections in the file and only 
those bins that add information to the map/structure are kept in the final 
rounds. In most cases this will probably be more than CC(1/2) = 25. If the last 
shell (CC ~9) still adds information to the model, process the images again, 
e.g. until CC(1/2) drops to 0, and see if some more useful information is in 
there. You could also go ahead and use CC(1/2) = 0 as the initial cut-off, but I 
think that will rather increase computation time than help your structure in 
most cases.


So yes, I would feel comfortable with giving true resolution limits based on 
the refinement of the model, and not based on any number derived from data 
processing. In the end, you can always say "I tried it and this was the 
highest resolution I could model" vs. "I cut at numerical value X of this 
parameter because everybody else does so."


Re: [ccp4bb] Resolution, R factors and data quality

2013-08-27 Thread Pavel Afonine
Excellent point about R-factors. Indeed, at this resolution they should be
quite a bit lower than what you have. Did you:
- model solvent?
- use anisotropic ADPs?
- add H (this alone can drop R by 1-2%)?
- model alternative conformations?
- check how the R-factors (Rwork) behave as a function of resolution?
Pavel




Re: [ccp4bb] Resolution, R factors and data quality

2013-08-27 Thread Bernhard Rupp
Maybe a few remarks might help:

 

Ad a) "R merge of 80% may be OK if I/sig for high res shell is 2."

What rationale is that statement based upon, and what is the exact meaning of
this statement?

Is an Rmerge of 80% not OK when I/sigI is, say, 1.5? Or would 80% be OK if
the I/sigI is 3.0?

Why should an Rmerge of 80% be (too) high in the first place?

 

b) there is no statistical justification whatsoever for the I/sigI cutoff
of 2 for refinement. This has been discussed @CCP4bb multiple times, for
good reason. 

In this particular case, the (in)completeness appears to be the dominating
factor. 

 

c) as Pavel notes, the R-value improvement means nil when truncating data -
try to refine from 8 to 2 A and the Rs might be even lower (an abuse we engaged
in ages ago, when we did not know better and had no ML)

 

d) absolute values of refinement Rs vs (historic) expectation values cannot
be judged without complete and detailed knowledge of the refinement
protocol. 

 

The ultimate question is whether your model improves with inclusion of more
data or not. Kay Diederichs has a few papers to this effect that make good
reading. 

And CC1/2 seems to provide statistically justifiable limits for cut-off of
(reasonably complete) high resolution shells.

 

LG, BR

 




Re: [ccp4bb] Resolution, R factors and data quality

2013-08-27 Thread Phil Evans
The question you should ask yourself is why would omitting data improve my 
model? 

Phil


Re: [ccp4bb] Resolution, R factors and data quality

2013-08-27 Thread Arka Chakraborty
Hi all,
does this not again bring up the still prevailing adherence to R factors
rather than a shift to correlation coefficients (CC1/2 and CC*), as Dr.
Phil Evans has indicated?
The way we look at data quality (by "we" I mean the end users) needs to
be altered, I guess.

best,

Arka Chakraborty




-- 
Arka Chakraborty
ibmb (Institut de Biologia Molecular de Barcelona)
BARCELONA, SPAIN


Re: [ccp4bb] Resolution, R factors and data quality

2013-08-27 Thread Jim Pflugrath
I have to ask flamingly: So what about CC1/2 and CC*?

Did we not replace an arbitrary resolution cut-off based on a value of Rmerge 
with an arbitrary resolution cut-off based on a value of Rmeas already?  And 
now we are going to replace that with an arbitrary resolution cut-off based on 
a value of CC* or is it CC1/2?

I am asked often:  What value of CC1/2 should I cut my resolution at?  What 
should I tell my students?  I've got a course coming up and I am sure they will 
ask me again.

Jim




Re: [ccp4bb] Resolution, R factors and data quality

2013-08-27 Thread Bosch, Juergen
Hi Jim,

all data is good data - the more data you have the better (that's what they say 
anyhow)

Not everybody is adopting the Karplus & Diederichs paper as quickly as you do. 
And not to be confused with the Diederichs and Karplus paper :-)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3689524/
http://www.ncbi.nlm.nih.gov/pubmed/22628654

My models get better by including the data I had been omitting before, that's 
all that counts for me.

Jürgen

P.S. reminds me somehow of those guys collecting more and more data - PRISM 
greetings


..
Jürgen Bosch
Johns Hopkins University
Bloomberg School of Public Health
Department of Biochemistry  Molecular Biology
Johns Hopkins Malaria Research Institute
615 North Wolfe Street, W8708
Baltimore, MD 21205
Office: +1-410-614-4742
Lab:  +1-410-614-4894
Fax:  +1-410-955-2926
http://lupo.jhsph.edu






Re: [ccp4bb] Resolution, R factors and data quality

2013-08-27 Thread James M Holton
Based on the simulations I've done the data should be cut at CC1/2 = 0. 
Seriously. Problem is figuring out where it hits zero. 

Alternately, if French & Wilson can be modified so the Wilson plot is always 
straight, then the data don't need to be cut at all. 

As for the resolution of the structure I'd say call that where |Fo-Fc| (error 
in the map) becomes comparable to Sigma(Fo). This is I/Sigma = 2.5 if Rcryst is 
20%.  That is: |Fo-Fc| / Fo = 0.2, which implies |Io-Ic|/Io = 0.4 or Io/|Io-Ic| 
= Io/sigma(Io) = 2.5.
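
Spelled out, the arithmetic is (assuming only first-order error propagation,
I proportional to F^2, so dI/I ~ 2 dF/F):

    \frac{|I_o - I_c|}{I_o} \approx 2\,\frac{|F_o - F_c|}{F_o} \approx 2 \times 0.2 = 0.4,
    \qquad
    \frac{I_o}{\sigma(I_o)} \approx \frac{I_o}{|I_o - I_c|} \approx \frac{1}{0.4} = 2.5,

where the last step uses the stated premise that |Fo-Fc| (the map error) is
comparable to sigma(Fo).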

Makes sense to me...

-James Holton
MAD Scientist



[ccp4bb] Resolution, R factors and data quality

2013-08-26 Thread Emily Golden
Hi All,

I have collected diffraction images to 1 Angstrom resolution to the edge of
the detector and 0.9 A to the corner.  I collected two sets, one for low
resolution reflections and one for high resolution reflections.
I get 100% completeness above 1 A and 41% completeness in the 0.9 A-0.95 A
shell.

However, my Rmerge in the highest shell is not good, ~80%.

The Rfree is 0.17 and Rwork is 0.16 but the maps look very good.  If I cut
the data to 1 Angstrom the R factors improve, but I feel the maps are not as
good and I'm not sure if I can justify cutting data.

So my question is, should I cut the data to 1 Angstrom or should I keep the
data I have?

Also, when taking geometric restraints off during refinement the R factors
improve marginally; am I justified in doing this at this resolution?

Thank you,

Emily


Re: [ccp4bb] Resolution, R factors and data quality

2013-08-26 Thread Pavel Afonine
Hi Emily,


I get 100% completeness above 1A and 41% completeness in the 0.9A-0.95A
 shell.

 However, my Rmerge in the highest shelll is not good, ~80%.

 The Rfree is 0.17 and Rwork is 0.16 but the maps look very good.   If I
 cut the data to 1 Angstrom the R factors improve but I feel the maps are
 not as good and I'm not sure if I can justify cutting data.



You can't compare R-factors calculated using different sets of reflections.

"Maps get worse"? Could it be that when you use the full resolution range you get
59% of missing reflections in the highest resolution shell filled in with DFc
for the purpose of map calculation?


 Also, taking geometric restraints off during refinement the Rfactors
 improve marginally, am I justified in doing this at this resolution?



It's unlikely you can refine without restraints at this resolution.
Perhaps without restraints the model is still ok overall, but I would bet
there are places that get badly distorted, so have a closer look at your
model quality locally (alternative conformations, mobile loops, etc).

Pavel


Re: [ccp4bb] Resolution, R factors and data quality

2013-08-26 Thread Emily Golden
Thanks Yuriy and Pavel,

"at this resolution one would expect R/Rfree to be ~ 10-11%/12-13% assuming
you applied anisotropic B-factor refinement (and probably having a low
symmetry SG).
R merge of 80% may be OK if I/sig for high res shell is 2."

Yes, I used anisotropic Bfactors and the space group is P1 21 1.  However,
the I/sig is only 1.5 in the highest shell.   Cutting the data such that
the I/sig is 2 has improved the R factors.  Thank you.

"Maps get worse... Could it be when you use all resolution range you get
59% of missing reflections in highest resolution shell filled in with DFc
for the purpose of map calculation?"

Yes! the map that I was looking at was filled.

Emily

