Re: [ccp4bb] refining against weak data and Table I stats

2012-12-13 Thread Felix Frolow
-resolution Rmerge that is only a few percent 
 worse than the average over the PDB at that time is probably considered 
 okay, and the average just keeps increasing over time.
 
 Nevertheless, Rmerge is a useful statistic for evaluating the quality of a 
 diffractometer, provided it is used in the way it was originally defined by 
 Uli Arndt: over the entire dataset, for spots with I/sd > 3.  At large 
 multiplicity, the Rmerge calculated this way asymptotically approaches the 
 average % error for measuring a single spot.  If it is more than 5% or so, 
 then there might be something wrong with the camera (or the space group 
 choice, etc.).  This is only true for Rmerge of ALL the data, not when it is 
 relegated to a given resolution bin.
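 As a concrete sketch of this whole-dataset Rmerge (a minimal illustration only,
 not any particular program's implementation; the dict-of-arrays input layout is
 invented for the example):

```python
import numpy as np

def rmerge_strong(obs, sig_cut=3.0):
    """Rmerge over the entire dataset, restricted to strong spots
    (I/sd > sig_cut), in the whole-dataset sense described above."""
    num = den = 0.0
    for hkl, (I, sigI) in obs.items():
        I, sigI = np.asarray(I, float), np.asarray(sigI, float)
        I = I[I / sigI > sig_cut]   # keep only strong observations
        if I.size < 2:              # need >= 2 observations to merge
            continue
        num += np.abs(I - I.mean()).sum()
        den += I.sum()
    return num / den

# Two symmetry mates measured as 90 and 110 (mean 100): Rmerge = 20/200 = 0.10
print(rmerge_strong({(1, 0, 0): ([90.0, 110.0], [5.0, 5.0])}))
```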
 
 
 Perhaps it is time we did have a discussion about what we mean by the 
 resolution of a structure, so that some kind of historically relevant and 
 future-proof definition for it can be devised. Otherwise, we will probably 
 one day see "1.0 A" used to describe what today we would call a 3.0 A 
 structure.  The whole point here is to be able to compare results obtained 
 by different people at different periods in history, so I think 
 it's important to try and keep our definition of resolution stable, even if 
 we do use spots that are beyond it.
 
 So, what I would advise is to refine your model with data out to the 
 resolution limit defined by CC*, but declare the resolution of the 
 structure to be where the merged I/sigma(I) falls to 2. You might even want 
 to calculate your Rmerge, Rcryst, Rfree and all the other R values to this 
 resolution as well, since including a lot of zeroes does nothing but 
 artificially drive up estimates of relative error.  Perhaps we should even 
 take a lesson from our small molecule friends and start reporting R1, 
 where the R factor is computed only for hkls where I/sigma(I) is above 3?
 
 -James Holton
 MAD Scientist
 
 On 12/8/2012 4:04 AM, Miller, Mitchell D. wrote:
 I too like the idea of reporting the table 1 stats vs resolution
 rather than just the overall values and highest resolution shell.
 
 I also wanted to point out an earlier thread from April about the
 limitations of the PDB's defining the resolution as being that of
 the highest resolution reflection (even if data is incomplete or weak).
 https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204L=ccp4bbD=01=ccp4bb9=AI=-3J=ond=No+Match%3BMatch%3BMatchesz=4P=376289
 https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204L=ccp4bbD=01=ccp4bb9=AI=-3J=ond=No+Match%3BMatch%3BMatchesz=4P=377673
 
 What we have done in the past for cases of low completeness
 in the outer shell is to define the nominal resolution a la Bart
 Hazes' method (same number of reflections as a complete data set), 
 use this in the PDB title, and describe it in the REMARK 3 other
 refinement remarks.
   There is also the possibility of adding a comment to the PDB
 remark 2 which we have not used.
 http://www.wwpdb.org/documentation/format33/remarks1.html#REMARK%202
 This should help convince reviewers that you are not trying
 to mis-represent the resolution of the structure.
 
 
 Regards,
 Mitch
 
 -Original Message-
 From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Edward 
 A. Berry
 Sent: Friday, December 07, 2012 8:43 AM
 To: CCP4BB@JISCMAIL.AC.UK
 Subject: Re: [ccp4bb] refining against weak data and Table I stats
 
 Yes, well, actually I'm only a middle author on that paper for a good
 reason, but I did encourage Rebecca and Stephan to use all the data.
 But on a later, much more modest submission, where the outer shell
 was not only weak but very incomplete (edges of the detector),
 the reviewers found it difficult to evaluate the quality
 of the data (we had also excluded a zone with bad ice-ring
 problems). So we provided a second table, cutting off above
 the ice ring in the good strong data, which convinced them
 that at least it is a decent 2 A structure. In the PDB it is
 a 1.6 A structure, but there was a lot of good data between
 the ice ring and 1.6 A.
 
 Bart Hazes (I think) suggested a statistic called effective
 resolution, which is the resolution to which a complete dataset
 would have the number of reflections in your dataset, and we
 reported this, which came out to something like 1.75.
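 Because the number of unique reflections to resolution d scales as 1/d^3,
 this effective resolution has a closed form. The sketch below assumes you
 already know the nominal d_min, the observed unique-reflection count, and the
 theoretically complete count to d_min (names are invented for the example):

```python
def effective_resolution(d_min, n_obs, n_complete):
    """Resolution at which a complete dataset would contain n_obs unique
    reflections, given that a complete set to d_min contains n_complete.
    Relies on N(d) being proportional to 1/d**3."""
    return d_min * (n_complete / n_obs) ** (1.0 / 3.0)

# A 1.6 A dataset with only half the reflections of a complete set:
print(round(effective_resolution(1.6, 5000, 10000), 2))
```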
 
 I do like the idea of reporting in multiple shells, not just overall
 and highest shell, and the PDB accommodates this, even has a GUI
 to enter it in the ADIT 2.0 software. It could also be used to
 report two different overall ranges, such as completeness, 25 to 1.6 A,
 which would be shocking in my case, and 25 to 2.0 A, which would
 be more reassuring.
 
 eab
 
 Douglas Theobald wrote:
 Hi Ed,
 
 Thanks for the comments.  So what do you recommend?  Refine against weak 
 data, and report all stats in a single Table I?
 
 Looking at your latest V-ATPase structure paper, it appears you favor 
 something like that, since you report a high res shell with I/sigI=1.34 and 
 Rsym=1.65.

[ccp4bb] FW: [ccp4bb] refining against weak data and Table I stats

2012-12-13 Thread Boaz Shaanan

From: Boaz Shaanan
Sent: Thursday, December 13, 2012 11:42 AM
To: Frank von Delft
Subject: RE: [ccp4bb] refining against weak data and Table I stats




Hi Frank,


tNCS also screws up the CC*/CC1/2 measure. I played with it on a structure that I'm working on, which has a huge off-origin Patterson peak at 1/3c (60% of the origin), and I get very high CC*/CC1/2 values even when I dig deep into the noise. Nevertheless,
 I used this measure to push the resolution from 2.2 to 2.0 A using the protocol suggested by Karplus and Diederichs, although my results are somewhat different from what they report. I think it's another tool that will show case-by-case dependence.


 Cheers,


   Boaz




Boaz Shaanan, Ph.D.

Dept. of Life Sciences 
Ben-Gurion University of the Negev 
Beer-Sheva 84105 
Israel 
 
E-mail: 
bshaa...@bgu.ac.il
Phone: 972-8-647-2220  Skype: boaz.shaanan 
Fax: 972-8-647-2992 or 972-8-646-1710


From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Frank
 von Delft [frank.vonde...@sgc.ox.ac.uk]
Sent: Thursday, December 13, 2012 9:27 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] refining against weak data and Table I stats



I like the R1 idea... report CC* and R1. 

Of course, anisotropy screws up everything (what do our small molecule friends know about that - ha!). So earlier in the thread, Ed Berry brought up the effective resolution:

Bart Hazes (I think) suggested a statistic called effective
resolution which is the resolution to which a complete dataset
would have the number of reflections in your dataset.


We just have to settle on how to determine the number of reflections - maybe those with I/s > 3?

phx




On 13/12/2012 06:52, James Holton wrote:

I think CC* (derived from CC1/2) is an important step forward in how to decide where to cut off the data you give to your refinement program, but I don't think it is a good idea to re-define what we call the resolution of a structure.
 These do NOT have to be the same thing! 
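For reference, CC* is a simple function of CC1/2 (Karplus & Diederichs, Science 2012); a one-line sketch:

```python
import math

def cc_star(cc_half):
    """CC* = sqrt(2*CC1/2 / (1 + CC1/2)): the estimated correlation of the
    merged dataset with the (unmeasurable) true signal."""
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))

# Even a modest CC1/2 of 0.3 implies CC* ~ 0.68 against the true signal.
print(round(cc_star(0.3), 2))
```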

 Remember, what we crystallographers call resolution is actually about 3x the resolution a normal person would use. That is, for most types of imaging whether it be 2D (pictures of Mars) or 3D (such as electron density) the resolution is the minimum
 feature size you can reliably detect in the image. This definition of resolution makes intuitive sense, especially to non-crystallographers. It is also considerably less pessimistic than our current definition since the minimum observable feature size
 in an electron density map is about 1/3 of the d-spacing of the highest-angle spots. This is basically because the d-spacing is the period of a sine wave in space, but the minimum feature size is related to the full-width at half max of this same wave. So,
 all you have to do is change your definition of resolution and a 3.0 A structure becomes a 1.0 A structure!
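The factor of ~3 can be made precise with a one-line calculation: model a map feature as one period of a cosine of d-spacing d and measure the full width at half maximum of its central peak relative to the zero level:

```latex
\rho(x) = \cos\!\left(\frac{2\pi x}{d}\right), \qquad
\rho(x_{1/2}) = \tfrac{1}{2}
\;\Rightarrow\; \frac{2\pi x_{1/2}}{d} = \frac{\pi}{3}
\;\Rightarrow\; x_{1/2} = \frac{d}{6}, \qquad
\mathrm{FWHM} = 2\,x_{1/2} = \frac{d}{3}.
```

So the smallest resolvable feature in a map is roughly a third of the d-spacing of the outermost spots, which is the sense in which a "3.0 A" map resolves 1.0 A features.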


 However, I think proposing this new way to define resolution in crystallography will be met with some resistance. Why? Because changing the meaning of resolution so drastically after ~100 years would be devastating to its usefulness in structure evaluation.
 I, for one, do not want to have to check the deposition date and see if the structure was solved before or after the end of the world (Dec 2012) before I can figure out whether or not I need to divide or multiply by 3 to get the real resolution of the structure.
 I don't think I'm alone in this. 

Now, calling what used to be a 1.6 A structure a 1.42 A structure (one way to interpret Karplus & Diederichs 2012) is not quite as drastic a change as the one I flippantly propose above, but it is still a change, and there is a real danger of definition creep
 here. Most people these days seem to define the resolution limit of their data at the point where the merged I/sigma(I) drops below 2. However, using CC* = 0.5 would place the new resolution at the point where merged I/sigma(I) drops below 0.5. That's
 definitely going beyond what anyone would have called the resolution of the structure last year. So, which one is it? Is it a 1.6 A structure (refined using data out to 1.42 A), or is it actually a 1.42 A structure?


Unfortunately, if you talk to a number of experienced crystallographers, they will each have a slightly different set of rules for defining the resolution limit that they learned from their thesis advisor, who, in turn, learned it from theirs, etc. Nearly
 all of these rule sets include some reference to Rmerge, but the acceptable Rmerge seems to vary from 30% to as much as 150%, depending on whom you talk to. However, despite this prevalence of Rmerge in our perception of resolution there does not seem
 to be a single publication anywhere in the literature that recommends the use of Rmerge to define the resolution limit. Several papers have been cited to that effect, but then if you go and read them they actually made no such claim.


Mathematically, it is fairly easy to show that Rmerge is wildly unstable as the average

Re: [ccp4bb] refining against weak data and Table I stats

2012-12-13 Thread Douglas Theobald
On Dec 13, 2012, at 1:52 AM, James Holton jmhol...@lbl.gov wrote:

[snip]

 So, what I would advise is to refine your model with data out to the 
 resolution limit defined by CC*, but declare the resolution of the 
 structure to be where the merged I/sigma(I) falls to 2. You might even want 
 to calculate your Rmerge, Rcryst, Rfree and all the other R values to this 
 resolution as well, since including a lot of zeroes does nothing but 
 artificially drive up estimates of relative error.  

So James --- it appears that you basically agree with my proposal?  I.e., 

(1) include all of the data in refinement (at least up to where CC1/2 or CC* is 
still significant)

(2) keep the definition of resolution at what is more-or-less the de facto 
standard (the res bin where I/sigI=2), 

(3) report Table I where everything is calculated up to this resolution (where 
I/sigI=2), and 

(4) maybe include in Supp Mat an additional table that reports statistics for 
all the data (I'm leaning towards a table with stats for each res bin)

As you argued, and as I argued, this seems to be a good compromise, one that 
modifies current practice to include weak data, but nevertheless does not 
change the definition of resolution or the Table I stats, so that we can still 
compare with legacy structures/stats.


[snip]
Re: [ccp4bb] refining against weak data and Table I stats

2012-12-13 Thread Edward A. Berry
[snip]
On Dec 6, 2012, at 7:24 PM, Edward A. Berry ber...@upstate.edu wrote:


Another consideration here is your PDB deposition. If the reason for using
weak data is to get a better structure, presumably you are going to deposit
the structure using all the data. Then the statistics in the PDB file must
reflect the high-resolution refinement.

There are, I think, three places in the PDB file where the resolution is stated,
but I believe they are all required to be the same, and to be equal to the
highest-resolution data used (even if there were only two reflections in that
shell). Rmerge or Rsymm must be reported, and until recently I think they were
not allowed to exceed 1.00 (100% error?).

What are your reviewers going to think if the title of your paper is
"structure of protein A at 2.1 A resolution" but they check the PDB file
and the resolution was really 1.9 A?  And Rsymm in the PDB is 0.99 but
your Table I* says 1.3?

Douglas Theobald wrote:

Hello all,

I've followed with interest the discussions here about how we should be refining 
against weak data, e.g. data with I/sigI < 2 (perhaps using all bins that have a 
significant CC1/2, per Karplus and Diederichs 2012).  This all makes statistical 
sense to me, but now I am wondering how I should report data and model stats in Table I.

Here's what I've come up with: report two Table I's.  For comparability to legacy structure stats, 
report a classic Table I, where I call the resolution the bin where I/sigI=2.  Use that 
as my high res bin, with high res bin stats reported in parentheses after global stats. 
Then have another table (maybe Table I* in supplementary material?) where I report stats for the 
whole dataset, including the weak data I used in refinement.  In both tables report CC1/2 and Rmeas.

This way, I don't redefine the (mostly) conventional usage of resolution, my 
Table I can be compared to precedent, I report stats for all the data and for the model 
against all data, and I take advantage of the information in the weak data during 
refinement.

Thoughts?

Douglas


^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`
Douglas L. Theobald
Assistant Professor
Department of Biochemistry
Brandeis University
Waltham, MA  02454-9110

dtheob...@brandeis.edu
http://theobald.brandeis.edu/


Re: [ccp4bb] refining against weak data and Table I stats

2012-12-12 Thread James Holton
[snip]

Re: [ccp4bb] refining against weak data and Table I stats

2012-12-12 Thread Frank von Delft
[snip]
Re: [ccp4bb] refining against weak data and Table I stats

2012-12-09 Thread DUMAS Philippe (UDS)

On Friday 7 December 2012 18:48 CET, Gerard Bricogne g...@globalphasing.com 
wrote:

May I add something to Gerard's comment.
In the same vein, provided one considers two sets of terms with zero mean 
(which corresponds to the proviso mentioned by Gerard), one can define an 
R-factor R as the sine of the same angle whose cosine gives the correlation 
coefficient C, and one has R^2 + C^2 = 1.
Thus, in some way, in practice, an R-factor is a sensitive criterion 
for highly correlated data, whereas a correlation coefficient is better suited 
for poorly correlated data.
Likely I have just rephrased ideas that were written down a long time ago in 
well-known papers.
Did I?
Philippe Dumas
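Philippe's identity can be checked numerically. One concrete way to realize his R (chosen purely for this illustration; the crystallographic R-factor uses a different normalization) is the norm of the residual after projecting one zero-mean unit vector onto the other:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.9 * x + 0.3 * rng.normal(size=100)
x -= x.mean()                         # zero-mean, per Gerard's proviso
y -= y.mean()

u = x / np.linalg.norm(x)             # unit vectors, so scale drops out
v = y / np.linalg.norm(y)
C = float(u @ v)                      # correlation coefficient = cos(theta)
R = float(np.linalg.norm(u - C * v))  # residual norm = sin(theta)

print(round(R**2 + C**2, 9))          # 1.0, up to rounding error
```

The identity is exact: |u - Cv|^2 = 1 - 2C(u.v) + C^2 = 1 - C^2, whatever the data.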



 Dear Zbyszek,

  That is a useful point. Another way of making it is to notice that the
 correlation coefficient between two random variables is the cosine of the
 angle between two vectors of paired values for these, with the proviso that
 the sums of the component values for each vector add up to zero. The fact
 that an angle is involved means that the CC is independent of scale, while
 the fact that it is the cosine of that angle makes it rather insensitive to
 small-ish angles: a cosine remains close to 1.0 for quite a range of angles.

  This is presumably the nature of correlation coefficients you were
 referring to.


  With best wishes,

   Gerard.

 --
 On Fri, Dec 07, 2012 at 11:14:50AM -0600, Zbyszek Otwinowski wrote:
  The difference between one and the correlation coefficient is a quadratic
  function of the differences between the data points. So a rather large 6%
  relative error with 8-fold data multiplicity (redundancy) can lead to
  CC1/2 values of about 99.9%.
  It is just the nature of correlation coefficients.
 
  Zbyszek Otwinowski
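 A quick Monte Carlo illustrates Zbyszek's point (synthetic Wilson-like
 intensities, 6% per-observation relative error, multiplicity 8 split into two
 half-datasets of 4 - all parameters invented to match the numbers quoted
 above):

```python
import numpy as np

rng = np.random.default_rng(1)
n_refl, mult, rel_err = 10000, 8, 0.06

I_true = rng.exponential(scale=1.0, size=n_refl)   # Wilson-like intensity spread
obs = I_true[:, None] * (1.0 + rel_err * rng.normal(size=(n_refl, mult)))

half1 = obs[:, : mult // 2].mean(axis=1)           # merge 4 observations per half
half2 = obs[:, mult // 2 :].mean(axis=1)
cc_half = np.corrcoef(half1, half2)[0, 1]

print(cc_half)   # close to 0.998 despite the 6% per-spot error
```

 So a near-perfect CC1/2 mostly says that the signal variance dwarfs the
 (multiplicity-reduced) noise variance, not that the measurements are
 error-free.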
 
 
 
   Related to this, I've always wondered what CC1/2 values mean for low
   resolution. Not being mathematically inclined, I'm sure this is a naive
   question, but I'll ask anyway - what does CC1/2=100 (or 99.9) mean?
   Does it mean the data is as good as it gets?
  
   Alan
  
  
  
   On 07/12/2012 17:15, Douglas Theobald wrote:
   Hi Boaz,
  
   I read the KK paper as primarily a justification for including

   extremely weak data in refinement (and of course introducing a new
   single statistic that can judge data *and* model quality comparably).
   Using CC1/2 to gauge resolution seems like a good option, but I never
   got from the paper exactly how to do that.  The resolution bin where
   CC1/2=0.5 seems natural, but in my (limited) experience that gives
   almost the same answer as I/sigI=2 (see also KK fig 3).
  
  
  
   On Dec 7, 2012, at 6:21 AM, Boaz Shaanan bshaa...@exchange.bgu.ac.il
   wrote:
  
   Hi,
  
   I'm sure Kay will have something to say  about this but I think the
   idea of the K  K paper was to introduce new (more objective) standards
   for deciding on the resolution, so I don't see why another table is
   needed.
  
   Cheers,
  
  
  
  
  Boaz
  
  
   Boaz Shaanan, Ph.D.
   Dept. of Life Sciences
   Ben-Gurion University of the Negev
   Beer-Sheva 84105
   Israel
  
   E-mail: bshaa...@bgu.ac.il
   Phone: 972-8-647-2220  Skype: boaz.shaanan
   Fax:   972-8-647-2992 or 972-8-646-1710
  
  
  
  
  
   
   From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Douglas
   Theobald [dtheob...@brandeis.edu]
   Sent: Friday, December 07, 2012 1:05 AM
   To: CCP4BB@JISCMAIL.AC.UK
   Subject: [ccp4bb] refining against weak data and Table I stats

  
   Hello all,
  
   I've followed with interest the discussions here about how we should be
   refining against weak data, e.g. data with I/sigI  2 (perhaps using
   all bins that have a significant CC1/2 per Karplus and Diederichs
   2012).  This all makes statistical sense to me, but now I am wondering
   how I should report data and model stats in Table I.
  
   Here's what I've come up with: report two Table I's.  For comparability
   to legacy structure stats, report a classic Table I, where I call the
   resolution whatever bin I/sigI=2.  Use that as my high res bin, with
   high res bin stats reported in parentheses after global stats.   Then
   have another Table (maybe Table I* in supplementary material?) where I
   report stats for the whole dataset, including the weak data I used in
   refinement.  In both tables report CC1/2 and Rmeas.
  
   This way, I don't redefine the (mostly) conventional usage of
   resolution, my Table I can be compared to precedent, I report stats
   for all the data and for the model against all data, and I take
   advantage of the information in the weak data during refinement.
  
   Thoughts?
  
   Douglas
  
  
   ^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`
   Douglas L. Theobald
   Assistant Professor
   Department of Biochemistry
   Brandeis University
   Waltham, MA  02454-9110
  
   dtheob...@brandeis.edu
   http://theobald.brandeis.edu

Re: [ccp4bb] refining against weak data and Table I stats

2012-12-07 Thread Boaz Shaanan
Hi,

I'm sure Kay will have something to say about this, but I think the idea of the 
K & K paper was to introduce new (more objective) standards for deciding on the 
resolution, so I don't see why another table is needed.

Cheers,




          Boaz


Boaz Shaanan, Ph.D.
Dept. of Life Sciences
Ben-Gurion University of the Negev
Beer-Sheva 84105
Israel

E-mail: bshaa...@bgu.ac.il
Phone: 972-8-647-2220  Skype: boaz.shaanan
Fax:   972-8-647-2992 or 972-8-646-1710






From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Douglas Theobald 
[dtheob...@brandeis.edu]
Sent: Friday, December 07, 2012 1:05 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] refining against weak data and Table I stats

Hello all,

I've followed with interest the discussions here about how we should be 
refining against weak data, e.g. data with I/sigI < 2 (perhaps using all bins 
that have a significant CC1/2, per Karplus and Diederichs 2012).  This all 
makes statistical sense to me, but now I am wondering how I should report data 
and model stats in Table I.

Here's what I've come up with: report two Table I's.  For comparability to 
legacy structure stats, report a classic Table I, where I call the resolution 
whatever bin has I/sigI=2.  Use that as my high-res bin, with high-res bin 
stats reported in parentheses after global stats.  Then have another Table 
(maybe Table I* in supplementary material?) where I report stats for the whole 
dataset, including the weak data I used in refinement.  In both tables report 
CC1/2 and Rmeas.

This way, I don't redefine the (mostly) conventional usage of resolution, my 
Table I can be compared to precedent, I report stats for all the data and for 
the model against all data, and I take advantage of the information in the weak 
data during refinement.

Thoughts?

Douglas


^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`
Douglas L. Theobald
Assistant Professor
Department of Biochemistry
Brandeis University
Waltham, MA  02454-9110

dtheob...@brandeis.edu
http://theobald.brandeis.edu/




Re: [ccp4bb] refining against weak data and Table I stats

2012-12-07 Thread Robbie Joosten
Hi Douglas,

Using two Table I's is a good way to show the difference between the two
cut-offs, but I assume you will only discuss one of the models in your
paper. IMO you only need to deposit the high-res model, so there should be
no problems with resolution conflicts in the PDB file. The annotators will
probably help you if there is a problem with Rmerge > 1.00.

As for the title of your paper: nobody forces you to put a resolution in it
if it causes too much of a stir.

Cheers,
Robbie  



Re: [ccp4bb] refining against weak data and Table I stats

2012-12-07 Thread Douglas Theobald
Hi Ed,

Thanks for the comments.  So what do you recommend?  Refine against weak data, 
and report all stats in a single Table I?

Looking at your latest V-ATPase structure paper, it appears you favor something 
like that, since you report a high res shell with I/sigI=1.34 and Rsym=1.65.  


On Dec 6, 2012, at 7:24 PM, Edward A. Berry ber...@upstate.edu wrote:

 Another consideration here is your PDB deposition. If the reason for using
 weak data is to get a better structure, presumably you are going to deposit
 the structure using all the data. Then the statistics in the PDB file must
 reflect the high resolution refinement.
 
 There are, I think, three places in the PDB file where the resolution is 
 stated, but I believe they are all required to be the same and to be equal 
 to the highest resolution of the data used (even if there were only two 
 reflections in that shell). Rmerge or Rsym must be reported, and until 
 recently I think they were not allowed to exceed 1.00 (100% error?).
 
 What are your reviewers going to think if the title of your paper is
 "structure of protein A at 2.1 A resolution" but they check the PDB file
 and the resolution was really 1.9 A?  And Rsym in the PDB is 0.99 but
 your Table I* says 1.3?
 


Re: [ccp4bb] refining against weak data and Table I stats

2012-12-07 Thread Douglas Theobald
Hi Boaz,

I read the K&K paper as primarily a justification for including extremely weak 
data in refinement (and of course introducing a new single statistic that can 
judge data *and* model quality comparably).  Using CC1/2 to gauge resolution 
seems like a good option, but I never got from the paper exactly how to do 
that.  The resolution bin where CC1/2=0.5 seems natural, but in my (limited) 
experience that gives almost the same answer as I/sigI=2 (see also K&K fig. 3).




Re: [ccp4bb] refining against weak data and Table I stats

2012-12-07 Thread Alan Cheung
Related to this, I've always wondered what CC1/2 values mean at low 
resolution. Not being mathematically inclined, I'm sure this is a naive 
question, but I'll ask anyway - what does CC1/2=100 (or 99.9) mean? 
Does it mean the data is as good as it gets?


Alan





--
Alan Cheung
Gene Center
Ludwig-Maximilians-University
Feodor-Lynen-Str. 25
81377 Munich
Germany
Phone:  +49-89-2180-76845
Fax:  +49-89-2180-76999
E-mail: che...@lmb.uni-muenchen.de


Re: [ccp4bb] refining against weak data and Table I stats

2012-12-07 Thread Phil Evans
It is internally consistent, though not necessarily correct


On 7 Dec 2012, at 16:23, Alan Cheung wrote:

 Related to this, I've always wondered what CC1/2 values mean for low 
 resolution. Not being mathematically inclined, I'm sure this is a naive 
 question, but i'll ask anyway - what does CC1/2=100 (or 99.9) mean? Does it 
 mean the data is as good as it gets?
 
 Alan
 
 
 


Re: [ccp4bb] refining against weak data and Table I stats

2012-12-07 Thread Edward A. Berry

Yes, well, actually I'm only a middle author on that paper for a good
reason, but I did encourage Rebecca and Stephan to use all the data.
But on a later, much more modest submission, where the outer shell
was not only weak but very incomplete (edges of the detector),
the reviewers found it difficult to evaluate the quality
of the data (we had also excluded a zone with bad ice-ring
problems). So we provided a second table, cutting off above
the ice ring in the good strong data, which convinced them
that at least it is a decent 2 A structure. In the PDB it is
a 1.6 A structure, but there was a lot of good data between
the ice ring and 1.6 A.

Bart Hazes (I think) suggested a statistic called effective
resolution, which is the resolution to which a complete dataset
would have the number of reflections in your dataset, and we
reported this, which came out to something like 1.75.
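
This effective resolution can be sketched in a few lines: since the number of unique reflections out to resolution d grows roughly as 1/d^3 (the volume of the reciprocal-space sphere), the resolution at which a complete dataset would contain the observed number of reflections is the nominal limit times the cube root of the completeness shortfall. The reflection counts below are hypothetical, chosen only to land near a value like 1.75:

```python
def effective_resolution(d_min, n_obs, n_complete):
    """Hazes-style 'effective resolution': the resolution at which a complete
    dataset would contain n_obs reflections, assuming the unique reflection
    count scales as 1/d**3 (reciprocal-space sphere volume)."""
    return d_min * (n_complete / n_obs) ** (1.0 / 3.0)

# Hypothetical example: nominally 1.6 A data with 38,000 observed unique
# reflections out of the 50,000 expected for a complete set to 1.6 A.
print(round(effective_resolution(1.6, 38_000, 50_000), 2))  # 1.75
```

The 1/d^3 scaling is an approximation that ignores low-resolution cutoffs and lattice-type factors, but it is adequate for a headline number.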

I do like the idea of reporting in multiple shells, not just overall
and highest shell, and the PDB accommodates this, even has a GUI
to enter it in the ADIT 2.0 software. It could also be used to
report two different overall ranges, such as completeness, 25 to 1.6 A,
which would be shocking in my case, and 25 to 2.0, which would
be more reassuring.

eab

Douglas Theobald wrote:

Hi Ed,

Thanks for the comments.  So what do you recommend?  Refine against weak data, 
and report all stats in a single Table I?

Looking at your latest V-ATPase structure paper, it appears you favor something 
like that, since you report a high res shell with I/sigI=1.34 and Rsym=1.65.




Re: [ccp4bb] refining against weak data and Table I stats

2012-12-07 Thread Douglas Theobald
A good way to think about it is that if CC1/2=100%, that means you can split 
the data in half, and use one half to perfectly predict the corresponding 
values of the other half. So yes, perfect internal consistency.


On Dec 7, 2012, at 11:41 AM, Phil Evans p...@mrc-lmb.cam.ac.uk wrote:

 It is internally consistent, though not necessarily correct
 
 
 On 7 Dec 2012, at 16:23, Alan Cheung wrote:
 
 Related to this, I've always wondered what CC1/2 values mean for low 
 resolution. Not being mathematically inclined, I'm sure this is a naive 
 question, but i'll ask anyway - what does CC1/2=100 (or 99.9) mean? Does it 
 mean the data is as good as it gets?
 
 Alan
 
 
 


Re: [ccp4bb] refining against weak data and Table I stats

2012-12-07 Thread Zbyszek Otwinowski
The difference between one and the correlation coefficient is a quadratic
function of the differences between the datapoints. So a rather large 6%
relative error with 8-fold data multiplicity (redundancy) can lead to
CC1/2 values of about 99.9%.
It is just the nature of correlation coefficients.

Zbyszek Otwinowski
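
These numbers are easy to reproduce with a toy simulation: draw intensities from an exponential (acentric Wilson) distribution, measure each 8 times with 6% relative Gaussian error, then correlate the merged values of two half-datasets. The distribution and error model here are illustrative assumptions, not a model of any real experiment:

```python
import numpy as np

rng = np.random.default_rng(0)
n_refl, mult, rel_err = 100_000, 8, 0.06

# "True" intensities: exponential, as for acentric Wilson statistics.
I_true = rng.exponential(scale=1.0, size=n_refl)

# mult observations per reflection, each with 6% relative Gaussian error.
obs = I_true[:, None] * (1.0 + rel_err * rng.standard_normal((n_refl, mult)))

# CC1/2: correlate the merged (averaged) halves of the observations.
half1 = obs[:, : mult // 2].mean(axis=1)
half2 = obs[:, mult // 2 :].mean(axis=1)
cc_half = float(np.corrcoef(half1, half2)[0, 1])
print(f"CC1/2 = {cc_half:.4f}")  # ~0.998, i.e. the ~99.9% regime
```

Because the noise variance of each merged half shrinks with multiplicity while the signal variance stays fixed, CC1/2 sits very close to 1 even for data that are individually quite noisy, which is exactly the squared-difference effect described above.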






Re: [ccp4bb] refining against weak data and Table I stats

2012-12-07 Thread Miller, Mitchell D.
I too like the idea of reporting the Table I stats vs. resolution 
rather than just the overall values and the highest resolution shell.  

I also wanted to point out an earlier thread from April about the 
limitations of the PDB defining the resolution as being that of 
the highest resolution reflection (even if the data is incomplete or weak).  
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204L=ccp4bbD=01=ccp4bb9=AI=-3J=ond=No+Match%3BMatch%3BMatchesz=4P=376289
 
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204L=ccp4bbD=01=ccp4bb9=AI=-3J=ond=No+Match%3BMatch%3BMatchesz=4P=377673
 

What we have done in the past for cases of low completeness
in the outer shell is to define the nominal resolution a la Bart
Hazes' method (the same number of reflections as a complete data set) and
use this in the PDB title and describe it in the remark 3 other
refinement remarks.  
  There is also the possibility of adding a comment to the PDB 
remark 2 which we have not used.
http://www.wwpdb.org/documentation/format33/remarks1.html#REMARK%202 
This should help convince reviewers that you are not trying
to misrepresent the resolution of the structure.


Regards,
Mitch

-Original Message-
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Edward A. 
Berry
Sent: Friday, December 07, 2012 8:43 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] refining against weak data and Table I stats

Yes, well, actually i'm only a middle author on that paper for a good
reason, but I did encourage Rebecca and Stephan to use all the data.
But on a later, much more modest submission, where the outer shell
was not only weak but very incomplete (edges of the detector),
the reviewers found it difficult to evaluate the quality
of the data (we had also excluded a zone with bad ice-ring
problems). So we provided a second table, cutting off above
the ice ring in the good strong data, which convinced them
that at least it is a decent 2A structure. In the PDB it is
a 1.6A structure, but there was a lot of good data between
the ice ring and 1.6 A.

Bart Hazes (I think) suggested a statistic called effective
resolution which is the resolution to which a complete dataset
would have the same number of reflections as your dataset, and we
reported this, which came out to something like 1.75.
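Bart Hazes' statistic is easy to sketch: assuming the number of unique reflections out to a d-spacing grows roughly as 1/d^3, the effective resolution is the d at which a complete dataset would match your observed reflection count. The function name and the example numbers below are illustrative, not taken from the thread:

```python
def effective_resolution(d_min, n_obs, n_complete):
    """Effective resolution a la Bart Hazes: the d-spacing at which a
    complete dataset would contain n_obs unique reflections.

    d_min      -- nominal high-resolution limit (Angstrom)
    n_obs      -- unique reflections actually measured to d_min
    n_complete -- unique reflections a complete dataset to d_min would hold

    Uses the approximation that the reflection count to resolution d
    scales as 1/d**3, so d_eff = d_min * (n_complete / n_obs)**(1/3).
    """
    return d_min * (n_complete / n_obs) ** (1.0 / 3.0)

# Illustrative numbers: a nominal 1.6 A dataset holding ~76% of the
# theoretically complete reflection count comes out near 1.75 A.
print(round(effective_resolution(1.6, 76400, 100000), 2))  # -> 1.75
```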

I do like the idea of reporting in multiple shells, not just overall
and highest shell, and the PDB accommodates this, even has a GUI
to enter it in the ADIT 2.0 software. It could also be used to
report two different overall ranges, such as completeness, 25 to 1.6 A,
which would be shocking in my case, and 25 to 2.0 which would
be more reassuring.

eab

Douglas Theobald wrote:
 Hi Ed,

 Thanks for the comments.  So what do you recommend?  Refine against weak 
 data, and report all stats in a single Table I?

 Looking at your latest V-ATPase structure paper, it appears you favor 
 something like that, since you report a high res shell with I/sigI=1.34 and 
 Rsym=1.65.


 On Dec 6, 2012, at 7:24 PM, Edward A. Berry ber...@upstate.edu  wrote:

 Another consideration here is your PDB deposition. If the reason for using
 weak data is to get a better structure, presumably you are going to deposit
 the structure using all the data. Then the statistics in the PDB file must
 reflect the high resolution refinement.

 There are I think three places in the PDB file where the resolution is 
 stated,
 but i believe they are all required to be the same and to be equal to the
 highest resolution data used (even if there were only two reflections in 
 that shell).
 Rmerge or Rsymm must be reported, and until recently I think they were not 
 allowed
 to exceed 1.00 (100% error?).

 What are your reviewers going to think if the title of your paper is
 "structure of protein A at 2.1 A resolution" but they check the PDB file
 and the resolution was really 1.9 A?  And Rsymm in the PDB is 0.99 but
 in your table 1* says 1.3?

 Douglas Theobald wrote:
 Hello all,

 I've followed with interest the discussions here about how we should be 
 refining against weak data, e.g. data with I/sigI < 2 (perhaps using all 
 bins that have a significant CC1/2 per Karplus and Diederichs 2012).  
 This all makes statistical sense to me, but now I am wondering how I should 
 report data and model stats in Table I.

 Here's what I've come up with: report two Table I's.  For comparability to 
 legacy structure stats, report a classic Table I, where I call the 
 resolution whatever bin I/sigI=2.  Use that as my high res bin, with high 
 res bin stats reported in parentheses after global stats.   Then have 
 another Table (maybe Table I* in supplementary material?) where I report 
 stats for the whole dataset, including the weak data I used in refinement.  
 In both tables report CC1/2 and Rmeas.

 This way, I don't redefine the (mostly) conventional usage of resolution, 
 my Table I can be compared to precedent, I report stats for all the data 
 and for the model against all data, and I take

Re: [ccp4bb] refining against weak data and Table I stats

2012-12-07 Thread Alan Cheung
I was confused because it seemed like CC1/2 wasn't very informative at 
lower resolution since (in my datasets) they were all 99.9-100.  So if 
i've understood this correctly (and i'm honestly not sure that i have) 
could CC1/2 be useful to show the quality of low resolution data, given 
more precision?



On 07/12/2012 18:14, Zbyszek Otwinowski wrote:

The difference between one and the correlation coefficient is a square
function of differences between the datapoints. So rather large 6%
relative error with 8-fold data multiplicity (redundancy) can lead to
CC1/2 values about 99.9%.
It is just the nature of correlation coefficients.
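That arithmetic is easy to check numerically. The sketch below invents Wilson-like (exponentially distributed) true intensities, applies a 6% relative error to each of 8 measurements per reflection, and correlates the two half-dataset averages; everything besides the 6% / 8-fold figures quoted above is made up:

```python
import math
import random

random.seed(0)

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

n_refl, rel_err = 20000, 0.06   # 8-fold multiplicity: 4 obs per half-dataset
half1, half2 = [], []
for _ in range(n_refl):
    i_true = random.expovariate(1.0)            # Wilson-like intensity
    obs = [i_true * (1.0 + random.gauss(0.0, rel_err)) for _ in range(8)]
    half1.append(sum(obs[:4]) / 4.0)
    half2.append(sum(obs[4:]) / 4.0)

cc_half = pearson(half1, half2)  # lands around 0.998-0.999
```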

Zbyszek Otwinowski




Related to this, I've always wondered what CC1/2 values mean for low
resolution. Not being mathematically inclined, I'm sure this is a naive
question, but i'll ask anyway - what does CC1/2=100 (or 99.9) mean?
Does it mean the data is as good as it gets?

Alan



On 07/12/2012 17:15, Douglas Theobald wrote:

Hi Boaz,

I read the K&K paper as primarily a justification for including
extremely weak data in refinement (and of course introducing a new
single statistic that can judge data *and* model quality comparably).
Using CC1/2 to gauge resolution seems like a good option, but I never
got from the paper exactly how to do that.  The resolution bin where
CC1/2=0.5 seems natural, but in my (limited) experience that gives
almost the same answer as I/sigI=2 (see also K&K fig 3).
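Mechanically, "the bin where a statistic crosses a threshold" can be read straight off per-shell stats. The shell numbers below are invented for illustration, and no particular threshold is being endorsed:

```python
def resolution_cutoff(shells, stat, threshold):
    """Return the d_min of the last shell (walking from low to high
    resolution) whose statistic is still >= threshold, or None."""
    cutoff = None
    for shell in shells:
        if shell[stat] >= threshold:
            cutoff = shell["d_min"]
        else:
            break
    return cutoff

shells = [  # invented per-shell stats, ordered low to high resolution
    {"d_min": 3.0, "i_over_sig": 15.0, "cc_half": 0.998},
    {"d_min": 2.5, "i_over_sig": 6.0,  "cc_half": 0.97},
    {"d_min": 2.2, "i_over_sig": 2.1,  "cc_half": 0.80},
    {"d_min": 2.0, "i_over_sig": 1.1,  "cc_half": 0.55},
    {"d_min": 1.9, "i_over_sig": 0.6,  "cc_half": 0.30},
]
# Here I/sigI >= 2 stops at 2.2 A while CC1/2 >= 0.5 reaches 2.0 A;
# in practice the two cutoffs often land close together (cf. K&K fig 3).
```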



On Dec 7, 2012, at 6:21 AM, Boaz Shaanan bshaa...@exchange.bgu.ac.il
wrote:


Hi,

I'm sure Kay will have something to say about this but I think the
idea of the K & K paper was to introduce new (more objective) standards
for deciding on the resolution, so I don't see why another table is
needed.

Cheers,




Boaz


Boaz Shaanan, Ph.D.
Dept. of Life Sciences
Ben-Gurion University of the Negev
Beer-Sheva 84105
Israel

E-mail: bshaa...@bgu.ac.il
Phone: 972-8-647-2220  Skype: boaz.shaanan
Fax:   972-8-647-2992 or 972-8-646-1710






From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Douglas
Theobald [dtheob...@brandeis.edu]
Sent: Friday, December 07, 2012 1:05 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] refining against weak data and Table I stats

Hello all,

I've followed with interest the discussions here about how we should be
refining against weak data, e.g. data with I/sigI < 2 (perhaps using
all bins that have a significant CC1/2 per Karplus and Diederichs
2012).  This all makes statistical sense to me, but now I am wondering
how I should report data and model stats in Table I.

Here's what I've come up with: report two Table I's.  For comparability
to legacy structure stats, report a classic Table I, where I call the
resolution whatever bin I/sigI=2.  Use that as my high res bin, with
high res bin stats reported in parentheses after global stats.   Then
have another Table (maybe Table I* in supplementary material?) where I
report stats for the whole dataset, including the weak data I used in
refinement.  In both tables report CC1/2 and Rmeas.

This way, I don't redefine the (mostly) conventional usage of
resolution, my Table I can be compared to precedent, I report stats
for all the data and for the model against all data, and I take
advantage of the information in the weak data during refinement.

Thoughts?

Douglas


^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`
Douglas L. Theobald
Assistant Professor
Department of Biochemistry
Brandeis University
Waltham, MA  02454-9110

dtheob...@brandeis.edu
http://theobald.brandeis.edu/







--
Alan Cheung
Gene Center
Ludwig-Maximilians-University
Feodor-Lynen-Str. 25
81377 Munich
Germany
Phone:  +49-89-2180-76845
Fax:  +49-89-2180-76999
E-mail: che...@lmb.uni-muenchen.de








Re: [ccp4bb] refining against weak data and Table I stats

2012-12-07 Thread Gerard Bricogne
Dear Zbyszek,

 That is a useful point. Another way of making it is to notice that the
correlation coefficient between two random variables is the cosine of the
angle between two vectors of paired values for these, with the proviso that
the sums of the component values for each vector add up to zero. The fact
that an angle is involved means that the CC is independent of scale, while
the fact that it is the cosine of that angle makes it rather insensitive to
small-ish angles: a cosine remains close to 1.0 for quite a range of angles.
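A toy check of the cosine picture (the vectors are arbitrary): build two zero-mean unit vectors a known angle apart and confirm that the Pearson CC equals the cosine of that angle, which indeed stays near 1 for small-ish angles.

```python
import math

def pearson(x, y):
    """Pearson CC, written explicitly as the cosine of the angle
    between the two mean-centred vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    xc = [a - mx for a in x]
    yc = [b - my for b in y]
    dot = sum(a * b for a, b in zip(xc, yc))
    norm = math.sqrt(sum(a * a for a in xc)) * math.sqrt(sum(b * b for b in yc))
    return dot / norm

# Two orthonormal zero-mean basis vectors in R^3 ...
u = [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]
v = [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)]
# ... and a vector making a 20-degree angle with u.
theta = math.radians(20.0)
w = [math.cos(theta) * a + math.sin(theta) * b for a, b in zip(u, v)]

cc = pearson(u, w)  # equals cos(20 deg) ~ 0.94: still "high" despite the angle
```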

 This is presumably the nature of correlation coefficients you were
referring to.


 With best wishes,
 
  Gerard.

--
On Fri, Dec 07, 2012 at 11:14:50AM -0600, Zbyszek Otwinowski wrote:
 The difference between one and the correlation coefficient is a square
 function of differences between the datapoints. So rather large 6%
 relative error with 8-fold data multiplicity (redundancy) can lead to
 CC1/2 values about 99.9%.
 It is just the nature of correlation coefficients.
 
 Zbyszek Otwinowski
 
 
 
  Related to this, I've always wondered what CC1/2 values mean for low
  resolution. Not being mathematically inclined, I'm sure this is a naive
  question, but i'll ask anyway - what does CC1/2=100 (or 99.9) mean?
  Does it mean the data is as good as it gets?
 
  Alan
 
 
 
  On 07/12/2012 17:15, Douglas Theobald wrote:
  Hi Boaz,
 
  I read the K&K paper as primarily a justification for including
  extremely weak data in refinement (and of course introducing a new
  single statistic that can judge data *and* model quality comparably).
  Using CC1/2 to gauge resolution seems like a good option, but I never
  got from the paper exactly how to do that.  The resolution bin where
  CC1/2=0.5 seems natural, but in my (limited) experience that gives
  almost the same answer as I/sigI=2 (see also K&K fig 3).
 
 
 
  On Dec 7, 2012, at 6:21 AM, Boaz Shaanan bshaa...@exchange.bgu.ac.il
  wrote:
 
  Hi,
 
  I'm sure Kay will have something to say about this but I think the
  idea of the K & K paper was to introduce new (more objective) standards
  for deciding on the resolution, so I don't see why another table is
  needed.
 
  Cheers,
 
 
 
 
 Boaz
 
 
  Boaz Shaanan, Ph.D.
  Dept. of Life Sciences
  Ben-Gurion University of the Negev
  Beer-Sheva 84105
  Israel
 
  E-mail: bshaa...@bgu.ac.il
  Phone: 972-8-647-2220  Skype: boaz.shaanan
  Fax:   972-8-647-2992 or 972-8-646-1710
 
 
 
 
 
  
  From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Douglas
  Theobald [dtheob...@brandeis.edu]
  Sent: Friday, December 07, 2012 1:05 AM
  To: CCP4BB@JISCMAIL.AC.UK
  Subject: [ccp4bb] refining against weak data and Table I stats
 
  Hello all,
 
  I've followed with interest the discussions here about how we should be
  refining against weak data, e.g. data with I/sigI < 2 (perhaps using
  all bins that have a significant CC1/2 per Karplus and Diederichs
  2012).  This all makes statistical sense to me, but now I am wondering
  how I should report data and model stats in Table I.
 
  Here's what I've come up with: report two Table I's.  For comparability
  to legacy structure stats, report a classic Table I, where I call the
  resolution whatever bin I/sigI=2.  Use that as my high res bin, with
  high res bin stats reported in parentheses after global stats.   Then
  have another Table (maybe Table I* in supplementary material?) where I
  report stats for the whole dataset, including the weak data I used in
  refinement.  In both tables report CC1/2 and Rmeas.
 
  This way, I don't redefine the (mostly) conventional usage of
  resolution, my Table I can be compared to precedent, I report stats
  for all the data and for the model against all data, and I take
  advantage of the information in the weak data during refinement.
 
  Thoughts?
 
  Douglas
 
 
  ^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`
  Douglas L. Theobald
  Assistant Professor
  Department of Biochemistry
  Brandeis University
  Waltham, MA  02454-9110
 
  dtheob...@brandeis.edu
  http://theobald.brandeis.edu/
 
 
 
 
 
  --
  Alan Cheung
  Gene Center
  Ludwig-Maximilians-University
  Feodor-Lynen-Str. 25
  81377 Munich
  Germany
  Phone:  +49-89-2180-76845
  Fax:  +49-89-2180-76999
  E-mail: che...@lmb.uni-muenchen.de
 

-- 

 ===
 * *
 * Gerard Bricogne g...@globalphasing.com  *
 * *
 * Global Phasing Ltd. *
 * Sheraton House, Castle Park Tel: +44-(0)1223-353033 *
 * Cambridge CB3 0AX, UK   Fax: +44-(0)1223-366889 *
 * *
 ===


[ccp4bb] refining against weak data and Table I stats

2012-12-06 Thread Douglas Theobald
Hello all,

I've followed with interest the discussions here about how we should be 
refining against weak data, e.g. data with I/sigI < 2 (perhaps using all bins 
that have a significant CC1/2 per Karplus and Diederichs 2012).  This all 
makes statistical sense to me, but now I am wondering how I should report data 
and model stats in Table I.  

Here's what I've come up with: report two Table I's.  For comparability to 
legacy structure stats, report a classic Table I, where I call the resolution 
whatever bin I/sigI=2.  Use that as my high res bin, with high res bin stats 
reported in parentheses after global stats.   Then have another Table (maybe 
Table I* in supplementary material?) where I report stats for the whole 
dataset, including the weak data I used in refinement.  In both tables report 
CC1/2 and Rmeas.  

This way, I don't redefine the (mostly) conventional usage of resolution, my 
Table I can be compared to precedent, I report stats for all the data and for 
the model against all data, and I take advantage of the information in the weak 
data during refinement. 
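Since the proposal reports Rmeas alongside CC1/2 in both tables, the two merging R factors can be sketched in a few lines for reference; Rmeas is the multiplicity-corrected form of Rmerge (Diederichs & Karplus, 1997). The intensity groups in the example are invented:

```python
import math

def merging_r_factors(groups):
    """Compute (Rmerge, Rmeas) from symmetry-equivalent intensity groups.

    groups: one list of observed intensities per unique hkl.
    Rmerge = sum_h sum_i |I_hi - <I_h>| / sum_h sum_i I_hi
    Rmeas  = sum_h sqrt(n_h/(n_h-1)) sum_i |I_hi - <I_h>| / same denominator
    """
    num_merge = num_meas = denom = 0.0
    for obs in groups:
        n = len(obs)
        if n < 2:
            continue  # singletons contribute nothing to merging statistics
        mean = sum(obs) / n
        dev = sum(abs(i - mean) for i in obs)
        num_merge += dev
        num_meas += math.sqrt(n / (n - 1.0)) * dev
        denom += sum(obs)
    return num_merge / denom, num_meas / denom

# Toy example: one discrepant pair and one perfect pair of observations.
rmerge, rmeas = merging_r_factors([[100.0, 110.0], [50.0, 50.0]])
```

By construction Rmeas is always at least as large as Rmerge, which is why it is the fairer statistic to quote for low-multiplicity data.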

Thoughts?

Douglas


^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`
Douglas L. Theobald
Assistant Professor
Department of Biochemistry
Brandeis University
Waltham, MA  02454-9110

dtheob...@brandeis.edu
http://theobald.brandeis.edu/


 

Re: [ccp4bb] refining against weak data and Table I stats

2012-12-06 Thread Edward A. Berry

Another consideration here is your PDB deposition. If the reason for using
weak data is to get a better structure, presumably you are going to deposit
the structure using all the data. Then the statistics in the PDB file must
reflect the high resolution refinement.

There are I think three places in the PDB file where the resolution is stated,
but i believe they are all required to be the same and to be equal to the
highest resolution data used (even if there were only two reflections in that 
shell).
Rmerge or Rsymm must be reported, and until recently I think they were not 
allowed
to exceed 1.00 (100% error?).

What are your reviewers going to think if the title of your paper is
"structure of protein A at 2.1 A resolution" but they check the PDB file
and the resolution was really 1.9 A?  And Rsymm in the PDB is 0.99 but
in your table 1* says 1.3?

Douglas Theobald wrote:

Hello all,

I've followed with interest the discussions here about how we should be refining against weak 
data, e.g. data with I/sigI < 2 (perhaps using all bins that have a 
significant CC1/2 per Karplus and Diederichs 2012).  This all makes statistical 
sense to me, but now I am wondering how I should report data and model stats in Table I.

Here's what I've come up with: report two Table I's.  For comparability to legacy structure stats, 
report a classic Table I, where I call the resolution whatever bin I/sigI=2.  Use that 
as my high res bin, with high res bin stats reported in parentheses after global stats. 
  Then have another Table (maybe Table I* in supplementary material?) where I report stats for the 
whole dataset, including the weak data I used in refinement.  In both tables report CC1/2 and Rmeas.

This way, I don't redefine the (mostly) conventional usage of resolution, my 
Table I can be compared to precedent, I report stats for all the data and for the model 
against all data, and I take advantage of the information in the weak data during 
refinement.

Thoughts?

Douglas


^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`
Douglas L. Theobald
Assistant Professor
Department of Biochemistry
Brandeis University
Waltham, MA  02454-9110

dtheob...@brandeis.edu
http://theobald.brandeis.edu/
