Re: [ccp4bb] refining against weak data and Table I stats
-resolution Rmerge that is only a few percent worse than the average over the PDB at that time is probably considered okay, and the average just keeps increasing over time. Nevertheless, Rmerge is a useful statistic for evaluating the quality of a diffractometer, provided it is used in the way it was originally defined by Uli Arndt: over the entire dataset, for spots with I/sd > 3. At large multiplicity, the Rmerge calculated this way asymptotically approaches the average % error for measuring a single spot. If it is more than 5% or so, then there might be something wrong with the camera (or the space group choice, etc.). This is only true for Rmerge over ALL the data, not when it is relegated to a given resolution bin. Perhaps it is time we did have a discussion about what we mean by the resolution of a structure, so that some kind of historically relevant and future-proof definition for it can be devised? Otherwise, we will probably one day see 1.0 A used to describe what today we would call a 3.0 A structure. The whole point here is to be able to compare results obtained by different people at different periods in history, so I think it's important to try and keep our definition of resolution stable, even if we do use spots that are beyond it. So, what I would advise is to refine your model with data out to the resolution limit defined by CC*, but declare the resolution of the structure to be where the merged I/sigma(I) falls to 2. You might even want to calculate your Rmerge, Rcryst, Rfree and all the other R values to this resolution as well, since including a lot of zeroes does nothing but artificially drive up estimates of relative error. Perhaps we should even take a lesson from our small molecule friends and start reporting R1, where the R factor is computed only for hkls where I/sigma(I) is above 3? -James Holton MAD Scientist On 12/8/2012 4:04 AM, Miller, Mitchell D. 
wrote: I too like the idea of reporting the Table 1 stats vs. resolution rather than just the overall values and highest resolution shell. I also wanted to point out an earlier thread from April about the limitations of the PDB's defining the resolution as being that of the highest resolution reflection (even if the data is incomplete or weak). https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204L=ccp4bbD=01=ccp4bb9=AI=-3J=ond=No+Match%3BMatch%3BMatchesz=4P=376289 https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204L=ccp4bbD=01=ccp4bb9=AI=-3J=ond=No+Match%3BMatch%3BMatchesz=4P=377673 What we have done in the past for cases of low completeness in the outer shell is to define the nominal resolution à la Bart Hazes' method (same number of reflections as a complete data set), use this in the PDB title, and describe it in the REMARK 3 other refinement remarks. There is also the possibility of adding a comment to PDB REMARK 2, which we have not used. http://www.wwpdb.org/documentation/format33/remarks1.html#REMARK%202 This should help convince reviewers that you are not trying to misrepresent the resolution of the structure. Regards, Mitch -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Edward A. Berry Sent: Friday, December 07, 2012 8:43 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] refining against weak data and Table I stats Yes, well, actually I'm only a middle author on that paper for a good reason, but I did encourage Rebecca and Stephan to use all the data. But on a later, much more modest submission, where the outer shell was not only weak but very incomplete (edges of the detector), the reviewers found it difficult to evaluate the quality of the data (we had also excluded a zone with bad ice-ring problems). So we provided a second table, cutting off above the ice ring in the good strong data, which convinced them that at least it is a decent 2 A structure. In the PDB it is a 1.6 A structure. 
but there was a lot of good data between the ice ring and 1.6 A. Bart Hazes (I think) suggested a statistic called effective resolution, which is the resolution to which a complete dataset would have the number of reflections in your dataset, and we reported this, which came out to something like 1.75. I do like the idea of reporting in multiple shells, not just overall and highest shell, and the PDB accommodates this; it even has a GUI to enter it in the ADIT 2.0 software. It could also be used to report two different overall ranges, such as completeness from 25 to 1.6 A, which would be shocking in my case, and 25 to 2.0 A, which would be more reassuring. eab Douglas Theobald wrote: Hi Ed, Thanks for the comments. So what do you recommend? Refine against weak data, and report all stats in a single Table I? Looking at your latest V-ATPase structure paper, it appears you favor something like that, since you report a high res shell with I/sigI=1.34 and Rsym=1.65.
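The "effective resolution" Ed Berry describes follows from the fact that the number of unique reflections out to resolution d grows as (1/d)^3. A minimal sketch (the reflection counts below are invented for illustration, not taken from the structure discussed):

```python
def effective_resolution(d_min, n_obs, n_expected):
    """Hazes-style effective resolution: the resolution at which a
    complete data set would contain n_obs unique reflections.

    d_min      -- nominal high-resolution limit of the data (Angstrom)
    n_expected -- unique reflections a complete set would have to d_min
    n_obs      -- unique reflections actually measured

    Relies on the reflection count scaling as (1/d)^3.
    """
    return d_min * (n_expected / n_obs) ** (1.0 / 3.0)

# Hypothetical numbers: nominally 1.6 A data, but only 30000 of the
# 39000 reflections a complete set would contain were measured.
print(round(effective_resolution(1.6, 30000, 39000), 2))  # -> 1.75
```

With these made-up counts the statistic comes out near the 1.75 A Berry quotes, which is the kind of intermediate value the method is designed to report.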
[ccp4bb] FW: [ccp4bb] refining against weak data and Table I stats
From: Boaz Shaanan Sent: Thursday, December 13, 2012 11:42 AM To: Frank von Delft Subject: RE: [ccp4bb] refining against weak data and Table I stats Hi Frank, Also, tNCS screws up the CC*/CC1/2 measure. I played with it on a structure that I'm working on, which has a huge off-origin Patterson peak at 1/3c (60% of the origin), and I get very high CC*/CC1/2 values even when I dig deep into the noise. Nevertheless, I used this measure to push the resolution from 2.2 to 2.0 A using the protocol suggested by Karplus and Diederichs, although my results are somewhat different from what they report. I think it's another tool that will show case-by-case dependence. Cheers, Boaz Boaz Shaanan, Ph.D. Dept. of Life Sciences Ben-Gurion University of the Negev Beer-Sheva 84105 Israel E-mail: bshaa...@bgu.ac.il Phone: 972-8-647-2220 Skype: boaz.shaanan Fax: 972-8-647-2992 or 972-8-646-1710 From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Frank von Delft [frank.vonde...@sgc.ox.ac.uk] Sent: Thursday, December 13, 2012 9:27 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] refining against weak data and Table I stats I like the R1 idea... report CC* and R1. Of course, anisotropy screws up everything (what do our small molecule friends know about that - ha!). So earlier in the thread, Ed Berry brought up the effective resolution: Bart Hazes (I think) suggested a statistic called effective resolution, which is the resolution to which a complete dataset would have the number of reflections in your dataset. We just have to settle on how to determine the number of reflections - maybe those with I/s > 3? phx On 13/12/2012 06:52, James Holton wrote: I think CC* (derived from CC1/2) is an important step forward in how to decide where to cut off the data you give to your refinement program, but I don't think it is a good idea to re-define what we call the resolution of a structure. These do NOT have to be the same thing! 
Remember, what we crystallographers call resolution is actually about 3x the resolution a normal person would use. That is, for most types of imaging, whether 2D (pictures of Mars) or 3D (such as electron density), the resolution is the minimum feature size you can reliably detect in the image. This definition of resolution makes intuitive sense, especially to non-crystallographers. It is also considerably less pessimistic than our current definition, since the minimum observable feature size in an electron density map is about 1/3 of the d-spacing of the highest-angle spots. This is basically because the d-spacing is the period of a sine wave in space, but the minimum feature size is related to the full width at half maximum of this same wave. So, all you have to do is change your definition of resolution and a 3.0 A structure becomes a 1.0 A structure! However, I think proposing this new way to define resolution in crystallography will be met with some resistance. Why? Because changing the meaning of resolution so drastically after ~100 years would be devastating to its usefulness in structure evaluation. I, for one, do not want to have to check the deposition date and see whether the structure was solved before or after the end of the world (Dec 2012) before I can figure out whether I need to divide or multiply by 3 to get the real resolution of the structure. I don't think I'm alone in this. Now, calling what used to be a 1.6 A structure a 1.42 A structure (one way to interpret Karplus & Diederichs 2012) is not quite as drastic a change as the one I flippantly propose above, but it is still a change, and there is a real danger of definition creep here. Most people these days seem to define the resolution limit of their data at the point where the merged I/sigma(I) drops below 2. However, using CC* = 0.5 would place the new resolution at the point where merged I/sigma(I) drops below 0.5. 
That's definitely going beyond what anyone would have called the resolution of the structure last year. So, which one is it? Is it a 1.6 A structure (refined using data out to 1.42 A), or is it actually a 1.42 A structure? Unfortunately, if you talk to a number of experienced crystallographers, they will each have a slightly different set of rules for defining the resolution limit that they learned from their thesis advisor, who, in turn, learned it from theirs, etc. Nearly all of these rule sets include some reference to Rmerge, but the acceptable Rmerge seems to vary from 30% to as much as 150%, depending on whom you talk to. However, despite this prevalence of Rmerge in our perception of resolution there does not seem to be a single publication anywhere in the literature that recommends the use of Rmerge to define the resolution limit. Several papers have been cited to that effect, but then if you go and read them they actually made no such claim. Mathematically, it is fairly easy to show that Rmerge is wildly unstable as the average
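Holton's earlier point about full-dataset Rmerge - that at large multiplicity it levels off at the average relative error of a single measurement, not at the 1-sigma error - can be checked with a small Monte Carlo (a sketch; the 5% Gaussian error, spot count, and multiplicities are made up for illustration; for a 5% 1-sigma error the mean absolute error is 0.05*sqrt(2/pi), about 4%):

```python
import random
import statistics

def simulated_rmerge(rel_error, multiplicity, n_spots=20000, seed=1):
    """Rmerge = sum |I_i - <I>| / sum I_i over all observations,
    for n_spots spots each measured `multiplicity` times with
    independent Gaussian relative error `rel_error` (1 sigma)."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n_spots):
        true_i = 100.0  # arbitrary true intensity; Rmerge is scale-free
        obs = [true_i * (1.0 + rng.gauss(0.0, rel_error))
               for _ in range(multiplicity)]
        mean_i = statistics.fmean(obs)
        num += sum(abs(i - mean_i) for i in obs)
        den += sum(obs)
    return num / den

# Rmerge climbs with multiplicity toward the mean absolute relative
# error of one measurement (~4% here), never toward the 5% sigma.
for m in (2, 4, 8, 16):
    print(m, round(100.0 * simulated_rmerge(0.05, m), 2))
```

The same simulation run per resolution bin (where the true intensities approach zero) would show the instability Holton describes, which is why the full-dataset number is the diagnostically useful one.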
Re: [ccp4bb] refining against weak data and Table I stats
On Dec 13, 2012, at 1:52 AM, James Holton jmhol...@lbl.gov wrote: [snip] So, what I would advise is to refine your model with data out to the resolution limit defined by CC*, but declare the resolution of the structure to be where the merged I/sigma(I) falls to 2. You might even want to calculate your Rmerge, Rcryst, Rfree and all the other R values to this resolution as well, since including a lot of zeroes does nothing but artificially drive up estimates of relative error. So James --- it appears that you basically agree with my proposal? I.e., (1) include all of the data in refinement (at least up to where CC1/2 or CC* is still significant); (2) keep the definition of resolution at what is more or less the de facto standard (the resolution bin where I/sigI=2); (3) report a Table I where everything is calculated up to this resolution (where I/sigI=2); and (4) maybe include in the Supplementary Material an additional table that reports statistics for all the data (I'm leaning towards a table with stats for each resolution bin). As you argued, and as I argued, this seems to be a good compromise, one that modifies current practice to include weak data but nevertheless does not change the definition of resolution or the Table I stats, so that we can still compare with legacy structures/stats. [snip]
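For reference, the CC* statistic invoked above as a refinement cutoff is obtained from CC1/2 by a closed-form expression given by Karplus & Diederichs (2012); a minimal sketch:

```python
import math

def cc_star(cc_half):
    """CC* from CC1/2 (Karplus & Diederichs, Science 2012): an estimate
    of the correlation of the merged data with the (unmeasurable) true
    signal, computed as sqrt(2*CC1/2 / (1 + CC1/2))."""
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))

# Even modest half-dataset correlations imply a respectable correlation
# with the true signal, which is why CC* supports using weak shells:
for cc in (0.9, 0.5, 0.2, 0.1):
    print(cc, round(cc_star(cc), 3))
```

Because CC* is on the same scale as a model-to-data correlation, it lets the data quality and the model quality in a given shell be compared directly, which is the point of the Karplus & Diederichs proposal.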
Re: [ccp4bb] refining against weak data and Table I stats
[snip] On Dec 6, 2012, at 7:24 PM, Edward A. 
Berry ber...@upstate.edu wrote: Another consideration here is your PDB deposition. If the reason for using weak data is to get a better structure, presumably you are going to deposit the structure using all the data. Then the statistics in the PDB file must reflect the high resolution refinement. There are, I think, three places in the PDB file where the resolution is stated, but I believe they are all required to be the same and to be equal to the highest resolution data used (even if there were only two reflections in that shell). Rmerge or Rsymm must be reported, and until recently I think they were not allowed to exceed 1.00 (100% error?). What are your reviewers going to think if the title of your paper is "structure of protein A at 2.1 A resolution" but they check the PDB file and the resolution was really 1.9 A? And Rsymm in the PDB is 0.99 but your Table I* says 1.3? Douglas Theobald wrote: Hello all, I've followed with interest the discussions here about how we should be refining against weak data, e.g. data with I/sigI < 2 (perhaps using all bins that have a significant CC1/2, per Karplus and Diederichs 2012). This all makes statistical sense to me, but now I am wondering how I should report data and model stats in Table I. Here's what I've come up with: report two Table I's. For comparability to legacy structure stats, report a classic Table I, where I call the resolution whatever bin has I/sigI=2. Use that as my high res bin, with high res bin stats reported in parentheses after global stats. Then have another table (maybe Table I* in supplementary material?) where I report stats for the whole dataset, including the weak data I used in refinement. In both tables report CC1/2 and Rmeas. This way, I don't redefine the (mostly) conventional usage of resolution, my Table I can be compared to precedent, I report stats for all the data and for the model against all data, and I take advantage of the information in the weak data during refinement. Thoughts? 
Douglas ^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^` Douglas L. Theobald Assistant Professor Department of Biochemistry Brandeis University Waltham, MA 02454-9110 dtheob...@brandeis.edu http://theobald.brandeis.edu/
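The R1 idea raised upthread - an R factor computed only over reflections whose I/sigma(I) clears a threshold, in the spirit of the small-molecule convention - is easy to sketch. The helper and the toy arrays below are hypothetical, and the cutoff of 3 follows Holton's suggestion rather than any standard:

```python
def r1(f_obs, f_calc, i_over_sig, cutoff=3.0):
    """R1 = sum ||Fobs| - |Fcalc|| / sum |Fobs|, restricted to
    reflections whose I/sigma(I) exceeds `cutoff`.

    Sketch only: small-molecule programs commonly use an F > 4 sigma(F)
    criterion instead; the cutoff of 3 on I/sigma(I) is Holton's idea.
    """
    pairs = [(fo, fc) for fo, fc, ios in zip(f_obs, f_calc, i_over_sig)
             if ios > cutoff]
    if not pairs:
        raise ValueError("no reflections above the I/sigma cutoff")
    return (sum(abs(fo - fc) for fo, fc in pairs)
            / sum(fo for fo, _ in pairs))

# Toy numbers: the weak third reflection (I/sig = 1.2) is excluded, so
# its large relative discrepancy does not inflate the statistic.
print(round(r1([100.0, 50.0, 8.0], [95.0, 54.0, 2.0],
               [20.0, 9.0, 1.2]), 3))  # -> 0.06
```

Reporting such an R1 alongside the all-data statistics would do exactly what Theobald's two-table scheme aims at: a legacy-comparable number plus full disclosure of the weak data.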
Re: [ccp4bb] refining against weak data and Table I stats
On Friday, 7 December 2012, at 18:48 CET, Gerard Bricogne g...@globalphasing.com wrote: May I add something to Gerard's comment. In the same vein, provided one does consider two sets of terms with zero mean (which corresponds to the proviso mentioned by Gerard), one can define an R-factor R as the sine of the same angle whose cosine is the correlation coefficient C, and one then has R^2 + C^2 = 1. Thus, in some way, in practice, an R-factor is a sensitive criterion for highly correlated data, whereas a correlation coefficient is better suited for poorly correlated data. I have likely just rephrased ideas that were written down long ago in well-known papers. Did I? Philippe Dumas Dear Zbyszek, That is a useful point. Another way of making it is to notice that the correlation coefficient between two random variables is the cosine of the angle between two vectors of paired values for these, with the proviso that the sums of the component values for each vector add up to zero. The fact that an angle is involved means that the CC is independent of scale, while the fact that it is the cosine of that angle makes it rather insensitive to small-ish angles: a cosine remains close to 1.0 for quite a range of angles. This is presumably the nature of correlation coefficients you were referring to. With best wishes, Gerard. -- On Fri, Dec 07, 2012 at 11:14:50AM -0600, Zbyszek Otwinowski wrote: The difference between one and the correlation coefficient is a square function of the differences between the data points. So a rather large 6% relative error with 8-fold data multiplicity (redundancy) can lead to CC1/2 values of about 99.9%. It is just the nature of correlation coefficients. Zbyszek Otwinowski Related to this, I've always wondered what CC1/2 values mean for low resolution. Not being mathematically inclined, I'm sure this is a naive question, but I'll ask anyway - what does CC1/2=100 (or 99.9) mean? Does it mean the data is as good as it gets? 
Alan On 07/12/2012 17:15, Douglas Theobald wrote: Hi Boaz, I read the KK paper as primarily a justification for including extremely weak data in refinement (and of course introducing a new single statistic that can judge data *and* model quality comparably). Using CC1/2 to gauge resolution seems like a good option, but I never got from the paper exactly how to do that. The resolution bin where CC1/2=0.5 seems natural, but in my (limited) experience that gives almost the same answer as I/sigI=2 (see also KK fig 3). On Dec 7, 2012, at 6:21 AM, Boaz Shaanan bshaa...@exchange.bgu.ac.il wrote: Hi, I'm sure Kay will have something to say about this but I think the idea of the K K paper was to introduce new (more objective) standards for deciding on the resolution, so I don't see why another table is needed. Cheers, Boaz Boaz Shaanan, Ph.D. Dept. of Life Sciences Ben-Gurion University of the Negev Beer-Sheva 84105 Israel E-mail: bshaa...@bgu.ac.il Phone: 972-8-647-2220 Skype: boaz.shaanan Fax: 972-8-647-2992 or 972-8-646-1710 From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Douglas Theobald [dtheob...@brandeis.edu] Sent: Friday, December 07, 2012 1:05 AM To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] refining against weak data and Table I stats Hello all, I've followed with interest the discussions here about how we should be refining against weak data, e.g. data with I/sigI 2 (perhaps using all bins that have a significant CC1/2 per Karplus and Diederichs 2012). This all makes statistical sense to me, but now I am wondering how I should report data and model stats in Table I. Here's what I've come up with: report two Table I's. For comparability to legacy structure stats, report a classic Table I, where I call the resolution whatever bin I/sigI=2. Use that as my high res bin, with high res bin stats reported in parentheses after global stats. Then have another Table (maybe Table I* in supplementary material?) 
where I report stats for the whole dataset, including the weak data I used in refinement. In both tables report CC1/2 and Rmeas. This way, I don't redefine the (mostly) conventional usage of resolution, my Table I can be compared to precedent, I report stats for all the data and for the model against all data, and I take advantage of the information in the weak data during refinement. Thoughts? Douglas ^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^` Douglas L. Theobald Assistant Professor Department of Biochemistry Brandeis University Waltham, MA 02454-9110 dtheob...@brandeis.edu http://theobald.brandeis.edu
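Gerard's geometric picture from earlier in the thread (CC as the cosine of the angle between two mean-centered vectors, with Philippe's R^2 + C^2 = 1 corollary) is easy to verify numerically. The following is an illustrative sketch, not from the thread itself; the simulated data and noise level are my own assumptions:

```python
import numpy as np

# For mean-centered vectors, Pearson's correlation coefficient equals the
# cosine of the angle between them; sqrt(1 - CC^2), the sine of that angle,
# then acts as a normalized residual (Philippe Dumas's R^2 + C^2 = 1).
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x + 0.3 * rng.normal(size=1000)   # correlated data with some noise

xc, yc = x - x.mean(), y - y.mean()   # enforce the zero-mean proviso
cos_angle = np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))
cc = np.corrcoef(x, y)[0, 1]
assert abs(cc - cos_angle) < 1e-9     # same number, two derivations

# A cosine stays near 1 over a wide range of angles, which is why CC is an
# insensitive criterion for highly correlated data: here the angle is about
# 17 degrees, yet CC is still roughly 0.96.
print(cc, np.degrees(np.arccos(cos_angle)))
```

This also makes Gerard's scale-independence point concrete: multiplying `y` by any positive constant changes neither the angle nor the CC.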
Re: [ccp4bb] refining against weak data and Table I stats
Hi, I'm sure Kay will have something to say about this but I think the idea of the K&K paper was to introduce new (more objective) standards for deciding on the resolution, so I don't see why another table is needed. Cheers, Boaz

From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Douglas Theobald [dtheob...@brandeis.edu] Sent: Friday, December 07, 2012 1:05 AM To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] refining against weak data and Table I stats [quoted text omitted]
Re: [ccp4bb] refining against weak data and Table I stats
Hi Douglas, Using two Table I's is a good way to show the difference between the two cut-offs, but I assume you will only discuss one of the models in your paper. IMO you only need to deposit the high-resolution model, so there should be no problems with resolution conflicts in the PDB file. The annotators will probably help you if there is a problem with Rmerge > 1.00. As for the title of your paper: nobody forces you to put a resolution in it if it causes too much of a stir. Cheers, Robbie

-----Original Message----- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Boaz Shaanan Sent: Friday, December 07, 2012 12:21 To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] refining against weak data and Table I stats [quoted text omitted]
Re: [ccp4bb] refining against weak data and Table I stats
Hi Ed, Thanks for the comments. So what do you recommend? Refine against weak data, and report all stats in a single Table I? Looking at your latest V-ATPase structure paper, it appears you favor something like that, since you report a high-res shell with I/sigI=1.34 and Rsym=1.65.

On Dec 6, 2012, at 7:24 PM, Edward A. Berry ber...@upstate.edu wrote: Another consideration here is your PDB deposition. If the reason for using weak data is to get a better structure, presumably you are going to deposit the structure using all the data. Then the statistics in the PDB file must reflect the high-resolution refinement. There are, I think, three places in the PDB file where the resolution is stated, but I believe they are all required to be the same and to be equal to the highest-resolution data used (even if there were only two reflections in that shell). Rmerge or Rsym must be reported, and until recently I think they were not allowed to exceed 1.00 (100% error?). What are your reviewers going to think if the title of your paper is "structure of protein A at 2.1 A resolution" but they check the PDB file and the resolution was really 1.9 A? And Rsym in the PDB is 0.99 but your Table I* says 1.3?

Douglas Theobald wrote: [quoted text omitted]
Re: [ccp4bb] refining against weak data and Table I stats
Hi Boaz, I read the K&K paper as primarily a justification for including extremely weak data in refinement (and of course introducing a new single statistic that can judge data *and* model quality comparably). Using CC1/2 to gauge resolution seems like a good option, but I never got from the paper exactly how to do that. The resolution bin where CC1/2=0.5 seems natural, but in my (limited) experience that gives almost the same answer as I/sigI=2 (see also K&K fig. 3).

On Dec 7, 2012, at 6:21 AM, Boaz Shaanan bshaa...@exchange.bgu.ac.il wrote: [quoted text omitted]
Re: [ccp4bb] refining against weak data and Table I stats
Related to this, I've always wondered what CC1/2 values mean for low resolution. Not being mathematically inclined, I'm sure this is a naive question, but I'll ask anyway - what does CC1/2=100 (or 99.9) mean? Does it mean the data is as good as it gets? Alan

On 07/12/2012 17:15, Douglas Theobald wrote: [quoted text omitted]

-- Alan Cheung Gene Center Ludwig-Maximilians-University Feodor-Lynen-Str. 25 81377 Munich Germany Phone: +49-89-2180-76845 Fax: +49-89-2180-76999 E-mail: che...@lmb.uni-muenchen.de
Re: [ccp4bb] refining against weak data and Table I stats
It is internally consistent, though not necessarily correct.

On 7 Dec 2012, at 16:23, Alan Cheung wrote: Related to this, I've always wondered what CC1/2 values mean for low resolution. Not being mathematically inclined, I'm sure this is a naive question, but I'll ask anyway - what does CC1/2=100 (or 99.9) mean? Does it mean the data is as good as it gets? [rest of quoted text omitted]
Re: [ccp4bb] refining against weak data and Table I stats
Yes, well, actually I'm only a middle author on that paper for a good reason, but I did encourage Rebecca and Stephan to use all the data. But on a later, much more modest submission, where the outer shell was not only weak but very incomplete (edges of the detector), the reviewers found it difficult to evaluate the quality of the data (we had also excluded a zone with bad ice-ring problems). So we provided a second table, cutting off above the ice ring in the good strong data, which convinced them that it is at least a decent 2 A structure. In the PDB it is a 1.6 A structure, but there was a lot of good data between the ice ring and 1.6 A. Bart Hazes (I think) suggested a statistic called "effective resolution", which is the resolution at which a complete dataset would have the same number of reflections as your dataset, and we reported this, which came out to something like 1.75. I do like the idea of reporting in multiple shells, not just overall and highest shell, and the PDB accommodates this - it even has a GUI to enter it in the ADIT 2.0 software. It could also be used to report two different overall ranges, such as completeness from 25 to 1.6 A, which would be shocking in my case, and from 25 to 2.0 A, which would be more reassuring. eab

Douglas Theobald wrote: Hi Ed, Thanks for the comments. So what do you recommend? [rest of quoted text omitted]
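The "effective resolution" idea Ed attributes to Bart Hazes can be reduced to one line of arithmetic, since the number of unique reflections out to resolution d scales roughly as 1/d^3. The function below is my own sketch of that arithmetic, with illustrative reflection counts; it is not necessarily the exact recipe from the original suggestion:

```python
# Sketch (assumed formula): a dataset with n_obs reflections to d_min has
# the reflection count of a complete dataset extending only to
#     d_eff = d_min * (n_complete / n_obs) ** (1/3)
# because the reflection count to resolution d grows as 1/d^3.

def effective_resolution(d_min: float, n_obs: int, n_complete: int) -> float:
    """Resolution at which a complete dataset would contain n_obs reflections."""
    return d_min * (n_complete / n_obs) ** (1.0 / 3.0)

# Example with made-up counts: a nominally 1.6 A dataset that is only 70%
# complete overall behaves, by reflection count, like a complete ~1.8 A set.
print(round(effective_resolution(1.6, n_obs=70000, n_complete=100000), 2))
```

This matches the direction of Ed's example, where an incomplete 1.6 A dataset came out to an effective resolution of about 1.75 A.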
Re: [ccp4bb] refining against weak data and Table I stats
A good way to think about it is that if CC1/2=100%, that means you can split the data in half, and use one half to perfectly predict the corresponding values of the other half. So yes, perfect internal consistency.

On Dec 7, 2012, at 11:41 AM, Phil Evans p...@mrc-lmb.cam.ac.uk wrote: It is internally consistent, though not necessarily correct. [rest of quoted text omitted]
Re: [ccp4bb] refining against weak data and Table I stats
The difference between one and the correlation coefficient is a square function of the differences between the data points. So a rather large 6% relative error with 8-fold data multiplicity (redundancy) can lead to CC1/2 values of about 99.9%. It is just the nature of correlation coefficients. Zbyszek Otwinowski

Related to this, I've always wondered what CC1/2 values mean for low resolution. Not being mathematically inclined, I'm sure this is a naive question, but I'll ask anyway - what does CC1/2=100 (or 99.9) mean? Does it mean the data is as good as it gets? Alan [rest of quoted text omitted]
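Zbyszek's point, that a 6% per-measurement error with 8-fold multiplicity still yields CC1/2 near 99.9%, can be checked with a quick simulation. This is a sketch under my own assumptions (exponentially distributed true intensities and Gaussian relative error), not code from the thread:

```python
import numpy as np

# Simulate reflections measured 8 times each with 6% relative error, then
# compute CC1/2 by Douglas's split-half definition: average each half of
# the measurements and correlate the two half-dataset averages.
rng = np.random.default_rng(1)
n_refl, mult, rel_err = 10000, 8, 0.06

true_i = rng.exponential(scale=1.0, size=n_refl)     # Wilson-like intensity spread
meas = true_i[:, None] * (1 + rel_err * rng.normal(size=(n_refl, mult)))

half1 = meas[:, :mult // 2].mean(axis=1)             # 4 measurements per half
half2 = meas[:, mult // 2:].mean(axis=1)
cc_half = np.corrcoef(half1, half2)[0, 1]
print(f"CC1/2 = {cc_half:.4f}")                      # comes out around 0.998-0.999
```

The reason is the one Zbyszek gives: averaging 4 measurements cuts the 6% error to about 3% of each intensity, which is tiny compared with the spread of intensities across reflections, so the correlation between halves is dominated by that spread.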
Re: [ccp4bb] refining against weak data and Table I stats
I too like the idea of reporting the Table 1 stats vs. resolution rather than just the overall values and highest resolution shell. I also wanted to point out an earlier thread from April about the limitations of the PDB's defining the resolution as being that of the highest-resolution reflection (even if the data are incomplete or weak): https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204L=ccp4bbD=01=ccp4bb9=AI=-3J=ond=No+Match%3BMatch%3BMatchesz=4P=376289 https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204L=ccp4bbD=01=ccp4bb9=AI=-3J=ond=No+Match%3BMatch%3BMatchesz=4P=377673 What we have done in the past for cases of low completeness in the outer shell is to define the nominal resolution a la Bart Hazes' method (same number of reflections as a complete data set), use this in the PDB title, and describe it in the REMARK 3 "other refinement remarks". There is also the possibility of adding a comment to PDB REMARK 2, which we have not used: http://www.wwpdb.org/documentation/format33/remarks1.html#REMARK%202 This should help convince reviewers that you are not trying to misrepresent the resolution of the structure. Regards, Mitch

-----Original Message----- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Edward A. Berry Sent: Friday, December 07, 2012 8:43 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] refining against weak data and Table I stats [quoted text omitted]
Re: [ccp4bb] refining against weak data and Table I stats
I was confused because it seemed like CC1/2 wasn't very informative at lower resolution, since (in my datasets) the values were all 99.9-100. So if I've understood this correctly (and I'm honestly not sure that I have), could CC1/2 be useful for showing the quality of low-resolution data, given more decimal places?

On 07/12/2012 18:14, Zbyszek Otwinowski wrote: The difference between one and the correlation coefficient is a square function of the differences between the data points. So a rather large 6% relative error with 8-fold data multiplicity (redundancy) can lead to CC1/2 values of about 99.9%. It is just the nature of correlation coefficients. Zbyszek Otwinowski

Related to this, I've always wondered what CC1/2 values mean for low resolution. Not being mathematically inclined, I'm sure this is a naive question, but I'll ask anyway - what does CC1/2=100 (or 99.9) mean? Does it mean the data is as good as it gets? Alan

On 07/12/2012 17:15, Douglas Theobald wrote: Hi Boaz, I read the KK paper as primarily a justification for including extremely weak data in refinement (and of course introducing a new single statistic that can judge data *and* model quality comparably). Using CC1/2 to gauge resolution seems like a good option, but I never got from the paper exactly how to do that. The resolution bin where CC1/2=0.5 seems natural, but in my (limited) experience that gives almost the same answer as I/sigI=2 (see also KK fig 3).

On Dec 7, 2012, at 6:21 AM, Boaz Shaanan bshaa...@exchange.bgu.ac.il wrote: Hi, I'm sure Kay will have something to say about this, but I think the idea of the K K paper was to introduce new (more objective) standards for deciding on the resolution, so I don't see why another table is needed. Cheers, Boaz Boaz Shaanan, Ph.D. Dept.
of Life Sciences Ben-Gurion University of the Negev Beer-Sheva 84105 Israel E-mail: bshaa...@bgu.ac.il Phone: 972-8-647-2220 Skype: boaz.shaanan Fax: 972-8-647-2992 or 972-8-646-1710

-- Alan Cheung Gene Center Ludwig-Maximilians-University Feodor-Lynen-Str.
25 81377 Munich Germany Phone: +49-89-2180-76845 Fax: +49-89-2180-76999 E-mail: che...@lmb.uni-muenchen.de
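Zbyszek's numbers are easy to check with a toy simulation: at 6% per-observation error and 8-fold multiplicity, each half-data-set mean carries only about 3% noise, so the correlation between the two halves lands near 0.999. This is an illustrative sketch only, not any data-processing program's actual CC1/2 implementation; the lognormal intensity distribution and all parameters are assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
n_refl = 20000

# "True" intensities with a wide spread (arbitrary illustrative choice).
I_true = rng.lognormal(mean=0.0, sigma=1.0, size=n_refl)

# 8 observations per reflection, each with 6% relative Gaussian error.
obs = I_true[:, None] * (1.0 + 0.06 * rng.standard_normal((n_refl, 8)))

# CC1/2: correlate the means of two half-data sets (4 observations each).
half1 = obs[:, :4].mean(axis=1)
half2 = obs[:, 4:].mean(axis=1)
cc_half = np.corrcoef(half1, half2)[0, 1]

print(f"CC1/2 = {cc_half:.4f}")  # very close to 1 despite 6% errors
```

Because the averaging suppresses the noise while the true intensities span orders of magnitude, the signal variance dwarfs the residual noise variance, which is exactly why CC1/2 saturates at low resolution.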
Re: [ccp4bb] refining against weak data and Table I stats
Dear Zbyszek, That is a useful point. Another way of making it is to note that the correlation coefficient between two random variables is the cosine of the angle between two vectors of paired values, with the proviso that each vector has first been centered so that its components sum to zero. The fact that an angle is involved means that the CC is independent of scale, while the fact that it is the cosine of that angle makes it rather insensitive to smallish angles: a cosine remains close to 1.0 over quite a range of angles. This is presumably the nature of correlation coefficients you were referring to. With best wishes, Gerard.

-- On Fri, Dec 07, 2012 at 11:14:50AM -0600, Zbyszek Otwinowski wrote: The difference between one and the correlation coefficient is a square function of the differences between the data points. So a rather large 6% relative error with 8-fold data multiplicity (redundancy) can lead to CC1/2 values of about 99.9%. It is just the nature of correlation coefficients. Zbyszek Otwinowski
-- Gerard Bricogne g...@globalphasing.com Global Phasing Ltd. Sheraton House, Castle Park, Cambridge CB3 0AX, UK Tel: +44-(0)1223-353033 Fax: +44-(0)1223-366889
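Gerard's geometric picture can be verified numerically: after subtracting each vector's mean, the Pearson correlation coefficient is exactly the cosine of the angle between the two centered vectors, and a cosine stays near 1 over a wide range of small angles (cos 10 degrees is already about 0.985). A minimal sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 0.9 * x + 0.3 * rng.normal(size=50)  # a correlated pair

# Pearson r from the standard formula...
r = np.corrcoef(x, y)[0, 1]

# ...equals the cosine of the angle between the mean-centered vectors.
xc, yc = x - x.mean(), y - y.mean()
cos_theta = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))
assert abs(r - cos_theta) < 1e-12

# Cosines are flat near zero: a 10-degree angle still gives ~0.985.
print(round(np.cos(np.radians(10.0)), 3))  # -> 0.985
```

This is why CC-based statistics compress a large range of data quality into values very near 1, and why extra decimal places matter when quoting them.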
[ccp4bb] refining against weak data and Table I stats
Hello all, I've followed with interest the discussions here about how we should be refining against weak data, e.g. data with I/sigI < 2 (perhaps using all bins that have a significant CC1/2, per Karplus and Diederichs 2012). This all makes statistical sense to me, but now I am wondering how I should report data and model stats in Table I. Here's what I've come up with: report two Table I's. For comparability to legacy structure stats, report a classic Table I, where I call the resolution that of the bin where I/sigI=2; use that as my high-res bin, with high-res-bin stats reported in parentheses after global stats. Then have another table (maybe Table I* in supplementary material?) where I report stats for the whole dataset, including the weak data I used in refinement. In both tables report CC1/2 and Rmeas. This way, I don't redefine the (mostly) conventional usage of resolution, my Table I can be compared to precedent, I report stats for all the data and for the model against all data, and I take advantage of the information in the weak data during refinement. Thoughts? Douglas

^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^` Douglas L. Theobald Assistant Professor Department of Biochemistry Brandeis University Waltham, MA 02454-9110 dtheob...@brandeis.edu http://theobald.brandeis.edu/
Re: [ccp4bb] refining against weak data and Table I stats
Another consideration here is your PDB deposition. If the reason for using weak data is to get a better structure, presumably you are going to deposit the structure using all the data. Then the statistics in the PDB file must reflect the high resolution refinement. There are, I think, three places in the PDB file where the resolution is stated, but I believe they are all required to be the same and to be equal to the highest resolution data used (even if there were only two reflections in that shell). Rmerge or Rsym must be reported, and until recently I think they were not allowed to exceed 1.00 (100% error?). What are your reviewers going to think if the title of your paper is "structure of protein A at 2.1 A resolution" but they check the PDB file and the resolution was really 1.9 A? And Rsym in the PDB is 0.99 but your Table 1* says 1.3?