Re: [ccp4bb] question about processing data
I think the answer to your question depends on why the data is incomplete.

James

On Mar 17, 2008, at 3:06 AM, Melody Lin wrote:
[...]

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095
http://www.jamesstroud.com
[ccp4bb] question about processing data
Hi all,

I have always been wondering... for a data set diffracting to, say, 2.15 Angstrom, but with only 74% completeness in the highest resolution shell (2.25-2.15 A), should I merge all the data and call it a 2.15 A dataset, or should I cut the data set at, say, 2.25 A, where the highest resolution shell has better completeness (85%)? What is an acceptable completeness value for the highest resolution shell?

Thank you.

Best,
Melody
Re: [ccp4bb] question about processing data
Hi Melody,

There was a nice discussion of this at this year's CCP4 Study Weekend. In general, one needs to consider several factors. If you were at 3 A, or had low symmetry, you would of course try to get the maximum out of the data; on the other hand, experimental phasing has its own requirements. In general, judge it from:

1. Completeness
2. Redundancy
3. I/sigma
4. Rmerge statistics

Not just one of them. If you push it too far, you will see the effect in the later refinement steps. With 74% completeness, what do the other parameters look like?

HTH,
Partha

On Mon, Mar 17, 2008 at 10:06 AM, Melody Lin [EMAIL PROTECTED] wrote:
[...]

--
MRC National Institute for Medical Research
Division of Molecular Structure
The Ridgeway, NW7 1AA, UK
Email: [EMAIL PROTECTED]
Phone: +44 208 816 2515
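Partha's four criteria can all be computed per resolution shell from the unmerged observations, with Rmerge = sum_hkl sum_i |I_i(hkl) - <I(hkl)>| / sum_hkl sum_i I_i(hkl). The Python sketch below shows one way to do it, assuming the observations have already been mapped to the asymmetric unit and that the number of unique reflections the shell should contain (`n_possible`) is known. The function name, and the choice to leave singly measured reflections out of Rmerge, are illustrative assumptions; this is not how any particular scaling program computes its statistics.

```python
from collections import defaultdict
from math import sqrt


def shell_statistics(observations, n_possible):
    """Completeness, multiplicity, Rmerge and mean I/sigma for one resolution shell.

    observations: iterable of (hkl, intensity, sigma), with hkl already mapped to
        the asymmetric unit so that symmetry equivalents share the same key.
    n_possible: number of unique reflections the shell would contain if complete.
    """
    groups = defaultdict(list)
    for hkl, intensity, sigma in observations:
        groups[hkl].append((intensity, sigma))
    if not groups:
        raise ValueError("no observations in this shell")

    dev_sum = 0.0      # sum over reflections of sum_i |I_i - <I>|
    int_sum = 0.0      # sum over the same reflections of sum_i I_i
    i_over_sigma = []  # <I> / sigma(<I>) for each unique reflection
    n_obs = 0
    for obs in groups.values():
        n = len(obs)
        n_obs += n
        i_mean = sum(i for i, _ in obs) / n
        # error of an unweighted mean of n independent measurements
        sigma_mean = sqrt(sum(s * s for _, s in obs)) / n
        i_over_sigma.append(i_mean / sigma_mean)
        if n > 1:  # a single measurement says nothing about agreement
            dev_sum += sum(abs(i - i_mean) for i, _ in obs)
            int_sum += sum(i for i, _ in obs)

    n_unique = len(groups)
    return {
        "completeness": n_unique / n_possible,
        "multiplicity": n_obs / n_unique,
        "rmerge": dev_sum / int_sum if int_sum else float("nan"),
        "mean_i_over_sigma": sum(i_over_sigma) / n_unique,
    }
```

For example, two measurements of one reflection (10 and 12) plus three measurements of another (5, 5, 5), out of four possible reflections, give completeness 0.5, multiplicity 2.5 and Rmerge 2/37 (about 0.054). The mean I/sigma here is computed on merged intensities; conventions differ between programs, so compare such numbers only within one program.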
Re: [ccp4bb] question about processing data
Well, redundancy for the highest shell is 4.8 and I/sigma is 3; Rmerge is 0.08 overall and 0.336 for the highest shell. I/sigma and Rmerge don't look that nice...

Thanks.

On Mon, Mar 17, 2008 at 11:51 AM, Partha Chakrabarti [EMAIL PROTECTED] wrote:
[...]
Re: [ccp4bb] question about processing data
Looks OK, I guess. For the highest shell, if Rmerge is less than 0.45 and I/sigma is about 2, it is worth a try. As James said, it depends on why the data are incomplete. Is it something like C2? Experts might tell us more.

Best,
Partha

On Mon, Mar 17, 2008 at 11:03 AM, Melody Lin [EMAIL PROTECTED] wrote:
[...]
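Partha's rule of thumb (Rmerge below 0.45 and I/sigma around 2) can be written as a simple screen over the shells of a scaling log. A minimal Python sketch: the `Shell` class and the `usable` and `suggested_cutoff` functions are hypothetical, and the default cut-offs are only the rule of thumb quoted here (others in this thread use an I/sigma of 1.5).

```python
from dataclasses import dataclass


@dataclass
class Shell:
    d_low: float         # low-resolution limit of the shell (A)
    d_high: float        # high-resolution limit of the shell (A)
    i_over_sigma: float  # mean I/sigma in the shell
    rmerge: float        # Rmerge in the shell, as a fraction


def usable(shell, min_i_over_sigma=2.0, max_rmerge=0.45):
    """Rule-of-thumb screen; the default cut-offs are conventions, not standards."""
    return shell.i_over_sigma >= min_i_over_sigma and shell.rmerge < max_rmerge


def suggested_cutoff(shells, **criteria):
    """Scanning shells from low to high resolution, return the high-resolution
    limit of the last shell before the first one that fails the screen
    (None if the first shell already fails)."""
    cutoff = None
    for shell in shells:
        if not usable(shell, **criteria):
            break
        cutoff = shell.d_high
    return cutoff
```

With Melody's outer-shell numbers, `usable(Shell(2.25, 2.15, 3.0, 0.336))` returns True, so by this criterion the 2.25-2.15 A shell passes. Completeness is left out of the screen on purpose: whether the missing reflections matter depends on why they are missing.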
Re: [ccp4bb] question about processing data
Melody Lin wrote:
[...]

Hi Melody,

This reply is not aimed at you directly, as this situation seems to have become systemic in the field. So thanks for bringing it up!

We can have a long, and mostly aimless, discussion on what resolution you should claim for your data set, but DON'T throw away good data to make the statistics look better. At high resolution the statistics are supposed to get worse! What matters is whether the data still contain useful information. The fact that 26% of the data is missing does not normally mean that anything is wrong with the 74% that you did measure. Perhaps you used a square detector and didn't place it close enough to capture the full resolution, or perhaps your diffraction pattern is anisotropic.

The only reason to throw out data is if they are too inaccurate for your purpose. When your data are used for phasing, especially anomalous phasing, there is reason to focus on data quality, but I see far too many native data sets that make poor use of the diffraction potential of the crystal. I thought this was due to people not collecting the data properly, but now it seems that people are simply throwing away good data because they don't like the statistics.

So my advice: if your high resolution shell data has poor completeness, then check why this happened. If you did not collect the data properly, then let it be a lesson for the next data collection trip. If it resulted from some issue with the crystal, then decide whether the measured data are messed up as well. If not, then use all the data you trust, which means there is useful signal (I/SigI of 1.5 or 2.0, depending on who you talk to) and no problems leading to systematic errors or outliers.

Bart

==
Bart Hazes (Assistant Professor)
Dept. of Medical Microbiology & Immunology
University of Alberta
1-15 Medical Sciences Building
Edmonton, Alberta
Canada, T6G 2H7
phone: 1-780-492-0042
fax: 1-780-492-7521
==
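Bart's point about not throwing data away can be put in numbers. The number of reciprocal-lattice points within resolution d grows as 1/d^3, so the fraction of all reflections to 2.15 A that lie in the 2.25-2.15 A shell does not depend on the cell or the symmetry. A short Python sketch; the function names are hypothetical, and the unique-reflection estimate uses the usual sphere-volume approximation and ignores systematic absences other than lattice centring.

```python
from math import pi


def shell_fraction(d_min, d_inner):
    """Fraction of all reflections to d_min that lie between d_inner and d_min.

    The reciprocal-lattice points inside a sphere of radius 1/d number roughly
    (4/3) * pi * V / d**3, so the ratio depends only on the two limits.
    """
    return 1.0 - (d_min / d_inner) ** 3


def approx_unique_reflections(cell_volume, d_min, laue_order, centring=1):
    """Rough number of unique reflections to d_min.

    cell_volume: volume of the (conventional) unit cell in A**3
    laue_order: number of symmetry operations in the Laue group (e.g. 8 for mmm)
    centring: lattice points per cell (1 for P, 2 for C or I, 4 for F)
    """
    n_sphere = 4.0 * pi * cell_volume / (3.0 * d_min ** 3)
    return n_sphere / (laue_order * centring)
```

`shell_fraction(2.15, 2.25)` is about 0.127, so roughly 13% of the reflections to 2.15 A lie in the outer shell. At 74% completeness, cutting at 2.25 A would discard about 9% (0.127 x 0.74) of the reflections possible to 2.15 A, and they are precisely the highest-resolution ones.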
Re: [ccp4bb] question about processing data
On Mon, 2008-03-17 at 10:51 +, Partha Chakrabarti wrote:
Not just one of them. If you are pushing it too far, you will see the effect in later refinement step..

And the effect in the later refinement step will be a slight increase in R-factor? IMHO, this does not justify throwing away data (which ultimately reduces the quality of your model).

--
Edwin Pozharski, PhD, Assistant Professor
University of Maryland, Baltimore
--
When the Way is forgotten duty and justice appear;
Then knowledge and wisdom are born along with hypocrisy.
When harmonious relationships dissolve then respect and devotion arise;
When a nation falls to chaos then loyalty and patriotism are born.
-- / Lao Tse /
Re: [ccp4bb] question about processing data
I would use all the data myself and report that the model was built from a dataset with 74% completeness in the 2.25 to 2.15 Angstrom shell. I would not put the number 2.15 A in the manuscript title or in the poster title. For me, the acceptable completeness in the highest resolution shell is 90% for the number to get into the title. You will know I reviewed your paper if you see my telltale reviewer comment. You can put whatever you want in the PDB deposition field.

Jim

On Mon, 17 Mar 2008, Melody Lin wrote:
[...]
Re: [ccp4bb] question about processing data
Redundancy of 4.8 for a 74% complete shell (if I understand which shell these stats are for) suggests you have assumed too much symmetry and are rejecting a lot of reflections during scaling. Is this the case? If so, the I/sigma suggests you could drop to a lower symmetry and re-scale without losing a lot of data.

On Mar 17, 2008, at 4:03 AM, Melody Lin wrote:
[...]

--
James Stroud
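James's diagnosis can be illustrated with a toy merge. If observations are grouped under a point group that is too high, reflections that are not truly equivalent land in the same group: the apparent multiplicity goes up, and their disagreement shows up as outliers that a scaling program would reject. A minimal Python sketch, assuming operators that are pure sign flips on (h, k, l) (enough for monoclinic b-unique and orthorhombic groups) and a crude n-sigma test against the group mean. Real scaling programs use more careful rejection schemes, and all names here are hypothetical.

```python
from collections import defaultdict

# Point-group operators written as sign flips applied to (h, k, l).
# Sign flips only cover monoclinic (b unique) and orthorhombic groups.
POINT_GROUP_2 = [(1, 1, 1), (-1, 1, -1)]
POINT_GROUP_222 = [(1, 1, 1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1)]


def asu_key(hkl, ops):
    """Canonical representative of hkl under ops, with Friedel mates merged."""
    h, k, l = hkl
    equivalents = []
    for sh, sk, sl in ops:
        e = (sh * h, sk * k, sl * l)
        equivalents.append(e)
        equivalents.append((-e[0], -e[1], -e[2]))  # Friedel mate
    return max(equivalents)  # any fixed choice works; max() is simple


def merge_report(observations, ops, n_sigma=4.0):
    """Unique reflections, multiplicity and crude outlier count for a merge.

    observations: iterable of (hkl, intensity, sigma).
    An observation counts as an outlier if it lies more than n_sigma * sigma
    from the mean of its symmetry group.
    """
    groups = defaultdict(list)
    for hkl, intensity, sigma in observations:
        groups[asu_key(hkl, ops)].append((intensity, sigma))
    if not groups:
        raise ValueError("no observations")

    n_obs = sum(len(g) for g in groups.values())
    outliers = 0
    for g in groups.values():
        mean_i = sum(i for i, _ in g) / len(g)
        outliers += sum(1 for i, s in g if abs(i - mean_i) > n_sigma * s)
    return {"unique": len(groups), "multiplicity": n_obs / len(groups),
            "outliers": outliers}
```

Take two measurements each of (1,2,3) at about 100 and of (1,-2,-3) at about 50. They merge cleanly in point group 2 (b unique). Forced together in 222, they give multiplicity 4 and four apparent outliers.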
Re: [ccp4bb] question about processing data
Hi -

I would tend to argue as follows: an I/sigI of 3 and an Rmerge of 33.6% are most definitely acceptable values with a redundancy of 4.8. Thus, despite the 74% completeness, those data are most definitely useful and should be included in refinement.

A good question now is why the data are only 74% complete. I can think of a few reasons, e.g.:

1. Not enough 'degrees' collected in total: too bad, do better next time, but that's not likely to be your problem.
2. Overlaps at high resolution: again, be more careful next time, but could you play with the mosaicity to decrease the overlaps a bit?
3. High resolution collected in the corners of the detector: put the detector closer next time and don't collect data in the corners...
4. Severe anisotropy: tough luck, you have to live with it... or try to deal with it better during data collection (adjust exposure).

Whatever the case, I would use the data and clearly report in the M&M section of my paper not only what the numbers are, but also WHY they are like that. And, of course, if it's trivial to do a better data collection experiment and get the best data, as it often is, then do a better data collection experiment...

My main point is that you should know clearly WHY your high resolution shell is incomplete and then decide. The numbers alone do not always tell the full story.

Best,
Tassos
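Tassos's third point (and Bart's remark about square detectors) comes down to Bragg's law. On a flat detector the corners reach higher resolution than the edges, so shells between the edge and corner resolution are only partly recorded. A sketch in Python, assuming a flat square detector perpendicular to the beam with the beam at its centre; the detector size, distance and wavelength in the example are hypothetical.

```python
from math import atan, sin, sqrt


def resolution_at_radius(radius_mm, distance_mm, wavelength_A):
    """Bragg resolution d (A) recorded at a given radius from the direct beam.

    Flat detector normal to the beam: tan(2*theta) = radius / distance,
    and lambda = 2 * d * sin(theta).
    """
    two_theta = atan(radius_mm / distance_mm)
    return wavelength_A / (2.0 * sin(two_theta / 2.0))


def edge_and_corner_resolution(width_mm, distance_mm, wavelength_A):
    """Resolution at the middle of an edge and at a corner of a square detector."""
    half = width_mm / 2.0
    edge = resolution_at_radius(half, distance_mm, wavelength_A)
    corner = resolution_at_radius(half * sqrt(2.0), distance_mm, wavelength_A)
    return edge, corner
```

For a hypothetical 300 mm detector at 300 mm and a wavelength of 1.0 A, this gives about 2.18 A at the edge and about 1.65 A at the corner. Reflections beyond 2.18 A are recorded only toward the corners, so a 2.25-2.15 A shell would be complete out to about 2.18 A and only partly measured beyond it. Moving the detector closer pushes the edge resolution out.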
Re: [ccp4bb] question about processing data
Dear all,

Thank you very much for the useful suggestions! I definitely learned a lot from these discussions. Looking back at my datasets, I think the incompleteness most likely results from the high mosaicity (1.009) and anisotropy of the crystal. The detector is square, but the distance is short enough for the resolution, and the cell dimensions are not huge (~45 x 90 x 100 A), so there is not much overlap among the high resolution spots. I am confident about the symmetry. Well, now I know better how to collect good data. Thanks!

Best,
Melody

On Mon, Mar 17, 2008 at 8:29 PM, Anastassis Perrakis [EMAIL PROTECTED] wrote:
[...]