Re: [ccp4bb] question about processing data

2008-03-17 Thread James Stroud
I think the answer to your question depends on why the data is  
incomplete.


James

On Mar 17, 2008, at 3:06 AM, Melody Lin wrote:


Hi all,

I have always wondered: for a data set diffracting to, say,
2.15 Angstrom where the completeness in the highest resolution shell
(2.25-2.15) is 74%, should I merge all the data and call it a
2.15 A dataset, or should I cut the data set at, say, 2.25 A, where the
highest resolution shell has better completeness (85%)? What is an
acceptable completeness value for the highest resolution shell?


Thank you.

Best,
Melody


--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA  90095

http://www.jamesstroud.com


[ccp4bb] question about processing data

2008-03-17 Thread Melody Lin
Hi all,

I have always wondered: for a data set diffracting to, say,
2.15 Angstrom where the completeness in the highest resolution shell
(2.25-2.15) is 74%, should I merge all the data and call it a
2.15 A dataset, or should I cut the data set at, say, 2.25 A, where the
highest resolution shell has better completeness (85%)? What is an
acceptable completeness value for the highest resolution shell?

Thank you.

Best,
Melody


Re: [ccp4bb] question about processing data

2008-03-17 Thread Partha Chakrabarti
Hi Melody,

There was a nice discussion at this year's CCP4 Study Weekend. In
general, one needs to consider several factors. If you were at 3 A, or
had low symmetry, you would of course try to get the maximum out of the
data; on the other hand, experimental phasing has its own requirements.
In general, judge it from:

1. Completeness
2. Redundancy
3. I / Sigma
4. R merge statistics

Not just one of them. If you are pushing it too far, you will see the
effect in the later refinement step.
With 74% completeness, what do the other parameters look like?

HTH, Partha


-- 
MRC National Institute for Medical Research
Division of Molecular Structure
The Ridgeway, NW7 1AA, UK
Email: [EMAIL PROTECTED]
Phone: + 44 208 816 2515


Re: [ccp4bb] question about processing data

2008-03-17 Thread Melody Lin
Well, for the highest shell the redundancy is 4.8 and I/sigma is 3; Rmerge
is 0.08 overall and 0.336 for the highest shell. The I/sigma and Rmerge
don't look very good...

thanks.


Re: [ccp4bb] question about processing data

2008-03-17 Thread Partha Chakrabarti
Looks OK, I guess. For the highest shell, if Rmerge is less than 0.45
and I/sigma is about 2, it is worth a try. As James said, how to treat
the completeness depends on why the data are incomplete. Is the space
group something like C2?
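
The rule of thumb above can be written down as a small check. This is only a sketch: ShellStats and worth_trying are hypothetical names, and reading "about 2" as "at least 2.0" is an assumption. The numbers are the highest-shell statistics reported in this thread.

```python
from dataclasses import dataclass


@dataclass
class ShellStats:
    """Statistics for one resolution shell (hypothetical container)."""
    completeness: float   # fraction between 0 and 1
    redundancy: float     # average number of observations per unique reflection
    i_over_sigma: float   # mean I/sigma(I)
    rmerge: float         # as a fraction, e.g. 0.336


def worth_trying(shell, max_rmerge=0.45, min_i_over_sigma=2.0):
    """Apply the rule of thumb: Rmerge below 0.45 and I/sigma of about 2 or more.

    Assumption: "about 2" is interpreted as >= 2.0.
    Completeness is deliberately not part of this test.
    """
    return shell.rmerge < max_rmerge and shell.i_over_sigma >= min_i_over_sigma


# Highest-resolution shell (2.25-2.15 A) as reported in this thread.
outer = ShellStats(completeness=0.74, redundancy=4.8, i_over_sigma=3.0, rmerge=0.336)
print(worth_trying(outer))
```

The thresholds are keyword arguments because, as the thread shows, people do not agree on the exact cutoffs.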

Experts might tell us more.
Best, Partha



Re: [ccp4bb] question about processing data

2008-03-17 Thread Bart Hazes


Hi Melody,

This reply is not aimed at you directly as this situation seems to have 
become systemic in the field. So thanks for bringing it up!



We can have a long, and mostly aimless, discussion on what resolution 
you should claim for your data set but DON'T throw away good data to 
make the statistics look better. At high resolution the statistics are 
supposed to get worse! What matters is if the data still contain useful 
information. The fact that 26% of the data is missing does not normally 
mean that anything is wrong with the 74% that you did measure. Perhaps 
you used a square detector and didn't place it close enough to capture 
the full resolution, or perhaps your diffraction pattern is anisotropic.


The only reason to throw out data is if they are too inaccurate for your 
purpose. When your data is used for phasing, especially anomalous 
phasing, there is reason to focus on data quality, but I see far too 
many native data sets that make poor use of the diffraction potential of 
the crystal. I thought this was due to people not properly collecting 
the data, but now it seems that people are simply throwing away good 
data because they don't like the statistics.


So my advice: if your high resolution shell data has poor completeness 
then check why this happened. If you did not collect the data properly 
then let it be a lesson for the next data collection trip. If it 
resulted from some issue with the crystal then decide if the measured data 
is messed up as well. If not then use all the data you trust, which 
means there is useful signal (I/SigI > 1.5 or 2.0, depending on who you 
talk to) and no problems leading to systematic errors or outliers.
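
One way to read this advice is as a cutoff chosen from the signal alone, ignoring completeness. The sketch below assumes a per-shell table of mean I/SigI; the function name and the three inner shells are hypothetical and purely illustrative, and only the outermost shell (2.25-2.15 A, I/SigI of 3) comes from this thread.

```python
def high_resolution_limit(shells, min_i_over_sigma=2.0):
    """Return the high-resolution limit (in Angstrom) of the usable data.

    shells: list of (d_max, d_min, mean_i_over_sigma) tuples, ordered from
            low resolution to high resolution.
    Walks outward and stops at the first shell whose mean I/SigI falls
    below the threshold. Returns None if even the first shell fails.
    Completeness is intentionally not consulted.
    """
    limit = None
    for d_max, d_min, i_over_sigma in shells:
        if i_over_sigma < min_i_over_sigma:
            break
        limit = d_min
    return limit


shells = [
    (50.00, 3.00, 25.0),  # hypothetical
    (3.00, 2.50, 12.0),   # hypothetical
    (2.50, 2.25, 5.0),    # hypothetical
    (2.25, 2.15, 3.0),    # outermost shell reported in this thread
]
print(high_resolution_limit(shells))                        # 2.15
print(high_resolution_limit(shells, min_i_over_sigma=4.0))  # 2.25
```

This does not check for systematic errors or outliers, which still have to be judged separately.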


Bart

==

Bart Hazes (Assistant Professor)
Dept. of Medical Microbiology & Immunology
University of Alberta
1-15 Medical Sciences Building
Edmonton, Alberta
Canada, T6G 2H7
phone:  1-780-492-0042
fax:    1-780-492-7521

==


Re: [ccp4bb] question about processing data

2008-03-17 Thread Ed Pozharski
On Mon, 2008-03-17 at 10:51, Partha Chakrabarti wrote:
 Not just one of them. If you are pushing it too far, you will see the
 effect in later refinement step..

And the effect in the later refinement step will be a slight increase in
the R-factor?  IMHO, this does not justify throwing away data (which
ultimately reduces the quality of your model).


-- 
Edwin Pozharski, PhD, Assistant Professor
University of Maryland, Baltimore
--
When the Way is forgotten duty and justice appear;
Then knowledge and wisdom are born along with hypocrisy.
When harmonious relationships dissolve then respect and devotion arise;
When a nation falls to chaos then loyalty and patriotism are born.
--   / Lao Tse /


Re: [ccp4bb] question about processing data

2008-03-17 Thread Jim Pflugrath
I would use all the data myself and report that the model was built from 
a dataset with 74% completeness in the 2.25 to 2.15 Angstrom shell.  I 
would not put the number 2.15 A in the manuscript title nor in the poster 
title.


For me the acceptable completeness is 90% in the highest resolution shell 
for the number to get in the title.  You will know I reviewed your 
paper if you see my telltale reviewer comment.  You can put whatever you 
want in the PDB deposition field.


Jim




Re: [ccp4bb] question about processing data

2008-03-17 Thread James Stroud
Redundancy of 4.8 for a 74% complete shell (if I understand which  
shell these stats are for) suggests you have assumed too much symmetry  
and are rejecting a lot of reflections during scaling. Is this the  
case? The I/sigma suggests you could drop the symmetry and re-scale  
without losing a lot of data if this is the case.
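
For reference, here is a minimal sketch of how the two numbers being compared are commonly defined. It assumes the observations have already been reduced to unique indices under the chosen symmetry, which is exactly the step that depends on the assumed symmetry. The function name and the toy indices are hypothetical, and the sketch does not by itself test whether the symmetry is correct.

```python
from collections import Counter


def completeness_and_redundancy(observed_hkl, n_possible):
    """Compute completeness and redundancy for one resolution shell.

    observed_hkl: list of (h, k, l) tuples, one per observation, already
                  mapped to the asymmetric unit (assumption).
    n_possible:   number of unique reflections expected in the shell.
    """
    counts = Counter(observed_hkl)
    n_unique = len(counts)            # unique reflections actually measured
    n_obs = sum(counts.values())      # all observations, repeats included
    completeness = n_unique / n_possible
    redundancy = n_obs / n_unique if n_unique else 0.0
    return completeness, redundancy


# Toy shell: 3 of 4 possible unique reflections, 14 observations in total.
observed = [(1, 0, 0)] * 5 + [(0, 1, 0)] * 5 + [(0, 0, 1)] * 4
completeness, redundancy = completeness_and_redundancy(observed, n_possible=4)
print(f"completeness = {completeness:.2f}, redundancy = {redundancy:.2f}")
```

The toy shell is 75% complete with a redundancy of about 4.7, so the two numbers measure different things: one counts which reflections were seen, the other how often.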



--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA  90095

http://www.jamesstroud.com


Re: [ccp4bb] question about processing data

2008-03-17 Thread Anastassis Perrakis

Hi -

I would tend to argue as follows:

An I/sigI of 3 and an Rmerge of 33.6% are most definitely acceptable  
values with a redundancy of 4.8. Thus, despite the 74% completeness,  
those data are most definitely useful and should be included in  
refinement.
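
As a reminder of what the Rmerge figure measures, here is a minimal sketch of the usual definition: the summed absolute deviations of the individual measurements from their mean, divided by the summed intensities. The function name and the toy intensities are hypothetical, and skipping reflections measured only once is an assumption about the convention used.

```python
def rmerge(groups):
    """Compute Rmerge = sum |I_i - <I>| / sum I_i over all reflections.

    groups: dict mapping a unique (h, k, l) to the list of its measured
            intensities (symmetry equivalents and repeats).
    Reflections with a single measurement are skipped (assumed convention),
    since they carry no information about agreement.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        if len(intensities) < 2:
            continue
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator if denominator else float("nan")


# Toy data: one reflection with scatter, one with perfect agreement.
toy = {
    (1, 0, 0): [100.0, 120.0, 80.0],
    (0, 1, 0): [50.0, 50.0],
}
print(rmerge(toy))
```

Because the formula takes no account of how many times each reflection was measured, the value is best read together with the redundancy, as is done above.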


A good question now is why the data are only 74% complete.

I can think of a few reasons, e.g.:

1. Not enough 'degrees' collected in total: too bad, do better  
next time, but that's not likely to be your problem.
2. Overlaps at high resolution: again, be more careful next time, but  
could you play with the mosaicity to decrease the overlaps a bit?
3. High resolution collected in the corners of the detector: put the  
detector closer next time and don't collect data in the corners.
4. Severe anisotropy: tough luck, you have to live with it, or try to  
deal with it better during data collection (adjust the exposure).
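
Reason 3 can be made concrete with Bragg's law. The sketch below assumes a flat square detector that is perpendicular to the beam with the beam at its centre; the function name and all the numbers (wavelength, detector size, distance) are hypothetical and are not taken from the data set discussed here.

```python
import math


def resolution_at_radius(radius_mm, distance_mm, wavelength_A):
    """Resolution d (Angstrom) recorded at a given radius on the detector.

    Uses tan(2*theta) = radius / distance and Bragg's law
    d = wavelength / (2 * sin(theta)).
    """
    two_theta = math.atan2(radius_mm, distance_mm)
    return wavelength_A / (2.0 * math.sin(two_theta / 2.0))


# Hypothetical setup: 200 mm x 200 mm detector, 200 mm away, 1.0 A X-rays.
half_width_mm = 100.0
distance_mm = 200.0
wavelength_A = 1.0

edge = resolution_at_radius(half_width_mm, distance_mm, wavelength_A)
corner = resolution_at_radius(half_width_mm * math.sqrt(2.0), distance_mm, wavelength_A)
print(f"edge: {edge:.2f} A, corner: {corner:.2f} A")
```

In this setup, shells between roughly 2.2 A (edge) and 1.65 A (corner) are only recorded in the corners, so they come out incomplete even when the measured reflections are fine.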


Whatever the case, I would use the data and clearly report in the
Materials and Methods of my paper not only what the numbers are,
but also WHY they are like that. And, of course, if it's trivial to do
a better data collection experiment and get the best data,
as it often is, then do a better data collection experiment.

My main point is that you should know clearly WHY your high
resolution shell is incomplete and then decide.
The numbers alone do not always tell the full story.

Best , Tassos



Re: [ccp4bb] question about processing data

2008-03-17 Thread Melody Lin
Dear all,

Thank you very much for the useful suggestions! I definitely learned a lot
from these discussions. Looking back at my datasets, I think the
incompleteness likely results from the high mosaicity (1.009) and anisotropy of
the crystal. The detector is square, but the distance is short enough for the
resolution, and the cell dimensions are not huge (~45 x 90 x 100 A), so there is
not too much overlap among the high-resolution spots. I am confident in
the symmetry. Well, now I know better how to collect good data. Thanks!

Best,
Melody
