Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Bernhard Rupp
Based on the simulations I've done the data should be cut at CC1/2 = 0. 
Seriously. Problem is figuring out where it hits zero. 

 

But the real objective is: where do data stop making an improvement to the model? The categorical statement that all data are good is simply not true in practice. It is probably specific to each data set & refinement, and as long as we do not always run paired refinement à la K&D or similar in order to find out where that point is, the yearning for a simple number will not stop (although I believe automation will make the K&D approach or similar eventually routine). 

 

As for the resolution of the structure I'd say call that where |Fo-Fc| 
(error in the map) becomes comparable to Sigma(Fo). This is I/Sigma = 2.5 if 
Rcryst is 20%.  That is: |Fo-Fc| / Fo = 0.2, which implies |Io-Ic|/Io = 0.4 or 
Io/|Io-Ic| = Io/sigma(Io) = 2.5.
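The arithmetic above can be checked in a couple of lines (a sketch, not from the original post; it simply applies first-order error propagation, I ∝ F², so relative errors in I are about twice those in F):

```python
# First-order error propagation: I ~ F^2, so dI/I ~ 2*dF/F.
rel_err_F = 0.20                # |Fo-Fc|/Fo, i.e. Rcryst ~ 20%
rel_err_I = 2 * rel_err_F       # |Io-Ic|/Io ~ 0.4
i_over_sigma = 1 / rel_err_I    # Io/|Io-Ic| = Io/sigma(Io)
print(i_over_sigma)             # 2.5
```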

 

Makes sense to me...

 

As long as it is understood that this 'model resolution value' derived via your argument from I/sigI is not the same as an I/sigI data cutoff (and that Rcryst and Rmerge have nothing in common)…

 

-James Holton

MAD Scientist

 

Best, BR

 

 


On Aug 27, 2013, at 5:29 PM, Jim Pflugrath jim.pflugr...@rigaku.com wrote:

I have to ask flamingly: So what about CC1/2 and CC*?  

 

Did we not replace an arbitrary resolution cut-off based on a value of Rmerge 
with an arbitrary resolution cut-off based on a value of Rmeas already?  And 
now we are going to replace that with an arbitrary resolution cut-off based on 
a value of CC* or is it CC1/2?

 

I am asked often:  What value of CC1/2 should I cut my resolution at?  What 
should I tell my students?  I've got a course coming up and I am sure they will 
ask me again.

 

Jim

 




From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Arka Chakraborty 
[arko.chakrabort...@gmail.com]
Sent: Tuesday, August 27, 2013 7:45 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Resolution, R factors and data quality

Hi all,

does this not again bring up the still-prevailing adherence to R factors rather than a shift to correlation coefficients (CC1/2 and CC*), as Dr. Phil Evans has indicated?

The way we look at data quality (by 'we' I mean the end users) needs to be altered, I guess.

best,

 

Arka Chakraborty

 

On Tue, Aug 27, 2013 at 9:50 AM, Phil Evans p...@mrc-lmb.cam.ac.uk wrote:

The question you should ask yourself is why would omitting data improve my 
model?

Phil



Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Phil Evans
We don't currently have a really good measure of the point where adding the extra shell of data adds significant information (whatever that means). However, my rough trials (see http://www.ncbi.nlm.nih.gov/pubmed/23793146) suggested that the exact cutoff point was not very critical, presumably because the information content fades out slowly, so it probably isn't something to agonise over too much. K & D's paired refinement may be useful though.

I would again caution against looking too hard at CC* rather than CC1/2: they 
are exactly equivalent, but CC* changes very rapidly at small values, which may 
be misleading. The purpose of CC* is for comparison with CCcryst (i.e. Fo to 
Fc).
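For readers who have not seen it, the relation between the two statistics (from Karplus & Diederichs, 2012) is CC* = sqrt(2·CC1/2 / (1 + CC1/2)); a quick sketch shows how steeply CC* rises at small CC1/2, which is the point being made:

```python
import math

def cc_star(cc_half):
    # CC* = sqrt(2*CC1/2 / (1 + CC1/2)), Karplus & Diederichs (2012)
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))

for cc_half in (0.01, 0.05, 0.1, 0.3, 0.5):
    print(f"CC1/2 = {cc_half:4.2f}  ->  CC* = {cc_star(cc_half):.2f}")
# A CC1/2 of just 0.01 already maps to a CC* of ~0.14, and 0.1 to ~0.43:
# small, noisy CC1/2 values look visually inflated on the CC* scale.
```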

I would remind any users of Scala who want to look back at old log files to see 
the statistics for the outer shell at the cutoff they used, that CC1/2 has been 
calculated in Scala for many years under the name CC_IMEAN. It's now called 
CC1/2 in Aimless (and Scala) following Kai's excellent suggestion.

Phil




Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Arka Chakraborty
Hi all,
 If I am not wrong, the Karplus & Diederichs paper suggests that data are generally meaningful up to a CC1/2 value of 0.20, but they suggest a paired-refinement technique (pretty easy to perform) to actually decide on the resolution at which to cut the data. This will be the most prudent thing to do, I guess, rather than following any arbitrary value, as each data set is different. But the fact remains that even where I/sigma(I) falls to 0.5, useful information remains which will improve the quality of the maps, and discarding it just leads us a bit further away from the truth. However, as always, Drs. Diederichs and Karplus will be the best persons to comment on that (as they have already done in the paper :) )
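The paired-refinement protocol described above is easy to sketch in code. This is only an illustration of the control flow: `run_refinement` is a hypothetical stand-in for whatever refinement program you use, returning made-up R-values; the essential trick is that both runs report R-factors at the same (lower) resolution so they are comparable.

```python
def run_refinement(model, data, d_min, r_calc_limit):
    """Hypothetical wrapper around a refinement program: refine `model`
    against `data` cut at d_min, but report (Rwork, Rfree) computed only
    to `r_calc_limit`. The numbers below are made up for illustration."""
    canned = {2.2: (0.195, 0.231), 2.0: (0.192, 0.226)}
    return canned[d_min]

base_cut, test_cut = 2.2, 2.0   # compare pairwise: current cutoff vs one shell more

r_base = run_refinement("model.pdb", "data.mtz", d_min=base_cut, r_calc_limit=base_cut)
r_test = run_refinement("model.pdb", "data.mtz", d_min=test_cut, r_calc_limit=base_cut)

# Keep the extra shell only if Rfree *at the common resolution* improves.
keep_extra_shell = r_test[1] < r_base[1]
print(keep_extra_shell)   # True for these made-up numbers
```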

best,

Arka Chakraborty

p.s. Aimless seems to suggest a resolution limit based on a CC1/2 = 0.5 criterion (which I guess is done to be on the safe side; Dr. Phil Evans can explain if there is another, or an entirely different, reason for it!). But if we want to squeeze the most from our data set, I guess we need to push a bit further sometimes :)






-- 
Arka Chakraborty
ibmb (Institut de Biologia Molecular de Barcelona)
BARCELONA, SPAIN


Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Phil Evans
Aimless does indeed calculate the point at which CC1/2 falls below 0.5, but I would not necessarily suggest that as the best cutoff point. Personally I would also look at I/sigI, anisotropy, and completeness, but as I said, at that point I don't think it makes a huge difference.

Phil



Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Bernhard Rupp
 We don't currently have a really good measure of that point where adding
the extra shell of data adds significant information 
  so it probably isn't something to agonise over too much. K  D's paired
refinement may be useful though.

That seems to be a correct assessment of the situation, and a forceful argument to eliminate the review nonsense of nitpicking on I/sigI values, associated R-merges, and other pseudo-statistics once and for all. We can now, thanks to data deposition, at any time generate or download the maps and the models and judge for ourselves even minute details of local model quality from there. As far as use and interpretation goes, where the model meets the map is where the rubber meets the road. I therefore make the heretic statement that the entire Table 1 of data collection statistics, justifiable in pre-deposition times as some means to guess structure quality, can go the way of X-ray film and be almost always eliminated from papers. There is nothing really useful in Table 1, and all its data items and more are in the PDB header anyhow. Availability of maps for review and for users is the key point.

Cheers, BR


Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Bosch, Juergen
What a statement!
Give reviewers maps, I agree. However, what if the reviewer has no clue about these things we call structures? I think for those people Table 1 might still provide some justification. I would argue it should at least go into the supplement.

Jürgen 

Sent from my iPad



[ccp4bb] Protein Crystallography course via the web at Birkbeck College

2013-08-28 Thread Tracey Barrett

Dear all,
registration is currently open for the postgraduate certificate course in Protein Crystallography via the web at Birkbeck, which begins on Monday, October 7th. It runs for one year, during which all aspects of protein crystallography are covered, from the fundamentals of protein structure to validation. The emphasis is very much on techniques and the underlying principles, so it is ideally suited to those currently enrolled on PhD programmes or those who wish to expand their skills in structural biology. Information on registration and course content can be found at http://px13.cryst.bbk.ac.uk/px/course/course.htm under General information, or contact the course director (Tracey Barrett) at p...@mail.cryst.bbk.ac.uk for further details.


Although a stand-alone course, the postgraduate certificate in protein 
crystallography can also be taken as part of the MSc in Structural 
Molecular Biology. For more information, please see 
http://www.bbk.ac.uk/study/2013/postgraduate/programmes/TMSBISCL_C/



Dr Tracey Barrett,
Crystallography,
Senior Lecturer in Structural Biology,
Institute for Structural and Molecular Biology,
Birkbeck College,
Malet Street,
London WC1E 7HX
Tel: 020 7631 6822
Fax: 020 7631 6803


Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Pavel Afonine
Hi,

a random thought: the data resolution, d_min_actual, can be thought of as the value that maximizes the correlation (*) between the synthesis calculated using your data and an equivalent Fmodel synthesis calculated using the complete set of Miller indices in the d_min_actual-inf resolution range, where d_min <= d_min_actual and d_min is the highest resolution of the data set in question. Makes sense to me..

(*) or any other more appropriate similarity measure: usual map CC may not
be the best one in this context.

Pavel
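A toy 1-D analogue of this idea (entirely illustrative, nothing from the original post): build a "map" from noisy Fourier coefficients whose noise grows toward the nominal resolution limit, and scan for the truncation radius that maximizes the correlation with the complete, noise-free synthesis.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 257                                   # number of Fourier coefficients
f_true = rng.normal(size=m) + 1j * rng.normal(size=m)
ramp = 5.0 * (np.arange(m) / m) ** 2      # noise grows toward the "resolution limit"
f_obs = f_true + ramp * (rng.normal(size=m) + 1j * rng.normal(size=m))

ref_map = np.fft.irfft(f_true)            # complete, noise-free synthesis

def map_cc(k_max):
    """Correlation of the truncated noisy map with the reference map."""
    f_cut = f_obs.copy()
    f_cut[k_max:] = 0.0                   # discard coefficients beyond the trial limit
    return np.corrcoef(np.fft.irfft(f_cut), ref_map)[0, 1]

best = max(range(10, m), key=map_cc)      # trial limit maximizing the map correlation
print(best)
```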



 On Tue, Aug 27, 2013 at 9:50 AM, Phil Evans p...@mrc-lmb.cam.ac.uk wrote:

 The question you should ask yourself is why would omitting data improve
 my model?

 Phil

 On 27 Aug 2013, at 02:49, Emily Golden 10417...@student.uwa.edu.au
 wrote:

  Hi All,
 
  I have collected diffraction images to 1 Angstrom resolution to the
 edge of the detector and 0.9A to the corner.I collected two sets, one
 for low resolution reflections and one for high resolution reflections.
  I get 100% completeness above 1A and 41% completeness in the 0.9A-0.95A
 shell.
 
  However, my Rmerge in the highest shell is not good, ~80%.
 
  The Rfree is 0.17 and Rwork is 0.16 but the maps look very good.   If I
 cut the data to 1 Angstrom the R factors improve but I feel the maps are
 not as good and I'm not sure if I can justify cutting data.
 
  So my question is,  should I cut the data to 1Angstrom or should I keep
 the data I have?
 
  Also, taking geometric restraints off during refinement the Rfactors
 improve marginally, am I justified in doing this at this resolution?
 
  Thank you,
 
  Emily







[ccp4bb] Quick resolution cutoff survey

2013-08-28 Thread Bosch, Juergen
Since we keep discussing resolution cutoffs and the benefits of including all data or not, I thought I would crowd-source your opinion on this particular data set.

Processed with XDS; here's the XSCALE.LP output:

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION

 RESOLUTION   NUMBER OF REFLECTIONS    COMPLETENESS  R-FACTOR  R-FACTOR  COMPARED  I/SIGMA  R-meas  CC(1/2)  Anomal  SigAno   Nano
    LIMIT    OBSERVED  UNIQUE POSSIBLE   OF DATA     observed  expected                                       Corr

     9.43       5365    1009    1028      98.2%        1.7%      2.1%      5351     68.92    1.8%   100.0*       2   0.698    691
     6.67      10153    1756    1760      99.8%        2.2%      2.4%     10134     58.30    2.4%   100.0*     -12   0.674   1404
     5.44      13114    2217    2223      99.7%        3.4%      3.5%     13097     42.38    3.7%    99.9*     -10   0.716   1845
     4.71      15273    2583    2592      99.7%        3.2%      3.1%     15259     46.24    3.5%    99.9*     -14   0.733   2212
     4.22      17183    2907    2934      99.1%        3.2%      3.2%     17173     45.14    3.6%    99.9*     -16   0.722   2538
     3.85      19010    3183    3217      98.9%        4.4%      4.1%     19000     37.38    4.8%    99.9*     -16   0.717   2794
     3.56      20764    3441    3473      99.1%        5.9%      5.6%     20752     30.36    6.5%    99.9*     -13   0.754   3061
     3.33      22516    3681    3712      99.2%        8.8%      8.5%     22507     22.60    9.7%    99.7*     -11   0.737   3293
     3.14      24735    3963    4001      99.1%       12.4%     13.0%     24725     16.77   13.5%    99.5*      -8   0.696   3565
     2.98      25931    4127    4161      99.2%       17.2%     18.1%     25924     12.82   18.7%    99.2*      -6   0.710   3751
     2.84      26521    4291    4386      97.8%       25.4%     26.9%     26495      9.26   27.6%    98.3*      -4   0.683   3809
     2.72      20357    3495    4592      76.1%       27.6%     29.3%     20277      7.90   30.2%    97.9*       0   0.706   2826
     2.61      15917    2860    4768      60.0%       33.7%     35.0%     15839      6.41   37.0%    96.6*      -2   0.697   2171
     2.52      12949    2394    4944      48.4%       42.5%     45.1%     12877      4.91   46.8%    95.3*       0   0.694   1692
     2.43      10310    1993    5097      39.1%       47.8%     50.7%     10230      4.08   53.0%    94.4*      -2   0.670   1295
     2.36       8180    1693    5309      31.9%       56.4%     60.1%      8079      3.12   63.0%    92.2*      -2   0.671    961
     2.29       6075    1381    5441      25.4%       69.9%     72.5%      5971      2.28   78.9%    87.2*     -10   0.618    643
     2.22       4001    1077    5610      19.2%       82.9%     81.9%      3893      1.78   96.0%    80.7*      -8   0.633    340
     2.16       2491     799    5771      13.8%       78.0%     83.6%      2376      1.47   92.9%    75.9*      -4   0.586    154
     2.11        786     367    5901       6.2%      103.0%    106.4%       666      0.87  129.9%    63.1*      12   0.580     28
    total     281631   49217   80920      60.8%        7.1%      7.2%    280625     21.54    7.8%    99.9*      -7   0.706  39073

And here's the link so you can voice your opinion in a Survey Monkey. Results 
from this survey will be reported back to the CCP4bb.

http://www.surveymonkey.com/s/YNDKM6G

Thanks for your participation and no there's no iPad or iPod-touch to win, and 
you also don't have to disclose your email.

The survey has only two questions, one is just a click the other one you 
provide your opinion on your decision.

Thanks,

Jürgen

P.S. The low-resolution shell spans 44 to 9.43 Å.
P.P.S. Table 1 will be revealed once I report back the outcome of this survey.

..
Jürgen Bosch
Johns Hopkins University
Bloomberg School of Public Health
Department of Biochemistry & Molecular Biology
Johns Hopkins Malaria Research Institute
615 North Wolfe Street, W8708
Baltimore, MD 21205
Office: +1-410-614-4742
Lab:  +1-410-614-4894
Fax:  +1-410-955-2926
http://lupo.jhsph.edu






[ccp4bb] Position opening at RCSB PDB/Rutgers University- BIOCHEMICAL INFORMATION ANNOTATION SPECIALIST

2013-08-28 Thread Jasmine Young
The RCSB Protein Data Bank (www.rcsb.org) is a publicly accessible information portal for researchers and students interested in structural biology. At its center is the PDB archive, the sole international repository for the 3-dimensional structure data of biological macromolecules. These structures hold significant promise for the pharmaceutical and biotechnology industries in the search for new drugs and in efforts to understand the mysteries of human disease.


The primary mission of the RCSB PDB is to provide accurate, well-annotated data in the most 
timely and efficient way possible to facilitate new discoveries and scientific advances. The RCSB 
PDB processes, stores, and disseminates these important data, and develops the software tools needed 
to assist users in depositing and accessing structural information.


The RCSB Protein Data Bank at Rutgers University in Piscataway, NJ has an opening for a Biochemical Information & Annotation Specialist to curate and standardize macromolecular structures for distribution in the PDB archive. Annotation Specialists validate, annotate, and release structural entries in the PDB archive, and communicate daily with members of the deposition community. The position is an academic appointment with state benefits; the salary is commensurate with faculty level.


A background in macromolecular or small-molecule crystallography is a strong advantage. A background in biological chemistry (PhD or MS) is required. Experience with Linux computer systems and biological databases is preferred. The successful candidate should be self-motivated, pay close attention to detail, possess strong written and oral communication skills, and meet deadlines.


This position offers the opportunity to participate in an exciting project with significant 
impact on the scientific community.


Please send resume (PDF preferred) to Dr. Jasmine Young at 
pdbj...@rcsb.rutgers.edu.


--

Jasmine Young, Ph.D.
RCSB Protein Data Bank
Assistant Research Professor
Lead biocurator
Center for Integrative Proteomics Research
Rutgers The State University of New Jersey
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087

Email:  jas...@rcsb.rutgers.edu
Phone:  (848)-445-0103 ext 4920
Fax:(732)-445-4320



Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Bernhard Rupp
 what if the reviewer has no clue of these things we call structures ? I think 
 for those people table 1 might still provide some justification.

Someone who knows little about structures probably won’t appreciate the 
technical details in Table 1 either 



[ccp4bb] 'table 1'

2013-08-28 Thread Tim Gruene
Hi all,

I wonder when the term 'Table 1' entered Newspeak. I heard students use it rather recently, and to me it sounds derogatory, as though they would treat that table as a black box generated by some program and better not look at it.

The data statistics are an attempt to describe the quality of the actual data as a result of an experiment. Whether or not this could be done in a better way is not my point (most crystallographers with some experience will draw their conclusions from the statistics), but people should realise its importance: everything else in an article is merely interpretation, most of all the model itself (which is not data, as many often confuse), and to a large extent even the electron density map.

As I pointed out, this is based on my personal impression, but on that basis I would like to encourage people not to use the term 'Table 1'. Language has an influence on how we think, so language should be kept from too much degradation.

All the best,
Tim

-- 
--
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A





Re: [ccp4bb] 'table 1'

2013-08-28 Thread Bernhard Rupp
I am pleased to hear that Table 1 has finally entered the realm of
politically incorrect terms. Let me fire another insult at it:

 The data statistics are an attempt to describe the quality of the actual
data as a result of an experiment.

Unfortunately, table 1 does not achieve that objective. The statistics in table 1 are single global numbers, limited to what we believe were the primary Bragg components of the diffraction pattern at the time of data processing. The diffraction experiment is a much more complex, time-dependent process. If you truly care about the experiment, demand raw image deposition.

 everything else in an article is merely interpretation, most of all the
model itself (which is not data, as many often confuse), and to a large
extend even the electron density map.

I take issue with that (not just politically) incorrect and indiscriminate
insult towards electron density. Any SAD or similar experimental map from
decent model-independent phases firmly attests to the opposite. 

 which I would like to encourage people not to use the term 'Table 1'.
Language has an influence on how we think, so language should be kept from
too much degradation.

To this wonderful statement I have only one response:

do i = 1, 10
   write (*,*) 'Table 1'
end do

BR

PS: Never thought Table 1 had so much fictional (and frictional) potential. Just wait for Table 2, refinement statistics.




Re: [ccp4bb] Dependency of theta on n/d in Bragg's law

2013-08-28 Thread Ian Tickle
On 22 August 2013 07:54, James Holton jmhol...@lbl.gov wrote:

 Well, yes, but that's something of an anachronism. Technically, a Miller index h,k,l can only be a triplet of co-prime integers (Miller, W. H. (1839). A Treatise on Crystallography. For J. & J. J. Deighton.). This is because Miller was trying to explain crystal facets, and facets don't have harmonics. This might be why Bragg decided to put an n in there. But it seems that fairly rapidly after people started diffracting x-rays off of crystals, the Miller index became generalized to h,k,l as integers, and we never looked back.


Yes, but I think it would be a pity if we lost what is IMO the important distinction in meaning between Miller indices, defined above as co-prime integers, and (for want of a better term) reflection indices as found in an MTZ file. For example, Stout & Jensen make a careful distinction between them (as I recall they call reflection indices something like general indices; sorry, I don't have my copy of S & J to hand to check their exact terminology).

The confusion that can arise by referring to reflection indices as
Miller indices is well illustrated if you try to explain Bragg's equation
to a novice, because the d in the equation (i.e. n lambda = 2d
sin[theta]) is the interplanar separation for planes as calculated from
their Miller indices, whereas the theta is of course the theta angle as
calculated from the corresponding reflection indices.  If you say that
Miller & reflection indices are the same thing you have a hard time
explaining the equation!  One obvious way out of the dilemma is to drop the
n term (so now lambda = 2d sin[theta]) and then redefine d as d/n so
the new d is calculated from the same reflection indices as theta, and the
Miller indices don't enter into it.  But then you have to explain to your
novice why you know better than a Nobel prizewinner!  As you say Bragg no
doubt had a good reason to include the n (i.e. to make the connection
between the macroscopic properties of a crystal and its diffraction
pattern).
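Ian's point about d versus d/n can be checked numerically. A minimal sketch
(not from the original thread; the cubic cell edge and wavelength are
arbitrary assumptions for illustration): the second-order reflection from the
Miller plane (1,1,1), using n lambda = 2d sin(theta), gives exactly the same
theta as the reflection indexed (2,2,2) with the n dropped and d redefined
as d/n.

```python
import math

a = 100.0    # hypothetical cubic cell edge in Angstrom (assumption)
lam = 1.0    # hypothetical wavelength in Angstrom (assumption)

def d_spacing(h, k, l, a):
    """Interplanar spacing for a cubic cell."""
    return a / math.sqrt(h * h + k * k + l * l)

# Miller indices (1,1,1), second order: n*lambda = 2*d*sin(theta)
n = 2
d_miller = d_spacing(1, 1, 1, a)
theta_miller = math.asin(n * lam / (2 * d_miller))

# Reflection indices (2,2,2), n dropped: lambda = 2*(d/n)*sin(theta)
d_refl = d_spacing(2, 2, 2, a)        # equals d_miller / 2
theta_refl = math.asin(lam / (2 * d_refl))

assert abs(theta_miller - theta_refl) < 1e-12
```

So the novice's dilemma is purely one of bookkeeping: the same angle falls
out either way, provided d and n are redefined consistently.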

Sorry for coming into this discussion somewhat late!

Cheers

-- Ian


[ccp4bb] ALS Call for General User Proposals - DEADLINE SEPTEMBER 4 , 2013

2013-08-28 Thread Banumathi Sankaran

 The deadline for Jan/July 2014 Collaborative Crystallography proposals
 will be Sep 4, 2013.

 Through the Collaborative Crystallography (CC) Program at the Advanced
 Light Source (ALS), scientists can send protein crystals to Berkeley
 Center for Structural Biology (BCSB) staff researchers for data
 collection and analysis. The CC Program can provide a number of benefits
 to researchers:

 * Obtain high-quality data and analysis through collaborating with
   expert beamline researchers;
 * Rapid turnaround on projects; and
 * Reduced travel costs.

  To apply, please submit  a proposal through the ALS General User
  proposal review process for beamtime allocation. Proposals are
  reviewed and ranked by the Proposal Study Panel, and beamtime is
  allocated accordingly. BCSB staff schedule the CC projects on
  Beamlines 5.0.1 and 5.0.2 to fit into the available resources. Only
  non-proprietary projects will be accepted. As a condition of
  participation, BCSB staff researchers who participate in data
  collection and/or analysis must be appropriately acknowledged -
  typically being included as authors on publications and in PDB
  depositions. Please consult the website for additional information at:

  http://bcsb.als.lbl.gov/wiki/index.php/Collaborative_Crystallography

  
 How To Apply:


 Please follow the instructions for proposal submission at:
 http://www-als.lbl.gov/index.php/user-information/user-guide/58.html
 Scroll down to "Structural Biology beamlines (includes protein SAXS)"
 and click on "New Proposal". Enter your proposal information.


  Regards,
 Banumathi Sankaran



Re: [ccp4bb] Quick resolution cutoff survey

2013-08-28 Thread Bosch, Juergen
Dear CCP4bb,

almost 12 h have passed since I posted this question to the board. Since some of 
us get daily or weekly digests I will hold off on revealing the results. But 
the replies thus far are really interesting and exciting, and this is without 
any sarcasm tags. However, we have only sampled 1.5% of the CCP4 community thus 
far.

I will try to compile a reasonably complete reply to all questions raised in 
the comments box, say in about one week.

Thanks to all those that participated,

Jürgen

..
Jürgen Bosch
Johns Hopkins University
Bloomberg School of Public Health
Department of Biochemistry & Molecular Biology
Johns Hopkins Malaria Research Institute
615 North Wolfe Street, W8708
Baltimore, MD 21205
Office: +1-410-614-4742
Lab:  +1-410-614-4894
Fax:  +1-410-955-2926
http://lupo.jhsph.edu

On Aug 28, 2013, at 11:32 AM, Bosch, Juergen wrote:

Since we keep discussing resolution cutoffs and the benefits of not to include 
all data etc.
I thought I would crowd source your opinion on this particular data set.

processed with XDS, here's the XSCALE.LP output:

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION

 RESOLUTION   NUMBER OF REFLECTIONS    COMPLETENESS  R-FACTOR  R-FACTOR  COMPARED  I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
    LIMIT    OBSERVED  UNIQUE  POSSIBLE   OF DATA    observed  expected                                        Corr

     9.43       5365    1009      1028     98.2%       1.7%      2.1%       5351    68.92     1.8%   100.0*      2    0.698    691
     6.67      10153    1756      1760     99.8%       2.2%      2.4%      10134    58.30     2.4%   100.0*    -12    0.674   1404
     5.44      13114    2217      2223     99.7%       3.4%      3.5%      13097    42.38     3.7%    99.9*    -10    0.716   1845
     4.71      15273    2583      2592     99.7%       3.2%      3.1%      15259    46.24     3.5%    99.9*    -14    0.733   2212
     4.22      17183    2907      2934     99.1%       3.2%      3.2%      17173    45.14     3.6%    99.9*    -16    0.722   2538
     3.85      19010    3183      3217     98.9%       4.4%      4.1%      19000    37.38     4.8%    99.9*    -16    0.717   2794
     3.56      20764    3441      3473     99.1%       5.9%      5.6%      20752    30.36     6.5%    99.9*    -13    0.754   3061
     3.33      22516    3681      3712     99.2%       8.8%      8.5%      22507    22.60     9.7%    99.7*    -11    0.737   3293
     3.14      24735    3963      4001     99.1%      12.4%     13.0%      24725    16.77    13.5%    99.5*     -8    0.696   3565
     2.98      25931    4127      4161     99.2%      17.2%     18.1%      25924    12.82    18.7%    99.2*     -6    0.710   3751
     2.84      26521    4291      4386     97.8%      25.4%     26.9%      26495     9.26    27.6%    98.3*     -4    0.683   3809
     2.72      20357    3495      4592     76.1%      27.6%     29.3%      20277     7.90    30.2%    97.9*      0    0.706   2826
     2.61      15917    2860      4768     60.0%      33.7%     35.0%      15839     6.41    37.0%    96.6*     -2    0.697   2171
     2.52      12949    2394      4944     48.4%      42.5%     45.1%      12877     4.91    46.8%    95.3*      0    0.694   1692
     2.43      10310    1993      5097     39.1%      47.8%     50.7%      10230     4.08    53.0%    94.4*     -2    0.670   1295
     2.36       8180    1693      5309     31.9%      56.4%     60.1%       8079     3.12    63.0%    92.2*     -2    0.671    961
     2.29       6075    1381      5441     25.4%      69.9%     72.5%       5971     2.28    78.9%    87.2*    -10    0.618    643
     2.22       4001    1077      5610     19.2%      82.9%     81.9%       3893     1.78    96.0%    80.7*     -8    0.633    340
     2.16       2491     799      5771     13.8%      78.0%     83.6%       2376     1.47    92.9%    75.9*     -4    0.586    154
     2.11        786     367      5901      6.2%     103.0%    106.4%        666     0.87   129.9%    63.1*     12    0.580     28
    total     281631   49217     80920     60.8%       7.1%      7.2%     280625    21.54     7.8%    99.9*     -7    0.706  39073
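[Not part of Jürgen's email: the per-shell statistics above can be scanned
programmatically. A minimal Python sketch, with the d_min, mean I/sigma and
CC(1/2) values transcribed from the table; the thresholds are purely
illustrative, not recommendations.]

```python
# (d_min in A, mean I/sigma, CC(1/2) in %) per shell, from the XSCALE table
shells = [
    (9.43, 68.92, 100.0), (6.67, 58.30, 100.0), (5.44, 42.38, 99.9),
    (4.71, 46.24, 99.9), (4.22, 45.14, 99.9), (3.85, 37.38, 99.9),
    (3.56, 30.36, 99.9), (3.33, 22.60, 99.7), (3.14, 16.77, 99.5),
    (2.98, 12.82, 99.2), (2.84, 9.26, 98.3), (2.72, 7.90, 97.9),
    (2.61, 6.41, 96.6), (2.52, 4.91, 95.3), (2.43, 4.08, 94.4),
    (2.36, 3.12, 92.2), (2.29, 2.28, 87.2), (2.22, 1.78, 80.7),
    (2.16, 1.47, 75.9), (2.11, 0.87, 63.1),
]

def cutoff(shells, min_i_sigma):
    """Return d_min of the highest-resolution shell whose mean
    I/sigma is still at or above the given threshold."""
    best = None
    for d_min, i_sig, _ in shells:   # shells are ordered low to high resolution
        if i_sig >= min_i_sigma:
            best = d_min
    return best

print(cutoff(shells, 2.0))   # 2.29
print(cutoff(shells, 1.0))   # 2.16
```

Different thresholds clearly give quite different "resolutions" for the same
data set, which is rather the point of the survey.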

And here's the link so you can voice your opinion in a Survey Monkey. Results 
from this survey will be reported back to the CCP4bb.

http://www.surveymonkey.com/s/YNDKM6G

Thanks for your participation and no there's no iPad or iPod-touch to win, and 
you also don't have to disclose your email.

The survey has only two questions, one is just a click the other one you 
provide your opinion on your decision.

Thanks,

Jürgen

P.S low resolution shell starts at 44 Å - 9.43
P.P.S. the Table1 will be revealed once I report back the outcome of this 
survey.

..
Jürgen Bosch
Johns Hopkins University
Bloomberg School of Public Health
Department of Biochemistry & Molecular Biology
Johns Hopkins Malaria Research Institute
615 North Wolfe Street, W8708
Baltimore, MD 

Re: [ccp4bb] Resolution, R factors and data quality

2013-08-28 Thread Stefan Gajewski
Jim,

This is coming from someone who just got enlightened a few weeks ago on 
resolution cut-offs.

I am asked often:  What value of CC1/2 should I cut my resolution at? 

The KD paper mentioned that the CC(1/2) criterion loses its significance at ~9% 
according to a Student's t-test.

I doubt that this can be a generally true guideline for a resolution cut-off. 
The structures I am doing right now were cut off at ~20 to ~80 CC(1/2)

You probably do not want to make the same mistake again that we all made before 
when cutting resolution based on Rmerge/Rmeas, do you?


 What should I tell my students?  I've got a course coming up and I am sure 
 they will ask me again.

This is actually the more valuable insight I got from the KD paper. You don't 
use the CC(1/2) as an absolute indicator but rather as a suggestion. The 
resolution limit is determined by the refinement, not by the data processing.

I think I will handle my data in future as follows:

Bins with CC(1/2) less than 9% should initially be excluded.

The structure is then refined against all reflections in the file and only 
those bins that add information to the map/structure are kept in the final 
rounds. In most cases this will probably be more than CC(1/2) = 25%. If the 
last shell (CC ~9%) still adds information to the model, process the images 
again, e.g. until CC(1/2) drops to 0, and see if some more useful information 
is in there. You could also go ahead and use CC(1/2) = 0 as the initial 
cut-off, but I think that will rather increase computation time than help your 
structure in most cases.
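[A sketch of the significance test alluded to above, not the exact KD
procedure: the usual Student's t statistic for a correlation coefficient,
with CC(1/2) expressed as a fraction and a large-n normal approximation for
the one-sided p = 0.001 critical value. The reflection counts are made-up
illustrations.]

```python
import math

def cc_half_is_significant(cc, n_refl, t_crit=3.09):
    """Test CC(1/2) > 0 with Student's t for a correlation coefficient.
    cc     -- CC(1/2) as a fraction (e.g. 0.09 for 9%)
    n_refl -- number of unique reflections in the shell
    t_crit -- critical value; 3.09 approximates the one-sided
              p = 0.001 level for large n (normal approximation)
    """
    t = cc * math.sqrt(n_refl - 2) / math.sqrt(1.0 - cc * cc)
    return t > t_crit

# With ~1200 unique reflections in a shell, CC(1/2) ~ 9% is right at
# the edge of significance; with fewer reflections it is not:
print(cc_half_is_significant(0.09, 1200))   # True
print(cc_half_is_significant(0.09, 500))    # False
```

This is why the "~9%" figure is specific to the shell sizes of a given data
set rather than a universal cut-off.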


So yes, I would feel comfortable giving true resolution limits based on the 
refinement of the model, and not based on any number derived from data 
processing. In the end, you can always say "I tried it and this was the 
highest resolution I could model" rather than "I cut at numerical value X of 
this parameter because everybody else does so."