Re: [ccp4bb] diverging Rcryst and Rfree [SEC=UNCLASSIFIED]
This rule of thumb has proven successful in providing a defined end point for building and refining a structure. Hmmm. I always thought that things like no more significant explainable (difference) density define endpoints in model building, not R-values. This strategy has proven successful in nailing ligand structures where R-value rules of thumb were used to define the end points. Cheers, BR
Re: [ccp4bb] diverging Rcryst and Rfree
Dear Rakesh, dear Artem, Since the initial question is not precise (which kind of comments are expected?) I may mention that the most frequent values of R, Rfree and DeltaR (that is asked about) are given in our work published in 2009 in Acta Cryst. D65, 1283-1291. Interestingly, they are practically linear functions of log(resolution). The plots also show the statistics of deviation from these lines. Best regards, Sacha Urzhumtsev Universities of Strasbourg & Nancy From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On behalf of Artem Evdokimov Sent: Tuesday, 26 October 2010 03:36 To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] diverging Rcryst and Rfree http://www.mail-archive.com/ccp4bb@jiscmail.ac.uk/msg04677.html as well as some notes in the older posts :) As a very basic rule of thumb, Rfree-Rwork tends to be around Rmerge for the dataset for refinements that are not overfitted. Artem On Mon, Oct 25, 2010 at 4:10 PM, Rakesh Joshi rjo...@purdue.edu wrote: Hi all, Can anyone comment, in general, on diverging Rcryst and Rfree values (say 7%) for structures with kind of low resolutions (2.5-2.9 angstroms)? Thanks RJ
Re: [ccp4bb] Babinet solvent correction / comment for imperfect models
Dear all, Sorry to come back late, when the discussion is over, with one more remark relevant to bulk solvent modeling. I've just got a comment from a colleague of mine, Adam Ben-Shem, who kindly agreed that I post his message (below) to CCP4bb. I think he is completely right in saying that the models we discussed were relevant to well-solved structures and practically complete atomic models. In the process of structure solution the situation may be different, as he saw in his own practice (in particular when solving the 80S ribosome). With best regards, Sacha Urzhumtsev Universities of Strasbourg & Nancy == message by Adam Ben-Shem I think the discussion of bulk solvent correction should be divided into three parts. Part one - bulk solvent correction for the final model. In this case, the physical meaning of the mask model is clear and this is obviously the right way to apply the correction. Part two - bulk solvent correction of partial models. These can be models with large flexible domains or models coming from bad maps where building is a very iterative process. In these cases the physical meaning of the mask model that Pavel is so worried about vanishes. In some extreme cases I can imagine that the Babinet bulk solvent correction would be better than the mask model. As I told you before, I suggest a better solution for these cases, and that is to calculate the mask for the mask model using density modification. Part three - density modification following refinement. From my own experience, for very partial models, phases and FOMs for this procedure should come from the model alone (without bulk solvent), letting density modification define the bulk solvent for itself. 
Bulk solvent correction is still important for the refinement process to produce the best model, but then the input to the density modification process should not include bulk solvent correction (and in very, very partial models FOMs should be calculated by the old SIGMAA program over all reflections, not by the refinement program using the R-free reflections alone). Adam
Re: [ccp4bb] diverging Rcryst and Rfree
Rakesh, Looking at http://www.pdbe.org/statistics (Structure Statistics): for all structures we see that an Rdiff of 0.07 is not that uncommon, this being about 1 sigma away from the mean value of 0.04 for all structures and 0.045 for your resolution range. For structures with Rdiff in the range 0.07-0.08 and resolution 2.7-2.9 we see that there are 212 structures. If I edit the query in the PDBe database to your exact requirement ranges then there are 1353 example structures out of the 53616 examples where this data exists. My comment is that this is a little worse than average, but not particularly a problem. Tom Hi all, Can anyone comment, in general, on diverging Rcryst and Rfree values (say 7%) for structures with kind of low resolutions (2.5-2.9 angstroms)? Thanks RJ
Re: [ccp4bb] diverging Rcryst and Rfree
I'm not sure that a little worse is the appropriate description - I think all you can say is that it deviates from the average when only resolution is used to define what is average. My point is that a number of other factors are known to be involved, and so you can't say that this deviation is worse until you have taken them all into account. Specifically, it's known that the expected value of the ratio Rfree/Rwork (i.e. expected on the basis of the null hypothesis that the structure is correct and complete and the only errors are random experimental errors) is directly related to the ratio (no. of significant experimental observations) / (effective no. of refined parameters), where effective here means taking into account the restraints. The number of significant observations will clearly depend on a number of factors, not only resolution, but also solvent content (which you obviously can't control unless you use a different crystal form) and data completeness (which you can control up to a point by optimising the data collection strategy). The effective no. of parameters obviously depends only on the parameter/restraint model and the weights, both of which you have full control of, and therefore the effective no. of parameters should be completely determined if the parameter/restraint model that has been selected is optimal. Finally, in order to compute Rfree/Rwork from Rdiff = Rfree - Rwork, Rwork itself must be specified, i.e.: Rfree/Rwork = 1 + (Rfree - Rwork)/Rwork = 1 + Rdiff/Rwork. It's actually much easier to work with Rfree/Rwork instead of Rdiff, because then you don't need to specify a particular value of Rwork, and you have one less variable to worry about in the factor analysis. So assuming the model is optimal, the major factors in addition to resolution which control the expected value of Rdiff are the solvent content, the data completeness and Rwork. 
The value of Rwork obtained for an optimal model on convergence is obviously related to the data quality (e.g. mean I/sig(I)) and, of course, the resolution. The bottom line is that unless we are given a lot more information it's not possible to say whether a specific value of Rdiff deviates significantly from the expected value. Cheers -- Ian Then for all structures we see that an Rdiff of 0.07 is not that uncommon, this being about 1 sigma away from the mean value of 0.04 for all structures and 0.045 for your resolution range. For structures with Rdiff in the range 0.07-0.08 and resolution 2.7-2.9 we see that there are 212 structures. If I edit the query in the PDBe database to your exact requirement ranges then there are 1353 example structures out of the 53616 examples where this data exists. My comment is that this is a little worse than average, but not particularly a problem. Tom Hi all, Can anyone comment, in general, on diverging Rcryst and Rfree values (say 7%) for structures with kind of low resolutions (2.5-2.9 angstroms)? Thanks RJ
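Ian's point about working with the ratio rather than the difference is easy to mechanise. A minimal Python sketch (the Rwork value used below is invented for illustration, not taken from the thread):

```python
# Convert between Rdiff = Rfree - Rwork and the ratio Rfree/Rwork that
# Ian recommends for factor analysis, where it saves one variable.
# All numbers below are illustrative only.

def ratio_from_rdiff(rdiff, rwork):
    # Rfree/Rwork = 1 + (Rfree - Rwork)/Rwork
    return 1.0 + rdiff / rwork

def rdiff_from_ratio(ratio, rwork):
    # Invert: Rfree - Rwork = (Rfree/Rwork - 1) * Rwork
    return (ratio - 1.0) * rwork

# A 7% divergence at an assumed Rwork of 0.22:
r = ratio_from_rdiff(0.07, 0.22)
print(round(r, 3))  # 1.318
```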
Re: [ccp4bb] diverging Rcryst and Rfree
I would expect such a difference with lowish resolution data. Your model will be biased towards the restraints - i.e. the geometry will be good, but there are barely enough observations to fit the actual model properly. E.g. it will be hard to position solvent, and to recognise any deviations from NCS. So don't be too surprised or worried. Look at the maps - if they look clean then things are probably OK. Eleanor On 10/25/2010 10:44 PM, Jacqueline Vitali wrote: Hi, I have seen this happening when I had NCS and did not include it in refinement. Rwork drops and Rfree increases. In this case the difference became small when I included the NCS. Also if your Rmerge is high and you include all reflections in refinement, Rfree is high. In my experience, by excluding F < sigma reflections you drop Rfree a lot. My limited experience suggests errors in the data and/or in the way you handle the data. Jackie Vitali On Mon, Oct 25, 2010 at 5:10 PM, Rakesh Joshi rjo...@purdue.edu wrote: Hi all, Can anyone comment, in general, on diverging Rcryst and Rfree values (say 7%) for structures with kind of low resolutions (2.5-2.9 angstroms)? Thanks RJ
Re: [ccp4bb] rigorously compatible coordinate files
On 08/20/2010 05:50 PM, Charles W. Carter, Jr wrote: Is there a program that will read in a PDB coordinate file and re-order the side chain atoms in each residue according to a standard order? I've a program that compares two files for the same structure, but it requires that the order of the atoms be the same in both cases. I'm using a variety of files in which the residue atoms are ordered either main chain first or side-chain first. I've not found a suitable program in the CCP4 suite, though one might exist. MOLEMAN2 doesn't seem suitable, either. Thanks, Charlie Very old Q, but PROCHECK does this I think. Eleanor
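For readers without PROCHECK to hand, the reordering Charlie asks for is straightforward to script. A minimal Python sketch: the STANDARD_ORDER table is an illustrative subset (not the full PDB convention), and the column positions follow the fixed-width PDB ATOM record (atom name in columns 13-16):

```python
# Sort each residue's atoms into a fixed reference order so that two PDB
# files for the same structure can be compared line by line.
# STANDARD_ORDER here covers only two residue types, for illustration.

STANDARD_ORDER = {
    "ALA": ["N", "CA", "C", "O", "CB"],
    "SER": ["N", "CA", "C", "O", "CB", "OG"],
}

def reorder_residue(atom_lines, resname):
    order = {name: i for i, name in enumerate(STANDARD_ORDER[resname])}
    # PDB columns 13-16 (0-based slice 12:16) hold the atom name;
    # unknown names sort to the end.
    return sorted(atom_lines, key=lambda l: order.get(l[12:16].strip(), 99))

lines = [
    "ATOM      1  CB  ALA A   1      11.0  11.0  11.0  1.00  0.00",
    "ATOM      2  N   ALA A   1      10.0  10.0  10.0  1.00  0.00",
    "ATOM      3  CA  ALA A   1      10.5  10.5  10.5  1.00  0.00",
]
for l in reorder_residue(lines, "ALA"):
    print(l[12:16].strip())  # prints N, CA, CB in order
```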
Re: [ccp4bb] diverging Rcryst and Rfree
Jackie, please note that (at least imho) the desire to obtain better R-factors does not justify excluding data from analysis. The weak reflections that you suggest should be rejected contain information, and excluding them will indeed artificially lower the R-factors while reducing the accuracy of your model. Cheers, Ed. On Mon, 2010-10-25 at 17:44 -0400, Jacqueline Vitali wrote: Also if your Rmerge is high and you include all reflections in refinement, Rfree is high. In my experience, by excluding F < sigma reflections you drop Rfree a lot. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] diverging Rcryst and Rfree [SEC=UNCLASSIFIED]
Anthony, Your rule actually works on the difference (Rfree - Rwork/2), not (Rfree - Rwork) as you said, so it is rather different from what most people seem to be using. For example, let's say the current values are Rwork = 20, Rfree = 30, so your current test value is (30 - 20/2) = 20. Then according to your rule Rwork = 18, Rfree = 29 is equally acceptable (29 - 18/2 = 20, i.e. the same test value), whereas Rwork = 16, Rfree = 29 would not be acceptable by your rule (29 - 16/2 = 21, so the test value is higher). Rwork = 18, Rfree = 28 would represent an improvement by your rule (28 - 18/2 = 19, i.e. a lower test value). You say this criterion provides a defined end-point, i.e. a minimum in the test value above. However, wouldn't other linear combinations of Rwork and Rfree also have a defined minimum value? In particular, Rfree itself always has a defined minimum with respect to adding parameters or changing the weights, so it would also satisfy your criterion. There has to be some additional criterion that you are relying on to select the particular linear combination (Rfree - Rwork/2) over any of the other possible ones. Cheers -- Ian On Tue, Oct 26, 2010 at 6:33 AM, DUFF, Anthony a...@ansto.gov.au wrote: One “rule of thumb” based on R and R-free divergence that I impress onto crystallography students is this: If a change in refinement strategy or parameters (e.g. loosening restraints, introducing TLS) or a round of addition of unimportant water molecules results in a reduction of R that is more than double the reduction in R-free, then don't do it. This rule of thumb has proven successful in providing a defined end point for building and refining a structure. The rule works on the differential of the R / R-free divergence. I've noticed that some structures begin with a bigger divergence than others. Different Rmerge values might explain this. Has anyone else found a student in a dark room carefully adding large numbers of partially occupied water molecules? 
Anthony Anthony Duff Telephone: 02 9717 3493 Mob: 043 189 1076 From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of Artem Evdokimov Sent: Tuesday, 26 October 2010 1:45 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] diverging Rcryst and Rfree Not that rules of thumb always have to have a rationale, nor that they're always correct - but it would seem that noise in the data (of which Rmerge is an indicator) should have a significant relationship with the R:Rfree difference, since Rfree is not (should not be, if selected correctly) subject to noise fitting. This rule is easily broken if one refines against very noisy data (e.g. that last shell with Rmerge of 55% and an I/sigmaI ratio of 1.3 is still good, right?) or if the structure is overfit. The rule is only an indicative one (i.e. one should get really worried if R-Rfree looks very different from Rmerge), and it breaks down at very high and very low resolution (a more complete picture is given by GK and shown in BR's book). Since the selection of data and refinement procedures is subject to the decisions of the practitioner, I suspect that the extreme divergence shown in the figures that you refer to is probably the result of our own collective decisions. I have no proof, but I suspect that if a large enough section of the PDB were to be re-refined using the same methods and the same data trimming practices, the spread would be considerably narrower. That'd be somewhat hard to do - but may be doable now given the abundance of auto-building and auto-correcting algorithms. Artem On Mon, Oct 25, 2010 at 9:07 PM, Bernhard Rupp (Hofkristallrat a.D.) hofkristall...@gmail.com wrote: And the rationale for that rule being exactly what? 
For stats, see figures 12-23 and 12-24: http://www.ruppweb.org/garland/gallery/Ch12/index_2.htm br From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of Artem Evdokimov Sent: Monday, October 25, 2010 6:36 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] diverging Rcryst and Rfree http://www.mail-archive.com/ccp4bb@jiscmail.ac.uk/msg04677.html as well as some notes in the older posts :) As a very basic rule of thumb, Rfree-Rwork tends to be around Rmerge for the dataset for refinements that are not overfitted. Artem On Mon, Oct 25, 2010 at 4:10 PM, Rakesh Joshi rjo...@purdue.edu wrote: Hi all, Can anyone comment, in general, on diverging Rcryst and Rfree values (say 7%) for structures with kind of low resolutions (2.5-2.9 angstroms)? Thanks RJ
Re: [ccp4bb] diverging Rcryst and Rfree
Jackie, I agree completely with Ed (for once!), not only for the reasons he gave, but also because it's valid to compare statistics such as likelihoods and R factors ONLY if only the model is varied. Such a comparison is not valid if the data used are varied (in this case you are changing the data by deleting some of them). Cheers -- Ian On Tue, Oct 26, 2010 at 2:37 PM, Ed Pozharski epozh...@umaryland.edu wrote: Jackie, please note that (at least imho) the desire to obtain better R-factors does not justify excluding data from analysis. Weak reflections that you suggest should be rejected contain information, and excluding them will indeed artificially lower the R-factors while reducing the accuracy of your model. Cheers, Ed. On Mon, 2010-10-25 at 17:44 -0400, Jacqueline Vitali wrote: Also if your Rmerge is high and you include all reflections in refinement, Rfree is high. In my experience, by excluding F < sigma reflections you drop Rfree a lot. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] Fill map/mask with dummy atoms?
Apologies for not seeing the original post, but

$warpbin/arp_warp << EOF
MODE MIRBUILD
FILES CCP4 MAPFIND 2fofc.map XYZOUT1 dummy.pdb
SYMM your_sym
CELL your_cell
RESOLUTION resol
MIRBUILD ATOMS atoms_in_protein MODELS 1
RESN DUM
END
EOF

will do it (fill in the keywords for your case). I recommend getting a map on a 0.3 A grid. A. On Oct 25, 2010, at 21:16, Pavel Afonine wrote: Hi Dirk, maybe too late... but (maybe) better late than never -:) Here is a working example of how you can do it. Note, the procedure just builds the dummy atoms in spheres with user-defined centers and radii. You can specify as many spheres as you wish. Dummy atoms clashing with model atoms or other dummy atoms will not be added. The procedure doesn't care about the map or data (Fobs or whatever): it just geometrically adds dummy atoms where requested. Also note, it uses a PHENIX command line tool that is not specifically designed for this task but simply can do it with an appropriate set of parameters. Ok, that was the preamble -:) Now let's do it: here are all the example files: /net/cci/afonine/public_html/for_Dirk The command

phenix.grow_density params

will create this file with dummy atoms: dummies_DA.pdb which in PyMOL looks like this: http://cci.lbl.gov/~afonine/for_Dirk/da_only.png or superposed with the model: http://cci.lbl.gov/~afonine/for_Dirk/da_plus_model.png Note, the above command requires a data file (remember, this command is meant for something else), but if you have just a PDB file (it can be empty, I guess) and don't have any data file, you can fake one just to run this command. To get fake Fobs:

phenix.fmodel model.pdb high_res=3 type=real r_free=0.1 label='F-obs'
mv model.pdb.mtz data.mtz

I guess this is it. Let me know if I can be of any help with this. Pavel. On 10/13/10 4:00 AM, Dirk Kostrewa wrote: Dear CCP4ers, is there a program around that allows to fill an input map or mask with dummy atoms? Best regards, Dirk. 
-- *** Dirk Kostrewa Gene Center Munich, A5.07 Department of Biochemistry Ludwig-Maximilians-Universität München Feodor-Lynen-Str. 25 D-81377 Munich Germany Phone: +49-89-2180-76845 Fax: +49-89-2180-76999 E-mail:kostr...@genzentrum.lmu.de WWW:www.genzentrum.lmu.de *** P please don't print this e-mail unless you really need to Anastassis (Tassos) Perrakis, Principal Investigator / Staff Member Department of Biochemistry (B8) Netherlands Cancer Institute, Dept. B8, 1066 CX Amsterdam, The Netherlands Tel: +31 20 512 1951 Fax: +31 20 512 1954 Mobile / SMS: +31 6 28 597791
Re: [ccp4bb] diverging Rcryst and Rfree
b) very large Rmerge values:

Rmerge   Rwork    Rfree    Rfree-Rwork   Resolution
0.9990   0.1815   0.2086   0.0271        1.80   SG center, unpublished
0.8700   0.1708   0.2270   0.0562        1.96   unpublished
0.7700   0.1870   0.2297   0.0428        1.56
0.7600   0.2380   0.2680   0.0300        2.50   SG center, unpublished
0.7000   0.1700   0.2253   0.0553        1.71
0.6400   0.2179   0.2715   0.0536        2.75   SG center, unpublished

The most disturbing to me is that of those with very large overall Rmerge values, 3 come from structural genomics centers. Is that less or more disturbing than that the other 50% come from non-SG centers? Of course, the authors themselves may be willing to help correct the obvious typos -- which will presumably disappear forever once we can finally upload log files upon deposition (coming soon, I'm told). On an unrelated note, it's reassuring to see sound statistical principles -- averages, large N, avoidance of small-number anecdotes, and such rot -- continue not to be abandoned in the politics of science funding, he said airily. phx
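Reading the table against Artem's earlier rule of thumb (Rfree - Rwork roughly tracking Rmerge) makes the point concrete: for these entries the divergence is far smaller than Rmerge in every case. A small Python check over the tabulated values:

```python
# The (Rmerge, Rwork, Rfree) triples from the table above; the reported
# Rmerge values should, by the rule of thumb, be comparable to the
# Rfree - Rwork divergence, but here they exceed it by roughly 10x.
rows = [
    (0.9990, 0.1815, 0.2086),
    (0.8700, 0.1708, 0.2270),
    (0.7700, 0.1870, 0.2297),
    (0.7600, 0.2380, 0.2680),
    (0.7000, 0.1700, 0.2253),
    (0.6400, 0.2179, 0.2715),
]
for rmerge, rwork, rfree in rows:
    print(round(rfree - rwork, 4), "vs Rmerge", rmerge)
```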
Re: [ccp4bb] diverging Rcryst and Rfree
Yes! - the critical piece of information that we're missing is the proportion of *all* structures that come from SG centres. Only by knowing that can we do any serious statistics ... -- Ian On Tue, Oct 26, 2010 at 5:07 PM, Frank von Delft frank.vonde...@sgc.ox.ac.uk wrote: b) very large Rmerge values: Rmerge Rwork Rfree Rfree-Rwork Resolution 0.9990 0.1815 0.2086 0.0271 1.80 SG center, unpublished 0.8700 0.1708 0.2270 0.0562 1.96 unpublished 0.7700 0.1870 0.2297 0.0428 1.56 0.7600 0.2380 0.2680 0.0300 2.50 SG center, unpublished 0.7000 0.1700 0.2253 0.0553 1.71 0.6400 0.2179 0.2715 0.0536 2.75 SG center, unpublished The most disturbing to me is that of those with very large overall Rmerge values, 3 come from structural genomics centers. Is that less or more disturbing than that the other 50% come from non-SG centers? Of course, the authors themselves may be willing to help correct the obvious typos -- which will presumably disappear forever once we can finally upload log files upon deposition (coming soon, I'm told). On an unrelated note, it's reassuring to see sound statistical principles -- averages, large N, avoidance of small-number anecdotes, and such rot -- continue not to be abandoned in the politics of science funding, he said airily. phx
[ccp4bb] Against Method (R)
Hi Folks, Please allow me a few biased reflections/opinions on the numeRology of the R-value (not R-factor, because it is neither a factor itself nor does it factor in anything but ill-posed reviewers' critique. Historically the term originated from small molecule crystallography, but it is only a 'Residual-value'). a) The R-value itself - based on linear residuals and of apparent intuitive meaning - is statistically peculiar to say the least. I could not find it in any common statistics text. So doing proper statistics with R becomes difficult. b) Rules of thumb (as much as they conveniently obviate the need for detailed explanations, satisfy students' desire for quick answers, and allow superficial review of manuscripts) become less valuable if they have a case-dependent large variance, topped with an unknown parent distribution. Combined with an odd statistic, that has great potential for misguidance and unnecessarily lost sleep. c) Ian has (once again) explained that, for example, Rf-R depends on exact knowledge of the restraints and their individual weighting, which we generally do not have. Caution is advised. d) The answer to which model is better - which is actually what you want to know - becomes a question of model selection or hypothesis testing, which, given the obscurity of R, cannot be derived with some nice plug-in method. As Ian said, the models to be compared must also be based on the same and identical data. e) One measure available that is statistically at least defensible is the log-likelihood. So what you can do is form a log-likelihood ratio (or Bayes factor (there is the darn factor again; it's a ratio)) and see where this falls - and the answers are pretty soft and, probably because of that, correspondingly realistic. This also makes - based on statistics alone - deciding between different overall parameterizations difficult. 
http://en.wikipedia.org/wiki/Bayes_factor f) So having said that, what really remains is that the model that fits the primary evidence (minimally biased electron density) best and is at the same time physically meaningful is the best model, i.e., all plausibly accountable electron density (and not more) is modeled. You can convince yourself of this by taking the most interesting part of the model out (say a ligand or a binding pocket) and looking at the R-values, or doing a model selection test - the result will be indecisive. Poof goes the global rule of thumb. g) In other words: global measures in general are entirely inadequate to judge local model quality (noted many times over already by Jones, Kleywegt, and others, in the dark ages of crystallography when poorly restrained crystallographers used to passionately whack each other over the head with unfree R-values). Best, BR - Bernhard Rupp, Hofkristallrat a.D. 001 (925) 209-7429 +43 (676) 571-0536 b...@ruppweb.org hofkristall...@gmail.com http://www.ruppweb.org/ -- Und wieder ein chillout-mix aus der Hofkristall-lounge --
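Point (e) in practice: a toy Python illustration of comparing two parameterizations by twice the log of the likelihood ratio (the conventional 2 ln K scale for Bayes factors). The log-likelihood values here are invented; in reality they come from the refinement program's output.

```python
# Invented log-likelihoods for two parameterizations of the same model
# refined against the same data (per point d): e.g. without and with
# extra TLS groups. In practice take these from the refinement log.
logL_simple = -12450.0
logL_extended = -12430.0

# Twice the log likelihood ratio (the 2 ln K scale): large values favour
# the extended parameterization, small values are indecisive -- the
# "soft" answers BR mentions.
two_ln_K = 2.0 * (logL_extended - logL_simple)
print(two_ln_K)  # 40.0
```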
Re: [ccp4bb] Against Method (R)
Another issue with these statistics is that the PDB insists on a single value of resolution no matter how anisotropic the data. Especially in the outermost bins, Rmerge could be ridiculously high simply because the data only exist in one out of 3 directions. Phoebe = Phoebe A. Rice Dept. of Biochemistry & Molecular Biology The University of Chicago phone 773 834 1723 http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123 http://www.rsc.org/shop/books/2008/9780854042722.asp Original message Date: Tue, 26 Oct 2010 09:46:46 -0700 From: Bernhard Rupp (Hofkristallrat a.D.) hofkristall...@gmail.com Subject: [ccp4bb] Against Method (R) To: CCP4BB@JISCMAIL.AC.UK
Re: [ccp4bb] Against Method (R)
On Tuesday, October 26, 2010 09:46:46 am Bernhard Rupp (Hofkristallrat a.D.) wrote: Hi Folks, Please allow me a few biased reflections/opinions on the numeRology of the R-value (not R-factor, because it is neither a factor itself nor does it factor in anything but ill-posed reviewer's critique. Historically the term originated from small molecule crystallography, but it is only a 'Residual-value') a) The R-value itself - based on the linear residuals and of apparent intuitive meaning - is statistically peculiar to say the least. I could not find it in any common statistics text. So doing proper statistics with R becomes difficult. As WC Hamilton pointed out originally, two [properly weighted] R factors can be compared by taking their ratio. Significance levels can then be evaluated using the standard F distribution. A concise summary is given in chapter 9 of Prince's book, which I highly recommend to all crystallographers. W C Hamilton, Significance tests on the crystallographic R factor, Acta Cryst. (1965) 18, 502-510. Edward Prince, Mathematical Techniques in Crystallography and Materials Science, Springer-Verlag, 1982. It is true that we normally indulge in the sloppy habit of paying attention only to the unweighted R factor even though refinement programs report both the weighted and unweighted versions. (shelx users excepted :-) But the weighted form is there also if you want to do statistical tests. You are of course correct that this remains a global test, and as such is of limited use in evaluating local properties of the model. cheers, Ethan -- Ethan A Merritt Biomolecular Structure Center, K-428 Health Sciences Bldg University of Washington, Seattle 98195-7742
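Ethan's Hamilton test can be sketched numerically. A Python example with invented refinement numbers; the F critical value is hard-coded from a standard table (df1 = 20, df2 ~ 4600, alpha = 0.05 gives F ~ 1.57), since in practice one would use a table lookup or scipy.stats.f.ppf:

```python
# Sketch of the Hamilton R-ratio test (Hamilton 1965; Prince ch. 9).
# All refinement numbers below are invented for illustration.
n = 5000                   # number of reflections
m1, m2 = 400, 420          # parameters in the restricted / extended model
b = m2 - m1                # dimension of the hypothesis (extra parameters)
wR1, wR2 = 0.210, 0.205    # weighted R factors (wR1: fewer parameters)

F_crit = 1.57              # F(b, n - m2) at the 5% level, from tables
R_crit = (b * F_crit / (n - m2) + 1.0) ** 0.5

# If the observed ratio of weighted R factors exceeds the critical value,
# the extra parameters are formally significant at the chosen level.
print(round(wR1 / wR2, 4), ">", round(R_crit, 4))  # 1.0244 > 1.0034
```

Note how small the critical ratio is when the observation-to-parameter ratio is large: even a modest drop in the weighted R factor can be formally significant, which is why the global test says little about local model quality.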
Re: [ccp4bb] Against Method (R)
Dear all, Augustine, Confessions, Book 11 Chap. XIV, has it: If no one ask of me, I know; if I wish to explain to him who asks, I know not. With best wishes, Gerard. -- On Tue, Oct 26, 2010 at 01:30:11PM -0500, Phoebe Rice wrote: Another issue with these statistics is that the PDB insists on a single value of resolution no matter how anisotropic the data. Especially in the outermost bins, Rmerge could be ridiculously high simply because the data only exist in one out of 3 directions. Phoebe = Phoebe A. Rice Dept. of Biochemistry Molecular Biology The University of Chicago phone 773 834 1723 http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123 http://www.rsc.org/shop/books/2008/9780854042722.asp Original message Date: Tue, 26 Oct 2010 09:46:46 -0700 From: CCP4 bulletin board CCP4BB@JISCMAIL.AC.UK (on behalf of Bernhard Rupp (Hofkristallrat a.D.) hofkristall...@gmail.com) Subject: [ccp4bb] Against Method (R) To: CCP4BB@JISCMAIL.AC.UK Hi Folks, Please allow me a few biased reflections/opinions on the numeRology of the R-value (not R-factor, because it is neither a factor itself nor does it factor in anything but ill-posed reviewer's critique. Historically the term originated from small molecule crystallography, but it is only a 'Residual-value') a) The R-value itself - based on the linear residuals and of apparent intuitive meaning - is statistically peculiar to say the least. I could not find it in any common statistics text. So doing proper statistics with R becomes difficult. b) rules of thumb (as much as they conveniently obviate the need for detailed explanations, satisfy student's desire for quick answers, and allow superficial review of manuscripts) become less valuable if they have a case-dependent large variance, topped with an unknown parent distribution. Combined with an odd statistic, that has great potential for misguidance and unnecessarily lost sleep. 
c) Ian has (once again) explained that for example the Rf-R depends on the exact knowledge of the restraints and their individual weighting, which we generally do not have. Caution is advised. d) The answer which model is better - which is actually what you want to know - becomes a question of model selection or hypothesis testing, which, given the obscurity of R cannot be derived with some nice plug-in method. As Ian said the models to be compared must also be based on the same and identical data. e) One measure available that is statistically at least defensible is the log-likelihood. So what you can do is form a log-likelihood ratio (or Bayes factor (there is the darn factor again, it’s a ratio)) and see where this falls - and the answers are pretty soft and, probably because of that, correspondingly realistic. This also makes - based on statistics alone - deciding between different overall parameterizations difficult. http://en.wikipedia.org/wiki/Bayes_factor f) so having said that, what really remains is that the model that fits the primary evidence (minimally biased electron density) best and is at the same time physically meaningful, is the best model, i. e., all plausibly accountable electron density (and not more) is modeled. You can convince yourself of this by taking the most interesting part of the model out (say a ligand or a binding pocket) and look at the R-values or do a model selection test - the result will be indecisive. Poof goes the global rule of thumb. g) in other words: global measures in general are entirely inadequate to judge local model quality (noted many times over already by Jones, Kleywegt, others, in the dark ages of crystallography when poorly restrained crystallographers used to passionately whack each other over the head with unfree R-values). Best, BR - Bernhard Rupp, Hofkristallrat a.D. 
=== * * * Gerard Bricogne g...@globalphasing.com * * * * Global Phasing Ltd. * * Sheraton House, Castle Park Tel: +44-(0)1223-353033 * * Cambridge CB3 0AX, UK Fax: +44-(0)1223-366889 * * *
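The R-value Bernhard discusses in point (a) — the linear residual — is easy to write down concretely. The sketch below is purely illustrative (invented toy amplitudes, not any refinement program's actual implementation), including the conventional split into working and cross-validation ("free") reflections:

```python
def r_value(fobs, fcalc):
    """Linear residual R = sum|Fobs - Fcalc| / sum(Fobs), the 'R-value'
    under discussion (a sketch; real programs operate on merged amplitudes)."""
    return sum(abs(o - c) for o, c in zip(fobs, fcalc)) / sum(fobs)

# toy amplitudes, split into a working set and a small held-out (free) set
work_obs, work_calc = [120.0, 95.0, 60.0, 33.0], [114.0, 99.0, 57.0, 36.0]
free_obs, free_calc = [80.0, 25.0], [70.0, 30.0]

r_work = r_value(work_obs, work_calc)  # residual on reflections used in refinement
r_free = r_value(free_obs, free_calc)  # residual on held-out reflections
# overfitting shows up as Rfree drifting well above Rwork
```

As the thread notes, these are global numbers: removing a ligand from a large model barely moves either of them, which is exactly point (f).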
Re: [ccp4bb] Against Method (R)
W C Hamilton, Significance tests on the crystallographic R factor, Acta Cryst. (1965). 18, 502-510. Interestingly enough, I have used the Hamilton tests in Rietveld powder refinements of small molecules/intermetallics before. One problem was partial occupancies vs split conformations in HT superconductors. Alas, you cannot cheat there either - most of the time the results showed that numerically the differences were not significant, and one again had to resort to non-statistical plausibility arguments or references. Has anyone done Hamiltons on different protein models/parameterizations and can report? I think for global parameterization changes like NCS, TLS, etc. that may in fact be interesting. BR
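For anyone wanting to try this, the decision rule from Hamilton's 1965 paper can be sketched as follows. This is a sketch only: it assumes properly weighted R-factors, and the F quantile must be supplied from tables or a statistics library (the numbers in the comment are illustrative):

```python
def hamilton_critical_ratio(b, n_minus_m, f_crit):
    """Critical value of Hamilton's R-ratio test (Acta Cryst. 1965, 18, 502).

    b         -- number of parameters fixed by the restricted hypothesis
    n_minus_m -- observations minus parameters of the general model
    f_crit    -- alpha-point of the F(b, n - m) distribution (from tables)

    Reject the restricted model at significance level alpha if
    wR(restricted) / wR(general) exceeds this ratio.
    """
    return (b * f_crit / n_minus_m + 1.0) ** 0.5

# e.g. one extra parameter, 990 degrees of freedom, F ~ 3.85 at alpha = 0.05:
# the critical ratio is only ~1.002, so even a tiny wR difference can be
# significant -- or, as in the Rietveld cases above, fail to be
```

Note this applies to the weighted R-factors only; the unweighted values deposited in the PDB cannot be plugged in directly, which is exactly the approximation Ian laments below.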
Re: [ccp4bb] Against Method (R)
Indeed, see: http://scripts.iucr.org/cgi-bin/paper?a07175 . The Rfree/Rwork ratio that I referred to does strictly use the weighted ('Hamilton') R-factors, but because only the unweighted values are given in the PDB we were forced to approximate (against our better judgment!). The problem of course is that all refinement software AFAIK writes the unweighted Rwork and Rfree to the PDB header; there are no slots for the weighted values, which does indeed make doing serious statistics on the PDB entries difficult if not impossible! The unweighted crystallographic R-factor was only ever intended as a rule of thumb, i.e. to give a rough idea of the relative quality of related structures; I hardly think the crystallographers of yesteryear ever imagined that we would be taking it so seriously now! In particular IMO it should never be used for something as critical as validation (either global or local), or for guiding refinement strategy: use the likelihood instead. Cheers -- Ian

PS I've always known it as an 'R-factor', e.g. see the paper referenced above, but then during my crystallographic training I used extensively software developed by both authors of the paper (i.e. Geoff Ford and the late John Rollett) in Oxford (which eventually became the 'Crystals' small-molecule package). Maybe it's a transatlantic thing ... Cheers -- Ian

On Tue, Oct 26, 2010 at 7:28 PM, Ethan Merritt merr...@u.washington.edu wrote: [...] As WC Hamilton pointed out originally, two [properly weighted] R factors can be compared by taking their ratio. Significance levels can then be evaluated using the standard F distribution. A concise summary is given in chapter 9 of Prince's book, which I highly recommend to all crystallographers. W C Hamilton, Significance tests on the crystallographic R factor, Acta Cryst. (1965). 18, 502-510. Edward Prince, Mathematical Techniques in Crystallography and Materials Science. Springer-Verlag, 1982. It is true that we normally indulge in the sloppy habit of paying attention only to the unweighted R factor even though refinement programs report both the weighted and unweighted versions. (shelx users excepted :-) But the weighted form is there also if you want to do statistical tests. You are of course correct that this remains a global test, and as such is of limited use in evaluating local properties of the model. cheers, Ethan [...]
Re: [ccp4bb] Against Method (R)
Um...
* Given that the weighted Rfactor is weighted by the measurement errors (1/sig^2)
* and given that the errors in our measurements apparently have no bearing whatsoever on the errors in our models (for macromolecular crystals, certainly - the R-factor gap)
is the weighted Rfactor even vaguely relevant for anything at all? phx.

On 26/10/2010 20:44, Ian Tickle wrote: [...]
[ccp4bb] Help with Optimizing Crystals
Hello. I have obtained disk-shaped crystals of a protein that I am working on. I got hits in about 10 different conditions, with a few common precipitants and pHs, and I have optimized two conditions so far. In the optimized conditions, the crystals appear overnight, usually surrounded by or hiding under heavy precipitate. Under the best conditions, I get what I would describe as single disks, some of which are of decent size and very round, that rotate light very well. Sub-optimal conditions can give small to large crystal clusters. I shot the large disk crystals grown from one condition at the synchrotron, but they did not diffract. I was wondering if anyone had any advice about optimizing these crystals in order to get them to diffract better? As mentioned before, I have only tried optimizing a few of the hit conditions (varying precipitant conc., pH, etc.), but crystals from all of the hits look the same: always round disks or disk clusters. This leads me to believe that optimizing the other hit conditions will produce similar results. Would it be worthwhile to try optimizing these conditions as well? I have also tried seeding, which just produces a lot of clusters, and an additive screen. Some of the additives help to produce larger crystals, but again I always get single disks or disk clusters. Any advice would be helpful. Thanks, Matt
Re: [ccp4bb] Help with Optimizing Crystals
Hi Matt, You'll probably get many different answers to a question like this, but what I would do is go back to your protein and make different constructs; chop off termini, surface mutations etc, maybe cleave off the tag. Of course more screening and optimization might work, but my sense is that since you get many hits pretty easily that however don't diffract, there may be something on the protein level that needs correcting. Good luck, Bert

On 10/26/10 4:23 PM, Matthew Bratkowski mab...@cornell.edu wrote: [...]
Re: [ccp4bb] Help with Optimizing Crystals
First piece of advice I have is to shove them in the beam and see what happens. A few days ago we got high-resolution data from crystals that are shaped like eggs. No edges on them whatsoever. In the past, saucer-shaped crystals diffracted to 2A whereas their hexagonal 'perfect' cousins (grown from a different PEG, if memory serves) had Cheeseburger-strength diffraction. Secondly, if ordinary optimization attempts repeatedly fail, it may be time for protein optimization, e.g. proteolysis, mutagenesis, methylation and so forth :) Artem

On Tue, Oct 26, 2010 at 3:23 PM, Matthew Bratkowski mab...@cornell.edu wrote: [...]
Re: [ccp4bb] Help with Optimizing Crystals
You did check on a gel that they are indeed your protein? If you have sufficient amounts available, try digesting it with various proteases and see if you can identify a stable fragment. A less radical approach, which might not be accessible to you: you could screen your protein for alternative buffer conditions using DSF and then pick a condition under which it seems to be very stable according to its melting temperature in the buffer. You've spared us the details of your purification procedure; maybe a polishing step at the end with SEC might do wonders. Jürgen - Jürgen Bosch Johns Hopkins Bloomberg School of Public Health Department of Biochemistry & Molecular Biology Johns Hopkins Malaria Research Institute 615 North Wolfe Street, W8708 Baltimore, MD 21205 Phone: +1-410-614-4742 Lab: +1-410-614-4894 Fax: +1-410-955-3655 http://web.mac.com/bosch_lab/

On Oct 26, 2010, at 4:23 PM, Matthew Bratkowski wrote: [...]
Re: [ccp4bb] Help with Optimizing Crystals
Seeding! Make seeds, rescreen with seeds. Look in many former ccp4bb posts for references about this. Jacob

- Original Message - From: Jürgen Bosch To: CCP4BB@JISCMAIL.AC.UK Sent: Tuesday, October 26, 2010 3:47 PM Subject: Re: [ccp4bb] Help with Optimizing Crystals [...]

*** Jacob Pearson Keller Northwestern University Medical Scientist Training Program Dallos Laboratory F. Searle 1-240 2240 Campus Drive Evanston IL 60208 lab: 847.491.2438 cel: 773.608.9185 email: j-kell...@northwestern.edu ***
Re: [ccp4bb] Against Method (R)
On Tuesday, October 26, 2010 01:16:58 pm Frank von Delft wrote: Um... * Given that the weighted Rfactor is weighted by the measurement errors (1/sig^2) * and given that the errors in our measurements apparently have no bearing whatsoever on the errors in our models (for macromolecular crystals, certainly - the R-factor gap)

You are overlooking causality :-) Yes, the errors in state-of-the-art models are only weakly limited by the errors in our measurements. But that is exactly _because_ we can now weight properly by the measurement errors (1/sig^2). In my salad days, weighting by 1/sig^2 was a mug's game. Refinement only produced a reasonable model if you applied empirical corrections rather than statistical weights. Things have improved a bit since then, both on the equipment side (detectors, cryo, ...) and on the processing side (Maximum Likelihood, error propagation, ...). Now the sigmas actually mean something!

is the weighted Rfactor even vaguely relevant for anything at all?

Yes, it is. It is the thing you are minimizing during refinement, at least to first approximation. Also, as just mentioned, it is a well-defined value that you can use for statistical significance tests. Ethan

On 26/10/2010 20:44, Ian Tickle wrote: [...]
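The weighted R-factor being discussed, with statistical weights w = 1/sigma^2, can be sketched like this (an illustrative sketch with toy numbers; actual refinement targets differ in detail between programs):

```python
def weighted_r(fobs, fcalc, sigmas):
    """wR = sqrt( sum w*(Fo - Fc)^2 / sum w*Fo^2 ), with w = 1/sigma^2."""
    w = [1.0 / s ** 2 for s in sigmas]
    num = sum(wi * (o - c) ** 2 for wi, o, c in zip(w, fobs, fcalc))
    den = sum(wi * o ** 2 for wi, o in zip(w, fobs))
    return (num / den) ** 0.5

fobs, fcalc = [100.0, 50.0], [90.0, 50.0]
# the poorly fit reflection measured precisely (small sigma) dominates wR ...
wr_precise = weighted_r(fobs, fcalc, [1.0, 10.0])
# ... while the same misfit with a sloppy measurement is down-weighted
wr_sloppy = weighted_r(fobs, fcalc, [10.0, 1.0])
```

This is Ethan's point in miniature: the statistic is only as meaningful as the sigmas that go into it.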
Re: [ccp4bb] Help with Optimizing Crystals
Hi. Here is some additional information. 1. The purification method that I used included Ni, tag cleavage, and SEC as a final step. I have tried samples from three different purification batches that range in purity, and even the batch with the worst purity seems to produce crystals. 2. The protein is a proteolyzed fragment, since the full-length version did not crystallize. Mutagenesis and methylation, however, may be techniques to consider, since the protein contains quite a few lysines. 3. There are not any detergents in the buffer, so these are not detergent crystals. The protein buffer just contains Tris at pH 8, NaCl, and DTT. 4. Some experiments that I have done thus far seem to suggest that the crystals are protein. Izit dye soaks well into the crystals, and the few crystals that I shot previously did not produce any diffraction pattern whatsoever. However, I have had difficulty seeing them on a gel, and they are a bit tough to break. 5. I tried seeding previously as follows: I broke some crystals, made a seed stock, dipped in a hair, and did serial streak seeding. After seeding, I usually saw small disks or clusters along the path of the hair, but nothing larger or better looking. I also had one more question. Has anyone had an instance where changing the precipitant condition or including an additive improved diffraction but did not drastically change the shape of the crystals? If so, I may just try further optimization with the current conditions and shoot some more crystals. Thanks for all the helpful advice thus far, Matt
Re: [ccp4bb] Help with Optimizing Crystals
Hi. Here is some additional information. 1. The purification method that I used included Ni, tag cleavage, and SEC as a final step. I have tried samples from three different purification batches that range in purity, and even the batch with the worst purity seems to produce crystals.

Resource Q? Two or more species perhaps? Does it run as a monomer, dimer or multimer on your SEC?

2. The protein is a proteolyzed fragment, since the full-length version did not crystallize. Mutagenesis and methylation, however, may be techniques to consider, since the protein contains quite a few lysines. 3. There are not any detergents in the buffer, so these are not detergent crystals. The protein buffer just contains Tris at pH 8, NaCl, and DTT. 4. Some experiments that I have done thus far seem to suggest that the crystals are protein. Izit dye soaks well into the crystals, and the few crystals that I shot previously did not produce any diffraction pattern whatsoever. However, I have had difficulty seeing them on a gel, and they are a bit tough to break.

Do they float or do they sink quickly when you try to mount them?

5. I tried seeding previously as follows: I broke some crystals, made a seed stock, dipped in a hair, and did serial streak seeding. After seeding, I usually saw small disks or clusters along the path of the hair, but nothing larger or better looking. I also had one more question. Has anyone had an instance where changing the precipitant condition or including an additive improved diffraction but did not drastically change the shape of the crystals? If so, I may just try further optimization with the current conditions and shoot some more crystals.

The additive screen from Hampton is not bad and can make a big difference. A different topic: is the condition you are using a direct cryo? If not, what do you use as a cryo? Have you tried the old-fashioned way of shooting at crystals at room temperature using capillaries (WTHIT?) You might be killing your crystal by trying to cryo it, is what I'm trying to say here. Jürgen

Thanks for all the helpful advice thus far, Matt
Re: [ccp4bb] diverging Rcryst and Rfree
I found a practical solution to a similar problem. When I get a large gap between Rfree/R in refmac, I repeat the refinement in PHENIX using the same model and the same mtz file. It has always worked for me. And I have no theory for that observation, but the tables in publications looked better. Maia

Quoting Ian Tickle ianj...@gmail.com: Jackie, I agree completely with Ed (for once!), not only for the reasons he gave, but also because it's valid to compare statistics such as likelihood and R factors ONLY if only the model is varied. Such a comparison is not valid if the data used are varied (in this case you are changing the data by deleting some of them). Cheers -- Ian

On Tue, Oct 26, 2010 at 2:37 PM, Ed Pozharski epozh...@umaryland.edu wrote: Jackie, please note that (at least imho) the desire to obtain better R-factors does not justify excluding data from analysis. Weak reflections that you suggest should be rejected contain information, and excluding them will indeed artificially lower the R-factors while reducing the accuracy of your model. Cheers, Ed.

On Mon, 2010-10-25 at 17:44 -0400, Jacqueline Vitali wrote: Also, if your Rmerge is high and you include all reflections in refinement, Rfree is high. In my experience, by excluding F < sigma reflections you drop Rfree a lot. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
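Ed's point is easy to demonstrate numerically: throwing away weak, poorly fit reflections lowers R without the model getting any better. A toy sketch (the numbers are invented, purely illustrative):

```python
def r_value(fobs, fcalc):
    """Linear residual R = sum|Fo - Fc| / sum(Fo)."""
    return sum(abs(o - c) for o, c in zip(fobs, fcalc)) / sum(fobs)

# last two reflections are weak and fit relatively poorly, as weak data often are
fobs = [120.0, 80.0, 60.0, 5.0, 4.0]
fcalc = [115.0, 84.0, 57.0, 9.0, 1.0]

r_all = r_value(fobs, fcalc)           # all data included
r_cut = r_value(fobs[:3], fcalc[:3])   # weak reflections rejected
# r_cut < r_all: the statistic improved, but the model did not --
# and the information in the weak reflections is gone
```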
Re: [ccp4bb] Against Method (R)
Yes, but what I think Frank is trying to point out is that the difference between Fobs and Fcalc in any given PDB entry is generally about 4-5 times larger than sigma(Fobs). In such situations, pretty much any standard statistical test will tell you that the model is highly unlikely to be correct. I am not saying that everything in the PDB is wrong, just that the dominant source of error is a shortcoming of the models we use. Whatever this source of error is, it vastly overpowers the measurement error. That is, errors do not add linearly, but rather as squares, and 20%^2+5%^2 ~ 20%^2 . So, since the experimental error is only a minor contribution to the total error, it is arguably inappropriate to use it as a weight for each hkl. Yes, refinement does seem to work better when you use experimental sigmas, and weighted statistics are probably better than no weights at all, but the problem is that until we do have a model that can explain Fobs to within experimental error, we will be severely limited in the kinds of conclusions we can derive from our data. -James Holton MAD Scientist On Tue, Oct 26, 2010 at 1:59 PM, Ethan Merritt merr...@u.washington.edu wrote: On Tuesday, October 26, 2010 01:16:58 pm Frank von Delft wrote: Um... * Given that the weighted Rfactor is weighted by the measurement errors (1/sig^2) * and given that the errors in our measurements apparently have no bearing whatsoever on the errors in our models (for macromolecular crystals, certainly - the R-factor gap) You are overlooking causality :-) Yes, the errors in state-of-the-art models are only weakly limited by the errors in our measurements. But that is exactly _because_ we can now weight properly by the measurement errors (1/sig^2). In my salad days, weighting by 1/sig^2 was a mug's game. Refinement only produced a reasonable model if you applied empirical corrections rather than statistical weights. Things have improved a bit since then, both on the equipment side (detectors, cryo, ...) 
and on the processing side (Maximum Likelihood, error propagation, ...). Now the sigmas actually mean something! is the weighted Rfactor even vaguely relevant for anything at all? Yes, it is. It is the thing you are minimizing during refinement, at least to first approximation. Also, as just mentioned, it is a well-defined value that you can use for statistical significance tests. Ethan phx. On 26/10/2010 20:44, Ian Tickle wrote: Indeed, see: http://scripts.iucr.org/cgi-bin/paper?a07175 . The Rfree/Rwork ratio that I referred to does strictly use the weighted ('Hamilton') R-factors, but because only the unweighted values are given in the PDB we were forced to approximate (against our better judgment!). The problem of course is that all refinement software AFAIK writes the unweighted Rwork and Rfree to the PDB header; there are no slots for the weighted values, which does indeed make doing serious statistics on the PDB entries difficult if not impossible! The unweighted crystallographic R-factor was only ever intended as a rule of thumb, i.e. to give a rough idea of the relative quality of related structures; I hardly think the crystallographers of yesteryear ever imagined that we would be taking it so seriously now! In particular IMO it should never be used for something as critical as validation (either global or local), or for guiding refinement strategy: use the likelihood instead. Cheers -- Ian PS I've always known it as an 'R-factor', e.g. see paper referenced above, but then during my crystallographic training I used extensively software developed by both authors of the paper (i.e. Geoff Ford and the late John Rollett) in Oxford (which eventually became the 'Crystals' small-molecule package). Maybe it's a transatlantic thing ... Cheers -- Ian On Tue, Oct 26, 2010 at 7:28 PM, Ethan Merritt merr...@u.washington.edu wrote: On Tuesday, October 26, 2010 09:46:46 am Bernhard Rupp (Hofkristallrat a.D.) 
wrote: Hi Folks, Please allow me a few biased reflections/opinions on the numeRology of the R-value (not R-factor, because it is neither a factor itself nor does it factor in anything but ill-posed reviewer's critique. Historically the term originated from small molecule crystallography, but it is only a 'Residual-value') a) The R-value itself - based on the linear residuals and of apparent intuitive meaning - is statistically peculiar to say the least. I could not find it in any common statistics text. So doing proper statistics with R becomes difficult. As WC Hamilton pointed out originally, two [properly weighted] R factors can be compared by taking their ratio. Significance levels can then be evaluated using the standard F distribution. A concise summary is given in chapter 9 of Prince's book, which I
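The Hamilton ratio test mentioned above can be sketched numerically. This is my own illustration with made-up numbers: the critical F value is an approximation from standard tables, and the formula is Hamilton's critical ratio for comparing two properly weighted R factors of nested models.

```python
import math

def hamilton_critical_ratio(b, dof, F_crit):
    """Hamilton's critical weighted-R ratio: sqrt(b*F_crit/dof + 1).

    b      : number of extra parameters in the more general model
    dof    : n - m, residual degrees of freedom of the general model
    F_crit : upper critical value of F(b, dof) at the chosen alpha
    """
    return math.sqrt(b * F_crit / dof + 1.0)

# Made-up example: the restricted model gives wR = 0.245 and the general
# model (10 extra parameters, ~2000 residual degrees of freedom) gives
# wR = 0.230. F(10, 2000) at the 5% level is roughly 1.83 (from tables).
observed = 0.245 / 0.230
critical = hamilton_critical_ratio(b=10, dof=2000, F_crit=1.83)
print(f"observed wR ratio = {observed:.4f}, critical ratio = {critical:.4f}")
# If the observed ratio exceeds the critical ratio, the improvement from
# the extra parameters is significant at the chosen level.
```

Note how close the critical ratio sits to 1.0 at macromolecular data sizes: even small drops in the weighted R can be statistically significant when the number of reflections is large.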
Re: [ccp4bb] Against Method (R)
- Original Message - From: James Holton To: CCP4BB@JISCMAIL.AC.UK Sent: Tuesday, October 26, 2010 6:31 PM Subject: Re: [ccp4bb] Against Method (R) Yes, but what I think Frank is trying to point out is that the difference between Fobs and Fcalc in any given PDB entry is generally about 4-5 times larger than sigma(Fobs). In such situations, pretty much any standard statistical test will tell you that the model is highly unlikely to be correct. Wow, so what is the answer to this? Is that figure |Fcalc - Fobs| = 4-5x sigma really true? How, then, do we believe structures? Are there really good structures where this discrepancy is not there, to stake our claim, so to speak?
Re: [ccp4bb] Help with Optimizing Crystals
Janet Newman wrote a review on improving diffraction a few years back: http://dx.doi.org/10.1107/S0907444905032130 it is open access. Probably the most underappreciated aspect of diffraction is the purity of the protein. This is because impurities like slightly misfolded versions of your native structure can generally still find their way into the lattice. These are defects, and the distortion they create will push thousands of other unit cells out of place. So, 95%, or even 99% purity may not be enough. Getting the molecule to the point where it is the brightest band on the gel is generally enough to screen for crystallization conditions, but it is not uncommon for crystals from un-clean protein to diffract poorly. My advice: try adding a column to your purification protocol. Or, better yet, try fractional recrystallization. This is where you use your crystallization condition to crash out your entire stock, spin down the crystals, redissolve them, and then maybe do this a few times in a row. Yes, you lose a lot of material, but the stuff you lost is stuff that doesn't crystallize anyway. -James Holton MAD Scientist On Tue, Oct 26, 2010 at 2:31 PM, Matthew Bratkowski mab...@cornell.edu wrote: Hi. Here is some additional information. 1. The purification method that I used included Ni, tag cleavage, and SEC as a final step. I have tried samples from three different purification batches that range in purity, and even the batch with the worst purity seems to produce crystals. 2. The protein is a proteolyzed fragment since the full length version did not crystallize. Mutagenesis and methylation, however, may be techniques to consider since the protein contains quite a few lysines. 3. There are not any detergents in the buffer, so these are not detergent crystals. The protein buffer just contains Tris at pH 8, NaCl, and DTT. 4. Some experiments that I have done thus far seem to suggest that the crystals are protein. 
Izit dye soaks well into the crystals, and the few crystals that I shot previously did not produce any diffraction pattern whatsoever. However, I have had difficulty seeing them on a gel and they are a bit tough to break. 5. I tried seeding previously as follows: I broke some crystals, made a seed stock, dipped in a hair, and did serial streak seeding. After seeding, I usually saw small disks or clusters along the path of the hair but nothing larger or better looking. I also had one more question. Has anyone had an instance where changing the precipitation condition or including an additive improved diffraction but did not drastically change the shape of the protein? If so, I may just try further optimization with the current conditions and shoot some more crystals. Thanks for all the helpful advice thus far, Matt
Re: [ccp4bb] Against Method (R)
On Tuesday, October 26, 2010 04:31:24 pm James Holton wrote: Yes, but what I think Frank is trying to point out is that the difference between Fobs and Fcalc in any given PDB entry is generally about 4-5 times larger than sigma(Fobs). In such situations, pretty much any standard statistical test will tell you that the model is highly unlikely to be correct. But that's not the question we are normally asking. It is highly unlikely that any model in biology is correct, if by "correct" you mean "cannot be improved". Normally we ask the more modest question "have I improved my model today over what it was yesterday?". I am not saying that everything in the PDB is wrong, just that the dominant source of error is a shortcoming of the models we use. Whatever this source of error is, it vastly overpowers the measurement error. That is, errors do not add linearly, but rather as squares, and 20%^2+5%^2 ~ 20%^2 . So, since the experimental error is only a minor contribution to the total error, it is arguably inappropriate to use it as a weight for each hkl. I think your logic has run off the track. The experimental error is an appropriate weight for the Fobs(hkl) because that is indeed the error for that observation. This is true independent of errors in the model. If you improve the model, that does not magically change the accuracy of the data. Ethan Yes, refinement does seem to work better when you use experimental sigmas, and weighted statistics are probably better than no weights at all, but the problem is that until we do have a model that can explain Fobs to within experimental error, we will be severely limited in the kinds of conclusions we can derive from our data. -James Holton MAD Scientist On Tue, Oct 26, 2010 at 1:59 PM, Ethan Merritt merr...@u.washington.edu wrote: On Tuesday, October 26, 2010 01:16:58 pm Frank von Delft wrote: Um... 
* Given that the weighted Rfactor is weighted by the measurement errors (1/sig^2) * and given that the errors in our measurements apparently have no bearing whatsoever on the errors in our models (for macromolecular crystals, certainly - the R-factor gap) You are overlooking causality :-) Yes, the errors in state-of-the-art models are only weakly limited by the errors in our measurements. But that is exactly _because_ we can now weight properly by the measurement errors (1/sig^2). In my salad days, weighting by 1/sig^2 was a mug's game. Refinement only produced a reasonable model if you applied empirical corrections rather than statistical weights. Things have improved a bit since then, both on the equipment side (detectors, cryo, ...) and on the processing side (Maximum Likelihood, error propagation, ...). Now the sigmas actually mean something! is the weighted Rfactor even vaguely relevant for anything at all? Yes, it is. It is the thing you are minimizing during refinement, at least to first approximation. Also, as just mentioned, it is a well-defined value that you can use for statistical significance tests. Ethan phx. On 26/10/2010 20:44, Ian Tickle wrote: Indeed, see: http://scripts.iucr.org/cgi-bin/paper?a07175 . The Rfree/Rwork ratio that I referred to does strictly use the weighted ('Hamilton') R-factors, but because only the unweighted values are given in the PDB we were forced to approximate (against our better judgment!). The problem of course is that all refinement software AFAIK writes the unweighted Rwork and Rfree to the PDB header; there are no slots for the weighted values, which does indeed make doing serious statistics on the PDB entries difficult if not impossible! The unweighted crystallographic R-factor was only ever intended as a rule of thumb, i.e. to give a rough idea of the relative quality of related structures; I hardly think the crystallographers of yesteryear ever imagined that we would be taking it so seriously now! 
In particular IMO it should never be used for something as critical as validation (either global or local), or for guiding refinement strategy: use the likelihood instead. Cheers -- Ian PS I've always known it as an 'R-factor', e.g. see paper referenced above, but then during my crystallographic training I used extensively software developed by both authors of the paper (i.e. Geoff Ford and the late John Rollett) in Oxford (which eventually became the 'Crystals' small-molecule package). Maybe it's a transatlantic thing ... Cheers -- Ian On Tue, Oct 26, 2010 at 7:28 PM, Ethan Merritt merr...@u.washington.edu wrote: On Tuesday, October 26, 2010 09:46:46 am Bernhard Rupp (Hofkristallrat a.D.) wrote: Hi Folks, Please allow me a few biased reflections/opinions on the numeRology of the R-value (not R-factor, because
Re: [ccp4bb] Against Method (R)
Some time ago, I computed the mean value of Rcryst(F) / Rmerge(F) across the whole PDB. This average was 4.5, and I take this as a rough estimate of |Fcalc - Fobs| / sigma(Fobs). More recently, I have been looking in more detail at deposited data, but so far the few cases where this ratio is close to 1 are all cases where sigma(Fobs) is unusually high! I think the answer is that we can believe structures in the PDB to within 20% error. This is close enough for a few things (such as government work), but not for traditional statistics like confidence tests. For me, it is just really bothersome that we can measure structure factors to better than 5% accuracy, but still don't know how to model them. Ethan does make a good point that sig(Fobs) is the error in the measurement, and that the model-data error is not the weight one should use in refinement, etc. However, when you are comparing one PDB entry (yours) to others (published), I still don't think that sigma(Fobs) plays any significant role. -James Holton MAD Scientist On Tue, Oct 26, 2010 at 4:45 PM, Jacob Keller j-kell...@fsm.northwestern.edu wrote: - Original Message - *From:* James Holton jmhol...@lbl.gov *To:* CCP4BB@JISCMAIL.AC.UK *Sent:* Tuesday, October 26, 2010 6:31 PM *Subject:* Re: [ccp4bb] Against Method (R) Yes, but what I think Frank is trying to point out is that the difference between Fobs and Fcalc in any given PDB entry is generally about 4-5 times larger than sigma(Fobs). In such situations, pretty much any standard statistical test will tell you that the model is highly unlikely to be correct. Wow, so what is the answer to this? Is that figure |Fcalc - Fobs| = 4-5x sigma really true? How, then, do we believe structures? Are there really good structures where this discrepancy is not there, to stake our claim, so to speak?
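The "errors add as squares" arithmetic that keeps coming up in this thread is just addition in quadrature for independent error sources; a two-line check of James's numbers:

```python
import math

def combine_in_quadrature(*errors):
    """Total error from independent components: sqrt(sum of squares)."""
    return math.sqrt(sum(e * e for e in errors))

# 20% model error vs 5% measurement error, as in the example above:
total = combine_in_quadrature(0.20, 0.05)
print(f"combined relative error = {total:.4f}")
# The 5% term moves the total from 20% only to about 20.6%, which is why
# the measurement error is described as a minor contribution.
```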
[ccp4bb] Help with model bias in merohedral twin
Hello All, Not long ago I posted for some help with my twinned dataset at 1.95 A, and have confirmed the twinning of P6(5) into P6(5)22. Molecular replacement was successful and the twin refinement in Refmac yielded R/Rfree of 21%/26%, with a twin fraction of 0.46. Although the electron density map looks good, I am not sure if I should have too much confidence in it because I was not able to obtain 'strong electron densities' from omitted sections of the model in a refinement. I don't know if this is an indicator for bias introduced somewhere. I would like to ask what may be some procedures I can try for checking and removing these biases, and a few additional related questions. As suggested to me previously, I have generated a total omit map with sfcheck in ccp4i, using the refined pdb and unrefined data in P6(5). The .map file looks a little worse in quality (is this because of the twinning?) but is still reasonable, with a few breaks in the main chain and side chains. Interestingly, when I do a real space refinement against the total omit map, I get a slightly better Rfree in the earlier rounds of Refmac, which then diverges toward the numbers above. Why is this the case?

Cycle   Rfact    Rfree
  0     0.2301   0.2523
  1     0.2205   0.2534
  2     0.2164   0.2545
  3     0.2140   0.2554
  4     0.2123   0.2559
  5     0.2117   0.2570
  6     0.2116   0.2575
  7     0.2112   0.2582
  8     0.2112   0.2584
  9     0.2109   0.2587
 10     0.2106   0.2597

Secondly, I read that I should make sure the Free R flags are consistent throughout the twin-related indices. What may be the adverse outcome if this isn't enforced? Is Refmac aware of this in a twin refinement? If not, which tool could I use for this? I would very much appreciate any comments and suggestions. Best, Peter Chan
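On the free-R flag question, one way to guarantee consistency is to assign each flag from a canonical representative of the twin-related pair, so both mates always land in the same set. Below is a minimal sketch of that idea (not a CCP4 tool; the operator (h,k,l) -> (k,h,-l) is a common merohedral twin law for apparent 622 symmetry over point group 6, but verify the correct operator for your own crystal before borrowing this):

```python
import zlib

def twin_mate(hkl):
    """Assumed twin operator (h,k,l) -> (k,h,-l); check yours first."""
    h, k, l = hkl
    return (k, h, -l)

def orbit_representative(hkl):
    """Canonical member of the pair {hkl, twin_mate(hkl)}."""
    return min(hkl, twin_mate(hkl))

def free_flag(hkl, free_fraction=0.05):
    """Deterministic test-set assignment: twin mates always agree,
    because the flag is derived from the orbit representative."""
    rep = orbit_representative(hkl)
    digest = zlib.crc32(repr(rep).encode())
    return (digest % 1000) < int(free_fraction * 1000)

# Both members of a twin-related pair receive the same flag:
hkl = (3, 1, -7)
assert free_flag(hkl) == free_flag(twin_mate(hkl))
```

If the flags are not consistent, each "free" reflection has its twin mate in the working set, so the test set is partially refined against and Rfree is biased downward.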
Re: [ccp4bb] diverging Rcryst and Rfree [SEC=UNCLASSIFIED]
Hi Ian, Yes, I guess my rule does work as you say. If, starting the day from (Rwork = 20, Rfree = 30), abbreviated (20,30), you do something to get (18,29), yes, this means that that something was a bare minimum acceptable thing to do. If you do something to get (16,29) (decreased R by 4, Rfree by 1), then I would immediately suspect that the thing that was done introduced excessive over-fitting. If you do something to get (18,28) (decreased R by 2, Rfree by 2), then I would say that the thing that was done was a good thing. Yes, other arbitrary linear combinations could work. No rigorous analysis of this method was performed. I considered that it came down to a question of what degree of over-fitting is acceptable. In practice, this rule stopped endless additions of water molecules and further alternate conformations, and for that purpose the precise point seemed unimportant. However, I also used this rule to determine preferred parameters for BFAC and the matrix weight. Do you think this is a bad rule, and can you point me to a better rule? Replying to BR: This rule of thumb has proven successful in providing a defined end point for building and refining a structure. Hmmm. I always thought things like no more significant explainable (difference) density define endpoints in model building and not R-values. This strategy has proven successful in nailing ligand structures where R-value rules of thumb were used to define the end points. Of course, there are other rules. One has to explain all significant residual density. But this tends to be a finite task. The above rule was not applicable to building active sites, or other things that would be discussed directly in a paper. The problem I attempt to address is endless fiddling with features of ever-diminishing importance. Apologies if I have missed a recent relevant thread, but are there lists of rules of thumb for model building and refinement? 
Anthony Anthony Duff Telephone: 02 9717 3493 Mob: 043 189 1076 -Original Message- From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of Ian Tickle Sent: Wednesday, 27 October 2010 12:53 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] diverging Rcryst and Rfree [SEC=UNCLASSIFIED] Anthony, Your rule actually works on the difference (Rfree - Rwork/2), not (Rfree - Rwork) as you said, so is rather different from what most people seem to be using. For example let's say the current values are Rwork = 20, Rfree = 30, so your current test value is (30 - 20/2) = 20. Then according to your rule Rwork = 18, Rfree = 29 is equally acceptable (29 - 18/2 = 20, i.e. same test value), whereas Rwork = 16, Rfree = 29 would not be acceptable by your rule (29 - 16/2 = 21, so the test value is higher). Rwork = 18, Rfree = 28 would represent an improvement by your rule (28 - 18/2 = 19, i.e. a lower test value). You say this criterion provides a defined end-point, i.e. a minimum in the test value above. However wouldn't other linear combinations of Rwork and Rfree also have a defined minimum value? In particular Rfree itself always has a defined minimum with respect to adding parameters or changing the weights, so would also satisfy your criterion. There has to be some additional criterion that you are relying on to select the particular linear combination (Rfree - Rwork/2) over any of the other possible ones? Cheers -- Ian On Tue, Oct 26, 2010 at 6:33 AM, DUFF, Anthony a...@ansto.gov.au wrote: One rule of thumb based on R and R-free divergence that I impress onto crystallography students is this: If a change in refinement strategy or parameters (eg loosening restraints, introducing TLS) or a round of addition of unimportant water molecules results in a reduction of R that is more than double the reduction in R-free, then don't do it. This rule of thumb has proven successful in providing a defined end point for building and refining a structure. 
The rule works on the differential of the R / R-free divergence. I've noticed that some structures begin with a bigger divergence than others. Different Rmerge values might explain this. Has anyone else found a student in a dark room carefully adding large numbers of partially occupied water molecules? Anthony Anthony Duff Telephone: 02 9717 3493 Mob: 043 189 1076 From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of Artem Evdokimov Sent: Tuesday, 26 October 2010 1:45 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] diverging Rcryst and Rfree Not that rules of thumb always have to have a rationale, nor that they're always correct - but it would seem that noise in the data (of which Rmerge is an indicator) should have a significant relationship with the R:Rfree
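Anthony's rule, as Ian formalizes it, amounts to tracking the quantity Rfree - Rwork/2 and accepting a change only if that test value does not increase. A tiny sketch (my own encoding of the rule, using the thread's worked examples in percent):

```python
def acceptable(r_before, rfree_before, r_after, rfree_after):
    """Anthony Duff's rule of thumb in Ian Tickle's formulation:
    accept a refinement change only if (Rfree - Rwork/2) does not rise,
    i.e. Rwork must not drop more than twice as much as Rfree."""
    test_before = rfree_before - r_before / 2.0
    test_after = rfree_after - r_after / 2.0
    return test_after <= test_before

# The worked examples from the thread, starting at (Rwork, Rfree) = (20, 30):
assert acceptable(20, 30, 18, 29) is True    # bare-minimum acceptable change
assert acceptable(20, 30, 16, 29) is False   # over-fitting suspected
assert acceptable(20, 30, 18, 28) is True    # a good change
```

Encoding the rule this way also makes Ian's objection concrete: any other linear combination a*Rfree - b*Rwork would give a similar accept/reject test, and the rule itself does not say why the coefficients (1, 1/2) are the right ones.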
[ccp4bb] Hardware question
Another question about computer hardware: If I configure a computer at the Dell site, it costs about $700 to add a 2TB SATA drive. On amazon.com or Staples or such, a 2TB drive costs ~$110 to $200 depending on brand. Are the Dell-installed drives much faster, or more reliable, or do they have a better warranty? After all, RAID is supposed to stand for redundant array of inexpensive disks, and we could afford a lot more redundancy at the Amazon.com price. And, are there any brands or models that should be avoided due to known reliability issues? Thanks, eab
Re: [ccp4bb] Hardware question
Don't get ripped off by Dell! Their drives aren't any faster or better quality than the competition (IMHO they're probably slower and/or lower quality). If you're looking for a 2 terabyte drive, I have seven Hitachi 7K2000 2 TB (http://www.newegg.com/Product/Product.aspx?Item=N82E16822145298) drives in a RAID6 array inside a Thecus 7700 NAS (http://www.thecus.com/products_over.php?cid=11&pid=82&set_language=english) for 10 terabytes of storage, where 2 of the drives can simultaneously fail and still retain all the data. I have had the drives installed for over a year now and not a single problem. On Tue, Oct 26, 2010 at 9:52 PM, Edward A. Berry ber...@upstate.edu wrote: Another question about computer hardware- If I configure a computer at the Dell site, it costs about $700 to add a 2TB SATA drive. On amazon.com or Staples or such, a 2TB drive costs ~$110 to $200 depending on brand. Are the Dell-installed drives much faster, or more reliable, or have a better warranty? After all, RAID is supposed to stand for redundant array of inexpensive disks, and we could afford a lot more redundancy at the Amazon.com price. And, are there any brands or models that should be avoided due to known reliability issues? Thanks, eab -- Jim Fairman, Ph.D. Post-Doctoral Fellow National Institutes of Health - NIDDK Lab: 1-301-594-9229 E-mail: fairman@gmail.com james.fair...@nih.gov
Re: [ccp4bb] Hardware question
On Tue, Oct 26, 2010 at 09:52:51PM -0400, Edward A. Berry wrote: Another question about computer hardware- If I configure a computer at the Dell site, it costs about $700 to add a 2TB SATA drive. On amazon.com or Staples or such, a 2TB drive costs ~$110 to $200 depending on brand. Are the Dell-installed drives much faster No. or more reliable No. or have a better warranty? No. In fact they frequently have a worse warranty than the exact same retail product with a non-Dell part number. One of the ways that Dell keeps costs down is to negotiate a bulk deal with the hard drive OEMs where they provide Dell the exact same drives they sell in the retail channel, but with a shorter warranty, typically 1 year instead of 3 or 5 years. After all, RAID is supposed to stand for redundant array of inexpensive disks, and we could afford a lot more redundancy at the Amazon.com price. RAID is good for performance and uptime reasons, but it is _not_ a replacement for backups. You probably knew that, but I'll mention it for the audience playing along at home. And, are there any brands or models that should be avoided due to known reliability issues? Not really. Seagate had some firmware issues with their first 1.5 TB models, but they were worked out fairly quickly. I think any of the major vendors are going to be fairly competitive when it comes to reliability. The important thing is to look at the drive warranty. The lower-end drives will have 3 year or shorter warranties, and the higher-end drives will have 5 year warranties. Buy a model with a 5 year warranty. -ben -- | Ben Eisenbraun | Software Sysadmin | | Structural Biology Grid | http://sbgrid.org | | Harvard Medical School | http://hms.harvard.edu |
Re: [ccp4bb] Hardware question
Hi Ed, I have four of those http://www.newegg.com/Product/Product.aspx?Item=N82E16822136514 and would now buy these http://www.newegg.com/Product/Product.aspx?Item=N82E16822136764 DELLete it, I mean the quote you have, and shop somewhere else. Jürgen - Jürgen Bosch Johns Hopkins Bloomberg School of Public Health Department of Biochemistry and Molecular Biology Johns Hopkins Malaria Research Institute 615 North Wolfe Street, W8708 Baltimore, MD 21205 Phone: +1-410-614-4742 Lab: +1-410-614-4894 Fax: +1-410-955-3655 http://web.mac.com/bosch_lab/ On Oct 26, 2010, at 9:52 PM, Edward A. Berry wrote: Another question about computer hardware- If I configure a computer at the Dell site, it costs about $700 to add a 2TB SATA drive. On amazon.com or Staples or such, a 2TB drive costs ~$110 to $200 depending on brand. Are the Dell-installed drives much faster, or more reliable, or have a better warranty? After all, RAID is supposed to stand for redundant array of inexpensive disks, and we could afford a lot more redundancy at the Amazon.com price. And, are there any brands or models that should be avoided due to known reliability issues? Thanks, eab
[ccp4bb] Rules of thumb (was diverging Rcryst and Rfree)
Dear Anthony, That is an excellent question! I believe there are quite a lot of 'rules of thumb' going around. Some of them seem to lead to very dogmatic thinking and have caused (refereeing) trouble for good structures and lack of trouble for bad structures. A lot of them were discussed at the CCP4BB so it may be nice to try to list them all.

Rule 1: If Rwork < 20%, you are done.
Rule 2: If R-free - Rwork > 5%, your structure is wrong.
Rule 3: At resolution X, the bond length rmsd should be less than Y (What is the rmsd thing people keep talking about?)
Rule 4: If your resolution is lower than X, you should not use_anisotropic_Bs/riding_hydrogens
Rule 5: You should not build waters/alternates at resolutions lower than X
Rule 6: You should do the final refinement with ALL reflections
Rule 7: No one cares about getting the carbohydrates right

Obviously, this list is not complete. I may also have overstated some of the rules to get the discussion going. Any additions are welcome. Cheers, Robbie Joosten Netherlands Cancer Institute Apologies if I have missed a recent relevant thread, but are there lists of rules of thumb for model building and refinement? Anthony Anthony Duff Telephone: 02 9717 3493 Mob: 043 189 1076
Re: [ccp4bb] Rules of thumb (was diverging Rcryst and Rfree) [SEC=UNCLASSIFIED]
Dear Robbie, Rules 3-5 I found could be approached using my previous rule of thumb. If anisotropy reduced Rfree by more than half the reduction in R, then I liked it. It helped me decide to introduce anisotropy for xenon, iodine and chlorine atoms (supported by non-spherical omit electron density) but not for light atoms. My rule told me to always add riding hydrogens; they typically reduced R and Rfree similarly. Anthony Anthony Duff Telephone: 02 9717 3493 Mob: 043 189 1076 -Original Message- From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of Robbie Joosten Sent: Wednesday, 27 October 2010 4:29 PM To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] Rules of thumb (was diverging Rcryst and Rfree) Dear Anthony, That is an excellent question! I believe there are quite a lot of 'rules of thumb' going around. Some of them seem to lead to very dogmatic thinking and have caused (refereeing) trouble for good structures and lack of trouble for bad structures. A lot of them were discussed at the CCP4BB so it may be nice to try to list them all.

Rule 1: If Rwork < 20%, you are done.
Rule 2: If R-free - Rwork > 5%, your structure is wrong.
Rule 3: At resolution X, the bond length rmsd should be less than Y (What is the rmsd thing people keep talking about?)
Rule 4: If your resolution is lower than X, you should not use_anisotropic_Bs/riding_hydrogens
Rule 5: You should not build waters/alternates at resolutions lower than X
Rule 6: You should do the final refinement with ALL reflections
Rule 7: No one cares about getting the carbohydrates right

Obviously, this list is not complete. I may also have overstated some of the rules to get the discussion going. Any additions are welcome. Cheers, Robbie Joosten Netherlands Cancer Institute Apologies if I have missed a recent relevant thread, but are there lists of rules of thumb for model building and refinement? Anthony Anthony Duff Telephone: 02 9717 3493 Mob: 043 189 1076