Re: [ccp4bb] diverging Rcryst and Rfree [SEC=UNCLASSIFIED]
This rule of thumb has proven successful in providing a defined end point for building and refining a structure. Hmmm. I always thought that things like no more significant explainable (difference) density define endpoints in model building, not R-values. This strategy has proven successful in nailing ligand structures where R-value rules of thumb were used to define the end points. Cheers, BR
Re: [ccp4bb] diverging Rcryst and Rfree
Dear Rakesh, dear Artem, Since the initial question is not precise (which kind of comments are expected?) I may mention that the most frequent values of R, Rfree and DeltaR (that is asked about) are given in our work published in 2009 in Acta Cryst. D65, 1283-1291. Interestingly, they are practically linear functions of log(resolution). The plots also show the statistics of deviation from these lines. Best regards, Sacha Urzhumtsev Universities of Strasbourg & Nancy From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On behalf of Artem Evdokimov Sent: Tuesday, 26 October 2010 03:36 To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] diverging Rcryst and Rfree http://www.mail-archive.com/ccp4bb@jiscmail.ac.uk/msg04677.html as well as some notes in the older posts :) As a very basic rule of thumb, Rfree-Rwork tends to be around Rmerge for the dataset for refinements that are not overfitted. Artem On Mon, Oct 25, 2010 at 4:10 PM, Rakesh Joshi rjo...@purdue.edu wrote: Hi all, Can anyone comment, in general, on diverging Rcryst and Rfree values (say 7%) for structures with kind of low resolutions (2.5-2.9 angstroms)? Thanks RJ
Re: [ccp4bb] Babinet solvent correction / comment for imperfect models
Dear all, Sorry to come back late, when the discussion is over, with one more remark relevant to bulk solvent modeling. I've just got a comment from a colleague of mine, Adam Ben-Shem, who kindly agreed that I post his message (below) to CCP4bb. I think he is completely right in saying that the models we discussed were relevant to well-solved structures and practically complete atomic models. In the process of structure solution the situation may be different, as he saw in his own practice (in particular when solving the 80S ribosome). With best regards, Sacha Urzhumtsev Universities of Strasbourg & Nancy == message by Adam Ben-Shem I think the discussion of bulk solvent correction should be divided into three parts. Part one - bulk solvent correction for the final model. In this case, the physical meaning of the mask model is clear and this is obviously the right way to apply the correction. Part two - bulk solvent correction of partial models. These can be models with large flexible domains or models coming from bad maps where building is a very iterative process. In these cases the physical meaning of the mask model that Pavel is so worried about vanishes. In some extreme cases I can imagine that the Babinet bulk solvent correction would be better than the mask model. As I told you before, I suggest a better solution for these cases, and that is to calculate the mask for the mask model using density modification. Part three - density modification following refinement. From my own experience, for very partial models, phases and FOMs for this procedure should come from the model alone (without bulk solvent), letting density modification define the bulk solvent for itself. 
Bulk solvent correction is still important for the refinement process to produce the best model, but then the input to the density modification process should not include bulk solvent correction (and in very, very partial models FOMs should be calculated by the old SIGMAA program over all reflections, not by the refinement program using the R-free reflections alone). Adam
Re: [ccp4bb] diverging Rcryst and Rfree
Rakesh, Looking at http://www.pdbe.org/statistics (Structure Statistics): for all structures we see that an Rdiff of 0.07 is not that uncommon, this being about 1 sigma away from the mean value of 0.04 for all structures and 0.045 for your resolution range. For structures with Rdiff in the range 0.07-0.08 and resolution 2.7-2.9 we see that there are 212 structures. If I edit the query in the PDBe database to your exact requirement ranges then there are 1353 example structures out of the 53616 examples where this data exists. My comment is that this is a little worse than average, but not particularly a problem. Tom Hi all, Can anyone comment, in general, on diverging Rcryst and Rfree values (say 7%) for structures with kind of low resolutions (2.5-2.9 angstroms)? Thanks RJ
Re: [ccp4bb] diverging Rcryst and Rfree
I'm not sure that a little worse is the appropriate description - I think all you can say is that it deviates from the average when only resolution is used to define what is average. My point is that a number of other factors are known to be involved, and so you can't say that this deviation is worse until you have taken them all into account. Specifically, it's known that the expected value of the ratio Rfree/Rwork (i.e. expected on the basis of the null hypothesis that the structure is correct and complete and the only errors are random experimental errors) is directly related to the ratio (no. of significant experimental observations) / (effective no. of refined parameters), where effective here means taking into account the restraints. The number of significant observations will clearly depend on a number of factors, not only resolution, but also solvent content (which you obviously can't control unless you use a different crystal form) and data completeness (which you can control up to a point by optimising the data collection strategy). The effective no. of parameters obviously depends only on the parameter/restraint model and the weights, both of which you have full control of, and therefore the effective no. of parameters should be completely determined if the parameter/restraint model that has been selected is optimal. Finally, in order to compute Rfree/Rwork from Rdiff = Rfree - Rwork, Rwork itself must be specified, i.e.: Rfree/Rwork = 1 + (Rfree - Rwork)/Rwork = 1 + Rdiff/Rwork. It's actually much easier to work with Rfree/Rwork instead of Rdiff, because then you don't need to specify a particular value of Rwork, and you have one less variable to worry about in the factor analysis. So assuming the model is optimal, the major factors in addition to resolution which control the expected value of Rdiff are the solvent content, the data completeness and Rwork. 
The value of Rwork obtained for an optimal model on convergence is obviously related to the data quality (e.g. mean I/sig(I)) and, of course, the resolution. The bottom line is that unless we are given a lot more information it's not possible to say whether a specific value of Rdiff deviates significantly from the expected value. Cheers -- Ian Then for all structures we see that an Rdiff of 0.07 is not that uncommon, this being about 1 sigma away from the mean value of 0.04 for all structures and 0.045 for your resolution range. For structures with Rdiff in the range 0.07-0.08 and resolution 2.7-2.9 we see that there are 212 structures. If I edit the query in the PDBe database to your exact requirement ranges then there are 1353 example structures out of the 53616 examples where this data exists. My comment is that this is a little worse than average, but not particularly a problem. Tom Hi all, Can anyone comment, in general, on diverging Rcryst and Rfree values (say 7%) for structures with kind of low resolutions (2.5-2.9 angstroms)? Thanks RJ
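Ian's point about working with the ratio rather than the difference is easy to mechanise. A minimal Python sketch (the Rwork value used below is invented for illustration, not taken from the thread):

```python
# Convert between Rdiff = Rfree - Rwork and the ratio Rfree/Rwork that
# Ian recommends for factor analysis, where it saves one variable.
# All numbers below are illustrative only.

def ratio_from_rdiff(rdiff, rwork):
    # Rfree/Rwork = 1 + (Rfree - Rwork)/Rwork
    return 1.0 + rdiff / rwork

def rdiff_from_ratio(ratio, rwork):
    # Invert: Rfree - Rwork = (Rfree/Rwork - 1) * Rwork
    return (ratio - 1.0) * rwork

# A 7% divergence at an assumed Rwork of 0.22:
r = ratio_from_rdiff(0.07, 0.22)
print(round(r, 3))  # 1.318
```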
Re: [ccp4bb] diverging Rcryst and Rfree
I would expect such a difference with lowish resolution data. Your model will be biased towards the restraints - i.e. the geometry will be good, but there are barely enough observations to fit the actual model properly. E.g. it will be hard to position solvent, and to recognise any deviations from NCS. So don't be too surprised or worried. Look at the maps - if they look clean then things are probably OK. Eleanor On 10/25/2010 10:44 PM, Jacqueline Vitali wrote: Hi, I have seen this happening when I had NCS and did not include it in refinement. Rwork drops and Rfree increases. In this case the difference became small when I included the NCS. Also if your Rmerge is high and you include all reflections in refinement, Rfree is high. In my experience, by excluding F < sigma reflections you drop Rfree a lot. My limited experience suggests errors in the data and/or in the way you handle the data. Jackie Vitali On Mon, Oct 25, 2010 at 5:10 PM, Rakesh Joshi rjo...@purdue.edu wrote: Hi all, Can anyone comment, in general, on diverging Rcryst and Rfree values (say 7%) for structures with kind of low resolutions (2.5-2.9 angstroms)? Thanks RJ
Re: [ccp4bb] rigorously compatible coordinate files
On 08/20/2010 05:50 PM, Charles W. Carter, Jr wrote: Is there a program that will read in a PDB coordinate file and re-order the side chain atoms in each residue according to a standard order? I've a program that compares two files for the same structure, but it requires that the order of the atoms be the same in both cases. I'm using a variety of files in which the residue atoms are ordered either main chain first or side-chain first. I've not found a suitable program in the CCP4 suite, though one might exist. MOLEMAN2 doesn't seem suitable, either. Thanks, Charlie Very old Q, but PROCHECK does this I think. Eleanor
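For readers without PROCHECK to hand, the reordering Charlie asks for is straightforward to script. A minimal Python sketch: the STANDARD_ORDER table is an illustrative subset (not the full PDB convention), and the column positions follow the fixed-width PDB ATOM record (atom name in columns 13-16):

```python
# Sort each residue's atoms into a fixed reference order so that two PDB
# files for the same structure can be compared line by line.
# STANDARD_ORDER here covers only two residue types, for illustration.

STANDARD_ORDER = {
    "ALA": ["N", "CA", "C", "O", "CB"],
    "SER": ["N", "CA", "C", "O", "CB", "OG"],
}

def reorder_residue(atom_lines, resname):
    order = {name: i for i, name in enumerate(STANDARD_ORDER[resname])}
    # PDB columns 13-16 (0-based slice 12:16) hold the atom name;
    # unknown names sort to the end.
    return sorted(atom_lines, key=lambda l: order.get(l[12:16].strip(), 99))

lines = [
    "ATOM      1  CB  ALA A   1      11.0  11.0  11.0  1.00  0.00",
    "ATOM      2  N   ALA A   1      10.0  10.0  10.0  1.00  0.00",
    "ATOM      3  CA  ALA A   1      10.5  10.5  10.5  1.00  0.00",
]
for l in reorder_residue(lines, "ALA"):
    print(l[12:16].strip())  # prints N, CA, CB in order
```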
Re: [ccp4bb] diverging Rcryst and Rfree
Jackie, please note that (at least imho) the desire to obtain better R-factors does not justify excluding data from analysis. The weak reflections that you suggest should be rejected contain information, and excluding them will indeed artificially lower the R-factors while reducing the accuracy of your model. Cheers, Ed. On Mon, 2010-10-25 at 17:44 -0400, Jacqueline Vitali wrote: Also if your Rmerge is high and you include all reflections in refinement, Rfree is high. In my experience, by excluding F < sigma reflections you drop Rfree a lot. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] diverging Rcryst and Rfree [SEC=UNCLASSIFIED]
Anthony, Your rule actually works on the difference (Rfree - Rwork/2), not (Rfree - Rwork) as you said, so it is rather different from what most people seem to be using. For example, let's say the current values are Rwork = 20, Rfree = 30, so your current test value is (30 - 20/2) = 20. Then according to your rule Rwork = 18, Rfree = 29 is equally acceptable (29 - 18/2 = 20, i.e. the same test value), whereas Rwork = 16, Rfree = 29 would not be acceptable by your rule (29 - 16/2 = 21, so the test value is higher). Rwork = 18, Rfree = 28 would represent an improvement by your rule (28 - 18/2 = 19, i.e. a lower test value). You say this criterion provides a defined end-point, i.e. a minimum in the test value above. However, wouldn't other linear combinations of Rwork and Rfree also have a defined minimum value? In particular, Rfree itself always has a defined minimum with respect to adding parameters or changing the weights, so it would also satisfy your criterion. There has to be some additional criterion that you are relying on to select the particular linear combination (Rfree - Rwork/2) over any of the other possible ones. Cheers -- Ian On Tue, Oct 26, 2010 at 6:33 AM, DUFF, Anthony a...@ansto.gov.au wrote: One “rule of thumb” based on R and R-free divergence that I impress onto crystallography students is this: If a change in refinement strategy or parameters (e.g. loosening restraints, introducing TLS) or a round of addition of unimportant water molecules results in a reduction of R that is more than double the reduction in R-free, then don't do it. This rule of thumb has proven successful in providing a defined end point for building and refining a structure. The rule works on the differential of the R / R-free divergence. I've noticed that some structures begin with a bigger divergence than others. Different Rmerge values might explain this. Has anyone else found a student in a dark room carefully adding large numbers of partially occupied water molecules? 
Anthony Anthony Duff Telephone: 02 9717 3493 Mob: 043 189 1076 From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of Artem Evdokimov Sent: Tuesday, 26 October 2010 1:45 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] diverging Rcryst and Rfree Not that rules of thumb always have to have a rationale, nor that they're always correct - but it would seem that noise in the data (of which Rmerge is an indicator) should have a significant relationship with the R:Rfree difference, since Rfree is not (should not be, if selected correctly) subject to noise fitting. This rule is easily broken if one refines against very noisy data (e.g. that last shell with Rmerge of 55% and an I/sigmaI ratio of 1.3 is still good, right?) or if the structure is overfit. The rule is only an indicative one (i.e. one should get really worried if R-Rfree looks very different from Rmerge), and it breaks down at very high and very low resolution (a more complete picture is given by GK and shown in BR's book). Since the selection of data and refinement procedures is subject to the decisions of the practitioner, I suspect that the extreme divergence shown in the figures that you refer to is probably the result of our own collective decisions. I have no proof, but I suspect that if a large enough section of the PDB were to be re-refined using the same methods and the same data trimming practices, the spread would be considerably narrower. That'd be somewhat hard to do - but may be doable now given the abundance of auto-building and auto-correcting algorithms. Artem On Mon, Oct 25, 2010 at 9:07 PM, Bernhard Rupp (Hofkristallrat a.D.) hofkristall...@gmail.com wrote: And the rationale for that rule being exactly what? 
For stats, see figures 12-23 and 12-24: http://www.ruppweb.org/garland/gallery/Ch12/index_2.htm br From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of Artem Evdokimov Sent: Monday, October 25, 2010 6:36 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] diverging Rcryst and Rfree http://www.mail-archive.com/ccp4bb@jiscmail.ac.uk/msg04677.html as well as some notes in the older posts :) As a very basic rule of thumb, Rfree-Rwork tends to be around Rmerge for the dataset for refinements that are not overfitted. Artem On Mon, Oct 25, 2010 at 4:10 PM, Rakesh Joshi rjo...@purdue.edu wrote: Hi all, Can anyone comment, in general, on diverging Rcryst and Rfree values (say 7%) for structures with kind of low resolutions (2.5-2.9 angstroms)? Thanks RJ
Re: [ccp4bb] diverging Rcryst and Rfree
Jackie, I agree completely with Ed (for once!), not only for the reasons he gave, but also because it's valid to compare statistics such as likelihoods and R factors ONLY if only the model is varied. Such a comparison is not valid if the data used are varied (in this case you are changing the data by deleting some of them). Cheers -- Ian On Tue, Oct 26, 2010 at 2:37 PM, Ed Pozharski epozh...@umaryland.edu wrote: Jackie, please note that (at least imho) the desire to obtain better R-factors does not justify excluding data from analysis. Weak reflections that you suggest should be rejected contain information, and excluding them will indeed artificially lower the R-factors while reducing the accuracy of your model. Cheers, Ed. On Mon, 2010-10-25 at 17:44 -0400, Jacqueline Vitali wrote: Also if your Rmerge is high and you include all reflections in refinement, Rfree is high. In my experience, by excluding F < sigma reflections you drop Rfree a lot. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] Fill map/mask with dummy atoms?
Apologies for not seeing the original post, but

$warpbin/arp_warp << EOF
MODE MIRBUILD
FILES CCP4 MAPFIND 2fofc.map XYZOUT1 dummy.pdb
SYMM your_sym
CELL your_cell
RESOLUTION resol
MIRBUILD ATOMS atoms_in_protein MODELS 1
RESN DUM
END
EOF

will do it (fill in the keywords for your case). I recommend getting a map on a 0.3 A grid. A. On Oct 25, 2010, at 21:16, Pavel Afonine wrote: Hi Dirk, maybe too late... but (maybe) better late than never -:) Here is a working example of how you can do it. Note, the procedure just builds the dummy atoms in spheres with user-defined centers and radii. You can specify as many spheres as you wish. Dummy atoms clashing with model atoms or other dummy atoms will not be added. The procedure doesn't care about the map or data (Fobs or whatever): it just geometrically adds dummy atoms where requested. Also note, it uses a PHENIX command line tool that is not specifically designed for this task but simply can do it with an appropriate set of parameters. Ok, that was the preamble -:) Now let's do it: here are all the example files: /net/cci/afonine/public_html/for_Dirk The command

phenix.grow_density params

will create this file with dummy atoms: dummies_DA.pdb which in PyMOL looks like this: http://cci.lbl.gov/~afonine/for_Dirk/da_only.png or superposed with the model: http://cci.lbl.gov/~afonine/for_Dirk/da_plus_model.png Note, the above command requires a data file (remember, this command is meant for something else), but if you have just a PDB file (it can be empty, I guess) and don't have any data file, you can fake one just to run this command. To get fake Fobs:

phenix.fmodel model.pdb high_res=3 type=real r_free=0.1 label='F-obs'
mv model.pdb.mtz data.mtz

I guess this is it. Let me know if I can be of any help with this. Pavel. On 10/13/10 4:00 AM, Dirk Kostrewa wrote: Dear CCP4ers, is there a program around that allows to fill an input map or mask with dummy atoms? Best regards, Dirk. 
-- *** Dirk Kostrewa Gene Center Munich, A5.07 Department of Biochemistry Ludwig-Maximilians-Universität München Feodor-Lynen-Str. 25 D-81377 Munich Germany Phone: +49-89-2180-76845 Fax: +49-89-2180-76999 E-mail:kostr...@genzentrum.lmu.de WWW:www.genzentrum.lmu.de *** P please don't print this e-mail unless you really need to Anastassis (Tassos) Perrakis, Principal Investigator / Staff Member Department of Biochemistry (B8) Netherlands Cancer Institute, Dept. B8, 1066 CX Amsterdam, The Netherlands Tel: +31 20 512 1951 Fax: +31 20 512 1954 Mobile / SMS: +31 6 28 597791
Re: [ccp4bb] diverging Rcryst and Rfree
b) very large Rmerge values:

Rmerge   Rwork    Rfree    Rfree-Rwork   Resolution
0.9990   0.1815   0.2086   0.0271        1.80   SG center, unpublished
0.8700   0.1708   0.2270   0.0562        1.96   unpublished
0.7700   0.1870   0.2297   0.0428        1.56
0.7600   0.2380   0.2680   0.0300        2.50   SG center, unpublished
0.7000   0.1700   0.2253   0.0553        1.71
0.6400   0.2179   0.2715   0.0536        2.75   SG center, unpublished

The most disturbing to me is that of those with very large overall Rmerge values, 3 come from structural genomics centers. Is that less or more disturbing than that the other 50% come from non-SG centers? Of course, the authors themselves may be willing to help correct the obvious typos -- which will presumably disappear forever once we can finally upload log files upon deposition (coming soon, I'm told). On an unrelated note, it's reassuring to see sound statistical principles -- averages, large N, avoidance of small-number anecdotes, and such rot -- continue not to be abandoned in the politics of science funding, he said airily. phx
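Reading the table against Artem's earlier rule of thumb (Rfree - Rwork roughly tracking Rmerge) makes the point concrete: for these entries the divergence is far smaller than Rmerge in every case. A small Python check over the tabulated values:

```python
# The (Rmerge, Rwork, Rfree) triples from the table above; the reported
# Rmerge values should, by the rule of thumb, be comparable to the
# Rfree - Rwork divergence, but here they exceed it by roughly 10x.
rows = [
    (0.9990, 0.1815, 0.2086),
    (0.8700, 0.1708, 0.2270),
    (0.7700, 0.1870, 0.2297),
    (0.7600, 0.2380, 0.2680),
    (0.7000, 0.1700, 0.2253),
    (0.6400, 0.2179, 0.2715),
]
for rmerge, rwork, rfree in rows:
    print(round(rfree - rwork, 4), "vs Rmerge", rmerge)
```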
Re: [ccp4bb] diverging Rcryst and Rfree
Yes! - the critical piece of information that we're missing is the proportion of *all* structures that come from SG centres. Only by knowing that can we do any serious statistics ... -- Ian On Tue, Oct 26, 2010 at 5:07 PM, Frank von Delft frank.vonde...@sgc.ox.ac.uk wrote: b) very large Rmerge values: Rmerge Rwork Rfree Rfree-Rwork Resolution 0.9990 0.1815 0.2086 0.0271 1.80 SG center, unpublished 0.8700 0.1708 0.2270 0.0562 1.96 unpublished 0.7700 0.1870 0.2297 0.0428 1.56 0.7600 0.2380 0.2680 0.0300 2.50 SG center, unpublished 0.7000 0.1700 0.2253 0.0553 1.71 0.6400 0.2179 0.2715 0.0536 2.75 SG center, unpublished The most disturbing to me is that of those with very large overall Rmerge values, 3 come from structural genomics centers. Is that less or more disturbing than that the other 50% come from non-SG centers? Of course, the authors themselves may be willing to help correct the obvious typos -- which will presumably disappear forever once we can finally upload log files upon deposition (coming soon, I'm told). On an unrelated note, it's reassuring to see sound statistical principles -- averages, large N, avoidance of small-number anecdotes, and such rot -- continue not to be abandoned in the politics of science funding, he said airily. phx
[ccp4bb] Against Method (R)
Hi Folks, Please allow me a few biased reflections/opinions on the numeRology of the R-value (not R-factor, because it is neither a factor itself nor does it factor in anything but ill-posed reviewers' critique. Historically the term originated from small molecule crystallography, but it is only a 'Residual-value'). a) The R-value itself - based on linear residuals and of apparent intuitive meaning - is statistically peculiar to say the least. I could not find it in any common statistics text. So doing proper statistics with R becomes difficult. b) Rules of thumb (as much as they conveniently obviate the need for detailed explanations, satisfy students' desire for quick answers, and allow superficial review of manuscripts) become less valuable if they have a case-dependent large variance, topped with an unknown parent distribution. Combined with an odd statistic, that has great potential for misguidance and unnecessarily lost sleep. c) Ian has (once again) explained that, for example, Rf-R depends on exact knowledge of the restraints and their individual weighting, which we generally do not have. Caution is advised. d) The answer to which model is better - which is actually what you want to know - becomes a question of model selection or hypothesis testing, which, given the obscurity of R, cannot be derived with some nice plug-in method. As Ian said, the models to be compared must also be based on the same and identical data. e) One measure available that is statistically at least defensible is the log-likelihood. So what you can do is form a log-likelihood ratio (or Bayes factor (there is the darn factor again; it's a ratio)) and see where this falls - and the answers are pretty soft and, probably because of that, correspondingly realistic. This also makes - based on statistics alone - deciding between different overall parameterizations difficult. 
http://en.wikipedia.org/wiki/Bayes_factor f) So having said that, what really remains is that the model that fits the primary evidence (minimally biased electron density) best and is at the same time physically meaningful is the best model, i.e., all plausibly accountable electron density (and not more) is modeled. You can convince yourself of this by taking the most interesting part of the model out (say a ligand or a binding pocket) and looking at the R-values, or doing a model selection test - the result will be indecisive. Poof goes the global rule of thumb. g) In other words: global measures in general are entirely inadequate to judge local model quality (noted many times over already by Jones, Kleywegt, and others, in the dark ages of crystallography when poorly restrained crystallographers used to passionately whack each other over the head with unfree R-values). Best, BR - Bernhard Rupp, Hofkristallrat a.D. 001 (925) 209-7429 +43 (676) 571-0536 b...@ruppweb.org hofkristall...@gmail.com http://www.ruppweb.org/ -- Und wieder ein chillout-mix aus der Hofkristall-lounge --
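Point (e) in practice: a toy Python illustration of comparing two parameterizations by twice the log of the likelihood ratio (the conventional 2 ln K scale for Bayes factors). The log-likelihood values here are invented; in reality they come from the refinement program's output.

```python
# Invented log-likelihoods for two parameterizations of the same model
# refined against the same data (per point d): e.g. without and with
# extra TLS groups. In practice take these from the refinement log.
logL_simple = -12450.0
logL_extended = -12430.0

# Twice the log likelihood ratio (the 2 ln K scale): large values favour
# the extended parameterization, small values are indecisive -- the
# "soft" answers BR mentions.
two_ln_K = 2.0 * (logL_extended - logL_simple)
print(two_ln_K)  # 40.0
```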
Re: [ccp4bb] Against Method (R)
Another issue with these statistics is that the PDB insists on a single value of resolution no matter how anisotropic the data. Especially in the outermost bins, Rmerge could be ridiculously high simply because the data only exist in one out of 3 directions. Phoebe = Phoebe A. Rice Dept. of Biochemistry & Molecular Biology The University of Chicago phone 773 834 1723 http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123 http://www.rsc.org/shop/books/2008/9780854042722.asp Original message Date: Tue, 26 Oct 2010 09:46:46 -0700 From: Bernhard Rupp (Hofkristallrat a.D.) hofkristall...@gmail.com Subject: [ccp4bb] Against Method (R) To: CCP4BB@JISCMAIL.AC.UK
Re: [ccp4bb] Against Method (R)
On Tuesday, October 26, 2010 09:46:46 am Bernhard Rupp (Hofkristallrat a.D.) wrote: Hi Folks, Please allow me a few biased reflections/opinions on the numeRology of the R-value (not R-factor, because it is neither a factor itself nor does it factor in anything but ill-posed reviewer's critique. Historically the term originated from small molecule crystallography, but it is only a 'Residual-value') a) The R-value itself - based on the linear residuals and of apparent intuitive meaning - is statistically peculiar to say the least. I could not find it in any common statistics text. So doing proper statistics with R becomes difficult. As WC Hamilton pointed out originally, two [properly weighted] R factors can be compared by taking their ratio. Significance levels can then be evaluated using the standard F distribution. A concise summary is given in chapter 9 of Prince's book, which I highly recommend to all crystallographers. W C Hamilton, Significance tests on the crystallographic R factor, Acta Cryst. (1965) 18, 502-510. Edward Prince, Mathematical Techniques in Crystallography and Materials Science, Springer-Verlag, 1982. It is true that we normally indulge in the sloppy habit of paying attention only to the unweighted R factor even though refinement programs report both the weighted and unweighted versions. (shelx users excepted :-) But the weighted form is there also if you want to do statistical tests. You are of course correct that this remains a global test, and as such is of limited use in evaluating local properties of the model. cheers, Ethan -- Ethan A Merritt Biomolecular Structure Center, K-428 Health Sciences Bldg University of Washington, Seattle 98195-7742
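Ethan's Hamilton test can be sketched numerically. A Python example with invented refinement numbers; the F critical value is hard-coded from a standard table (df1 = 20, df2 ~ 4600, alpha = 0.05 gives F ~ 1.57), since in practice one would use a table lookup or scipy.stats.f.ppf:

```python
# Sketch of the Hamilton R-ratio test (Hamilton 1965; Prince ch. 9).
# All refinement numbers below are invented for illustration.
n = 5000                   # number of reflections
m1, m2 = 400, 420          # parameters in the restricted / extended model
b = m2 - m1                # dimension of the hypothesis (extra parameters)
wR1, wR2 = 0.210, 0.205    # weighted R factors (wR1: fewer parameters)

F_crit = 1.57              # F(b, n - m2) at the 5% level, from tables
R_crit = (b * F_crit / (n - m2) + 1.0) ** 0.5

# If the observed ratio of weighted R factors exceeds the critical value,
# the extra parameters are formally significant at the chosen level.
print(round(wR1 / wR2, 4), ">", round(R_crit, 4))  # 1.0244 > 1.0034
```

Note how small the critical ratio is when the observation-to-parameter ratio is large: even a modest drop in the weighted R factor can be formally significant, which is why the global test says little about local model quality.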
Re: [ccp4bb] Against Method (R)
Dear all, Augustine, Confessions, Book 11 Chap. XIV, has it: If no one ask of me, I know; if I wish to explain to him who asks, I know not. With best wishes, Gerard. -- On Tue, Oct 26, 2010 at 01:30:11PM -0500, Phoebe Rice wrote: Another issue with these statistics is that the PDB insists on a single value of resolution no matter how anisotropic the data. Especially in the outermost bins, Rmerge could be ridiculously high simply because the data only exist in one out of 3 directions. Phoebe = Phoebe A. Rice Dept. of Biochemistry Molecular Biology The University of Chicago phone 773 834 1723 http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123 http://www.rsc.org/shop/books/2008/9780854042722.asp Original message Date: Tue, 26 Oct 2010 09:46:46 -0700 From: CCP4 bulletin board CCP4BB@JISCMAIL.AC.UK (on behalf of Bernhard Rupp (Hofkristallrat a.D.) hofkristall...@gmail.com) Subject: [ccp4bb] Against Method (R) To: CCP4BB@JISCMAIL.AC.UK Hi Folks, Please allow me a few biased reflections/opinions on the numeRology of the R-value (not R-factor, because it is neither a factor itself nor does it factor in anything but ill-posed reviewer's critique. Historically the term originated from small molecule crystallography, but it is only a 'Residual-value') a) The R-value itself - based on the linear residuals and of apparent intuitive meaning - is statistically peculiar to say the least. I could not find it in any common statistics text. So doing proper statistics with R becomes difficult. b) rules of thumb (as much as they conveniently obviate the need for detailed explanations, satisfy student's desire for quick answers, and allow superficial review of manuscripts) become less valuable if they have a case-dependent large variance, topped with an unknown parent distribution. Combined with an odd statistic, that has great potential for misguidance and unnecessarily lost sleep. 
c) Ian has (once again) explained that for example the Rf-R depends on the exact knowledge of the restraints and their individual weighting, which we generally do not have. Caution is advised. d) The answer which model is better - which is actually what you want to know - becomes a question of model selection or hypothesis testing, which, given the obscurity of R cannot be derived with some nice plug-in method. As Ian said the models to be compared must also be based on the same and identical data. e) One measure available that is statistically at least defensible is the log-likelihood. So what you can do is form a log-likelihood ratio (or Bayes factor (there is the darn factor again, it’s a ratio)) and see where this falls - and the answers are pretty soft and, probably because of that, correspondingly realistic. This also makes - based on statistics alone - deciding between different overall parameterizations difficult. http://en.wikipedia.org/wiki/Bayes_factor f) so having said that, what really remains is that the model that fits the primary evidence (minimally biased electron density) best and is at the same time physically meaningful, is the best model, i. e., all plausibly accountable electron density (and not more) is modeled. You can convince yourself of this by taking the most interesting part of the model out (say a ligand or a binding pocket) and look at the R-values or do a model selection test - the result will be indecisive. Poof goes the global rule of thumb. g) in other words: global measures in general are entirely inadequate to judge local model quality (noted many times over already by Jones, Kleywegt, others, in the dark ages of crystallography when poorly restrained crystallographers used to passionately whack each other over the head with unfree R-values). Best, BR - Bernhard Rupp, Hofkristallrat a.D. 
=== * * * Gerard Bricogne g...@globalphasing.com * * * * Global Phasing Ltd. * * Sheraton House, Castle Park Tel: +44-(0)1223-353033 * * Cambridge CB3 0AX, UK Fax: +44-(0)1223-366889 * * *
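The R-value Bernhard discusses in point (a) — the linear residual — is easy to write down concretely. The sketch below is purely illustrative (invented toy amplitudes, not any refinement program's actual implementation), including the conventional split into working and cross-validation ("free") reflections:

```python
def r_value(fobs, fcalc):
    """Linear residual R = sum|Fobs - Fcalc| / sum(Fobs), the 'R-value'
    under discussion (a sketch; real programs operate on merged amplitudes)."""
    return sum(abs(o - c) for o, c in zip(fobs, fcalc)) / sum(fobs)

# toy amplitudes, split into a working set and a small held-out (free) set
work_obs, work_calc = [120.0, 95.0, 60.0, 33.0], [114.0, 99.0, 57.0, 36.0]
free_obs, free_calc = [80.0, 25.0], [70.0, 30.0]

r_work = r_value(work_obs, work_calc)  # residual on reflections used in refinement
r_free = r_value(free_obs, free_calc)  # residual on held-out reflections
# overfitting shows up as Rfree drifting well above Rwork
```

As the thread notes, these are global numbers: removing a ligand from a large model barely moves either of them, which is exactly point (f).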
Re: [ccp4bb] Against Method (R)
W C Hamilton, Significance tests on the crystallographic R factor, Acta Cryst. (1965). 18, 502-510. Interestingly enough, I have used the Hamilton tests in Rietveld powder refinements of small molecules/intermetallics before. One problem was partial occupancies vs split conformations in HT superconductors. Alas, you cannot cheat there either - most of the time the results showed that numerically the differences were not significant, and one again had to resort to non-statistical plausibility arguments or references. Has anyone done Hamiltons on different protein models/parameterizations and can report? I think for global parameterization changes like NCS, TLS, etc. that may in fact be interesting. BR
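For anyone wanting to try this, the decision rule from Hamilton's 1965 paper can be sketched as follows. This is a sketch only: it assumes properly weighted R-factors, and the F quantile must be supplied from tables or a statistics library (the numbers in the comment are illustrative):

```python
def hamilton_critical_ratio(b, n_minus_m, f_crit):
    """Critical value of Hamilton's R-ratio test (Acta Cryst. 1965, 18, 502).

    b         -- number of parameters fixed by the restricted hypothesis
    n_minus_m -- observations minus parameters of the general model
    f_crit    -- alpha-point of the F(b, n - m) distribution (from tables)

    Reject the restricted model at significance level alpha if
    wR(restricted) / wR(general) exceeds this ratio.
    """
    return (b * f_crit / n_minus_m + 1.0) ** 0.5

# e.g. one extra parameter, 990 degrees of freedom, F ~ 3.85 at alpha = 0.05:
# the critical ratio is only ~1.002, so even a tiny wR difference can be
# significant -- or, as in the Rietveld cases above, fail to be
```

Note this applies to the weighted R-factors only; the unweighted values deposited in the PDB cannot be plugged in directly, which is exactly the approximation Ian laments below.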
Re: [ccp4bb] Against Method (R)
Indeed, see: http://scripts.iucr.org/cgi-bin/paper?a07175 . The Rfree/Rwork ratio that I referred to does strictly use the weighted ('Hamilton') R-factors, but because only the unweighted values are given in the PDB we were forced to approximate (against our better judgment!). The problem of course is that all refinement software AFAIK writes the unweighted Rwork and Rfree to the PDB header; there are no slots for the weighted values, which does indeed make doing serious statistics on the PDB entries difficult if not impossible! The unweighted crystallographic R-factor was only ever intended as a rule of thumb, i.e. to give a rough idea of the relative quality of related structures; I hardly think the crystallographers of yesteryear ever imagined that we would be taking it so seriously now! In particular IMO it should never be used for something as critical as validation (either global or local), or for guiding refinement strategy: use the likelihood instead. Cheers -- Ian

PS I've always known it as an 'R-factor', e.g. see the paper referenced above, but then during my crystallographic training I used extensively software developed by both authors of the paper (i.e. Geoff Ford and the late John Rollett) in Oxford (which eventually became the 'Crystals' small-molecule package). Maybe it's a transatlantic thing ... Cheers -- Ian

On Tue, Oct 26, 2010 at 7:28 PM, Ethan Merritt merr...@u.washington.edu wrote: [...] As WC Hamilton pointed out originally, two [properly weighted] R factors can be compared by taking their ratio. Significance levels can then be evaluated using the standard F distribution. A concise summary is given in chapter 9 of Prince's book, which I highly recommend to all crystallographers. W C Hamilton, Significance tests on the crystallographic R factor, Acta Cryst. (1965). 18, 502-510. Edward Prince, Mathematical Techniques in Crystallography and Materials Science. Springer-Verlag, 1982. It is true that we normally indulge in the sloppy habit of paying attention only to the unweighted R factor even though refinement programs report both the weighted and unweighted versions. (shelx users excepted :-) But the weighted form is there also if you want to do statistical tests. You are of course correct that this remains a global test, and as such is of limited use in evaluating local properties of the model. cheers, Ethan [...]
Re: [ccp4bb] Against Method (R)
Um...
* Given that the weighted Rfactor is weighted by the measurement errors (1/sig^2)
* and given that the errors in our measurements apparently have no bearing whatsoever on the errors in our models (for macromolecular crystals, certainly - the R-factor gap)
is the weighted Rfactor even vaguely relevant for anything at all? phx.

On 26/10/2010 20:44, Ian Tickle wrote: [...]
[ccp4bb] Help with Optimizing Crystals
Hello. I have obtained disk-shaped crystals of a protein that I am working on. I got hits in about 10 different conditions, with a few common precipitants and pHs, and I have optimized two conditions so far. In the optimized conditions, the crystals appear overnight, usually surrounded by or hiding under heavy precipitate. Under the best conditions, I get what I would describe as single disks, some of which are of decent size and very round, that rotate light very well. Sub-optimal conditions can give small to large crystal clusters. I shot the large disk crystals grown from one condition at the synchrotron, but they did not diffract. I was wondering if anyone had any advice about optimizing these crystals in order to get them to diffract better? As mentioned before, I have only tried optimizing a few of the hit conditions (varying precipitant conc., pH, etc.), but crystals from all of the hits look the same: always round disks or disk clusters. This leads me to believe that optimizing the other hit conditions will produce similar results. Would it be worthwhile to try optimizing these conditions as well? I have also tried seeding, which just produces a lot of clusters, and an additive screen. Some of the additives help to produce larger crystals, but again I always get single disks or disk clusters. Any advice would be helpful. Thanks, Matt
Re: [ccp4bb] Help with Optimizing Crystals
Hi Matt, You'll probably get many different answers to a question like this, but what I would do is go back to your protein and make different constructs; chop off termini, surface mutations etc, maybe cleave off the tag. Of course more screening and optimization might work, but my sense is that since you get many hits pretty easily that however don't diffract, there may be something on the protein level that needs correcting. Good luck, Bert

On 10/26/10 4:23 PM, Matthew Bratkowski mab...@cornell.edu wrote: [...]
Re: [ccp4bb] Help with Optimizing Crystals
First piece of advice I have is to shove them in the beam and see what happens. A few days ago we got high-resolution data from crystals that are shaped like eggs. No edges on them whatsoever. In the past, saucer-shaped crystals diffracted to 2A whereas their hexagonal 'perfect' cousins (grown from a different PEG, if memory serves) had Cheeseburger-strength diffraction. Secondly, if ordinary optimization attempts repeatedly fail, it may be time for protein optimization, e.g. proteolysis, mutagenesis, methylation and so forth :) Artem

On Tue, Oct 26, 2010 at 3:23 PM, Matthew Bratkowski mab...@cornell.edu wrote: [...]
Re: [ccp4bb] Help with Optimizing Crystals
You did check on a gel that they are indeed your protein? If you have sufficient amounts available, try digesting it with various proteases and see if you can identify a stable fragment. A less radical approach, which might not be accessible to you: you could screen your protein for alternative buffer conditions using DSF and then pick a condition under which it seems to be very stable according to its melting temperature in the buffer. You've spared us the details of your purification procedure; maybe a polishing step at the end with SEC might do wonders. Jürgen - Jürgen Bosch Johns Hopkins Bloomberg School of Public Health Department of Biochemistry & Molecular Biology Johns Hopkins Malaria Research Institute 615 North Wolfe Street, W8708 Baltimore, MD 21205 Phone: +1-410-614-4742 Lab: +1-410-614-4894 Fax: +1-410-955-3655 http://web.mac.com/bosch_lab/

On Oct 26, 2010, at 4:23 PM, Matthew Bratkowski wrote: [...]
Re: [ccp4bb] Help with Optimizing Crystals
Seeding! Make seeds, rescreen with seeds. Look in many former ccp4bb posts for references about this. Jacob

- Original Message - From: Jürgen Bosch To: CCP4BB@JISCMAIL.AC.UK Sent: Tuesday, October 26, 2010 3:47 PM Subject: Re: [ccp4bb] Help with Optimizing Crystals [...]

*** Jacob Pearson Keller Northwestern University Medical Scientist Training Program Dallos Laboratory F. Searle 1-240 2240 Campus Drive Evanston IL 60208 lab: 847.491.2438 cel: 773.608.9185 email: j-kell...@northwestern.edu ***
Re: [ccp4bb] Against Method (R)
On Tuesday, October 26, 2010 01:16:58 pm Frank von Delft wrote: Um... * Given that the weighted Rfactor is weighted by the measurement errors (1/sig^2) * and given that the errors in our measurements apparently have no bearing whatsoever on the errors in our models (for macromolecular crystals, certainly - the R-factor gap)

You are overlooking causality :-) Yes, the errors in state-of-the-art models are only weakly limited by the errors in our measurements. But that is exactly _because_ we can now weight properly by the measurement errors (1/sig^2). In my salad days, weighting by 1/sig^2 was a mug's game. Refinement only produced a reasonable model if you applied empirical corrections rather than statistical weights. Things have improved a bit since then, both on the equipment side (detectors, cryo, ...) and on the processing side (Maximum Likelihood, error propagation, ...). Now the sigmas actually mean something!

is the weighted Rfactor even vaguely relevant for anything at all?

Yes, it is. It is the thing you are minimizing during refinement, at least to first approximation. Also, as just mentioned, it is a well-defined value that you can use for statistical significance tests. Ethan

On 26/10/2010 20:44, Ian Tickle wrote: [...]
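The weighted R-factor being discussed, with statistical weights w = 1/sigma^2, can be sketched like this (an illustrative sketch with toy numbers; actual refinement targets differ in detail between programs):

```python
def weighted_r(fobs, fcalc, sigmas):
    """wR = sqrt( sum w*(Fo - Fc)^2 / sum w*Fo^2 ), with w = 1/sigma^2."""
    w = [1.0 / s ** 2 for s in sigmas]
    num = sum(wi * (o - c) ** 2 for wi, o, c in zip(w, fobs, fcalc))
    den = sum(wi * o ** 2 for wi, o in zip(w, fobs))
    return (num / den) ** 0.5

fobs, fcalc = [100.0, 50.0], [90.0, 50.0]
# the poorly fit reflection measured precisely (small sigma) dominates wR ...
wr_precise = weighted_r(fobs, fcalc, [1.0, 10.0])
# ... while the same misfit with a sloppy measurement is down-weighted
wr_sloppy = weighted_r(fobs, fcalc, [10.0, 1.0])
```

This is Ethan's point in miniature: the statistic is only as meaningful as the sigmas that go into it.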
Re: [ccp4bb] Help with Optimizing Crystals
Hi. Here is some additional information. 1. The purification method that I used included Ni, tag cleavage, and SEC as a final step. I have tried samples from three different purification batches that range in purity, and even the batch with the worst purity seems to produce crystals. 2. The protein is a proteolyzed fragment, since the full-length version did not crystallize. Mutagenesis and methylation, however, may be techniques to consider, since the protein contains quite a few lysines. 3. There are not any detergents in the buffer, so these are not detergent crystals. The protein buffer just contains Tris at pH 8, NaCl, and DTT. 4. Some experiments that I have done thus far seem to suggest that the crystals are protein. Izit dye soaks well into the crystals, and the few crystals that I shot previously did not produce any diffraction pattern whatsoever. However, I have had difficulty seeing them on a gel, and they are a bit tough to break. 5. I tried seeding previously as follows: I broke some crystals, made a seed stock, dipped in a hair, and did serial streak seeding. After seeding, I usually saw small disks or clusters along the path of the hair, but nothing larger or better looking. I also had one more question. Has anyone had an instance where changing the precipitant condition or including an additive improved diffraction but did not drastically change the shape of the crystals? If so, I may just try further optimization with the current conditions and shoot some more crystals. Thanks for all the helpful advice thus far, Matt
Re: [ccp4bb] Help with Optimizing Crystals
Hi. Here is some additional information. 1. The purification method that I used included Ni, tag cleavage, and SEC as a final step. I have tried samples from three different purification batches that range in purity, and even the batch with the worst purity seems to produce crystals.

Resource Q? Two or more species perhaps? Does it run as a monomer, dimer or multimer on your SEC?

2. The protein is a proteolyzed fragment, since the full-length version did not crystallize. Mutagenesis and methylation, however, may be techniques to consider, since the protein contains quite a few lysines. 3. There are not any detergents in the buffer, so these are not detergent crystals. The protein buffer just contains Tris at pH 8, NaCl, and DTT. 4. Some experiments that I have done thus far seem to suggest that the crystals are protein. Izit dye soaks well into the crystals, and the few crystals that I shot previously did not produce any diffraction pattern whatsoever. However, I have had difficulty seeing them on a gel, and they are a bit tough to break.

Do they float or do they sink quickly when you try to mount them?

5. I tried seeding previously as follows: I broke some crystals, made a seed stock, dipped in a hair, and did serial streak seeding. After seeding, I usually saw small disks or clusters along the path of the hair, but nothing larger or better looking. I also had one more question. Has anyone had an instance where changing the precipitant condition or including an additive improved diffraction but did not drastically change the shape of the crystals? If so, I may just try further optimization with the current conditions and shoot some more crystals.

The additive screen from Hampton is not bad and can make a big difference. A different topic: is the condition you are using a direct cryo? If not, what do you use as a cryo? Have you tried the old-fashioned way of shooting at crystals at room temperature using capillaries (WTHIT?) You might be killing your crystal by trying to cryo it, is what I'm trying to say here. Jürgen

Thanks for all the helpful advice thus far, Matt
Re: [ccp4bb] diverging Rcryst and Rfree
I found a practical solution to a similar problem. When I get a large gap between Rfree/R in refmac, I repeat the refinement in PHENIX using the same model and the same mtz file. It has always worked for me. And I have no theory for that observation, but the tables in publications looked better. Maia

Quoting Ian Tickle ianj...@gmail.com: Jackie, I agree completely with Ed (for once!), not only for the reasons he gave, but also because it's valid to compare statistics such as likelihood and R factors ONLY if only the model is varied. Such a comparison is not valid if the data used are varied (in this case you are changing the data by deleting some of them). Cheers -- Ian

On Tue, Oct 26, 2010 at 2:37 PM, Ed Pozharski epozh...@umaryland.edu wrote: Jackie, please note that (at least imho) the desire to obtain better R-factors does not justify excluding data from analysis. Weak reflections that you suggest should be rejected contain information, and excluding them will indeed artificially lower the R-factors while reducing the accuracy of your model. Cheers, Ed.

On Mon, 2010-10-25 at 17:44 -0400, Jacqueline Vitali wrote: Also, if your Rmerge is high and you include all reflections in refinement, Rfree is high. In my experience, by excluding F < sigma reflections you drop Rfree a lot. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
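Ed's point is easy to demonstrate numerically: throwing away weak, poorly fit reflections lowers R without the model getting any better. A toy sketch (the numbers are invented, purely illustrative):

```python
def r_value(fobs, fcalc):
    """Linear residual R = sum|Fo - Fc| / sum(Fo)."""
    return sum(abs(o - c) for o, c in zip(fobs, fcalc)) / sum(fobs)

# last two reflections are weak and fit relatively poorly, as weak data often are
fobs = [120.0, 80.0, 60.0, 5.0, 4.0]
fcalc = [115.0, 84.0, 57.0, 9.0, 1.0]

r_all = r_value(fobs, fcalc)           # all data included
r_cut = r_value(fobs[:3], fcalc[:3])   # weak reflections rejected
# r_cut < r_all: the statistic improved, but the model did not --
# and the information in the weak reflections is gone
```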
Re: [ccp4bb] Against Method (R)
Yes, but what I think Frank is trying to point out is that the difference between Fobs and Fcalc in any given PDB entry is generally about 4-5 times larger than sigma(Fobs). In such situations, pretty much any standard statistical test will tell you that the model is highly unlikely to be correct. I am not saying that everything in the PDB is wrong, just that the dominant source of error is a shortcoming of the models we use. Whatever this source of error is, it vastly overpowers the measurement error. That is, errors do not add linearly, but rather as squares, and 20%^2+5%^2 ~ 20%^2 . So, since the experimental error is only a minor contribution to the total error, it is arguably inappropriate to use it as a weight for each hkl. Yes, refinement does seem to work better when you use experimental sigmas, and weighted statistics are probably better than no weights at all, but the problem is that until we do have a model that can explain Fobs to within experimental error, we will be severely limited in the kinds of conclusions we can derive from our data. -James Holton MAD Scientist On Tue, Oct 26, 2010 at 1:59 PM, Ethan Merritt merr...@u.washington.edu wrote: On Tuesday, October 26, 2010 01:16:58 pm Frank von Delft wrote: Um... * Given that the weighted Rfactor is weighted by the measurement errors (1/sig^2) * and given that the errors in our measurements apparently have no bearing whatsoever on the errors in our models (for macromolecular crystals, certainly - the R-factor gap) You are overlooking causality :-) Yes, the errors in state-of-the-art models are only weakly limited by the errors in our measurements. But that is exactly _because_ we can now weight properly by the measurement errors (1/sig^2). In my salad days, weighting by 1/sig^2 was a mug's game. Refinement only produced a reasonable model if you applied empirical corrections rather than statistical weights. Things have improved a bit since then, both on the equipment side (detectors, cryo, ...) 
and on the processing side (Maximum Likelihood, error propagation, ...). Now the sigmas actually mean something! is the weighted Rfactor even vaguely relevant for anything at all? Yes, it is. It is the thing you are minimizing during refinement, at least to first approximation. Also, as just mentioned, it is a well-defined value that you can use for statistical significance tests. Ethan phx. On 26/10/2010 20:44, Ian Tickle wrote: Indeed, see: http://scripts.iucr.org/cgi-bin/paper?a07175 . The Rfree/Rwork ratio that I referred to does strictly use the weighted ('Hamilton') R-factors, but because only the unweighted values are given in the PDB we were forced to approximate (against our better judgment!). The problem of course is that all refinement software AFAIK writes the unweighted Rwork and Rfree to the PDB header; there are no slots for the weighted values, which does indeed make doing serious statistics on the PDB entries difficult if not impossible! The unweighted crystallographic R-factor was only ever intended as a rule of thumb, i.e. to give a rough idea of the relative quality of related structures; I hardly think the crystallographers of yesteryear ever imagined that we would be taking it so seriously now! In particular IMO it should never be used for something as critical as validation (either global or local), or for guiding refinement strategy: use the likelihood instead. Cheers -- Ian PS I've always known it as an 'R-factor', e.g. see paper referenced above, but then during my crystallographic training I used extensively software developed by both authors of the paper (i.e. Geoff Ford and the late John Rollett) in Oxford (which eventually became the 'Crystals' small-molecule package). Maybe it's a transatlantic thing ... Cheers -- Ian On Tue, Oct 26, 2010 at 7:28 PM, Ethan Merritt merr...@u.washington.edu wrote: On Tuesday, October 26, 2010 09:46:46 am Bernhard Rupp (Hofkristallrat a.D.) 
wrote: Hi Folks, Please allow me a few biased reflections/opinions on the numeRology of the R-value (not R-factor, because it is neither a factor itself nor does it factor in anything but ill-posed reviewer's critique. Historically the term originated from small molecule crystallography, but it is only a 'Residual-value') a) The R-value itself - based on the linear residuals and of apparent intuitive meaning - is statistically peculiar to say the least. I could not find it in any common statistics text. So doing proper statistics with R becomes difficult. As WC Hamilton pointed out originally, two [properly weighted] R factors can be compared by taking their ratio. Significance levels can then be evaluated using the standard F distribution. A concise summary is given in chapter 9 of Prince's book, which I
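The Hamilton ratio test mentioned above can be sketched numerically. This is my own illustration with made-up numbers: the critical F value is an approximation from standard tables, and the formula is Hamilton's critical ratio for comparing two properly weighted R factors of nested models.

```python
import math

def hamilton_critical_ratio(b, dof, F_crit):
    """Hamilton's critical weighted-R ratio: sqrt(b*F_crit/dof + 1).

    b      : number of extra parameters in the more general model
    dof    : n - m, residual degrees of freedom of the general model
    F_crit : upper critical value of F(b, dof) at the chosen alpha
    """
    return math.sqrt(b * F_crit / dof + 1.0)

# Made-up example: the restricted model gives wR = 0.245 and the general
# model (10 extra parameters, ~2000 residual degrees of freedom) gives
# wR = 0.230. F(10, 2000) at the 5% level is roughly 1.83 (from tables).
observed = 0.245 / 0.230
critical = hamilton_critical_ratio(b=10, dof=2000, F_crit=1.83)
print(f"observed wR ratio = {observed:.4f}, critical ratio = {critical:.4f}")
# If the observed ratio exceeds the critical ratio, the improvement from
# the extra parameters is significant at the chosen level.
```

Note how close the critical ratio sits to 1.0 at macromolecular data sizes: even small drops in the weighted R can be statistically significant when the number of reflections is large.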
Re: [ccp4bb] Against Method (R)
- Original Message - From: James Holton To: CCP4BB@JISCMAIL.AC.UK Sent: Tuesday, October 26, 2010 6:31 PM Subject: Re: [ccp4bb] Against Method (R) Yes, but what I think Frank is trying to point out is that the difference between Fobs and Fcalc in any given PDB entry is generally about 4-5 times larger than sigma(Fobs). In such situations, pretty much any standard statistical test will tell you that the model is highly unlikely to be correct. Wow, so what is the answer to this? Is that figure |Fcalc - Fobs| = 4-5x sigma really true? How, then, do we believe structures? Are there really good structures where this discrepancy is not there, to stake our claim, so to speak?
Re: [ccp4bb] Help with Optimizing Crystals
Janet Newman wrote a review on improving diffraction a few years back: http://dx.doi.org/10.1107/S0907444905032130 it is open access. Probably the most underappreciated aspect of diffraction is the purity of the protein. This is because impurities like slightly misfolded versions of your native structure can generally still find their way into the lattice. These are defects, and the distortion they create will push thousands of other unit cells out of place. So, 95%, or even 99% purity may not be enough. Getting the molecule to the point where it is the brightest band on the gel is generally enough to screen for crystallization conditions, but it is not uncommon for crystals from un-clean protein to diffract poorly. My advice: try adding a column to your purification protocol. Or, better yet, try fractional recrystallization. This is where you use your crystallization condition to crash out your entire stock, spin down the crystals, redissolve them, and then maybe do this a few times in a row. Yes, you lose a lot of material, but the stuff you lost is stuff that doesn't crystallize anyway. -James Holton MAD Scientist On Tue, Oct 26, 2010 at 2:31 PM, Matthew Bratkowski mab...@cornell.edu wrote: Hi. Here is some additional information. 1. The purification method that I used included Ni, tag cleavage, and SEC as a final step. I have tried samples from three different purification batches that range in purity, and even the batch with the worst purity seems to produce crystals. 2. The protein is a proteolyzed fragment since the full length version did not crystallize. Mutagenesis and methylation, however, may be techniques to consider since the protein contains quite a few lysines. 3. There are not any detergents in the buffer, so these are not detergent crystals. The protein buffer just contains Tris at pH 8, NaCl, and DTT. 4. Some experiments that I have done thus far seem to suggest that the crystals are protein. 
Izit dye soaks well into the crystals, and the few crystals that I shot previously did not produce any diffraction pattern whatsoever. However, I have had difficulty seeing them on a gel and they are a bit tough to break. 5. I tried seeding previously as follows: I broke some crystals, made a seed stock, dipped in a hair, and did serial streak seeding. After seeding, I usually saw small disks or clusters along the path of the hair but nothing larger or better looking. I also had one more question. Has anyone had an instance where changing the precipitation condition or including an additive improved diffraction but did not drastically change the shape of the protein? If so, I may just try further optimization with the current conditions and shoot some more crystals. Thanks for all the helpful advice thus far, Matt
Re: [ccp4bb] Against Method (R)
On Tuesday, October 26, 2010 04:31:24 pm James Holton wrote: Yes, but what I think Frank is trying to point out is that the difference between Fobs and Fcalc in any given PDB entry is generally about 4-5 times larger than sigma(Fobs). In such situations, pretty much any standard statistical test will tell you that the model is highly unlikely to be correct. But that's not the question we are normally asking. It is highly unlikely that any model in biology is correct, if by "correct" you mean "cannot be improved". Normally we ask the more modest question "have I improved my model today over what it was yesterday?". I am not saying that everything in the PDB is wrong, just that the dominant source of error is a shortcoming of the models we use. Whatever this source of error is, it vastly overpowers the measurement error. That is, errors do not add linearly, but rather as squares, and 20%^2+5%^2 ~ 20%^2 . So, since the experimental error is only a minor contribution to the total error, it is arguably inappropriate to use it as a weight for each hkl. I think your logic has run off the track. The experimental error is an appropriate weight for the Fobs(hkl) because that is indeed the error for that observation. This is true independent of errors in the model. If you improve the model, that does not magically change the accuracy of the data. Ethan Yes, refinement does seem to work better when you use experimental sigmas, and weighted statistics are probably better than no weights at all, but the problem is that until we do have a model that can explain Fobs to within experimental error, we will be severely limited in the kinds of conclusions we can derive from our data. -James Holton MAD Scientist On Tue, Oct 26, 2010 at 1:59 PM, Ethan Merritt merr...@u.washington.edu wrote: On Tuesday, October 26, 2010 01:16:58 pm Frank von Delft wrote: Um... 
* Given that the weighted Rfactor is weighted by the measurement errors (1/sig^2) * and given that the errors in our measurements apparently have no bearing whatsoever on the errors in our models (for macromolecular crystals, certainly - the R-factor gap) You are overlooking causality :-) Yes, the errors in state-of-the-art models are only weakly limited by the errors in our measurements. But that is exactly _because_ we can now weight properly by the measurement errors (1/sig^2). In my salad days, weighting by 1/sig^2 was a mug's game. Refinement only produced a reasonable model if you applied empirical corrections rather than statistical weights. Things have improved a bit since then, both on the equipment side (detectors, cryo, ...) and on the processing side (Maximum Likelihood, error propagation, ...). Now the sigmas actually mean something! is the weighted Rfactor even vaguely relevant for anything at all? Yes, it is. It is the thing you are minimizing during refinement, at least to first approximation. Also, as just mentioned, it is a well-defined value that you can use for statistical significance tests. Ethan phx. On 26/10/2010 20:44, Ian Tickle wrote: Indeed, see: http://scripts.iucr.org/cgi-bin/paper?a07175 . The Rfree/Rwork ratio that I referred to does strictly use the weighted ('Hamilton') R-factors, but because only the unweighted values are given in the PDB we were forced to approximate (against our better judgment!). The problem of course is that all refinement software AFAIK writes the unweighted Rwork and Rfree to the PDB header; there are no slots for the weighted values, which does indeed make doing serious statistics on the PDB entries difficult if not impossible! The unweighted crystallographic R-factor was only ever intended as a rule of thumb, i.e. to give a rough idea of the relative quality of related structures; I hardly think the crystallographers of yesteryear ever imagined that we would be taking it so seriously now! 
In particular IMO it should never be used for something as critical as validation (either global or local), or for guiding refinement strategy: use the likelihood instead. Cheers -- Ian PS I've always known it as an 'R-factor', e.g. see paper referenced above, but then during my crystallographic training I used extensively software developed by both authors of the paper (i.e. Geoff Ford and the late John Rollett) in Oxford (which eventually became the 'Crystals' small-molecule package). Maybe it's a transatlantic thing ... Cheers -- Ian On Tue, Oct 26, 2010 at 7:28 PM, Ethan Merritt merr...@u.washington.edu wrote: On Tuesday, October 26, 2010 09:46:46 am Bernhard Rupp (Hofkristallrat a.D.) wrote: Hi Folks, Please allow me a few biased reflections/opinions on the numeRology of the R-value (not R-factor, because
Re: [ccp4bb] Against Method (R)
Some time ago, I computed the mean value of Rcryst(F) / Rmerge(F) across the whole PDB. This average was 4.5, and I take this as a rough estimate of |Fcalc - Fobs| / sigma(Fobs). More recently, I have been looking in more detail at deposited data, but so far the few cases where this ratio is close to 1 are all cases where sigma(Fobs) is unusually high! I think the answer is that we can believe structures in the PDB to within 20% error. This is close enough for a few things (such as government work), but not for traditional statistics like confidence tests. For me, it is just really bothersome that we can measure structure factors to better than 5% accuracy, but still don't know how to model them. Ethan does make a good point that sig(Fobs) is the error in the measurement, and that the model-data error is not the weight one should use in refinement, etc. However, when you are comparing one PDB entry (yours) to others (published), I still don't think that sigma(Fobs) plays any significant role. -James Holton MAD Scientist On Tue, Oct 26, 2010 at 4:45 PM, Jacob Keller j-kell...@fsm.northwestern.edu wrote: - Original Message - *From:* James Holton jmhol...@lbl.gov *To:* CCP4BB@JISCMAIL.AC.UK *Sent:* Tuesday, October 26, 2010 6:31 PM *Subject:* Re: [ccp4bb] Against Method (R) Yes, but what I think Frank is trying to point out is that the difference between Fobs and Fcalc in any given PDB entry is generally about 4-5 times larger than sigma(Fobs). In such situations, pretty much any standard statistical test will tell you that the model is highly unlikely to be correct. Wow, so what is the answer to this? Is that figure |Fcalc - Fobs| = 4-5x sigma really true? How, then, do we believe structures? Are there really good structures where this discrepancy is not there, to stake our claim, so to speak?
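The "errors add as squares" arithmetic that keeps coming up in this thread is just addition in quadrature for independent error sources; a two-line check of James's numbers:

```python
import math

def combine_in_quadrature(*errors):
    """Total error from independent components: sqrt(sum of squares)."""
    return math.sqrt(sum(e * e for e in errors))

# 20% model error vs 5% measurement error, as in the example above:
total = combine_in_quadrature(0.20, 0.05)
print(f"combined relative error = {total:.4f}")
# The 5% term moves the total from 20% only to about 20.6%, which is why
# the measurement error is described as a minor contribution.
```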
[ccp4bb] Help with model bias in merohedral twin
Hello All, Not long ago I posted for some help with my twinned dataset at 1.95 A, and have confirmed the twinning of P6(5) into P6(5)22. Molecular replacement was successful and the twin refinement in Refmac yielded R/Rfree of 21%/26%, with a twin fraction of 0.46. Although the electron density map looks good, I am not sure if I should have too much confidence in it because I was not able to obtain 'strong electron densities' from omitted sections of the model in a refinement. I don't know if this is an indicator for bias introduced somewhere. I would like to ask what may be some procedures I can try for checking and removing these biases, and a few additional related questions. As suggested to me previously, I have generated a total omit map with sfcheck in ccp4i, using the refined pdb and unrefined data in P6(5). The .map file looks a little worse in quality (is this because of the twinning?) but is still reasonable, with a few breaks in the main chain and side chains. Interestingly, when I do a real space refinement against the total omit map, I get a slightly better Rfree in the earlier rounds of Refmac, which then diverges toward the numbers above. Why is this the case?

Cycle   Rfact    Rfree
  0     0.2301   0.2523
  1     0.2205   0.2534
  2     0.2164   0.2545
  3     0.2140   0.2554
  4     0.2123   0.2559
  5     0.2117   0.2570
  6     0.2116   0.2575
  7     0.2112   0.2582
  8     0.2112   0.2584
  9     0.2109   0.2587
 10     0.2106   0.2597

Secondly, I read that I should make sure the Free R flags are consistent throughout the twin-related indices. What may be the adverse outcome if this isn't enforced? Is Refmac aware of this in a twin refinement? If not, which tool could I use for this? I would very much appreciate any comments and suggestions. Best, Peter Chan
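On the free-R flag question, one way to guarantee consistency is to assign each flag from a canonical representative of the twin-related pair, so both mates always land in the same set. Below is a minimal sketch of that idea (not a CCP4 tool; the operator (h,k,l) -> (k,h,-l) is a common merohedral twin law for apparent 622 symmetry over point group 6, but verify the correct operator for your own crystal before borrowing this):

```python
import zlib

def twin_mate(hkl):
    """Assumed twin operator (h,k,l) -> (k,h,-l); check yours first."""
    h, k, l = hkl
    return (k, h, -l)

def orbit_representative(hkl):
    """Canonical member of the pair {hkl, twin_mate(hkl)}."""
    return min(hkl, twin_mate(hkl))

def free_flag(hkl, free_fraction=0.05):
    """Deterministic test-set assignment: twin mates always agree,
    because the flag is derived from the orbit representative."""
    rep = orbit_representative(hkl)
    digest = zlib.crc32(repr(rep).encode())
    return (digest % 1000) < int(free_fraction * 1000)

# Both members of a twin-related pair receive the same flag:
hkl = (3, 1, -7)
assert free_flag(hkl) == free_flag(twin_mate(hkl))
```

If the flags are not consistent, each "free" reflection has its twin mate in the working set, so the test set is partially refined against and Rfree is biased downward.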
Re: [ccp4bb] diverging Rcryst and Rfree [SEC=UNCLASSIFIED]
Hi Ian, Yes, I guess my rule does work as you say. If, starting the day from (Rwork = 20, Rfree = 30), abbreviated (20,30), you do something to get (18,29), yes, this means that that something was a bare minimum acceptable thing to do. If you do something to get (16,29) (decreased R by 4, Rfree by 1), then I would immediately suspect that the thing that was done introduced excessive over-fitting. If you do something to get (18,28) (decreased R by 2, Rfree by 2), then I would say that the thing that was done was a good thing. Yes, other arbitrary linear combinations could work. No rigorous analysis of this method was performed. I considered that it came down to a question of what degree of over-fitting is acceptable. In practice, this rule stopped endless additions of water molecules and further alternate conformations, and for that purpose the precise point seemed unimportant. However, I also used this rule to determine preferred parameters for BFAC and the matrix weight. Do you think this is a bad rule, and can you point me to a better rule? Replying to BR: This rule of thumb has proven successful in providing a defined end point for building and refining a structure. Hmmm. I always thought things like no more significant explainable (difference) density define endpoints in model building and not R-values. This strategy has proven successful in nailing ligand structures where R-value rules of thumb were used to define the end points. Of course, there are other rules. One has to explain all significant residual density. But this tends to be a finite task. The above rule was not applicable to building active sites, or other things that would be discussed directly in a paper. The problem I attempt to address is endless fiddling with features of ever-diminishing importance. Apologies if I have missed a recent relevant thread, but are there lists of rules of thumb for model building and refinement? 
Anthony Anthony Duff Telephone: 02 9717 3493 Mob: 043 189 1076 -Original Message- From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of Ian Tickle Sent: Wednesday, 27 October 2010 12:53 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] diverging Rcryst and Rfree [SEC=UNCLASSIFIED] Anthony, Your rule actually works on the difference (Rfree - Rwork/2), not (Rfree - Rwork) as you said, so is rather different from what most people seem to be using. For example let's say the current values are Rwork = 20, Rfree = 30, so your current test value is (30 - 20/2) = 20. Then according to your rule Rwork = 18, Rfree = 29 is equally acceptable (29 - 18/2 = 20, i.e. same test value), whereas Rwork = 16, Rfree = 29 would not be acceptable by your rule (29 - 16/2 = 21, so the test value is higher). Rwork = 18, Rfree = 28 would represent an improvement by your rule (28 - 18/2 = 19, i.e. a lower test value). You say this criterion provides a defined end-point, i.e. a minimum in the test value above. However wouldn't other linear combinations of Rwork and Rfree also have a defined minimum value? In particular Rfree itself always has a defined minimum with respect to adding parameters or changing the weights, so would also satisfy your criterion. There has to be some additional criterion that you are relying on to select the particular linear combination (Rfree - Rwork/2) over any of the other possible ones? Cheers -- Ian On Tue, Oct 26, 2010 at 6:33 AM, DUFF, Anthony a...@ansto.gov.au wrote: One rule of thumb based on R and R-free divergence that I impress onto crystallography students is this: If a change in refinement strategy or parameters (eg loosening restraints, introducing TLS) or a round of addition of unimportant water molecules results in a reduction of R that is more than double the reduction in R-free, then don't do it. This rule of thumb has proven successful in providing a defined end point for building and refining a structure. 
The rule works on the differential of the R / R-free divergence. I've noticed that some structures begin with a bigger divergence than others. Different Rmerge values might explain this. Has anyone else found a student in a dark room carefully adding large numbers of partially occupied water molecules? Anthony Anthony Duff Telephone: 02 9717 3493 Mob: 043 189 1076 From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of Artem Evdokimov Sent: Tuesday, 26 October 2010 1:45 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] diverging Rcryst and Rfree Not that rules of thumb always have to have a rationale, nor that they're always correct - but it would seem that noise in the data (of which Rmerge is an indicator) should have a significant relationship with the R:Rfree
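Anthony's rule, as Ian formalizes it, amounts to tracking the quantity Rfree - Rwork/2 and accepting a change only if that test value does not increase. A tiny sketch (my own encoding of the rule, using the thread's worked examples in percent):

```python
def acceptable(r_before, rfree_before, r_after, rfree_after):
    """Anthony Duff's rule of thumb in Ian Tickle's formulation:
    accept a refinement change only if (Rfree - Rwork/2) does not rise,
    i.e. Rwork must not drop more than twice as much as Rfree."""
    test_before = rfree_before - r_before / 2.0
    test_after = rfree_after - r_after / 2.0
    return test_after <= test_before

# The worked examples from the thread, starting at (Rwork, Rfree) = (20, 30):
assert acceptable(20, 30, 18, 29) is True    # bare-minimum acceptable change
assert acceptable(20, 30, 16, 29) is False   # over-fitting suspected
assert acceptable(20, 30, 18, 28) is True    # a good change
```

Encoding the rule this way also makes Ian's objection concrete: any other linear combination a*Rfree - b*Rwork would give a similar accept/reject test, and the rule itself does not say why the coefficients (1, 1/2) are the right ones.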
[ccp4bb] Hardware question
Another question about computer hardware: If I configure a computer at the Dell site, it costs about $700 to add a 2TB SATA drive. On amazon.com or Staples or such, a 2TB drive costs ~$110 to $200 depending on brand. Are the Dell-installed drives much faster, or more reliable, or do they have a better warranty? After all, RAID is supposed to stand for redundant array of inexpensive disks, and we could afford a lot more redundancy at the Amazon.com price. And, are there any brands or models that should be avoided due to known reliability issues? Thanks, eab
Re: [ccp4bb] Hardware question
Don't get ripped off by Dell! Their drives aren't any faster or better quality than the competition (IMHO they're probably slower and/or lower quality). If you're looking for a 2 terabyte drive, I have seven Hitachi 7K2000 2 TB (http://www.newegg.com/Product/Product.aspx?Item=N82E16822145298) drives in a RAID6 array inside a Thecus 7700 NAS (http://www.thecus.com/products_over.php?cid=11&pid=82&set_language=english) for 10 terabytes of storage, where 2 of the drives can simultaneously fail and still retain all the data. I have had the drives installed for over a year now and not a single problem. On Tue, Oct 26, 2010 at 9:52 PM, Edward A. Berry ber...@upstate.edu wrote: Another question about computer hardware- If I configure a computer at the Dell site, it costs about $700 to add a 2TB SATA drive. On amazon.com or Staples or such, a 2TB drive costs ~$110 to $200 depending on brand. Are the Dell-installed drives much faster, or more reliable, or have a better warranty? After all, RAID is supposed to stand for redundant array of inexpensive disks, and we could afford a lot more redundancy at the Amazon.com price. And, are there any brands or models that should be avoided due to known reliability issues? Thanks, eab -- Jim Fairman, Ph.D. Post-Doctoral Fellow National Institutes of Health - NIDDK Lab: 1-301-594-9229 E-mail: fairman@gmail.com james.fair...@nih.gov
Re: [ccp4bb] Hardware question
On Tue, Oct 26, 2010 at 09:52:51PM -0400, Edward A. Berry wrote: Another question about computer hardware- If I configure a computer at the Dell site, it costs about $700 to add a 2TB SATA drive. On amazon.com or Staples or such, a 2TB drive costs ~$110 to $200 depending on brand. Are the Dell-installed drives much faster No. or more reliable No. or have a better warranty? No. In fact they frequently have a worse warranty than the exact same retail product with a non-Dell part number. One of the ways that Dell keeps costs down is to negotiate a bulk deal with the hard drive OEMs where they provide Dell the exact same drives they sell in the retail channel, but with a shorter warranty, typically 1 year instead of 3 or 5 years. After all, RAID is supposed to stand for redundant array of inexpensive disks, and we could afford a lot more redundancy at the Amazon.com price. RAID is good for performance and uptime reasons, but it is _not_ a replacement for backups. You probably knew that, but I'll mention it for the audience playing along at home. And, are there any brands or models that should be avoided due to known reliability issues? Not really. Seagate had some firmware issues with their first 1.5 TB models, but they were worked out fairly quickly. I think any of the major vendors are going to be fairly competitive when it comes to reliability. The important thing is to look at the drive warranty. The lower-end drives will have 3 year or shorter warranties, and the higher-end drives will have 5 year warranties. Buy a model with a 5 year warranty. -ben -- | Ben Eisenbraun | Software Sysadmin | | Structural Biology Grid | http://sbgrid.org | | Harvard Medical School | http://hms.harvard.edu |
Re: [ccp4bb] Hardware question
Hi Ed, I have four of those http://www.newegg.com/Product/Product.aspx?Item=N82E16822136514 and would now buy these http://www.newegg.com/Product/Product.aspx?Item=N82E16822136764 DELLete it, I mean the quote you have, and shop somewhere else. Jürgen - Jürgen Bosch Johns Hopkins Bloomberg School of Public Health Department of Biochemistry and Molecular Biology Johns Hopkins Malaria Research Institute 615 North Wolfe Street, W8708 Baltimore, MD 21205 Phone: +1-410-614-4742 Lab: +1-410-614-4894 Fax: +1-410-955-3655 http://web.mac.com/bosch_lab/ On Oct 26, 2010, at 9:52 PM, Edward A. Berry wrote: Another question about computer hardware- If I configure a computer at the Dell site, it costs about $700 to add a 2TB SATA drive. On amazon.com or Staples or such, a 2TB drive costs ~$110 to $200 depending on brand. Are the Dell-installed drives much faster, or more reliable, or have a better warranty? After all, RAID is supposed to stand for redundant array of inexpensive disks, and we could afford a lot more redundancy at the Amazon.com price. And, are there any brands or models that should be avoided due to known reliability issues? Thanks, eab
[ccp4bb] Rules of thumb (was diverging Rcryst and Rfree)
Dear Anthony, That is an excellent question! I believe there are quite a lot of 'rules of thumb' going around. Some of them seem to lead to very dogmatic thinking and have caused (refereeing) trouble for good structures and lack of trouble for bad structures. A lot of them were discussed at the CCP4BB so it may be nice to try to list them all.

Rule 1: If Rwork < 20%, you are done.
Rule 2: If R-free - Rwork > 5%, your structure is wrong.
Rule 3: At resolution X, the bond length rmsd should be less than Y (What is the rmsd thing people keep talking about?)
Rule 4: If your resolution is lower than X, you should not use_anisotropic_Bs/riding_hydrogens
Rule 5: You should not build waters/alternates at resolutions lower than X
Rule 6: You should do the final refinement with ALL reflections
Rule 7: No one cares about getting the carbohydrates right

Obviously, this list is not complete. I may also have overstated some of the rules to get the discussion going. Any additions are welcome. Cheers, Robbie Joosten Netherlands Cancer Institute Apologies if I have missed a recent relevant thread, but are there lists of rules of thumb for model building and refinement? Anthony Anthony Duff Telephone: 02 9717 3493 Mob: 043 189 1076
Re: [ccp4bb] Rules of thumb (was diverging Rcryst and Rfree) [SEC=UNCLASSIFIED]
Dear Robbie, Rules 3-5 I found could be approached using my previous rule of thumb. If anisotropy reduced Rfree by more than half the reduction in R, then I liked it. It helped me decide to introduce anisotropy for xenon, iodine and chlorine atoms (supported by non-spherical omit electron density) but not for light atoms. My rule told me to always add riding hydrogens; they typically reduced R and Rfree similarly. Anthony Anthony Duff Telephone: 02 9717 3493 Mob: 043 189 1076 -Original Message- From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of Robbie Joosten Sent: Wednesday, 27 October 2010 4:29 PM To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] Rules of thumb (was diverging Rcryst and Rfree) Dear Anthony, That is an excellent question! I believe there are quite a lot of 'rules of thumb' going around. Some of them seem to lead to very dogmatic thinking and have caused (refereeing) trouble for good structures and lack of trouble for bad structures. A lot of them were discussed at the CCP4BB so it may be nice to try to list them all.

Rule 1: If Rwork < 20%, you are done.
Rule 2: If R-free - Rwork > 5%, your structure is wrong.
Rule 3: At resolution X, the bond length rmsd should be less than Y (What is the rmsd thing people keep talking about?)
Rule 4: If your resolution is lower than X, you should not use_anisotropic_Bs/riding_hydrogens
Rule 5: You should not build waters/alternates at resolutions lower than X
Rule 6: You should do the final refinement with ALL reflections
Rule 7: No one cares about getting the carbohydrates right

Obviously, this list is not complete. I may also have overstated some of the rules to get the discussion going. Any additions are welcome. Cheers, Robbie Joosten Netherlands Cancer Institute Apologies if I have missed a recent relevant thread, but are there lists of rules of thumb for model building and refinement? Anthony Anthony Duff Telephone: 02 9717 3493 Mob: 043 189 1076