Re: [ccp4bb] How many is too many free reflections?
Dear Frank,

I was going to reply to Ian's last comment last night, but got distracted. That last paragraph of Ian's message does sound rather negative if detached from the context of the previous one, which was about non-isomorphism between fragment complexes and the apo structure being the rule rather than the exception. Ian uses the Crick-Magdoff definition of an acceptable level of non-isomorphism, which is quite a stringent one because it refers to a level that would invalidate isomorphism for experimental phasing purposes. A much greater level of non-isomorphism can be tolerated when it comes to solving a target-fragment complex starting from the apo structure, so the Crick-Magdoff criterion is not relevant here. Furthermore, I think that Ian perhaps too readily equates the effect of non-isomorphism in creating noise in the comparison of intensities with its effect in invalidating the working vs. free status of observations. I think, therefore, that Ian's claim that failing the Crick-Magdoff criterion for isomorphism scrambles the distinction between the working set and the free set is a very big overstatement.

You describe as "bookkeeping faff" the procedures that Ian and I outlined to preserve the FreeR flags of the apo refinement, and ask for a paper. These matters are probably not glamorous enough to find their way into papers, and would best be discussed (or re-discussed) in a specialised BB like this one. If the shift from the question "How many is too many?" to "How should the free set be chosen?" that I tried to bring about yesterday results in a general sharing of evidence that otherwise gets set aside, I will be very happy. I would find it unwise to dismiss this question by expecting that there would be a mountain of published evidence if it were really important. Let us go ahead, then: could everyone who has evidence (rather than preconceptions) on this matter please come forward and share it?
Answering this question is very important, even if the conclusion is that the faff is unimportant.

With best wishes, Gerard.

--
On Thu, Jun 04, 2015 at 10:43:15PM +0100, Frank von Delft wrote:

I'm afraid Gerard and Ian between them have left me a bit confused with conflicting statements:

On 04/06/2015 15:29, Gerard Bricogne wrote: [snip] "In order to guard the detection of putative bound fragments against the evils of model bias, it is very important to ensure that the refinement of each complex against data collected on it does not treat as free any reflections that were part of the working set in the refinement of the apo structure." [snip]

On 04/06/2015 17:34, Ian Tickle wrote: [snip] "So I suspect that most of our efforts in maintaining common free R flags are for nothing; however it saves arguments with referees when it comes to publication!" [snip]

I also remember conversations and even BB threads that made me conclude that it did NOT matter to have the same Rfree set for independent datasets (e.g. different crystals). I confess I don't remember the arguments, only the relief at not having to bother with all the bookkeeping faff Gerard outlines and Ian describes.

So: could someone explain in detail why this matters (or why not), and is there a URL to the evidence (paper or anything else) in either direction?

(As far as I remember, the argument went that identical free sets were unnecessary even for exactly isomorphous crystals. Something like this: model bias is not a big deal when the model has largely converged, and that is what you have for "molecular substitution" (as Jim Pflugrath calls it). In addition, even a weakly binding fragment compound produces intensity perturbations large enough to make model bias irrelevant.)

phx
Re: [ccp4bb] How many is too many free reflections?
Dear Dusan,

This is a nice paper and an interestingly different approach to avoiding bias and/or quantifying errors - and indeed there are all kinds of possibilities if you have a particular structure on which you are prepared to spend unlimited time and resources. The specific context in which Graeme's initial question led me to ask instead "who should set the FreeR flags, at what stage and on what basis?" was that of the data analysis linked to high-throughput fragment screening, in which speed is of the essence at every step. Creating FreeR flags afresh for each target-fragment complex dataset, without any reference to those used in the refinement of the apo structure, is by no means an irrecoverable error, but it will take extra computing time to let the refinement of the complex adjust to a new free set, starting from a model refined with the ignored one. It is in order to avoid the need for that extra time, or for recourse to various debiasing methods, that the book-keeping faff described yesterday has been introduced. Operating without it is perfectly feasible; it is just likely not to be optimally direct.

I will probably bow out here, before someone asks "How many [e-mails from me] is too many?" :-)

With best wishes, Gerard.

--
On Fri, Jun 05, 2015 at 09:14:18AM +0200, dusan turk wrote:

Graeme, one more suggestion. You can avoid all the recipes by using all data for the WORK set and 0 reflections for the TEST set, regardless of the amount of data, by using the FREE KICK ML target. For an explanation see our recent paper: Praznikar, J. & Turk, D. (2014) Free kick instead of cross-validation in maximum-likelihood refinement of macromolecular crystal structures. Acta Cryst. D70, 3124-3134.
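In a real pipeline the flags live in MTZ files and are handled with standard CCP4 utilities; purely to illustrate the book-keeping rule under discussion (never let a reflection that was in the apo working set become free for the complex), here is a toy sketch in Python. The function name, inputs and data layout are invented for illustration, not any real tool's API:

```python
import random

def transfer_free_flags(apo_free, apo_work, complex_hkls, fraction=0.05, seed=0):
    """Toy illustration of carrying the apo free-set choice over to a
    target-fragment complex dataset.

    apo_free, apo_work : sets of (h, k, l) indices from the apo refinement
    complex_hkls       : (h, k, l) indices measured for the complex
    Returns {hkl: True (free) / False (working)} for the complex dataset.
    """
    rng = random.Random(seed)
    flags = {}
    for hkl in complex_hkls:
        if hkl in apo_free:
            flags[hkl] = True       # was free for the apo: stays free
        elif hkl in apo_work:
            flags[hkl] = False      # never promote a former working reflection
        else:
            # reflection not measured in the apo dataset: assign at random,
            # keeping roughly the same overall free fraction
            flags[hkl] = rng.random() < fraction
    return flags
```

The point of the sketch is only the middle branch: reflections that helped refine the apo model are never allowed to masquerade as "free" for the complex.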
A link to the paper can be found at http://www-bmb.ijs.si/doc/references.HTML

best, dusan

On Jun 5, 2015, at 1:03 AM, CCP4BB automatic digest system lists...@jiscmail.ac.uk wrote:

Date: Thu, 4 Jun 2015 08:30:57 +
From: Graeme Winter graeme.win...@gmail.com
Subject: Re: How many is too many free reflections?

Hi Folks,

Many thanks for all of your comments - in keeping with the spirit of the BB I have digested the responses below. Interestingly, I suspect that the responses to this question indicate the very wide range of resolution limits of the data people work with!

Best wishes Graeme

===

Proposal 1: 10% of reflections, max 2000

Proposal 2: from the wiki (http://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/Test_set), including Randy Read's recipe: "So here's the recipe I would use, for what it's worth:
  < 10,000 reflections: set aside 10%
  10,000-20,000 reflections: set aside 1000 reflections
  20,000-40,000 reflections: set aside 5%
  > 40,000 reflections: set aside 2000 reflections"

Proposal 3: 5%, maximum 2-5k

Proposal 4: 3%, minimum 1000

Proposal 5: 5-10% of reflections, minimum 1000

Proposal 6: 50 reflections per bin in order to get reliable ML parameter estimation, ideally around 150 per bin.

Proposal 7: if there are lots of reflections (e.g. 800K unique), around 1% selected - 5% would be 40k, i.e. rather a lot. Referees question the use of 5k reflections as a test set. Comment 1 in response to this: surely the absolute # of test reflections is not relevant, the percentage is.

Approximate consensus (i.e. what I will look at doing in xia2): probably follow Randy Read's recipe from the ccp4wiki, as this seems to (probably) satisfy most of the criteria raised by everyone else.

On Tue, Jun 2, 2015 at 11:26 AM Graeme Winter graeme.win...@gmail.com wrote:

Hi Folks

Had a vague comment handed my way that xia2 assigns too many free reflections - I have a feeling that by default it makes a free set of 5%, which was OK back in the day (like I/sig(I) = 2 was OK) but maybe seems excessive now.
This was particularly in the case of high-resolution data where you have a lot of reflections, so 5% could be several thousand, which would be more than you need just to check that Rfree seems OK. Since I really don't know what the right # of reflections to assign to a free set is, I thought I would ask here - what do you think? Essentially I need to assign a minimum %age or a minimum # - the lower of the two, presumably? Any comments welcome! Thanks, best wishes, Graeme

Dr. Dusan Turk, Prof.
Head of Structural Biology Group http://bio.ijs.si/sbl/
Head of Centre for Protein and Structure Production
Centre of excellence for Integrated Approaches in Chemistry and Biology of Proteins, Scientific Director http://www.cipkebip.org/
Professor of Structural Biology at IPS Jozef Stefan
e-mail: dusan.t...@ijs.si
phone: +386 1 477 3857
fax: +386 1 477 3984
Dept. of Biochem. Mol. Struct. Biol., Jozef Stefan Institute
Re: [ccp4bb] New ligand 3-letter code
I use any 3-letter/number code that I want. If you read the corresponding cif file into Coot, it is used in preference to any in the library. The PDB deposition team will assign a code if the ligand is new to the database. Could you relay this to the original poster? Thanks, Jim Brannigan

On 5 June 2015 at 14:58, Eleanor Dodson eleanor.dod...@york.ac.uk wrote: OK - thank you. How are things? E

On 5 June 2015 at 11:28, Eleanor Dodson eleanor.dod...@york.ac.uk wrote: I use your method - trial and error. It would be nice if at least there were a list somewhere of unassigned codes!

On 5 June 2015 at 09:16, Lau Sze Yi (SIgN) lau_sze...@immunol.a-star.edu.sg wrote:

Hi,

What is the proper way of generating a 3-letter code for a new ligand? As of now, I insert my ligand in Coot using a SMILES string, and for the 3-letter code I pick a non-existent code by trial and error (not very efficient). A cif file with the corresponding name, which I generated using Phenix, was imported into Coot. I am sure there is a proper way of doing this. Appreciate your feedback. Regards, Sze Yi
Re: [ccp4bb] PyMOL v. Coot map 'level'
Hi Emilia and Steven,

(re-posting after accidentally replying to the coot mailing list)

After off-list discussion with Steven, I updated: http://pymolwiki.org/index.php/Normalize_ccp4_maps

If the goal is to match the display in Coot, this is what I would do:

# load map into PyMOL but don't normalize
set normalize_ccp4_maps, off
load yourmap.ccp4
load yourpdb.pdb
# create a mesh which matches Coot's "level = 0.3462 e/A^3 (1.00 rmsd)"
isomesh mesh, yourmap, 0.3462, (yourpdb)

PyMOL extends the map based on the symmetry information from the selection in the 4th argument. No need to create an extended map with MAPMASK as long as yourpdb.pdb has symmetry information. The same is true if the map came from an MTZ file. I also updated http://pymolwiki.org/index.php/Display_CCP4_Maps and changed cover 'all atoms in PDB file' to cover 'asymmetric unit'. That way PyMOL's normalization should be identical to Coot's.

Regarding the question "What does PyMOL's 1.0 mean in electrons/A^3?": after normalization (with normalize_ccp4_maps=on) PyMOL doesn't know about the original values anymore. I assume Coot takes the original values from the file as e/A^3, so if you don't normalize in PyMOL, you'll get e/A^3.

Hope that helps. Cheers, Thomas

On 05 Jun 2015, at 01:36, Emilia C. Arturo (Emily) ec...@drexel.edu wrote:

Thomas,

"I tried to figure out the PyMOL vs. Coot normalization discrepancy a while ago. As far as I remember, PyMOL normalizes on the raw data array, while Coot normalizes across the unit cell. So if the data doesn't exactly cover the cell, the results might be different."

I posted the same question to the Coot mailing list (the thread can be found here: https://goo.gl/YjVtTu), and got the following reply from Paul Emsley; I highlight the questions that I think you could best answer with '***':

[...] I suspect that the issue is related to different answers to "the rmsd of what?"
In Coot, we use all the grid points in the asymmetric unit - other programs make a selection of grid points around the protein (and therefore have less solvent). More solvent means a lower rmsd. If one then contours at n-rmsd levels, the absolute level used in Coot will be lower - and thus seem noisier (perhaps). I suppose that if you want comparable levels from the same map/mtz file then you should use absolute levels, not rmsd. ***What does PyMOL's 1.0 mean in electrons/A^3?*** Regards, Paul.

Regards, Emily.

On 01 Jun 2015, at 11:37, Emilia C. Arturo (Emily) ec...@drexel.edu wrote:

"One cannot understand what is going on without knowing how this map was calculated. Maps calculated by the Electron Density Server have density in units of electrons/A^3 if I recall, or at least its best effort to do so."

This is what I was looking for! (i.e. what the units are) Thanks. :-) Yes, I'd downloaded the 2mFo-DFc map from the EDS, and got the same Coot vs. PyMOL discrepancy whether or not I turned off the PyMOL map normalization feature.

"If you load the same map into PyMOL and ask it to normalize the density values you should set your contour level to Coot's rmsd level. If you don't normalize you should use Coot's e/A^3 level. It is quite possible that they could differ by a factor of two."

This was exactly the case. The e/A^3 level in Coot (not the rmsd level) matched the map 'level' in PyMOL very well visually, while Coot's rmsd level was off from these by roughly a factor of 2. I did end up also generating a 2mFo-DFc map using phenix, which fetched the structure factors of the model in which I was interested. The result was the same (i.e. PyMOL 'level' = Coot e/A^3 level ~= 1/2 Coot's rmsd level) whether I used the CCP4 map downloaded from the EDS or the one generated from the structure factors with phenix. Thanks All. Emily.

Dale Tronrud

On 5/29/2015 1:15 PM, Emilia C. Arturo (Emily) wrote:

Hello.
I am struggling with an old question - old because I've found several discussions and wiki bits on this topic, e.g. on the PyMOL mailing list (http://sourceforge.net/p/pymol/mailman/message/26496806/ and http://www.pymolwiki.org/index.php/Display_CCP4_Maps), but the suggestions about how to fix the problem are not working for me, and I cannot figure out why. Perhaps someone here can help: I'd like to display (for beauty's sake) a selection of a model with the map around this selection. I've fetched the model from the PDB, downloaded its 2mFo-DFc CCP4 map, loaded both the map and the model into both PyMOL (student version) and Coot (0.8.2-pre EL (revision 5592)), and decided that I would use PyMOL to make the figure. I notice, though, that the map 'level' in PyMOL is not equivalent to the rmsd level in Coot, even when I set normalization off in PyMOL. I expected that a 1.0 rmsd level in Coot would look identical to a 1.0 level in PyMOL, but it does not; rather, a 1.0 rmsd level in
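Paul Emsley's "the rmsd of what?" point above is easy to demonstrate numerically: the sigma (and hence the meaning of an n-rmsd contour) depends on which grid points are included. The following is a toy illustration with made-up numbers, not real map data - the "solvent" and "protein" values are simply two Gaussian populations:

```python
import random
import statistics

# Toy model of a density map: many flat "solvent" grid points and a
# smaller "protein" region with higher and more variable density.
rng = random.Random(7)
solvent = [rng.gauss(0.0, 0.2) for _ in range(9500)]
protein = [rng.gauss(0.8, 0.4) for _ in range(500)]

# Coot-style: sigma over the whole asymmetric unit (includes the solvent).
sigma_cell = statistics.pstdev(solvent + protein)
# Other programs: sigma over a selection around the protein only.
sigma_protein = statistics.pstdev(protein)

# The same absolute density level translates to different "n-rmsd" levels
# depending on which sigma was used.
level = 0.35  # some absolute level, in map units
print(f"{level / sigma_cell:.2f} rmsd over the cell vs "
      f"{level / sigma_protein:.2f} rmsd over the protein region")
```

Including more solvent lowers the sigma, so a fixed absolute level corresponds to a higher n-rmsd in the whole-cell convention - which is why contouring at "1.0 rmsd" in two programs that normalize over different regions gives different-looking meshes, and why absolute levels are the comparable quantity.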
[ccp4bb] CSHL X-ray Methods in Structural Biology Course Oct 12-27, 2015: Application deadline June 15th
The June 15th deadline for applications to the CSHL X-ray Methods in Structural Biology Course, to be held later this year (October 12 through October 27, 2015), is rapidly approaching. The official course announcement is here: https://meetings.cshl.edu/courses.aspx?course=C-CRYS&year=15 so please pass this on to folks who might be interested and who would benefit. I think people will agree that this course is an outstanding place to learn both the theoretical and practical aspects of Macromolecular Crystallography, because of the extensive lectures from world-renowned teachers and the hands-on experiments. This year's course will see the return of the long-time instruction team of Alex McPherson, Gary Gilliland, Bill Furey and myself, along with many talented experts (see the course flyer linked above for more name-dropping), to help us give the participants an experience in learning Macromolecular Crystallography that cannot be found anywhere else. (The student:teacher ratio ends up being about 1:1.) We expect the participants to crystallize several proteins and determine their structures, all in about two weeks. They will also become well versed in the theory of X-ray diffraction and crystal structure determination while having lots of fun, but not much sleep. The course is limited to 16 participants owing to the very hands-on nature of the experiments and the intimate seminar-room and laboratory settings. Please check the above web link for more details. In particular, please note the information about fellowships, scholarships, and stipends that are available. This course is supported with funds provided by the National Institute of General Medical Sciences, for which we are grateful. If anyone has any questions, please send me e-mail; I will be happy to answer all queries. Thanks, Jim