Re: [ccp4bb] disordered helix
Sometimes a floppy bit of a protein is even more floppy in a particular crystal form. Your maps do not appear to support your model of a helix in this location. I would not build it unless maps based on later refinement show something reasonable in the omit map. (Of course, if you leave out the helix, all your maps will be omit maps.) It is quite common to submit models to the PDB that do not contain all of the amino acids expected based on the sequence. If you can't see where the chain goes you certainly can't be expected to build it.

Dale Tronrud

On 05/13/2013 04:23 AM, atul kumar wrote:
I have attached the map and omit map (after deleting the helix) images. 2Fo-Fc (1 sigma), Fo-Fc (3 sigma)

On 5/13/13, Eleanor Dodson eleanor.dod...@york.ac.uk wrote:
Hard to say without seeing the maps and experimenting. My first check would be to set the NTD occupancies to 0.0, refine the CTD alone, then look at the maps in COOT. Or maybe let an automatic model-building program such as Buccaneer try to rebuild the NTD section, with starting phases from the CTD. Eleanor

On 13 May 2013 09:04, atul kumar atulsingh21...@gmail.com wrote:
Dear all, I have solved the structure of my protein by molecular replacement at 2.9 A, with Rfactor and Rfree of 18 and 22 respectively. Overall everything seems fine. It is a two-domain protein (NTD and CTD); the NTD has a higher average B factor than the CTD. A helix of the NTD seems to be disordered: I tried different geometric weights, but the refined structure does not follow proper geometry for this helix. The B factor of this helix is very high compared to the overall B factor of the NTD, and the omit map shows only some partial density in this region (of course, not conclusive). All the homologous structures have a helix in this region, although with high B factors. Should I submit the current PDB or does it need more refinement? thanks and regards Atul Kumar
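Eleanor's suggestion to zero the NTD occupancies can be done in any model-building program, but as a minimal stdlib-only Python sketch (not from the thread; the chain ID and residue range below are hypothetical examples), one can rewrite the occupancy field (columns 55-60) of ATOM/HETATM records in a PDB file directly:

```python
def zero_occupancy(lines, chain, first, last):
    """Set occupancy to 0.00 for ATOM/HETATM records of `chain`
    with residue number in [first, last]. Columns follow the
    fixed-width PDB format (chain ID col 22, resseq cols 23-26,
    occupancy cols 55-60)."""
    out = []
    for ln in lines:
        if (ln.startswith(("ATOM", "HETATM"))
                and len(ln) > 60 and ln[21] == chain):
            resseq = int(ln[22:26])
            if first <= resseq <= last:
                ln = ln[:54] + "  0.00" + ln[60:]
        out.append(ln)
    return out

# Toy one-atom model; residue range 1-120 stands in for "the NTD".
pdb = ["ATOM      1  CA  ALA A  10      11.000  12.000  13.000  1.00 20.00"]
print(zero_occupancy(pdb, "A", 1, 120)[0])
```

The B-factor field (columns 61-66) is left untouched, so the zeroed atoms can still be inspected after refinement.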
Re: [ccp4bb] ctruncate bug?
If you are refining against F's you have to find some way to avoid calculating the square root of a negative number. That is why people have historically rejected negative I's, and why Truncate and cTruncate were invented. When refining against I, the calculation of (Iobs - Icalc)^2 couldn't care less if Iobs happens to be negative.

As for why people still refine against F... When I was distributing a refinement package it could refine against I, but no one wanted to do that. The R values ended up higher, but they were looking at R values calculated from F's. Of course the F-based R values are lower when you refine against F's; that means nothing. If we could get the PDB to report both the F- and I-based R values for all models, maybe we could get a start toward moving to intensity refinement.

Dale Tronrud

On 06/20/2013 09:06 AM, Douglas Theobald wrote:
Just trying to understand the basic issues here. How could refining directly against intensities solve the fundamental problem of negative intensity values?

On Jun 20, 2013, at 11:34 AM, Bernhard Rupp hofkristall...@gmail.com wrote:
As a maybe better alternative, we should (once again) consider refining against intensities (and I guess George Sheldrick would agree here). I have a simple question - what exactly, short of some sort of historic inertia (or memory lapse), is the reason NOT to refine against intensities? Best, BR
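Dale's point can be made concrete with a two-line residual comparison (a toy sketch, not any package's actual target function): the amplitude residual needs sqrt(Iobs), which is undefined for a weak reflection measured below background, while the intensity residual is perfectly well defined.

```python
import math

def f_residual(i_obs, i_calc):
    """Least-squares residual against amplitudes: requires sqrt(I),
    which fails for negative observed intensities."""
    f_obs = math.sqrt(i_obs)      # raises ValueError if i_obs < 0
    f_calc = math.sqrt(i_calc)
    return (f_obs - f_calc) ** 2

def i_residual(i_obs, i_calc):
    """Least-squares residual against intensities: a negative Iobs
    from background subtraction is no problem at all."""
    return (i_obs - i_calc) ** 2

# A weak reflection measured slightly below background:
print(i_residual(-0.8, 0.5))      # well defined
try:
    f_residual(-0.8, 0.5)
except ValueError:
    print("cannot take the square root of a negative intensity")
```

This is exactly why Truncate-style procedures exist: they massage negative I's into usable F's, a step intensity-based refinement never needs.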
Re: [ccp4bb] Alternating positive and negative density
Based on eyeballing your map, it looks to me that your grid spacing is about 0.5 A. The wavelength of your ripple is 4 grid spacings, and the ripple runs right along the x axis. My guess is that you have a rogue reflection with index h00, where h corresponds to about 2 A resolution. How you are getting this in multiple data sets is a mystery to me, but I would concentrate on finding that reflection and figuring out why it is anomalously large. Start with the Fourier coefficients that went into calculating this map to find the exact value of h causing the problem, and then track that reflection back through your Fcalc's and Fobs's.

Dale Tronrud

On 06/23/2013 09:57 PM, Peter Randolph wrote:
Short version: Hi, I'm working on what should be a straightforward molecular replacement problem (already solved protein in a new space group), but my Fo-Fc map contains a peculiar series of alternating positive and negative peaks of difference density. I'm wondering if anyone has seen this before? Sample images are attached and more background is below.

More background: I had initially solved an apo structure of my protein (from previous diffraction data in another crystal form), and more recently collected diffraction data for crystals of the protein co-crystallized with potential binding partners (small RNAs). All the datasets I've processed so far have the same space group (P2(1)2(1)2(1)) and cell dimensions as the apo structure. I have tried two general approaches, both with the same initial steps of indexing / integrating / scaling in XDS, converting to MTZ format without R-free flags, then importing R-free flags from the (previous) apo structure's MTZ. I would then run phenix.refine for initial rigid-body refinement using the apo model and the new mtz to see if there were signs of any new positive density corresponding to bound ligands.
While the 2Fo-Fc map fits the apo protein 3D model perfectly, the Fo-Fc map shows bands of alternating positive and negative density running throughout the structure. What's odd is that these 'bands' appear to be systematic rather than random (please see attached image), and aren't located anywhere that a binding partner could bind, leading me to suspect they may be artefactual (these bands actually run through the body of the protein, so one possibility is that the beta-strands are off-register by a multiple of a peptide unit?). If I use the same mtz file and structural model, and instead do molecular replacement with phaser, I see the same issue. I've tried this workflow with a couple of datasets, using P222 as well as P2(1)2(1)2(1), and each time I see the same issue of spurious(?) bands. Any help or advice would be much appreciated, especially if anyone has seen anything like this?

Thanks a lot,
Peter Randolph

--
Peter Randolph
PhD Candidate
Mura Laboratory, Department of Chemistry
University of Virginia
(434) 924-7979
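Dale's diagnosis can be checked numerically: a single overweighted reflection contributes a cosine wave to the difference density whose period, in grid points, is N/h. A stdlib-only Python sketch with toy numbers (the cell size, grid, and amplitude here are illustrative, not taken from the thread):

```python
import math

n = 64          # grid points along x at 0.5 A spacing (a = 32 A)
h = 16          # rogue reflection h00 with d = a/h = 2.0 A
amp = 1000.0    # anomalously large amplitude, zero phase assumed

# The density contribution of one reflection plus its Friedel mate
# is a cosine wave along x:
rho = [2 * amp / n * math.cos(2 * math.pi * h * x / n) for x in range(n)]

# Its period is n/h grid points: alternating positive and negative
# peaks every two grid points, matching the ripple described above.
print(n // h)   # period in grid points
```

Reading the period off the map (4 grid spacings at 0.5 A = 2 A wavelength) therefore pins down the resolution, and hence the index, of the offending reflection.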
Re: [ccp4bb] modified amino acids in the PDB
On 07/09/2013 07:23 AM, Mark J van Raaij wrote:
- really the only complicated case would be where a group is covalently linked to more than one amino acid, wouldn't it? Any case where only one covalent link with an amino acid is present could (should?) be treated as a special amino acid, i.e. like selenomethionine.
- groups without any covalent links to the protein are better kept separate I would think (but I guess this is stating the obvious).

Let's consider one of your simple cases. Imagine a heterodimer (chains alpha and beta) with a single disulfide link between the peptides. Do you prefer to have an alpha chain with one residue being a CYS with an entire beta chain attached as a single residue, or a beta chain with one residue being a CYS with an entire alpha chain as a single residue? Either way you are going to have trouble fitting into the PDB format's atom-name columns all the unique atom names for the bloated residue. ;-)

The problem with this kind of topic is that the molecule is what it is, and it doesn't care how we describe it. People break the molecule up into parts to help in their understanding of it, and different people have different needs. Do you prefer to think of rhodopsin as containing a LYS residue linked by a Schiff base to retinal, or as having a single, monster residue with a name you have probably never heard of? There is value in both representations depending on the context. A really nice feature of the geometry definitions that have come out of the Refmac community is that one can define the monster residue in terms of the LYS-(Schiff linkage)-retinal breakdown. What hasn't been done is to create the software that will convert a model from one form to the other, as the user needs. I think this is the direction we should go.
Instead of arguing whether, for example, the B factor column should contain the total isotropic B factor or the residual B factor left unfit by the overarching TLS model of motion, the file supplied by the wwPDB should be a complete, unambiguous representation of the model, and software should exist that displays for the user whatever representation they want. Then the representation stored in the master repository would not be very important.

Dale Tronrud

Mark J van Raaij
Lab 20B, Dpto de Estructura de Macromoleculas
Centro Nacional de Biotecnologia - CSIC
c/Darwin 3, E-28049 Madrid, Spain
tel. (+34) 91 585 4616
http://www.cnb.csic.es/~mjvanraaij

On 9 Jul 2013, at 12:49, Frances C. Bernstein wrote:
In trying to formulate a suggested policy on het groups versus modified side chains one needs to think about the various cases that have arisen. Perhaps the earliest one I can think of is a heme group. One could view it as a very large decoration on a side chain but, as everyone knows, one heme group makes four links to residues. In the early days of the PDB we decided that heme obviously had to be represented as a separate group. I would also point out that nobody would seriously suggest that selenomethionine should be represented as a methionine with a missing sulfur and a selenium het group bound to it. Unfortunately all the cases that fall between selenomethionine and heme are more difficult. Perhaps the best that one can hope for is that whichever representation is chosen for a particular case, it be consistent across all entries.

Frances

P.S. One can also have similar discussions about the representation of microheterogeneity and of sugar chains, but we should leave those for another day.

Frances C. Bernstein
Bernstein + Sons - Information Systems Consultants
5 Brewster Lane, Bellport, NY 11713-2803
f...@bernstein-plus-sons.com
1-631-286-1339  FAX: 1-631-286-1999

On Tue, 9 Jul 2013, MARTYN SYMMONS wrote:
Hi Clemens, I guess the reason you say 'arbitrary' is because there is no explanation of this rule decision? It would be nice if some rationalization were available alongside the values given. So a sentence along the lines of 'we set the number owing to the following considerations'? However, a further layer of variation is that the rule does not seem to be consistently applied. Just browsing CYS modifications: iodoacetamide treatment gives a CYS with only 4 additional atoms, but it is split off as ACM. However, some ligands much larger than 10 atoms have been kept with the cysteine (for example CY7 in 2jiv and NPH in 1a18). My betting is that it depends on whether something has been seen 'going solo' as a non-covalent ligand previously, so that it pops up as an atomic structural match with a pre-defined three-letter code. This would explain, for example, the ACM case, which you might expect to occur in a modified Cys. But it has also been observed as a non
Re: [ccp4bb] Has anyone seen this ligand before?
Do you have any reason to expect either of these molecules would be in your crystal? The model you build has to fit the density, be consistent with the surrounding environment (which you haven't shared with us), and you have to have some story for how that molecule got into your crystal. Personally I would steer away from industrial compounds and focus more on biological molecules and common additives used in purification and crystallization.

The environment is critical to identifying this molecule. What hydrogen bonds does this molecule make? What charges are nearby? Certainly the presence or absence of hydrogen bonds will distinguish between these two compounds before you go to the trouble of building a model of either.

Dale Tronrud

On 7/17/2013 6:35 AM, Wei Feng wrote:
Dear all, Thank you for your advice. I had tried to use MPD and pyrophosphate etc. to fit the density map, but all of them were too small. We guess that the molecular formula should be C8H18O2, so we searched this formula in Google and found two candidate molecules:
1: http://flyingexport.en.ecplaza.net/dhad-99-5--137042-689140.html
2: http://en.m.wikipedia.org/wiki/Di-tert-butyl_peroxide
Could you tell me how to get the coordinates of these molecules? Thank you for your time! Wei
Re: [ccp4bb] A case of perfect pseudomerohedral twinning?
Since Phil is no doubt in bed, I'll answer the easier part. Your second matrix is nearly the equivalent position (x,-y,-z). This is a two-fold rotation about the x axis. You also have a translation of about 31 A along x, so if your a cell edge is about 62 A you have a 2_1 screw.

Dale Tronrud

On 10/15/2013 12:29 PM, Yarrow Madrona wrote:
Hi Phil, Thanks for your help. I ran a Find-NCS routine in the phenix package. It came up with what I pasted below. I am assuming that the first rotation matrix is just the identity. I need to read more to understand rotation matrices, but I think the second one should have only a single -1 to account for a possible perfect 2(1) screw axis between the two subunits in the P21 asymmetric unit. I am not sure why there are two -1 values. I may be way off in my interpretation, in which case I will go read some more. I will also try what you suggested. Thanks. -Yarrow

NCS operator using PDB
#1 new_operator
rota_matrix    1.0000    0.0000    0.0000
rota_matrix    0.0000    1.0000    0.0000
rota_matrix    0.0000    0.0000    1.0000
tran_orth      0.0000    0.0000    0.0000
center_orth   17.7201    1.4604   71.4860
RMSD = 0 (Is this the identity?)
#2 new_operator
rota_matrix    0.9994   -0.0259    0.0250
rota_matrix   -0.0260   -0.9997    0.0018
rota_matrix    0.0249   -0.0025   -0.9997
tran_orth    -30.8649  -11.9694  166.9271

Hello Yarrow, Since you have a refined molecular replacement solution I recommend using that rather than global intensity statistics. Obviously if you solve in P21 and it's really P212121 you should have twice the number of molecules in the asymmetric unit, and one half of the P21 asymmetric unit should be identical to the other half. Since you've got decent resolution I think you can determine the real situation for yourself: one approach would be to test whether you can symmetrize the P21 asymmetric unit so that the two halves are identical. You could do this via stiff NCS restraints (Cartesian would be better than dihedral). After all, the relative XYZs and even B factors would be more or less identical if you've rescaled a P212121 crystal form in P21.
If something violates the NCS then it can't really be P212121. Alternatively you can look for clear/obvious symmetry breaking between the two halves: different side-chain rotamers for surface side-chains, for example. If you've got an ordered, systematic difference in electron density between the two halves of the asymmetric unit in P21, then that's a basis for describing it as P21 rather than P212121. However, if the two halves look nearly identical, down to equivalent water molecule densities, then you've got no experimental evidence that P21 with 2x molecules generates a better model than P212121 with 1x molecules. An averaging program would show very high correlation between the two halves of the P21 asymmetric unit if it was really P212121, and you could overlap the maps corresponding to the different monomers using those programs.

Phil Jeffrey
Princeton
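Dale's eyeball check of the second operator can be written out as a few lines of Python (the helper name and the 0.05 tolerance are my own choices, not from any package; the matrix values are the ones quoted in the thread): test the rotation against the equivalent position (x, -y, -z) and compare the x translation to half a cell edge.

```python
def is_twofold_about_x(r, tol=0.05):
    """True if 3x3 rotation r is close to the equivalent position
    (x, -y, -z), i.e. diag(1, -1, -1): a two-fold about x."""
    target = [[1, 0, 0], [0, -1, 0], [0, 0, -1]]
    return all(abs(r[i][j] - target[i][j]) < tol
               for i in range(3) for j in range(3))

# Second NCS operator reported by the Find-NCS run:
r2 = [[ 0.9994, -0.0259,  0.0250],
      [-0.0260, -0.9997,  0.0018],
      [ 0.0249, -0.0025, -0.9997]]
t2 = [-30.8649, -11.9694, 166.9271]

print(is_twofold_about_x(r2))
# |t_x| is ~31 A; if the a cell edge is ~62 A, the translation along
# the rotation axis is a/2, i.e. a (pseudo) 2_1 screw:
print(abs(t2[0]) / 62.0)
```

The two -1 values that puzzled Yarrow are exactly what a two-fold about x looks like: y and z both change sign while x is preserved.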
Re: [ccp4bb] A case of perfect pseudomerohedral twinning?
R factors cannot be used to detect twinning. The traditional R is calculated using structure factors (roughly the square root of intensity), but you can't do that calculation in the presence of twinning because each structure factor contributes to two intensities. The formula for the R factor in the presence of twinning is very different from the formula used in its absence. It would have been better to have used a different name and prevented the confusion.

If you are worried about your systematic absences you need to figure out which images they were recorded on and judge the spots for yourself. Everything you have said points to your crystal being P212121 (or very nearly P212121).

Dale Tronrud

On 10/15/2013 02:31 PM, Yarrow Madrona wrote:
Thank you Dale, I will hit the books to better understand rotation matrices. I am concluding from all of this that the space group is indeed P212121. So I still wonder why I have some outliers in the intensity stats for the two additional screw axes, and why R and Rfree both drop by 5% when I apply a twin law to refinement in P21. Thanks for your help. -Yarrow
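For reference, the textbook relations behind twin refinement (not something computed in the thread): with twin fraction alpha, the two observed twin-related intensities are linear mixtures of the true intensities, and for alpha < 0.5 they can be detwinned analytically. A minimal Python sketch of that round trip:

```python
def detwin(i1, i2, alpha):
    """Recover true intensities (Ia, Ib) from twin-related observed
    intensities I1 = (1-a)*Ia + a*Ib and I2 = a*Ia + (1-a)*Ib,
    valid only for alpha < 0.5 (perfect twinning, alpha = 0.5,
    is not invertible)."""
    d = 1.0 - 2.0 * alpha
    ia = ((1.0 - alpha) * i1 - alpha * i2) / d
    ib = ((1.0 - alpha) * i2 - alpha * i1) / d
    return ia, ib

# Round trip: twin two true intensities, then recover them.
a = 0.2
ia_true, ib_true = 400.0, 100.0
i1 = (1 - a) * ia_true + a * ib_true
i2 = a * ia_true + (1 - a) * ib_true
print(detwin(i1, i2, a))   # approximately (400.0, 100.0)
```

The division by (1 - 2*alpha) is why everything about a perfect twin, including its R factors, behaves so differently: at alpha = 0.5 the mixing cannot be undone at all.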
Re: [ccp4bb] Problematic PDBs
I would start with 1E4M (residue 361 of chain M) and 1QW9 (residue 170 of chain B). First show the model and then reveal the electron density. This promotes a healthy skepticism of PDB models and reinforces the importance of always looking at a model in the context of the map.

For model building I would recommend 2PWJ and 3SQK. In 3SQK the linker to the His tag in chain B was built using the wrong sequence. It is fairly easy to build a sequence into the density and then recognize what the linker actually is. In 2PWJ the wrong sequence was used up to residue 31. I've never been able to figure out how this error came to be. Some horrible, horrible mistake was made when sequencing the gene, and the person who built the model believed the sequence more than the density. The model building required to correct 2PWJ is more challenging, since a number of short cuts were made cutting out loops. If I recall, my model has about 10 more amino acids than the PDB model. In all of these cases the majority of the residues in each model are fine. 3SQK has been replaced with a corrected model (4F4J).

Dale Tronrud

On 10/17/2013 06:51 AM, Lucas wrote:
Dear all, I've been lecturing in a structural bioinformatics course where graduate students (always consisting of people without crystallography background to that point) are expected to understand the basics of how x-ray structures are obtained, so that they know what they are using in their bioinformatics projects. Practices include letting them manually build a segment from an excellent map and also using Coot to check problems in not-so-good structures. I wonder if there's a list of problematic structures somewhere that I could use for that practice? Apart from a few I'm aware of because of (bad) publicity, what I usually do is an advanced search on the PDB for entries with poor resolution and bound ligands, then checking them manually, hopefully finding some examples of creative map interpretation.
But it would be nice to have specific examples for each thing that can go wrong in a PDB construction. Best regards, Lucas
Re: [ccp4bb] undefined electron density blob at glutamine sidechain
It doesn't look like you left a CN on your gold atom. These things are pretty much covalently bound.

Dale Tronrud

On 12/10/2013 08:13 AM, Priyank Maindola wrote:
dear all:
I. i am not able to fit trp, since 1. trp doesn't fit well, 2. positive density comes up in the fo-fc map after refinement, 3. this is a soaked crystal structure with heavy atom solution; the native one has perfect density for gln, so mutation to trp is unlikely
II. on increasing the contour level, the 2fo-fc map fades above 4.5 rmsd if I do not put anything in the blob and look at the refined map
III. placing Au+ and refining (occ 1; B-fac 63 A2) gives figure 4.png (attached below); however, the anomalous difference map does give positive density, but not a clear, round, spherical one.

On 10 December 2013 19:29, herman.schreu...@sanofi.com wrote:
My first guess was also a metal ion. However, a tryptophan as Fred suggested cannot be ruled out. A simple preliminary test is to scroll up the contouring level and look when the contours of the blob disappear. If the contours quickly disappear, you have something disordered or light. If the contours of the blob disappear at the same moment as, or later than, e.g. the sulfur atoms, you have something heavy like a metal ion. You still have to fit all possibilities and see what refines best.
Best, Herman

From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] on behalf of Matthias Zebisch
Sent: Tuesday, 10 December 2013 14:21
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Fwd: undefined electron density blob at glutamine sidechain

Check an anomalous map! The obvious thing to do to rule out gold binding.

Dr. Matthias Zebisch
Division of Structural Biology, Wellcome Trust Centre for Human Genetics,
University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
Phone (+44) 1865 287549; Fax (+44) 1865 287547
Email matth...@strubi.ox.ac.uk
Website http://www.strubi.ox.ac.uk

On 12/10/2013 12:44 PM, Priyank Maindola wrote:
dear members, i am trying to solve this crystal structure but I am puzzled by an undefined blob that appeared at a glutamine residue after refinement. I have attached pics of that below. Is it a covalent modification of the acid-amide side chain? There is no charged environment around, and the density seems continuous. please suggest.
The protein encountered the following reagents during purification, crystallization and soaking: phenylmethylsulfonyl fluoride, benzamidine, tris, dtt (could it be cyclized dtt?), K[Au(CN)2], acidic pH, isopropanol, citrate, sulfate, phosphate, K+, Na+, Cl-
map contour: 2fo-fc: 1 rmsd; fo-fc (green): 3 rmsd

--
Priyank Maindola
Re: [ccp4bb] resubmission of pdb
I would write back to the annotator who sent the processed files to you and ask if you can restart the deposition. The worst they can say is no, and you're back to ADIT. On the other hand, they will probably be as happy as you to save the work that has already been done.

Dale Tronrud

On 01/31/2014 01:04 PM, Faisal Tarique wrote:
Dear all, Some time back I had submitted a coordinate file to the PDB, but because the manuscript was not accepted we had to retract the submission. During this procedure I got a few zipped files from the annotator, such as 1. rcsb0.cif-public.gz, 2. rcsb0.pdb.gz and 3. rcsb0-sf.cif.gz. Now I want to submit the same structure again. My question is, what is the best way to do it? Should we start from the beginning through the ADIT deposition tool and resubmit it with a new PDB id, or is there some way to submit again those zip files which the annotator sent us after retraction? May you please suggest what could be the easiest way to submit our structure to the PDB without much effort.

Regards,
Faisal
School of Life Sciences, JNU
[ccp4bb] Meeting Announcement: Northwest Crystallography Workshop
(Pacific) Northwest Crystallography Workshop
http://oregonstate.edu/conferences/event/nwcw2014/
June 20-22, 2014

Registration is now open for this year's edition of the Northwest Crystallography Workshop. It is being hosted at Oregon State University in Corvallis, in the heart of the Willamette Valley, surrounded by wine country and wildlife refuges, with both the Cascade Mountains and the Pacific Ocean within easy driving distance. It can be easily accessed from either the Portland or Eugene airports.

This biennial meeting has been held at various locations in the Pacific Northwest since 1981. It has always proven to be a great venue to meet other researchers in the region who are interested both in using macromolecular crystallography to solve structures and in developing and enhancing methods.

The workshop part of the name will be taken seriously. We will have talks and posters, with priority for speaking slots given to students and post-docs who focus on methodologies, interesting structure determination stories, and/or how structural observations provide insight into function and biology. There will be a reception on Friday evening. On Saturday there will be talks during the day and a banquet followed by a keynote address in the evening. Talks will continue on Sunday morning, with the workshop wrapping up at noon. A light lunch will be served on Saturday and a boxed lunch will be available on Sunday.

Register today and get your abstracts submitted anytime between now and the April 30 abstract deadline. We've tried to keep registration costs low: early registration (through April 30) is $75 for students and $100 for others. Reasonably priced on-campus dormitory housing is available and must be arranged at the time of registration. Special rates at two local hotels have been arranged for those who want to book their own lodgings.

We look forward to a great meeting and celebration of the International Year of Crystallography.

Dale and Andy

Dale E. Tronrud and P. Andrew Karplus
Department of Biochemistry & Biophysics
2011 ALS Bldg, Oregon State University
Corvallis, OR 97331 USA
Re: [ccp4bb] Can not see density map when I turn off normalization in PYMOL
When you don't normalize the map you have to specify your contour level in whatever units the map came in. Your output says the stdev is 0.075, so I guess you need to contour at 0.225 to see the equivalent image.

Dale Tronrud

P.S. I feel compelled to note that what the program is reporting as the standard deviation is really the root mean square deviation from zero. The standard deviation of a map is a much more subtle quantity, as discussed recently in PNAS.

On 02/19/2014 09:30 AM, hongshi WANG wrote:
Hello there, I am making an fo-fc map for one ligand using PyMOL. I strictly followed the PyMOL wiki protocol (Display CCP4 Maps). I can get the ligand map using the command: isomesh fo-fc_ligand, omitmap, 3, ligand, carve=2. However, the problem is that the map I get from PyMOL is smaller than the one I can see in Coot at the same contour level (3.0). So I gave it a second trial, based on the assumption that this may be caused by mis-normalization. I input the command "unset normalize_ccp4_maps" to stop PyMOL from normalizing a ccp4 map. After that I loaded my ccp4 map file and tried to do the same things as the first time. But I could not see any mesh (density map) show up. I checked the command window:

PyMOL> unset normalize_ccp4_maps
 Setting: normalize_ccp4_maps set to off.
 ObjectMapCCP4: Map Size 134 x 128 x 122
 ObjectMapCCP4: Map will not be normalized.
 ObjectMapCCP4: Current mean = -0.66 and stdev = 0.074981.
 ObjectMap: Map read. Range: -0.511 to 0.616
 Crystal: Unit Cell 200 300 100
 Crystal: Alpha Beta Gamma 90.000 100.354 90.000
 ...
 ExecutiveLoad: E:/bdligand002.ccp4 loaded as bdligand002, through state 1.
PyMOL> isomesh fo-fc_ligand, bdligand002, 3, ligand, carve=2
 Executive: object fo-fc_ligand created.
 Isomesh: created fo-fc_ligand, setting level to 2
 ObjectMesh: updating fo-fc_ligand.

It seems like there is no error, but my ligand map, fo-fc_ligand, shows no density. I also tried to show the whole mesh at level 2.0 for bdligand002; I still could not see the density map. My PyMOL is version 1.3 on the Windows 8 operating system. Any help will be greatly appreciated! Thanks in advance, hongshi
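Dale's arithmetic as a trivial Python sketch (the helper name is mine, not a PyMOL function): with normalize_ccp4_maps off, a contour of "3 sigma" must be handed to isomesh in the map's native units.

```python
def absolute_level(sigma_level, stdev, mean=0.0):
    """Convert a sigma-scaled contour level to absolute map units,
    as required once PyMOL is no longer normalizing the map.
    With normalization on, PyMOL would do this scaling itself."""
    return mean + sigma_level * stdev

# stdev reported in the console output above:
print(absolute_level(3.0, 0.074981))   # ~0.225
```

Note the converted level (~0.225) lies well inside the map's reported range (-0.511 to 0.616), whereas the literal "3" used in the isomesh command lies far outside it, which is exactly why no mesh appeared.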
Re: [ccp4bb] minimum acceptable sigma level for very small ligand and more
Hi,
First, there is nothing magical about contouring a map at 1 rms. The standard deviation of the electron density values really has no relationship to the rms of those values, and appears to generally be much smaller. This is discussed quite brilliantly in the recent paper http://www.ncbi.nlm.nih.gov/pubmed/24363322

If you have a ligand with low occupancy, you expect you will have to dial down the contour level to see it. The question isn't how low can you go, but does your model fit ALL the available data, and is there any other model that will also fit those data. Even if a ligand has low occupancy, it still must have good bond lengths and angles and must make reasonable interactions with the rest of the molecule. One of your observations is that full occupancy cracks the crystal. It would be good if your model explained this observation as well.

If your ligand is present 60% of the time, what is there the other 40%? Usually when there is a partially occupied ligand there is water present the rest of the time. The apparent superposition of the ligand and the water will result in some density that is strong. Those strong bits will give clues about the minor conformation. The low-occupancy water model must also make sense in terms of hydrogen bonds and bad contacts.

Remember, if you are looking at lower-than-usual rms contours in the 2Fo-Fc style map, you must evaluate your refined model by looking at lower contour levels in your Fo-Fc style map. You can't give your model a free ride by excusing a weak density map but then blowing off weak difference peaks. You must be very careful to consider alternative models and to accept that sometimes you just can't figure these things out. Just because the density is weak does not mean that you can give the model a pass for not fitting it. The model has to fit the density, and fit it better than any other model.
You must also make clear to your readers what the occupancy of your ligand is and the quality of the maps that led you to this conclusion.

Dale Tronrud

P.S. I have had great experiences with the maps produced by Buster for looking at weak ligand density. I have also published a model with a 0.35-occupancy ligand, although the resolution there was 1.3 A.

On 03/19/2014 07:39 AM, Amit Kumar wrote:
Hello, My protein is 26 kDa and the resolution of the data is 1.90 angstroms. My ligand is 174 Daltons, and it was soaked into the crystal. The ligand is colored, and the crystal after soaking takes up intense color. However, if we soak more than the optimum, the color deepens in intensity but the crystal no longer diffracts. So perhaps the ligand's occupancy cannot be 1.00. After model building I see ligand density, starting to appear at 0.7 sigma and clear at 0.5-0.6 sigma, close to the protein residue where it should bind. Occupancy is ~0.6 after refinement, and B factors for the atoms of the ligand range from 30-80.

Questions I have: (1) What is the acceptable sigma level for very small ligands for peer review/publication? (2) I did refinement with Refmac and with phenix.refine, separately. The map quality for the ligand is better after the Refmac refinement than after the Phenix refinement. Why is there such a difference, and which one should I trust? I used mostly default parameters for both (Phenix and Refmac) before the refinement.

Thanks for your time. Amit
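Dale's point that programs report the rms deviation from zero rather than the true standard deviation can be illustrated with a toy density array (a stdlib-only sketch, not tied to any particular map; for maps computed without the F000 term the mean is near zero and the two numbers nearly coincide, which is why the distinction is easy to miss):

```python
import math

def rms_from_zero(values):
    """What map programs commonly report as 'sigma': the root mean
    square deviation from zero."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def stdev(values):
    """The actual (population) standard deviation, about the mean."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

# Toy density with a non-zero mean: a single peak on a flat background.
rho = [0.0, 0.0, 0.0, 1.0]
print(rms_from_zero(rho))   # 0.5
print(stdev(rho))           # smaller: ~0.433
```

The two quantities are related by rms^2 = stdev^2 + mean^2, so the rms is always at least as large as the standard deviation, consistent with the remark above that the true sigma is generally smaller than what is quoted.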
[ccp4bb] Second announcement for the (Pacific) Northwest Crystallography Workshop
Second announcement for the (Pacific) Northwest Crystallography Workshop
http://oregonstate.edu/conferences/event/nwcw2014/
June 20-22, 2014

Registration is continuing for this year's edition of the Northwest Crystallography Workshop. It is being hosted at Oregon State University in Corvallis, in the heart of the Willamette Valley, surrounded by wine country and wildlife refuges, with both the Cascade Mountains and the Pacific Ocean within easy driving distance. It can be easily accessed from either the Portland or Eugene airports.

The "workshop" part of the name will be taken seriously. We will have talks and posters, with priority for speaking slots given to students and post-docs who focus on methodologies or interesting structure determination stories, and/or how structural observations provide insight into function and biology.

Oregon State is the home of the Ava Helen and Linus Pauling Papers, which is a fascinating collection that goes far beyond papers. We have arranged two tours of this collection for Friday afternoon before the workshop. If you can make it to Corvallis for either the 2 PM or 4 PM tour you will be amazed by this collection. Let us know if you plan to attend and we will reserve a spot.

On May 1st the registration fee will increase by $25 from the current $75 for students and $100 for others. And get those abstracts in! We look forward to a great meeting and celebration of the International Year of Crystallography.

Dale and Andy

Dale E. Tronrud and P. Andrew Karplus
Department of Biochemistry and Biophysics
2011 ALS Bldg
Oregon State University
Corvallis, OR 97331 USA
Re: [ccp4bb] crystallographic confusion
I see no problem with saying that the model was refined against every spot on the detector that the data reduction program said was observed (and I realize there is argument about this) but declaring that the resolution of the model is a number based on the traditional criteria. This solution allows for the best possible model to be constructed, and the buyer is still allowed to make quality judgements the same way as always.

Dale Tronrud

On 4/18/2014 5:22 PM, Lavie, Arnon wrote: Dear Kay, Arguably, the resolution of a structure is the most important number to look at; it is definitely the first to be examined, and often the only one examined by non-structural biologists. Since this number conveys so much concerning the quality/reliability of the structure, it is not surprising that we need to get this one parameter right. Let us examine a hypothetical situation, in which a data set has 20% completeness in the 2.2-2.0 A resolution shell. Is this a 2.0 A resolution structure? While you make a sound argument that including that data may result in a better refined model (more observations, more restraints), I would not consider that model the same quality as one refined against a data set that has 90% completeness at that resolution shell. As I see it, there are two issues here. One is whether to include such data in refinement. I am not sure if low completeness (especially if not random) can be detrimental to a correct model, but I will let others weigh in on that. The second question is where to declare the resolution limit of a particular data set. To my mind, here high completeness (the term "high" needs a precise definition) better describes the true resolution limit of the diffraction, and with this what I can conclude about the quality of the refined model. My two cents. 
Arnon Lavie

On Fri, April 18, 2014 6:51 pm, Kay Diederichs wrote: Hi everybody, since we seem to have a little Easter discussion about crystallographic statistics anyway, I would like to bring up one more topic. A recent email sent to me said: "Another referee complained that the completeness in that bin was too low at 85%" - my answer was that I consider the referee's assertion as indicating a (unfortunately not untypical) case of severe statistical confusion. Actually, there is no reason at all to discard a resolution shell just because it is not complete, and what would be a cutoff, if there were one? What constitutes "too low"? The benefit of including incomplete resolution shells is that every reflection constitutes a restraint in refinement (and thus reduces overfitting), and contributes its little bit of detail to the electron density map. Some people may be misled by a wrong understanding of the cats and ducks examples by Kevin Cowtan: omitting further data from maps makes Fourier ripples/artifacts worse, not better. The unfortunate consequence of the referee's opinion (and its enforcement and implementation in papers) is that the structures that result from the enforced re-refinement against truncated data are _worse_ than the original structures refined against data that included the incomplete resolution shells. So could we as a community please abandon this inappropriate and unjustified practice - of course after proper discussion here? Kay
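For concreteness, here is a hedged numpy sketch of what "completeness in a shell" actually counts: theoretical reflections in the resolution shell versus those measured. The cell, the symmetry assumptions (P1, no absences), and the observed fraction are all made-up numbers for illustration.

```python
import numpy as np

# Hypothetical orthorhombic cell (A); P1 and no systematic absences assumed.
a, b, c = 50.0, 60.0, 70.0
d_min, d_max = 2.0, 2.2

hmax = 40  # generous index range for this cell and resolution
h, k, l = np.mgrid[-hmax:hmax + 1, -hmax:hmax + 1, -hmax:hmax + 1]
inv_d2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2

with np.errstate(divide="ignore"):
    d = 1.0 / np.sqrt(inv_d2)          # d-spacing of every reflection

in_shell = (d >= d_min) & (d < d_max)
n_theory = int(in_shell.sum()) // 2    # merge Friedel pairs

n_observed = int(0.85 * n_theory)      # pretend 85% were measured
completeness = n_observed / n_theory
print(n_theory, completeness)
```

The point of the thread stands independently of this bookkeeping: each of the `n_observed` reflections is a restraint in refinement whether or not the shell happens to be complete.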
[ccp4bb] NWCW 2014: Last day to register at early registration rates
Northwest Crystallography Workshop
http://oregonstate.edu/conferences/event/nwcw2014/
June 20-22, 2014

The (Pacific) Northwest Crystallography Workshop is a regional gathering of people who are interested in macromolecular structure determination, but folk from anywhere are welcome. This year it will be held at Oregon State University in Corvallis, Oregon.

Today is the last day to register at the early registration rates of $75/$100 for students/others. Tomorrow the prices will rise by $25. We encourage you to sign up today!

We will continue to accept abstracts until May 16th, but please try to get them in ASAP. Registration is not linked to abstract submission, so you can register today and submit an abstract later. Next week, however, we will begin to define the speaking schedule.

Dale E. Tronrud and P. Andrew Karplus
Department of Biochemistry and Biophysics
2011 ALS Bldg
Oregon State University
Corvallis, OR 97331
Re: [ccp4bb] stalled refinement after MR solution
Refinement of a model with only 50% completeness is problematic, but you have four copies of a molecule (in P1) so your molecular replacement is only looking for 24 parameters. You should be able to get a solution with 50% completeness.

Dale Tronrud

On 5/8/2014 1:43 PM, Yarrow Madrona wrote: Hi Jacob. I am worried that I would dramatically suffer in data completeness. I am not sure how reliable the data is when you have 50% completeness. These crystals are also pretty much impossible to reproduce at the moment. On Thu, May 8, 2014 at 1:30 PM, Keller, Jacob kell...@janelia.hhmi.org wrote: Since your search model is so good, why not go down to P1 to see what's going on, then re-merge if necessary? JPK From: yarrowmadr...@gmail.com On Behalf Of Yarrow Madrona Sent: Thursday, May 08, 2014 4:29 PM To: Keller, Jacob Subject: Re: [ccp4bb] stalled refinement after MR solution I have had problems in the past with the a and c cell being equal and having pseudo-merohedral twinning where the space group looked like C2221 but the true space group was P21 (near perfect 2-fold NCS). But I didn't think twinning was possible in this case. On Thu, May 8, 2014 at 12:43 PM, Keller, Jacob kell...@janelia.hhmi.org wrote: The b and c cell constants look remarkably similar. JPK -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Randy Read Sent: Thursday, May 08, 2014 3:41 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] stalled refinement after MR solution Hi Yarrow, If Dale said that, he probably wasn't saying what he meant clearly enough! The NCS 2-fold axis has to be parallel to the crystallographic 2-fold (screw) axis to generate tNCS. 
In your case, the NCS is a 2-fold approximately parallel to the y-axis, but it's nearly 9 degrees away from being parallel to y. That explains why the Patterson peak is so small, and there will be very little disruption from the statistical effects of tNCS. The anisotropy could be an issue. It might be interesting to look at the R-factors for the stronger subset of the data. It can make sense to apply an elliptical cutoff of the data using the anisotropy server (though Garib says that having systematically incomplete data can create problems for Refmac), but I hope you're not using the anisotropically scaled data for refinement. The determination of the anisotropic B-factors by Phaser without a model (underlying the anisotropy server) will not be as accurate as what Refmac or phenix.refine can do with a model. Finally, as Phil Evans always says, the space group is just a hypothesis, so you should always be willing to go back and look at the evidence for the space group if something doesn't work as expected. Best wishes, Randy Read

------
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research   Tel: +44 1223 336500
Wellcome Trust/MRC Building                Fax: +44 1223 336827
Hills Road                                 E-mail: rj...@cam.ac.uk
Cambridge CB2 0XY, U.K.                    www-structmed.cimr.cam.ac.uk

On 8 May 2014, at 18:11, Yarrow Madrona amadr...@uci.edu wrote: Hello CCP4 community, I am stumped and would love some help. I have a molecular replacement solution that has Rfree stuck around 40% while Rwork is around 30%. The model is actually the same enzyme with a similar inhibitor bound. Relevant information is below. -Yarrow I have solved a structure in the P21 space group: 51.53 88.91 89.65, beta = 97.1. Processing stats (XDS) are very good, with low Rmerge (~5% overall) and good completeness. 
I don't think twinning is an option with these unit cell dimensions. My data was highly anisotropic. I ran the data through the UCLA anisotropy server (http://services.mbi.ucla.edu/anisoscale/) to scale in the b direction. I get a small (a little over 5) Patterson peak, suggesting there is not much tNCS to worry about. However, the output structure does have 2-fold symmetry (see below) and, as Dale Tronrud pointed out, there is always tNCS in a P21 space group with two monomers related by a 2-fold axis. I calculated the translation to be unit cell fractions of 0.36, 0.35, 0.32.

rota_matrix  -0.9860 -0.1636 -0.0309
rota_matrix  -0.1659  0.9511  0.2605
rota_matrix  -0.0132  0.2620 -0.9650
tran_orth    34.3310 -24.0033 107.0457
center_orth  15.7607   7.2426  77.7512

Phaser stats: SOLU SET RFZ=20.3 TFZ
Re: [ccp4bb] PDB passes 100,000 structure milestone
The policy doesn't say you can supersede someone else's entry. It says you can deposit your own version, if you have a publication. Then there will be two bogus structures instead of one. Pretty soon the PDB will start to look like one of the crappy Matrix movies.

Dale Tronrud

On 5/14/2014 6:47 PM, James Holton wrote: A little loophole that might make everyone happy can be found here: http://www.wwpdb.org/policy.html search for "A re-refined structure based on the data from a different research group". Apparently, anyone can supersede any PDB entry, even if they weren't the original depositor. All they need is a citation. Presumably, someone could re-refine 2hr0 against the data that were deposited with it. Possibly showing how to get an R-factor of 0% out of it. I'd definitely cite that paper. -James Holton MAD Scientist On 5/14/2014 11:01 AM, Nat Echols wrote: On Wed, May 14, 2014 at 10:53 AM, Mark Wilson mwilso...@unl.edu wrote: As for the meaning of "integrity", I'm using this word in place of others that might be considered more legally actionable. A franker conversation would likely more clearly draw the line that we're wrestling with here. The reference to integrity was Bernhard's - quoting the PDB mission statement; I just disagree with his interpretation of the meaning. As far as 2hr0 is concerned, I think we're quite safe calling it fraudulent at this point, since (ironically) Nature itself has said as much: http://www.nature.com/news/2009/091222/full/462970a.html -Nat
[ccp4bb] Deadline Approaches for Registration to the Northwest Crystallographic Workshop
(Pacific) Northwest Crystallography Workshop
http://oregonstate.edu/conferences/event/nwcw2014/
June 20-22, 2014

Summer is fast approaching and so is the Northwest Crystallography Workshop, to be held here in beautiful Corvallis, Oregon. The last day to register is next Tuesday, June 10th. If you are planning to attend but have not registered, you'd best get to it!

This workshop has been held at various locations in the Pacific Northwest since 1981. It has always proven to be a great venue to meet other researchers in the region who are interested both in using macromolecular crystallography to solve structures and in developing and enhancing methods. This will be a cozy meeting with lots to learn and plenty of networking opportunities.

Dale and Andy

--
Dale E. Tronrud and P. Andrew Karplus
Department of Biochemistry and Biophysics
2011 ALS Bldg
Oregon State University
Corvallis, OR 97331
Re: [ccp4bb] Solvent channels
On 06/27/2014 06:33 AM, Bernhard Rupp wrote: For small ion soaking for phasing purposes, partial occupancy is not a problem. For example, a few 1/2-occupied iodines can still phase quite well. 1/2 a C is only 3 electrons, not that great. Add in higher displacement, and odds are that the ligand interpretation will become difficult. Particularly when the binding constants are poor, one will in principle never reach full occupancy, which further exacerbates the weak density problem. Patience is definitely a virtue here. BR

Here you are starting to mix equilibrium arguments with the previous kinetic arguments. If you have a weak binder you can always get full occupancy by adding enough of the compound - to determine how much, you must consider not only the binding constant but the number of binding sites in the crystal and the total volume of the drop containing your crystal. Time is not a factor. Halide ions and cryoprotectants are known to pervade crystals very rapidly, but they are usually added with overwhelming force: much more is added than is required to bind to every specific binding site in the crystal. The rate of diffusion, as mass flow, depends not only on viscosity but on the concentration of unbound molecules inside the crystal.

When I was soaking an inhibitor into a crystal of Thermolysin I was having problems with the crystals falling apart. My belief was that the inhibitor caused a small change in cell constants, and since the inhibitor first bound in a shell around the surface of the crystal, strain was created and the crystal cracked. My solution was to add small aliquots of inhibitor with a long enough wait between to allow each batch to diffuse throughout the crystal. Despite waiting up to 6 hours between additions the crystals still cracked. 
This is when I realized that after the inhibitor bound in the outer shell of the crystal, the remaining concentration of free inhibitor was one billionth that of the concentration of active sites (since the binding constant was nanomolar), and the remaining mass flow within the crystal was insignificant. Of course the next aliquot would rapidly diffuse through the occupied region of the crystal and be bound in the shell just below it, becoming trapped itself and increasing the strain.

Your movie doesn't include any details of the concentration of your dye, its binding constant to any sites in the protein, or any mention of kon or koff. The lack of information makes it very difficult to draw any conclusions from the experiment, but I believe the experience from many other molecules is that small molecules do move very rapidly through protein crystals, until they are caught by a binding site. I don't believe your movie represents typical diffusion of small molecules in a protein crystal. My interpretation of your movie is:

1) The dye rapidly diffuses into the crystal, reaching a simple equilibrium where the concentration in the bulk solvent matches that of the outside solution. Since the protein excludes about half of the volume of the crystal, the overall concentration is half that of the mother liquor and the color of the crystal is 1/2 as dark as the surrounding solution.

2) With a slow kon, the dye molecules within the crystal start binding specifically to the protein. Since the dye is aromatic it probably has to dig deep into the protein to find a binding site, and this takes time. As dye is removed from the bulk solvent it is rapidly replaced by diffusion from outside the crystal, and the crystal begins to darken, eventually becoming darker than the surrounding liquid. The speed of binding, not diffusion, controls the kinetics. 
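The equilibrium argument - that occupancy is set by the binding constant together with the total ligand and the very high concentration of sites in a crystal - can be made concrete with the standard single-site binding quadratic. All concentrations below are hypothetical, chosen only for illustration.

```python
import math

def occupancy(site_tot_mM, ligand_tot_mM, kd_mM):
    """Fraction of sites occupied at equilibrium (single-site binding).

    Solves Kd = [P][L]/[PL] together with mass balance (the usual quadratic).
    """
    p, l, kd = site_tot_mM, ligand_tot_mM, kd_mM
    s = p + l + kd
    pl = (s - math.sqrt(s * s - 4.0 * p * l)) / 2.0
    return pl / p

# Binding sites in a protein crystal are very concentrated - on the order
# of 10 mM (hypothetical figure).
sites = 10.0
print(occupancy(sites, 5.0, 1e-6))    # tight binder, sub-stoichiometric ligand: half the sites filled
print(occupancy(sites, 20.0, 1e-6))   # tight binder, 2x excess: essentially full occupancy
print(occupancy(sites, 20.0, 50.0))   # weak binder, 2x excess: still mostly empty
print(occupancy(sites, 200.0, 50.0))  # weak binder, large excess: occupancy recovers
```

Note how the nanomolar binder consumes essentially all of a sub-stoichiometric addition - the depletion effect described for the Thermolysin soak - while the weak binder can still be driven to high occupancy by brute-force excess.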
Dale Tronrud -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Keller, Jacob Sent: Friday, June 27, 2014 3:07 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] Solvent channels And yet halides--even iodide--permeate those same lysozyme crystals and others entirely in 30--60 sec. JPK -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Bernhard Rupp Sent: Friday, June 27, 2014 9:00 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] Solvent channels Just a remark: diffusion is a slow and random-walk process. Particularly large molecules in viscous media (PEG anybody?) move (diffuse) slowly in solution. To simply extrapolate from the fact that the ligand is smaller than the solvent channels to the odds of the presence of a ligand is a risky proposition. Positive omit difference density after 'shoot first' as Boaz indicated is a much better indication. And shoot you probably will a lot. The little movie below shows how slowly even a small aromatic dye molecule soaks into a crystal. Total time 10 hrs. http://www.ruppweb.org/cryscam/lysozyme_dye_small.wmv The literally hundreds of empty ligand structures collected in Twilight
[ccp4bb] New Version of the Protein Geometry Database Now Available
Protein Geometry Database Server V 1.0
http://pgd.science.oregonstate.edu/
Developed by Andy Karplus' laboratory at Oregon State University

We are pleased to announce the availability of an enhanced version of the Protein Geometry Database (PGD) web service, originally announced in Berkholz et al (2010) Nucleic Acids Research 38, D320-5. This server allows you to explore the many backbone and side chain conformations that exist in the PDB as well as the protein geometry (lengths and angles) that occur in those conformations. This service is ideal for finding instances of particular conformations or peculiar bond lengths or angles. It is also quite adept at identifying sets of fragments that can then be examined for systematic variation in ideal geometry. The expanded PGD now includes all conformational and covalent geometry information not just for the backbone but also for the sidechains. There are three basic operations available: selecting a set of fragments via a delimited search, analyzing the geometry of those fragments, and dumping the results to your computer for more specialized analysis. To control bias in statistical analyses due to the variable number of entries with the same or similar sequence, the database contains only the highest quality model in each sequence cluster as identified by the Pisces server from Roland Dunbrack's lab. Two settings, 90% and 25% sequence identity, are available. Currently, at the 90% sequence identity level there are 16,000 chains and at the 25% level this drops to about 11,000 chains. You can filter a search based on the quality of the model as indicated by resolution and R values. A search can also be filtered based on DSSP secondary structure, amino acid type, the phi/psi/omega angles and bond lengths, angles, and chi angles. For example, you can find all cysteine residues in the center of three-residue peptide fragments (i.e. 
not at a peptide terminus), in beta sheet, with both peptide bonds trans, and CB-SG length greater than 1.85 A, from models with resolution better than 1.5 A. By the way, in the "no more than 25% sequence identity" category there are 25 of them. Once you have a set of results, you can create 2D plots showing the relationships of up to three features (i.e. bond lengths, bond angles, or conformational angles). For instance, you can look at how a given feature varies with phi and psi using a phi(i)/psi(i) plot. Or, you can just as easily look at the variation with psi(i)/phi(i+1), or even the relationships between any selected bond angles. As one example, it is instructive to perform a default search and plot NCaCb vs NCaC colored based on CbCaC. As this search is restricted to just the highest resolution models, you can see the justification for chiral volume restraints. Finally, all of your results can be downloaded for your own analysis. Development of the PGD continues. If you have worked with the site and have any ideas and suggestions for how to improve it, please drop us a line. The publication describing the PGD is: Berkholz, D.S., Krenesky, P.B., Davidson, J.R. & Karplus, P.A. (2010) Protein Geometry Database: A flexible engine to explore backbone conformations and their relationships with covalent geometry. Nucleic Acids Res. 38, D320-5. Also, some examples of published analyses enabled by earlier versions of the PGD are listed here: Berkholz, D.S., Shapovalov, M.V., Dunbrack, R.L., Jr. & Karplus, P.A. (2009). Conformation dependence of backbone geometry in proteins. Structure 17, 1316-1325. Hollingsworth, S.A., Berkholz, D.S. & Karplus, P.A. (2009). On the occurrence of linear groups in proteins. Protein Science 18, 1321-1325. Hollingsworth, S.A. & Karplus, P.A. (2010). Review: A fresh look at the Ramachandran plot and the occurrence of standard structures in proteins. BioMolecular Concepts 1, 271-283. 
Berkholz, D.S., Driggers, C.M., Shapovalov, M.V., Dunbrack, R.L., Jr. & Karplus, P.A. (2012) Nonplanar peptide bonds in proteins are common and conserved but not biased toward active sites. Proc Natl Acad Sci U S A 109, 449-53.

Dale Tronrud and P. Andrew Karplus
Department of Biochemistry and Biophysics
Oregon State University
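Since the announcement says results can be dumped to your computer for specialized analysis, here is a hedged sketch of filtering such a dump offline, mirroring the long CB-SG cysteine example. The column names are invented for illustration and are not the PGD's actual headers.

```python
import csv
import io

# Hypothetical dump of PGD search results (invented column names).
dump = io.StringIO("""residue,dssp,cb_sg_length,resolution
CYS,E,1.88,1.4
CYS,H,1.82,1.3
CYS,E,1.91,1.2
CYS,E,1.80,1.6
""")

rows = [
    r for r in csv.DictReader(dump)
    if r["dssp"] == "E"                   # beta sheet (DSSP code E)
    and float(r["cb_sg_length"]) > 1.85   # unusually long CB-SG bond
    and float(r["resolution"]) < 1.5      # high-resolution models only
]
print(len(rows))   # 2 fragments survive the filter
```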
Re: [ccp4bb] New Version of the Protein Geometry Database Now Available
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 The Protein Geometry Database looks at proteins as collections of bond lengths, angles, and torsion angles. It is not the place to go when you want to know how a protein part is related in space to some other (covalently) distant part. Andy tells me that Jacque Fetrow, who was at Wake Forest University, has a database that might answer your query. There is a paper at J Mol Biol. 2003 Nov 28;334(3):387-401. Structure-based active site profiles for genome analysis and functional family subclassification. Neither one of us has used it. Hope that helps, Dale Tronrud On 6/27/2014 1:49 PM, Keller, Jacob wrote: I have wanted for some time to search for catalytic-triad-like configurations by defining three Ca-Cb bonds from known catalytic triads, then searching the pdb for matches, but have not thought of a quick and/or easy way to do this--can your software do this sort of thing, or is there some other software which could be used for this? JPK -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Dale Tronrud Sent: Friday, June 27, 2014 4:27 PM To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] New Version of the Protein Geometry Database Now Available Protein Geometry Database Server V 1.0 http://pgd.science.oregonstate.edu/ Developed by Andy Karplus' laboratory at Oregon State University We are pleased to announce the availability of an enhanced version of the Protein Geometry Database (PGD) web service, originally announced in Berkholz et al (2010) Nucleic Acids Research 38, D320-5. This server allows you to explore the many backbone and side chain conformations that exist in the PDB as well as the protein geometry (lengths and angles) that occur in those conformations. This service is ideal for finding instances of particular conformations or peculiar bond lengths or angles. It is also quite adept at identifying sets of fragments that can then be examined for systematic variation in ideal geometry. 
Re: [ccp4bb] very informative - Trends in Data Fabrication
I'm not sure how encryption can solve a problem of truth or falsity. Public key encryption only says that the message that is decrypted using the public key must have been encrypted by someone who knows the private key. A person can use their private key to encrypt a lie as well as the truth. I don't quite follow your prescription, but if you are saying that the beamline gives the depositor a code when they collect data that proves data were collected, how do the beamline personnel know the contents of the crystal? Couldn't one simply collect HEWL and then deposit any model they like? The beamline could encrypt all images with their private key, and the data integration program could decrypt the images using the public key. That way, when a depositor presents a set of images, it could be proved that those images came, unmodified, from that beamline. The images would still have to be deposited, however. (And this provides no protection against forgeries of home source data sets.)

Dale Tronrud

On 04/03/12 13:19, Bryan Lepore wrote: On the topic of MX fraud: could not an encryption algorithm be applied to answer the question of truth or falsity of a pdb/wwpdb/pdbe entry? Has anyone proposed such an idea before? For example (admittedly this is a mess): * a detector parameter - perhaps the serial number - is used as a public key. The detector parameter is shared among beamlines/companies/*pdb. Specifically, the experimenter requests it at beamtime. * the experimenter voluntarily encrypts something, using GPLv3 programs, small but essential to the deposition materials, like the R-free set indices (or please suggest something better), using their private key. Maybe a symmetric cipher would work better for this, or the free R set indices are used to generate a key. * at deposition time, the *pdb decrypts the relevant entry components using their private key related to the detector used. Existing deposition methods pass or fail based on this (so maybe not the free R set). 
* why do this: at deposition time, *pdb will have a yes-or-no result from a single string of characters. This can be a stop-gap measure until images can be archived easily. All elements of the chain are required to be free and unencumbered by proprietary interests. Importantly, it is voluntary. This will prevent entries such as Schwarzenbacher or Ajees getting past deposition - so, admittedly, not many. References: http://en.wikipedia.org/wiki/RSA_(algorithm) http://en.wikipedia.org/wiki/Diffie-Hellman_key_exchange -Bryan
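The "code that proves the images came, unmodified, from that beamline" idea can be sketched with a keyed hash. This is a hedged toy using Python's stdlib hmac - a symmetric tag along the lines Bryan himself mentions; a real deployment would use an asymmetric signature so anyone could verify without holding the facility's secret.

```python
import hmac
import hashlib

# Hypothetical facility secret; in the asymmetric variant this would be
# the beamline's private signing key rather than a shared secret.
FACILITY_KEY = b"beamline-secret-key"

def tag_image(image_bytes: bytes) -> str:
    """Issue an integrity tag for a raw detector frame."""
    return hmac.new(FACILITY_KEY, image_bytes, hashlib.sha256).hexdigest()

def verify_image(image_bytes: bytes, tag: str) -> bool:
    """Check that the frame is byte-for-byte what the facility tagged."""
    return hmac.compare_digest(tag_image(image_bytes), tag)

frame = b"\x00\x01\x02 raw detector frame bytes"
tag = tag_image(frame)
print(verify_image(frame, tag))          # untouched frame verifies
print(verify_image(frame + b"!", tag))   # any modification breaks the tag
```

As the reply above notes, this only proves provenance of the bytes; it says nothing about what was actually in the crystal.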
Re: [ccp4bb] Disorder or poor phases?
Dear Gerard, No, the updated model (4BCL) was published in 1993 (although apparently not deposited until 1998 - What was wrong with me?) Both were refined with that classic least-squares program TNT. I hope there was some improvement in the software between 1986 and 1993, and I always tried to work with the most recent version, but there wasn't a switch in target function. I agree that the distortions in these maps would have been less if an ML approach had been used and perhaps the location of the disordered residues would have been apparent earlier in the process. Maybe this sort of problem will not be seen again at 1.9 A resolution. My goal was simply to provide an example where errors due to model phases didn't distribute evenly throughout the map but had greater consequence in some locations. Dale On 04/10/12 13:45, Gerard Bricogne wrote: Dear Dale, There is perhaps a third factor in the progress you were able to make, namely the improvement in the refinement programs. Your first results were probably obtained with a least-squares-based program, while the more recent would have come from maximum-likelihood-based ones. The difference lies in the quality of the phase information produced from the model through comparison of Fo and Fc, with much greater bias-correction capabilities in the ML approach. Here, it removed the bias towards some regions being absent in the model, and made them no longer be absent in the maps. So it is a question of the quality of the phase information. With best wishes, Gerard. -- On Tue, Apr 10, 2012 at 12:00:28PM -0700, Dale Tronrud wrote: The phases do have effects all over the unit cell but that does not prevent them from constructively and destructively interfering with one another in particular locations. Some years ago I refined a model of the bacteriochlorophyll containing protein to a 1.9 A data set when the sequence of that protein was unknown. 
This is primarily a beta sheet protein and a number of the loops between the strands were disordered. Later the amino acid sequence was determined and I finished the refinement after building in these corrections. The same data set was used, but a number of the loops had become ordered. While the earlier model (3BCL) had 357 amino acids the final model (4BCL) had 366. These nine amino acids didn't become ordered over the intervening years. They were just as ordered when I was building w/o a sequence, it is just that I couldn't see how to build them based on the map's appearance. One possibility is that the density for these residues was weak and the noise (that was uniform over the entire map) obliterated their signal where it only obscured the stronger density. Another possibility is that the better model had a better match of the low resolution F's and less intense ripples radiating from the surface of the molecule, resulting in things sticking out being less affected. Whatever the details, the density for these amino acids was too weak to model with the poorer model phases and became buildable with better phases. The fact that they could not be seen in the early map was not an indication that they were disordered. The first six amino acids of this protein have never been seen in any map, including the 1.3 A resolution model 3EOJ (which by all rights should have been called 5BCL ;-) ). These residues appear to be truly disordered. Going back to 3BCL - The map for this model is missing density for a number of residues, of which we know some are disordered and some simply unmodelable because of the low quality of the phases. I don't know of a way, looking at that map alone, of deciding which is which. Because of this observation I don't believe it is supportable to say "I don't see density for these atoms, therefore they must be disordered." Additional evidence is required.
Dale Tronrud On 04/10/12 08:38, Tim Gruene wrote: Dear Francis, the phases calculated from the model affect the whole unit cell hence it is more likely this is real(-space, local) disorder rather than poor phases. Regards, Tim P.S.: The author should not look at a 2Fo-Fc map but a sigmaA-weighted map to reduce model bias. On 04/10/12 17:22, Francis E Reyes wrote: Hi all, Assume that the diffraction resolution is low (say 3.0A or worse) and the model (a high resolution homologue, from 2A X-ray data, is available) was docked into experimental phases (say 4A or worse) and extended to the 3.0A data using refinement (the high resolution model as a source of restraints). There are some conformational differences between the high resolution model and the target crystal. The author observes that in the 2Fo-Fc map at 3A, most of the model shows reasonable density, but for a stretch of backbone the density is weak. Is the weakness of the density in this region because of disorder or bad model phases? Would
Re: [ccp4bb] Disorder or poor phases?
On 4/10/2012 10:44 PM, Kay Diederichs wrote: Hi Dale, my experience is that high-B regions may become visible in maps only late in refinement. So my answer to the original poster would be - both global reciprocal-space (phase quality) and local real-space (high mobility) features contribute to a region not appearing ordered in the map. This would be supported by your experience if those residues that you could not model in 3BCL had high (or at least higher) B-factors compared to the rest of the model. Is that so? Actually the residues I couldn't model in 3BCL had no B's... :) Seriously, the residues that appeared for 4BCL did have B values much higher than average. Their density was weak in the best of circumstances and more susceptible to obliteration by the distortions caused by imprecision in the phases. I don't really want to describe this as phase error as that phrase conjures notions of large changes in phase. The R value only dropped from 18.9% to 17.8% from 3BCL to 4BCL. I don't expect there were huge differences in the phase angles, but the differences were enough. Dale best, Kay
Re: [ccp4bb] Criteria for Ligand fitting
While I'm quite happy with all the responses this question has provoked there is an additional point I would like to contribute. It is not enough to say that you can interpret your map with a model based on what you expect. You have to also show that you can't interpret your map with any other reasonable model. Saying that my map is consistent with my model is a very weak statement in the absence of exclusivity. A recent example of this sort of problem can be read about at (warning: tooting my own horn) http://www.springerlink.com/content/b8h6lg138635380v/?MUD=MP Dale Tronrud On 04/23/12 21:02, Naveed A Nadvi wrote: Dear Crystallographers, We have obtained a 1.7 A dataset for a crystal harvested from the crystallization drop after 2 weeks of soaking with inhibitor. The inhibitor has an aromatic ring and also an acidic tail derived from other known inhibitors. The active site hydrophobic crown had been reported to re-orient and a charged residue is known to position for forming a salt-bridge with similar ligands. When compared to apo structures, we can clearly see the re-orientation of these protein residues. However, there is no clear density visible for the ligand in the Fo-Fc map. Some density is visible in the 2Fo-Fc map with default settings in COOT. We were expecting covalent modifications between the inhibitor, co-factor and protein residues. In fact, the Fo-Fc map suggested the protein residue is no longer bonded to the co-factor (red negative density) and a green positive density is observed nearby for the protein residue. These observations, along with the extended soaking and the pre-determined potency, convince us that the inhibitor is present in the complex. When I lower the threshold of the blue 2Fo-Fc map (0.0779 e/A^3; 0.2 rmsd) we can see the densities for the aromatic ring and the overall structural features. These densities were observed without the cofactor and the inhibitor in the initial MR search model.
The R/Rfree for this dataset without inhibitor was 0.20/0.24 (overall Bfactor 17.4 A^2). At 50% occupancy, modeling the inhibitor showed no negative densities upon subsequent refinement. With the inhibitor, the R/Rfree was 0.18/0.22 (overall Bfactor 18.8 A^2). The temp factors of the inhibitor atoms (50% occ) were 15-26 A^2. My understanding is phase from the MR search model may influence Fo-Fc maps, and the 2Fo-Fc map minimizes phase bias. Since the inhibitor was absent from the MR search model, can these observations be used to justify the fitting of the ligand in the map? Given the low map-level used to 'see' the ligand, would this be considered noise? Can I justify the subsequent fall in R/Rfree and the absence of negative density upon ligand fitting as proof of correct inhibitor modeling? I would appreciate if you could comment on this issue. Or tell me that I'm dying to see the inhibitor and hence imagining things! Kind Regards, Naveed Nadvi.
Re: [ccp4bb] Anisotropic diffraction
If the data set had P6 symmetry before anisotropic scaling it would keep that symmetry afterwards. If it was only P2 symmetry before, it certainly would not have P6 afterwards. Any anisotropic scaling I've seen constrains the anisotropy to the lattice symmetry so symmetry cannot be degraded via its application. If your data set had, in principle, P6 symmetry but was expressed in a lower symmetry asymmetric unit and contained nonsymmetry-conforming noise before anisotropic scaling it would also contain broken symmetry afterwards. The higher symmetry was not lost, it was never there to begin with. Dale Tronrud On 4/28/2012 12:06 AM, Zhijie Li wrote: Hi, My first thought was the same as David's: the truncation won't change the crystal's space group. The symmetry of the crystal is reflected by the symmetry of the amplitudes of many many reflections across all resolutions. Ellipsoidal truncation itself only removes some very weak reflections from the outer shells. The remaining reflections will still have a good number of reflections carrying the symmetry of the crystal. However a second thought on the anisotropic scaling and B-factor correction led me to this scenario: suppose we have a crystal that's really P6, but we have cowardly indexed it in a lower space group, P2, with the 2-fold axis, b, coinciding with the real 6-fold axis. By losing the a = c restraint, the anisotropic scaling along H and L now may not be strictly equal (for example, this could be caused by outliers that would have been identified and filtered out if indexed correctly as P6), resulting in the loss of the 6-fold symmetry in the scaled dataset. Apparently this is an artifact due to an improper SG assignment before the anisotropic scaling and B-factor correction. Just some crazy thoughts. Please correct me if I am wrong.
BTW, to Theresa: a very informative introduction to ellipsoidal truncation and anisotropic scaling can be found here: http://services.mbi.ucla.edu/anisoscale/ -- From: Theresa Hsu theresah...@live.com Sent: Friday, April 27, 2012 3:18 PM To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] Anisotropic diffraction Dear crystallographers A very basic question, for anisotropic diffraction, does data truncation with ellipsoidal method change the symmetry? For example, if untruncated data is space group P6, will truncated data index as P622 or P2? Thank you. Theresa
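Dale's point - that an anisotropic scale constrained to the lattice symmetry gives identical scale factors for symmetry-equivalent reflections, so it cannot degrade the apparent symmetry - can be illustrated with a toy calculation. This is a sketch, not how any scaling program is implemented; the hexagonal constraint B11 = B22, B12 = B11/2, B13 = B23 = 0 is the standard one, but the function names and units below are invented for illustration:

```python
import numpy as np

def hexagonal_aniso_scale(h, k, l, B, B33):
    """Anisotropic scale factor with the hexagonal lattice constraint
    B11 = B22 = B, B12 = B/2, B13 = B23 = 0 (arbitrary illustrative units).
    The quadratic form reduces to B*(h^2 + k^2 + h*k) + B33*l^2."""
    q = B * (h * h + k * k + h * k) + B33 * l * l
    return np.exp(-q / 4.0)

def p6_equivalents(h, k, l):
    """The six P6-equivalent indices of (h, k, l) in hexagonal indexing."""
    return [(h, k, l), (k, -h - k, l), (-h - k, h, l),
            (-h, -k, l), (-k, h + k, l), (h + k, -h, l)]

scales = [hexagonal_aniso_scale(*hkl, B=0.02, B33=0.05)
          for hkl in p6_equivalents(3, 1, 2)]
# All six equivalents receive exactly the same scale factor, so a
# P6-symmetric data set is still P6-symmetric after scaling.
assert np.allclose(scales, scales[0])
```

Dropping the constraint (letting the h and k coefficients differ, as in Zhijie's P2 scenario) makes the quadratic form non-invariant under the 6-fold operations, which is exactly how mis-assigned symmetry lets the scaling break the apparent P6.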
Re: [ccp4bb] Strange Density
Your holo structure has a Ca++ and three water molecules that have not been built into your low resolution apo map. These atoms are not expected to be resolved at 3 A resolution, so I would expect them to appear as a large, misshapen blob. Your screenshot only shows one contour level. It is quite possible that the highest density value is not at the center of the blob. You might have a lower occupancy Ca++ atom at the site and the image is confused by the low resolution. Remember, even if the concentration of Ca++ is lower in this mother liquor any Ca++ that binds will bind exactly as it does in the fully occupied case. A weakly binding Ca++ site will not bind before the strongly binding site. I would first look to see what your holo map looks like when its resolution is truncated to 3 A. This will give you a sense of what a Ca++ binding in this site would look like. You could try refining a model with the Ca++ and water molecules, with lower occupancy, and see what the residual difference map looks like. You will, of course, have to have strong restraints on the geometry to hold this model together at 3 A resolution, but fortunately you have a higher resolution model to base these restraints on. The PDB file is a statement of your belief of what is in the crystal. Don't waste your time refining models that don't make chemical sense. An ion floating in space with no ligands is not a reasonable model so even if it fits the density it can't be correct. There are multiple ways of justifying the model of a crystal and others on the list will likely have different ideas for the criteria that should be used. My belief is that you know the holo model and the most likely outcome of your Ca++ extraction experiment (in a Bayesian prior sense) is a lower occupancy binding of the Ca++ and its water molecules. If you build and refine that model and the difference map is acceptable you can say that this model is consistent with your experiment.
If there is residual density then you can conclude that something is replacing the Ca++, but untangling superimposed, partial occupancy, models at 3.1 A resolution is extremely difficult. I think all you will be able to say is that something replaces the Ca++ but it cannot be identified. Not everything can be identified in a 3 A map. Not everything can be identified in a 1 A map. Your job is to say these parts I understand and these parts I don't. Dale Tronrud On 05/15/12 07:51, RHYS GRINTER wrote: Dear Community, As I'm relatively new to protein crystallography this might turn out to be an obvious question. I'm working on the structure of an enzyme requiring Ca2+ for activity, with calcium coordinated in the active site by an Asp and 2x backbone carbonyl groups, in a crystal structure with Ca in the crystallisation conditions (http://i1058.photobucket.com/albums/t401/__Rhys__/MDC_TD_15A.jpg). When Ca is omitted from the crystallizing conditions and a divalent chelator (EGTA) is added the crystals are of significantly lower resolution (3.13A). Refinement of this data reveals density for a molecule coordinated by the Ca-coordinating Asp and backbone; however, this density is significantly further away (3.4-3.8A), too far away for water or a strongly coordinated divalent cation (http://i1058.photobucket.com/albums/t401/__Rhys__/MDC_EGTA_315.jpg). The density is also much weaker than for Ca in the previous model, disappearing at 3.5 sigma. The crystallisation condition for the Ca-free crystals is: 0.1M Tris/Bicine buffer [pH 8.5] 8% PEG 8000 30% Ethylene Glycol 1mM EGTA The protein was purified by nickel affinity/SEC and dialysed into: 20mM NaCl 20mM Tris [pH 8.0] A colleague suggested that sulphate or phosphate could fit at these distances, but these ions have not been added at any stage of the crystallisation process. Could anyone give me some insight into what this density might represent? Thanks in advance, Rhys Grinter PhD Candidate University of Glasgow
Re: [ccp4bb] Calculating ED Maps from structure factor files with no sigma
On 05/23/12 08:06, Nicholas M Glykos wrote: Hi Ed, I may be wrong here (and please by all means correct me), but I think it's not entirely true that experimental errors are not used in modern map calculation algorithms. At the very least, the 2mFo-DFc maps are calibrated to the model error (which can be ideologically seen as the error of experiment if you include model inaccuracies into that). This is an amplitude modification. It does not change the fact that the sigmas are not being used in the inversion procedure [and also does not change the (non) treatment of missing data]. A more direct and relevant example to discuss (with respect to Francisco's question) would be the calculation of a Patterson synthesis (where the phases are known and fixed). I have not done extensive (or any, for that matter) testing, but my evidence-devoid gut feeling is that maps not using experimental errors (which in REFMAC can be done either via a GUI button or by excluding SIGFP from LABIN in a script) will for a practicing crystallographer be essentially indistinguishable from those that do. It seems that although you are not doubting the importance of maximum likelihood for refinement, you do seem to doubt the importance of closely related probabilistic methods (such as maximum entropy methods) for map calculation. I think you can't have it both ways ... :-) The reason for this is that model errors as estimated by various maximum likelihood algorithms tend to exceed experimental errors. It may be that these estimates are inflated (heretical thought, but when you think about it uniform inflation of the SIGMA_wc may have only proportional impact on the log-likelihood, or even less so when they correlate with experimental errors). Or it may be that the experimental errors are underestimated (another heretical thought).
My experience from comparing conventional (FFT-based) and maximum-entropy-related maps is that the main source of differences between the two maps has more to do with missing data (especially low resolution overloaded reflections) and putative outliers (for difference Patterson maps), but in certain cases (with very accurate or inaccurate data) standard deviations do matter. In a continuation of this torturous diversion from the original question... Since your concern is not how the sigma(Fo) plays out in refinement but how uncertainties are dealt with in the map calculation itself (where an FFT calculates the most probable density values and maximum entropy would calculate the best, or centroid, density values) I believe the most relevant measure of the uncertainty of the Fourier coefficients would be sigma(2mFo-DFc). This would be estimated from a complex calculation of sigma(sigmaA), sigma(Fo), sigma(Fc) and sigma(Phic). I expect that the contribution of sigma(Fo) would be one of the smallest contributors to this calculation, as long as Fo is observed. I wouldn't expect the loss of sigma(Fo) to be catastrophic. Wouldn't sigma(sigmaA) be the largest component since sigmaA is a function of resolution and based only on the test set? Dale Tronrud All the best, Nicholas
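Dale's error budget for sigma(2mFo-DFc) can be made concrete with a naive first-order propagation. This is purely illustrative: it treats m, D, Fo and Fc as independent, ignores phase uncertainty and all correlations, and is not what any refinement program actually computes; the function name and the numbers below are invented:

```python
import math

def sigma_2mfo_dfc(fo, sig_fo, fc, sig_fc, m, sig_m, D, sig_D):
    """Naive first-order propagation for the amplitude 2m*Fo - D*Fc,
    treating every input as independent and ignoring phase uncertainty.
    A sketch only; no refinement program computes it this way."""
    var = ((2 * fo * sig_m) ** 2 +   # uncertainty in the figure of merit m
           (2 * m * sig_fo) ** 2 +   # experimental sigma(Fo)
           (fc * sig_D) ** 2 +       # uncertainty in the sigmaA-derived D
           (D * sig_fc) ** 2)        # model amplitude uncertainty
    return math.sqrt(var)

# Invented but plausible-looking numbers: the sigma(m) term (driven by
# sigma(sigmaA)) dominates, and sigma(Fo) is one of the smaller terms,
# consistent with the expectation voiced above.
s = sigma_2mfo_dfc(fo=1000, sig_fo=30, fc=950, sig_fc=40,
                   m=0.9, sig_m=0.05, D=0.95, sig_D=0.05)
```

With these inputs the sigma(m) term contributes (2*1000*0.05)^2 = 10000 to the variance, several times the sigma(Fo) term of (2*0.9*30)^2 = 2916, which is why dropping sigma(Fo) from a map calculation is unlikely to be catastrophic.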
Re: [ccp4bb] Fwd: [ccp4bb] Death of Rmerge
On 05/31/12 12:07, Jacob Keller wrote: Alas, how many lines like the following from a recent Science paper (PMID: 22605777), probably reviewer-incited, could have been avoided! Here, we present three high-resolution crystal structures of the Thermus thermophilus (Tth) 70S ribosome in complex with RMF, HPF, or YfiA that were refined by using data extending to 3.0 Å (I/sI = 1), 3.1 Å (I/sI = 1), and 2.75 Å (I/sI = 1) resolution, respectively. The resolutions at which I/sI = 2 are 3.2 Å, 3.4 Å, and 2.9 Å, respectively. I don't see how you can avoid something like this. With the new, higher, resolution limits for data (which are good things) people will tend to assume that a 2.6 A resolution model will have roughly the same quality as a 2.6 A resolution model from five years ago when the old criteria were used. KK show that the weak high resolution data contain useful information but certainly not as much information as the data with stronger intensity. The resolution limit of the data set has been such an important indicator of the quality of the resulting model (rightly or wrongly) that it often is included in the title of the paper itself. Despite the fact that we now want to include more, weak, data than before we need to continue to have a quality indicator that readers can use to assess the models they are reading about. While cumbersome, one solution is to state what the resolution limit would have been had the old criteria been used, as was done in the paper you quote. This simply gives the reader a measure they can compare to their previous experiences. Now would be a good time to break with tradition and institute a new measure of quality of diffraction data sets. I believe several have been proposed over the years, but have simply not caught on. SFCHECK produces an optical resolution. Could this be used in the title of papers? I don't believe it is sensitive to the cutoff resolution and it produces values that are consistent with what the readers are used to.
With this solution people could include whatever noisy data they want and not be guilty of overstating the quality of their model. Dale Tronrud JPK On Thu, May 31, 2012 at 1:59 PM, Edward A. Berry ber...@upstate.edu wrote: Yes! I want a copy of this program RESCUT. REMARK 200 R SYM FOR SHELL(I) : 1.21700 I noticed structure 3RKO reported Rmerge in the last shell greater than 1, suggesting the police who were defending R-merge were fighting a losing battle. And this provides a lot of ammunition to those they are fighting. Jacob Keller wrote: Dear Crystallographers, in case you have not heard, it would appear that the Rmerge statistic has died as of the publication of PMID: 22628654. Ding Dong...? JPK -- *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program email: j-kell...@northwestern.edu ***
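The convention quoted above - reporting the resolution at which the mean I/sigI crosses 1 or 2 - is easy to reproduce from a reflection list. A minimal binning sketch (equal-volume shells, i.e. equal widths in 1/d^3; the function name and its arguments are invented for illustration, not taken from any CCP4 program):

```python
import numpy as np

def mean_i_over_sigma_by_shell(d_spacing, i_obs, sig_i, n_shells=10):
    """Bin reflections into equal-volume resolution shells (equal width
    in 1/d^3) and report (high-resolution shell edge, <I/sigI>) pairs."""
    s3 = 1.0 / d_spacing ** 3
    edges = np.linspace(s3.min(), s3.max(), n_shells + 1)
    shell = np.clip(np.digitize(s3, edges) - 1, 0, n_shells - 1)
    result = []
    for i in range(n_shells):
        sel = shell == i
        d_hi = edges[i + 1] ** (-1 / 3)   # high-resolution edge of shell i
        result.append((d_hi, float(np.mean(i_obs[sel] / sig_i[sel]))))
    return result
```

Scanning the returned shells for where the mean drops below 2 (or 1) gives the two resolution limits the Science paper reports side by side.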
Re: [ccp4bb] sigma levels of averaged maps in coot ( or e/A3)
I'm afraid I seriously mistrust the sigma and e/A^3 numbers reported by Coot for ncs averaged maps. I work with a crystal with near perfect 6-fold ncs and the e/A^3 numbers make no sense. For a 2Fo-Fc style map the e/A^3 values should be nearly the same after averaging as before. They are not. The sigma of an averaged map has a definitional problem - what is the volume to normalize over? With a map with crystal symmetry the answer is pretty clear: use the asymmetric unit. The asymmetric unit of an averaged map will be the least-common-multiple of the rotated unit cells and could easily measure in hundreds if not thousands of unit cells. Not very practical and not very useful. Paul says that he normalizes over a box, which is the easy way out, but I don't believe it has any statistical meaning. The box will contain some parts of the ncs asymmetric unit multiple times, and include some cs related regions. My opinion is that the e/A^3 calculation for ncs averaged density in Coot is broken. (I have not used the daily-build version, just the stable releases, but none have worked in my hands for years.) I usually contour an unaveraged map at my desired level, and then adjust the averaged map so that it mostly matches those contours. If your ncs is less perfect this will not work as well for you. Dale Tronrud On 6/6/2012 2:20 PM, Paul Emsley wrote: On 06/06/12 21:47, Ursula Schulze-Gahmen wrote: I calculated threefold averaged omit maps in coot. These maps look nice and clean, but I am having trouble making sense of the displayed sigma levels or e/A3 values. When I display the unaveraged and averaged maps at a similar density level for the protein the unaveraged map is at 0.024 e/A3 and 2.7 sigma, while the averaged map is displayed at 0.0016 e/A3 and 7.6 sigma. I read the previous discussion about this issue where it was recommended to rely on the e/A3 values for comparison, but even that doesn't seem to work in this case.
Don't forget that in one case you are looking at a whole map and in the other (an average of) maps generated from a box encapsulating each chain. I wouldn't stress if I were you... Paul.
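The normalization problem being argued over here is easy to demonstrate numerically: the "sigma level" of one and the same density value changes with the volume the rms is computed over. A toy example on synthetic density (not a real map; the array sizes and noise level are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "unit cell": flat solvent-like noise plus one dense blob standing
# in for the molecule.
unit_cell = rng.normal(0.0, 0.1, size=(40, 40, 40))
unit_cell[10:20, 10:20, 10:20] += 1.0

sigma_cell = unit_cell.std()          # rms over the whole cell
box = unit_cell[8:22, 8:22, 8:22]     # a box drawn around the molecule
sigma_box = box.std()                 # rms over the box only

# The identical density value corresponds to different "sigma levels"
# depending on the normalization volume: the box is molecule-rich, so
# its rms is larger and the apparent sigma level is lower.
peak = unit_cell[15, 15, 15]
level_cell = peak / sigma_cell
level_box = peak / sigma_box
assert sigma_box > sigma_cell
```

This is why comparing "N sigma" between a cell-normalized map and a box-normalized averaged map is not meaningful, and why contouring by matching an unaveraged map, as Dale suggests, is a pragmatic workaround.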
Re: [ccp4bb] Chiral volume outliers SO4
While this change has made your symptom go away it is stretching it a bit to call this a fix. You have not corrected the root problem: the names you have given your atoms do not match the convention which is being applied for SO4 groups. Changing the cif means that you don't have to worry about it, but people who study such details will be forced to deal with the incorrect labels of your model in the future. Wouldn't it just be easier to swap the names of two oxygen atoms in each SO4, leaving the cif alone? Your difficulties will go away and people using your model in the future will also have a simpler life. This labeling problem is not new. The fight to standardize the labeling of the methyl groups in Valine and Leucine was raging in the 1980's. Standardizing the labels on the PO4 groups in DNA/RNA was much more recent. It helps everyone when you know you can overlay two models and have a logical solution without a rotation matrix with a determinant of -1. Besides, you will continue to be bitten by this problem as you use other programs, until you actually swap some labels. Dale Tronrud On 07/12/12 15:00, Joel Tyndall wrote: Hi all, Thanks very much to all who responded so quickly. The fix is a one liner in the SO4.cif file (last line): SO4 chir_01 S O1 O2 O3 both, which I believe is now in the 6.3.0 release. Interestingly the chirality parameters were not in the SO4.cif file in 6.1.3 but then appeared in 6.2.0. Once again I'm very happy to get to the bottom of this and get it fixed. I do wonder if it had become over-parametrised. Cheers Joel -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Robbie Joosten Sent: Thursday, 12 July 2012 12:16 a.m. To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] Chiral volume outliers SO4 Hi Ian, @Ian: You'd be surprised how well Refmac can flatten sulfates if you have a chiral volume outlier (see Figure 1d in Acta Cryst. D68: 484-496 (2012)).
But this is only because the 'negative' volume sign was erroneously used in the chiral restraint instead of 'both' (or better still, IMO, no chiral restraint at all), right? If so I don't find it surprising at all that Refmac tried to flip the sulphate and ended up flattening it. Seems to be a good illustration of the GIGO (garbage in - garbage out) principle. Just because the garbage input in this case is in the official CCP4 distribution and not (as is of course more commonly the case) perpetrated by the user doesn't make it any less garbage. The problem is that in the creation of chiral volume targets chemically equivalent (groups of) atoms are not recognized as such. So any new or recreated restraint files will have either 'positiv' or 'negativ' and the problem starts all over again. That is why it is better to stay consistent and choose one chirality (the same one as in the 'ideal' coordinates in the PDB ligand descriptions). This will also make it easier to compare ligands after aligning them (this applies to ligands more complex than sulfate). Obviously, users should not be forced to deal with these things. Programs like Refmac and COOT should fix chiral volume inversions for the user, because it is only relevant inside the computer. That is the idea of chiron: just fix these 'problems' automatically by swapping equivalent atoms whenever Refmac gives a chiral volume inversion warning. It should make life a bit easier. The point I was making is that in this and similar cases you don't need a chiral restraint at all: surely 4 bond lengths and 6 bond angles define the chiral volume pretty well already? Or are there cases where without a chiral restraint the refinement still tries to flip the chirality (I would find that hard to believe)? I agree with you for sulfate, and also for phosphate ;). I don't know what happens in other compounds at poor resolution, when bond and angle targets (and their SDs) are not equivalent.
I guess that some angle might 'give way' before others. That is something that should be tested. I have a growing list of chiral centers that have this problem if you are interested. Cheers, Robbie
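The chiral volume being argued over in this thread is just a signed triple product over the three ordered substituents of a centre atom, which makes it easy to see why swapping two oxygen labels, rather than editing the cif, resolves the outlier: the geometry is unchanged and only the sign flips. A sketch with idealized coordinates (not a real SO4 geometry):

```python
import numpy as np

def chiral_volume(center, a1, a2, a3):
    """Signed chiral volume of a centre atom and three ordered
    substituents: v1 . (v2 x v3), with vi the bond vectors."""
    v1, v2, v3 = (np.asarray(p, dtype=float) - np.asarray(center, dtype=float)
                  for p in (a1, a2, a3))
    return float(np.dot(v1, np.cross(v2, v3)))

# Idealized tetrahedron: sulfur at the origin, oxygens at cube vertices.
s  = [0.0, 0.0, 0.0]
o1 = [1.0, 1.0, 1.0]
o2 = [1.0, -1.0, -1.0]
o3 = [-1.0, 1.0, -1.0]

v = chiral_volume(s, o1, o2, o3)
# Swapping the names of two oxygens flips the sign of the chiral volume;
# a restraint with a fixed 'positiv'/'negativ' sign then fires even
# though the actual geometry is identical.
assert chiral_volume(s, o2, o1, o3) == -v
```

This is also why bond lengths and angles alone, as Ian argues, nearly pin down the magnitude of the volume but say nothing about its sign - the sign lives entirely in the (arbitrary, for SO4) labeling order.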
Re: [ccp4bb] Chiral volume outliers SO4
On 7/13/2012 1:58 AM, Tim Gruene wrote: Dear all, I am surprised by the discussion about chirality of an utterly centrosymmetric molecule. Shouldn't the four oxygen atoms be, at least from a QM point of view, indistinguishable? What reason is there to maintain a certain 'order' in the human-induced numbering scheme? There are good reasons for maintaining order in this human-induced numbering scheme. A common operation is to superimpose two molecules and calculate the rmsd of the positional differences. This calculation is not useful when the Val CG1 and CG2 are swapped in one molecule relative to the other. Suddenly you have, maybe, a handful of atoms that differ in position by about 3.5 A when most of us would consider this to be nonsense. We want the rmsd between equivalent atoms regardless of the human-induced numbering scheme. There are two ways this can come about. 1) The overlay program could swap the labels on one to match the other or 2) The labels can be defined to be consistent from the start. Neither 1) nor 2) is objectively better in any absolute sense. The Powers that Be, however, have decided that for Val, Leu, Phe, Tyr, and the PO2 in DNA, RNA, and many co-enzymes models should be adjusted to conform to a standard. If we are doing this for these groups in order to make comparison of models simpler, why stop there? If we say there are standards for some groups but not others we have the worst of both worlds - we have to both modify models and write complicated comparison programs. The failure of comparison programs to correct for labeling differences is generally a silent error - a handful of 3.5 A differences mixed into thousands of very small differences will not likely cause an increase in the rmsd that would be noticed. Only if the individual differences are plotted, or the biggest differences are listed, will the user notice the problem.
Silent errors are the worst errors since they are the most likely to make it all the way to publication. As I see it, the problem here lies in the program that created the original poster's SO4 group. If it had matched the convention now present in the CCP4 cif there would be none of these problems. That program should be tracked down and updated. The problem of labeling groups that have symmetry along a rotatable torsion angle is a persistent problem that, I'm afraid, has no permanent solution other than CONSTANT VIGILANCE. I see that the newer versions of Coot have taken up this burden, at least for Phe and Tyr. (I guess we need a picture of a coot with one big roving eye.) Since we are already unambiguously defining the labeling for a number of the groups we use, I think it is up to you to justify why this group should be treated differently. Dale Tronrud P.S. On an only slightly off topic note - I'm quite afraid of using 'both' as the definition for chirality. I've noticed that this keyword is used as an excuse to not figure out what the real chirality of an atom is, and as a result people build models with bad chiral centers that are not flagged by their software (another silent error that makes it to publication). The PDB is littered with cofactors and ligands that have inverted chiral centers (even centers that Pasteur would approve). I would prefer that 'both' were not a legal value and researchers were required to think about chirality. Cheers, Tim On 07/13/12 00:22, Dale Tronrud wrote: While this change has made your symptom go away it is stretching it a bit to call this a fix. You have not corrected the root problem that the names you have given your atoms do not match the convention which is being applied for SO4 groups. Changing the cif means that you don't have to worry about it, but people who study such details will be forced to deal with the incorrect labels of your model in the future.
Re: [ccp4bb] refining large region with multiple conformers
You have to build the model you actually believe matches what is in the crystal. Do you believe that each amino acid is occupying two conformations independent of its neighbors? I wouldn't go that way. I would start with an apo conformation, labeled with 'g', and a holo conformation including the ligand, labeled with 'h', and allow all of the 'g's one occupancy value and the 'h's another, and insist that they sum to 1.0. You may have to build water molecules in the binding site of 'g' that are displaced by the ligand in 'h'. If this simplest of models doesn't do the trick you have to be led by your difference maps and chemical intuition to devise more complex models. Select the simplest model that makes sense and fits your data. It would be interesting to see if you can find a program that would allow you to restrain the ncs of the second protein chain and the 'h' conformation of your mixed model, leaving the 'g' conformation unrestrained by ncs. Dale Tronrud P.S. I'm avoiding the use of 'A' and 'B' alt locs because these are routinely used when splitting side chains but are almost never intended to imply that all 'A's are coordinated with each other and all 'B's are likewise. To be proper, the reuse of alt loc codes for unrelated conformations should not be allowed, but there are simply not enough letters to allow the rule to be enforced. On 08/07/12 07:59, Kendall Nettles wrote: Hi, We have a structure with the ligand showing two overlapping conformers. When we refine it with both conformers separately, it is pretty clear that there are substantial differences in the protein as a result, for about a third of the protein chain. My question is, would it be better to try to define alternate conformers for those specific regions, or would it be OK to refine with two entire alternate protein chains? There is also a second protein chain that shows only a single binding mode for the ligand. It's a 2.0 angstrom structure.
The yellow 2Fo-Fc map goes with the green model in the attached pic. Also, do we want to let each amino acid have its own occupancy? or should one ligand copy and one chain all have the same occupancy? I'm leaning towards the latter since the differences should be directly tied to the ligand binding mode. Kendall Nettles
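Dale's scheme ties all the 'g' atoms to one occupancy and all the 'h' atoms to another, with the pair constrained to sum to 1.0. A quick way to sanity-check a finished model against that constraint is to total the occupancies over alt-loc codes at each atom position. The following is a minimal sketch, not part of any refinement package; the fixed-column parsing assumes standard PDB format, and the function name is invented for illustration.

```python
from collections import defaultdict

def check_altloc_occupancies(pdb_path, tol=0.01):
    """Sum occupancies over all alt-loc copies of each atom and return
    the sites whose conformer occupancies do not add up to 1.0."""
    sums = defaultdict(float)
    with open(pdb_path) as fh:
        for line in fh:
            if not line.startswith(("ATOM", "HETATM")):
                continue
            if line[16] == " ":  # column 17: alt-loc code; blank = single conformer
                continue
            # key on chain ID, residue number, and atom name (PDB fixed columns)
            key = (line[21], line[22:26].strip(), line[12:16].strip())
            sums[key] += float(line[54:60])  # columns 55-60: occupancy
    return {key: total for key, total in sums.items() if abs(total - 1.0) > tol}
```

Running something like `check_altloc_occupancies("mixed_model.pdb")` (file name hypothetical) before deposition lists any split atoms whose conformer occupancies drifted away from summing to 1.0.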
Re: [ccp4bb] Unexplainable Density
It is hard for me to visualize density with just screenshots - I like to rotate the image to see the 3D. This is really clear density, however, and you should be able to figure out what it is. As far as I can tell from your images it looks like half of an EDTA with the other half trailing off into the space above. Was your protein exposed to that compound somewhere along the way? Dale Tronrud On 08/08/12 04:56, Mario Sniady wrote: While building our crystal structure model we encountered density which we weren't able to assign. The unknown molecule/molecules seem to coordinate a Ni^2+ ion. This ion is also coordinated by a histidine and probably one H2O molecule. The crystals have been grown from a protein complex including the carotenoid peridinin, the lipid DGDG and chlorophyll. Besides this, the solution in which they have been grown contains Tris pH 8.5, NiCl2 and PEG 2000 MME. The linked images show a rotation around the density in 90° steps (1.1 A, 1.2 sigma contour level). The above-mentioned H2O molecule has been removed. It is supposed to fill the density that is below the Ni^2+ ion in the first picture. Images: http://www.bioxtal.rub.de/myst.html.en Any hints are welcome =) Mario
Re: [ccp4bb] loading maps in coot using EDS
It appears that the Electron Density Server could not calculate a map for 3TVN. These cryptic messages are what you get from Coot when there is no map on the server. I can see from the RCSB web page that there is no EDS link in the Experimental Details section, which also happens when the EDS comes up empty. When the EDS fails to calculate a reasonable map for an entry they do not tell us why. If they knew what the problem was they would fix it themselves. They remain silent, hoping that the authors of the entry will contact them and give them some help. It is absolutely amazing that they can calculate as many maps as they do. Dale Tronrud On 08/08/12 13:39, Shya Biswas wrote: Hi all, I was trying to get maps using the *fetch PDB and Map using EDS option* in Coot, however the map would not open. I am using Coot version 0.6.2 and was wondering if anybody else had similar problems and how to fix this; the following is the error message I get. It used to work fine with a previous version of Coot. CCP4MTZfile: open_read - File missing or corrupted: coot-download/3tvn_sigmaa.mtz INFO:: not an mtz file: coot-download/3tvn_sigmaa.mtz ERROR: no f_cols! ERROR: no phi_cols! valid_labels(coot-download/3tvn_sigmaa.mtz,FOFCWT,PHFOFCWT,,0) returns 0 CCP4 library signal library_file:End of File (Error) raised in ccp4_file_raw_read System signal 0:Success (Error) raised in ccp4_file_rarch CCP4 library signal library_file:End of File (Error) raised in ccp4_file_raw_read System signal 0:Success (Error) raised in ccp4_file_readchar CCP4 library signal mtz:Read failed (Error) raised in MtzGet CCP4MTZfile: open_read - File missing or corrupted: coot-download/3tvn_sigmaa.mtz INFO:: not an mtz file: coot-download/3tvn_sigmaa.mtz ERROR: no f_cols! ERROR: no phi_cols! WARNING:: label(s) not found in mtz file coot-download/3tvn_sigmaa.mtz FOFCWT PHFOFCWT WARNING:: -1 is not a valid molecule in set_scrollable_map thanks, Shya
Re: [ccp4bb] loading maps in coot using EDS
On 08/08/12 14:31, Katherine Sippel wrote: The 3tvn coordinates/SF were released today. I'm not sure what the lag time is between the PDB and EDS but you'd probably need to download the structure factors and generate the map yourself. A very good point. I saw the deposition date of 2011 but didn't read down to the release date. The EDS does not get an advance look at entries in the PDB. The data has to be released to the public before it can begin the calculations. This can take a couple of weeks. In addition, the server itself appears to be down at the moment. I don't think you could download the map even if it existed. Dale Tronrud If you're not in a super rush I know the person who refined that specific PDB and I may be able to get you a copy of her final maps to send you off-board once she gets back from vacation. Cheers, Katherine
Re: [ccp4bb] comparing differences across multiple structures of the same protein
I believe that the definition of significant for crystallographic data should be based on the difference map. If a shift of that magnitude causes a feature to appear in the map, then the crystal data is driving the shift. If you can have a shift that large, for the particular atoms in question, and the difference map remains flat, then the crystal data doesn't care. A refinement program will move an atom for lots of reasons in addition to the diffraction data, sometimes for no reason at all (simulated annealing, for example). The difference map is a pure expression of the will of the diffraction data. The most sensitive calculation is the F(holo)-F(apo) map, but this requires isomorphous crystals. It might be possible to paste into the holo model a couple of residues from the apo model, refine all parameters except the positions of these atoms, and see if the Fo-Fc map objects. Remember, a lysine on the surface can probably be built in twenty different conformations with the difference map flat in every case, while a couple of atoms elsewhere could have a shift of 0.1 A that lights up the map. There are no generic cut-offs or thresholds that work. Dale Tronrud On 9/10/2012 9:01 PM, Michael Murphy wrote: I am trying to compare structures of the same protein in the apo form and when bound to several different ligands. There are differences, but they are subtle and I am unsure whether they are actually significant or just due to coordinate error or something similar. Is there a theoretical minimum (in Angstroms maybe?) by which a side chain or secondary structure element needs to be displaced between structures to be considered real? This may depend on resolution/B-factors as well? Phenix reports overall coordinate error for each structure, but this must vary at least a bit for certain amino acid residues, just like B-factors do.
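Dale's point is that the difference map, not a distance cutoff, decides significance; even so, it helps to know which atoms moved by more than the reported coordinate error before interrogating the map. A minimal sketch follows (hypothetical helper, CA atoms only, assumes the two models share chain IDs and residue numbering in standard PDB format):

```python
import math

def ca_shifts(apo_path, holo_path, cutoff=0.3):
    """Pair CA atoms by (chain, residue number) and report displacements
    larger than `cutoff` Angstroms (e.g. the reported coordinate error)."""
    def read_ca(path):
        coords = {}
        with open(path) as fh:
            for line in fh:
                if line.startswith("ATOM") and line[12:16].strip() == "CA":
                    key = (line[21], line[22:26].strip())
                    # x, y, z occupy three consecutive 8-character fields
                    coords[key] = tuple(
                        float(line[30 + 8 * i: 38 + 8 * i]) for i in range(3)
                    )
        return coords
    apo, holo = read_ca(apo_path), read_ca(holo_path)
    return {
        key: round(math.dist(apo[key], holo[key]), 2)
        for key in apo.keys() & holo.keys()
        if math.dist(apo[key], holo[key]) > cutoff
    }
```

A shift flagged this way is only a candidate: following the advice above, the real test is whether fixing those atoms at their apo positions makes the Fo-Fc map object.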
Re: [ccp4bb] Strange density
Hi, These sorts of questions are always difficult, particularly in the absence of any information about the protein or the contents of the mother liquor. If the carbonyl you are talking about is the little magenta dot visible through the hole in your blob, this could be a metal atom with some long chelating molecule around the equator. In the extreme it could be some sort of porphyrin, although the density would be very poor if it was. Dale Tronrud On 11/28/2012 7:48 AM, Read, Jon wrote: Anyone see anything like this before? The data is 1.7Angstrom data with good statistics. The picture shows the solid FoFc density contoured at 3 Sigma in light brown and -3 Sigma in purple. The density is odd as it appears to be bound to a peptide carbonyl with no other obvious interactions with the protein. There is a characteristic tail at one end. AstraZeneca UK Limited is a company incorporated in England and Wales with registered number: 03674842 and a registered office at 2 Kingdom Street, London, W2 6BD. *Confidentiality Notice: *This message is private and may contain confidential, proprietary and legally privileged information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorised use or disclosure of the contents of this message is not permitted and may be unlawful. *Disclaimer:* Email messages may be subject to delays, interception, non-delivery and unauthorised alterations. Therefore, information expressed in this message is not given or endorsed by AstraZeneca UK Limited unless otherwise notified by an authorised representative independent of this message. No contractual relationship is created by this message by any person unless specifically indicated by agreement in writing other than email. 
*Monitoring: *AstraZeneca UK Limited may monitor email traffic data and content for the purposes of the prevention and detection of crime, ensuring the security of our computer systems and checking compliance with our Code of Conduct and policies.
Re: [ccp4bb] Strange density
Actually one can make a lot of sense of e/A^3 in the absence of F(000). You can think of the density as the difference from the average rather than an absolute measurement. For an Fo-Fc style map the F(000) term is simply the difference between the number of missing electrons in the model and the number of extra electrons. Since we are probably missing all data of resolution lower than about 20 A because of the beamstop, the model defects are only counted if they are within about 20 A or so of the point you are looking at. In the latter stages of refinement, when one is trying to identify strange density, the rest of the model should be pretty good and the expected mean value of the difference map very near zero. Of course your model is missing atoms for the blob itself so the difference density will tend to sink, resulting in somewhat lower peaks and negative density around the edges, but this effect is usually not huge. On the other hand, contouring based on rmsd (i.e. sigma ack!) causes huge differences depending on the other things that are going on in your map. The rmsd of your first difference map can be many times larger than it is in your last. The density for a missing water molecule contoured at 3 rmsd in the first map will look very different from the same water molecule contoured at 3 rmsd in the last map. That water molecule contoured at, say, 0.18 e/A^3 would look pretty much the same. In the first difference map that water molecule will be surrounded by a huge number of other features when you contour at 0.18 e/A^3 and by very few in the last map, but isn't that as it should be? The map is supposed to be flatter at the end. Dale Tronrud On 11/28/12 12:30, Pavel Afonine wrote: For map in e-/A^3 units to make sense one needs to obtain F000, which may be more tricky than one may think. Interesting, how does Coot do this given just a set of Fourier map coefficients?
Pavel On Wed, Nov 28, 2012 at 12:21 PM, Greg Costakes gcost...@purdue.edu wrote: You stated that the map is set to 3 sigma, but what is the e-/A^3? In Coot I often find that my fo-fc map needs to be maxed out (max sigma) in order to get to an acceptable e-/A^3. It is possible that your fo-fc map at 3 sigma has an e-/A^3 of 0.04 or something low like that. --- Greg Costakes PhD Candidate Department of Structural Biology Purdue University Hockmeyer Hall, Room 320 240 S. Martin Jischke Drive, West Lafayette, IN 47907
Re: [ccp4bb] Strange density
No such luck! If one calculated the Root Mean Square Deviation from the Mean then F(000) makes no difference, but everyone I know calculates the Deviation from 0.0. I guess that makes it an rms and not an rmsd. We can use maps calculated w/o the F(000) because we are generally more interested in the shape of the density than its height. We use the shape to come up with interpretations, which is the hard part. The height can give us a clue about the occupancy an atom would have if we refined one there - it is a shortcut to avoid building and refining models that are destined to be nonsense. If the molecule you are building will refine to an occupancy of 0.1 you could spend your time better by doing something else. Dale Tronrud On 11/28/12 15:13, Lijun Liu wrote: F000 contributes to the whole map as a constant level (F000/V). If two maps are calculated with the only difference being with or w/o F000, shouldn't the sigma levels of the two maps be the same? That is why we can rely on maps for modeling that are calculated w/o the F000 term. Lijun
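The rms-versus-rmsd distinction above is easy to check numerically: adding a constant level (the F000/V term) to every grid point changes the root-mean-square about zero but leaves the deviation about the mean untouched. Here is a toy illustration with made-up grid values in e-/A^3 (nothing below comes from a real map):

```python
import math

def rms_about_zero(values):
    """Root-mean-square deviation from 0.0 -- what map programs commonly report as 'sigma'."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def rmsd_about_mean(values):
    """Root-mean-square deviation from the mean -- insensitive to a constant offset."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# made-up difference-map grid values, then the same grid with a constant
# F000/V level of 0.4 e-/A^3 added to every point
rho = [0.10, -0.20, 0.30, 0.00, -0.10, 0.25]
rho_offset = [v + 0.4 for v in rho]
```

`rmsd_about_mean(rho)` equals `rmsd_about_mean(rho_offset)`, while `rms_about_zero` changes with the offset -- which is why contour levels quoted in "sigma" depend on everything else going on in the map, whereas absolute e-/A^3 levels do not.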
Re: [ccp4bb] archival memory?
Good luck on your search in 100 years for a computer with a USB port. You will also need software that can read a FAT32 file system. Dale Glad I didn't buy a lot of disk drives with Firewire Tronrud On 12/12/2012 1:02 PM, Richard Gillilan wrote: SanDisk advertises a Memory Vault disk for archival storage of photos that they claim will last 100 years. (note: they do have a scheme for estimating lifetime of the memory, Arrhenius Equation ... interesting. Check it out: www.sandisk.com/products/usb/memory-vault/ and click the Chronolock tab.). Has anyone here looked into this or seen similar products? Richard Gillilan MacCHESS
Re: [ccp4bb] archival memory?
I don't believe there is a solution that does not involve active management. You can't write your data and pick up those media 25 years later and expect to get your data back -- not without some heroic effort involving the construction of your own hardware. I have data from Brian Matthews' lab going back to the mid-1970's and those data started life on 7-track mag tapes. I've moved them from there to 9-track 1600 bpi tapes, to 9-track 6250 bpi tapes, to just about every density of Exabyte tape, to DVD, and most recently to external magnetic hard drives (each with USB, Firewire, and eSATA interfaces). The hard drives are about five years old and so far are holding up. Last time I checked I could still read the 10-year-old DVD's. I'm having real trouble reading Exabyte tapes. Write your data to some medium that you expect to last for at least five years but anticipate that you will then have to move them to something else. Instead of spending time working on the 100-year solution you should spend your time annotating your data so that someone other than you can figure out what it is. Lack of annotation and editing is the biggest problem with old data. Dale Tronrud P.S. If someone needs the intensities for heavy atom derivatives of Thermolysin written in VENUS format, I'm your man. On 12/12/2012 1:57 PM, Richard Gillilan wrote: Better option? Certainly not TAPE or electromechanical disk drive. CD's and DVD's don't last nearly that long, as James Holton has pointed out. I suppose there might be a cloud solution where you rely upon data just floating around out there in cyberspace with a life of its own. Richard
Re: [ccp4bb] archival memory?
On 12/12/2012 3:19 PM, Bosch, Juergen wrote: Hey Dale, you really should get your personal RAID with hot swappable discs, since you don't like Firewire, how about Thunderbolt and a Pegasus RAID with 6 bays ? If a drive fails you replace it with a new one. Last summer someone in the lab above ours decided they needed a full sink of water. Before this task was complete they decided they needed to go home. The resulting flood destroyed the contents of the desks of two of our lab members. That was a lot of paper that didn't make 100 years - including a Handbook of Chemistry and Physics that had almost made 60. If the lab RAID had been under the waterfall it would have lost all of its drives in one go. I don't know how big a RAID number you have to have to survive that, but RAID-5 isn't going to do it. I have run a flash drive through my washing machine a couple times and it is still going strong so I have high hopes for solid-state memory. It will be several years before 1 TB SSD's drop in price enough for the next move of my little archive. The SanDisk Memory Vault that started this thread maxes out at 16 GB. Dale Tronrud By the way if anybody has a functional DAT4 tape drive, could I send you one to read out a tape with some data ? If so, then off list reply would be nice, thanks. Jürgen
Jürgen Bosch Johns Hopkins University Bloomberg School of Public Health Department of Biochemistry & Molecular Biology Johns Hopkins Malaria Research Institute 615 North Wolfe Street, W8708 Baltimore, MD 21205 Office: +1-410-614-4742 Lab: +1-410-614-4894 Fax: +1-410-955-2926 http://lupo.jhsph.edu
Re: [ccp4bb] engh huber
There was an update by E&H in 2001 in International Tables Vol. F. There are a small number of modifications to the 1991 values in the update as well as the addition of several conformational variabilities. If I understand correctly, Refmac and Phenix use the 2001 values, with the only conformational variability being some changes for cis-peptide bonds. Shelxl still uses E&H 1991. Dale Tronrud On 01/14/13 09:54, Ed Pozharski wrote: To what extent have modern geometric restraints been upgraded over the original Engh & Huber set? And where can I find a consensus set of values (with variances)? For example, Fisher et al., Acta D68:800 discusses how histidine angles change with protonation, and refers to Engh & Huber when it says that ND1-CE1-NE2 goes from 111.2 to 107.5 when histidine acquires positive charge (Fig. 6). But the angle table (Table 3) in the original Engh & Huber from 1991 does not have any 107.5 value and seems to suggest that the numbers should rather be 111.7+-1.3 and 108.4+-1.0, respectively. I understand that these values are derived from structural databases and thus can be frequently updated. Is there some resource where the most current values would be listed? Cheers, Ed.
Re: [ccp4bb] how many metal sites
Zn is a very electron-rich atom, so a 2.3 A resolution data set should be a fine experiment for determining the number of fully occupied metal sites. It is always hard to be sure about screenshots of density, but it looks to me that you only have evidence for one zinc here. In my opinion, it is not useful to build models that don't make sense. Your zinc cluster does not make chemical sense to me and the atoms are not in the density. I suspect that you built this cluster, and not the obvious model with fewer zinc atoms, simply because you wanted to match the magic number of four. Use the things you know with confidence as your guide. Dale Tronrud On 01/16/13 11:15, ruisher hu wrote: Hi, Dear All, I recently got a dataset at about 2.3 A resolution; however, I have some trouble assigning the metal sites. It is supposed to have multiple binding sites (possibly four) around the four glu residues in the center (see the attached figure); however, a huge single positive density shows up, clustered in the binding center. The signal is pretty strong and I think Zn is definitely there. When I tried to put four Zn ions around, the geometry doesn't look very good, there is still some positive density in the center (although weaker), and the B-factors of the metals are high, like 100. Does anyone know what's going on? Does it mean only one single site in the middle? Or maybe the metals are just too mobile? What's the best way to tell how many metal sites are actually there? Which experiment can I use to test? Thanks very much. Best, R On Wed, Nov 7, 2012 at 9:29 AM, SD Y ccp4...@hotmail.com wrote: Dear all, I have a related question to the one I posted earlier, low resolution and SG, on which I am still working based on the suggestions I have got. The model I used has Zn coordinated well in tetrahedral fashion by 3 Cys and 1 His residues. They added Zn to their experiment.
In my 3.4 A structure (I am still working on the right SG), initial maps show very strong positive density (sigma = 6.5) at the place of the Zn (https://www.dropbox.com/s/4jd6gdor87ab9lj/Zn-coordination.png). I have not used Zn in my experiment. I could only suspect the tryptone and yeast extract which I used to make media. I would like to know how likely it is that this positive density belongs to Zn. How should I explain the presence of Zn when it has not been used? Is there any way to confirm that it is Zn? If this is not Zn, what else could it be? Anything I could try to rule Zn or other ions in or out? I appreciate your help and suggestions. Sincerely, SDY
Re: [ccp4bb] CCP4 Update victim of own success
FYI I have a small herd of computers here and find it cumbersome to ssh to each and fire up ccp4i just to update the systems. ccp4i takes a while to draw all those boxes (particularly over ssh) and leaves files behind in my disk areas on computers where I'm not likely, personally, to run crystallographic computations. I much prefer to simply run ccp4um from the command line. In fact, I would rather put it in cron and forget about it -- and I expect that is what --check-silent is for. The usage statement, however, doesn't explicitly say that this installs the new updates it finds. I'll have to experiment a bit. Dale Tronrud On 04/11/2013 05:17 AM, eugene.krissi...@stfc.ac.uk wrote: Sorry that this was unclear. We assume that the updater is used primarily from ccp4i, where nothing changed (and why should it be used from the command line at all? :)). The name was changed because it is reserved in Windows, which caused lots of troubles. Now it will stay as is. Eugene On 11 Apr 2013, at 05:16, James Stroud wrote: On Apr 10, 2013, at 9:30 PM, eugene.krissi...@stfc.ac.uk wrote: No, it got renamed to ccp4um :) That should have been written in the update descriptions, was it not? There was only one mention of ccp4um that I could find in all the update descriptions that I found (6.3.0-020). I only figured out what information was trying to be communicated because of your message (see attachment). James um-what.png On 11 Apr 2013, at 03:54, James Stroud wrote: Hello All, I downloaded a crispy new version of CCP4 and ran update until the update script disappeared. Is the reason that CCP4 has reached its final update? James
Re: [ccp4bb] problem with anisotropic refinement using refmac
As I see it, the size of the test set is a question of the desired precision of the free R. At the point of test set selection there is variability between the many possible choices: you could happen to pick a test set with a spuriously low free R or one with an unfortunately high free R. These variations don't indicate anything about the quality of your model because you haven't created one yet. They are just statistical fluctuations. To investigate this I selected several structures I had worked on, where I had an unrefined starting model. For each structure I picked a percentage for the size of the test set and started a loop of test set selection and free R calculation. (Since there had been no refinement yet, the R value for the whole data set is the true free R: all free R's calculated from subsets are just estimates.) For each percentage of each structure I selected 900 test sets. The result is that the variance of the free R estimate is not a function of the size of the protein, the space group, the solvent content, the magnitude of the free R (I only checked between about 35% and 55%), nor the size of the test set measured as a percentage. It is simply a function of the number of reflections in the test set. As Axel said in his paper, a test set of 1000 reflections has a precision of about 1% (and it varies as counting statistics suggest: 1/sqrt(n)). If you have a test set of 1000 reflections and your free R estimate is 40%, you have 95% confidence that the true free R is between 37% and 43%, if I recall my confidence intervals correctly. The open question is how these deviations track with refinement. If you luck out and happen to pick a test set with a particularly low free R (estimate), does this mean that all your future free R's will look, inappropriately, too good? I suspect so, but I have not done the test of performing 900 independent refinements with differing test sets.
My short answer to the original question: The precision of an R free estimate is determined by the number of reflections, not the percent of the total data set. Your 0.3% test set is as precise as a 10% test set in HEWL. (Even though the effect of leaving these reflections out of the refinement will be quite different, of course.) Dale Tronrud Andreas Forster wrote: Hey all, let me give this discussion a little kick and see if it spins into outer space. How many reflections do people use for cross-validation? Five per cent is a value that I read often in papers. Georg Zocher started with 5% but lowered that to 1.5% in the course of refinement. We've had problems with reviewers once complaining that the 0.3% of reflections we used were not enough. However, Axel Brünger's initial publication deems 1000 reflections sufficient, and that's exactly what 0.3% of reflections corresponded to in our data set. I would think the fewer observations are discarded, the better. Can one lower this number further by picking reflections smartly, e.g. avoiding symmetry-related reflections as was discussed on the ccp4bb a little while back? Should one agonize at all, given that one should do a last run of refinement without any reflections excluded? Andreas On 1/31/07, *Georg Zocher* [EMAIL PROTECTED] wrote: First of all, I would like to thank you for your comments. After consideration of all your comments, I conclude that there are three possibilities. 1.) search for some particularly poorly-behaved regions using the parvati server a.) refining the occupancy of those atoms and/or b.) tightening the restraints Problems which have already been mentioned: If I tighten the restraints, the anisotropic model may not be statistically justified, which seems to be the case. Using all reflections may not help that much, because I chose a set of 1.5% for Rfree (~1300 reflections) to get as much data as possible for the refinement.
For my first tries of anisotropic refinement I used 5% of the reflections for Rfree but the same problem arose, so I decided to cut the Rfree set to 1.5%. 2.) Using shelxl 3.) TLS with multiple groups Should be the safe way!? I will try all the possibilities, but especially the TLS refinement seems to be a good option worth trying. Thanks for your helpful advice, georg
[ccp4bb] Converting TNT HKL Files to MTZ using CCP4I in CCP4 6.0.2
Hi, I have a basic operational problem. I am trying to convert a pair of TNT HKL format files into an MTZ file using the CCP4I interface. (I realize this is probably not the most heavily exercised code in the package.) When I run the task I get the following output:

#CCP4I VERSION CCP4Interface 1.4.4.2
#CCP4I SCRIPT LOG import
#CCP4I DATE 08 Feb 2007 22:59:11
#CCP4I USER dale
#CCP4I PROJECT TST
#CCP4I JOB_ID 17
#CCP4I SCRATCH /tmp/dale
#CCP4I HOSTNAME terbium.uoregon.edu
#CCP4I PID 18342
#CCP4I TERMINATION STATUS 0
Error from script /usr/local/ccp4-6.0.2/ccp4i/scripts/import.script: can't read fo: no such variable
#CCP4I TERMINATION TIME 08 Feb 2007 22:59:11
#CCP4I MESSAGE Task failed

I'm afraid I'm not fluent in Tcl/Tk. Can anyone tell me what I should do to get this working? Thanks, Dale Tronrud
Re: [ccp4bb] Summary - Valid to stop Refmac after TLS refinement?
Bernhard Rupp wrote: People also felt that the RMSD bond/angle of 0.016/1.6 was still a little high. This was the subject of a discussion before on the board and I still don't understand it: If I recall correctly, even in highly accurate and precise small molecule structures, the rmsd of corresponding bonds and angles are ~0.014A and 1.8deg. It always seems to me that getting these values much below that is not a sign of crystallographic prowess but of over-restraining them? Is it just that - given good resolution in the first place - the balance of restraints (matrix weight) vs low R (i.e., X-ray data) gives the best Rfree or lowest gap at (artificially?) lower rmsd? Is that then the best model? I understand that even thermal vibration accounts for about 1.7 deg angle deviation - are lower rmsd deviations then a manifestation of low temperature? But that does not seem to be much of an effect, if one looks at the tables from the CSD small molecule data (shown nicely in comparison to the 91 Engh/Huber data in Tables F, pp385). This is an on-going topic of discussion so let me put in my two cents. We calculate libraries of ideal geometry based on precise, small molecule structures. When these small molecule crystal structures are compared to our derived libraries they are found to contain deviations. These deviations are larger than the uncertainty in these models and are presumed to reflect real features of the molecule; perturbations due to the local environment in the crystal. These same perturbations are present in our crystals and we should expect to find deviations from ideal geometry on the same scale as that seen in the precise models. This expectation led to the practice in the 1980's of setting r.m.s. targets of 0.02A and 3 degrees for agreement with bond length and angle libraries.
While this seems quite reasonable, we are left with the question: Are the deviations from ideal geometry we see in a particular model in any way related to the actual deviations of the molecule in the crystal? The uncertainties (su's) of the bond lengths in a model based on 4A diffraction data are huge compared to the absolute value of the true deviation. For example, if the model has a deviation from ideal geometry of 0.02A but the uncertainty of the distance is 0.2A, can we say that we have detected a signal that is significantly different from zero, the null hypothesis? If we have a model with a collection of deviations from ideal geometry but we have no expectation that those deviations are indicative of the true deviations of the molecule in the crystal, are those deviations serving any purpose? If they do not reflect any property of the crystal they are noise and should be filtered out. By this argument a model based on 4A resolution diffraction data should have no deviation from ideal geometry, while one based on 0.9A diffraction data should have no restraints on ideal geometry since the deviations are probably all real and significant (except for specific regions of the molecule that have problems). The problem we all face is the vast area between these extremes, compounded by our inability to calculate proper uncertainties for the parameters of our models. The free R is our current tool-of-choice when it comes to attempting to judge the statistical significance of aspects of our model, without performing proper statistical tests, which we don't know how to do. If we allow our model the freedom to deviate from our library and the free R improves by a significant (??) amount, then the resulting deviations must have some similarity to the true deviations in the crystal; but if the free R does not improve then the deviations must not be related to reality and should be suppressed. This is the type of assumption we make whenever we use the free R to make a choice.
What we end up doing is not making a yes/no decision; instead we variably suppress the amplitude of the deviations from ideal geometry, and that is harder to justify. I think a reasonable argument can be made, but I have already written too many words in this letter. It doesn't really matter because we left the road of mathematical rigor when we took the R free path. Unfortunately, many people have ignored what Brunger said in Methods in Enzymology about choosing your X-ray/geometry weight based on the free R and just started saying the rms bond length deviation must be 0.007A. The deviations from ideal geometry of your model should be no more and no less than what you can justifiably claim is a reflection of the true state of the molecule in your crystal. Dale Tronrud
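The significance argument above reduces to a simple ratio test. The su values below are illustrative assumptions for the two resolution regimes discussed, not measured quantities.

```python
# Toy numbers for the significance argument above (assumed, not taken
# from any real model): the same 0.02 A deviation from the library bond
# length, judged against the su of that distance at two resolutions.
deviation = 0.02            # apparent deviation from ideal geometry (A)
su_low_res = 0.20           # plausible bond-length su from 4 A data
su_high_res = 0.01          # plausible bond-length su from ~0.9 A data
z_low = deviation / su_low_res    # 0.1 sigma: noise, restrain it away
z_high = deviation / su_high_res  # 2 sigma: plausibly a real feature
```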
Re: [ccp4bb] DANO from PDB
Wouldn't it be reasonable to use the sigma one calculates from the sigmaA? That sigma would reflect the uncertainty in the calculated structure factor amplitude due to the uncertainty in the parameters of your model. Of course, one then realizes that you should down-weight your structure factor amplitudes with sigmaA too. Then you would have a set of structure factor amplitudes and sigmas that reflects the uncertainties of your model. If you don't believe in the idea of sigmaA's cloud of possible atoms and just want the structure factors of your PDB file, as though you knew all the parameters to infinite precision, your sigma would only be non-zero because of uncertainties due to numerical problems in the Fourier transform. These sigmas would be very small, in most cases, and be determined by the method you used to perform the calculation. This is probably not a useful solution. Dale Tronrud Peter Adrian Meyer wrote: I add a fake sigma column for each data column because so many programs require one. This is slightly tangential, but does anyone know of a good way to generate semi-realistic sigma values for calculated/simulated data? The best I've been able to do is borrow from an experimental dataset of the same protein (after scaling), but that doesn't work unless you've got an experimental dataset corresponding to your simulated one. I also tried a least-squares fit (following a reference I don't have in front of me...this was a while ago), which didn't result in a good fit for our data. Pete Pete Meyer Fu Lab BMCB grad student Cornell University
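A sketch of the suggestion above, assuming the standard sigmaA picture in which the true structure factor scatters about the calculated one with variance (1 - sigmaA^2) times SigmaN, the expected intensity in the resolution shell. The function name and interface are invented for illustration.

```python
import math

def sigma_from_sigmaa(sigma_a, sigma_n):
    """Model-based sigma for a calculated amplitude: square root of the
    variance (1 - sigmaA^2) * SigmaN of the 'cloud' of possible true
    structure factors around the calculated one (sketch, not a
    published interface)."""
    return math.sqrt(max(0.0, (1.0 - sigma_a * sigma_a) * sigma_n))

# A perfect model (sigmaA = 1) gets zero sigma; a poor model in a
# strong shell gets a large one.
perfect = sigma_from_sigmaa(1.0, 4.0)
poor = sigma_from_sigmaa(0.3, 4.0)
```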
Re: [ccp4bb] Ligand fitting in COOT and SHELX refinement
U Sam wrote: Hi I would like to know about the following issue for a ligand. A ligand with a long alkyl chain can have multiple conformations. In Coot, in order to fit any protein residue into difference density, we can select a specific rotamer conformation and refine. For fitting a ligand of the above kind, how does it work out? For amino acids there are tens of thousands of examples from which one can derive rotamer libraries. There is no such luck with most other compounds. This is why Coot has special case code for handling amino acids that does not understand your (or my) favorite molecules. Fortunately, Coot does not require such information to run its real space refinement. You do need a cif definition that includes, amongst other things, the ideal bond lengths and bond angles. You can work with Coot to build the conformation of your molecule that fits your density. All conformations will be consistent with the same bonds and angles, unless you have a very strange molecule. Taking the PDB with ligand when we go to refine in SHELX, how are restraints (DFIX, DANG etc) specified for such a ligand, which can have multiple conformations (particularly for the long alkyl chain), and during refinement can values deviate a lot from a particular value taken from the literature? SHELXL will take whatever conformation you build and come up with a model that is consistent with the values on the DFIX and DANG statements. It should never produce a model that deviates from the literature values, if you put those values on your DFIX and DANG statements. The final model will have a configuration similar to what you built in Coot. Use Coot to make the big changes and SHELXL to fine tune. I have been refining some long chain hydrocarbons along with my protein and have had no problems, once I was able to create the proper definitions for Coot and SHELXL. SHELXL is certainly easier to create a library for, but you need both if you want to model build and refine.
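For a long alkyl chain, the DFIX/DANG statements mentioned above might look like the sketch below. The atom names and target values (1.54 A for C-C bonds, 2.54 A for the 1-3 distance of a roughly tetrahedral angle) are generic illustrations, not values from any particular structure, so adapt them to your compound.

```
REM Hypothetical restraints for a five-carbon stretch C1..C5 of an
REM alkyl-chain ligand; names and targets are illustrative only.
DFIX 1.54 0.02 C1 C2 C2 C3 C3 C4 C4 C5
DANG 2.54 0.04 C1 C3 C2 C4 C3 C5
```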
Building a cif with ideal geometry for Coot/Refmac is not an easy task. You need to understand your chemistry and the file format, which is not well documented. You have several options: 1) You can sit down and figure out how to create a cif definition. This is hard to do but a valuable skill to acquire. 2) You can find a compound similar to yours for which there is a definition built into Coot and modify it for your needs. You still need to understand the file format, but you can get away with less understanding because you are starting with something that works. 3) You can use web resources to find/create a file for you. A number of options are available, none of which I would trust completely. The HIC-UP website is perhaps the most popular, but the values are quite unreliable. These files can be used as a starting point but always verify... The Elbow builder in Phenix is quite reasonable, but takes a bit of study to understand, and again, don't trust it. Remember the quote from the Harry Potter books: Never trust anything you can't see where it keeps its brains. I appreciate suggestions and comments. Many Thanks Sam
Re: [ccp4bb] The importance of USING our validation tools
In the cases you list, it is clearly recognized that the fault lies with the investigator and not the method. In most of the cases where serious problems have been identified in published models, the authors have stonewalled by saying that the method failed them: The methods of crystallography are so weak that we could not detect (for years) that our program was swapping F+ and F-. The scattering of X-rays by bulk solvent is a contentious topic. We should have pointed out that the B factors of the peptide are higher than those of the protein. It appears that the problems occurred because these authors were not following established procedures in this field. They are, as near as I can tell, somehow immune from the consequences of their errors. Usually the paper isn't even retracted when the model is clearly wrong. They can dump blame on the technique and escape personal responsibility. This is what upsets so many of us. It would be so refreshing to read in one of these responses: We were under a great deal of pressure to get our results out before our competitors and cut corners that we shouldn't have, and that choice resulted in our failure to detect the obvious errors in our model. If we did see papers retracted, if we did see nonrenewal of grants, if we did see people get fired, if we did see prison time (when the line between carelessness and fraud is crossed), then we could be comforted that there is practical incentive to perform quality work. Dale Tronrud Edwin Pozharski wrote: Mischa, I don't think that the field of nanotechnology crumbled when allegations against Jan Hendrik Schon (21 papers withdrawn, 15 in Science/Nature) turned out to be true. I don't think that nobody trusts biologists anymore because of Eric Poehlman (17 falsified grants, 10 papers with fabricated data, 12 months in prison). We are still excited to hear about stem cell research despite what Hwang Woo-suk did or didn't do.
What recent events demonstrate is that in macromolecular crystallography (and in science in general) mistakes, deliberate or not, will be discovered. Ed. Mischa Machius wrote: Due to these recent, highly publicized irregularities and ample (snide) remarks I hear about them from non-crystallographers, I am wondering if the trust in macromolecular crystallography is beginning to erode. It is often very difficult even for experts to distinguish fake or wishful thinking from reality. Non-crystallographers will have no chance at all and will consequently not rely on our results as much as we are convinced they could and should. If that is indeed the case, something needs to be done, and rather sooner than later. Best - MM Mischa Machius, PhD Associate Professor UT Southwestern Medical Center at Dallas 5323 Harry Hines Blvd.; ND10.214A Dallas, TX 75390-8816; U.S.A. Tel: +1 214 645 6381 Fax: +1 214 645 6353
Re: [ccp4bb] Questions about diffraction
Michel Fodje wrote: Dear Crystallographers, Here are a few paradoxes about diffraction I would like to get some answers about: ... 3. What happens to the photon energy when waves destructively interfere as mentioned in the text books. Doesn't 'destructive interference' appear to violate the first and second laws of thermodynamics? Besides, since the sources are non-coherent, how come the photon 'waves' don't annihilate each other before reaching the sample? If they were coherent, would we just end up with a single wave any how? With what will it interfere to cause diffraction? For every direction where there is destructive interference and a loss of energy there is a direction where there is constructive interference that piles up energy. If you integrate over all directions energy is conserved. I'm not sure what your concern is about the second law. The radiation is spreading out into space and so entropy increases. Dale Tronrud
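The energy bookkeeping in the answer above can be checked numerically with a toy one-dimensional array of scatterers (an invented model, not a real crystal): whatever cancellations occur in particular directions, the intensity averaged over directions equals the incoherent sum, so nothing is lost overall.

```python
import cmath
import math
import random

# Toy 1-D model: 20 coherently illuminated point scatterers at random
# positions. Scattered waves cancel in some directions and pile up in
# others, but the direction-averaged intensity stays N times the
# single-scatterer intensity: no energy is destroyed overall.
random.seed(0)
n_atoms = 20
positions = [random.uniform(0.0, 200.0) for _ in range(n_atoms)]
k = 2.0 * math.pi / 1.5  # wavenumber for 1.5 A radiation

n_dir = 20000
total = 0.0
for j in range(n_dir):
    s = -1.0 + 2.0 * j / (n_dir - 1)  # direction parameter in [-1, 1]
    amp = sum(cmath.exp(1j * k * s * x) for x in positions)
    total += abs(amp) ** 2
avg = total / n_dir  # direction-averaged intensity, near n_atoms
```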
Re: [ccp4bb] Questions about diffraction
Michel Fodje wrote: For every direction where there is destructive interference and a loss of energy there is a direction where there is constructive interference that piles up energy. If you integrate over all directions energy is conserved. For the total integrated energy to be conserved, energy will have to be created in certain directions to compensate for the loss in other directions. So in a direction in which the condition is met, the total will have to be more than the sum of the waves in that direction. How about considering the possibility that all photons coming into the sample are diffracted -- just in different directions. So that what is happening is not constructive and destructive interference but a kind of sorting of the photons based on a certain property of the photons, maybe the phase. You seem to be operating under the impression that there are two diffracting waves that later destructively interfere. All constructive and destructive interference occurs at the point of scattering. There is no energy that heads off in a direction and later disappears - nothing ever went in that direction. You have the same problem with your idea of two waves, out of phase but identical in wavelength, that scatter off an electron. The two waves, if they are coherent, would interfere with each other long before they reach the electron and become a single wave. If they are not coherent they will interact with the scatterer independently and produce incoherent diffraction waves, which will sum by intensity independent of phase. I can get into deep trouble with this next point so I hope a physicist jumps on me where I'm wrong. All light sources are coherent to a degree. A laser is pretty much 100% coherent and my pocket flashlight is hardly coherent at all. I seem to recall that there is a parameter called the coherence length that measures the distance within a beam over which the light is coherent.
The coherence length of a rotating anode X-ray generator is small but unit cells are smaller so there are plenty of unit cells to form a nice diffraction pattern. Your second paragraph is just the Copenhagen Interpretation of the wave function. If you want to think of photons then the diffraction wave we are talking about is the wave function and that function maps the probability of finding a photon. Wave/particle duality says we can look at the experiment either way. Dale Tronrud
Re: [ccp4bb] alternating strong/weak intensities in reciprocal planes - P622
One possibility for #5, the B factors all dropping to the lower limit during refinement: If you are including all of your low resolution data (which you should) but have not used a model for the bulk solvent scattering of X-rays (which would be bad) then you will observe this result. The refinement program will attempt to overestimate the amplitudes of the high resolution Fc's to match the overestimated low resolution Fc's. Check your log files to ensure your bulk solvent correction is operating correctly. Dale Tronrud Jorge Iulek wrote: Dear all, Please, maybe you could give some suggestions on the problem below. 1) Images show smeared spots, but xds did a good job integrating them. The cell is 229, 229, 72, trigonal, and we see alternating strong and weak rows of spots in the images (spots near each other, but rows more separated, must be by c*). They were scaled with xscale, P622 (no systematic absences), R_symm = 5.3 (15.1), I/sigI = 34 (14) and redundancy = 7.3 (6.8), resolution 2.8 A. Reciprocal space shows strong spots at h, k, l=2n and weak spots at h, k, l=2n+1 (I mean, l=2n intensities are practically all higher than l=2n+1 intensities, as expected from visual inspection of the images). Within planes h, k, l=2n+1, the average intensity is clearly and much *higher at high resolution than at low resolution*. Also, within planes h, k, l=2n, a subjective observation is that average intensity apparently does not decay much from low to high resolution. The data were truncated with truncate, which calculated the Wilson B factor to be 35 A**2. 2) Xtriage points to a high (66% of the origin) off-origin Patterson peak. Also, ML estimate of overall B value of F,SIGF = 25.26 A**2. 3) I suspect to have a 2-fold NCS parallel to a (or b), halfway along the c parameter, which is almost crystallographic.
4) I submitted the data to the Balbes server which, using pseudo-translational symmetry, suggested some solutions, one with a good contrast to the others, with a 222 tetramer, built from a structure with 40% identity and 58% positives, of a well conserved fold. 5) I cannot refine below 49% with either refmac5, phenix.refine or CNS. Maps are messy, except for rather few residues and short stretches near the active site, almost impossible for rebuilding from there. Strange, to me, is that all programs freeze all B-factors, taking them to the program minimum (CNS lowers to almost its minimum). Might this be due to what I observed in the reciprocal space as related in 1? If so, might my (intensity) scaling procedure have messed up the intensities due to their intrinsic property of being stronger in alternating planes? How to overcome this? 6) I tried some different scaling strategies *in the refinement step*, with no success at all. 7) A Patterson of the solution from Balbes also shows an off-origin Patterson peak at the same position as the native data, although a little lower. 8) Processed in P6, P312 and P321, all of course suggest twinning. I would welcome suggestions, pointers to similar cases, etc... In fact, currently I wonder why the refinement programs take the B-factors to such low values. Many thanks, Jorge
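The bulk-solvent correction mentioned above matters almost entirely at low resolution, which is why omitting it distorts the low-resolution Fc's. A sketch using the common flat (exponential) solvent model follows; the k_sol/B_sol values are typical defaults assumed for illustration, and the function name is invented.

```python
import math

def solvent_scale(k_sol, b_sol, d):
    """Relative size of the flat bulk-solvent term
    k_sol * exp(-B_sol * s^2 / 4) at resolution d (A), with s = 1/d.
    k_sol ~ 0.35 e/A^3 and B_sol ~ 46 A^2 are common textbook
    defaults assumed here, not values fitted to any data."""
    s2 = 1.0 / (d * d)
    return k_sol * math.exp(-b_sol * s2 / 4.0)

low_res = solvent_scale(0.35, 46.0, 20.0)   # large effect at 20 A
high_res = solvent_scale(0.35, 46.0, 2.0)   # negligible at 2 A
```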
Re: [ccp4bb] How to number atoms in a ligand
Dear Joe, Atom labels are, in principle, arbitrary. The molecule doesn't care what we call its atoms. To make the PDB more useful, it is handy if all the people working with a particular compound use the same names for their atoms. If you find that someone has already deposited a structure containing your compound you are expected to use the same names they did. There are lists of compounds and naming conventions on the PDB web site. The small molecule literature doesn't count as precedence for naming conventions, only what is in the PDB. No one will hold you to the names used in the small molecule structure paper. If you are the first to work with this compound in the macromolecular world you are free to choose whatever names you want. Please choose something sensible as the rest of us will be stuck with your choice forever. Consistency is king here. If a similar compound is in the PDB, using atom names based on it would simplify comparisons. Dale Tronrud Zheng Zhou wrote: Hi, all I am a rookie in crystallography. I know this may be a little bit off topic. I have cocrystallized several compounds with my favorite protein. I found crystal structures for some of these chemicals. But the numbering systems are different in those original papers for the small molecules. Some numbering systems have all the atoms numbered from 1 to the end (C1-O3-O8-N9-C15), while others have numbers for each individual element (C1-C12, O1-O2, N1). I was trying to find a unified scheme for ligands in the PDB. I even emailed [EMAIL PROTECTED], but so far I haven't heard anything back. Could anyone give me some suggestions? Any help would be greatly appreciated. Thanks, Sorry to bother others Joe
Re: [ccp4bb] SFALL grid
SFALL is calculating structure factors from the map you supplied, so there is only one grid, the one you used when you created the map in NCSMASK. The choice of sampling rates for maps to be Fourier transformed is a deep topic. The mathematical law is that you have to sample the map at, at least, twice the frequency of the highest Fourier component in the map. This is, unfortunately, often misinterpreted as twice the frequency of the highest component you are interested in. The fact that you are interested in, say, 2A structure factors has nothing to do with the calculation of the Fourier transform of your map. All that matters is the frequencies that were present in your map before you sampled it on the grid. Ten Eyck (1977) Acta Cryst A33, 800-804 has a discussion of this and provided the classic solution to this problem when the map to be transformed is a calculated electron density map. I presume you have an NCS averaged map, and the required interpolations introduce significant needs of their own. Gerard Bricogne has written on that topic, also back in the 1970's, but I don't have the reference at hand. The manual for your NCS averaging program should give you guidance on the choice of sampling rate based on its interpolation method. If you are not even sampling at twice the resolution you are interested in, you are sampling far too coarsely. All FFT based structure factor programs require that the sampling rates along each axis be even. They may have other required factors depending on the space group, but they will be happy to inform you if you make a choice they don't like. They are also more efficient when the prime factors of the sampling rates are small numbers. Try to stick with multiples of 2, 3, and 5 if possible. Since the program has no way of knowing the highest resolution component actually in the map before you sampled it on your grid, it assumes that the map contains no components of higher resolution than you asked it to produce.
All FFT programs will fail if you sample your map more coarsely than twice that frequency, as SFALL did for you. That does NOT mean that twice the frequency you are interested in is sufficient. You MUST read your NCS averaging program's documentation, and if that doesn't tell you, complain to the program's author, and read Gerard's papers on the matter. NCS averaging a map that is only sampled at twice the rate you are interested in will not be a useful way to spend your time. Dale Tronrud whittle wrote: Hi: I am trying to generate structure factors from a mask/map that I made with NCSMASK. I get the following error message in SFALL: The program run with command: sfall HKLOUT /tmp/whittle/109_5_7_1_mtz.tmp MAPIN /home/whittle/projects/109_5/CCP4/center_50.msk has failed with error message SFALL: Grid too small- NZ must be 2*Lmax+1 Which grid is this referring to? The grid used by SFALL or by NCSMASK when I initially generated the map? How does one choose an appropriate grid and extent for these programs? Thanks for your help! --James
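The grid rules above (sample at at least twice the highest frequency present, even dimensions, prime factors restricted to 2, 3 and 5) can be sketched as a small helper. The oversample parameter and function names are illustrative; as stressed above, an averaging program's interpolation scheme may demand a much finer grid than the bare Nyquist minimum.

```python
import math

def smooth(n):
    """True if n factors into 2, 3 and 5 only (FFT-friendly)."""
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

def choose_grid(cell_edge, d_min, oversample=2.0):
    """Smallest even, 2/3/5-smooth grid dimension for a cell edge (A)
    and highest-resolution Fourier component d_min (A). oversample=2
    is the bare Nyquist minimum; interpolation usually wants more."""
    n = int(math.ceil(oversample * cell_edge / d_min))
    while n % 2 or not smooth(n):
        n += 1
    return n
```

For a 100 A edge and 2 A components this gives 100; threefold oversampling of a 97 A edge at 2.5 A gives 120 (2^3 * 3 * 5).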
Re: [ccp4bb] Unidentified ligand (electron density) found at active site
Hi, Why is it that you are so reluctant to identify this compound as Ser-Gly? Its fit to the density is great. To wax historical, we saw unexpected density in the active site of apo Thermolysin. It appeared to be Val-Ala, but with refinement it developed into Val-Lys. It happens that Val-Lys are the last two residues of the protein. Residues 315 and 316 were present at full occupancy in the crystal, so I presume the peptide was clipped off molecules that didn't crystallize. Of course, proving that the density actually represents Ser-Gly, or any other compound you decide upon, is much harder than building a model to fit it: the hard part is not identifying a bit of density but coming up with an experiment to prove the identification. Dale Tronrud Ronaldo Alves Pinto Nagem wrote: Dear CCP4bb users, As suggested by some users, I am attaching to this email the electron density of the unidentified ligand. As I mentioned before it looks like a dipeptide GlySer, but we are still in doubt. Attempts to correlate with the protein function are being done. One might see in the pictures that the ligand coordinates a metal ion. Cheers Ronaldo. - Prof. Dr. Ronaldo Alves Pinto Nagem Universidade Federal de Minas Gerais Instituto de Ciências Biológicas Departamento de Bioquímica e Imunologia Av. Antônio Carlos, 6627 - Caixa Postal 486 Bairro Pampulha - CEP: 31270-901 Belo Horizonte, MG - Brasil Tel: +55 31 3499-2626 Fax: +55 31 3499-2614 E-mail: [EMAIL PROTECTED]
Re: [ccp4bb] Sulfate ion on 2-fold axis
Dear Jie, It also depends on whether you believe the SO4 sits with its internal two-fold along the crystal's two-fold axis. If it does you should probably have a 0.5 occ sulfur and two 1.0 occ oxygen atoms. If the symmetry is not obeyed you will have to have four 0.5 occ oxygen atoms. Be careful, some refinement programs will not be able to handle the bond length and angle restraints if you only supply two oxygen atoms. They will not allow bonds between atoms and symmetry images of atoms. If you are using such a program you will have to supply four oxygen atoms even if this is not what you would otherwise do. Dale Tronrud Charlie Bond wrote: Hi Jie, Depending on your resolution, you may be forced use to use S(0.5) and 4x(0.5) in order to restrain the SO4 to stay tetrahedral in refinement. Cheers, Charlie Jie Liu wrote: Dear all I have a sulfate ion sitting on a 2-fold axis. Should I put in pdb file one S atom with occu=0.5 and two O atoms with occu=1, or should I put one S and four O atoms all with occu=0.5? Thanks for your inputs. Jie .
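For the special-position case described above (the sulfate's internal two-fold coinciding with the crystallographic axis), the occupancy columns might look like the sketch below. Serial numbers, coordinates and B values are entirely hypothetical placeholders; only the occupancy column illustrates the point.

```
REMARK hypothetical SO4 on a two-fold: S at half occupancy on the axis,
REMARK two unique O atoms at full occupancy (symmetry generates the rest)
HETATM 2001  S   SO4 A 501      10.000  10.000   5.000  0.50 30.00           S
HETATM 2002  O1  SO4 A 501      11.200  10.500   5.900  1.00 32.00           O
HETATM 2003  O2  SO4 A 501       9.100  11.100   5.800  1.00 32.00           O
```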
Re: [ccp4bb] unbiased electron density map
A 2Fo-Fc map is simply an Fc map with two times the Fo-Fc map added to it. ( Fc + 2(Fo-Fc) = Fc + 2Fo - 2Fc = 2Fo - Fc ) The phase comes from the Fc's. The basic formulation is biased toward the model used to calculate the Fc's. You did, after all, start with a pure Fc map! Various techniques are used to reduce bias in these maps. Usually a technique that reduces bias in one kind of map reduces the bias in the other, since they are so closely related. The procedures I know of work by changing the calculation of Fc (and the weights on the individual reflections, which aren't mentioned when using the simple name Fo-Fc but are there none-the-less), and since the Fc is in both maps, both maps are debiased. These methods reduce bias. Unbiased is a stronger claim, and if you use that word you should state clearly how you know the bias is gone. Your quote brings up another matter. An initial map, i.e. before refinement, is unbiased if it is an omit map. If you have done no refinement and you leave the interesting part out of the calculation of Fc there cannot be any bias in either the Fo-Fc or the 2Fo-Fc map. This is why an initial map is more reliable for proving binding of a compound, for example, than a bias-reduced map after refinement. Dale Tronrud michael nelson wrote: In my understanding, an unbiased electron density map usually refers to the Fo-Fc map. But I have seen in some papers sentences like The initial, unbiased 2Fo-Fc map is contoured at... I was a bit confused since I was told by my instructor that the 2Fo-Fc map was usually biased. Can anyone clear up this concept for me? Mike
Re: [ccp4bb] unbiased electron density map
Dirk Kostrewa wrote: On 10.01.2008 at 01:53, Dale Tronrud wrote: ... Your quote brings up another matter. An initial map, i.e. before refinement, is unbiased if it is an omit map. If you have done no refinement and you leave the interesting part out of the calculation of Fc there cannot be any bias in either the Fo-Fc or the 2Fo-Fc map. This is why an initial map is more reliable for proving binding of a compound, for example, than a bias-reduced map after refinement. ... But isn't this only true if the model that is put in was not refined against a related data set before? If the new Fobs are related to the old Fobs (against which the model was refined before) then you carry any model bias over to the new data, because a simple omit map using the old data will have model bias. Right you are, and it's an important point too. My only defense is that I said If you have done no refinement. I don't think of the refinement of several models against data from isomorphous crystals as separate refinements, but sometimes forget to mention this when talking to others. I should be more careful. Dale Tronrud
Re: [ccp4bb] an over refined structure
I'm afraid I have to disagree with summary point (i): that crystallographic and noncrystallographic symmetry are incomparable. Crystallographic symmetry is a special case of ncs where the symmetry happens to synchronize with the lattice symmetry. There are plenty of cases where this synchronization is not perfect and the ncs is nearly crystallographic. For some reason this situation seems to be particularly popular with P21 space group crystals with a dimer in the asymmetric unit. Quite often the two-fold of the dimer is nearly parallel to the screw axis, resulting in a nearly C2 space group crystal. These crystals form a bridging case in the continuum between ncs, where the symmetry is unrelated to the lattice symmetry, and those cases where the unit cell symmetry is perfectly compatible with the lattice. The only saving grace of the nearly centered ncs crystals is that the combination of the crystallographic and noncrystallographic symmetry brings the potential contamination of a reflection in the working set back to itself. Unless you have a very high copy number, and a correspondingly large G function, you can't have any feedback from a working set reflection to a test reflection. Crystallographic symmetry is just a special case of noncrystallographic symmetry, but our computational methods treat them in very different ways. This choice of ours creates a discontinuity in the treatment of symmetry that is quite artificial and, I believe, is the root cause of many of the problems we have with ncs in refinement and structure solution. Dale Tronrud Dirk Kostrewa wrote: Dear Dean and others, Peter Zwart gave me a similar reply. This is a very interesting discussion, and I would like to have a somewhat closer look at this to maybe make things a little bit clearer (please excuse the general explanations - this might be interesting for beginners as well): 1). 
Crystallographic symmetry can be applied to the whole crystal and results in symmetry-equivalent intensities in reciprocal space. If you refine your model in a lower space group, there will be reflections in the test set that are symmetry-equivalent in the higher space group to reflections in the working set. If you refine the (symmetry-equivalent) copies in your crystal independently, they will diverge due to resolution and data quality, and R-work and R-free will diverge to some extent because of this. If you force the copies to be identical, R-work and R-free will still be different due to observational errors. In both cases, however, the R-free will be very close to the R-work. 2). In case of NCS, the continuous molecular transform will reflect this internal symmetry, but because it is only a local symmetry, the observed reflections sample the continuous transform at different points and their corresponding intensities are generally different. It might, however, happen that a test-set reflection comes _very_ close in reciprocal space to an NCS-related working-set reflection, and in such a case their intensities will be very similar, which will make the R-free closer to the R-work. If you do not apply NCS averaging in the form of restraints/constraints, these accidentally close reflections will be the only cases where R-free might be too close to R-work. If you apply NCS averaging, then in real space you multiply the electron density with a mask and average the NCS-related copies within this mask at all NCS-related positions. In reciprocal space, you then convolute the Fourier transform of that mask with your observed intensities at all NCS-related positions. This will force test-set reflections to be more similar to NCS-related working-set reflections, and thus the R-free will be heavily biased towards R-work. The range of this influence in reciprocal space can be approximated by replacing the mask with a sphere and calculating the Fourier transform of this sphere. 
This will give the so-called G-function, whose first zero determines its radius of influence in reciprocal space. To summarize: (i) One can't directly compare crystallographic and non-crystallographic symmetry. (ii) In case of NCS, I have to admit that even if you do not apply NCS restraints/constraints, there will be some effect on the R-free by chance. So, my original statement was too strict in this respect. But if you really apply NCS restraints/constraints, you bias the R-free towards the R-work within the approximate radius of influence of the G-function in reciprocal space. What an interesting discussion! Best regards, Dirk. On 07.02.2008 at 18:57 Dean Madden wrote: Hi Dirk, I disagree with your final sentence. Even if you don't apply NCS restraints/constraints during refinement, there is a serious risk of NCS contaminating your Rfree. Consider the limiting case in which the NCS is produced simply by working in an artificially low symmetry space-group (e.g. P1, when the true symmetry is P2
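For readers who want to see the G-function mentioned above: the Fourier transform of a uniform sphere of radius R is G(x) = 3(sin x - x cos x)/x^3 with x = 2*pi*R*s, and its first zero near x = 4.4934 gives a radius of influence s of about 0.715/R in reciprocal space. A small numpy sketch (the 20 A mask radius is an arbitrary illustration, not a value from the thread):

```python
import numpy as np

def g_sphere(x):
    # Fourier transform of a uniform sphere: G(x) = 3(sin x - x cos x)/x^3,
    # with x = 2*pi*R*s (R = sphere radius, s = 1/d in reciprocal space).
    x = np.asarray(x, dtype=float)
    return np.where(x == 0, 1.0, 3 * (np.sin(x) - x * np.cos(x)) / x**3)

# Locate the first zero of G by bisection; it lies between x = 3 and x = 6.
lo, hi = 3.0, 6.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if g_sphere(lo) * g_sphere(mid) <= 0:
        hi = mid
    else:
        lo = mid
x0 = 0.5 * (lo + hi)            # ~4.4934 (solution of tan x = x)

R = 20.0                        # assumed mask radius in Angstrom, illustrative
s0 = x0 / (2 * np.pi * R)       # radius of influence in reciprocal space, ~0.715/R
```

So a larger averaging mask shrinks the region of reciprocal space over which a working-set reflection can contaminate a test-set neighbour.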
Re: [ccp4bb] an over refined structure
[EMAIL PROTECTED] wrote: Rotational near-crystallographic ncs is easy to handle this way, but what about translational pseudo-symmetry (or should that be pseudo-translational symmetry)? In such cases one whole set of spots is systematically weaker than the other set. Then what is the theoretically correct way to calculate Rfree? Write one's own code to sort the spots into two piles? Phoebe Dear Phoebe, I've always been a fan of splitting the test set in these situations. The weak set of reflections provides information about the differences between the ncs mates (and the deviation of the ncs operator from a true crystallographic operator) while the strong reflections provide information about the average of the ncs mates. If you mix the two sets in your Rfree calculation the strong set will tend to dominate and will obscure the consequences of allowing your ncs mates too much freedom to differ. Let's say you have a pseudo C2 crystal with the dimer as the ncs pair and you are starting with a perfect C2 symmetry model. The initial rigid body refinement will cause the Rfree(weak) to drop because the initial model had Fc's equal to zero for all these reflections and the deviation from crystal symmetry allows nonzero values to arise. Now you want to test if there are real differences between the two copies. If you allow variation between the two copies but monitor the Rfree(strong) you are actually monitoring the quality of the average of the two copies, and you basically have a two-fold multimodel. It is the same as putting two molecules at each site in the crystal and forcing both models to have perfect ncs. Axel Brunger's Methods in Enzymology chapter indicates that a two-fold multimodel is expected to have a lower Rfree than a single model, and we would expect in our imaginary crystal that the Rfree(strong) will drop even if there is no real difference between the ncs mates. 
When you allow differences between the ncs mates the Rfree(strong) will tend to drop even if those differences are not real. The Rfree(weak) is a different story, however. It is controlled specifically by the differences between the two ncs mates and will drop only if the refinement creates differences that are significant. This is the statistic that can be used to determine the ncs weight (or, probably, the log likelihood gain(weak)). If you insist on mixing the strong and weak reflections in your test set you have to design your null hypothesis test differently. First you should do a refinement where you have two models at each site, with exact ncs imposed. Then you do a refinement with one copy at each site but allow differences between the ncs mates. Compare the Rfree of each model to decide which is the better model. There are exactly the same number of parameters in each model, but one allows the ncs to be violated and the other does not. Even so, the signal in the Rfree is mixed unless you split the systematically weak from the systematically strong. If you have a general ncs and don't have weak and strong subsets of reflections you still have to worry about the multimodel effect. If a refinement that allows ncs violations does not drop the Rfree by more than a two-fold multimodel with perfect ncs does, you cannot justify breaking your ncs. A drop in Rfree when you break ncs does not necessarily mean that breaking ncs is a good idea. You always have to perform the proper null hypothesis test. Dale Tronrud At 01:05 PM 2/8/2008, Axel Brunger wrote: In such cases, we always define the test set first in the high-symmetry space group choice. Then, if it is warranted to lower the crystallographic symmetry and replace it with NCS symmetry, we expand the test set to the lower symmetry space group. In other words, the test set itself will be invariant upon applying any of the crystallographic or NCS operators, so it will be maximally free in these cases. 
It is then also possible to directly compare the free R between the high and low crystallographic space group choices. Our recent Neuroligin structure is such an example (Arac et al., Neuron 56, 992-, 2007). Axel On Feb 8, 2008, at 10:48 AM, Ronald E Stenkamp wrote: I've looked at about 10 cases where structures have been refined in lower symmetry space groups. When you make the NCS operators into crystallographic operators, you don't change the refinement much, at least in terms of structural changes. That's the case whether NCS restraints have been applied or not. In the cases I've re-done, changing the refinement program and dealing with test set choices makes some difference in the R and Rfree values. One effect of changing the space group is whether you realize the copies of the molecule in the lower symmetry asymmetric unit are identical or not. (Where identical means crystallographically identical, i.e., in the same packing environments, subject to all the caveats about accuracy, precision, thermal
Re: [ccp4bb] an over refined structure
Bart Hazes wrote: Dale Tronrud wrote: [EMAIL PROTECTED] wrote: Rotational near-crystallographic ncs is easy to handle this way, but what about translational pseudo-symmetry (or should that be pseudo-translational symmetry)? In such cases one whole set of spots is systematically weaker than the other set. Then what is the theoretically correct way to calculate Rfree? Write one's own code to sort the spots into two piles? Phoebe Dear Phoebe, I've always been a fan of splitting the test set in these situations. The weak set of reflections provides information about the differences between the ncs mates (and the deviation of the ncs operator from a true crystallographic operator) while the strong reflections provide information about the average of the ncs mates. If you mix the two sets in your Rfree calculation the strong set will tend to dominate and will obscure the consequences of allowing your ncs mates too much freedom to differ. I haven't had to deal with this situation but my first impression is to use the strong reflections for Rfree. For the strong reflections, and any normal data, Rwork and Rfree are dominated by model errors and not measurement errors. For the weak reflections measurement errors become more significant if not dominant. In that case Rwork and Rfree will not be a sensitive measure to judge model improvement and refinement strategy. A second and possibly more important issue arises with the determination of SigmaA values for maximum likelihood refinement. SigmaA values are related to the correlation between Fc and Fo amplitudes. When half of your observed data is systematically weakened then this correlation is going to be very high, even if the model is poor or completely wrong, as long as it obeys the same pseudo-translation. If you only use the strong reflections for Rfree I expect that should get around some of the issue. 
Of course it can be valuable to also monitor the weak reflections to optimize NCS restraints, but probably not to drive maximum likelihood refinement or to make general refinement strategy choices. Bart Dear Bart, I agree that the way one uses the test set depends critically on the question you are asking. In my letter I was focusing on the aspect of the pseudo-centered crystal problem where the strong/weak divide can be used to particular advantage. I have not thought as much about the matter of using the test set to estimate the level of uncertainty in the parameters of a given model. My gut response is that the strong/weak distinction is still significant. Since the weak reflections contain information about the differences between the two ncs-related copies, I suspect that a great many systematic errors are subtracted out. For example, if your model contains isotropic B's when, of course, the atoms move anisotropically, your maps will contain difference features due to these unmodeled motions. Since the anisotropic motions are probably common to the two molecules, these features will be present in the average structure described by the strong reflections but will be subtracted out in the difference structure described by the weak reflections. This argument implies to me that the strong reflections need to be judged by the SigmaA derived from the strong test set and the weak reflections judged by the weak test set. Dale Tronrud
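The strong/weak bookkeeping discussed in this thread can be sketched in a few lines. This assumes a pseudo C-centered case where reflections with h+k odd form the systematically weak class; the actual parity rule depends on the pseudo-translation vector in a real crystal, and the R-factor helper is purely illustrative:

```python
import numpy as np

def split_pseudo_centered(hkl):
    # h+k even -> "strong" class (reports on the average of the ncs mates);
    # h+k odd  -> "weak" class (reports on differences between the mates).
    # Parity rule assumed for pseudo C-centering, for illustration only.
    hkl = np.asarray(hkl)
    strong = (hkl[:, 0] + hkl[:, 1]) % 2 == 0
    return hkl[strong], hkl[~strong]

def r_factor(fobs, fcalc):
    # Conventional R = sum|Fo - Fc| / sum Fo, computed per class.
    fobs, fcalc = np.asarray(fobs, float), np.asarray(fcalc, float)
    return float(np.sum(np.abs(fobs - fcalc)) / np.sum(fobs))

hkl = np.array([[1, 1, 0], [2, 1, 3], [0, 2, 1], [3, 0, 2]])
strong, weak = split_pseudo_centered(hkl)
# Rfree(strong) and Rfree(weak) would then be monitored separately,
# e.g. r_factor(fo[strong_sel], fc[strong_sel]) vs the weak selection.
```

Monitoring the two classes separately keeps the strong set from drowning out the weak-set signal that controls the ncs weight.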
Re: [ccp4bb] an over refined structure
of whether that attempt is appropriate or inappropriate, every symmetry image of that atom will be pulled in the corresponding way. The symmetry-related structure factors, both crystallographic and noncrystallographic, will be affected in the same way and a reflection in the test set will be tied to its mate in the working set. In summary, this argument depends on two assertions that you can argue with me about: 1) When a parameter is being used to fit the signal it was designed for, the resulting model develops predictive power and can lower both the working and free R. When a signal is perturbing the value of a parameter for which it was not designed, it is unlikely to improve its predictive power; the working R will tend to drop, but the free R will not (and may rise). 2) If the unmodeled signal in the data set is a property in real space and has the same symmetry as the molecule in the unit cell, the inappropriate fitting of parameters will be systematic with respect to that symmetry, and the presence of a reflection in the working set will tend to cause its symmetry mate in the test set to be better predicted, despite the fact that this predictive power does not extend to reflections that are unrelated by symmetry. This bias will occur for any kind of error as long as that error obeys the symmetry of the unit cell in real space. I'm sorry for the long-winded post, but sometimes I get these things stuck in my head and I can't get any work done until I get it out. I hope it helps, or at least is not complete nonsense. Dale Tronrud
[ccp4bb] Calculating R-factor and maps from a Refmac model containing TLS downloaded from the PDB
Hi, I am looking over a number of models from the PDB but have been unable to reproduce the R-factors for any model that was refined with Refmac and contains TLS parameters. I usually can't get within 5% of the reported value. On the other hand, I usually do pretty well for models w/o TLS. An example is the model 1nkz. The PDB header gives an R value of 17% but even when I use tlsanal in CCP4i to generate a PDB with anisotropic B's that mimic the TLS parameters I get an R value of 22.4% using SFCheck. (I'm not implying that I suspect any problem with 1nkz, in fact I have every reason to believe this is the great model its published stats indicate.) I've found a CCP4 BB letter that stated that SFCheck does not pay attention to anisotropic B's but that letter was dated 2002. I hope this limitation has been removed, or at least the output would mention this limitation. Setting up a refinement in Refmac involves a large overhead, since even for zero cycles of refinement the program insists on a complete stereochemical definition for the strange and wondrous groups in this model. I would just like to verify the R factor and calculate a proper map for inspection in Coot. Since I have many models I would like to look at, I would like a simple procedure. I did set up a Refmac run for another model, for which I do have all the .cif's required, but even after refinement I was not close to the reported R. I see that the models I'm interested in are not present in the Electron Density Server, so I suspect I'm not alone in fighting this battle. Any advice would be appreciated, Dale Tronrud
Re: [ccp4bb] Summary: Calculating R-factor and maps from a Refmac model containing TLS downloaded from the PDB
Hi again, I guess this is only a partial summary, since I still don't understand all the issues this question raises. Pavel Afonine reported that his extensive tests of the PDB reveal that reproducing R values from models with TLS ADP's is a wide-spread and serious problem. The principal problems (IMHO) are 1) Incorrect or illegal TLS definitions in the REMARK. 2) Some files list in the ATOM B column the residual B after TLS has been accounted for, while others list the total B (TLS and residual). There is no clear indication in the PDB file which interpretation is being used. Tassos, Eleanor, and others recommended taking the TLS definition from the PDB header and running zero cycles of unrestrained refinement in Refmac to get it to calculate R factors and maps w/o the need to define ideal geometry for co-factors. I have yet to see this work, however (see below). Ulrich Baumann wrote to tell me of two of his PDB's that he knows will give back the reported R values. They are 2qua and 2qub. I grabbed 2qua from the RCSB server, extracted the TLS groups with CCP4i, and found that the TLS definitions were incorrect. There is one polypeptide in this model and three TLS groups. The first and third groups did not have a residue range, while the second group defined a residue range in the middle of the peptide. I made the assumption that the first and third TLS groups were intended to cover the beginning and end of the peptide and corrected the .tls file. I loaded this into Refmac and asked for zero cycles of unrestrained refinement and got an R value of 19.4%. The PDB file says it should be 17.3%. I then asked Refmac to run 10 cycles of TLS and 10 cycles of restrained refinement and got an R value of 17.5%. Good enough. From this result I infer that Refmac is unable to calculate the original ADP's given this PDB file and TLS definition. It can reconstruct them via refinement, basically ignoring the B values of the PDB file. 
This particular PDB entry appears to contain in its B column the residual B's. I also tried entry 2qub, but with less luck. This model has seven peptides and 30 TLS groups. The first seven TLS groups defined in the header of the PDB cover each of the seven chains, while the other 23 groups had no residue range. I can guess that the intention was to have five TLS groups for each of the seven chains, but without additional information from Dr. Baumann, I'm unable to even start trying to reproduce R values and calculate maps. So... 1) Pavel is correct, there are many clear errors in the TLS REMARKs of PDB entries. 2) It seems necessary to ask Refmac to recreate the ADP description for a PDB entry from scratch, assuming the TLS group definition can be deduced from the PDB header. This, currently, requires refinement, which requires .cif's for the unusual groups. If CCP4i could ask Refmac to perform only TLS/B refinement, holding positions fixed, the need for detailed .cif's would be greatly reduced. I have no desire to move the atoms anyway. Better yet, if someone could find out what Refmac expects to find in its starting PDB (what it wants in the B column), one could add a tool to CCP4i that converts a PDB entry to what Refmac wants w/o refinement. Since there appear to be two varieties of entries, one could try both possibilities and choose the one with the lowest R value. I have to close with additional problems, I'm afraid. I can't run the required refinement on 1nkz to test TLS/B refinement, but I have tried it on 3bsd, where I have a good .cif for the Bchl-a groups. When I pull out the TLS definition and perform 10 cycles of TLS and 10 cycles of restrained refinement I get an R value of 20.2%, while the entry asserts that the correct value is 17.8%. The final TLS parameters look, by eye, pretty similar to the deposited ones, so I don't know what is going on here. 
Dale Tronrud Dale Tronrud wrote: Hi, I am looking over a number of models from the PDB but have been unable to reproduce the R-factors for any model that was refined with Refmac and contains TLS parameters. I usually can't get within 5% of the reported value. On the other hand, I usually do pretty well for models w/o TLS. An example is the model 1nkz. The PDB header gives an R value of 17% but even when I use tlsanal in CCP4i to generate a PDB with anisotropic B's that mimic the TLS parameters I get an R value of 22.4% using SFCheck. (I'm not implying that I suspect any problem with 1nkz, in fact I have every reason to believe this is the great model its published stats indicate.) I've found a CCP4 BB letter that stated that SFCheck does not pay attention to anisotropic B's but that letter was dated 2002. I hope this limitation has been removed, or at least the output would mention this limitation. Setting up a refinement in Refmac involves a large overhead, since even for zero cycles of refinement the program insists on a complete
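On the residual-vs-total B ambiguity discussed above: the TLS contribution to each atom's ADP can be evaluated with the Schomaker-Trueblood relation U = T + A L A^T + A S + S^T A^T, where A is the skew matrix of the atom position relative to the TLS origin. A hedged numpy sketch (sign/transpose conventions for S and the units of L vary between programs, so any numbers should be checked against the refinement program's own conventions before being trusted):

```python
import numpy as np

def skew(r):
    # Skew matrix A such that A @ lam = lam x r (cross product with position r).
    x, y, z = r
    return np.array([[0.0,   z,  -y],
                     [ -z, 0.0,   x],
                     [  y,  -x, 0.0]])

def u_from_tls(T, L, S, r, origin):
    # Schomaker-Trueblood ADP at position r (one common sign convention;
    # check your refinement program's documentation before relying on it).
    A = skew(np.asarray(r, float) - np.asarray(origin, float))
    return T + A @ L @ A.T + A @ S + S.T @ A.T

def b_eq(U):
    # Isotropic-equivalent B from an ADP tensor (U in A^2).
    return 8.0 * np.pi**2 * np.trace(U) / 3.0

# Sanity check: with L = S = 0 the ADP reduces to the pure translation T.
T = 0.01 * np.eye(3)
Z = np.zeros((3, 3))
U = u_from_tls(T, Z, Z, r=[1.0, 2.0, 3.0], origin=[0.0, 0.0, 0.0])
# If the ATOM record holds residual B's, total B ~ B_residual + b_eq(U);
# trying both interpretations and keeping the lower R, as suggested above,
# is one way to tell which convention a deposited file uses.
```

Note the thread's practical advice stands regardless of convention details: let the refinement program regenerate the ADPs rather than trusting the deposited B column.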
Re: [ccp4bb] Friedel vs Bijvoet
There was a mistake in the letter that listed the Bijvoet pairs for a monoclinic space group, and that is confusing you. Let me try. The equivalent positions for a b-setting monoclinic are h,k,l and -h,k,-l. The Friedel mate of the general position (h,k,l) is (-h,-k,-l). This means that the equivalent positions also have a Friedel mate at h,-k,l. The Bijvoet mates of h,k,l are therefore, according to the definitions given in previous letters, -h,-k,-l and h,-k,l. There are more Bijvoet mates to a reflection than Friedel mates. A centric reflection is a reflection that is BOTH a symmetry-equivalent reflection AND a Bijvoet mate to some other reflection. This is a very small subset of all reflections. Every reflection has one Friedel mate and has N Bijvoet mates, where N is the number of equivalent positions. Only a small number of reflections are centric (with the limiting case of only F000). Dale Tronrud Bernhard Rupp wrote: Let's try this again, with definitions, and pls scream if I am wrong: a) Any reflection pair hR = h forms a symmetry-related pair. R is any one of the G point group operators of the SG. This is a set of reflections (S). Their amplitudes are invariably the same. They do not even show up as individual pairs in the asymmetric unit of reciprocal space. NB: their phases are restricted but not the same. b) a set h = -h (set F) exists where reflections may or may not carry anomalous signal. They form the centrosymmetrically related wedge of the asymmetric unit of reciprocal space. c) a centric reflection (set C) is defined as hR = -h and cannot carry anomalous signal. Example: zone h0l in PG 2. As Ian Tickle pointed out, the CCP4 wiki is wrong: Centric reflections in space group P2 and P21 are thus those with 0,k,0. Not so; an example listing is attached at the end. d) therefore, some elements of F exist that carry AS (F.ne.C) and some that do not carry AS (F.el.C). I hope we can agree on those facts. 
Now for the name calling: (S) is simply the set of symmetry-related reflections, defined as hR = h. (F) is the set of Friedel pairs, defined as h = -h. (C) are centric reflections, defined as hR = -h. Thus, only if (F.ne.C), anomalous signal. I thought those are Bijvoet pairs. They are, but it may not be the definition of a Bijvoet pair. Try 1: A Bijvoet pair is F(h) and any mate that is symmetry-related to F(-h), e.g., F(hkl) and F(-h,k,-l) in monoclinic. hkl is not related to -h,k,-l via h = -h. Only h0l is, and those are in (C). So, I cannot quite follow that; probably try 1 is not a good definition. Try 2: I've always thought that a Bijvoet pair is any pair for which an anomalous difference could be observed. Good start. I subscribe to that. This includes Friedel pairs (h, h-bar). Good. That's the definition of F. But it also includes pairs of the form h, h', where h' is symmetry-related to h-bar. Ooops. That is the definition of a centric reflection. Thus Friedel pairs are a subset of all possible Bijvoet pairs. Cannot see that. I still maintain that Bijvoet pairs are a subset of Friedel pairs (which does include Pat's definition). I fail to see anything else but Friedel pairs in my list of reflections - some of them carry AS (F.ne.C) and some don't (F.el.C). B = F.ne.C. Seems to be a necessary and sufficient condition, in agreement with Pat's definition (though not the explanation). But - isn't that exactly what I said from the beginning? A Bijvoet pair is an acentric Friedel pair... Or - where are any other Bijvoet pairs hiding? Where did I miss them? 
(NB: Absence of anisotropic AS assumed - let's not go there.) See the attached reflection list for P2 (columns: h k l, |F|, fom, phi, 2theta, stol^2; last 3 items: centric flag, epsilon, m(h)). [The numeric listing is garbled in the archive: it showed Friedel pairs such as (0 0 1)/(0 0 -1) and (1 0 0)/(-1 0 0) with identical |F| and the h0l zone flagged centric, while pairs such as (0 1 0)/(0 -1 0) were acentric with slightly different |F|.]
Re: [ccp4bb] Friedel vs Bijvoet
Bernhard Rupp wrote: I quote from these pages: Bijvoet pairs are Bragg reflections which are true symmetry equivalents to a Friedel pair. These true symmetry equivalents have *equal amplitudes, even in the presence of anomalous scattering*. This is poorly worded. I would change it to A Bijvoet MATE IS A Bragg reflection which IS A true symmetry equivalent to THE Friedel MATE OF SOME OTHER REFLECTION. These true symmetry equivalents have *equal amplitudes, even in the presence of anomalous scattering*. Note that the Bijvoet mate is symmetry related to the Friedel mate not the original reflection. Dale Tronrud Sounds more like centric or perhaps simply symmetry related to me. A few lines below: A Bijvoet difference refers to the difference in measured amplitude for a Bijvoet pair I don't think you can have it both ways ?? BR
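Dale's counting for b-unique monoclinic (PG 2) can be made concrete: the Bijvoet mates of h are the symmetry equivalents of the Friedel mate -h, and a reflection is centric when some point-group operator maps h onto -h. A small sketch, assuming only the two PG 2 rotations:

```python
import numpy as np

# Point-group 2 (b-unique) rotations: identity and the two-fold along b.
OPS = [np.diag([1, 1, 1]), np.diag([-1, 1, -1])]

def bijvoet_mates(h):
    # All reflections symmetry-equivalent to the Friedel mate -h.
    mh = -np.asarray(h)
    return {tuple(int(v) for v in R @ mh) for R in OPS}

def is_centric(h):
    # Centric: some point-group operator maps h onto its Friedel mate -h.
    h = np.asarray(h)
    return any((R @ h == -h).all() for R in OPS)

print(sorted(bijvoet_mates((1, 2, 3))))           # [(-1, -2, -3), (1, -2, 3)]
print(is_centric((1, 0, 2)), is_centric((0, 1, 0)))  # True False
```

This reproduces the thread's conclusions: a general reflection has N Bijvoet mates (N = number of equivalent positions), the h0l zone is centric in PG 2, and 0k0 is not.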
Re: [ccp4bb] D-Amino acids to L-Amino acids
First you should look into why your chiral centers flipped. In my experience the most common cause is that the neighboring peptide bond needs to be flipped. If you just want to flip a chiral center in Coot, I think the easiest way is to real space refine the residue and, before accepting the result, drag the CA to the side you want. You may have to over-drag to get it to stay. It only takes a moment. But don't assume that your refinement program is just doing something stupid. Look for the primary cause. Dale Tronrud Yusuf Akhter wrote: Hi Everybody, I am refining the structure of a protein at 3 Angstrom. I am doing model building in Coot. After several rounds of refinement using Refmac, when I tried to run PROCHECK on my partially built model, I found that some of the residues are D-amino acids. How do I change these D-amino acids to L-amino acids? Is there any option in Coot for that? Thanks in advance, yusuf
Re: [ccp4bb] Refmac5 and dual conformation of a dual conformation
I have battled a similar piece of structure in a recent project. There is no software in crystallography that can handle hierarchical alternative conformations w/o tricks. I was refining in Shelxl, but the same trick will work elsewhere. I had to define a new residue type with two heads. The A conformation had atoms for both heads. The B conformation omitted the atoms that couldn't be seen; I was missing the entire side chain in that conformation, so the SC had no atoms. Besides creating the new geometry restraint library, you will have to ensure that no bad contacts are flagged between the two heads. In Shelxl you can tie all the occupancies together properly, with a couple of annoyances. In other programs you are on your own. Another possibility is to create four conformations for the entire stretch, but then you'll have to have the program keep many pairs of atoms superimposed. I don't know if Refmac has that feature. You will, of course, have difficulties when you deposit this thing with the PDB. There is no way to correctly represent your model in a PDB file, nor, I believe, in mmCIF. It may be reasonable to switch to the four-conformation model for deposition, since you will not have to worry about enforcing the superposition of atoms any longer. That may be clearer to people who use the model at a later date. Dale Tronrud Andy Millston wrote: I am currently trying to refine a structure where a 5 residue stretch of a chain is in 2 conformations. Oddly enough, 1 of these 5 residues is in dual conformations in both of the conformations! Is there a conventional nomenclature for defining such dual-dual conformations? Refmac5 does not accept the intuitive way of naming such an atom. For example: a normal dual conformer GLY would be named as AGLY and BGLY in the PDB file, and this is acceptable to Refmac5. When I name a dual-dual GLY as AAGLY and BAGLY, Refmac5 fails! Any idea WHY?? Thank you! 
Here is the error log: Logical name: ATOMSF, Filename: /programs/ccp4-6.0.2/lib/data/atomsf.lib *** * Information from CCP4Interface script *** The program run with command: refmac5 XYZIN /home/../myfile.pdb XYZOUT ...tmp HKLIN mtz HKLOUT tmp LIBOUT ..._56_lib.cif has failed with error message fmt: read unexpected character apparent state: internal I/O last format: (I4) lately reading sequential formatted internal IO *** #CCP4I TERMINATION STATUS 0 fmt: read unexpected character apparent state: internal I/O last format: (I4) lately reading sequential formatted internal IO
Re: [ccp4bb] comparison of maps, intensities and other basics
A map file stores a density value for each point on a grid. The units and nature of those values are not defined by the format of the map; a map can store any number of things, and the actual values are defined by the process that created the map file. For electron density maps you will find that some contain values measured in e/A^3, others contain values that are normalized Z scores (the standard deviation of the variation about the mean is set to 1.0), or just a bunch of numbers with arbitrary and mysterious units. One tends to use e/A^3 when trying to relate the map to expected electron density or to compare one map to another. A normalized map is useful if you are interested in the frequency with which a density value of that magnitude appears in the map (is this value common or rare?). One uses arbitrary values if one has an attachment to honesty. Calculating an electron density map in units of e/A^3 is not an easy task. The diffracted intensities are not themselves measured in real units; their magnitude only has meaning relative to the other intensities in the same dataset. For the map to be expressed in units of e/A^3 the diffraction intensities must be expressed in units of e/unit cell (at least that is the convention). This is a hard problem and many papers have been written on the topic. If you have a well refined and complete model for the contents of the crystal you can use the calculated diffraction pattern as a template to scale the observed intensities and calculate maps in e/A^3, but this is an approximation, as no model is complete or completely correct. The other big issue is that we cannot measure the one reflection that defines the average of the electron density in the crystal (F000): it always hits the beamstop. Because of this problem our maps usually have an average value of zero, which is of course wrong. 
Even when the density values are expressed in e/A^3, each value in the map still needs a constant added to it to reach the true value at that point. At least it is the same number everywhere in the map, although we don't know its value. Because of these issues and uncertainties, maps are usually compared using a correlation coefficient. The correlation coefficient is relatively unaffected by these scaling problems and will usually give the same answer for any of the kinds of maps I described. If you want a more detailed comparison of electron density values you really have to get into the details of each dataset and the scaling that was applied, to ensure that your results are meaningful. Estimating the error bars of an electron density map is another enormous problem. As you would expect, it depends critically on the origin of the map: the error analysis of a map calculated from MAD phasing is quite different from that of a map calculated using a refined model as a reference. One complication is that the error level is not necessarily the same everywhere in the map. In addition, the errors in different regions of the map are not independent, and the correlations between deviations in different regions are likely more important to any analysis than any simple overall error bar. However, if you insist on an error level, my best guess would be to identify the regions of bulk solvent and calculate the rms deviation from the mean there. Since these regions should be flat, any deviation from the mean there does not represent electron density; we might as well call it error. Dale Tronrud Peter Schmidtke wrote: Dear CCP4BB List Members, first of all I am not a crystallographer, but I would like to get some things clear, things I did not find in Crystallography Made Crystal Clear or on the internet so far. I am trying to read electron density maps in the EZD format.
These maps contain scaled electron density values plus the size and shape of the unit cell. How can I convert these values (what is their unit?) to the densities you can see in coot, for example 1.03 electrons/A^3? Once I have achieved this conversion, can I compare densities of different maps of different proteins? If not directly, is there a way to do so? Last, is there a way to know the experimental error in the density values of a map? Thanks in advance.
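Since the answer above keeps returning to the point that a correlation coefficient shrugs off the unknown scale and offset of a map, here is a minimal sketch (plain Python, toy data, no crystallographic library assumed) of Z-score normalization and a Pearson correlation between two maps sampled on the same grid:

```python
import math

def normalize(rho):
    """Rescale a map to Z scores: zero mean, unit standard deviation."""
    n = len(rho)
    mean = sum(rho) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in rho) / n)
    return [(x - mean) / sd for x in rho]

def map_correlation(rho1, rho2):
    """Pearson correlation between two maps sampled on the same grid.
    Insensitive to linear rescaling or a constant offset of either map."""
    a, b = normalize(rho1), normalize(rho2)
    return sum(x * y for x, y in zip(a, b)) / len(a)

# A scale change plus a constant offset (think arbitrary map units and
# the unknown F000 term) leaves the correlation untouched:
rho = [0.1, 0.5, 0.3, 0.9, 0.2]
shifted = [2.0 * x + 0.33 for x in rho]
print(round(map_correlation(rho, shifted), 6))  # 1.0
```

This is why the correlation gives "the same answer when given any of the kinds of maps" described above: the normalization step erases exactly the information (units, offset) that differs between them.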
Re: [ccp4bb] 3ftt and gremlins
This thread has evolved into two different topics. Just to clarify: 1) There is a need for additional validation of structure factor depositions. My recollection is that the output of SF Check is available to the depositor via ADIT on the RCSB site. I have found that report to be quite helpful in checking for gross errors in my structure factor files. The Electron Density Server performs similar checks. It shows that the R value for 3ftt is 6.4% with a correlation coefficient between Fo and Fc of 0.996. The EDS flags entries as interesting if the calculated R value is more than 5% higher than the reported R value. Maybe it should also note when the R value is more than 5% lower. The tools for validating structure factors exist but perhaps could be put more in the face of the depositor to more strongly encourage that they be looked at. 2) It would be useful to have a central repository of raw diffraction images. Most of the discussion on this point is the technical difficulty of storing this quantity of data. What has not been mentioned is the much greater difficulty of validating these images. You may think the images for an entry have been deposited only to find out that the investigator's wedding photos were accidentally deposited instead. Validating that the images correspond to the claimed structure will be an enormous task; probably more difficult than coming up with enough hard drives to store them all. Dale Tronrud Frank von Delft wrote: Gerard Bricogne wrote: Looking forward to the archiving of the REAL data ... i.e. the images. Using any other form of data is like having to eat out of someone else's dirty plate! That may be so -- but if I'm hungry now, I just pop it in the sink -- I don't publish a call for tenders on an industrial-scale dish-washer, call up the architects and engineers to redesign the room, re-lay the plumbing, vamp up my electricity transformer and install a new drainage system. 
Which doesn't mean the industrial-scale washer isn't necessary; but honestly, can't we start by just washing the plate?? phx.
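As an aside on the R-value checks discussed above, the test is easy to state in code. This is a toy sketch: the 5% margin mirrors the EDS heuristic mentioned in the post, extended to flag suspiciously low calculated R values as well, and the numbers are made up:

```python
def r_value(fobs, fcalc):
    """Conventional crystallographic R: sum(||Fo| - |Fc||) / sum(|Fo|)."""
    return sum(abs(fo - fc) for fo, fc in zip(fobs, fcalc)) / sum(fobs)

def flag_entry(reported_r, calculated_r, margin=0.05):
    """EDS-style sanity check, extended in both directions as suggested
    above: flag when recalculated and reported R disagree by > margin."""
    if calculated_r > reported_r + margin:
        return "calculated R much higher than reported"
    if calculated_r < reported_r - margin:
        return "calculated R much lower than reported"
    return "ok"

fobs = [100.0, 50.0, 80.0]      # toy observed amplitudes
fcalc = [90.0, 55.0, 78.0]      # toy calculated amplitudes
print(round(r_value(fobs, fcalc), 4))   # 0.0739
print(flag_entry(0.20, 0.064))          # calculated R much lower than reported
```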
Re: [ccp4bb] Is it possible to mutate a reversible epimerase into an inreversible one?
Hi, I'm more of a Fourier coefficient kind of guy, but I thought that a ΔG of zero simply corresponded to an equilibrium constant of one. You can certainly have reversible reactions with other equilibrium constants. In fact I think irreversible reactions are simply ones where the equilibrium constant is so far to one side that, in practice, the reaction always goes all the way to product. As Randy pointed out, the enzyme cannot change the ΔG (or the equilibrium constant). You could drive a reaction out of equilibrium by coupling it to some other reaction which is itself way out of equilibrium (such as ATP hydrolysis in the cell), but I don't think that's a simple mutation of your enzyme. ;-) Dale Tronrud On 05/18/10 00:31, Vinson LIANG wrote: Dear all, sorry for this silly biochemistry question. The thing is that I have a reversible epimerase and I want to mutate it into an inreversible one. However, I have been told that the ΔG of a reversible reaction is zero, and which direction the reaction goes depends only on the concentration of the substrate. So the conclusion is either, A: I can mutate the epimerase into an inreversible one, but it would have no influence on the reaction direction, and hence would be of little use. Or, B: There is no way to change a reversible epimerase into an inreversible one. Could somebody please comment on these two conclusions? Thank you all for your time. Best, Vinson
Re: [ccp4bb] Is it possible to mutate a reversible epimerase into an inreversible one?
I think we are having a problem with the definitions of reversible and irreversible. By Lijun's definition the reaction is irreversible because it proceeds from far from equilibrium toward equilibrium. That situation is more a property of the system than of the enzyme. If you make the enzyme 1000 times faster the reaction will proceed more quickly toward equilibrium despite the fact that the reverse reaction is also 1000 times faster. The reverse reaction doesn't matter when there is no product to act upon. The original question was about having a reversible epimerase and I want to mutate it into an inreversible(sic) one. Clearly the poster is talking about a property of the enzyme. I interpreted this question as a request to differentially change the forward and backward reaction rates, but I could be mistaken. Maybe the original poster could clarify the question. Dale Tronrud On 05/21/10 13:53, Lijun Liu wrote: If I understand what you are saying, I think it is too. You imply that asymmetry in the enzyme results in two isomerase pathways. This may be true, but it has no consequence for the prospects of irreversibility. To avoid confusion, let's call these pathways D and S. Both the D and S pathways would have their own kf and kr kinetic constants such that kf_D/kr_D = kr_S/kf_S = Keq, which reflects the dG of the reaction. When the dG is close to zero for the isomerase reaction (which I assume here), then you can't make it irreversible. This is not the case, at least in part. Such enzymes, if no cofactor is needed, use an identical intermediate for the mirror-symmetric reaction. For the D <-> Intermediate <-> S reaction, the enzyme uses the same pathway. Enzymes such as glutamate racemase and aspartate racemase use a kind of pseudo-mirror-symmetric alignment at the active site to adapt to the binding of the D or S isomer in each half of the active site, respectively. The other 3 atoms attached to the chiral center keep a fixed relative conformation during the inversion.
The standard dG(0) of such a reaction is 0. However, at the time the enzyme works (for example, when the cell needs D-Asp in an almost pure L-Asp environment), the racemase moves L-Asp to D-Asp, and in this regard the dG of the reaction (not the standard dG) is not 0. Your last sentence means: for a reaction (assuming dG(0) = 0, like a racemization) that has almost reached equilibrium (dG ~ 0), you cannot make it irreversible; this is true. Just please do not forget: such enzymes work when the D <-> S equilibrium is strongly displaced by nature (dG far from 0, not dG(0)). Hopefully I explained clearly! Lijun James All natural epimerases, isomerases and racemases use a mechanism based on L-amino acids to deal with a mirror-symmetric (sometimes quasi-mirror-symmetric) reaction. In other words, these enzymes use a non-mirror-symmetric structure to deal with a mirror-symmetric reaction, which itself causes asymmetric kinetics in the two directions, even though the dG is 0. The Arrhenius law k = A*exp(-dE/RT) should be understood like this: a mutation's effect on dE will be symmetric, as Dale pointed out. However, the effects on A are asymmetric. A is related to intramolecular diffusion, substrate- and product-binding affinity, etc. That is why, with mutation, these enzymes change their kinetics in the two directions differently. Please check glutamate racemase, alanine racemase, aspartate racemase and DAP epimerase if you are interested. Never a 1000 to 1000 relation! Thus, mutation can make one direction more favored; the point is you need the correct hit. Of course, such an experiment is never a Maxwell's demon. Lijun On May 19, 2010, at 8:51 AM, Maia Cherney wrote: You are absolutely right, I thought about it. Maia Marius Schmidt wrote: Interestingly, Maxwell's demon pops up here, wh... , don't do it. If you make the reaction rate in one direction 1000 times slower than in the other direction, then the reaction becomes practically irreversible. And the system might not be at equilibrium. Maia R. M.
Garavito wrote: Vinson, as Dale and Randy pointed out, you cannot change the ΔG of a reaction by mutation: an enzyme, which is a catalyst, affects only the activation barrier (ΔE‡). You can just make it a better (or worse) catalyst, which would allow the reaction to flow faster (or slower) towards equilibrium. Nature solves this problem very elegantly by taking a readily reversible enzyme, like an epimerase or isomerase, and coupling it to a much less reversible reaction which removes product quickly. Hence, the mass action is only in one direction. An example of such an arrangement is the triose phosphate isomerase (TIM)-glyceraldehyde 3-phosphate dehydrogenase (GAPDH) reaction pair. TIM is readily reversible (DHAP <-> G3P), but G3P is rapidly converted to 1,3-diphosphoglycerate by GAPDH. The oxidation and phosphorylation reactions
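The thermodynamic point made repeatedly in this thread, that a catalyst cannot change Keq, only the rates, can be illustrated in a few lines. This is a toy sketch: the 1000x speedup is the number from the thread, and the rate constants are made up:

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def keq_from_dg0(dg0, temp=298.15):
    """Equilibrium constant from the standard free energy change,
    Keq = exp(-dG0 / (R*T))."""
    return math.exp(-dg0 / (R * temp))

# dG0 = 0 (e.g. a racemization) gives Keq = 1: equal amounts at equilibrium.
print(keq_from_dg0(0.0))  # 1.0

# A catalyst multiplies the forward and reverse rate constants by the
# SAME factor, so Keq = kf/kr is unchanged however good the enzyme gets.
kf, kr = 2.0, 2.0      # toy uncatalyzed rate constants, Keq = 1
speedup = 1000.0       # toy rate enhancement from the enzyme
print((kf * speedup) / (kr * speedup) == kf / kr)  # True
```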
Re: [ccp4bb] Far too good r-factors
This would be a possible explanation, and certainly is a problem with low resolution refinements, but the free R indicates that overfitting is not the problem here. (I'm assuming that the proper choice of test set has been made in this case.) In my experience, for very isomorphous pairs of structures, when a high resolution model is used as the starting point for a low resolution refinement, even the R values before refinement will be very good, which means fitting the noise can't be the cause. Our methods today are simply not as good at fitting low resolution data in the absence of high resolution data as they are in its presence. Dale Tronrud On 06/01/10 04:51, Ian Tickle wrote: On Mon, May 31, 2010 at 9:15 PM, Dale Tronrud det...@uoxray.uoregon.edu wrote: One of the great mysteries of refinement is that a model created using high resolution data will fit a low resolution data set much better than a model created using only the low resolution data. It appears that there are many types of errors that degrade the fit to low resolution data that can only be identified and fixed by using the information from high resolution data. Is it such a mystery? Isn't it just a case of overfitting to the experimental errors in the low res data if you tried to use the same parameterization and restraint weighting as for the high res refinement? Consequently you are forced to use fewer parameters and/or higher restraint weighting at low res, which obviously is not going to give as good a fit. Cheers -- Ian
Re: [ccp4bb] MR on low resolution soaking data.
You haven't given much detail to work with, so I can only guess about your problem. A Wilson B of 20 for a 4 A data set is ridiculous, but the uncertainty in the Wilson B calculation at 4 A is enormous, so a more reasonable statement would be that your Wilson B calculates to 20 ± 300 A^2 and the true value lies somewhere in that range. I don't think a precise Wilson B is important for MR, so I wouldn't worry about it. An R value of 0.7 after MR is very large. Its size implies a systematic problem with your model: I would be looking for a second monomer. You haven't said anything about the structure of your monomer. Often a ligand will bind in the cleft between two domains, and the domains move relative to each other upon binding. You may have to perform separate searches for each domain or construct a range of trial models with different angles between the domains. Don't worry about the ligand until you solve the protein structure. Whether you see it in the end will depend on how big it is and how good your 4 A data are. Of course, it's possible that it doesn't bind at all. Dale Tronrud On 06/07/10 12:17, yang li wrote: Dear colleagues, We are now trying to soak some ligands into a protein, which is about 60 kDa in size and whose structure has been solved before. But molecular replacement cannot give a right solution. Below is a comparison of the data: Native 2A P212121 monomer; Soaked 4A F222 monomer (more than 70% solvent) or dimer (more likely). I wonder if it is possible to find the ligand in the case of such low resolution, provided the ligand is not so small. What factors could lead to the failure of MR? Molrep gave a model of the monomer but the Rfree is as high as 0.7, while Phaser could get no result. I tried phenix.explore_metric_symmetry and found the two space groups are not compatible, and the Rmerge of the data seems reasonable. One more question: the Wilson B of the data is lower than 20 from CCP4. Is this common for 4A data?
I do not have experience handling such low resolution data yet. By the way, any suggestions about refinement methods at low resolution will be appreciated! Best wishes Yang
[ccp4bb] New Version of the Protein Geometry Database Operational
A new version of the Protein Geometry Database (PGD) has just been released. This version includes - The ability to compose queries and analyze the behavior of side chain chi angles. - Structures released in the wwPDB up to April 8, 2010 consisting of roughly 18,000 nonredundant protein chains from crystal structures. That's over 1.8 million residues! The PGD enables users to easily and flexibly query information about the conformation alone, the backbone geometry alone, and the relationships between them. The capabilities the PGD provides are valuable for assessing the uniqueness of observed conformational or geometric features in protein structure as well as discovering novel features and principles of protein structure. So if you observe a certain conformation or geometric feature and wonder how unusual it is, the PGD may be able to provide the answer. Queries can be based on amino acid type, secondary structure, phi/psi/omega/chi angles, B factors and main chain bond lengths and angles. Queries for motifs of up to 10 residues in length can be made. Once a query has been made, plots can be drawn to show the relationship between any pair of conformational angles and/or main chain bond lengths or angles. In addition, the results of the query can be downloaded for local analysis. The PGD server is available at http://pgd.science.oregonstate.edu/ For more information please read http://pgd.science.oregonstate.edu/static/pdf/Berkholz-PGD-2010.pdf Happy hunting
Re: [ccp4bb] Odd loop stabilised by an cation
You can look for similar loops in other structures in the PDB using the Protein Geometry Database (http://pgd.science.oregonstate.edu/). The search page allows you to specify phi/psi ranges for loops up to ten amino acids long, and the Browse Results page will list the ID codes and residues of any matches found. If you need any detailed assistance using this server, I'd be happy to help. Dale Tronrud On 07/02/10 05:44, Domen Zafred wrote: Dear all, There is an odd loop on the surface of my structure. Three backbone oxygen atoms are turned in the same direction and the structure is stabilized by a cation and water molecules. Also, the ion is probably partly occupied (as discussed in the recent post of Ivan Xaravich). The pictures in crossed-eye stereo are in the attachment. Electron densities are at 1.8 and 3.5 sigma. I have two problems regarding this loop: Is such a loop something known or common, or is it unique? How could I find a structure with a similar feature? Is there a smart guess for finding out the right ion? Mg is the smallest of all and there is still some red density. Ca on the other hand is more common in cells, and the puzzle is whether it is a small ion or a bigger one with lower occupancy. Any suggestions, comments or answers will be greatly appreciated. Regards, Domen Zafred
Re: [ccp4bb] How to make fft-map more physically meaningful?
Edward A. Berry wrote: Hailiang Zhang wrote: Hi there: I found that the grid values in the map file generated by CCP4-fft generally has a mean value of ~0, and of course there will be lots of negative values. This apparently is not the real physics, since the electron density has to be positive everywhere (hope I am right). Can somebody give me any hint how to convert the fft map file which has mean value of 0, to a more physically meaningful map which has positive densities everywhere? (I thought about offsetting the whole map by the minimum negative values to make everything positive, but I doubt it is right). Best Regards, Hailiang Actually taking the minimum value as zero might be a good approximation, as long as the resolution is high enough so there are gaps in the protein too small to be solvent-filled but large enough to be resolved from surrounding density. Maps from FFT will always have average value zero unless you include the 0,0,0 reflection: the transform is a sum of sin and cos terms, all of which have zero value when integrated over the unit cell, except the cos(0.X) term. So any linear combination of these terms will average to zero if it doesn't include the zero order term. The 0,0,0 reflection is hard or impossible to measure because it gets mixed up with the undiffracted beam. But it is easy to calculate, because the integral of unity against the electron density is just the average electron density times the volume, or the total number of electrons. So if you know the total number of electrons in the unit cell, you can divide by the unit cell volume to get the average electron density (OK, I guess that is obvious) and add it to the zero-average FFT map. This assumes the map is on an absolute scale, which won't be quite true, so your idea of offsetting the minimum to zero may be more satisfactory. Ed Ed is right, of course. 
Just remember to include ALL the electrons in the unit cell - both those of the protein and those of the solvent, ordered and disordered. Dale Tronrud
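Ed's recipe, adding the F(000)/V term (the average electron density of the cell) to a zero-mean FFT map, is a one-liner. A toy sketch, with a made-up electron count and cell volume:

```python
def offset_to_absolute(rho_zero_mean, n_electrons, cell_volume):
    """Put a zero-mean FFT map on an absolute scale by adding the
    F(000)/V term, i.e. the average electron density of the cell.
    n_electrons must count ALL electrons: protein and solvent,
    ordered and disordered alike."""
    mean_density = n_electrons / cell_volume   # e/A^3
    return [rho + mean_density for rho in rho_zero_mean]

rho = [-0.2, 0.0, 0.2]   # toy zero-mean map values, e/A^3
fixed = offset_to_absolute(rho, n_electrons=33000, cell_volume=100000.0)
print([round(x, 2) for x in fixed])  # [0.13, 0.33, 0.53]
```

As both posts note, this is only as good as the assumption that the map is on an absolute scale to begin with.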
Re: [ccp4bb] Mysterious density
Cyclized DTT can look similar to this blob. Of course the sulfur atoms would make one end of the blob more dense than the other. Dale Tronrud On 07/09/10 05:12, Nick Quade wrote: Dear CCP4 community, I have solved the structure of a protein in complex with DNA. But, inside the protein there seems to be a ligand binding pocket with some strong density (*http://picasaweb.google.de/113264696790316881054/Desktop#). *The protein was in Tris buffer, with some NaCl, MgCl2 and DTT and crystallized in Li2SO4 with MES. What could this density be? I can exclude MES as crystals grown with citrate buffer have the same density. So I guess it might be something I co-purified or perhaps some degradation product of the DNA? The electron density in the pictures is at 1.5sigma. Thanks in advance. Nick
Re: [ccp4bb] Deposition of riding H
While I am sympathetic to Ethan's and George's arguments, what is missing in the world as it stands is a section in PDB files that encodes the parameters and rules used to generate the riding hydrogen atoms for that particular model. George has his favorite hydrogen atoms to build and his favorite bond lengths for placing them (and good arguments for his selections), and one could, I suppose, look them up in the documentation for Shelxl, but they should be encoded in the PDB file to allow automatic regeneration of the hydrogen atoms. An explicit listing of the rules for generation is particularly needed since all these matters can be, and often are, modified by the user. I know that in my refinements I manually move the hydrogen from one nitrogen to the other in a couple of histidine side chains, and I have created my own rules for hydrogen generation in co-factors. CIF tags will have to be agreed upon (and that's always a fun job) that would allow the description of the details of the various hydrogen atom generation schemes that are in use, or may be used in the future. It would also be handy to have a reference implementation, available under some forgiving license, that would materialize the hydrogen atoms given the PDB header information and would reproduce the exact model refined, for any of the refinement programs. This is a worthwhile goal, but a tall order. Until this infrastructure is in place I think the hydrogen atoms have to be included in the PDB file. Otherwise it's the same as saying that I've refined TLS ADPs but not saying what the TLS parameters were nor listing the atoms in each TLS group. Dale Tronrud P.S. George: Do you think hydrogen atoms generated by the HFIX 137 command should be deposited? They are placed based on the electron density map, with the dihedral angle of the methyl group becoming a parameter of the model -- a parameter not recorded anywhere other than in the hydrogen atom locations. On 09/14/10 12:41, George M.
Sheldrick wrote: Even though SHELXL refinements often involve resolutions of 1.5A or better, I discourage SHELXL users from depositing their hydrogen coordinates. There are three reasons: 1. The C-H, N-H and O-H distances required to give the best fit to the electron density are significantly shorter than those required for molecular modeling and tests on non-bonded interactions (or located by neutron diffraction). It is ESSENTIAL to recalculate the hydrogens at longer distances before using MolProbity and other validation software. 2. There is considerable confusion concerning the names to be assigned to the hydrogens. This is not made easier by the application of a chirality test to -CH2- groups! 3. O-H hydrogens are particularly difficult to 'see' and the geometrical calculation of their positions is often ambiguous. The same applies to the protonation states of histidines and carboxylic acids. In addition, such hydrogen positions are often disordered. For refinement I recommend including C-H and N-H but not O-H hydrogens. For very high resolution structures this reduces Rfree by 0.5-1.0% and clearly improves the model. At all resolutions the antibumping restraints involving hydrogens are useful. George Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D-37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-22582 On Tue, 14 Sep 2010, Dr. Mark Mayer wrote: Here's one for the community, which I'll post to both Phenix and CCP4 BBs. Where does the crystallographic community stand on deposition of coordinates with riding hydrogens? Explicit H are required for calculating all-atom clash scores with MolProbity, and their use frequently gives better geometry (especially at low resolution). Phenix uses explicit riding H for refinement, and outputs these in the refined PDB. Refmac also uses riding H but does not output H coordinates.
While depositing a series of structures refined at 1.4 - 2.75 A with Phenix, I got the following email from the RCSB, who asked that I resupply coordinates without H for two of the structures. Since we can't see H even at 1.4 Å, I don't understand why an arbitrary cutoff of 1.5 Å was chosen, and also why explicit H atoms used in refinement and geometry validation should be stripped from the file. FROM RCSB: We encourage depositors not to use hydrogens in the final PDB file for low resolution structures (> 1.5 A). Please provide an updated PDB file. We request that you use the processed PDB file as a starting point for making any corrections to the coordinates and/or re-refinement. -- Mark
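To make concrete what a riding-hydrogen generation rule looks like, here is a toy sketch of one common case: placing an amide H along the negative bisector of the C(prev)-N-CA angle. Both the rule as written and the 0.98 A bond length are illustrative choices for this example, not the actual scheme of SHELXL, Phenix, or Refmac, which is exactly the ambiguity the post above wants recorded in the deposition:

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _unit(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def amide_h(c_prev, n, ca, bond_length=0.98):
    """Place a riding amide hydrogen on nitrogen `n`, along the negative
    bisector of the C(prev)-N-CA angle (i.e. in the peptide plane,
    pointing away from both bonded neighbors)."""
    u1 = _unit(_sub(c_prev, n))
    u2 = _unit(_sub(ca, n))
    bisector = _unit([u1[i] + u2[i] for i in range(3)])
    return [n[i] - bond_length * bisector[i] for i in range(3)]

# Toy geometry: C(prev) and CA symmetric about N, so H points along -y.
h = amide_h(c_prev=[-1.0, 1.0, 0.0], n=[0.0, 0.0, 0.0], ca=[1.0, 1.0, 0.0])
print([round(x, 3) for x in h])  # [0.0, -0.98, 0.0]
```

Changing a single constant (the bond length) moves every hydrogen, which is why regeneration without the original parameters cannot reproduce the refined model exactly.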
Re: [ccp4bb] Map density level
The main advantage of contouring in absolute units is consistency. The density for a water molecule with a B factor of 20 A^2 will look about the same even if the noise level of one map is higher than another. (Within limits, of course) This means that the actual value you contour at isn't as important as the choice of the same value all the time. You want to train your eye for what good density looks like at the level you use. I've picked 0.36 e/A^3 (don't get me started on units!) for density maps and 0.18 e/A^3 for difference maps as my personal values. The former is usually about 1 sigma and the latter about 3 sigma but of course the sigmas float based on other factors. I will look at the map contoured at lower levels when looking for atoms at lower than full occupancy, but I always start at the same place. When first learning model building it is useful to leave out a water molecule. That way you can see what a good water molecule looks like in your current difference map and can compare other potential water molecules to it. Dale Tronrud On 09/16/10 10:13, Nathaniel Clark wrote: Hi, It can, just do fm-mode select rmsd I am curious though, I have heard that it is 'better' to build in units of absolute density, but I couldn't find any values. Does any one have a suggestion as to what absolute electron density setting is 'correct' for an Fo-Fc difference map? Or do you just eyeball it? Nat On Thu, Sep 16, 2010 at 1:03 PM, Hailiang Zhang zhan...@umbc.edu wrote: Hi, I generated a map using FFT, and tried to display it in O. By comparing with coot, I found that the level in O seems to be the absolute electron density instead of the sigma level. I am sorry I ask a question more related to O: can O draw the map by a given sigma level instead of the absolute density, just like coot? Thanks! Best Regards, Hailiang
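The conversion between sigma-based and absolute contour levels discussed above is just multiplication by the map rms. A minimal sketch with toy map values:

```python
import math

def map_rms(rho):
    """RMS deviation about the mean: the 'sigma' used for contouring."""
    n = len(rho)
    mean = sum(rho) / n
    return math.sqrt(sum((x - mean) ** 2 for x in rho) / n)

def sigma_to_absolute(sigma_level, rho):
    """Contour level in map units for a requested sigma level."""
    return sigma_level * map_rms(rho)

def absolute_to_sigma(abs_level, rho):
    """Sigma level a fixed absolute contour (e.g. 0.36 e/A^3) lands on."""
    return abs_level / map_rms(rho)

rho = [0.0, 0.36, -0.36, 0.72, -0.72]   # toy map values, mean zero
print(round(absolute_to_sigma(0.36, rho), 3))  # 0.707
```

This makes the trade-off explicit: a fixed absolute level stays put between maps, while a fixed sigma level drifts with the noise of each particular map.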
Re: [ccp4bb] embarrassingly simple MAD phasing question (another)
Just to throw a monkey wrench in here (not really relevant to the original question)... I've understood that, just as the real part of F(000) is the sum of all the normal scattering in the unit cell, the imaginary part is the sum of all the anomalous scattering. This means that in the presence of anomalous scattering the phase of F(000) is not zero. It is also the only reflection whose phase is not affected by the choice of origin. Dale Tronrud On 10/13/10 22:38, James Holton wrote: An interesting guide to doing phasing by hand is to look at direct methods (I recommend Stout and Jensen's chapter on this). In general there are several choices of origin in any given space group, so for the first reflection you set about trying to phase you get to resolve the phase ambiguity arbitrarily. In some cases, like P1, you can assign the origin to be anywhere in the unit cell. So, in general, you do get to phase one or two reflections essentially for free, but after that, things get a lot more complicated. Although for x-ray diffraction F000 may appear to be mythical (like the sound a tree makes when it falls in the woods), it actually plays a very important role in other kinds of optics: the kind where the wavelength gets very much longer than the size of the atoms, and the scattering cross section gets to be very high. A familiar example of this is water or glass, which do not absorb visible light very much, but do scatter it very strongly. So strongly, in fact, that the incident beam is rapidly replaced by the F000 reflection, which looks the same as the incident beam, except it lags by 180 degrees in phase, giving the impression that the incident beam has slowed down. This is the origin of the index of refraction. It is also easy to see why the phase of F000 is zero if you just look at a diagram for Bragg's law.
For theta=0, there is no change in direction from the incident to the scattered beam, so the path from source to atom to direct-beam-spot is the same for every atom in the unit cell, including our reference electron at the origin. Since the structure factor is defined as the ratio of the total wave scattered by a structure to that of a single electron at the origin, the phase of the structure factor in the case of F000 is always no change or zero. Now, of course, in reality the distance from source to pixel via an atom that is not on the origin will be _slightly_ longer than if you just went straight through the origin, but Bragg assumed that the source and detector were VERY far away from the crystal (relative to the wavelength). This is called the far field, and it is very convenient to assume this for diffraction. However, looking at the near field can give you a feeling for exactly what a Fourier transform looks like. That is, not just the before- and after- photos, but the during. It is also a very pretty movie, which I have placed here: http://bl831.als.lbl.gov/~jamesh/nearBragg/near2far.html -James Holton MAD Scientist On 10/13/2010 7:42 PM, Jacob Keller wrote: So let's say I am back in the good old days before computers, hand-calculating the MIR phase of my first reflection--would I just set that phase to zero, and go from there, i.e. that wave will define/emanate from the origin? And why should I choose f000 over f010 or whatever else? Since I have no access to f000 experimentally, isn't it strange to define its phase as 0 rather than some other reflection? JPK On Wed, Oct 13, 2010 at 7:27 PM, Lijun Liulijun@ucsf.edu wrote: When talking about the reflection phase: While we are on embarrassingly simple questions, I have wondered for a long time what is the reference phase for reflections? I.e. a given phase of say 45deg is 45deg relative to what? = Relative to a defined 0. Is it the centrosymmetric phases? = Yes. It is that of F(000). 
Or a theoretical wave from the origin? = No, it is a real one, detectable but not measurable. Lijun Jacob Keller - Original Message - From: William Scott wgsc...@chemistry.ucsc.edu To: CCP4BB@JISCMAIL.AC.UK Sent: Wednesday, October 13, 2010 3:58 PM Subject: [ccp4bb] Summary: [ccp4bb] embarrassingly simple MAD phasing question Thanks for the overwhelming response. I think I probably didn't phrase the question quite right, but I pieced together an answer to the question I wanted to ask, which hopefully is right. On Oct 13, 2010, at 1:14 PM, SHEPARD William wrote: It is very simple, the structure factor for the anomalous scatterer is FA = FN + F'A + iF''A (vector addition). The vector F''A is by definition always +i (90 degrees anti-clockwise) with respect to the vector FN (normal scattering), and it represents the phase lag in the scattered wave. So I guess I should have started by saying I knew f'' was imaginary, the absorption term, and always needs to be 90 degrees in phase ahead
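Dale's monkey wrench above, that F(000) acquires a nonzero phase in the presence of anomalous scattering, can be checked numerically. The f' and f'' values below are made up for illustration, not tabulated scattering factors:

```python
import cmath

def f000(atoms):
    """F(000) as a complex sum over the cell contents: Z + f' + i*f''
    per atom.  With anomalous scatterers the imaginary part is nonzero,
    so the phase of F(000) is no longer exactly zero."""
    return sum(complex(z + fp, fpp) for z, fp, fpp in atoms)

# Toy cell: 100 electrons of light atoms with no anomalous signal, plus
# one anomalous scatterer with made-up f' = -8 and f'' = 4.
atoms = [(100.0, 0.0, 0.0), (26.0, -8.0, 4.0)]
F = f000(atoms)
phase_deg = cmath.phase(F) * 180.0 / cmath.pi
print(F, round(phase_deg, 2))
```

With no anomalous scatterers every f'' is zero, the sum is purely real and positive, and the phase collapses back to exactly zero, which is the textbook case.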
Re: [ccp4bb] quantum diffraction
On 10/15/10 12:38, Bart Hazes wrote: The photon moves through the crystal in finite time and most of the time it keeps going without interacting with the crystal, i.e. no diffraction. However, if diffraction occurs it is instantaneous, or at least so fast as to be considered instantaneous. In some cases a diffracted photon diffracts another time while passing through the remainder of the crystal. Or in Ruppian terms, a poof-pop-poof-pop event. If you listen carefully you may be able to hear it. The photon both diffracts and doesn't diffract as it passes through the crystal, and it diffracts into all the directions that match the Bragg condition. The wave function doesn't collapse to a single outcome until the detector measures something, which in the scheme of things occurs long after the photon left the crystal. The photon also interacts with the electrons for as long as the wave functions overlap. You have to solve the time-dependent Schrodinger equation to get the details. In all the QM classes I've had, they start by writing the time-dependent equation and then immediately erase it, never to be mentioned again. All the rest of the term is spent with the time-independent equation and the approximation of the instantaneous quantum jump. If you assume that nothing changes with time, the only way to model changes is with discontinuities. Dale Bart On 10-10-15 12:43 PM, Jacob Keller wrote: but yes, each photon really does interact with EVERY ELECTRON IN THE CRYSTAL at once. A minor point: the interaction is not really at once, is it? The photon does have to move through the crystal over a finite time. JPK
[ccp4bb] Enforcing ncs on water molecules
Hi I'm refining my first structure with a significant amount of ncs and am not looking forward to my usual, manual, editing of the water model. Could someone point me in the direction of a program that will encourage my water to obey the ncs? What I have in mind is to, first, find each cluster of water molecules related by the ncs. Then if some threshold is not reached, say only 1/3rd of the sites are occupied for a particular cluster, kill that cluster. If more than some threshold are occupied but less than 100%, fill in the missing water molecules. I would also like to reset the waters in each cluster to the average location. Has someone already written something along these lines? If so, I would rather not duplicate their effort. Thanks in advance, Dale Tronrud
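No program is named in this thread, but the bookkeeping Dale describes is straightforward to sketch. The following is a hypothetical outline of my own (operator representation, thresholds, and the function name are all illustrative, not from any existing package): group waters whose NCS images coincide, kill sparse clusters, and average the rest back in a reference frame. Filling in the missing copies of a kept cluster is omitted for brevity.

```python
import numpy as np

def ncs_water_edit(waters, ops, keep_frac=1/3, tol=1.0):
    """Cluster NCS-related waters, drop sparse clusters, average the rest.

    waters    : (N, 3) array of orthogonal coordinates
    ops       : list of (R, t) NCS operators, identity included
    keep_frac : minimum fraction of NCS copies that must be occupied
    Returns (averaged_reference_positions, dropped_indices).
    """
    unused = set(range(len(waters)))
    kept, dropped = [], []
    while unused:
        i = min(unused)
        # everything some operator maps close to water i forms one cluster
        cluster = [j for j in unused
                   if any(np.linalg.norm(R @ waters[i] + t - waters[j]) < tol
                          for R, t in ops)]
        unused.difference_update(cluster)
        if len(cluster) / len(ops) < keep_frac:
            dropped.extend(cluster)          # too few copies: kill the cluster
        else:
            # map each member back to the reference copy and average
            back = [np.linalg.inv(R) @ (waters[j] - t)
                    for j in cluster for R, t in ops
                    if np.linalg.norm(R @ waters[i] + t - waters[j]) < tol]
            kept.append(np.mean(back, axis=0))
    return kept, dropped
```

The greedy seeding by lowest index is a simplification; a production version would also need to handle waters near special positions and cluster the images more carefully when operators nearly commute.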
Re: [ccp4bb] Space group and R/Rfree value
It is not at all unusual for a biological homodimer to sit on a crystallographic two-fold symmetry axis. It is also not unusual for such a dimer to sit entirely in the asymmetric unit. This cannot be used to identify the space group. The space group is determined by the diffraction data. The difference between C2221 and P212121 is that many of the reflections predicted for P212121 have intensity equal to zero in C2221. Since you have a confusion between these two, I presume the P212121 model has a pseudotranslational symmetry of (0.5,0.5,0.0). This pseudo-translational symmetry should be reported by xtriage, and will mislead the twin detection tests. To determine which of these choices is the correct space group you do not perform refinement, you look at the diffraction pattern to see if there are non-zero intensities for the spots that must be zero if the space group were C2221. In P212121 with pseudo-C centering these spots will be weak but observable. I am not surprised that your refinement in P212121 gives higher R values than C2221. In P212121 with pseudo-C symmetry half of the reflections are weak and will have low signal/noise ratio. With the assumption of C centering these weak reflections are discarded and the R values will go down. Your goal is not to reduce the R values, but to fit the data. If these reflections have non-zero intensities you must integrate them and add them to your refinement. Dale Tronrud On 12/01/10 08:31, Xiaopeng Hu wrote: Dear Dr. Kelly Daughtry, Thanks for your help. The enzyme I am working on now functions as a dimer and the active site is located at the interface. In previously published homologous structures, there is one dimer in the ASU and the dimer has tight NCS. With C2221, the dimer formed by symmetry mates fits the homologous dimer very very well. It is hard for me to understand how an enzyme can have such a crystallographic dimer. I am not good with Phenix, so I only tried xtriage to check the data set.
With C2221, the twin test gives a good Z score which is much smaller than the critical 3.5, while with P212121, the Z score is high (10). I didn't go further. The maps look just the same between the two space groups. ----- Original Message ----- From: Kelly Daughtry kddau...@bu.edu To: Xiaopeng Hu huxp...@mail.sysu.edu.cn Sent: Thursday, December 2, 2010 12:05:08 AM Subject: Re: [ccp4bb] Space group and R/Rfree value Sorry, I meant with the P212121 refinement. You mentioned that it is probably twinned. Including the twin law in refinement with the P212121 data should help lower your R and Rfree values. If you have already included the twin law in your phenix refinement for the P212121 data, and R and Rfree can not be lowered, I would suggest that your data probably is C2221. Also, the fact that C2221 is not twinned while P212121 is twinned is an indicator to me that C2221 is probably correct as well. I wouldn't exclude C2221 as the real space group for not having the desired dimer. I have had tetrameric proteins crystallize with one mol / ASU, trimers with one mol / ASU. If you turn on symmetry mates, do you see your intended dimer with the C2221 data? Last question/suggestion: Do the maps look the same between the two space groups? I would assume that the P212121 calculated maps are somewhat worse than the C2221 maps. With space group identity problems like these, you have to let the data tell you what is the correct space group. And from the looks of it, C2221 is the way to go. Best of luck, Kelly *** Kelly Daughtry, Ph.D. Post-Doctoral Fellow, Raetz Lab Biochemistry Department Duke University Alex H. Sands, Jr. Building 303 Research Drive RM 250 Durham, NC 27710 P: 919-684-5178 *** 2010/12/1 Xiaopeng Hu huxp...@mail.sysu.edu.cn 1: No, the data reduction software didn't find twinning and C2221 works well, so I never tried a twin law in refinement. 2: C2221 gives a monomer in the ASU.
----- Original Message ----- From: Kelly Daughtry kddau...@bu.edu To: Xiaopeng Hu huxp...@mail.sysu.edu.cn Sent: Wednesday, December 1, 2010 11:32:42 PM Subject: Re: [ccp4bb] Space group and R/Rfree value Just to clarify, did you use the twin law in the phenix refinement? Also, is the C2221 solution a monomer or dimer in the ASU? *** Kelly Daughtry, Ph.D. Post-Doctoral Fellow, Raetz Lab Biochemistry Department Duke University Alex H. Sands, Jr. Building 303 Research Drive RM 250 Durham, NC 27710 P: 919-684-5178 *** 2010/12/1 Xiaopeng Hu huxp...@mail.sysu.edu.cn Dear all, I am working on a data-set (2.3A) and the space group problem bothers me a lot. The space group of the data-set could be C2221 or P212121, since our protein functions
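Dale's test — look for real intensity in the reflections that C-centring forbids — can be scripted directly. In C2221 the centring condition is h + k = 2n, so the check is just a parity split of the reflection list. A minimal sketch (my own; the function name and synthetic numbers are illustrative):

```python
import numpy as np

def c_centering_check(hkl, I, sigI):
    """Compare 'forbidden' (h+k odd) and 'allowed' (h+k even) reflections.

    If the true group is C-centred, the odd set should be pure noise
    (mean I/sig(I) near zero). Weak-but-significant intensity there
    points instead to P212121 with pseudo-C-centring.
    Returns (mean I/sig of odd set, mean I/sig of even set).
    """
    hkl = np.asarray(hkl)
    odd = (hkl[:, 0] + hkl[:, 1]) % 2 == 1
    i_over_sig = np.asarray(I) / np.asarray(sigI)
    return i_over_sig[odd].mean(), i_over_sig[~odd].mean()
```

Run on unmerged intensities from the P212121 integration, a mean I/sigma of, say, 3 in the odd class would argue for the primitive cell; a mean near 0 supports C2221.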
Re: [ccp4bb] Fwd: [ccp4bb] Wyckoff positions and protein atoms
The proper occupancy for an atom on a special position depends on how one defines the meaning of the number in that column. In the past, refinement programs, at least I know mine did, simply expanded all atoms in the coordinate file by the symmetry operators to determine the contents of the unit cell. With that operation the occupancy of the atoms on special positions had to be reduced. It is certainly true that there are 1/3 the number of atoms in the unit cell represented by ZN D 31 as there are for, say, the CA of residue 50. Most modern refinement programs try to handle this automatically, since users proved unreliable at detecting this condition and modifying their coordinate files. They use the interpretation that the site is fully occupied but there are only 1/3 the number of these sites as there are sites at general positions. Personally I find it disturbing to have the occupancy of B 31 set to 0.33 and that of D 31 set to 1.00 simply because of an insignificant shift in the position of the atom. Dale Tronrud On 12/10/10 13:53, Ian Tickle wrote: Good point Colin! 2-Zn insulin is of course a classic example of this, where the two independent Zn2+ ions both sit on the crystallographic 3-fold in R3. It doesn't matter whether you count the metal ion as part of the protein or not: if I understand Gloria's original question correctly, all that matters is that the atom/ion is present in the crystal structure. In fact here are some extracts from the PDB entry (4INS): REMARK 375 ZN ZN B 31 LIES ON A SPECIAL POSITION. REMARK 375 ZN ZN D 31 LIES ON A SPECIAL POSITION. REMARK 375 HOH B 251 LIES ON A SPECIAL POSITION. REMARK 375 HOH D 44 LIES ON A SPECIAL POSITION. REMARK 375 HOH D 134 LIES ON A SPECIAL POSITION. REMARK 375 HOH D 215 LIES ON A SPECIAL POSITION. REMARK 375 HOH D 269 LIES ON A SPECIAL POSITION.
HETATM 835 ZN ZN B 31 -0.002 -0.004 7.891 0.33 10.40 ZN HETATM 836 ZN ZN D 31 0.000 0.000 -8.039 0.33 11.00 ZN HETATM 885 O HOH B 251 -0.023 -0.033 11.206 0.33 21.05 O etc Hmmm - but shouldn't the occupancy of the Zn be 1.00 if it's on the special position (assuming it's not disordered), though the first Zn above and the water do appear to be disordered since they're not actually on the special position. Fractional occupancy always implies some kind of disorder: occupancy = 1/3 of an atom on a special position would imply occupancy disorder, i.e. it's randomly present in only 1/3 of the unit cells. -- Ian On Fri, Dec 10, 2010 at 1:11 PM, Colin Nave colin.n...@diamond.ac.uk wrote: Does one regard the metal atom in a metalloprotein as being part of the protein? If so, a shared metal could occupy a special position in a dimer for example. In Acta Cryst. (2008). D64, 257-263 Metals in proteins: correlation between the metal-ion type, coordination number and the amino-acid residues involved in the coordination I. Dokmanic, M. Sikic and S. Tomic ( http://scripts.iucr.org/cgi-bin/paper?S090744490706595X ) it says there are 25 cases of metal atoms in special positions. Also Acta Cryst. (2002). D58, 29-38 The 2.6 Å resolution structure of Rhodobacter capsulatus bacterioferritin with metal-free dinuclear site and heme iron in a crystallographic `special position' D. Cobessi, L.-S. Huang, M. Ban, N. G. Pon, F. Daldal and E. A. Berry ( http://scripts.iucr.org/cgi-bin/paper?S0907444901017267 ) though the 'special position' is justifiably in quotation marks in this example as disorder is present. Colin
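The two bookkeeping conventions Dale contrasts give the same unit-cell contents; a toy sketch (mine, with invented coordinates, using a Cartesian 3-fold about z as a stand-in for the crystallographic 3-fold in R3) shows why the naive symmetry expansion forces occ = 1/3 on the axis:

```python
import numpy as np

# 3-fold rotation about z (orthogonal coordinates, rotation axis along z)
c, s = np.cos(2 * np.pi / 3), np.sin(2 * np.pi / 3)
R3 = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def images(pos, n=3):
    """Positions produced by naively expanding one record by all n rotations."""
    out, p = [], np.asarray(pos, float)
    for _ in range(n):
        out.append(p.copy())
        p = R3 @ p
    return np.array(out)

zn_on_axis = images([0.0, 0.0, 7.891])   # like ZN D 31: sitting on the 3-fold
ca_general = images([5.0, 1.0, 7.891])   # an atom at a general position

# On the axis all three images coincide, so a naive expansion would stack
# three full atoms on one site -- hence occ = 0.33 in the old convention,
# versus occ = 1.00 with multiplicity-aware modern programs.
```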
Re: [ccp4bb] Fwd: [ccp4bb] Wyckoff positions and protein atoms
Dear Ian, I think you are putting too much importance on the numerical instability of an atom's position when refining with full matrix refinement. When developing TNT's code for calculating second derivatives I found that building into the calculation the effects of such an atom overlapping its own, symmetry related, electron density eliminated the instability and no constraints to special positions were required. I was only working with block diagonal second derivatives with one block per atom but I don't see any reason the proper calculation would not work with the full matrix. The electron density of an atom near a special position is nearly the same as that of one far away. It is not reasonable that a proper calculation would blow up for one and not the other. The key is doing the proper calculation. It's true that the proper calculation of the atomic block for an atom near a special position took more time than the calculation for all the other atoms in the model. You can't just calculate generic look-up tables that apply to all atoms. The reward of the full calculation is that all the complications you describe disappear. An atom that sits 0.001 A from a special position is not unstable in the least. It does, of course, have to have an occupancy of 1/n. I always avoid programming tests of a == b for real numbers because the round-off errors will always bite you at some point. This means that a test of an atom exactly on a special position can't be done reliably in floating point math. Your preferred assumption is that any atom near enough to a special position is really on the special position and should have an occupancy of one. My assumption is that no atom is ever EXACTLY on the special position and if they are close enough to their symmetry image to forbid coexistence the occupancy should be 1/n. I think either assumption is reasonable but, of course, prefer mine for what I consider practical reasons. It helps that I have the code to make mine work.
Dale Tronrud On 12/15/10 08:54, Ian Tickle wrote: Hi Herman What makes an atom on a special position is that it is literally ON the s.p.: it can't be 'almost on' the s.p. because then if you tried to refine the co-ordinates perpendicular to the axis you would find that the matrix would be singular or at least so badly conditioned that the solution would be poorly defined. The only solution to that problem is to constrain (i.e. fix) these co-ordinates to be exactly on the axis and not attempt to refine them. The data are telling you that you have insufficient resolution so you are not justified in placing the atom very close to the axis; the best you can do is place the atom with unit occupancy exactly _on_ the axis. It's only once the atom is a 'significant' distance (i.e. relative to the resolution) away from the axis that these co-ordinates can be independently refined. Then the data are telling you that the atom is disordered. If you collected higher resolution data you might well be able to detect and successfully refine disordered atoms closer to the axis than with low resolution data. So it has nothing to do with the programmer setting an arbitrary threshold. This would have to be some complicated function of atom type, occupancy, B factor, resolution, data quality etc to work properly anyway so I doubt that it would be feasible. Instead it's determined completely by what the data are capable of telling you about the structure, as indeed it should be. My main concern was the conflict between some program implementations and the PDB and mmCIF format descriptions on this issue. For example the PDB documentation says that the ATOM record contains the occupancy (where this is defined in the CIF/mmCIF documentation). If it had intended that it should contain multiplicity*occupancy instead then presumably it would have said so.
Cheers -- Ian On Wed, Dec 15, 2010 at 4:01 PM, herman.schreu...@sanofi-aventis.com wrote: Dear Ian, In my view, the confusion arises by NOT including the multiplicity into the occupancy. If we make the gedanken experiment and look at a range of crystal structures with a rotationally disordered water molecule near a symmetry axis (they do exist!) then as long as the water molecule is sufficiently far from the axis, it is clear that the occupancy should be 1/2 or 1/3 or whatever is the multiplicity. However, as the molecule approaches the axis, at a certain moment, at a certain threshold set by the programmer of the refinement program, the molecule suddenly becomes special and the occupancy is set to 1.0. So depending on rounding errors, different thresholds etc. different programs may make different decisions on whether a water is special or not. For me, this is confusing. Best regards, Herman -Original Message- From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of Ian Tickle Sent: Wednesday, December 15, 2010 3
Re: [ccp4bb] Fwd: [ccp4bb] Wyckoff positions and protein atoms
On 12/16/10 03:06, Ian Tickle wrote: Dale The reward of the full calculation is that all the complications you describe disappear. An atom that sits 0.001 A from a special position is not unstable in the least. That's indeed a very interesting observation, I have to admit that I didn't think that would be achievable. But there must still be some threshold of distance at which even that fails? Presumably within rounding error? Or are you saying (I assume you aren't!) that you can even refine all co-ordinates of an atom exactly on a special position? Say the x and z co-ordinates of an atom at (0,y,0) in monoclinic? Presumably the atom would have to be given a random push one way or the other (random number generators are generally not a feature of crystallographic refinement programs, with the obvious exception of simulated annealing!)? To be frank, I wrote this code about 15 years ago, it works, and I've not given any thought to atoms on special positions since. I'll have to go back to my notes and code to dig up the exact method. Anyone with a copy of TNT can look up the code. I am, however, not in the least concerned about what happens when an atom falls exactly on a special position because I just don't think that any part of a protein model can be considered exact. If I have a model with two atoms, of occ=1/2 each, sitting 0.0001 A apart - it fits the density and I think everyone knows what that model means, or at least they should. If you decide to shove them each 0.5 A and call them a single atom with occ=1, your model will fit the density just as well and I have no problem with that either. By the way, the refinement issue has nothing to do with special positions. The instability you observe occurs any time you build two atoms into the same bit of density. If your model has two atoms, at a general position, with exactly the same coordinates the Normal matrix will have a singularity. 
The problem doesn't come up much because we normally choose not to build such models. It can be an issue in models with disorder where different conformations interpenetrate each other but the stereochemical restraints usually come to the rescue then. ? I always avoid programming tests of a == b for real numbers because the round-off errors will always bite you at some point. Obviously common sense has to be applied here and tests for strict floating-point equality studiously avoided. But this is very easily remedied, my optimisation programs are full of tests like IF(ABS(X-Y).LT.1E-6) THEN ... and I'm certain so are yours (assuming of course you still program in Fortran!). This implies that in the case that an atom is off-axis and disordered you have to take care not to place it within say a few multiples of rounding error of the axis, since then it might indeed be confused with one 'on' the special position. However if someone claims that an atom sits within say 10*rounding error of an axis as distinct from being on the axis, then a) there's no way that can be proved, and b) it would be indistinguishable from being on the s.p. and the difference in terms of structure factors and maps would be insignificant anyway, so it may as well be on-axis. If the difference is insignificant, it may as well be off-axis. I guess if the difference is insignificant it just comes down to personal preferences. Dale Tronrud I think this is how the Oxford CRYSTALS software ( http://www.xtl.ox.ac.uk/crystals.html ), which has been around for at least 30 years, deals with this issue, so I can't accept that it can't be made to work, even if I haven't got all the precise details straight of how it's done in practice. Your preferred assumption is that any atom near enough to a special position is really on the special position and should have an occupancy of one.
My assumption is that no atom is ever EXACTLY on the special position and if they are close enough to their symmetry image to forbid coexistence the occupancy should be 1/n. I think either assumption is reasonable but, of course, prefer mine for what I consider practical reasons. It helps that I have the code to make mine work. Whichever way it's done is only a matter of convention (clearly both ways work just as well), however I would reiterate that my main concern here is that convention and practice appear to have parted company in this particular instance! Cheers -- Ian
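Dale's later point — that the instability comes from coincident atoms, not from special positions as such — can be demonstrated with a toy model. In the sketch below (mine; a 1-D Gaussian "atom" with invented parameters, not TNT's actual calculation), two atoms at the same center contribute identical columns to the design matrix, so the normal matrix A^T A is exactly singular:

```python
import numpy as np

x = np.linspace(-3, 3, 61)          # 1-D 'map' grid

def atom(center, b=0.5):
    """Gaussian density of one toy atom on the grid."""
    return np.exp(-(x - center) ** 2 / b)

def normal_matrix(c1, c2, h=1e-4):
    """Normal matrix A^T A for refining the two atom centers against a map."""
    # numerical derivatives of the model density w.r.t. each center
    d1 = (atom(c1 + h) - atom(c1 - h)) / (2 * h)
    d2 = (atom(c2 + h) - atom(c2 - h)) / (2 * h)
    A = np.column_stack([d1, d2])
    return A.T @ A

rank_apart = np.linalg.matrix_rank(normal_matrix(-0.8, 0.8))
rank_coincident = np.linalg.matrix_rank(normal_matrix(0.0, 0.0))
# coincident atoms -> identical derivative columns -> singular normal matrix
```

The rank drops from 2 to 1 exactly when the two centers coincide, whether or not a symmetry axis is anywhere nearby.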
Re: [ccp4bb] Fwd: [ccp4bb] Wyckoff positions and protein atoms
On 12/16/10 06:47, Ian Tickle wrote: For the sake of argument let's say that 0.02 Ang is too big to be rounding error. So if you see that big a shift then the intention of the refinement program (or rather the programmer) which allowed such a value to appear in the output should be that it's real. If the intention of the user was that the atom is actually on axis then the program should not have allowed such a large shift, since it will be interpreted as 'much bigger than rounding error' and therefore 'significantly off-axis'. I would certainly hope that no one believes that the precision of the parameters in a PDB file is significant to the level of round-off error! It's bad enough that a small number of people take the three decimal points of precision in the PDB file seriously. When a person places an atom in a model they aren't stating a belief that that is the EXACT location of the atom, only that they believe the center of the locus of all equivalent atoms in the crystal falls near that spot. If it's 0.02 A from a special position (and the SU of the position is larger than that) then it might be on the special position and it might not. If I come across one of your models and you have an atom exactly on a special position (assuming you're able to do that with three decimal points in a PDB file) I'd still assume that you only intend that there is an atom in the vicinity of that point and it might be exactly on the axis but it might be a little off. All structural models are fuzzy. Dale Tronrud
Re: [ccp4bb] Noisy difference maps with high solvent content?
Hi, This sort of problem can occur if you are missing your lowest resolution data and/or your model for the bulk solvent is inappropriate. You might want to double check these issues. With 80% solvent you have to be careful when choosing your contour level. If you are a fan of normalized maps and contouring based on sigma (and it's not a sigma by the way) you should be aware that the normalization factor is calculated over the protein and all that empty space and will be smaller than one calculated for equivalent protein density in a low solvent crystal. Plus/minus 3 contours will be lower and the significance of features will be inflated. One way to calibrate a contour level would be to leave out a known good bit of model and calculate your difference map. Then select a contour level that shows the understood omission well. Other peaks that show up at that level are errors as significant as the one you created. Dale Tronrud On 01/28/11 12:29, Todd Geders wrote: Greetings CCP4bb, *Short version: Very noisy difference maps from a crystal with extremely high solvent content, seeking advice on how best to handle such high solvent content to eliminate noise in difference maps. * *http://strayxray.com/images/coot.jpg* Long version: I'm having trouble with a 3.0Å dataset from a crystal with 80% solvent content. The space group is P4132 and I'm quite confident the high solvent content is real (there is a species-specific set of helices extending into the solvent channels that appears to prevent tighter packing). I was able to get a MR solution using a structurally related enzyme, but the difference maps are terribly noisy (see link). There are lots of negative density in empty spaces between well-defined 2Fo-Fc electron density. http://strayxray.com/images/coot.jpg The 2Fo-Fc density actually looks fairly good. The initial MR maps had clear density correlating to the sequence differences between the MR model and the crystallized protein. 
After fixing the model as best I could, the refinement statistics are R/Rfree of 27.5/30.3 with a data/parameters ratio of 1.7. The mosaicity ranges from 0.15-0.3, data were collected with 0.5° oscillations and 180° of data were collected. http://strayxray.com/images/diffraction.jpg Since the crystals appeared to suffer from radiation decay (based on scaling statistics), I only use the first 40° of data (which still gives around 8-fold redundancy). Using smaller wedges of data, or more data, does not make the maps noticeably better or noisier. Any advice on improving the maps? Could the noisy maps be due to the extraordinarily high solvent content? I'd appreciate any advice or comments. ~Todd Geders
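Dale's warning about "sigma" levels can be made concrete. The normalisation is an rms taken over the whole cell; with 80% flat solvent that rms is pulled far below the rms over the protein region, so a "3 sigma" contour sits at a much lower absolute density. A synthetic illustration (mine; the noise levels and solvent fraction are invented, not real map statistics):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
solvent_frac = 0.8
n_solv = int(n * solvent_frac)

# Toy difference map: nearly flat solvent plus stronger features over protein
solvent = rng.normal(0.0, 0.1, n_solv)
protein = rng.normal(0.0, 1.0, n - n_solv)
cell = np.concatenate([solvent, protein])

rms_cell = cell.std()        # what 'sigma-scaled' map contouring divides by
rms_protein = protein.std()  # what the features should really be judged against

# rms_cell << rms_protein, so a '3 sigma' contour in this crystal is a far
# lower density cutoff than the same nominal level in a low-solvent crystal.
```

This is why Dale suggests calibrating the contour empirically, by omitting a known-good piece of model and choosing the level at which that deliberate omission shows up cleanly.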
[ccp4bb] Ken Olsen, Founder of Digital Equipment Corporation, Died Sunday
I see in the news that Ken Olsen has died. Although he was not a crystallographer I think we should stop for a moment to remember the profound impact the company that this man founded had on our field. My first experience in a crystallography lab was as an undergraduate in M. Sundaralingam's lab in Madison Wisconsin. While I never had the opportunity to use them, his two diffractometers were controlled by the ubiquitous PDP-8 computers. I had more experience with his main computer, which was either a PDP-11/34 or 35 (Ethan help me out!). This was connected to a Vector General graphics display running software called UWVG. Having the least stature in the lab I got the midnight to 4am time slot for model building. The computer took about 10 minutes to compute and contour each block of map, covering about three residues. While waiting I would crawl under the DECwriter and nap. The computer would stop rattling when the map was up and that would wake me. When I joined the Matthews lab in Oregon they had a VAX 11/780. What an amazing machine! It had 1 MB of RAM and could run a million instructions in a second. It only took 48 hours to run a cycle of refinement with PROLSQ, that is, if no one else used the computer. These specs don't sound like much but this computer was really a revolution for computational crystallography. That a single lab could own a computer of such power was unheard of before this. It wasn't just that the computer had so much RAM (We later got it up to its max of 4 MB.) but the advent of virtual memory made program design so much easier. You could simply define an array of 100,000 elements and not have to worry about finding where in memory, mixed in with the operating system, utility programs, and other users' software, you could find an unused block that big. Digital didn't invent virtual memory, but the VAX made it achievable for regular crystallographers. 
Through most of the 1980's you didn't have to worry about getting your code to run on other computers - Everyone had access to a VAX. In the 1990's DEC came out with the alpha CPU chip which really broke ground for performance. These things screamed when it came to running crystallographic software. In 1999 the lab bought several of the 666 MHz models. It was about four years before Intel came out with a chip that would match these alphas on my crystallography benchmark and they had to be clocked at over 2 GHz to do it. Yes, Digital lost out in the competition of the marketplace, and Ken Olsen was pushed out of the company well before the end. But what a ride it was. What great computers they were and what great science was done on them! Dale Tronrud
Re: [ccp4bb] what to do with disordered side chains
Standardization is great! That is why the way we describe positions, occupancy, and B factors has already been standardized. The core of this discussion is that some people want to use these parameters to describe details other than position, occupancy and motion. Since all the parameters on ATOM/HETATM records are already defined with great specificity, if you want the model to contain additional information you will have to define new parameters, or some way to specify the information you want to include using other, existing, records more adequate to the task (e.g. SIGATM). Dale Tronrud On 03/30/11 11:32, Frank von Delft wrote: I'm amazed at the pedestal people put their precious coordinates on -- isn't the first thing you learn about MX that our models are rubbish parametrizations of the actual content of the crystal? And thus they will remain as long as we have the R-factor gap, and no amount of coordinate-sigmas or dark-density will change that. What we *are* trying to do is communicate something, and the bedrock of communication is /convention/ - also known as standardization. What is science other than one large standardization exercise? So yes, standardization is *exactly* what is needed: when the same phenomenon is described in so many different ways by different people, what that indicates is not that they have different opinions, it indicates only that everybody has to second-guess what their audience will understand. But once we've laid down a convention, the guessing stops and both speaker and listener know what the hell is being said. phx. On 30/03/2011 19:04, James Holton wrote: I'm afraid this is not a problem that can be solved by standardization. 
Fundamentally, if you are a scientist who has collected some data (be it diffraction spot intensities, cell counts, or substrate concentration vs time), and you have built a model to explain that data (be it a constellation of atoms in a unit cell, exponential population growth, or a microscopic reaction mechanism), I think it is generally expected that your model explain the data to within experimental error. Unfortunately, this is never the case in macromolecular crystallography, where the model-data disagreement (Fobs-Fcalc) is ~4-5x bigger than the error bars (sigma(F)). Now, there is nothing shameful about an incomplete model, especially when thousands of very intelligent people working over half a century have not been able to come up with a better way to build one. In fact, perhaps a better name for the disordered side chain problem would be dark density? This name would place it properly amongst dark matter, dark energy and other fudge factors introduced to try and explain why our standard model is not consistent with observation? That is, dark density is the stuff we can't see, but nonetheless must be there somewhere. Whatever it is, I personally do hold a vain belief that perhaps someday soon the problem of dark density will be solved, and that presently instituting a policy requiring that all macromolecular models from this day forward remain at least as incomplete as yesterday's models is not a very good idea. I say: if you think there is something there then you should build it in, especially if it is important to the conclusions you are trying to make. You can defend your model the same way you would defend any other scientific model: by using established statistics to show that it agrees with the data better than an alternative model (like leaving it out). It is YOUR model, after all! Only you are responsible for how right it is. 
I do appreciate that students and other novices may have a harder time defining surfaces and measuring hydrogen bond lengths in these pesky floppy regions, but perhaps their education would be served better by learning the truth sooner than later? -James Holton MAD Scientist On 3/30/2011 9:26 AM, Filip Van Petegem wrote: Hello Mark, I absolutely agree with this. The worst thing is when everybody is following their own personal rules, and there are no major guidelines for end-users to figure out how to interpret those parts. I assume there are no absolute guidelines simply because there isn't any consensus among crystallographers... (from what we can gather from this set of emails...). On the other hand, this discussion has flared up many times in the past, and maybe it's time for a powerful dictator at the PDB to create the law... Filip Van Petegem On Wed, Mar 30, 2011 at 8:37 AM, Mark J van Raaij mjvanra...@cnb.csic.es mailto:mjvanra...@cnb.csic.es wrote: perhaps the IUCr and/or PDB (Gerard K?) should issue some guidelines along these lines? And oblige us all to follow them? Mark J van Raaij Laboratorio M-4 Dpto de Estructura de Macromoleculas Centro Nacional de Biotecnologia - CSIC c/Darwin 3, Campus Cantoblanco E-28049 Madrid, Spain
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
While what you say here is quite true and is useful for us to remember, your list is quite short. I can add another: 3) The systematic error introduced by assuming full occupancy for all sites. There are, of course, many other factors that we don't account for that our refinement programs tend to dump into the B factors. The definition of that number in the PDB file, as listed in the mmCIF dictionary, only includes your first factor -- http://mmcif.rcsb.org/dictionaries/mmcif_std.dic/Items/_atom_site.B_iso_or_equiv.html and these numbers are routinely interpreted as though that definition is the law. Certainly the whole basis of TLS refinement is that the B factors are really Atomic Displacement Parameters. In addition the stereochemical restraints on B factors are derived from the assumption that these parameters are ADPs. Convoluting all these other factors with the ADPs causes serious problems for those who analyze B factors as measures of motion. The fact that current refinement programs mix all these factors with the ADP for an atom to produce a vaguely defined B factor should be considered a flaw to be corrected and not an opportunity to pile even more factors into this field in the PDB file. Dale Tronrud On 3/31/2011 9:06 AM, Zbyszek Otwinowski wrote: The B-factor in crystallography represents the convolution (sum) of two types of uncertainties about the atom (electron cloud) position: 1) dispersion of atom positions in the crystal lattice 2) uncertainty of the experimenter's knowledge about the atom position. In general, uncertainty need not be described by a Gaussian function. However, communicating uncertainty using the second moment of its distribution is a widely accepted practice, with the frequently implied meaning that it corresponds to a Gaussian probability function. The B-factor is simply a scaled (by 8 times pi squared) second moment of the uncertainty distribution.
In the previous, long thread, confusion was generated by the additional assumption that the B-factor also corresponds to a Gaussian probability distribution and not just to a second moment of any probability distribution. Crystallographic literature often implies the Gaussian shape, so there is some justification for such an interpretation, where the more complex probability distribution is represented by a sum of displaced Gaussians, with the area under each Gaussian component corresponding to the occupancy of an alternative conformation. For data with a typical resolution for macromolecular crystallography, such a multi-Gaussian description of the atom position's uncertainty is not practical, as it would lead to instability in the refinement and/or overfitting. Due to this, a simplified description of the atom's positional uncertainty by just the second moment of the probability distribution is the right approach. For this reason, the PDB format is highly suitable for the description of positional uncertainties, the only difference from other fields being the unusual form of squaring and then scaling up the standard uncertainty. As this calculation can be easily inverted, there is no loss of information. However, in teaching one should probably stress this unusual form of presenting the standard deviation more. A separate issue is the use of restraints on B-factor values, a subject that probably needs a longer discussion. With respect to the previous thread, representing poorly-ordered (so called 'disordered') side chains by the most likely conformer with appropriately high B-factors is fully justifiable, and currently is probably the best solution to a difficult problem. Zbyszek Otwinowski - they all know what B is and how to look for regions of high B (with, say, pymol) and they know not to make firm conclusions about H-bonds to flaming red side chains. But this knowledge may be quite wrong.
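The squaring-and-scaling relation described above, B = 8*pi^2*<u^2>, is trivially invertible, which is why no information is lost by reporting B instead of the standard uncertainty. A minimal sketch (the function names are mine, not from the thread):

```python
import math

def b_from_msd(u2):
    """Scale a mean-square displacement <u^2> (in Angstrom^2) to a B factor.

    B is 8 * pi^2 times the second moment of the positional uncertainty
    distribution, as described in the post above.
    """
    return 8.0 * math.pi ** 2 * u2

def msd_from_b(b):
    """Invert the scaling to recover <u^2> from a reported B factor."""
    return b / (8.0 * math.pi ** 2)

# The transform loses no information; for intuition, a "flaming red"
# B of 80 A^2 corresponds to an RMS displacement of about 1 Angstrom.
u = math.sqrt(msd_from_b(80.0))
```

Since the round trip B -> <u^2> -> B is exact, the only real cost of the convention is pedagogical, which is Zbyszek's point about teaching.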
If the flaming red really indicates large vibrational motion then yes, one would not bet on stable H-bonds. But if the flaming red indicates that a well-ordered sidechain was incorrectly modeled at full occupancy when in fact it is only present at half-occupancy then no, the H-bond could be strong but only present in that half-occupancy conformation. One presumes that the other half-occupancy location (perhaps missing from the model) would have its own H-bonding network. I beg to differ. If a side chain has two or more positions, one should be a bit careful about making firm conclusions based on only one of those, even if it isn't clear exactly why one should use caution. Also, isn't the isotropic B we fit at medium resolution more of a spherical cow approximation to physical reality anyway? Phoebe Zbyszek Otwinowski UT Southwestern Medical Center at Dallas 5323 Harry Hines Blvd. Dallas, TX 75390-8816 Tel. 214-645-6385 Fax. 214-645-6353
Re: [ccp4bb] what to do with disordered side chains
On 3/31/2011 10:14 AM, Jacob Keller wrote: What do we gain? As Dale pointed out, we are already abusing occupancy or B-factor, or deleting the side chain, to compensate for our inability to tell the user that the side chain is disordered. With your proposal, we would fudge both occupancy and B-factor, which in my eyes is even worse than fudging just one of the two. We gain clarity to the non-crystallographer user: a b-factor of 278.9 sounds like possibly something real. A b-factor of exactly 1000 does not. Both probably have the same believability, viz., ~zero. Also, setting occupancy = zero is not fudging but rather respectfully declining to comment based on lack of data. I think it is exactly the same as omitting residues one cannot see in the density. These things are never clear unless there is a solid definition of the terms you are using. I don't think you can come up with an out-of-band value for the B factor that doesn't have a legitimate meaning as an atomic displacement parameter for someone. How large a B factor you can meaningfully define depends on your lower resolution limit. People working with electron microscopy or small angle X-ray scattering could easily build models with ADPs far larger than anything we normally encounter. In addition, you can't define 1000 as a magic value since the PDB format will only allow values up to 999.99, and I presume maintaining the PDB format is one of your goals. Of course, you could choose -99.99 as the magic value but that would break all of our existing software and I presume you don't want that either. Actually defining any value for the B factor as the magic value would break all of our software. The only advantage of a large, positive number is that it would create bugs that are more subtle. The fundamental problem with your solution is that you are trying to cram two pieces of information into a single number. Such overloading always causes problems. Each concept needs its own value.
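The 999.99 ceiling mentioned above follows directly from the fixed-column PDB format, where occupancy and the temperature factor each get a six-character F6.2 field. A minimal parsing sketch (column positions per the PDB format specification; the sample record itself is invented):

```python
def parse_atom_record(line):
    """Parse name, occupancy, and B factor from a PDB ATOM/HETATM record.

    PDB format is fixed-column: atom name is columns 13-16, occupancy
    columns 55-60 (F6.2), and the temperature factor columns 61-66 (F6.2).
    A six-character F6.2 field cannot hold anything above 999.99, so there
    is no room for an out-of-band "magic" value beyond that ceiling.
    """
    return {
        "name": line[12:16].strip(),
        "occ": float(line[54:60]),
        "b": float(line[60:66]),
    }

# An invented, correctly aligned record for illustration:
record = ("ATOM      1  CA  LYS A  10      11.104  13.207   9.332"
          "  1.00 99.99           C")
atom = parse_atom_record(record)
```

Any program slicing these fixed columns will silently accept a "magic" B like 500.00 as an ordinary number, which is exactly the subtle-bug scenario Dale describes.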
You could implement your solution easily in mmCIF. Just create a new tag, say _atom_site.imaginary_site, which is either true or false for every atom. Then everyone would be able to either filter out the fake atoms or leave them in, without ambiguity or confusion. If you object that the naive user of structural models wouldn't know to check this tag - they aren't going to know about your magic B factor either. You can't out-think someone who's not paying attention. At some point you have to assume that people being paid to perform research will learn the basics of the data they are using, even if you know that assumption is not 100% true. Dale Tronrud
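The _atom_site.imaginary_site tag proposed above is hypothetical (it is not in the mmCIF dictionary), but the filtering it would enable is easy to sketch. The atom records and field names below are invented for illustration:

```python
# Each atom row carries the proposed boolean flag; "imaginary" atoms are
# those placed only to complete the residue, not supported by density.
atoms = [
    {"name": "CB", "b": 45.2, "imaginary_site": False},
    {"name": "CG", "b": 88.7, "imaginary_site": True},   # filled-in atom
    {"name": "CD", "b": 91.3, "imaginary_site": True},   # filled-in atom
]

def filter_atoms(atoms, keep_imaginary=False):
    """Drop (or keep) flagged atoms without any magic B or occupancy value."""
    if keep_imaginary:
        return list(atoms)
    return [a for a in atoms if not a["imaginary_site"]]

real_only = filter_atoms(atoms)            # only the density-supported atom
everything = filter_atoms(atoms, True)     # all atoms, unambiguously labeled
```

One explicit bit per atom makes the decision a one-line filter, with no threshold to argue about; that is the "each concept needs its own value" point in code.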
Re: [ccp4bb] what to do with disordered side chains
and reads it back in at the start of the next cycle. There can be no difference between the meaning of the parameters in memory and on disk.) A great deal of what we do with our models depends on the details of the definitions of these parameters. Adding extra meanings and special cases causes all sorts of problems at all levels of use. either. You can't out-think someone who's not paying attention. At some point you have to assume that people being paid to perform research will learn the basics of the data they are using, even if you know that assumption is not 100% true. Well, the problem is not *should* but *do*. Should we print bilingual danger signs in the US? Shouldn't we assume that people know English? But there is danger, and we care about sparing lives. Here too, if we care about the truth being abused or missed, it seems we should go out of our way. I've not advocated doing nothing. I've advocated that the solution we choose should be clearly defined and that definition be consistent with past definitions (as much as possible) and consistent with the principles of data structures created by the people who study such things. We *should* go out of our way to make a solution to this common problem. The solution we choose should be one that actually solves the problem and not simply creates more confusion. Dale Tronrud P.S. I just Googled occupancy zero. The top hit is a letter from Bernhard Rupp recommending that occupancies not be set to zero :-)
Re: [ccp4bb] what to do with disordered side chains
Clearly there are strong feelings held by the advocates of the several solutions to the problem of what to do about atoms that cannot be reliably placed based on the electron density map. I certainly understand since I passionately support my own favorite solution. Why is it that a community of generally reasonable people keeps coming back to this same issue and yet fails to find a solution that can reach some kind of consensus? My 2 cents on this, more fundamental, issue: A model created by someone who believes that all atoms (for a residue with any atoms) must be built will contain two kinds of atoms. Those placed based on the appearance of the electron density and those placed in some convenient location simply to fill out the atom count. I think most everyone agrees that a full residue is a convenience for some users of our models. What those of us who favor partial models want is an absolutely clear distinction between the two classes of atoms. All this needs is a bit. Literally, one bit of data that flags those atoms added to the model simply to complete the set. Why can't we come to a solution that satisfies everyone? Because we continue to use a non-extensible file format that does not allow us a place to put this bit. Some people want to put the bit in the occupancy column by defining a special value (occ=0) that would be the flag. Some people want to put it in the B factor column by defining a special value there (a couple possibilities here: B=1000.00, B=500.00, B varying but larger than that of any atom built into density). The B factor and occupancy columns in the PDB file were precisely defined back when the mmCIF dictionary was created, and to change their definitions now would require opening that process again. I am pretty sure the committee in charge will never allow a definition for these items that includes the phrase "... except when the value is equal to ...". You can't run a database that way.
Each piece of information has to have its own tag and definition. Once it is defined we can embrace the task of educating software developers and our collaborators who use our models in its meaning. There is just no place to put this bit in a PDB format file. mmCIF - it's trivial. PDB format - no way. As long as we insist that this format is the preferred means of distributing our models we will continue to return to this argument again and again with no possibility of coming to a solution. Dale Tronrud P.S. I've even thought about using the model of the REMARK statement, where all sorts of information have been added by the hack of standardized remarks. I thought that one could create a standardized footnote that would mark the atoms as imaginary. I found that, unfortunately, footnotes were removed from the PDB format many years ago. On 4/3/2011 11:01 AM, Boaz Shaanan wrote: The original posting that started this thread referred to side-chains, as the subject still suggests. Do you propose to omit only side-chain atoms, in which case you end up with different residues, as pointed out by quite a few people, or do you suggest also to omit the main-chain atoms of the problematic residues? Besides, as mentioned by Phoebe and others, many users (non-crystallographers) of PDBs already know the meaning of the B-factor and will know how to interpret a very high B. It is our task (the crystallographers) to enlighten those who don't know what the B column in a PDB entry stands for. I certainly do and I'm sure many of us do so too. I voted for high B and would vote for it again, if asked. Cheers, Boaz Boaz Shaanan, Ph.D. Dept. of Life Sciences Ben-Gurion University of the Negev Beer-Sheva 84105 Israel Phone: 972-8-647-2220 Skype: boaz.shaanan Fax: 972-8-647-2992 or 972-8-646-1710 From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] On Behalf Of Bernhard Rupp (Hofkristallrat a.D.)
[hofkristall...@gmail.com] Sent: Sunday, April 03, 2011 7:42 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] what to do with disordered side chains Thus my feeling is that if one does NOT see the coords in the electron density, they should NOT be included, and let someone else try to model them in, but they should be aware that they are modeling them. Joel L. Sussman Concur. BMC p 680 ‘How to handle missing parts’ Best wishes, BR On 3 Apr 2011, at 06:15, Frances C. Bernstein wrote: Doing something sensible in the major software packages, both for graphics and for other analysis of the structure, could solve the problem for most users. But nobody knows what other software is out there being used by individuals or small groups. And the more remote the authors of that software are from protein structure solution the more likely it is that they have not/will not properly handle atoms with zero occupancy or high B values, for example. I am absolutely positive that there is software
Re: [ccp4bb] what to do with disordered side chains
The definition of _atom_site.occupancy is "The fraction of the atom type present at this site. The sum of the occupancies of all the atom types at this site may not significantly exceed 1.0 unless it is a dummy site." When an atom has an occupancy equal to zero that means that the atom is NEVER present at that site - and that is not what you intend to say. Setting the occupancy to zero does not mean that a full atom is located somewhere in this area. Quite the opposite. (The reference to a dummy site is interesting and implies to me that mmCIF already has the mechanism you wish for.) Having some experience with refining low occupancy atoms and working with dummy marker atoms I'm quite confident that you can never define a B factor cutoff that would work. No matter what value you choose you will find some atoms in density that refine to values greater than the cutoff, or the limit you choose is so high that you will find marker atoms that refine to less than the limit. A B factor cutoff cannot work - no matter the value you choose you will always be plagued with false positives or false negatives. If you really want to stuff this bit into one of these fields you have to go all out. Set the occupancy of a marker atom to -99.99. This will unambiguously mark the atom as an imaginary one. This will, of course, break every program that reads PDB format files, but that is what should happen in any case. If you change the definition of the columns in the file you must mandate that all programs be upgraded to recognize the new definitions. I don't know how you can do that other than ensuring that the change will cause programs to cough. To try to slide it by with a magic value that will be silently accepted by existing programs is to beg for bugs and subtle side-effects. Good luck getting the maintainers of the mmCIF standard to accept a magic value in either of these fields. How about this: We already have the keywords ATOM and HETATM (and don't ask me why we have two).
How about we create a new record in the PDB format, say IMGATM, that would have all the fields of an ATOM record but would be recognized as whatever the marker is for dummy atoms in the current mmCIF? Existing programs would completely ignore these atoms, as they should until they are modified to do something reasonable with them. Those of us who have no use for them can either use a switch in the program to ignore them or just grep them out of the file. Someone could write a program that would take a model with only ATOM and HETATM records and fill out all the desired IMGATM records (Let's call that program WASNIAHC, everyone would remember that!). This solution is unambiguous. It can be represented in current mmCIF, I think. The PDB could run WASNIAHC themselves after deposition but before acceptance by the depositor so people like me would not have to deal with them during refinement but would be able to see them before our precious works of art are unleashed on the world. Seems like a win-win solution to me. Dale Tronrud On 4/3/2011 9:17 PM, Jacob Keller wrote: Well, what about getting the default settings on the major molecular viewers to hide atoms with either occ=0 or B above a cutoff (novice mode?)? While the B cutoff is still tricky, I assume we could eventually come to consensus on some reasonable cutoff (2 sigma from the mean?), and then this approach would allow each free-spirited crystallographer to keep his own preferred method of dealing with these troublesome sidechains and nary a novice would be led astray. JPK On Sun, Apr 3, 2011 at 2:58 PM, Eric Bennett er...@pobox.com wrote: Most non-structural users are familiar with the sequence of the proteins they are studying, and most software does at least display residue identity if you select an atom in a residue, so usually it is not necessary to do any cross checking besides selecting an atom in the residue and seeing what its residue name is.
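IMGATM is, of course, a hypothetical record type that exists nowhere in the PDB standard, but the "just grep them out" behavior described above is simple to sketch (the sample records are invented):

```python
def strip_imaginary(pdb_lines):
    """Drop the proposed IMGATM records, keeping standard ATOM/HETATM lines.

    IMGATM is the hypothetical record type suggested in the post: because
    existing software would not recognize it, those of us with no use for
    the filled-in atoms could remove them with a one-line filter.
    """
    return [ln for ln in pdb_lines if not ln.startswith("IMGATM")]

model = [
    "ATOM      1  CA  LYS A  10      11.104  13.207   9.332  1.00 45.00",
    "IMGATM    2  CG  LYS A  10      12.500  14.100  10.000  1.00 99.00",
    "HETATM    3  O   HOH A 201       5.000   6.000   7.000  1.00 30.00",
]
kept = strip_imaginary(model)  # ATOM and HETATM survive; IMGATM is dropped
```

Because the flag lives in the record name rather than in a numeric field, no B or occupancy value is overloaded, which is the unambiguity Dale is after.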
The chance of somebody misinterpreting a truncated Lys as Ala is, in my experience, much, much lower than the chance they will trust the xyz coordinates of atoms with zero occupancy or high B factors. What worries me the most is somebody designing a whole biological experiment around an over-interpretation of details that are implied by xyz coordinates of atoms, even if those atoms were not resolved in the maps. When this sort of error occurs it is a level of pain and wasted effort that makes the pain associated with having to build back in missing side chains look completely trivial. As long as the PDB file format is the way users get structural data, there is really no good way to communicate "atom exists with no reliable coordinates" to the user, given the diversity of software packages out there for reading PDB files and the historical lack of any standard way of dealing with this issue. Even if the file format is hacked there is no way to force all the existing