[ccp4bb] opening in Zbyszek Otwinowski lab
Dear All, I have an opening in my laboratory at UT Southwestern for someone who would like to work on multi-crystal data processing in X-ray crystallography and/or low-resolution model building in both X-ray crystallography and cryo-EM. The ideal candidate will have good programming skills (e.g. C, C++, Swift), an excellent understanding of linear algebra, and previous experience with the experimental side of any data-intensive field. Knowledge of CUDA programming and data-mining techniques is a plus. Please send to z...@work.swmed.edu a one-page resume containing a link to your Google Scholar profile and whatever other information about yourself you consider most important. Although the application deadline is February 15th, I will be interviewing candidates by Skype, or in person for Texas-based applicants, as soon as they are identified.

Zbyszek Otwinowski
UT Southwestern Medical Center
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu
Re: [ccp4bb] ligand binding and crystal form
What is the height of the non-origin Patterson peak for your data sets?

The C-centered cells 216.5 345.8 145.2 90.0 90.0 90.0 and 147.0 354.3 217.4 90.0 90.0 90.0 are very different; however, they have a common subgroup, F222, with similar unit cell parameters. In F222 one can permute the unit cell axes while preserving the symmetry operators. For these C-centered cells to have an approximate F222 subgroup, they need to have pseudotranslational symmetry, which can be detected by calculating the Patterson function. You should have strong reflections with all indices even or all odd, with the other reflections being weaker. What is the spot shape of these weaker spots?

In the case of pseudotranslational symmetry, MR can produce a pseudosolution related to the correct one by the pseudotranslation vector. Translate your C2221 solution by {0, 0.5, 0.5} and try refining again.

Zbyszek Otwinowski

> Dear Veronica,
>
> With the 1st, 2nd, 3rd map, do you mean the density for the same dataset after three consecutive cycles of building and refining, or three different maps from three different crystals?
> If it's the first case, it could be fine; it may mean that at each cycle you improve the map, so you see signal from the different ligand molecules.
> If it's the second case, well, it could simply be an artifact. Are the ligands proximal to one another, or do they bind different sites on the protein?
>
> Best
> V.
>
> 2016-10-26 14:32 GMT+02:00 Veronica Fiorentino <veronicapfiorent...@gmail.com>:
>
>> Hello all,
>> I just solved NCS-tetrameric (the biological assembly is just a dimer) crystal structures with a ligand soak (same plate, same conditions). No density for the ligand is observed in the first map. In the 2nd, I have 1 ligand bound. In the 3rd, I have 2 ligands bound. Is there any reason for this 'random' behaviour?
>>
>> In addition, I observed that just one crystal out of 20 gave a different unit cell. Pointless confirms "Best Solution: space group C 2 2 2". REFMAC refinement shows R/Rfree ~ 20/25%
>> Cell from mtz: 216.5 345.8 145.2 90.0 90.0 90.0
>> Space group from mtz: number - 21; name - C 2 2 2
>>
>> All other datasets have:
>> Cell from mtz: 147.0 354.3 217.4 90.0 90.0 90.0
>> Space group from mtz: number - 20; name - C 2 2 21
>>
>> I tried re-processing/refining the C2221 dataset in C222 but R/Rfree stays ~45%. Can I also consider the C2221 dataset a 'different crystal form'?
>>
>> Am I safe?
>>
>> Thank you all,
>> Veronica
>
> --
> Valentina Speranzini, PhD
> European Molecular Biology Laboratory, Grenoble Outstation
> 71, avenue des Martyrs, CS 90181, 38042 Grenoble Cedex 9, France
> Web: http://www.embl.fr  E-mail: vsperanz...@embl.fr  Tel: +33 (0)4 76 20 7630

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd. Dallas, TX 75390-8816
Tel. 214-645-6385 Fax. 214-645-6353
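[Editorial note: the parity-class test suggested in the reply above can be run in a few lines. This is a minimal numpy sketch, assuming h, k, l and merged intensities have already been loaded from the reflection file into numpy arrays (the loading step is omitted); it illustrates the idea and is not a substitute for inspecting the native Patterson.]

```python
import numpy as np

def f_centering_parity_test(h, k, l, intensity):
    """Compare the mean intensity of reflections obeying the F-centering
    parity rule (h, k, l all even or all odd) with the remaining ones.
    A ratio much larger than 1 indicates pseudotranslational symmetry
    compatible with an approximate F222 subgroup."""
    h, k, l = (np.asarray(x) for x in (h, k, l))
    intensity = np.asarray(intensity)
    all_even = (h % 2 == 0) & (k % 2 == 0) & (l % 2 == 0)
    all_odd  = (h % 2 == 1) & (k % 2 == 1) & (l % 2 == 1)
    strong_class = all_even | all_odd
    strong = intensity[strong_class].mean()
    weak = intensity[~strong_class].mean()
    return strong, weak, strong / weak
```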
Re: [ccp4bb] How to fit BioSAXS shape to the Structure
At low resolution, without interpretable anomalous signal, neither SAXS nor molecular replacement with a SAXS model can distinguish the correct solution from the inverted one, so an inverted model will fit the crystal data equally well. Only phase extension to much higher resolution (e.g. 5 A) can help.

> Yes, SAXS has an enantiomer problem - mirror-image DAMMIN/F reconstructions will give the same fit to the raw scattering data, whereas your protein structure will only fit one hand. SUPCOMB can certainly deal with this problem, as detailed in http://www.embl-hamburg.de/biosaxs/manuals/supcomb.html
> David Briggs  about.me/david_briggs
>
> On 26 June 2015 at 12:04, Reza Khayat <rkha...@ccny.cuny.edu> wrote:
>> Hi, a follow-up question on SAXS. Does SAXS have an enantiomer problem like electron microscopy? In other words, does the calculated model possess the correct handedness, or can both hands of a model fit the scattering profile equally well? Best wishes, Reza
>> Reza Khayat, PhD, Assistant Professor, City College of New York, 85 St. Nicholas Terrace, CDI 12308, New York, NY 10031, (212) 650-6070, www.khayatlab.org
>
> On Jun 26, 2015, at 6:50 AM, David Briggs <drdavidcbri...@gmail.com> wrote:
>> SASTBX has an online tool for achieving this: http://sastbx.als.lbl.gov/cgi-bin/superpose.html
>
> On 26 June 2015 at 11:39, Ashok Nayak <ashokgocrac...@gmail.com> wrote:
>> Dear Weifei, it can also be done manually in PyMOL by changing the mouse mode from 3-button viewing to 3-button editing and then moving the envelope onto the X-ray structure or vice versa; however, the best fit can be achieved with SUPCOMB. Regards, Ashok Nayak, CSIR-CDRI, Lucknow, India

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd. Dallas, TX 75390-8816
Tel. 214-645-6385 Fax. 214-645-6353
Re: [ccp4bb] Data reduction
These are two different types of completeness indicators. Scalepack reports Bragg's-law completeness: 98.8% of your unique reflections were in a diffracting condition during data collection. If you use the automatic corrections, only informative reflections are output; for anisotropic diffraction, 64% is reasonable.

> Hi All, the Scalepack output says the data are 98.8% complete, but after conversion to an .mtz file this drops to 64%. I have tried both CCP4 and Phenix. How is this possible? Ayan

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd. Dallas, TX 75390-8816
Tel. 214-645-6385 Fax. 214-645-6353
Re: [ccp4bb] X-rays and matter (the particle-wave picture)
The answer to your questions depends on the level of understanding of quantum mechanics, so I am pointing to places where the subject is discussed in more detail. Page 251 of Bernhard Rupp's book necessarily simplifies the rather complex subject of a photon's interaction with multiple particles. The quantum-mechanical wave function can be considered virtual from the point of view of the measurement process, as the photon (a single quantum) appears in the detector during the measurement, but not on the way to it.

"the photon's coherence length" - The concept of a photon's coherence length involves the quantum-mechanical notion of a mixed state. For an introduction see: http://en.wikipedia.org/wiki/Quantum_state#Mixed_states

"virtual waves" - The quantum-mechanical wave function is virtual in a certain sense. The Feynman Lectures on Physics, Vol. 3, covers this subject quite well.

"appears again in some direction" - This refers to quantum-mechanical wave-particle duality.

> Hello Everybody! I was trying to make some sense of Bernhard Rupp's book, page 251. I will copy the relevant part... "When photons travel through a crystal, either of two things can happen: (i) nothing, which happens over 99% of the time; (ii) the electric field vector induces oscillations in all the electrons coherently within *the photon's coherence length*, ranging from a few 1000 Angstroms for X-ray emission lines to several microns for modern synchrotron sources. At this point, the photon ceases to exist, and we can imagine that the electrons themselves emanate *virtual waves*, which constructively overlap in certain directions, and interfere destructively in others. The scattered photon then *appears again in some direction*, with the probability of that appearance proportional to the amplitude of the combined, resultant scattered wave in that particular direction... The sum of all scattering events of independent, single photons then generates the diffraction pattern." I underlined the problematic parts... can anyone shed some light on this, or point me in the right direction? Thanks in advance

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd. Dallas, TX 75390-8816
Tel. 214-645-6385 Fax. 214-645-6353
Re: [ccp4bb] Space group numbers
How can it be if you're not even sure what the correct space group is? Ambiguities may arise in the presence of pseudosymmetry and/or packing disorders. In some cases, one can determine the crystal structure from the same data in different space groups that do not have a subgroup/supergroup relationship. One of the space groups may produce better results, something that can be determined quite late in the process. A similar situation may arise when merging data from multiple, nearly isomorphous crystals that individually may be better described by alternative space group symmetries.

Zbyszek Otwinowski

On 2 October 2014 13:51, Kay Diederichs <kay.diederi...@uni-konstanz.de> wrote:

> I don't see any sticking to initial indexing as worthwhile to worry about, since in the first integration P1 is often used anyway, and it is quite normal (and easy) to re-index after the intensities become available, during scaling. Re-indexing from P1 to the true space group often changes the cell parameters and their order, and this is sufficiently easy and well documented in the output.

Far from it: re-indexing would be a huge problem for us and one we wish to avoid at all costs. We had a case where the systematic absences were ambiguous (not uncommon!) and for a long time it wasn't clear which of two SGs (P21212 or P212121) it was. So we simply kept our options open and assigned the SG in XDS as P222 in all cases. This of course meant that the cell was automatically assigned with a < b < c. We have a LIMS system with an Oracle database which keeps track of all processing (including all the failed jobs!), and it was a fundamental design feature that all crystals of the same crystal form (i.e. same space group, similar cell) were indexed the same way relative to a reference dataset (the REFINDEX program ensures this, by calculating the correlation coefficient of the intensities for all possible indexings). So crystals may be initially re-indexed from the processed SG (where for example two axes have similar lengths) to conform with the reference dataset (in P222), but once they are in the database there is no way of storing a re-re-indexed dataset based on a different space group assignment without disrupting all the previous processing. We collected datasets from about 50 crystals over a 6-month period and stored the data in the database as we went along before we had one which gave a Phaser solution (having tried all 8 SG possibilities, of course), and that resolved the SG ambiguity without reference to systematic absences (it was P212121). But there was no way we were going to go back and re-index everything (for what purpose, in any case?), since it would require deleting all the data from the database, re-running all the processing and losing all the logging/tracing info of the original processing. However, changing the space group in the MTZ header from P222 to P212121 without changing the cell is of course trivial.

"I don't see how symmetry trumps geometry can be a universal rule." How can it be if you're not even sure what the correct space group is? Also, the IUCr convention in, say, monoclinic space groups requires that for a and c the two shortest non-coplanar axis lengths be chosen, which is the same as saying that beta should be as close as possible to 90 (but by convention >= 90). This is an eminently sensible and practical convention! So in one case a C2 cell with beta = 132 transforms to I2 with beta = 93.
It is important to do this because several programs analyse the anisotropy in directions around the reciprocal axes, and if the axes are only 48 deg apart you could easily miss significant anisotropy in the directions perpendicular to the reciprocal axes (i.e. parallel to the real axes). So at least in this case it is essential that geometry trumps symmetry.

> This is true; running in all 8 possible primitive orthorhombic space groups is a fallback that should save the user, and I don't know why it didn't work out in that specific case. Still, personally I find it much cleaner to use the space group number and space group symbol from ITC together with the proper ordering of cell parameters. I would rather think once about the proper ordering than artificially impose a < b < c, and additionally have to specify which is the pure rotation (in 18) or the screw (in 17). And having to specify one out of 1017 / 2017 / 1018 / 2018 / 3018 is super-ugly because a) there is no way I could remember which is which, b) they are not in the ITC, c) XDS and maybe other programs do not understand them.

I completely agree that the CCP4 SG numbers are super-ugly: they are only there for internal programmer use and should not be made visible to the user (I'm sure there are lots of other super-ugly things hiding inside software!). Please use the H-M symbols: a) they're trivial to remember, b) they are part of the official ITC convention, c) they're designed
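[Editorial note: since the thread turns on how re-indexing permutes cell axes and Miller indices, here is a small hedged sketch (plain numpy, illustrative only) of applying a hand-preserving reindexing operator. The operator (-h, l, k), which swaps the b and c axes while keeping determinant +1, is just an example; the correct operator for any real case depends on the lattice.]

```python
import numpy as np

# Row-vector convention: h_new = h @ R. This R realizes (h,k,l) -> (-h,l,k),
# i.e. it swaps the b and c axes; det(R) = +1, so the hand is preserved.
R = np.array([[-1, 0, 0],
              [ 0, 0, 1],
              [ 0, 1, 0]])

def reindex(hkl, cell_lengths, R):
    """Apply reindexing operator R to an (N,3) array of Miller indices and
    permute the cell edge lengths accordingly (valid here because R is a
    signed axis permutation)."""
    assert round(np.linalg.det(R)) == 1, "reindexing must not invert the hand"
    hkl_new = np.asarray(hkl) @ R
    cell_new = np.abs(np.asarray(cell_lengths, dtype=float) @ R)
    return hkl_new, cell_new

hkl = np.array([[1, 2, 3], [0, 4, 5]])
print(reindex(hkl, (60.0, 90.0, 75.0), R))
# Miller indices become (-1,3,2) and (0,5,4); the cell reorders to 60, 75, 90.
```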
Re: [ccp4bb] correlated alternate confs - validation?
An additional problem is the existence of alternative conformations close to a rotational axis, where they violate the crystal symmetry. If we want to describe such correlated alternative conformations, we also need to describe how they transform under that rotational axis. The problem may also exist for other packing contacts; however, for conformations close to a rotational axis, the symmetry operator cannot preserve the conformer ID, and the issue cannot be avoided.

Zbyszek Otwinowski

> I would probably make the two waters alternates of each other.

Quite possible, but the group definition, i.e. to which alt-conf side chain they belong, would need to be preserved, too. BR

Cheers, Robbie
Sent from my Windows Phone

From: Bernhard Rupp
Sent: 23-7-2014 10:19
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] correlated alternate confs - validation?

Hi Fellows, something that may eventually become an issue for validation and reporting in PDB headers: using the Refmac grouped-occupancy keyword I was able to form and refine various networks of correlated alternate conformations - it seems to work really well, at least in a 1.6 and a 1.2 A case I tried. Both occupancy and B-factors refine to reasonable values as expected/guessed from e-density and environment. Respect and thanks for implementing this probably underutilized secret. This opens a question for validation: instead of pretty much ignoring any atoms below occupancy of 1, one can now validate each of the network groups' geometry and density fit separately, just as any other set of coordinates. I think with increasing data quality, resolution, and user education such refinements will become more frequent (and make a lot more sense than arbitrarily setting guessed independent hard occupancies/Bs that are not validated). Maybe some common format for (annotating) such correlated occupancy groups might eventually become necessary. Best, BR

PS: Simple example shown below: two alternate confs of residue 338 which correlate with one water atom each in chain B, with corresponding partial occupancy (grp1: A338A-B5 ~0.6, grp2: A338B-B16 ~0.4).

occupancy group id 1 chain A residue 338 alt A
occupancy group id 1 chain B residue 5
occupancy group id 2 chain A residue 338 alt B
occupancy group id 2 chain B residue 16
occupancy group alts complete 1 2
... more similar ...
occupancy refine

AFAICT this does what I want. True?

Bernhard Rupp, k.-k. Hofkristallamt
001 (925) 209-7429  b...@ruppweb.org  b...@hofkristallamt.org  http://www.ruppweb.org/

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd. Dallas, TX 75390-8816
Tel. 214-645-6385 Fax. 214-645-6353
Re: [ccp4bb] Protein Crystallography challenges Standard Model precision
Error estimates for the unit cell dimensions in macromolecular crystallography belong to an atypical category of uncertainty estimates. The random error contribution in most cases is below 0.001 A, so it can be neglected. The wavelength calibration error can also be made very small; however, I do not know how big it is in practice. Goniostat wobble error is taken into account in Scalepack refinement. The crystal-to-detector distance is not used in postrefinement/global refinement.

Because the measurement error is so small, even small variations in unit cell parameters can be detected within cryocooled crystals. These variations are almost always _orders_of_magnitude_larger_ than the measurement uncertainty. Current practice is not to investigate the magnitude of the changes in the unit cell parameters, but when a beam smaller than the crystal is used, observing variations as large as 1 A is not unusual. The main question is: what does the unit cell uncertainty mean? For most samples I could defend the values 0.001 A, 0.01 A, 0.1 A and 1 A as reasonable, depending on the particular point of view. Without defining what the unit cell uncertainty means, publishing its values is pointless.

Zbyszek Otwinowski

> Hi Bernhard, a look at the methods section might give you a clue. Neither XDS nor XSCALE creates mmCIF files (you are talking about mmCIF, not CIF - a subtle but annoying difference), so the choice is limited. I guess some programmer (rather than a scientist ;-) ) used a simple printf command for a double-precision number, so the junk is left over from the memory region or other noise common to conversions. XDS actually prints error estimates for the cell dimensions in CORRECT.LP, which could be added to the mmCIF file - a cif (sic!) file, I believe, requires those, by the way, and checkCIF would complain about their absence. Cheers, Tim
>
> On 07/22/2014 01:01 PM, Bernhard Rupp wrote:
>> I am just morbidly curious what program(s) deliver/mutilate/divine these cell constants in recent cif files:
>>
>> data_r4c69sf
>> _audit.revision_id 1_0
>> _audit.creation_date ?
>> _audit.update_record 'Initial release'
>> _cell.entry_id 4c69
>> _cell.length_a 100.152000427
>> _cell.length_b 58.3689994812
>> _cell.length_c 66.5449981689
>> _cell.angle_alpha 90.0
>> _cell.angle_beta 99.2519989014
>> _cell.angle_gamma 90.0
>>
>> Maybe a little plausibility check during cif generation might be OK. Best, BR
>> PS: btw, 10^-20 meters (10^5 times smaller than a proton) in fact seriously challenges the Standard Model limits..
>> Bernhard Rupp, k.-k. Hofkristallamt, Crystallographiae Vindicis Militum Ordo, b...@ruppweb.org, b...@hofkristallamt.org, http://www.ruppweb.org/
>
> Dr Tim Gruene, Institut fuer anorganische Chemie, Tammannstr. 4, D-37077 Goettingen

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd. Dallas, TX 75390-8816
Tel. 214-645-6385 Fax. 214-645-6353
Re: [ccp4bb] Protein Crystallography challenges Standard Model precision
The least-squares procedure for unit cell parameter refinement provides very precise estimates of uncertainty. Why are they so precise? Because many thousands of unmerged reflections are used to determine 1 to 6 parameters (the unit cell parameters). However, although error propagation through the least squares provides a precision of about 0.001 A, or better in some cases, this is only precision, not accuracy, and the precision is typically calculated with respect to the unit cell parameters averaged across the exposed volume of the crystal. In practice, the range of unit cell parameters within a crystal can be quite broad, and when we consider accuracy it is not clear which unit cell parameters should be the reference point. Typically, the distribution of unit cell parameters in a crystal will not follow a Gaussian distribution. Therefore, the accuracy of unit cell parameter determination is not well defined, even when we know the experimental conditions very well and propagate the experimental uncertainties correctly.

The variability of unit cell parameters across data sets from different samples can be quite high. However, describing this variability is a different matter from the very high precision of the determination for an individual sample.

Zbyszek

On 07/22/2014 12:33 PM, Tim Gruene wrote:
> Dear Zbyszek, when you optimise a set of parameters against a set of data, I guess you can also provide their errors. If I understand correctly, this comes with least-squares routines. I only pointed out that cell errors are listed in the XDS output (provided you refine them, of course). I am sure those errors are well defined. Best wishes, Tim

--
Zbyszek Otwinowski
UT Southwestern Medical Center
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu
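[Editorial note: the precision-versus-accuracy point can be made quantitative with a toy least-squares model. The sketch below is illustrative only: a cubic cell, one refined parameter, and invented noise. It shows how tens of thousands of observations drive the formal standard error of a 100 A cell edge down to the milli-Angstrom level, regardless of whether that average cell is a meaningful reference point for the whole crystal.]

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: cubic cell, 1/d^2 = (h^2 + k^2 + l^2) / a^2.
# With y = 1/d^2 and m = h^2 + k^2 + l^2, this is linear in p = 1/a^2.
a_true = 100.0
m = rng.integers(1, 200, size=20000).astype(float)  # h^2+k^2+l^2 per reflection
sigma_y = 3e-5                                      # assumed noise on 1/d^2
y = m / a_true**2 + rng.normal(0.0, sigma_y, size=m.size)

# Linear least squares for p; its variance follows from (J^T J)^-1 sigma^2.
p_hat = np.sum(m * y) / np.sum(m * m)
p_err = sigma_y / np.sqrt(np.sum(m * m))

# Propagate to the cell edge: a = p**(-1/2), so da = (a^3 / 2) dp.
a_hat = 1.0 / np.sqrt(p_hat)
a_err = 0.5 * a_hat**3 * p_err
print(f"a = {a_hat:.4f} +/- {a_err:.4f} A")  # formal error ~0.001 A
```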
Re: [ccp4bb] definitions of unique reflections
My preference is to use the term 'observed' for reflections whose intensities have been integrated, and the term 'informative' for those that satisfy some statistical criterion of being useful for structure determination. Programs like Truncate have hidden criteria for rejecting some observed reflections from the informative group, so this issue has been around for a long time.

For a typical, properly done data collection, the resolution limit is a widely used criterion of informativity. For anisotropic diffraction, a single number is definitely not a proper way to define the resolution limit, so we need something like a signal-to-noise-ratio cutoff to define a better equivalent of the resolution limit. The question is what we mean by signal-to-noise: it can be built from individual (unique/merged) reflection values (a widespread practice in small-molecule crystallography, and for a good reason), or the signal, the noise, or both can be group averages rather than individual estimates. Personally, I prefer the ratio of the average signal to the individual uncertainty as the criterion that defines the informativity limit equivalent to a resolution cutoff.

The second aspect of the issue is what value of the signal-to-noise ratio (however defined) should be the limiting criterion. A value around 2 represents the limit of what is 'fully' informative and, as has been discussed, lower values of signal-to-noise provide some extra information. Around a ratio of 1, the value of the information becomes minimal.

So for me, there are two types of data completeness: the first, in terms of Bragg's condition, defines whether we missed part of reciprocal space during the experiment; the second is in terms of what is informative for structure solution. The second type will typically be low in the resolution range close to the limit in the case of anisotropic diffraction. There is, therefore, nothing wrong with how the experiment was done if such completeness is low; on the other hand, the first type can tell us whether the experiment could have been done better. So there are good reasons to report both types of completeness in the publication and in the deposit, even if there is no such custom yet.

Zbyszek Otwinowski

> There is some disagreement on terms used to deposit data. We need a definition and an algorithm for each definition.
> Unique reflections: my definition is all the possible reflections out to the reported high resolution, not related by symmetry. Where can I find this? The .mtz contains a list of all HKL calculated to the highest resolution. Usually, we are not able to measure all these diffraction spots due to limits of the detector, mechanical limits, crystal orientation, etc.
> 'Total reflections': the deposition server asks for total reflections. I assume it wants only those unique reflections we were able to collect, regardless of the sigma cutoff. These are called 'observed'. The total we use in refinement will be a subset of the 'unique observed' that are cut on sigma. However, some crystallographers believe that we should not cut on sigma, since some of the intensities may in fact be zero. Is this a question for both the Refmac and Phenix people? Please give us some guidance and maybe a reference or two that we can use.
> -- Kenneth A. Satyshur, M.S., Ph.D., Senior Scientist, University of Wisconsin, Madison, Wisconsin 53706, 608-215-5207

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd. Dallas, TX 75390-8816
Tel. 214-645-6385 Fax. 214-645-6353
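[Editorial note: as a concrete illustration of the two completeness types, here is a minimal sketch. The shell counts are invented, chosen to echo the 98.8%/64% example from the Scalepack thread above; the S/N cutoff of 2 is the conventional choice discussed in the post, not a universal rule.]

```python
import numpy as np

def completeness_report(i_over_sigma, n_measured_unique, n_possible_unique,
                        snr_cut=2.0):
    """Return (Bragg completeness, informative completeness) for one shell.

    Bragg completeness: fraction of theoretically possible unique
    reflections that were measured at all.
    Informative completeness: fraction that additionally pass the
    signal-to-noise cutoff, i.e. are useful for structure solution."""
    bragg = n_measured_unique / n_possible_unique
    n_informative = np.count_nonzero(np.asarray(i_over_sigma) >= snr_cut)
    return bragg, n_informative / n_possible_unique

# Hypothetical anisotropic shell: 1000 possible unique reflections,
# 988 measured, but only 640 with I/sigma >= 2.
i_sig = np.concatenate([np.full(640, 3.0), np.full(348, 0.8)])
print(completeness_report(i_sig, 988, 1000))  # -> (0.988, 0.64)
```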
Re: [ccp4bb] Help in Cell content analysis
If your translational NCS is defined by a vector that does not correspond to lattice centering, i.e. has components different from 0 or 0.5, this is likely a case of order-disorder. Most such cases can be easily diagnosed by abnormal patterns in the spot shape, e.g. every second reflection having a non-Bragg streak associated with it. The apparently dense packing, 18% solvent, is likely to arise from random packing of molecules in alternative positions within the unit cell, where every second position is occupied. This randomness can be cross-correlated between cells, and this will produce diffuse scattering. An alternative explanation is that you crystallized a proteolytic fragment of your protein.

Zbyszek Otwinowski

> Dear all, I have a small query and seek your suggestions. I have collected data for a protein with 324 residues and processed it at its best in P212121. Matthews suggests 1 molecule in the ASU with an expected molecular weight of 43 kDa and a solvent content of 58%, or 2 molecules/ASU with 18% solvent content. However, the data suggest the possibility of translational NCS, so I think I should ask for two molecules so that both get corrected for NCS. However, for 2 molecules/ASU, Matthews suggests a total molecular weight of 52 kDa. So how do I decide which way to proceed for MR? Thanks, Monica

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd. Dallas, TX 75390-8816
Tel. 214-645-6385 Fax. 214-645-6353
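[Editorial note: the Matthews numbers quoted in the question follow from the standard formula V_M = V_cell / (n_sym * MW_asu), with solvent fraction 1 - 1.23/V_M (assuming the usual 0.74 cm^3/g protein partial specific volume). A minimal sketch; the cell volume below is invented, chosen only to roughly reproduce the quoted 58% figure, since the poster's exact cell is not given.]

```python
def matthews(cell_volume_A3, n_sym, mw_asu_da):
    """Matthews coefficient (A^3/Da) and solvent fraction.
    n_sym: symmetry operators in the space group (4 for P212121);
    mw_asu_da: total molecular weight in the asymmetric unit (Da)."""
    vm = cell_volume_A3 / (n_sym * mw_asu_da)
    solvent = 1.0 - 1.23 / vm
    return vm, solvent

V = 5.0e5  # A^3 -- assumed cell volume, for illustration only
for n_mol in (1, 2):
    vm, s = matthews(V, 4, n_mol * 43000.0)
    print(f"{n_mol} mol/ASU: Vm = {vm:.2f} A^3/Da, solvent = {s:.0%}")
# -> 1 mol/ASU: Vm = 2.91, solvent ~58%; 2 mol/ASU: Vm = 1.45, solvent ~15%
```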
Re: [ccp4bb] Reprocess data with new resolution cutoff?
Reprocessing data to lower resolution only helps if there are ice rings or other sources of undesired diffraction that can be eliminated as contributors to the learned profiles in profile fitting. Strong ice diffraction occurs at 2.28 A and 2.68 A, so there is no indication that reprocessing your data to lower resolution will change anything other than the overall R-merge and the other R-statistics. To calculate these statistics it is enough to re-merge the data with the lower resolution limit.

Zbyszek Otwinowski

> Hi all, this is a basic question and I'm sure the answer is widely known, but I'm having trouble finding it. I'm working on my first structure. I have a dataset that I processed in XDS with a resolution cutoff of 2.35 A, although the data are extremely weak-to-nonexistent at that resolution limit. After successful molecular replacement and initial refinement, I performed paired refinements against this dataset cut to various resolutions (2.95 A, 2.85 A, 2.75 A, etc.). Based on the improvement in R/Rfree seen between successive pairs, it appears that the data should be cut at around 2.55 A. Here is my question: as I proceed with refinement (I'm currently using Phenix), should I now simply set 2.55 A as the resolution limit in Phenix? Or should I go back to XDS and actually reprocess the data with the new limit (2.55 A instead of 2.35 A)? Thanks, Tom

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd. Dallas, TX 75390-8816
Tel. 214-645-6385 Fax. 214-645-6353
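[Editorial note: re-merging to a lower limit is just a d-spacing filter on the already-integrated reflections. A hedged sketch follows; an orthorhombic cell is assumed so the d-spacing formula stays simple, and a general cell would need the full metric tensor.]

```python
import numpy as np

def d_spacing_ortho(hkl, a, b, c):
    """d-spacings (A) for an (N,3) array of Miller indices in an
    orthorhombic cell; the general case requires the metric tensor."""
    h, k, l = np.asarray(hkl, dtype=float).T
    return 1.0 / np.sqrt((h / a)**2 + (k / b)**2 + (l / c)**2)

def cut_resolution(hkl, intensities, cell, d_min):
    """Keep reflections with d >= d_min; merging statistics computed on
    the survivors match data 're-merged' at the lower resolution."""
    keep = d_spacing_ortho(hkl, *cell) >= d_min
    return np.asarray(hkl)[keep], np.asarray(intensities)[keep]
```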
Re: [ccp4bb] metals disapear
My comments: such an observation is very uncommon for metals involved in catalysis by proteins. I have seen quite a few such structures involving Mg, Ca, Fe, Mn and Zn, and most of the radiation damage was not at the catalytic metal. In the case of Fe, I once noticed a slight shift in the position of the Fe ion upon exposure. The only metal ion that was significantly affected is Hg, and this was observed in multiple cases.

We published results on radiation damage as a function of temperature, going down to 15 K. There was some overall reduction of radiation damage, by about a factor of 1.7; however, most of the impact was away from the catalytic site. As a result, the relative radiation damage was MORE concentrated at the catalytic site. Metals (Ca and Mn) were not particularly affected at any temperature. We observed that nitrate and iodide scavenge radicals and help reduce specific radiation damage. However, the increased X-ray absorption by iodine makes the overall situation worse, and the impact of nitrate was observed only at relatively low doses (up to 2 MGy). Kmetko et al. (2011) were negative about the potential of using scavengers in general, including ascorbic acid.

The data collection wavelength does not matter! It is an urban legend that a shorter wavelength will help. I remember it being debunked two decades ago, and somehow it is still alive.

Zbyszek Otwinowski

> Dear Dean, this is probably a very common observation: X-rays produce reducing electrons, and as you reduce a metal I imagine it does not like its chemical environment as much as it did when highly charged. Everything you can do to avoid radiation damage should help you prevent the ion from disappearing:
> - optimise your strategy to collect a minimal amount of data
> - add vitamin C
> - cool below 100 K
> - collect at short wavelength
> When your ion is intended to be used for phasing there are of course restraints limiting the choice. Regards, Tim
>
> On 04/30/2014 12:33 PM, Dean Derbyshire wrote:
>> Hi all, has anyone experienced catalytic metal ions disappearing during data collection? If so, is there a way of preventing it? D.
>> Dean Derbyshire, Senior Research Scientist, Box 1086, SE-141 22 Huddinge, SWEDEN. Visit: Lunastigen 7. Direct: +46 8 54683219. www.medivir.com

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd. Dallas, TX 75390-8816
Tel. 214-645-6385 Fax. 214-645-6353
[ccp4bb] Error in ccp4lib?
I am reading an external file which contains phases and ABCDs in the space group P43212. My file has an asymmetric unit with k >= h. Since CCP4 uses a different asymmetric unit, with h >= k, this requires a transformation of the phase and ABCD coefficients. The transformation seems to be correct for reflections with initial h not equal to zero, but gives a wrong result for 0 k l reflections.

Zbyszek Otwinowski
UT Southwestern Medical Center
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu
Re: [ccp4bb] Error in ccp4lib?
Centrosymmetric reflections typically have C = 0 and D = 0, although non-zero values should not matter, as they do not modify the phase probabilities for centrosymmetric reflections. Somehow, entering non-zero values of C and D for a centrosymmetric reflection creates strange results during the transformation of the phase. Definitely a bug in ccp4lib, although only triggered by non-standard input; in practice it probably does not matter much.

On 03/24/2014 06:11 PM, Eleanor Dodson wrote:
> You don't say how you are doing the transformation? I would simply input the file to cad:
>
> cad hklin1 thisfile.mtz hklout newfile.mtz
> labi file 1 allin
> end
>
> I think (and hope) that the data and phases will be converted correctly to the CCP4 asymmetric unit. Eleanor
>
> On 25 Mar 2014, at 09:16, Zbyszek Otwinowski wrote:
>> I am reading an external file which contains phases and ABCDs in the space group P43212. My file has an asymmetric unit with k >= h. Since CCP4 uses a different asymmetric unit, with h >= k, this requires a transformation of the phase and ABCD coefficients. The transformation seems to be correct for reflections with initial h not equal to zero, but gives a wrong result for 0 k l reflections. Zbyszek Otwinowski

--
Zbyszek Otwinowski
UT Southwestern Medical Center
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu
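[Editorial note: for readers unfamiliar with what such a transformation involves: mapping a reflection to a symmetry-equivalent index shifts its phase by -360 * h.t degrees (for an operator with translational part t, under the convention F(hR) = F(h) exp(-2 pi i h.t)), and the Hendrickson-Lattman coefficients must be rotated by the same angle (A, B) and by twice that angle (C, D). The sketch below is a generic illustration of that algebra, not the ccp4lib code under discussion; sign conventions differ between programs, and the rotational reindexing of hkl itself is omitted.]

```python
import numpy as np

def shift_phase_and_hl(phi_deg, hl, h, trans):
    """Phase (degrees) and Hendrickson-Lattman (A,B,C,D) coefficients after
    a symmetry operation with fractional translation 'trans' applied to
    Miller index h: delta = -360 * dot(h, trans); (A,B) rotate by delta and
    (C,D) by 2*delta, so the probability P(phi) is carried along with the phase."""
    a, b, c, d = hl
    delta = -2.0 * np.pi * np.dot(h, trans)
    ca, sa = np.cos(delta), np.sin(delta)
    c2, s2 = np.cos(2.0 * delta), np.sin(2.0 * delta)
    phi_new = (phi_deg + np.degrees(delta)) % 360.0
    hl_new = (a * ca - b * sa, a * sa + b * ca,
              c * c2 - d * s2, c * s2 + d * c2)
    return phi_new, hl_new

# Hypothetical use: a 0 k l reflection under an operator with t = (1/2, 1/2, 3/4)
print(shift_phase_and_hl(45.0, (1.0, 0.5, 0.0, 0.0), (0, 2, 1), (0.5, 0.5, 0.75)))
```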
Re: [ccp4bb] twinning problem ?
On 03/13/2014 10:55 AM, Keller, Jacob wrote:

>> Unless you are interested in finding curious objects, what would you do with a protein quasicrystal? The practice of macromolecular crystallography is about determining the 3-dimensional structure of the objects being crystallized. Protein quasicrystals are really unlikely to diffract to high enough resolution, and even ignoring all other practical aspects, like writing programs to solve such a structure, the chances of building an atomic model are really slim.

> Right, if crystallography is seen as purely a tool for biology, I agree. As for curious objects, I think almost all profound breakthroughs come from unadulterated curiosity and not desire for some practical end. Not sure why a priori this should be so, but just consider your favorite scientific breakthrough and whether the scientist set out to make the discovery or not. Some are, but most are not, I think. Maybe aperiodic protein crystals have some important function in biology somewhere, or have unforeseen materials-science properties, analogous to silk or something.

>> This is easy to test by analyzing diffraction patterns of individual crystals. In practice, the dominant contribution to angular broadening of diffraction peaks is angular disorder of microdomains, particularly in cryo-cooled crystals. However, exceptions do happen, and these rare situations need to be handled on a case-by-case basis.

The interpretation of the data presented in this article is that variation in the unit cell between microcrystals induces their spatial misalignment. The data do not show variation of the unit cell within individual microcrystalline domains.

Tetragonal lysozyme can adopt quite a few variants of the crystal lattice during cryocooling. Depending on the conditions used, the resulting mosaicity can vary from 0.1 degree (even for a 1 mm crystal) to over 1 degree. Consequently, measured structure factors from a group of tetragonal lysozyme crystals can be quite reproducible, or not. As a test crystal, it should be handled with care. A mosaicity of 1 degree is not an impediment to high-quality measurements. However, high mosaicity tends to correlate with the presence of phase transitions during cryo-cooling. If such transitions happen during cryo-cooling, crystals of the same protein, even from the same drop, may vary quite a lot in terms of structure factors. Additionally, even similar values of unit cell parameters are no guarantee of isomorphism between crystals.

> So I think you are saying that tetragonal lysozyme is an atypical case, and that normally the main contributor to the fitted parameter mosaicity is the phenomenon of microdomains shifted slightly in orientation. Maybe we can get the author to repeat the study for the other usual-suspect protein crystals to find out the truth, but the score currently seems to be 1-0 in favor of cell parameter shifts versus microcrystal orientation...

No, I claim that the particular crystal studied by Colin Nave (Acta Cryst. 1998, D54: 848) is an atypical case. I have myself processed hundreds of tetragonal lysozyme data sets acquired on crystals grown and mounted by various people, so I believe that my experience better defines the typical case. The second reference, nicely provided by Colin, does not conclude that the dominant imperfection appeared to be a variation in unit-cell dimensions in the crystal, but rather states that "the analysis further suggests that LT disorder is governed by variability inherent in the cooling process combined with the overall history of the crystal."
As you can see in figure 5A of Juers et al., 2007, the mosaicity is the dominant component of the reflection width for resolutions higher than 8 A. Only at very low resolution can one see the effect of unit cell changes. What is important is that the crystal analyzed had a very low mosaicity: less than 0.02 degree before cryo-cooling and less than 0.1 degree after. The mosaicity after cryo-cooling is definitely below typical values. One has to remember that not only are the unit cell parameters different for different microdomains, but their structure factors will also vary, and can change quite a lot. Cryo-cooled crystals can definitely have a high degree of internal non-isomorphism resulting from this effect.

Zbyszek

--
Zbyszek Otwinowski
UT Southwestern Medical Center
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu
Re: [ccp4bb] twinning problem ?
How to approach the analysis of such a problem: for any sample, crystalline or not, a generally valid description of the diffraction intensity is that it is the Fourier transform of the electron density autocorrelation function (with the obvious normalizations involved). For crystals, this autocorrelation function is periodic, and it is called the Patterson function when derived from diffraction data.

In the case of statistical disorder, an important characteristic is the autocorrelation of the alternative conformations when they are displaced by unit cell periodicities. If such autocorrelation is zero, we have pure statistical disorder; in such a case, we should add the structure factors of the alternative conformations to create the calculated F. There will also be diffuse scattering from the disorder, but it will not be aligned with the Bragg diffraction. More often, the presence of a particular alternative conformation will affect the probability of the alternative conformation a unit cell away, and this needs to be considered separately for every unit cell translation. If this correlation is very strong - close to 1 - we have a situation similar or identical to merohedral twinning, and one should add the F^2 values from the alternative models. In the intermediate case, when the autocorrelation in a particular direction is between zero and one, the Fourier transform will produce streaks in the diffraction pattern, and the alignment of these streaks will be related to the properties of the autocorrelation function. Unfortunately, this creates problems when dealing with reduced data sets.

Mosaicity is a very different phenomenon. It describes a range of angular alignments of microcrystals with the same unit cell within the sample. It broadens diffraction peaks by the same angle irrespective of the data resolution, but it cannot change the length of the diffraction vector for each Bragg reflection. For this reason, the elongation of the spot on the detector resulting from mosaicity will always be perpendicular to the diffraction vector. This is distinct from statistical disorder, where the spot elongation will be aligned with the crystal lattice and not the detector plane. Obviously, no phase information can be derived from spot shapes resulting from mosaicity. Interestingly, there is a potential for extracting phase information from spot shapes induced by statistical disorder. However, it is far from simple and can be used only to improve phases; it is not promising as an ab initio phasing method.

This discussion assumed only one unit cell periodicity in the sample, which is the desired state in all cases. In cryo-cooled crystals, the rate of cooling differs across the sample, quite often resulting in different unit cell periodicities across the sample. There are then multiple possibilities to consider; quite typically, the crystal symmetry is the same and the range of unit cell variability is small. This results in variable spot elongation, with the angular range being resolution-dependent and the elongation not necessarily perpendicular to the diffraction vector. By just looking at the diffraction pattern, it is easy to distinguish this case from mosaicity. In such samples, a problem arises when the rotation exposes distinctly different crystal phases at different orientations. The resulting diffraction data will merge with poor statistics, as distinct structure factors will be merged together. Such a condition is quite typical when large crystals are exposed with microbeams.
The presence of different crystal forms also provides phasing opportunities, known as averaging between crystals. However, this requires collecting separate data sets rather than mixing such crystals during one rotation sweep. The presence of multiple, similar unit cells in the sample is a completely different condition, unrelated to statistical disorder.

Zbyszek Otwinowski

> Not sure I understand why having statistical disorder makes for streaks - does the crystal then have a whole range of unit cell constants, with the spot at the most prevalent value, and the streaks are the tails of the distribution? If so, doesn't having the streak imply a really wide range of constants? And how would this be different from mosaicity? My guess is that this is not the right picture, and this is indeed roughly what mosaicity is. Alternatively, perhaps the streaks are interpreted as the result of a duality between the unit cell, which yields spots, and a super-cell which is so large that it yields extremely close spots which are indistinguishable from lines/streaks. Usually this potential super-cell is squelched by destructive interference due to each component unit cell being very nearly identical, but here the destructive interference doesn't happen because each component unit cell differs quite a bit from its fellows. And I guess in the latter case the supercell would have its cell constant (in the direction of the streaks) equal to (or a function of) the coherence length
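[Editorial note: the statistical-disorder-to-twinning continuum described above can be visualized with a one-dimensional toy model (all numbers invented; numpy only). Cells choose between two conformations; a correlation parameter controls how strongly each cell copies its neighbour. Uncorrelated choices give a smooth diffuse background, while strong correlation concentrates the diffuse intensity into sharp streak-like features around the Bragg positions, approaching the twinning limit of large coherent blocks.]

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, cell_pts = 256, 16

# Two alternative unit-cell "conformations" (toy point-atom densities).
conf_a = np.zeros(cell_pts); conf_a[[3, 8]] = 1.0
conf_b = np.zeros(cell_pts); conf_b[[5, 8]] = 1.0

def diffraction(corr):
    """1D crystal whose cells take conformation A or B. With probability
    'corr' a cell copies its neighbour's choice, otherwise it draws 50/50:
    corr = 0 is pure statistical disorder, corr -> 1 approaches twinning."""
    choice = np.empty(n_cells, dtype=bool)
    choice[0] = rng.random() < 0.5
    for i in range(1, n_cells):
        choice[i] = choice[i - 1] if rng.random() < corr else rng.random() < 0.5
    rho = np.concatenate([conf_a if c else conf_b for c in choice])
    return np.abs(np.fft.fft(rho)) ** 2

def near_bragg_fraction(intensity, halfwidth=2):
    """Fraction of the diffuse (non-Bragg) intensity lying within
    'halfwidth' reciprocal samples of a Bragg position."""
    r = np.arange(intensity.size) % n_cells
    bragg = r == 0
    near = ((r <= halfwidth) | (r >= n_cells - halfwidth)) & ~bragg
    return intensity[near].sum() / intensity[~bragg].sum()

for corr in (0.0, 0.9, 0.99):
    print(corr, round(near_bragg_fraction(diffraction(corr)), 3))
# The near-Bragg share of the diffuse scattering grows with the correlation,
# i.e. the streaks sharpen toward the Bragg positions.
```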
Re: [ccp4bb] twinning problem ?
On 03/12/2014 04:15 PM, Keller, Jacob wrote:

> "For any sample, crystalline or not, a generally valid description of diffraction intensity is it being a Fourier transform of the electron density autocorrelation function." I thought for non-crystalline samples the diffraction intensity is simply the Fourier transform of the electron density, not of its autocorrelation function. Is that wrong?

The Fourier transform of the electron density is a complex scattering amplitude, which by the axioms of quantum mechanics is not a measurable quantity. What is measurable is its modulus squared. In crystallography, it is called either F^2 (formally F*Fbar) or, somewhat informally, the diffraction intensity, once scaling factors are taken into account. F*Fbar is the Fourier transform of the electron density autocorrelation function regardless of whether the electron density is periodic or not. For a periodic electron density, the structure factors are described by a sum of Dirac delta functions placed on the reciprocal lattice, multiplied by the values of the structure factors for the corresponding Miller indices.

> Anyway, regarding spot streaking, perhaps there is a different, simpler formulation for how streaks arise, based on two phenomena: (1) the crystal lattice convoluted with periodic contents, e.g. the protein structure in exactly the same orientation; (2) the crystal lattice convoluted with aperiodic contents, e.g. n different conformations of a protein loop, randomly sprinkled in the lattice. Option (1) makes normal spots. If there is a lot of scattering material doing (2), then streaks arise due to many super-cells occurring, each with an integral number of unit cells, and following a Poisson distribution with regard to frequency according to the number of distinct conformations. Anyway, I thought of this because it might be related to scattering from aperiodic crystals, in which there is no concept of a unit cell as far as I know (just frequent distances), which makes them really interesting for thinking about diffraction.

This formulation cannot describe aperiodic contents. The convolution of a crystal lattice with any function will result in an electron density that has perfect crystal symmetry with the same periodicity as the starting crystal lattice.

> See the images here of an aperiodic lattice and its Fourier transform, if interested: http://postimg.org/gallery/1fowdm00/

This is an interesting case of a pseudocrystal; however, because there is no crystal lattice, it is not relevant to (1) or (2). In any case, pentagonal quasilattices are probably not relevant to macromolecular crystallography.

> "Mosaicity is a very different phenomenon. [...] This is distinct from the statistical disorder, where spot elongation will be aligned with the crystal lattice and not the detector plane." I have been convinced by some elegant, carefully-thought-out papers that this microcrystal conception of the data-processing constant mosaicity is basically wrong, and that the primary factor responsible for observed mosaicity is discrepancies in unit cell constants, and not the microcrystal picture. I think maybe you are referring here to theoretical mosaicity and not the fitting parameter, so I am not contradicting you. I have recently seen an EM study of protein microcrystals which seems to show actual tilted mosaic domains just as you describe, and can find the reference if desired.

This is easy to test by analyzing the diffraction patterns of individual crystals. In practice, the dominant contribution to the angular broadening of diffraction peaks is the angular disorder of microdomains, particularly in cryo-cooled crystals. However, exceptions do happen, and these rare situations need to be handled on a case-by-case basis.

Zbyszek

> "Presence of multiple, similar unit cells in the sample is completely different and unrelated condition to statistical disorder." Agreed! Jacob

--
Zbyszek Otwinowski
UT Southwestern Medical Center
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu
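[Editorial note: the statement that F*Fbar is the Fourier transform of the electron-density autocorrelation, periodic or not, is just the discrete Wiener-Khinchin theorem, and can be checked numerically in a few lines. Note the finite sample makes the autocorrelation circular; zero-padding would give the linear version.]

```python
import numpy as np

rng = np.random.default_rng(0)
rho = rng.random(64)                 # arbitrary, non-periodic "density"

F = np.fft.fft(rho)                  # complex scattering amplitude
intensity = np.abs(F) ** 2           # F * Fbar, the measurable quantity

# Circular autocorrelation of rho, then its Fourier transform:
auto = np.array([np.dot(rho, np.roll(rho, -s)) for s in range(rho.size)])
print(np.allclose(np.fft.fft(auto), intensity))   # True
```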
Re: [ccp4bb] twinning problem ?
for improving crystal perfection, defining data-collection requirements and for data-processing procedures. Measurements on crystals of tetragonal lysozyme at room temperature and 100 K were made in order to illustrate how parameters describing the crystal imperfections can be obtained. At 100 K, the dominant imperfection appeared to be a variation in unit-cell dimensions in the crystal. PMID: 9757100 [PubMed - indexed for MEDLINE] -- Zbyszek Otwinowski UT Southwestern Medical Center 5323 Harry Hines Blvd., Dallas, TX 75390-8816 (214) 645 6385 (phone) (214) 645 6353 (fax) zbys...@work.swmed.edu
Re: [ccp4bb] twinning problem ?
The shape of the diffraction spots changes along the statistical disorder - twinning continuum. At both ends, the spot shape is as in diffraction from crystals without such disorder. However, in the intermediate case, the electron density autocorrelation function has an additional component beyond the one resulting from the ordered crystal. This additional component of the autocorrelation creates characteristic non-Bragg diffraction, e.g. streaks aligned with a particular unit cell axis. In the absence of such a diffraction pattern, the ambiguity is binary. The description of the problem indicates statistical disorder.

Zbyszek Otwinowski

> Hi, if there's an NCS translation, recent versions of Phaser can account for it and give moment tests that can detect twinning even in the presence of tNCS. But I agree with Eleanor that the L test is generally a good choice in these cases. However, the fact that you see density suggests that your crystal might be more on the statistical disorder side of the statistical disorder - twinning continuum, i.e. the crystal doesn't have mosaic blocks corresponding to one twin fraction that are large compared to the coherence length of the X-rays. So you might want to try refinement with the whole structure duplicated as alternate conformers. Best wishes, Randy Read
> Randy J. Read, Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Wellcome Trust/MRC Building, Hills Road, Cambridge CB2 0XY, U.K. Tel: +44 1223 336500, E-mail: rj...@cam.ac.uk, www-structmed.cimr.cam.ac.uk
>
> On 11 Mar 2014, at 14:10, Eleanor Dodson <eleanor.dod...@york.ac.uk> wrote:
>> Sorry - hadn't finished. The twinning tests are distorted by NC translation - usually the L test is safe, but the others are all suspect.
>
> On 11 March 2014 14:09, Eleanor Dodson wrote:
>> What is the NC translation? If there is a factor of 0.5, that makes SG determination complicated. Eleanor
>
> On 11 March 2014 14:04, Stephen Cusack <cus...@embl.fr> wrote:
>> Dear All, I have 2.6 A data and an unambiguous molecular replacement solution for two copies per asymmetric unit of an 80 kDa protein, for a crystal integrated in P212121 (R-merge around 9%) with a = 101.8, b = 132.2, c = 138.9. Refinement allowed rebuilding/completion of the model in the normal way, but the R-free does not go below 30%. The map in the model regions looks generally fine, but there is a lot of extra positive density in the solvent regions (some of it looking like weak density for helices and strands) and unexpected positive peaks within the model region. Careful inspection allowed manual positioning of a completely different, overlapping solution for the dimer which fits the extra density perfectly. The two incompatible solutions are related by a 2-fold axis parallel to a. This clearly suggests some kind of twinning. However, twinning analysis programs (e.g. Phenix Xtriage), while suggesting the potential for pseudo-merohedral twinning (-h, l, k), do not reveal any significant twinning fraction and proclaim the data likely to be untwinned. (NB: the programs do, however, highlight a non-crystallographic translation, and there are systematic intensity differences in the data.) Refinement including this twinning law made no difference, since the estimated twinning fraction was 0.02. Yet the extra density is clearly there, and I know exactly the real-space transformation between the two packing solutions. How can I best take this alternative solution (occupancy around 20-30%) into account in the refinement?
>> Thanks for your suggestions, Stephen
>> Dr. Stephen Cusack, Head of Grenoble Outstation of EMBL. Email: cus...@embl.fr  Website: http://www.embl.fr  Tel: (33) 4 76 20 7238. Postal address: EMBL Grenoble Outstation, 6 Rue Jules Horowitz, BP181, 38042 Grenoble Cedex 9, France

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd. Dallas, TX 75390-8816
Tel. 214-645-6385 Fax. 214-645-6353
Re: [ccp4bb] Data processing - twinned xtals
This is clearly a case of a crystal with a very long unit cell axis, a case which should be approached mindfully. HKL2000 by default searches for indexing solutions such that diffraction along the longest unit cell axis will be resolved with the assumed spot size. The problem with such diffraction has two aspects: 1) how to process already-collected data where the spots are close to each other; 2) how to collect future data.

Ad 1) The best solution is to reduce the spot size so that the spots are resolved. This may require an adjustment of the spot size by a single pixel; one should not only change the spot radius, but also change the box size between an even and an odd number of pixels in the box dimensions. Changing the spot radius alone changes the spot diameter by an even number of pixels, so if one wants to change the spot diameter by one pixel, one has to change the box size. This is a consequence of the spot being centered in the box. For indexing only, there is also a workaround: before indexing, specify the command 'longest vector' followed by a number that defines the upper limit of the cell size. This may help find an indexing solution, but will still create overlaps between spots during refinement and integration. This dataset presents the problem of collecting data by rotating about the axis perpendicular to the long unit cell axis. As a consequence, Image 1 has essentially overlapping spots (barely differing in centroid position), so it would be hard to process them meaningfully with any program.

Ad 2) What would be a better way to collect data in the future?

> Hi CCP4 folks, I have a data set which looks twinned (see image 1 - I zoomed in on the image so that one can spot the twinning; furthermore, the spots are very smeary from ~30-120 degrees of data collection, see image 2). I tried using HKL2000 and Mosflm to process this data but I cannot process it. I was wondering if anyone has any ideas as to how to process this data, or comments on whether this data is even useful. Also, I would really appreciate it if someone could share their experiences with solving twinning issues during crystal growth. Thanks in advance! Mahesh

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd. Dallas, TX 75390-8816
Tel. 214-645-6385 Fax. 214-645-6353
Re: [ccp4bb] Data processing - twinned xtals
This is a continuation of the previous message, which was sent prematurely. In the case of crystals with one very long unit cell axis, the data collection strategy needs to be chosen carefully.

Ad 2) What would be a better way to collect data in the future? First, the detector needs to be placed far enough back that the spots are resolved, at minimum when the longest unit cell axis is in the plane of the detector (perpendicular to the beam). To satisfy this condition, it is best to rotate about an axis that is parallel (or close to parallel, within 30 degrees) to the longest unit cell axis. However, this can be difficult to achieve in some cases. There are two types of workaround in such a situation: a) if the crystal has low mosaicity, the spots may be resolved in the angular direction if a short oscillation is used to collect images; HKL has no problem with a 0.1 degree oscillation range; b) in the case of mosaic crystals, when a) doesn't work, a partial solution is to increase the detector distance. There will still be a region of the reciprocal lattice where the data will be lost due to overlap, but this region may be small enough for the remaining data to be usable in structure solution.

There is no indication that the particular crystal presented is twinned or highly mosaic, so chances are good that this project will be solved.

Zbyszek Otwinowski

Hi CCP4 folks,

I have a data set which looks twinned (see Image 1 - I zoomed in on the image so that one can spot the twinning; furthermore, the spots are very smeary from ~30-120 degrees of data collection, see Image 2). I tried using HKL2000 and Mosflm to process this data, but I cannot process it. I was wondering if anyone has any ideas as to how to process this data, or comments on whether this data is even useful. Also, I would really appreciate it if someone could share their experiences with solving twinning issues during crystal growth.

Thanks in advance!
Mahesh
[image: Inline image 2] [image: Inline image 3]

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353
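The minimum detector distance follows from simple geometry: neighbouring reflections along a cell axis of length a are separated on the detector by roughly D*lambda/a (small-angle approximation), and this separation must exceed the spot size. A quick calculator, with illustrative numbers of my own rather than from the post:

def min_detector_distance(cell_axis_a, wavelength, spot_size_mm):
    """Smallest detector distance (mm) at which neighbouring spots along
    a cell axis of length cell_axis_a (Angstrom) are separated by more
    than spot_size_mm, for X-rays of the given wavelength (Angstrom).
    Uses the small-angle approximation: separation ~ D * lambda / a.
    """
    return spot_size_mm * cell_axis_a / wavelength

# Assumed example: a 700 A axis, 1.0 A X-rays, 0.5 mm spots -> D > 350 mm
print(min_detector_distance(700.0, 1.0, 0.5))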
Re: [ccp4bb] rmerge
Dear All,

The purpose of the statistics in the output of Scalepack is to help the experimenter assess the data. The question is, what is the purpose of the R-merge statistic, and what is its usefulness when its value exceeds 100%? When Scalepack was originally written 20 years ago, I made a decision to output the value 0.000 for R-merge values above 100%. A resolution shell with such an R-merge may, depending on circumstances, contain perfectly fine data for structure refinement, or data that are completely useless. In general, as in the case that started this discussion, high multiplicity will result in data close to the resolution limit having such a high R-merge value.

The best way to assess the resolution limit of the collected diffraction is to look at the refinement's R and R-free factors. However, one has to make a preliminary judgement at an earlier stage about which data to forward to subsequent calculations. The 0.000 R-merge value is simply a pointer to the experimenter that one should pay attention to criteria other than the R-merge statistic. I did not want to print N/A or some other non-numerical string, to simplify the parsing of Scalepack output.

I always considered R-merge a useful statistic only for shells with strong reflections, effectively meaning low-resolution data. For these data, high values of R-merge (e.g. 10%) indicate the presence of systematic errors or effects. Otherwise, R-merge is a rather poor proxy for the relevance of data. Other indicators are much more useful for defining the resolution limit:
- I/sig(I), if the goodness-of-fit (chi^2) is close to 1 in that resolution shell; if not, one should adjust only the error scale factor, not the estimate of systematic error (Scalepack keyword: error systematic);
- CC1/2 (or CC*) is the next best criterion;
- other criteria can also be used, e.g. Rpim.
The current version of the HKL suite prints out all these statistics.

Quite frequently, when a program, particularly a widely used one, seems to fail, it is an indication that there are issues with the data. This has been the case in another recent thread, related to problems with indexing/processing of data. Something needs to be changed in such cases; it could be the input to the program or, in the case of the R-merge statistic, one should pay attention to something else rather than consider it a program failure.

Best regards,
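For readers who want to see the definitions in one place: the statistics discussed here can be computed from unmerged measurements grouped by unique hkl. A rough sketch (my own illustration, not Scalepack's code; the CC1/2 half-split here is a simple alternating one):

import numpy as np

def merging_stats(groups):
    """R-merge, R-pim and CC1/2 from unmerged intensities.

    groups: list of 1-D numpy arrays, one per unique hkl, each holding
    that reflection's symmetry-equivalent measurements. Reflections
    measured only once are skipped, as they carry no merging signal.
    """
    num_rmerge = num_rpim = denom = 0.0
    half1, half2 = [], []
    for g in groups:
        n = len(g)
        if n < 2:
            continue
        dev = np.abs(g - g.mean()).sum()
        num_rmerge += dev
        num_rpim += dev * np.sqrt(1.0 / (n - 1))
        denom += g.sum()
        half1.append(g[0::2].mean())  # alternating split for CC1/2
        half2.append(g[1::2].mean())
    cc_half = np.corrcoef(half1, half2)[0, 1]
    return num_rmerge / denom, num_rpim / denom, cc_half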
Re: [ccp4bb] Strange density in solvent channel and high Rfree
It is a clear-cut case of crystal packing disorder. The tell-tale sign is that the data can be merged in the higher-symmetry lattice, while the number of molecules in the asymmetric unit (3 in P21) is not divisible by the index of the higher symmetry (2, in going from P21 to P21212). From my experience, this is more likely a case of order-disorder than merohedral twinning. The difference between the two is that structure factors are added for the alternative conformations in the case of order-disorder, while intensities (structure factors squared) are added in the case of merohedral twinning.

Now an important comment on how to proceed in cases where the data can be merged in a higher symmetry, but the structure needs to be solved in a lower symmetry due to disorder. Such data need to be merged in the higher symmetry, assigned an R-free flag, and THEN expanded to the lower symmetry. Reprocessing the data in a lower symmetry is an absolutely wrong procedure: it will artificially reduce R-free, as the new R-free flags will not follow the data symmetry! Moreover, while this one is likely a case of order-disorder, and those are infrequent, reprocessing the data in a lower symmetry seems to be frequently abused, essentially in order to reduce R-free. Generally, when data CAN be merged in a higher symmetry, the only proper procedure for going to a lower-symmetry structure is to expand these higher-symmetry data to the lower symmetry, not to rescale and merge the data in the lower symmetry.

Zbyszek Otwinowski

Dear all,

We have solved the problem. Data processing in P1 looks better (six molecules in the ASU), and Zanuda shows P 1 21 1 symmetry (three molecules in the ASU); Rfactor/Rfree drops to 0.20978/0.25719 in the first round of refinement (without adding waters, ligands, etc.). Indeed, there was one more molecule in the ASU, but the over-merged data in an orthorhombic lattice hid the correct solution. Thank you very much for all your suggestions; they were very important in solving this problem.

Cheers,
Andrey

2013/3/15 Andrey Nascimento andreynascime...@gmail.com

Dear all,

I have collected a good quality dataset of a protein with 64% solvent in the P 2 21 21 space group at 1.7 A resolution with good statistical parameters (values for the last shell: Rmerge=0.202; I/Isig.=4.4; Complet.=93%; Redun.=2.4; the overall values are better than the last shell). Structure solution by molecular replacement goes well, and the map quality at the protein chain is very good, but at the end of refinement, after the addition of a lot of waters and other solvent molecules, TLS refinement, etc., the Rfree is still quite high considering this resolution (1.77 A): Rfree=0.29966 and Rfactor=0.25534. Moreover, I reprocessed the data in a lower-symmetry space group (P21), but I got the same problem, and I tried all possible space groups for P222, but with other screw axes I cannot even solve the structure.

A strange thing in the structure is the large solvent channels with a lot of positive electron density peaks!? I usually do not see so many peaks in solvent channels like this. These peaks are the only reason for the high R's in refinement that I can find. But why are there so many peaks in the solvent channel???

I put a .pdf file (ccp4bb_maps.pdf) with some more information and map figures at this link: https://dl.dropbox.com/u/16221126/ccp4bb_maps.pdf

Does someone have an explanation or solution for this?

Cheers,
Andrey

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353
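The recommended order of operations - merge in the higher symmetry, assign flags, then expand - can be sketched as follows. This is only an illustration with a hypothetical symmetry operator; in practice one would use the standard CCP4 utilities for the bookkeeping:

import numpy as np

def expand_with_flags(merged, lost_symops):
    """Expand a merged, R-free-flagged reflection set to lower symmetry.

    merged:      dict (h, k, l) -> (intensity, rfree_flag), unique under
                 the HIGHER symmetry.
    lost_symops: rotation parts (3x3 integer matrices) of the operators
                 present in the higher symmetry but absent in the lower.

    Each symmetry mate inherits its parent's flag, so the test set still
    respects the (pseudo)symmetry of the data.
    """
    expanded = {}
    for hkl, datum in merged.items():
        expanded[hkl] = datum
        for op in lost_symops:
            mate = tuple(int(i) for i in np.asarray(op) @ np.asarray(hkl))
            expanded[mate] = datum
    return expanded

# Hypothetical example: a twofold h,k,l -> -h,-k,l lost on symmetry descent
twofold = np.array([[-1, 0, 0], [0, -1, 0], [0, 0, 1]])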
Re: [ccp4bb] refining against weak data and Table I stats
The difference between one and the correlation coefficient is a quadratic function of the differences between the datapoints. So a rather large 6% relative error, with 8-fold data multiplicity (redundancy), can lead to CC1/2 values of about 99.9%. It is just the nature of correlation coefficients.

Zbyszek Otwinowski

Related to this, I've always wondered what CC1/2 values mean for low resolution. Not being mathematically inclined, I'm sure this is a naive question, but I'll ask anyway - what does CC1/2=100 (or 99.9) mean? Does it mean the data is as good as it gets?

Alan

On 07/12/2012 17:15, Douglas Theobald wrote:

Hi Boaz,

I read the KK paper as primarily a justification for including extremely weak data in refinement (and of course introducing a new single statistic that can judge data *and* model quality comparably). Using CC1/2 to gauge resolution seems like a good option, but I never got from the paper exactly how to do that. The resolution bin where CC1/2=0.5 seems natural, but in my (limited) experience that gives almost the same answer as I/sigI=2 (see also KK fig 3).

On Dec 7, 2012, at 6:21 AM, Boaz Shaanan bshaa...@exchange.bgu.ac.il wrote:

Hi,

I'm sure Kay will have something to say about this, but I think the idea of the K&K paper was to introduce new (more objective) standards for deciding on the resolution, so I don't see why another table is needed.

Cheers,
Boaz

Boaz Shaanan, Ph.D.
Dept. of Life Sciences
Ben-Gurion University of the Negev
Beer-Sheva 84105, Israel
E-mail: bshaa...@bgu.ac.il
Phone: 972-8-647-2220  Skype: boaz.shaanan
Fax: 972-8-647-2992 or 972-8-646-1710

From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Douglas Theobald [dtheob...@brandeis.edu]
Sent: Friday, December 07, 2012 1:05 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] refining against weak data and Table I stats

Hello all,

I've followed with interest the discussions here about how we should be refining against weak data, e.g. data with I/sigI < 2 (perhaps using all bins that have a significant CC1/2, per Karplus and Diederichs 2012). This all makes statistical sense to me, but now I am wondering how I should report data and model stats in Table I. Here's what I've come up with: report two Table I's. For comparability to legacy structure stats, report a classic Table I, where I call the resolution whatever bin has I/sigI=2, use that as my high-resolution bin, and report high-resolution bin stats in parentheses after global stats. Then have another table (maybe Table I* in supplementary material?) where I report stats for the whole dataset, including the weak data I used in refinement. In both tables, report CC1/2 and Rmeas. This way, I don't redefine the (mostly) conventional usage of resolution, my Table I can be compared to precedent, I report stats for all the data and for the model against all data, and I take advantage of the information in the weak data during refinement.

Thoughts?

Douglas

Douglas L. Theobald
Assistant Professor
Department of Biochemistry
Brandeis University
Waltham, MA 02454-9110
dtheob...@brandeis.edu
http://theobald.brandeis.edu/

--
Alan Cheung
Gene Center
Ludwig-Maximilians-University
Feodor-Lynen-Str. 25
81377 Munich, Germany
Phone: +49-89-2180-76845
Fax: +49-89-2180-76999
E-mail: che...@lmb.uni-muenchen.de
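This arithmetic is easy to verify by simulation. A sketch with assumed toy numbers (exponentially distributed true intensities, 6% relative error, multiplicity 8):

import numpy as np

rng = np.random.default_rng(0)
n_refl, multiplicity, rel_error = 10000, 8, 0.06

i_true = rng.exponential(scale=100.0, size=n_refl)    # toy intensities
noise = rng.standard_normal((n_refl, multiplicity))
meas = i_true[:, None] * (1.0 + rel_error * noise)    # 6% relative error

half1 = meas[:, : multiplicity // 2].mean(axis=1)
half2 = meas[:, multiplicity // 2 :].mean(axis=1)
print(np.corrcoef(half1, half2)[0, 1])  # ~0.998 under these assumptions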
Re: [ccp4bb] P4132 vs. F23
Space groups F23 and P4132 are not subgroups of each other (without invoking pseudotranslational symmetry), so they cannot be related by twinning. The end of the theoretical analysis.

Zbyszek Otwinowski

You need to say what the cell dimensions are for these 2 options.
Eleanor

On 29 May 2012 15:59, Andrey Lebedev andrey.lebe...@stfc.ac.uk wrote:

Hi Mike,

I would be more careful about declaring an incorrect space group. Yes, sometimes auto-indexing gives strange results. However, in your case the two sets of crystals differ by two factors: diffraction quality and space group. Therefore it seems more likely that you have two crystal forms. Could you please send me log files from Pointless or Ctruncate? Then I would be able to say something more definite.

Regards,
Andrey

On 28 May 2012, at 08:46, Mike John wrote:

Hi All,

We got many datasets from crystals of our protein. When a crystal has high quality, it is indexed as F23 (the correct space group), while when it diffracts more poorly, it is indexed in the incorrect space group P4132. The crystals are twinned. My question is: given twinning, why is the correct space group F23 indexed as the incorrect space group P4132 for most of our crystals (most crystals have poorer quality)? A theoretical analysis of the twin operator and symmetry operators connecting F23 and P4132 would be highly appreciated.

Thanks,
Mike

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353
Re: [ccp4bb] How to evaluate Fourier transform ripples
The question about Fourier transformation ripples has a straightforward answer in a fairly typical situation:
A) data are collected to the resolution limit of diffraction,
B) phases are uniform in quality across the resolution range, which is equivalent to R-free being uniform with respect to resolution within a factor of 2 or so,
C) maps are not sharpened.

The ripples originate from not including unobserved structure factors. The intensity of diffraction decreases rapidly past the measurability limit, so, in the above situation, the unobserved diffraction contributes very little. Consequently, the answer is that typically one should not see ripples. Ripples should not be confused with the effect of electron density maps being smoothed by vibrations and other forms of disorder.

Zbyszek Otwinowski

Dear All,

Hi. I was asked in a manuscript revision to discuss the possible effects of Fourier transformation ripples on the crystallographic results. Specifically, the reviewers question whether ripples may affect the electron density around a heavy metal center which has a Mo-S-As connection. From which angle, or in which way, should this problem be addressed most convincingly? Thank you for any suggestion.

Best,
Conan

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353
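A one-dimensional toy calculation illustrates the point: a sharp cutoff applied to still-significant Fourier amplitudes produces ripples, while a smooth falloff of the kind described above does not. All numbers here are arbitrary:

import numpy as np

x = np.linspace(0, 1, 1000, endpoint=False)
rho = np.exp(-0.5 * ((x - 0.5) / 0.01) ** 2)  # sharp, atom-like peak
f = np.fft.rfft(rho)
k = np.arange(len(f))

# Hard truncation (sharp resolution cutoff) -> sinc-like ripples
rho_cut = np.fft.irfft(np.where(k < 25, f, 0))

# Smoothly decaying amplitudes (B-factor-like falloff) -> no ripples,
# just a broadened peak
rho_smooth = np.fft.irfft(f * np.exp(-((k / 25.0) ** 2)))

print(rho_cut.min())     # clearly negative: truncation ripples
print(rho_smooth.min())  # ~0: smooth falloff does not ripple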
Re: [ccp4bb] Y-Chi2 running out of chart
The two most likely possibilities are:

1. The beam position changed somewhat after the repair, and the site file was not updated with the new position. This could result in misindexing of the diffraction pattern, with poor positional agreement (Chi2) as a consequence. The diagnosis of misindexing is very simple, as misindexed data will not produce acceptable merging statistics even in the P1 space group. The correction is also simple: update the site file with the correct beam position.

2. A non-ideal crystal with a complex spot shape in its diffraction pattern. This could result, for example, from uneven cooling rates and variability in the crystal lattice. Merging statistics should be acceptable, though they may not be perfect. Better cryo-cooling is likely to help.

Zbyszek Otwinowski

Dear Colleagues,

I'm collecting a dataset on our recently repaired Rigaku home source. The crystal diffracts to 2.2 A. Indexing seems to be all fine. However, during integration, I noticed that Y-Chi2 is increasing constantly (from 2 to 4.5, almost linearly) within a 60 degree collection, whereas X-Chi2 stays the same. An image is attached. There are still another 60 degrees to go. Although the prediction fits the images well so far, I'm afraid the Y-Chi2 will eventually run off the chart. My question is: could it be related to any hardware malfunction, i.e., goniometer, image plates, etc., which may be a side effect of the recent major repair? Or what else could it be?

Thanks,
Bing

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353
Re: [ccp4bb] Change cell parameter
You probably need to reindex your data: h -> h, k -> -k, l -> -l, by using the command

hkl matrix 1 0 0 0 -1 0 0 0 -1

in Scalepack. In HKL2000 you should use the reindex menu or a data set macro (not the overall scaling macro); the data set macro exists only in the newest version of HKL2000. The reindexing will change the beta angle automatically.

Dear all,

I have a P2 derivative dataset with beta=89.6. I am trying to change beta to 90.4 to be consistent with the native dataset. Should I do something with the HKL indices, like applying a matrix? Thanks a million!

Best,
Zhiyi

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353
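For clarity, this is what the operator does to the Miller indices (an illustration only; Scalepack applies the matrix internally):

import numpy as np

reindex = np.array([[1,  0,  0],
                    [0, -1,  0],
                    [0,  0, -1]])  # h -> h, k -> -k, l -> -l

hkl = np.array([[1, 2, 3],
                [0, 4, -1]])
print(hkl @ reindex.T)  # [[ 1 -2 -3], [ 0 -4  1]]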
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
The meaning of the B-factor is the (scaled) sum of all positional uncertainties, and not just one of its contributors, the Atomic Displacement Parameter, which describes the relative displacement of an atom in the crystal lattice by a Gaussian function. That meaning (the sum of all contributions) comes from the procedure that calculates the B-factor in all PDB X-ray deposits, and not from an arbitrary decision by a committee.

All programs that refine B-factors calculate an estimate of positional uncertainty, where the contributors can be both Gaussian and non-Gaussian. For a non-Gaussian contributor, e.g. multiple occupancy, the exact numerical contribution is a rather complex function, but conceptually it is still an uncertainty estimate. Given the resolution of typical data, we do not have a procedure to decouple the Gaussian and non-Gaussian contributors, so we have to live with the B-factor being defined by the refinement procedure. However, we should still improve the estimates of the B-factor, e.g. by changing the restraints. In my experience, Refmac's default restraints on B-factors in side chains are too tight, and I adjust them. Still, my preference would be to have harmonic restraints on u (proportional to the square root of B) rather than on the Bs themselves.

It is not we who cram too many meanings into the B-factor; it is a quite fundamental limitation of crystallographic refinement.

Zbyszek Otwinowski

The fundamental problem remains: we're cramming too many meanings into one number [B factor]. This the PDB could indeed solve, by giving us another column. (He said airily, blithely launching a totally new flame war.)

phx.
[ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
The B-factor in crystallography represents the convolution (sum) of two types of uncertainties about the atom (electron cloud) position:
1) dispersion of atom positions in the crystal lattice,
2) uncertainty of the experimenter's knowledge about the atom position.

In general, uncertainty need not be described by a Gaussian function. However, communicating uncertainty using the second moment of its distribution is a widely accepted practice, with the frequently implied meaning that it corresponds to a Gaussian probability function. The B-factor is simply a scaled (by 8 times pi squared) second moment of the uncertainty distribution.

In the previous, long thread, confusion was generated by the additional assumption that the B-factor also corresponds to a Gaussian probability distribution, and not just to the second moment of any probability distribution. The crystallographic literature often implies the Gaussian shape, so there is some justification for such an interpretation, where a more complex probability distribution is represented by a sum of displaced Gaussians, and the area under each Gaussian component corresponds to the occupancy of an alternative conformation.

For data with a resolution typical for macromolecular crystallography, such a multi-Gaussian description of the atom position's uncertainty is not practical, as it would lead to instability in the refinement and/or overfitting. Due to this, the simplified description of the atom's positional uncertainty by just the second moment of a probability distribution is the right approach. For this reason, the PDB format is highly suitable for the description of positional uncertainties, the only difference from other fields being the unusual form of squaring and then scaling up the standard uncertainty. As this calculation can be easily inverted, there is no loss of information. However, in teaching, one should probably stress more this unusual form of presenting the standard deviation.

A separate issue is the use of restraints on B-factor values, a subject that probably needs a longer discussion.

With respect to the previous thread, representing poorly-ordered (so-called 'disordered') side chains by the most likely conformer with appropriately high B-factors is fully justifiable, and is currently probably the best solution to a difficult problem.

Zbyszek Otwinowski

- they all know what B is and how to look for regions of high B (with, say, pymol) and they know not to make firm conclusions about H-bonds to flaming red side chains.

But this knowledge may be quite wrong. If the flaming red really indicates large vibrational motion then yes, one would not bet on stable H-bonds. But if the flaming red indicates that a well-ordered sidechain was incorrectly modeled at full occupancy when in fact it is only present at half-occupancy, then no, the H-bond could be strong but only present in that half-occupancy conformation. One presumes that the other half-occupancy location (perhaps missing from the model) would have its own H-bonding network.

I beg to differ. If a side chain has 2 or more positions, one should be a bit careful about making firm conclusions based on only one of those, even if it isn't clear exactly why one should use caution. Also, isn't the isotropic B we fit at medium resolution more of a spherical-cow approximation to physical reality anyway?

Phoebe

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353
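The scaling mentioned above is the standard relation B = 8*pi^2*<u^2>, so converting back to an RMS displacement is a one-liner:

import math

def b_to_rms_displacement(b):
    """RMS displacement u (Angstrom) from an isotropic B (Angstrom^2),
    using B = 8 * pi**2 * <u**2>."""
    return math.sqrt(b / (8.0 * math.pi ** 2))

print(round(b_to_rms_displacement(40.0), 2))  # ~0.71 A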
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
Dale Tronrud wrote:

While what you say here is quite true and is useful for us to remember, your list is quite short. I can add another:
3) The systematic error introduced by assuming full occupancy for all sites.

You are right that structural heterogeneity is an additional factor. Se-Met expression is one of the examples where the Se-Met residue is often not fully incorporated, and therefore its side chains have a composition mixed with Met. Obviously, solvent molecules may have partial occupancies. Also, in heavily exposed crystals, chemical reactions result in the loss of functional groups (e.g. by decarboxylation). However, in most cases, even if side chains have multiple conformations, their total occupancy is 1.0.

There are, of course, many other factors that we don't account for that our refinement programs tend to dump into the B factors. The definition of that number in the PDB file, as listed in the mmCIF dictionary, only includes your first factor -- http://mmcif.rcsb.org/dictionaries/mmcif_std.dic/Items/_atom_site.B_iso_or_equiv.html -- and these numbers are routinely interpreted as though that definition is the law. Certainly the whole basis of TLS refinement is that the B factors are really Atomic Displacement Parameters. In addition, the stereochemical restraints on B factors are derived from the assumption that these parameters are ADPs. Convoluting all these other factors with the ADPs causes serious problems for those who analyze B factors as measures of motion. The fact that current refinement programs mix all these factors with the ADP of an atom to produce a vaguely defined B factor should be considered a flaw to be corrected, and not an opportunity to pile even more factors into this field in the PDB file.

B-factors describe the overall uncertainty of the current model. Refinement programs which do not introduce or remove parts of the model (e.g. are not able to add additional conformations) intrinsically pile all uncertainties into the B-factors. The solutions which you would like to see implemented require a model-building-like approach. The test of the success of such an approach would be a substantial decrease of R-free values. If anybody can show it, it would be great.

Zbyszek

Dale Tronrud
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
Regarding the closing statement about the best solution for poorly ordered side chains: I described in the previous e-mail the probabilistic interpretation of B-factors. In the case of very high uncertainty (= poorly ordered side chains), I prefer to deposit the conformer representing the maximum a posteriori, even if it does not represent all possible conformations. The maximum a posteriori will have a significant contribution from the most probable conformation of the side chain (prior knowledge) and should not conflict with the likelihood (the electron density map). Thus, in practice, I model the most probable conformation as long as it sits in even very weak electron density, does not overlap significantly with negative difference electron density, and does not clash with other residues.

As a user of PDB files, I much prefer the simplest and most informative representation of the result. Removing parts of side chains that carry charges, as already mentioned, is not particularly helpful for downstream uses. NMR-like deposits are not among my favorites, either. Having multiple conformations with low occupancies increases the potential for confusion, while the benefits are not clear to me.

Zbyszek

Frank von Delft wrote:

This is a lovely summary, and we should make our students read it. - But I'm afraid I do not see how it supports the closing statement in the last paragraph...

phx.

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353
Re: [ccp4bb] data processing deviations chisq
The most likely explanation is that you have a cracked crystal, or your crystal has split from radiation damage around frame 70. If you had a cracked crystal from the start, the spot overlap between the crystals would change as the sample rotated. In such cases, it may happen that the program starts refining on a different subgroup of the crystals than the one that defines the crystal parameters. If these crystals are isomorphous to each other, scaling can correct for such variable exposure/integration. If the crystals are not isomorphous, which sometimes happens, you have a problem; in such a case it would probably be better to restrict the scaling to the initial 70 frames.

Sometimes it helps to reprocess the data with a different spot integration size. There are two opposing strategies that could be beneficial:
1) reduce the spot integration size, to narrow the integration down to a single crystal;
2) increase the spot integration size, to integrate diffraction from a group of crystals with similar orientations; if these are uniformly integrated, the phasing signal should still be preserved.

If your crystal has split from radiation damage, strategy 2 may help, but frequently cracking induced by radiation damage is a very bad sign (the crystal lattice has changed, so the crystal is no longer isomorphous with its initial state). Scaling provides the ultimate diagnostic of the problem's seriousness. The statistics from integration that you provided are secondary, but may help in pinpointing the source of the problem; one can disregard them if the scaling is good.

The presence of iodine creates additional factors. It increases X-ray absorption and thus radiation damage, but iodine is also a quencher of radicals and tends to reduce the structural changes induced by radiation.

Hope that helps,
Zbyszek Otwinowski

Hi all,

I have collected one iodine-soaked dataset on our home source and am processing the data using HKL2000. The exposure time per frame is 5 min/1 degree. While processing, I have noticed that the Chi-squared values, cell parameters, and rotation changes vs. frame deviate strongly. Please find the Picasa link to see the curves. I would be grateful if anyone could kindly suggest possible causes of these deviations.

https://picasaweb.google.com/118341875228875389610/Mar252011?authkey=Gv1sRgCJXe6-ncmt6KmwE#

Thank you all for your suggestions,
Sincerely,
Debajyoti

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353
Re: [ccp4bb] Merging data to increase multiplicity
I concur with Kay, particularly with point d) and its consequences. Sometimes it is obvious which result is better; often it is not. For example, one of the HKL users was testing new options and found that all the statistics (including the refinement R-free) were worse, but the experimental and refinement maps were much better.

I myself test crystallographic programs written by others. For easy cases they typically produce equivalent results; for borderline cases, they tend to be sensitive to the input parameters, and sometimes one program works better while on other data another works better. This even applies to programs written by the same person (e.g. DM and Parrot), particularly when adjusting input parameters. The problems in real life are so diverse that it is not clear what would be a representative set on which to test programs and draw general conclusions.

Zbyszek Otwinowski

At 20:59, Van Den Berg, Bert wrote:

I have heard this before. I'm wondering though, does anybody know of a systematic study where different data processing programs are compared with real-life, non-lysozyme data?

Bert

Bert, some time ago I tried to start something to this effect - take a look at the Quality Control article in XDSwiki (http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Quality_Control). But it hasn't worked out, i.e. nobody has participated (so far). Possible reasons include:
a) it is considered politically incorrect (many years ago I wrote about a comparison that I did ... the reactions from a few people were rather harsh)
b) for reasons unintelligible to me, people do not like to make their raw data public (even if I ask directly)
c) it does take time to do and document
d) it's difficult to agree on the right methodology
e) it's a question that seems to interest only specialists
f) there's probably not a single answer
g) the programs are being constantly improved

Concerning the last point, a wiki seems to be a good place to collect the results (a table can be used to follow progress in a program, but also to see the differences between programs). But that brings me to my last point - a wiki article does not count as a paper.

best,
Kay

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353
Re: [ccp4bb] Problem with finding of spots
The default mode of autoindexing in HKL2000 and Denzo is to search for unit cell lengths that produce spots that can be resolved at the data collection distance and with the specified spot size (this can be changed in the HKL2000 interface). If the unit cell of your crystal is significantly longer, the program will not find it. Considering that your spots are not well resolved, this is quite a likely possibility. Your diffraction extends to less than 3 A, so pushing the detector back will not cause any loss of the measured diffraction. The logic of limiting the unit cell length in autoindexing is that indexing would fail anyway when the data cannot be integrated, due to spot overlap resulting from a long unit cell.

Another possibility is that your crystal has some type of severe packing disorder along the long axis. In such a case, there is probably no point in collecting a dataset.

Autoindexing in HKL2000 is a multi-step process. Not all the found peaks shown in your diffraction image will be used for autoindexing, as they also have to pass the signal-to-noise cutoff in Denzo. The ones accepted for autoindexing are shown in green in a subsequent window. You can also use a resolution limit to eliminate peaks at higher resolution during autoindexing and then extend the resolution during refinement. Sometimes this helps for very mosaic crystals.

Zbyszek Otwinowski

Dear colleagues,

I am working on one dataset that is hard to process. The data extend to about 3 A resolution. As we are not able to reproduce the experiment, I have to use this one, collected in a dirty way. The problem starts immediately with finding the spots. I have tried HKL2000, XDS, d*TREK, ipmosflm, and imosflm, but none of them gave a good read-out of the images. All the programs find some spots in wrong positions, and the real spots are not covered. Here is an example: http://kolda.webz.cz/image-predictions.jpg

The data were collected in-house on a Saturn 944++ CCD, and all the necessary information should be properly in the header. I checked the distance and other parameters, but the problem is with finding the correct (real) spots on the image. This should even be header-independent, shouldn't it? All the programs fail (or even crash) in this routine. Does anyone have any suggestion, please? Btw, we have several structures in the PDB from this experimental setup; this is the first problem I have met.

Many thanks for any response.

Petr

--
Petr Kolenko
petr.kole...@biochemtech.uni-halle.de
http://kolda.webz.cz

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353
Re: [ccp4bb] What makes the difference between 2 composite omit maps?
I am not aware of this point being explicitly made. Maybe somebody else could point to the relevant reference? However, the logic here is very simple:

(1) Take a model and generate Fc from it.
(2) Calculate the map with minimal rms error: m*Fo*exp(i*phiCalc).
(3) This map is biased with respect to errors in the model.
(4) To avoid this bias, one can subtract a part of the model (Fc * coefficient) from the above map:
- the coefficient is chosen so that the electron density in the resulting map does not change at the point where we added or subtracted an atom;
- conceptually, the above procedure, when we subtract an atom, is the composite omit map;
- "does not change" means here: within a first-order approximation; we ignore second-order effects;
- for a sigmaA-weighted map, this coefficient is D/2.
(5) The subtraction gives: (m*Fo - D/2*Fc)*exp(i*phiCalc).
(6) After multiplication by 2, we get (2m*Fo - D*Fc)*exp(i*phiCalc).

Hailiang Zhang wrote:

Thanks! Can you refer me to some documents about your following statements: "derivation of the sigmaA-weighted 2mFo-DFc formula is by calculating the Fourier coefficients of the following map: a rescaled composite omit map, where a minimal structural element (of a size about the resolution element) is being omitted, and the starting point is the map with coefficients m*Fo*exp(i*phiCalc)"? It seems the above was not covered in Read's publications about SIGMAA. Thanks again!

Hailiang

Zbyszek Otwinowski
UT Southwestern Medical Center
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu
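In code form, the coefficients of steps (5) and (6) are straightforward to assemble. A minimal numpy sketch, assuming per-reflection arrays of m, D, |Fo|, |Fc| and calculated phases are already at hand (the names are mine, not from any CCP4 program):

import numpy as np

def two_mfo_dfc(m, d, f_obs, f_calc, phi_calc):
    """Step (6): (2m|Fo| - D|Fc|) * exp(i*phiCalc), phases in radians."""
    return (2.0 * m * f_obs - d * f_calc) * np.exp(1j * phi_calc)

def composite_omit_unscaled(m, d, f_obs, f_calc, phi_calc):
    """Step (5): (m|Fo| - (D/2)|Fc|) * exp(i*phiCalc)."""
    return (m * f_obs - 0.5 * d * f_calc) * np.exp(1j * phi_calc)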
Re: [ccp4bb] What makes the difference between 2 composite omit maps?
The sigmaA-weighted 2mFo-DFc map IS the _COMPOSITE_OMIT_ map. There is no point in calculating an omit map of an omit map.

A brief explanation: the derivation of the sigmaA-weighted 2mFo-DFc formula is by calculating the Fourier coefficients of the following map: a rescaled composite omit map, where a minimal structural element (of a size about the resolution element) is being omitted, and the starting point is the map with coefficients m*Fo*exp(i*phiCalc).

BTW, the composite omit map of a map with coefficients Fo*exp(i*phiCalc) is simply the Fo-1/2Fc map, which after scaling by a factor of 2 becomes the 2Fo-Fc map.

Hi,

I want to calculate the sigmaA-weighted 2mFo-DFc composite omit map, and tried the following 2 scripts:

(1)
./omit hklin ${f}.mtz mapout ${f}.map <<EOF
LABI FP=mFo FC=DFC PHI=PHIC
RESO 29.50 3.22
SCAL 2.0 -1.0
EOF

(2)
./omit hklin ${f}.mtz mapout ${f}.map <<EOF
LABI FP=FWT FC=FC PHI=PHIC
RESO 29.50 3.22
SCAL 1.0 0.0
EOF

The output maps are just different, and I wonder why. I am also more concerned about which one is more appropriate for the sigmaA-weighted 2mFo-DFc composite omit map. (mFo is what I generated from the SIGMAA output.)

Thanks for any suggestions!

Best Regards,
Hailiang

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353
Re: [ccp4bb] Calculating R-merge between 2 mtz files.
There is an answer that requires using a non-CCP4 program: export the intensity columns in Scalepack format, creating a separate file for each dataset. Then merge the files in Scalepack; it can read its own output, so provide both as input files.

Jason Porta wrote:

Hi everybody,

I would like to take two mtz files (which are very similar) and calculate the R-merge between them. I tried looking into CCP4 and Phenix, but could not find a direct path. Does anybody know how I can do this R-merge calculation?

Best regards,
Jason Porta

--
Zbyszek Otwinowski
UT Southwestern Medical Center
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu
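If one only needs the number rather than a Scalepack run, the statistic can also be computed directly. A sketch assuming both datasets are already on a common scale and supplied as dicts keyed by (h, k, l):

def r_merge_between(set1, set2):
    """R-merge between two datasets over their common reflections.

    With exactly two measurements per reflection, the usual
    SUM |I - <I>| / SUM I reduces to SUM |I1 - I2| / SUM (I1 + I2).
    """
    num = den = 0.0
    for hkl, i1 in set1.items():
        if hkl in set2:
            i2 = set2[hkl]
            num += abs(i1 - i2)
            den += i1 + i2
    return num / den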
Re: [ccp4bb] data reduction
Dear Eleanor,

Even though Denzo, Scalepack and the HKL suite have been extensively developed, their file formats have not changed in many years. However, there are intermediate programs between Scalepack and the structure solution ones that can potentially be a source of problems. For example, ctruncate for some time was rejecting centric reflections when fed Scalepack output. In this case, the error message suggests that the problem may be related to asymmetric unit transformations, but, as stated earlier, Scalepack has not changed the asymmetric unit it outputs. So I doubt the problem lies in Scalepack as such.

Zbyszek Otwinowski

Can you send a bit of your Scalepack unmerged data? That would allow us to check the format and Pointless's behavior. It sounds a bit like a Scalepack problem, though.

Eleanor

Alexandra Deaconescu wrote:

Dear all:

I am trying to solve a structure from an apparently hexagonal crystal. I indexed and scaled the data in P6 in Scalepack (with merging), then used scalepack2mtz (with ensure unique reflections and add Rfree, as well as the truncate procedure), and then attempted to run molecular replacement with Phaser. Now the problem appeared - Phaser immediately quits with the following error message: FATAL RUNTIME ERROR: Reflections are not a unique set by symmetry. I do not understand this at all. I also tried running Scalepack using the NO MERGE macro, as people have indicated earlier on this bb (thank you again! - I also checked the scl.in that is written out, and it had the NO MERGE statement), and then tried to run Pointless to verify the space group, but the program complained that the reflections are merged (that is impossible - I checked the number of reflections in the unmerged and merged files, and they were different, as one would expect). I repeated the procedures several times and I always get the same errors. I can't make any sense of this and I can't move forward. Any ideas?

Many thanks,
Alex
Re: [ccp4bb] Mosaicity beam divergence
Richard Gillilan wrote:

Sorry, I meant to say: does divergence add to the reported mosaicity value? If so, do the actual mosaicity and divergence add in quadrature to give the reported value?

Yes, they add in quadrature; the total is reported by Scalepack and HKL2000 (if postrefinement is used). Denzo overestimates it somewhat.

--
Zbyszek Otwinowski
UT Southwestern Medical Center
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu
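In formula form (the standard quadrature sum, written out here for clarity; the notation is mine):

\[ \eta_{\mathrm{reported}} = \sqrt{\eta_{\mathrm{mosaicity}}^{2} + \eta_{\mathrm{divergence}}^{2}} \]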
Re: [ccp4bb] Scalepack error model?
The Scalepack log file gives the formulas:

Summary of reflection intensities and R-factors by intensity bins:
R linear = SUM( ABS(I - <I>) ) / SUM( I )
R square = SUM( (I - <I>)**2 ) / SUM( I**2 )
Chi**2   = SUM( (I - <I>)**2 / (Error**2 * N/(N-1)) )

which is equivalent to Jay Ponder's formula, with the important addition that sigma_avg and I_avg represent the average of all _other_ measurements with the same reduced hkl index. All sigmas are calculated from the error model described in the publications. Some of the error model parameters are at the moment defined by the user; they can be refined iteratively by the experimenter by adjusting parameters in subsequent runs of Scalepack, but most of the time this is not required. The new version will adjust all these parameters automatically.

Zbyszek Otwinowski

Richard Gillilan wrote:

Thanks, Joe and others. Bits and pieces of this story appear in section 11.4.8 of International Tables volume F, in Borek et al., Acta Cryst. D59 (2003), and in the Scalepack manual, but none is complete or has enough detail to follow easily. None of them gives the expression for chi-square for this problem. I found a presentation by Jay Ponder online (for his Bio5325 course) that gives:

chi^2 = 1/N * sum (I_avg - I_meas)^2 / (sigma_avg^2 + sigma_meas^2)

where the sum probably runs over all reflections, and I_avg is the average of the appropriate group of symmetry-related reflections. Sigma_avg^2 should, I think, be the sigma computed from the error model below (not given in the presentation), and sigma_meas is the sigma of the actual measurement of the symmetry-related reflection. One would then adjust the error parameters below to give chi-square of approximately unity, and this leads to the proper scaling factors for intensities and sigmas. One confusing hitch seems to be that (according to International Tables F, Eqs. (11.4.8.5) and (11.4.8.6)) the error model is also implicitly defined and must be solved iteratively ... though it's hard to see that from the text. Does this sound right?

Richard

--
Zbyszek Otwinowski
UT Southwestern Medical Center
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu
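A sketch of the Chi**2 computation as defined above, with <I> taken as the leave-one-out average over the symmetry-equivalent measurements; the per-measurement normalization at the end is my own choice of presentation, not necessarily Scalepack's:

import numpy as np

def chi_square(groups):
    """Chi**2 per the convention sketched above.

    groups: list of (intensities, sigmas) tuples, one per unique hkl;
    each element is a 1-D numpy array over that reflection's
    symmetry-equivalent measurements. <I> for each measurement is the
    average of all OTHER measurements of the same unique reflection.
    """
    total, count = 0.0, 0
    for i_obs, sigma in groups:
        n = len(i_obs)
        if n < 2:
            continue
        for j in range(n):
            i_avg = np.delete(i_obs, j).mean()
            total += (i_obs[j] - i_avg) ** 2 / (sigma[j] ** 2 * n / (n - 1))
            count += 1
    return total / count  # normalized per measurement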