[ccp4bb] opening in Zbyszek Otwinowski lab

2017-12-19 Thread Zbyszek Otwinowski

Dear All,

I have an opening in my laboratory at UT Southwestern for someone who would like 
to work on multi-crystal data processing in X-ray crystallography and/or 
low-resolution model building both in X-ray crystallography and cryo-EM. The 
ideal candidate will have good programming skills (e.g. C, C++, Swift), an 
excellent understanding of linear algebra, and previous experience with the 
experimental side of any data-intensive field. Knowledge of CUDA programming 
and data-mining techniques is a plus.


Please send a 1-page resume to z...@work.swmed.edu containing a link to your 
Google Scholar profile and the other information about yourself that you consider 
most important. Although the application deadline is February 15th, 
I will be interviewing candidates by Skype or in person (for Texas-based 
applicants) as soon as they are identified.


Zbyszek Otwinowski
UT Southwestern Medical Center  
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu


Re: [ccp4bb] ligand binding and crystal form

2016-10-26 Thread Zbyszek Otwinowski
What is the height of the non-origin Patterson peak for your data sets?

C-centered cells:

  216.5   345.8   145.2    90.0    90.0    90.0

and

  147.0   354.3   217.4    90.0    90.0    90.0

are very different; however, they share a common F222 subgroup with similar
unit cell parameters. In F222 one can permute the unit cell axes while
preserving the symmetry operators.

For these C-centered cells to have an approximate F222 subgroup, they need to
have pseudotranslational symmetry, which can be detected by calculating the
Patterson function. You should have strong reflections with h, k, l all even
or all odd, with the other reflections being weaker. What is the spot shape of
these weaker spots?

In the case of pseudotranslational symmetry, MR can produce a pseudosolution
related to the correct one by the pseudotranslation vector.
Translate your C2221 solution by {0, 0.5, 0.5} and try refining again.
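
A minimal sketch (with assumed file names and a plain "x y z"
fractional-coordinate layout) of applying that pseudotranslation to a model
before re-refinement:

import numpy as np

frac = np.loadtxt("model_frac.xyz")        # hypothetical: one "x y z" fractional line per atom
shift = np.array([0.0, 0.5, 0.5])          # the pseudotranslation vector suggested above
frac_shifted = (frac + shift) % 1.0        # translate and wrap back into the unit cell
np.savetxt("model_frac_shifted.xyz", frac_shifted, fmt="%.6f")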



Zbyszek Otwinowski




> Dear Veronica,
>
> with 1st, 2nd, 3rd map you mean the density for the same dataset after
> three consecutive cycles of building-refining or three different maps from
> three different crystals?
> If it's the first case, it could be fine, it may mean that at each cycle
> you improve the map so you see signal from the different ligand molecules.
> If it's the second case, well, it could be simply an artifact. Are the
> ligands proximal one to another or bind different sites on the protein?
>
> Best
> V.
>
> 2016-10-26 14:32 GMT+02:00 Veronica Fiorentino <
> veronicapfiorent...@gmail.com>:
>
>> Hello all,
>> I just solved an NCS-tetrameric (biological assembly is just a dimer)
>> crystal structure with ligand soaks (same plate - same conditions). No
>> density for ligand is observed in the first map. In the 2nd, I have 1
>> ligand bound. In the 3rd, I have 2 ligands bound. Is there any reason
>> for
>> this 'random' behaviour?
>>
>> In addition, I observed just one crystal out of 20 gave a different unit
>> cell. Pointless confirms to me
>> "Best Solution:space group C 2 2 2". REFMAC refinement shows R/Rfree
>> ~
>> 20/25 %
>> Cell from mtz :   216.5   345.8   145.2    90.0    90.0    90.0
>> Space group from mtz: number -   21; name - C 2 2 2
>>
>> All other datasets have:
>> Cell from mtz :   147.0   354.3   217.4    90.0    90.0    90.0
>> Space group from mtz: number -   20; name - C 2 2 21
>>
>> I tried re-processing/refining the C2221 dataset in C222 but R/Rfree
>> stays
>> ~45%. Can I also consider the C2221 dataset as a 'different crystal
>> form'?
>>
>> Am I safe?
>>
>> Thank you all,
>> Veronica
>>
>
>
>
> --
>
> *Valentina Speranzini, PhD*
> European Molecular Biology Laboratory
> Grenoble Outstation
> 71, avenue des Martyrs, CS 90181
> 38042 Grenoble Cedex 9, France
> Web: http://www.embl.fr
> E-mail: vsperanz...@embl.fr
> Tel: +33 (0)4 76 20 7630
>


Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] How to fit BioSAXS shape to the Structure

2015-06-26 Thread Zbyszek Otwinowski
At low resolution, without interpretable anomalous signal, neither SAXS
nor molecular replacement with a SAXS model can distinguish the correct from
the inverted solution, so an inverted model will fit the crystal data equally well.

Only phase extension to much higher resolution (e.g. 5 A) can help.
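
As an illustration only, a sketch of generating the opposite hand of a bead
model so that both enantiomers can be superposed on the crystal structure; the
plain "x y z" file format and names are assumptions, not the DAMMIN/SUPCOMB
formats.

import numpy as np

beads = np.loadtxt("bead_model.xyz")            # hypothetical: one "x y z" line per bead
mirror = beads * np.array([1.0, 1.0, -1.0])     # reflect through the xy plane -> opposite hand
np.savetxt("bead_model_inverted.xyz", mirror, fmt="%.3f")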




 Yes, SAXS has an enantiomer problem - mirror image DAMMIN/F
 reconstructions
 will give the same fit to the raw scattering data, whereas your protein
 structure will only fit one hand.

 SUPCOMB can certainly deal with this problem, as detailed in
 http://www.embl-hamburg.de/biosaxs/manuals/supcomb.html





 David Briggs
 about.me/david_briggs
   http://about.me/david_briggs

 On 26 June 2015 at 12:04, Reza Khayat rkha...@ccny.cuny.edu wrote:

  Hi,

  Follow up question on SAXS. Does SAXS have an enantiomer problem like
 electron microscopy? In other words, does the calculated model possess
 the
 correct handedness or can both handedness of a model fit the scattering
 profile just as well?

  Best wishes,
 Reza

Reza Khayat, PhD
 Assistant Professor
 City College of New York
 85 St. Nicholas Terrace CDI 12308
 New York, NY 10031
 (212) 650-6070
  www.khayatlab.org

  On Jun 26, 2015, at 6:50 AM, David Briggs drdavidcbri...@gmail.com
 wrote:

  SASTBX has an online tool for achieving this:
 http://sastbx.als.lbl.gov/cgi-bin/superpose.html




 David Briggs
  about.me/david_briggs
   http://about.me/david_briggs

 On 26 June 2015 at 11:39, Ashok Nayak ashokgocrac...@gmail.com wrote:

Dear Weifei,

  It can also be done manually in Pymol by changing the mouse mode from
 3
 button viewing to 3 button editing and later moving the envelope onto
 the
 X-ray structure or vice-versa, however the best fit can be achieved in
 SUPCOMB.

  regards
  Ashok Nayak
  CSIR-CDRI, Lucknow
  India







Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] Data reduction

2015-05-27 Thread Zbyszek Otwinowski
These are two different types of completeness indicators.
Scalepack reports Bragg's-law completeness: 98.8% of your unique
reflections were in the diffracting condition. If you use automatic
corrections, only informative reflections are output; for anisotropic
diffraction, 64% is reasonable.

 Hi All,

 Scalepack output says 98.8% complete data but after conversion to an .mtz file
 it is reduced to 64%.
 I have tried in CCP4 and phenix both.
 How is it possible?

 Ayan



Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] X-rays and matter (the particle-wave picture)

2015-05-21 Thread Zbyszek Otwinowski
The answer to your questions depends on your level of understanding of
quantum mechanics. Below is information on where to find the subject discussed
in more detail.

Bernhard Rupp's book, page 251, necessarily simplifies a rather complex
subject: the photon's interaction with multiple particles. The quantum
mechanical wave function can be considered virtual from the point of view of
the measurement process, as the photon (a single quantum) appears in the
detector during the measurement, but not on the way to it.


 the photon's coherence length

The concept of a photon's coherence length involves the quantum-mechanical
mixed state. For an introduction see:
http://en.wikipedia.org/wiki/Quantum_state#Mixed_states

 virtual waves
The quantum-mechanical wave function is virtual in a certain sense. The
Feynman Lectures on Physics, Vol. 3, covers this subject quite well.

 appears again in some direction
This refers to quantum-mechanical wave-particle duality.
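
As a toy illustration of the wave-particle point (positions and the scattering
vector below are made up): the waves from the individual electrons add
coherently, and the probability of detecting the photon in a given direction
goes as the squared modulus of the combined amplitude.

import numpy as np

rng = np.random.default_rng(0)
electrons = rng.random((100, 3)) * 50.0    # toy electron positions, in Angstrom
q = np.array([0.1, 0.0, 0.0])              # one scattering vector, in 1/Angstrom

amplitude = np.exp(2j * np.pi * electrons @ q).sum()   # coherent sum of the scattered waves
intensity = np.abs(amplitude) ** 2                     # detection probability ~ |amplitude|^2
print(intensity)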


 Hello Everybody!
 I was trying to make some sense from  Bernhard Rupp's book page 251.

 I will copy the relevant part...

 When photons travel through a crystal, either of two things can happen: (i)
 nothing, which happens over 99% of the time; (ii) the electric field vector
 induces oscillations in all the electrons coherently within *the
photon's coherence length*, ranging from a few 1000 Angstroms for X-ray
emission lines to several microns for modern synchrotron sources. At
this point, the
 photon ceases to exist, and we can imagine that the electrons themselves
emanate *virtual waves*, which constructively overlap in certain
directions, and interfere destructively in others. The scattered photon
then *appears again in some direction*, with the probability of that
appearance proportional to the amplitude of the combined, resultant
scattered wave in that particular direction...The sum of all
scattering
 events of independent, single photons then generates the diffraction
pattern.

 I underlined the problematic parts...

 can anyone shed some light on this ..or point me in the right direction?


 Thanks in advance



Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353





Re: [ccp4bb] Space group numbers

2014-10-02 Thread Zbyszek Otwinowski
How can it be if you're not even sure what the correct space group is?

Ambiguities may arise in the presence of pseudosymmetry and/or packing
disorders. In some cases, you can determine the crystal structure from the
same data in different space groups that do not have a subgroup/supergroup
relationship. One of the space groups may produce better results,
something that can be determined quite late in the process.

A similar situation may arise when merging data from multiple nearly
isomorphous crystals that individually may be better described by
alternative space group symmetries.
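
As an illustration of the reindexing bookkeeping discussed in the quoted thread
below, here is a sketch that applies one candidate reindexing operator and scores
it by the correlation coefficient of intensities against a reference set (in the
spirit of the REFINDEX approach mentioned there); the file names and the
"h k l I" column layout are assumptions.

import numpy as np

# candidate operator (h,k,l) -> (k,h,-l), written as a matrix acting on (h,k,l)
R = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, -1]])

data = np.loadtxt("dataset.hkl")       # hypothetical: columns h k l I
ref = np.loadtxt("reference.hkl")      # hypothetical reference data set, same layout

reindexed = {tuple((R @ row[:3]).astype(int)): row[3] for row in data}
pairs = []
for row in ref:
    key = tuple(row[:3].astype(int))
    if key in reindexed:
        pairs.append((reindexed[key], row[3]))
cc = np.corrcoef(np.array(pairs).T)[0, 1]
print(f"CC for this indexing: {cc:.3f}")   # the operator giving the highest CC matches the reference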

Zbyszek Otwinowski

 On 2 October 2014 13:51, Kay Diederichs kay.diederi...@uni-konstanz.de
 wrote:


 I don't see any sticking to initial indexing as worthwhile to worry
 about, since in the first integration, P1 is often used anyway, and it
 is
 quite normal (and easy) to re-index after the intensities become
 available,
 during scaling. Re-indexing from P1 to the true spacegroup often changes
 the cell parameters and their order, and this is sufficiently easy and
 well-documented in the output.


 Far from it: re-indexing would be a huge problem for us and one we wish to
 avoid at all costs.  We had a case where the systematic absences were
 ambiguous (not uncommon!) and for a long time it wasn't clear which of two
 SGs (P21212 or P212121) it was.  So we simply kept our options open and
 assigned the SG in XDS as P222 in all cases.  This of course meant that
 the
 cell was automatically assigned with a<b<c.  We have a LIMS system with an
 Oracle database which keeps track of all processing (including all the
 failed jobs!) and it was a fundamental design feature that all crystals of
 the same crystal form (i.e. same space group & similar cell) were indexed
 the same way relative to a reference dataset (the REFINDEX program ensures
 this, by calculating the correlation coefficient of the intensities for
 all
 possible indexings).

 So crystals may be initially re-indexed from the processed SG (where for
 example 2 axes have similar lengths) to conform with the reference dataset
 (in P222), but then once they are in the database there's no way of
 storing
 a re-re-indexed dataset based on a different space group assignment
 without
 disruption of all previous processing.  We collected datasets from about
 50
 crystals over a 6 month period and stored the data in the database as we
 went along before we had one which gave a Phaser solution (having tried
 all
 8 SG possibilities of course), and that resolved the SG ambiguity without
 reference to systematic absences (it was P212121).  But there was no way
 we
 were going to go back and re-index everything (for what purpose in any
 case?), since it would require deleting all the data from the database,
 re-running all the processing and losing all the logging & tracing info of
 the original processing.  However changing the space group in the MTZ
 header from P222 to P212121 without changing the cell is of course
 trivial.

 I don't see how "symmetry trumps geometry" can be a universal rule.  How
 can it be if you're not even sure what the correct space group is?  Also
 the IUCr convention in say monoclinic space groups requires that for a and
 c the two shortest non-coplanar axis lengths be chosen which is the same
 as saying that beta should be as close as possible to 90 (but by convention
 >= 90).  This is an eminently sensible and practical convention!  So in one
 case a C2 cell with beta = 132 transforms to I2 with beta = 93.  It is
 important to do this because several programs analyse the anisotropy in
 directions around the reciprocal axes and if the axes are only 48 deg
 apart
 you could easily miss significant anisotropy in the directions
 perpendicular to the reciprocal axes (i.e. parallel to the real axes).  So
 at least in this case it is essential that "geometry trumps symmetry".


 this is true; running in all 8 possible primitive orthorhombic space
 groups is a fallback that should save the user, and I don't know why it
 didn't work out in that specific case. Still, personally I find it much
 cleaner to use the space group number and space group symbol from ITC
 together with the proper ordering of cell parameters. I rather like to
 think once about the proper ordering, than to artificially impose abc
 ,
 and additionally having to specify which is the pure rotation (in 18) or
 the screw (in 17). And having to specify one out of  1017 / 2017 / 1018/
 2018/ 3018 is super-ugly because a) there is no way I could remember
 which
 is which, b) they are not in the ITC, c) XDS and maybe other programs do
 not understand them.


 I completely agree that the CCP4 SG numbers are super-ugly: they are only
 there for internal programmer use and should not be made visible to the
 user (I'm sure there are lots of other super-ugly things hiding inside
 software!).  Please use the H-M symbols: a) they're trivial to remember,
 b)
 they are part of the official ITC convention, c) they're designed

Re: [ccp4bb] correlated alternate confs - validation?

2014-07-23 Thread Zbyszek Otwinowski
An additional problem is existence of alternative conformations close to
rotational axis that violate crystal symmetry.

If we want to describe such correlated alternative configurations, we need
to describe also how they transform by such rotational axis.

This problems may also exists also for other packing contacts; however,
for conformations close to rotational axis, symmetry operator cannot
preserve conformer ID, and this issue cannot be avoided.
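
A toy numerical illustration of this point, with made-up fractional coordinates:
near a two-fold axis (operator -x, -y, z) the symmetry mate of conformer A lands
on conformer B, so the operator cannot preserve the conformer ID.

import numpy as np

two_fold = np.diag([-1.0, -1.0, 1.0])      # rotation part of the operator -x, -y, z
alt_a = np.array([ 0.02,  0.03, 0.10])     # conformer A, just off the axis at x = y = 0
alt_b = np.array([-0.02, -0.03, 0.10])     # conformer B on the other side of the axis

mate_of_a = two_fold @ alt_a               # symmetry image of conformer A
print(np.allclose(mate_of_a, alt_b))       # True: the image of A overlaps B, so the IDs must swap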

Zbyszek Otwinowski

I would probably make the two waters alternates of each other.



 Quite possible, but the group definition, i.e. to which alt conf. side
 chain they belong,

 would need to be preserved, too.



 BR





 Cheers,
 Robbie

 Sent from my Windows Phone

   _

 Van: Bernhard Rupp
 Verzonden: 23-7-2014 10:19
 Aan: CCP4BB@JISCMAIL.AC.UK
 Onderwerp: [ccp4bb] correlated alternate confs - validation?

 Hi Fellows,



 something that may eventually become an issue for validation and reporting
 in PDB headers:



 using the Refmac grouped occupancy keyword I was able to form and refine
 various networks of correlated

 alternate conformations - it seems to works really well at least in a 1.6
 and 1.2 A case I tried.

 Both occupancy and B-factors refine to reasonable values as
 expected/guessed from e-density and environment.

 Respect & thanks for implementing this probably underutilized secret.



 This opens a question for validation: Instead of pretty much ignoring any
 atoms below occupancy of 1, one

 can now validate each of the network groups’ geometry and density fit
 separately just as any other

 set of coordinates. I think with increasing data quality, resolution, and
 user education such refinements will become more

 frequent (and make a lot more sense than arbitrarily setting guessed
 independent hard occupancies/Bs

 that are not validated). Maybe some common format for (annotating) such
 correlated occupancy groups might

 eventually become necessary.



 Best, BR



 PS: Simple example shown below: two alternate confs of residue 338 which
 correlate with one

 water atom each in chain B, with corresponding partial occupancy (grp1:
 A338A-B5 ~0.6, grp2: A338B-B16 ~0.4).



 occupancy group id 1 chain A residue 338 alt A

 occupancy group id 1 chain B residue 5

 occupancy group id 2 chain A residue 338 alt B

 occupancy group id 2 chain B residue 16

 occupancy group alts complete 1 2

 . more similar…

 occupancy refine



 AfaIct this does what I want. True?



 

 Bernhard Rupp

 k.-k. Hofkristallamt

 001 (925) 209-7429

 b...@ruppweb.org

 b...@hofkristallamt.org

 http://www.ruppweb.org/

 ---








Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] Protein Crystallography challenges Standard Model precision

2014-07-22 Thread Zbyszek Otwinowski
Error estimates for the unit cell dimensions in macromolecular
crystallography belong to an atypical category of uncertainty estimates.

The random-error contribution is in most cases below 0.001 A, so it can be
neglected. The wavelength calibration error can also be made very small;
however, I do not know how big it is in practice. Goniostat wobble error
is taken into account in Scalepack refinement. The crystal-to-detector
distance is not used in postrefinement/global refinement.

Because the measurement error is very small, even small variations in
unit cell parameters can be detected within cryocooled crystals. These
variations are almost always _orders_of_magnitude_larger_ than the measurement
uncertainty. Current practice is not to investigate the magnitude of the
changes in the unit cell parameters, but when a beam smaller than the crystal
is used, observing variations as large as 1 A is not unusual.

The main question is: what does the unit cell uncertainty mean? For most
samples I could defend values of 0.001 A, 0.01 A, 0.1 A or 1 A as
reasonable, depending on the particular point of view.

Without defining what the unit cell uncertainty means, publishing its
values is pointless.


Zbyszek Otwinowski




 Hi Bernhard,

 A look at the  methods section might give you a clue. Neither XDS nor
 XSCALE create mmCIF - files (you are talking about mmCIF, not CIF -
 subtle, but annoying difference), so that the choice is limited. I
 guess some programmer (rather than a scientist ;-) ) used a simple
 printf command for a double precision number so the junk is left over
 from the memory region or other noise common to conversions.

 XDS actually prints error estimates for the cell dimensions in
 CORRECT.LP which could be added to the mmCIF file - a cif (sic!) file,
 I believe, requires those, by the way and checkCIF would complain
 about their absence.

 Cheers,
 Tim

 On 07/22/2014 01:01 PM, Bernhard Rupp wrote:
 I am just morbidly curious what program(s) deliver/mutilate/divine
 these cell constants in recent cif files:



 data_r4c69sf

 #

 _audit.revision_id 1_0

 _audit.creation_date   ?

 _audit.update_record   'Initial release'

 #

 _cell.entry_id  4c69

 _cell.length_a  100.152000427

 _cell.length_b  58.3689994812

 _cell.length_c  66.5449981689

 _cell.angle_alpha   90.0

 _cell.angle_beta99.2519989014

 _cell.angle_gamma   90.0

 #



 Maybe a little plausibility check during cif generation  might be
 ok



 Best, BR



 PS: btw, 10^-20 meters (10^5 time smaller than a proton) in fact
 seriously challenges the Standard Model limits..

 


 - 

 Bernhard Rupp

 k.-k. Hofkristallamt

 Crystallographiae Vindicis Militum Ordo

 b...@ruppweb.org

 b...@hofkristallamt.org

 http://www.ruppweb.org/

 ---








 - --
 - --
 Dr Tim Gruene
 Institut fuer anorganische Chemie
 Tammannstr. 4
 D-37077 Goettingen

 GPG Key ID = A46BEE1A




Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] Protein Crystallography challenges Standard Model precision

2014-07-22 Thread Zbyszek Otwinowski
The least-squares procedure for unit cell parameter refinement provides very
precise uncertainty estimates. Why are they so precise? Because we use many
thousands of unmerged reflections to determine only 1 to 6 parameters (the unit
cell parameters). However, although error propagation through the least squares
gives a precision of about 0.001 A, or better in some cases, this is only
precision, not accuracy, and the precision is typically calculated with respect
to the unit cell parameters averaged across the exposed volume of the crystal.

In practice, the range of unit cell parameters within a crystal can be quite
broad, and when we consider accuracy it is not clear which unit cell parameters
should be the reference point. Typically, the distribution of unit cell
parameters in a crystal will not follow a Gaussian distribution.
Therefore, the accuracy of unit cell parameter determination is not well
defined, even when we know the experimental conditions very well and propagate
the experimental uncertainties correctly.

Variability of unit cell parameters can be quite high for data sets from
different samples. However, the description of this variability is typically
unrelated to the very high precision with which the unit cell parameters of an
individual sample are determined.
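
A toy numerical illustration of this precision-versus-accuracy point (all
numbers invented): the least-squares standard error shrinks with the number of
observations, while the physical spread of cell lengths across the exposed
volume does not.

import numpy as np

rng = np.random.default_rng(1)
# pretend 50 exposed sub-volumes each have their own a axis, spread over ~0.3 A
true_cells = 78.0 + rng.normal(0.0, 0.3, size=50)
# each sub-volume contributes ~2000 observations with a small random measurement error
obs = np.repeat(true_cells, 2000) + rng.normal(0.0, 0.05, size=50 * 2000)

formal_se = obs.std() / np.sqrt(obs.size)   # what error propagation through least squares reports
physical_spread = true_cells.std()          # what is actually present in the crystal
print(f"a = {obs.mean():.4f} A, precision ~{formal_se:.4f} A, spread ~{physical_spread:.2f} A")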


Zbyszek


On 07/22/2014 12:33 PM, Tim Gruene wrote:

Dear Zbyszek,

when you optimise a set of parameters against a set of data, I guess you
can also provide their errors. If I understand correctly, this comes
with least-squares-routines. I only pointed out that cell errors are
listed in the XDS output (provided you refine them, of course). I am
sure those errors are well defined.

Best wishes,
Tim

On 07/22/2014 06:53 PM, Zbyszek Otwinowski wrote:

Error estimates for the unit cell dimensions in macromolecular
crystallography belong to atypical category of uncertainty estimates.

Random error contribution in most cases is below 0.001A, so it can be
neglected. Wavelength calibration error can be also made very small;
however, I do not know how big it is in practice. Goniostat wobble error
is taken into account in Scalepack refinement. Crystal-to-detector
distance is not used in postrefinement/global refinement.

Due to the measurement error being very small, even small variations in
unit cell parameters can be detected within cryocooled crystals. These
variations almost always are _orders_of_magnitude_larger_ than measurement
uncertainty. Current practise is not to investigate the magnitude of the
changes in the unit cell parameters, but when beam smaller than crystal is
used, observing variations as large as 1A is not unusual.

The main question is: what the unit cell uncertainty means? For most
samples I could defend to use values: 0.001A, 0.01A, 0.1A and 1A as
reasonable, depending on particular point of view.

Without defining what the unit cell uncertainty means, publishing its
values is pointless.


Zbyszek Otwinowski



Hi Bernhard,

A look at the  methods section might give you a clue. Neither XDS nor
XSCALE create mmCIF - files (you are talking about mmCIF, not CIF -
subtle, but annoying difference), so that the choice is limited. I
guess some programmer (rather than a scientist ;-) )used a simple
printf commmand for a double precision number so the junk is left over
from the memory region or other noise common to conversions.

XDS actually prints error estimates for the cell dimensions in
CORRECT.LP which could be added to the mmCIF file - a cif (sic!) file,
I believe, requires those, by the way and checkCIF would complain
about their absence.

Cheers,
Tim

On 07/22/2014 01:01 PM, Bernhard Rupp wrote:

I am just morbidly curious what program(s) deliver/mutilate/divine
these cell constants in recent cif files:



data_r4c69sf

#

_audit.revision_id 1_0

_audit.creation_date   ?

_audit.update_record   'Initial release'

#

_cell.entry_id  4c69

_cell.length_a  100.152000427

_cell.length_b  58.3689994812

_cell.length_c  66.5449981689

_cell.angle_alpha   90.0

_cell.angle_beta99.2519989014

_cell.angle_gamma   90.0

#



Maybe a little plausibility check during cif generation  might be
ok



Best, BR



PS: btw, 10^-20 meters (10^5 time smaller than a proton) in fact
seriously challenges the Standard Model limits..








Bernhard Rupp

k.-k. Hofkristallamt

Crystallographiae Vindicis Militum Ordo

b...@ruppweb.org

b...@hofkristallamt.org

http://www.ruppweb.org/

---














Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353






--
Zbyszek Otwinowski
UT Southwestern Medical Center  
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu


Re: [ccp4bb] definitions of unique reflections

2014-07-14 Thread Zbyszek Otwinowski
My preference is to use the term 'observed' for reflections whose
intensities have been integrated, and the term 'informative' for those
that satisfy some statistical criterion of being useful for structure
determination.
Programs like Truncate have hidden criteria for rejecting some observed
reflections from the informative group, so this issue has been around for
a long time.
For a typical, properly done data collection, the resolution limit is a widely
used criterion of informativity. For anisotropic diffraction, a single
number is definitely not a proper way to define the resolution limit,
so we need something like a signal-to-noise ratio cut-off to define a better
equivalent of the resolution limit. The question is what we mean by
signal-to-noise: it can be based on individual (unique/merged) reflection
values (a widespread practice in small-molecule crystallography, and for a
good reason), or on signal, noise, or both taken as group averages
rather than individual estimates. Personally, I prefer the ratio of an
average signal to an individual uncertainty as the criterion that defines
the informativity limit equivalent to a resolution cut-off.
The second aspect of the issue is what value of the signal-to-noise ratio
(however defined) should be the limiting criterion. A value around 2
represents the limit of what is 'fully' informative and, as has been
discussed, lower values of signal-to-noise provide some extra information.
Around a ratio of 1, the value of the information becomes minimal.

So for me, there are two types of data completeness: one in terms of
Bragg's condition, which tells us whether we missed part of reciprocal space
during the experiment; and a second, in terms of what is informative for
structure solution. The second type will typically be low in the resolution
range close to the limit in the case of anisotropic diffraction. There is,
therefore, nothing wrong, in terms of how the experiment was done, if such
completeness is low; on the other hand, the first type can tell us whether
the experiment could have been done better. So there are good reasons to report
both types of completeness in the publication and in the deposit, even if
there is no such custom yet.
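
A sketch of reporting both kinds of completeness per resolution shell, under
several assumptions: an orthorhombic cell (so the simple d-spacing formula below
applies), a plain "h k l I sigI" text file, and a placeholder for the number of
theoretically possible reflections per shell.

import numpy as np

a, b, c = 80.0, 90.0, 100.0                # placeholder orthorhombic cell, in Angstrom
h, k, l, I, sig = np.loadtxt("unique.hkl", unpack=True)   # hypothetical merged data file
d = 1.0 / np.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)

shell_edges = [10.0, 5.0, 4.0, 3.2, 2.8, 2.5]
for dmax, dmin in zip(shell_edges[:-1], shell_edges[1:]):
    in_shell = (d <= dmax) & (d > dmin)
    n_possible = 5000                      # placeholder: theoretically possible unique reflections
    bragg = in_shell.sum() / n_possible                                    # Bragg-condition completeness
    informative = np.sum(I[in_shell] / sig[in_shell] >= 2.0) / n_possible  # e.g. I/sigma >= 2 criterion
    print(f"{dmax:5.2f}-{dmin:4.2f} A   Bragg {bragg:6.1%}   informative {informative:6.1%}")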

Zbyszek Otwinowski

 There is some disagreement on terms used to deposit data. We need a
 definition and an algorithm
 for each definition.

 Unique Reflections

 My definition is all the possible reflections out to the high resolution
 reported not related by symmetry.
  Where can I find this? The .mtz contains a list of all HKL calculated to
 the highest resolution. Usually, we
 are not able to measure all these diffraction spots due to limits of the
 detector, mechanical limits, crystal
 orientation, etc.

 'Total reflections'
 The depositions server asks for total reflections. I assume it wants only
 those unique reflections we were able to
 collect, regardless of the sigma cut off. These are called 'observed'. The
 total we use in refinement will be a subset
 of the 'unique observed' that are cut on sigma. However, some
 crystallographers believe that we should not cut
 on sigma since some of the intensities may in fact be zero. Is this a
 question for both the Refmac and Phenix people?

 Please give us some guidance and maybe a reference or two that we can use.


 --
 Kenneth A. Satyshur, M.S.,Ph.D.
 Senior Scientist
 University of Wisconsin
 Madison, Wisconsin 53706
 608-215-5207



Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] Help in Cell content analysis

2014-06-16 Thread Zbyszek Otwinowski
If your translational NCS is defined by a vector that does not correspond
to lattice centering, i.e. has components different from 0 or 0.5, this is
likely a case of order-disorder. Most such cases can be easily diagnosed
by abnormal patterns in the spot shape, e.g. every second reflection having a
non-Bragg streak associated with it.
The apparently dense packing, 18% solvent, is likely to arise from random
packing of molecules in alternative positions within the unit cell, where
every second position is occupied. This randomness can be cross-correlated
between cells, and this will produce diffuse scattering.
An alternative explanation is that you crystallized a proteolytic fragment
of your protein.
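
For reference, the Matthews arithmetic behind the question quoted below can be
sketched as follows; the cell volume and molecular weight are placeholders, and
1.23 A^3/Da is the usual protein volume per dalton.

def solvent_content(cell_volume, n_sym_ops, mw_da, n_mol):
    """Matthews coefficient (A^3/Da) and solvent fraction for n_mol molecules per ASU."""
    v_asu = cell_volume / n_sym_ops
    vm = v_asu / (mw_da * n_mol)
    return vm, 1.0 - 1.23 / vm

cell_volume = 1.0e6        # placeholder P212121 cell volume, in A^3
for n in (1, 2):
    vm, solv = solvent_content(cell_volume, 4, 36000.0, n)   # 4 symmetry operators in P212121
    print(f"{n} mol/ASU: Vm = {vm:.2f} A^3/Da, solvent ~ {solv:.0%}")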

Zbyszek Otwinowski

 Dear all
 i have a small query to ask and seek your suggestions:

 I have collected a data for a protein with 324 residues and processed at
 its best in P212121. So Matthews suggest 1 mol in ASU with expected Mol.
 weight of 43 kDa with sovent content of 58% and 2 mol./ASU with 18%
 solvent
 content. However the data suggest possibility of translational NCS so i
 think i should ask for two molecules so that both get corrected for NCS.
 However for 2 mol./ASU, Matthews suggests a total mol. weight of 52 kDa. So
 how to decide which way to proceed for MR?

 Thanks
 Monica



Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] Reprocess data with new resolution cutoff?

2014-05-19 Thread Zbyszek Otwinowski
Reprocessing data to lower resolution only helps if there are ice rings or
other sources of undesired diffraction that can be eliminated as
contributors to the learned profiles in profile fitting. Strong ice
diffraction occurs at 2.28 A and 2.68 A, so there is no indication that
reprocessing the data to lower resolution will change anything other than
the overall R-merge and other R-statistics. To calculate these statistics it
is enough to re-merge the data with a lower resolution cutoff.

Zbyszek Otwinowski

 Hi all,

 This is a basic question and I'm sure the answer is widely known, but
 I'm having trouble finding it.

 I'm working on my first structure.  I have a dataset that I processed
 in XDS with a resolution cutoff of 2.35 A, although the data are
 extremely weak-to-nonexistent at that resolution limit.  After
 successful molecular replacement and initial refinement, I then
 performed paired refinements against this dataset cut to various
 resolutions (2.95 A, 2.85 A, 2.75 A, etc).  Based on the improvement
 in R/Rfree seen between successive pairs, it appears that the data
 should be cut at around 2.55 A.

 Here is my question: as I proceed with refinement (I'm currently using
 Phenix), should I now simply set 2.55 A as the resolution limit in
 Phenix?  Or should I go back to XDS and actually reprocess the data
 with the new limit (2.55 A instead of 2.35 A)?

 Thanks,
 Tom



Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] metals disapear

2014-04-30 Thread Zbyszek Otwinowski
My comments:

Such an observation is very uncommon for metals involved in catalysis by
proteins. I have seen quite a few such structures involving Mg, Ca, Fe,
Mn and Zn, and most of the radiation damage was not at the catalytic metal. In
the case of Fe, I once noticed a slight shift in the position of the Fe ion
upon exposure.
The only metal ion that was significantly affected was Hg, and this was
observed in multiple cases.

We published results on radiation damage as a function of temperature
going down to 15 K. There was some overall reduction of radiation
damage, by about a factor of 1.7; however, most of the impact was away
from the catalytic site. As a result, relative radiation damage was MORE
concentrated at the catalytic site. The metals (Ca and Mn) were not
particularly affected at any temperature.

We observed that nitrate and iodine, which scavenge radicals, help reduce
specific radiation damage. However, the increased X-ray absorption by iodine
does make the overall situation worse, and the impact of nitrate was observed
only at relatively low doses (up to 2 MGy). Kmetko et al. (2011) were negative
about the potential of using scavengers in general, including ascorbic acid.

The data collection wavelength does not matter! It is an urban legend that
a shorter wavelength will help. I remember it being debunked two decades
ago, and somehow it is still alive.

Zbyszek Otwinowski


 Dear Dean,

 this is probably a very common observation: X-rays produce reducing
 electrons and as you reduce a metal I imagine it does not like its
 chemical environment as much as it did highly charged.

 Everything you can do to avoid radiation damage should help you
 prevent the ion to disappear:
 - - optimise your strategy to collect a minimal amount of data
 - - add vitamin C
 - - cool below 100K
 - - collect at short wavelength

 When your ion is intended to be used for phasing there are of course
 restraints limiting the choice.

 Regards,
 Tim


 On 04/30/2014 12:33 PM, Dean Derbyshire wrote:
 Hi all, Has anyone experienced catalytic metal ions disappearing
 during data collection ? If so, is there a way of preventing it?
 D.

 Dean Derbyshire Senior Research Scientist
 [cid:image001.jpg@01CF6470.5FA976D0] Box 1086 SE-141 22 Huddinge
 SWEDEN Visit: Lunastigen 7 Direct: +46 8 54683219
 www.medivir.comhttp://www.medivir.com

 --


 This transmission is intended for the person to whom or the entity to
 which it is addressed and may contain information that is privileged,
 confidential and exempt from disclosure under applicable law. If you are
 not the intended recipient, please be notified that any dissemination,
 distribution or copying is strictly prohibited. If you have received
 this transmission in error, please notify us immediately.
 Thank you for your cooperation.


 - --
 - --
 Dr Tim Gruene
 Institut fuer anorganische Chemie
 Tammannstr. 4
 D-37077 Goettingen

 GPG Key ID = A46BEE1A




Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] small molecule crystallography

2014-03-25 Thread Zbyszek Otwinowski
 --
 Dr Tim Gruene
 Institut fuer anorganische Chemie
 Tammannstr. 4
 D-37077 Goettingen
 GPG Key ID = A46BEE1A
 --
 Prof. George M. Sheldrick FRS
 Dept. Structural Chemistry,
 University of Goettingen,
 Tammannstr. 4,
 D37077 Goettingen, Germany
 Tel. +49-551-39-33021 or -33068
 Fax. +49-551-39-22582


Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


[ccp4bb] Error in ccp4lib?

2014-03-24 Thread Zbyszek Otwinowski
I am reading an external file, which contains phases and ABCDs in the space
group P43212. My file has an asymmetric unit with k >= h.
Since CCP4 uses a different asymmetric unit, with h >= k, this requires a
transformation of the phases and ABCD coefficients. The transformation seems
to be correct for reflections with an initial h not equal to zero, but gives
a wrong result for 0 k l reflections.


Zbyszek Otwinowski


Zbyszek Otwinowski
UT Southwestern Medical Center  
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu


Re: [ccp4bb] Error in ccp4lib?

2014-03-24 Thread Zbyszek Otwinowski
Centrosymmetric reflections typically have C=0 and D=0, although non-zero
values should not matter, as they do not modify the phase probabilities for
centrosymmetric reflections.
Somehow, entering non-zero values of C and D for a centrosymmetric reflection
creates strange results during the phase transformation. Definitely a bug in
ccp4lib, although it is only triggered by non-standard input. In practice, it
probably does not matter much.





On 03/24/2014 06:11 PM, Eleanor Dodson wrote:

You don't say how you are doing the transformation?
I would simply input the file to cad
cad hklin1 thisfile.mtz hklout  newfile.mtz
labi file 1 allin
end

I think (and hope) that the data and phases will be converted correctly to the 
CCP4 asymmetric unit.
Eleanor


On 25 Mar 2014, at 09:16, Zbyszek Otwinowski wrote:


I am reading an external file, which contains phases and ABCDs in the space group
P43212. My file has an asymmetric unit with k >= h.
Since CCP4 uses a different asymmetric unit, with h >= k, this requires a
transformation of the phases and ABCD coefficients. The transformation seems to
be correct for reflections with an initial h not equal to zero, but gives a
wrong result for 0 k l reflections.

Zbyszek Otwinowski


Zbyszek Otwinowski
UT Southwestern Medical Center  
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu





--
Zbyszek Otwinowski
UT Southwestern Medical Center  
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu


Re: [ccp4bb] twinning problem ?

2014-03-13 Thread Zbyszek Otwinowski

On 03/13/2014 10:55 AM, Keller, Jacob wrote:

Unless you are interested in finding curious objects, what would you do with a
protein quasicrystal? The practice of macromolecular crystallography is about
determining the 3-dimensional structure of the objects being crystallized. Protein
quasicrystals are really unlikely to diffract to high enough resolution, and
even ignoring all other practical aspects, like writing programs to solve such
a structure, the chances of building an atomic model are really slim.


Right, if crystallography is seen as purely a tool for biology I agree. As for 
curious objects, I think almost all profound breakthroughs come from 
unadulterated curiosity and not desire for some practical end. Not sure why a 
priori this should be so, but just consider your favorite scientific 
breakthrough and whether the scientist set out to make the discovery or not. 
Some are, but most are not, I think. Maybe aperiodic protein crystals have some 
important function in biology somewhere, or have unforeseen materials science 
properties, analogous to silk or something.


This is easy to test by analyzing the diffraction patterns of individual crystals.

In practice, the dominant contribution to the angular broadening of
diffraction peaks is the angular disorder of microdomains, particularly in
cryo-cooled crystals.
However, exceptions do happen, but these rare situations need to be
handled on a case-by-case basis.
The interpretation of the data presented in this article is that variations in
unit cell between microcrystals induce their spatial misalignment. The data do
not show variation of the unit cell within individual microcrystalline domains.
Tetragonal lysozyme can adopt quite a few variations of the crystal lattice
during cryocooling. Depending on the conditions used, the resulting mosaicity
can vary from 0.1 degree (even for a 1 mm crystal) to over 1 degree.

Consequently, measured structure factors from a group of tetragonal lysozyme
crystals can be quite reproducible, or not. As a test crystal, it should be
handled with care.
A mosaicity of 1 degree is not an impediment to high-quality measurements.
However, high mosaicity tends to correlate with the presence of phase
transitions during cryo-cooling. If such transitions happen during
cryo-cooling, crystals of the same protein, even from the same drop, may vary
quite a lot in terms of structure factors. Additionally, even similar values of
the unit cell parameters are no guarantee of isomorphism between crystals.

So I think you are saying that tetragonal lysozyme is an atypical case, and that normally 
the main contributor to the fitted parameter mosaicity is the phenomenon of 
microdomains shifted slightly in orientation. Maybe we can get the author to repeat the 
study for the other usual-suspect protein crystals to find out the truth, but the score 
currently seems to be 1-0 in favor of cell parameter shifts versus microcrystal 
orientation...



No, I claim that the particular crystal studied by Colin Nave (Acta Cryst. 1998,
D54: 848) is an atypical case. I have myself processed hundreds of tetragonal
lysozyme data sets acquired on crystals grown and mounted by various people, so
I believe that my experience better defines a typical case.

The second reference, nicely provided by Colin, does not conclude that "the
dominant imperfection appeared to be a variation in unit-cell dimensions in the
crystal", but rather states that "The analysis further suggests that LT
disorder is governed by variability inherent in the cooling process combined
with the overall history of the crystal."

As you can see in Figure 5A of Juers et al. (2007), the mosaicity is a
dominant component of the reflection width for resolutions higher than 8 A.

Only at very low resolution can one see the effect of unit cell changes.

What is important is that the crystal analyzed had a very low mosaicity: less
than 0.02 degree before cryo-cooling and less than 0.1 degree after
cryo-cooling. The mosaicity after cryo-cooling is definitely below typical values.

One has to remember that not only are the unit cell parameters different for
different microdomains, but their structure factors will also vary and can
change quite a lot. Cryo-cooled crystals definitely can have a high degree of
internal non-isomorphism resulting from this effect.


Zbyszek

--
Zbyszek Otwinowski
UT Southwestern Medical Center  
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu


Re: [ccp4bb] twinning problem ?

2014-03-12 Thread Zbyszek Otwinowski
How to approach the analysis of such a problem:

For any sample, crystalline or not, a generally valid description of the
diffraction intensity is that it is the Fourier transform of the electron
density autocorrelation function. There are obvious normalizations involved.
For crystals, this autocorrelation function is periodic and is called a
Patterson function when it is derived from diffraction data.

In the case of statistical disorder, an important factor characterizing it
is the autocorrelation of the alternative conformations when they are displaced
by unit cell periodicities. If such autocorrelation is zero, we have pure
statistical disorder; in such a case, we should add the structure factors of
the alternative conformations to create the calculated F. There will also be
diffuse scattering from the disorder, but it will not be aligned with
the Bragg diffraction. More often, the presence of a particular alternative
conformation will affect the probability of the alternative conformation one
unit cell away, and this needs to be considered separately for every unit
cell translation. If this correlation is very strong - close to 1 - we
have a situation similar or identical to merohedral twinning, and one
should add the F^2 of the alternative models. In the intermediate case, when
the autocorrelation in a particular direction is between zero and one, the
Fourier transform will produce streaks in the diffraction pattern, and the
alignment of these streaks will be related to the properties of the
autocorrelation function. Unfortunately, this creates problems when
dealing with reduced data sets.
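
A toy calculation (numbers made up) of the two limiting cases just described:
with zero inter-cell correlation the amplitudes of the alternative conformations
are added before squaring, while with correlation close to one the intensities
are added, as in merohedral twinning.

import numpy as np

F_A = 10.0 * np.exp(1j * 0.3)      # structure factor of conformation A (made up)
F_B = 10.0 * np.exp(1j * 2.6)      # structure factor of conformation B (made up)
p = 0.5                            # fraction of cells (or twin domains) with conformation A

I_disorder = abs(p * F_A + (1 - p) * F_B) ** 2          # pure statistical disorder: add F, then square
I_twin = p * abs(F_A) ** 2 + (1 - p) * abs(F_B) ** 2    # twin-like limit: add F^2
print(I_disorder, I_twin)          # the two limits differ strongly when the two F's are out of phase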

Mosaicity is a very different phenomenon. It describes a range of angular
alignments of microcrystals with the same unit cell within the sample. It
broadens diffraction peaks by the same angle irrespective of the data
resolution, but it cannot change the length of the diffraction vector for each
Bragg reflection. For this reason, the elongation of a spot on the
detector resulting from mosaicity will always be perpendicular to the
diffraction vector. This is distinct from statistical disorder, where the
spot elongation will be aligned with the crystal lattice and not the
detector plane.

Obviously, no phase information can be derived from the spot shapes
resulting from mosaicity. Interestingly, there is a potential for
extracting phase information from spot shapes induced by statistical
disorder. However, it is far from simple and can be used only to improve
phases. It is not promising as an ab initio phasing method.

This discussion assumed only one unit cell periodicity in the sample,
which is the desired state in all cases. In cryo-cooled crystals, the rate
of cooling is different for different parts of the sample, quite often
resulting in different unit cell periodicities across the sample. Now there
are multiple possibilities to consider; quite typically, the crystal
symmetry is the same and the range of unit cell variability is small. This
results in variable spot-shape elongation, with the angular range being
resolution-dependent and the elongation not necessarily perpendicular to the
diffraction vector. By just looking at the diffraction pattern, it is easy to
distinguish this case from mosaicity. In such samples, a problem arises
when the rotation exposes distinctly different phases at different
orientations. The resulting diffraction data will merge with poor
statistics, as distinct structure factors will be merged together. Such a
condition is quite typical when large crystals are exposed with
microbeams.
The presence of different crystal forms also provides phasing opportunities,
known as averaging between crystals. However, this requires collecting
separate data sets rather than mixing such crystals during one rotation sweep.

The presence of multiple, similar unit cells in the sample is a completely
different condition, unrelated to statistical disorder.

Zbyszek Otwinowski


 Not sure I understand why having statistical disorder makes for
 streaks--does the crystal then have a whole range of unit cell constants,
 with the spot at the most prevalent value, and the streaks are the tails
 of the distribution? If so, doesn't having the streak imply a really wide
 range of constants? And how would this be different from mosaicity? My
 guess is that this is not the right picture, and this is indeed roughly
 what mosaicity is.

 Alternatively, perhaps the streaks are interpreted as the result of a
 duality between the unit cell, which yields spots, and a super cell
 which is so large that it yields extremely close spots which are
 indistinguishable from lines/streaks. Usually this potential super cell is
 squelched by destructive interference due to each component unit cell
 being very nearly identical, but here the destructive interference doesn't
 happen because each component unit cell differs quite a bit from its
 fellows.

 And I guess in the latter case the supercell would have its cell
 constant (in the direction of the streaks) equal to (or a function of) the
 coherence length

Re: [ccp4bb] twinning problem ?

2014-03-12 Thread Zbyszek Otwinowski

On 03/12/2014 04:15 PM, Keller, Jacob wrote:

For any sample, crystalline or not, a generally valid description of 
diffraction intensity is it being a Fourier transform of electron density 
autocorrelation function.


I thought for non-crystalline samples diffraction intensity is simply the 
Fourier transform of the electron density, not its autocorrelation function. Is 
that wrong?



The Fourier transform of the electron density is a complex scattering amplitude
that, by the axioms of quantum mechanics, is not a measurable quantity. What is
measurable is its modulus squared. In crystallography, it is called either
F^2 (formally F*Fbar) or, somewhat informally, the diffraction intensity, after
one takes the scaling factors into account. F*Fbar is the Fourier transform of
the electron density autocorrelation function, regardless of whether the
electron density is periodic or not. For a periodic electron density the
structure factors are described by a sum of Dirac delta functions placed on the
reciprocal lattice, with each delta function multiplied by the value of the
structure factor for the corresponding Miller indices.
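
A quick numerical check of that statement with a one-dimensional toy density
(this is the Wiener-Khinchin relation, of which the Patterson function is the
crystallographic form): |FT(rho)|^2 equals the Fourier transform of the
autocorrelation of rho.

import numpy as np

rng = np.random.default_rng(2)
rho = rng.random(64)                          # toy periodic 1-D electron density

F = np.fft.fft(rho)
lhs = np.abs(F) ** 2                          # F * Fbar

# circular autocorrelation of rho, then its Fourier transform
auto = np.array([np.sum(rho * np.roll(rho, -s)) for s in range(rho.size)])
rhs = np.fft.fft(auto)

print(np.allclose(lhs, rhs.real))             # True: the two sides agree to rounding error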





Anyway, regarding spot streaking, perhaps there is a different, simpler 
formulation for how they arise, based on the two phenomena:

(1) Crystal lattice convoluted with periodic contents, e.g., protein structure 
in exactly the same orientation
(2) Crystal lattice convoluted with aperiodic contents, e.g. n different 
conformations of a protein loop, randomly sprinkled in the lattice.

Option (1) makes normal spots. If there is a lot of scattering material doing (2), then 
streaks arise due to many super-cells occurring, each with an integral number 
of unit cells, and following a Poisson distribution with regard to frequency according to 
the number of distinct conformations. Anyway, I thought of this because it might be 
related to scattering from aperiodic crystals, in which there is no concept of unit cell 
as far as I know (just frequent distances), which makes them really interesting for 
thinking about diffraction.



This formulation cannot describe aperiodic contents. The convolution of a
crystal lattice with any function will result in an electron density that has
perfect crystal symmetry, with the same periodicity as the starting crystal lattice.



See the images here of an aperiodic lattice and its Fourier transform, if 
interested:

http://postimg.org/gallery/1fowdm00/


This is an interesting case of a pseudocrystal; however, because there is no
crystal lattice, it is not relevant to (1) or (2). In any case, pentagonal
quasilattices are probably not relevant to macromolecular crystallography.





Mosaicity is a very different phenomenon. It describes a range of angular 
alignments of microcrystals with the same unit cell within the sample. It 
broadens diffraction peaks by the same angle irrespective of the data 
resolution, but it cannot change the length of diffraction vector for each 
Bragg reflection. For this reason, the elongation of the spot on the detector 
resulting from mosaicity will be always perpendicular to the diffraction 
vector. This is distinct from the statistical disorder, where spot elongation 
will be aligned with the crystal lattice and not the detector plane.


I have been convinced by some elegant, carefully-thought-out papers that this microcrystal 
conception of the data-processing constant mosaicity is basically wrong, and that the primary 
factor responsible for observed mosaicity is discrepancies in unit cell constants, and not the 
microcrystal picture. I think maybe you are referring here to theoretical mosaicity and not the 
fitting parameter, so I am not contradicting you. I have seen recently an EM study of protein microcrystals 
which seems to show actual tilted mosaic domains just as you describe, and can find the reference if desired.


This is easy to test by analyzing diffraction patterns of individual crystals. 
In practice, the dominant contribution to angular broadening of diffraction 
peaks is angular disorder of microdomains, particularly in cryo-cooled crystals. 
However, exceptions do happen, but these rare situations need to be handled on 
case by case basis.


Zbyszek


Presence of multiple, similar unit cells in the sample is completely different 
and unrelated condition to statistical disorder.


Agreed!

Jacob




--
Zbyszek Otwinowski
UT Southwestern Medical Center  
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu


Re: [ccp4bb] twinning problem ?

2014-03-12 Thread Zbyszek Otwinowski
 for improving crystal perfection, 
defining data-collection requirements and for data-processing procedures. 
Measurements on crystals of tetragonal lysozyme at room temperature and 100 K 
were made in order to illustrate how parameters describing the crystal 
imperfections can be obtained. At 100 K, the dominant imperfection appeared to 
be a variation in unit-cell dimensions in the crystal.
PMID: 9757100 [PubMed - indexed for MEDLINE]




--
Zbyszek Otwinowski
UT Southwestern Medical Center  
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu


Re: [ccp4bb] twinning problem ?

2014-03-11 Thread Zbyszek Otwinowski
The shape of the diffraction spots changes along the statistical disorder --
twinning continuum. At both ends, the spot shape is like that in diffraction
from crystals without such disorder. However, in the intermediate case, the
electron density autocorrelation function has an additional component beyond
the one resulting from the ordered crystal. This additional component of the
autocorrelation creates characteristic non-Bragg diffraction, e.g.
streaks aligned with a particular unit cell axis.

In the absence of such a diffraction pattern, the ambiguity is binary. The
description of the problem indicates statistical disorder.

Zbyszek Otwinowski

 Hi,

 If there's an NCS translation, recent versions of Phaser can account for
 it and give moment tests that can detect twinning even in the presence of
 tNCS.  But I agree with Eleanor that the L test is generally a good choice
 in these cases.

 However, the fact that you see density suggests that your crystal might be
 more on the statistical disorder side of the statistical disorder --
 twinning continuum, i.e. the crystal doesn't have mosaic blocks
 corresponding to one twin fraction that are large compared to the
 coherence length of the X-rays.  So you might want to try refinement with
 the whole structure duplicated as alternate conformers.

 Best wishes,

 Randy Read

 -
 Randy J. Read
 Department of Haematology, University of Cambridge
 Cambridge Institute for Medical ResearchTel: +44 1223 336500
 Wellcome Trust/MRC Building Fax: +44 1223 336827
 Hills Road
 E-mail: rj...@cam.ac.uk
 Cambridge CB2 0XY, U.K.
 www-structmed.cimr.cam.ac.uk

 On 11 Mar 2014, at 14:10, Eleanor Dodson eleanor.dod...@york.ac.uk
 wrote:

 Sorry - hadnt finished..
 The twinning tests are distorted by NC translation - usually the L test
 is safe, but the others are all suspect..



 On 11 March 2014 14:09, Eleanor Dodson eleanor.dod...@york.ac.uk
 wrote:
 What is the NC translation? If there is a factor of 0.5 that makes SG
 determination complicated..
 Eleanor


 On 11 March 2014 14:04, Stephen Cusack cus...@embl.fr wrote:
 Dear All,
  I have 2.6 A data and an unambiguous molecular replacement solution for
  two copies/asymmetric unit of an 80 kDa protein for a crystal integrated
  in P212121 (R-merge around 9%) with a=101.8, b=132.2, c=138.9.
  Refinement allowed rebuilding/completion of the model in the normal way
 but the R-free does not go below 30%. The map in the model regions looks
 generally fine but  there is a lot
 of extra positive density in the solvent regions (some of it looking
 like weak density for helices and strands)  and unexpected positive
 peaks within the model region.
 Careful inspection allowed manual positioning of a completely different,
 overlapping solution for the dimer which fits the extra density
 perfectly.
 The two incompatible solutions are related by a 2-fold axis parallel to
 a.
 This clearly suggests some kind of twinning. However, twinning analysis
 programmes (e.g. Phenix-Xtriage), while suggesting the possibility
 of pseudo-merohedral twinning (-h, l, k), do not reveal
 any significant twinning fraction and proclaim the data likely to be
 untwinned. (NB: the programmes do, however, highlight a
 non-crystallographic translation, and there are systematic intensity
 differences in the data.) Refinement including this twinning law made
 no difference,
 since the estimated twinning fraction was 0.02. Yet the extra density is
 clearly there, and I know exactly the real-space transformation between
 the two packing solutions.
 How can I best take this alternative solution (occupancy
 seems to be around 20-30%) into account in the refinement?
 thanks for your suggestions
 Stephen

 --

 **
 Dr. Stephen Cusack,
 Head of Grenoble Outstation of EMBL
 Group leader in structural biology of protein-RNA complexes and viral
 proteins
 Joint appointment in EMBL Genome Biology Programme
 Director of CNRS-UJF-EMBL International Unit (UMI 3265) for Virus Host
 Cell Interactions (UVHCI)
 **

 Email:  cus...@embl.fr
 Website: http://www.embl.fr
 Tel:(33) 4 76 20 7238Secretary (33) 4 76 20 7123
 Fax:(33) 4 76 20 7199
 Postal address:   EMBL Grenoble Outstation, 6 Rue Jules Horowitz, BP181,
 38042 Grenoble Cedex 9, France
 Delivery address: EMBL Grenoble Outstation, Polygone Scientifique,
   6 Rue Jules Horowitz, 38042 Grenoble, France
 **





Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] Data processing - twinned xtals

2013-08-16 Thread Zbyszek Otwinowski
This is clearly a case of a crystal with a very long unit cell; a case
which should be approached mindfully.

HKL2000 by default searches for indexing solutions such that diffraction
along the longest unit-cell axis will be resolved with the assumed spot size.

The problem with such diffraction has 2 aspects:
1) how to process the already collected data where the spots are close to
each other;
2) how to collect future data.

Ad 1) The best solution is to reduce the spot size so that the spots are
resolved. This may require an adjustment of the spot size by a single pixel;
one should not only change the spot radius, but also switch the box size
between an even and an odd number of pixels in the box dimensions.

Just changing the spot radius changes the spot diameter by an even number
of pixels, so if one wants to change the spot diameter by one pixel, one
has to change the box size. This is a consequence of the spot being in
the center of the box.

For indexing only, there is also a workaround: specify, before indexing, the
command 'longest vector' followed by a number that defines the
upper limit of the cell size. This may help the indexing to succeed, but it will
create overlaps between spots during refinement and integration.

This dataset also illustrates the problem of collecting data by rotating about an
axis perpendicular to the long unit-cell axis. As a consequence, Image 1 has
spots that essentially overlap (their centroid positions barely differ), so
it would be hard for any program to process them meaningfully.

Ad. 2) What would be a better way to collect data in the future?


 Hi CCP4 folks

 I have a data set which looks twinned (see image 1 - I zoomed in on
 the image so that one can spot the twinning). Furthermore, the spots are
 very smeary from ~ 30 - 120 degrees of data collection (see image 2). I
 tried using HKL2000 and mosflm to process this data but I cannot process
 it. I was wondering if anyone has any ideas as to how to process this data,
 or comments on whether this data is even useful. Also, I would really
 appreciate it if someone could share their experiences on solving twinning
 issues during crystal growth.

 Thanks in advance !

 Mahesh[image: Inline image 2][image: Inline image 3]



Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] Data processing - twinned xtals

2013-08-16 Thread Zbyszek Otwinowski
This is a continuation of the previous message, which was sent prematurely.

In the case of crystals with one very long unit-cell axis, the data collection
strategy needs to be chosen carefully.

Ad. 2) What would be a better way to collect data in the future?

First, the detector needs to be placed far enough back that the spots
are resolved, at minimum when the longest unit-cell axis is in the plane of the
detector (perpendicular to the beam).

To satisfy this condition, it is best to rotate about an axis that is
parallel (or close to parallel, within 30 degrees) to the longest unit-cell
axis.

However, this can be difficult to achieve in some cases. There are two
types of workaround in such a situation:

a) if the crystal has low mosaicity, the spots may be resolved in the angular
direction if a short oscillation is used to collect images; HKL has no
problems with a 0.1 degree oscillation range;
b) in the case of mosaic crystals, when a) doesn't work, a partial
solution is to increase the detector distance. There will still be a region
of the reciprocal lattice where the data will be lost due to overlap, but this
region may be small enough for the data to be used in structure solution.

There is no indication that the particular crystal presented is twinned or
highly mosaic, so chances are good that this project will be solved.
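
As a rough back-of-the-envelope sketch of the "detector far enough back" condition above (my own illustration, not from the original post; the cell length, wavelength and spot size below are made-up numbers): at small scattering angles, neighboring spots along an axis of length a are separated on the detector by about D*lambda/a, so the minimum distance follows directly.

    # Minimal sketch: smallest detector distance that separates neighboring spots
    # along a long unit-cell axis of length a (small-angle approximation).
    def min_detector_distance_mm(cell_a_A, wavelength_A, spot_size_mm, margin=1.5):
        # adjacent reflections are 1/a apart in reciprocal space; at distance D this
        # becomes a separation on the detector of about D * wavelength / a
        return margin * spot_size_mm * cell_a_A / wavelength_A

    # e.g. a 350 A axis, 1.0 A X-rays, 0.4 mm spots -> roughly 210 mm minimum
    print(min_detector_distance_mm(350.0, 1.0, 0.4))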

Zbyszek Otwinowski

 Hi CCP4 folks
 I have a data set which looks twinned (see image 1 - I zoomed in on
 the image so that one can spot the twinning). Furthermore, the spots
 are very smeary from ~ 30 - 120 degrees of data collection (see image 2). I
 tried using HKL2000 and mosflm to process this data but I cannot process
 it. I was wondering if anyone has any ideas as to how to process this data,
 or comments on whether this data is even useful. Also, I would really
 appreciate it if someone could share their experiences on solving twinning
 issues during crystal growth.
 Thanks in advance !
 Mahesh[image: Inline image 2][image: Inline image 3]


Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] rmerge

2013-08-16 Thread Zbyszek Otwinowski
Dear All,

The purpose of the statistics in the output of Scalepack is to help the
experimenter assess the data. The question is, what is the purpose of the
R-merge statistic, and what is its usefulness when its value exceeds 100%?

When Scalepack was originally written 20 years ago, I made a decision to
output the value 0.000 for R-merge values above 100%.
A resolution shell with such an R-merge may, depending on circumstances,
contain perfectly fine data for structure refinement, or data that are
completely useless. In general, as in the case that started this
discussion, high multiplicity will result in data close to the resolution
limit having such a high R-merge value. The best way to assess the
resolution limit of the collected diffraction is to look at the
refinement's R and R-free factors. However, one has to make a preliminary
judgement at an earlier stage about which data to forward to subsequent
calculations. The 0.000 R-merge value is simply a pointer to the
experimenter that one should pay attention to criteria other than the
R-merge statistic. I did not want to print N/A or some other non-numerical
string, to simplify the parsing of the Scalepack output.

I always considered R-merge a useful statistic only for shells with
strong reflections, effectively meaning low-resolution data. For these
data, high values of R-merge (e.g. 10%) indicate the presence of systematic
errors or effects. Otherwise, R-merge is a rather poor proxy for the
relevance of the data. Other indicators that are much more useful for
defining the resolution limit are:
- I/sig(I), provided the goodness-of-fit (chi^2) is close to 1 in the resolution
shell; if not, one should adjust only the error scale factor, not the
estimate of systematic error (Scalepack keyword: error systematic);
- CC1/2  (or CC*) is the next best criterion;
- other criteria can also be used, e.g. Rpim.

The current version of HKL suite prints out all these statistics.
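
For readers who want to see how these indicators behave on their own unmerged data, here is a minimal sketch of the standard definitions (my own illustration, not HKL code; it assumes the unmerged intensities have already been grouped by reduced hkl):

    import numpy as np

    def merging_stats(groups, rng=np.random.default_rng(0)):
        """groups: list of 1-D arrays, one array of unmerged intensities per unique hkl."""
        num_merge = num_pim = denom = 0.0
        half1, half2 = [], []
        for I in groups:
            I = np.asarray(I, float)
            n = len(I)
            if n < 2:
                continue
            dev = np.abs(I - I.mean()).sum()
            num_merge += dev                         # R-merge numerator
            num_pim += np.sqrt(1.0 / (n - 1)) * dev  # R-pim numerator (multiplicity-weighted)
            denom += I.sum()
            sel = rng.permutation(n)                 # random split into two half-datasets
            half1.append(I[sel[: n // 2]].mean())
            half2.append(I[sel[n // 2 :]].mean())
        cc_half = np.corrcoef(half1, half2)[0, 1]    # CC1/2 from the half-dataset means
        return num_merge / denom, num_pim / denom, cc_half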

Quite frequently, when a program, particularly a widely used one, seems to
fail, it is an indication that there are issues with the data. This has
been the case in another recent thread related to problems with
indexing/processing of data. Something needs to be changed in such cases;
it could be the input to the program or, in the case of the R-merge statistic,
one should pay attention to something else rather than consider it a
program failure.

Best regards,


Re: [ccp4bb] Strange density in solvent channel and high Rfree

2013-03-19 Thread Zbyszek Otwinowski
It is a clear-cut case of crystal packing disorder. The tell-tale sign is
that the data can be merged in the higher-symmetry lattice, while the number
of molecules in the asymmetric unit (3 in P21) is not divisible by the
higher-symmetry factor (2, in going from P21 to P21212).
In my experience, this is more likely a case of order-disorder than
merohedral twinning. The difference between the two is that structure
factors are added for the alternative conformations in the case of
order-disorder, while intensities (structure factors squared) are added in
the case of merohedral twinning.

Now an important comment on how to proceed in the cases where data can be
merged in a higher symmetry, but the structure needs to be solved in a
lower symmetry due to a disorder.

Such data need to be merged in the higher symmetry, assigned R-free flags,
and THEN expanded to the lower symmetry. Reprocessing the data in the lower
symmetry is an absolutely wrong procedure: it will artificially reduce
R-free, as the new R-free flags will not follow the data symmetry!

Moreover, while this one is likely to be a case of order-disorder, and
such cases are infrequent, reprocessing the data in a lower symmetry seems to
be frequently abused, essentially in order to reduce R-free. Generally,
when data CAN be merged in a higher symmetry, the only proper procedure for
going to a lower-symmetry structure is to expand these higher-symmetry
data to the lower symmetry, and not to rescale and merge the data in the
lower symmetry.
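
A minimal sketch of the flag-expansion step described above (my own illustration, not a CCP4 or HKL procedure; the symmetry operator below is a hypothetical placeholder -- substitute the real operators relating your higher and lower symmetry):

    import numpy as np

    def canonical_hkl(hkl, higher_ops):
        """Map hkl to one representative of its orbit under the HIGHER symmetry
        (Friedel mates included), so all symmetry mates share the same key."""
        h = np.asarray(hkl, int)
        mates = []
        for R in higher_ops:
            m = tuple(int(x) for x in (R @ h))
            mates.extend([m, tuple(-x for x in m)])
        return max(mates)

    # hypothetical example: identity plus the extra two-fold gained in the higher symmetry
    higher_ops = [np.eye(3, dtype=int), np.diag([1, -1, -1])]

    def expand_rfree_flags(flags_high, hkl_low, higher_ops):
        """flags_high: dict canonical_hkl -> test-set flag assigned on the MERGED,
        higher-symmetry data. Every lower-symmetry reflection inherits the flag of
        its higher-symmetry parent, so symmetry mates never end up split between
        the work and test sets (which is what artificially lowers R-free)."""
        return [flags_high[canonical_hkl(hkl, higher_ops)] for hkl in hkl_low]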

Zbyszek Otwinowski

 Dear all,
 We have solved the problem. Data processing in P1 looks better (six
 molecules in the ASU), and Zanuda shows P 1 21 1 symmetry (three molecules
 in the ASU); Rfactor/Rfree drops to 0.20978/0.25719 in the first round
 of refinement (without adding waters, ligands, etc.).

 Indeed, there was one more molecule in the ASU, but the over-merged data in
 an orthorhombic lattice hid the correct solution.

 Thank you very much for all your suggestions; they were very important for
 solving this problem.

 Cheers,

 Andrey

 2013/3/15 Andrey Nascimento andreynascime...@gmail.com

 Dear all,

 I have collected a good-quality dataset of a protein with 64% solvent
 in space group P 2 21 21 at 1.7 A resolution, with good statistical
 parameters (values for the last shell: Rmerge=0.202; I/Isig.=4.4;
 Complet.=93%; Redun.=2.4; the overall values are better than the last
 shell). The structure solution by molecular replacement goes well, and
 the map quality at the protein chain is very good, but at the end of
 refinement, after addition of a lot of waters and other solvent
 molecules, TLS refinement, etc., the Rfree is still quite high,
 considering this resolution (1.77 A): Rfree= 0.29966 and Rfactor= 0.25534.
 Moreover, I reprocessed the data in a lower symmetry space group (P21),
 but I got the same problem, and I tried all possible space groups for
 P222, but with the other screw axes I cannot even solve the structure.

 A strange thing in the structure is the large solvent channels with a
 lot of positive electron density peaks!? I usually do not see this many
 peaks in the solvent channel. These peaks are the only reason for these
 high R's in refinement that I can find. But why are there so many peaks
 in the solvent channel???

 I put a .pdf file (ccp4bb_maps.pdf) with some more information and map
 figures at this link: https://dl.dropbox.com/u/16221126/ccp4bb_maps.pdf

 Does someone have an explanation or solution for this?

 Cheers,

 Andrey




Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] refining against weak data and Table I stats

2012-12-07 Thread Zbyszek Otwinowski
The difference between one and the correlation coefficient is a quadratic
function of the differences between the data points. So even a rather large 6%
relative error, with 8-fold data multiplicity (redundancy), can lead to
CC1/2 values of about 99.9%.
It is just the nature of correlation coefficients.
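
A quick simulation makes the point concrete (my own sketch, not from the original post; it assumes a Wilson-like spread of true intensities and purely relative Gaussian errors):

    import numpy as np

    rng = np.random.default_rng(0)
    n_refl, mult, rel_err = 20000, 8, 0.06

    I_true = rng.exponential(1.0, n_refl)          # Wilson-like intensity distribution
    meas = I_true[:, None] * (1 + rel_err * rng.standard_normal((n_refl, mult)))

    half1 = meas[:, : mult // 2].mean(axis=1)      # mean of one random half-dataset
    half2 = meas[:, mult // 2 :].mean(axis=1)      # mean of the other half
    print(np.corrcoef(half1, half2)[0, 1])         # ~0.998, despite the 6% per-measurement error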

Zbyszek Otwinowski



 Related to this, I've always wondered what CC1/2 values mean for low
 resolution. Not being mathematically inclined, I'm sure this is a naive
 question, but i'll ask anyway - what does CC1/2=100 (or 99.9) mean?
 Does it mean the data is as good as it gets?

 Alan



 On 07/12/2012 17:15, Douglas Theobald wrote:
 Hi Boaz,

 I read the KK paper as primarily a justification for including
 extremely weak data in refinement (and of course introducing a new
 single statistic that can judge data *and* model quality comparably).
 Using CC1/2 to gauge resolution seems like a good option, but I never
 got from the paper exactly how to do that.  The resolution bin where
 CC1/2=0.5 seems natural, but in my (limited) experience that gives
 almost the same answer as I/sigI=2 (see also KK fig 3).



 On Dec 7, 2012, at 6:21 AM, Boaz Shaanan bshaa...@exchange.bgu.ac.il
 wrote:

 Hi,

 I'm sure Kay will have something to say about this, but I think the
 idea of the K & K paper was to introduce new (more objective) standards
 for deciding on the resolution, so I don't see why another table is
 needed.

 Cheers,




Boaz


 Boaz Shaanan, Ph.D.
 Dept. of Life Sciences
 Ben-Gurion University of the Negev
 Beer-Sheva 84105
 Israel

 E-mail: bshaa...@bgu.ac.il
 Phone: 972-8-647-2220  Skype: boaz.shaanan
 Fax:   972-8-647-2992 or 972-8-646-1710





 
 From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Douglas
 Theobald [dtheob...@brandeis.edu]
 Sent: Friday, December 07, 2012 1:05 AM
 To: CCP4BB@JISCMAIL.AC.UK
 Subject: [ccp4bb] refining against weak data and Table I stats

 Hello all,

 I've followed with interest the discussions here about how we should be
 refining against weak data, e.g. data with I/sigI < 2 (perhaps using
 all bins that have a significant CC1/2 per Karplus and Diederichs
 2012).  This all makes statistical sense to me, but now I am wondering
 how I should report data and model stats in Table I.

 Here's what I've come up with: report two Table I's.  For comparability
 to legacy structure stats, report a classic Table I, where I call the
 resolution whatever bin I/sigI=2.  Use that as my high res bin, with
 high res bin stats reported in parentheses after global stats.   Then
 have another Table (maybe Table I* in supplementary material?) where I
 report stats for the whole dataset, including the weak data I used in
 refinement.  In both tables report CC1/2 and Rmeas.

 This way, I don't redefine the (mostly) conventional usage of
 resolution, my Table I can be compared to precedent, I report stats
 for all the data and for the model against all data, and I take
 advantage of the information in the weak data during refinement.

 Thoughts?

 Douglas


 ^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`
 Douglas L. Theobald
 Assistant Professor
 Department of Biochemistry
 Brandeis University
 Waltham, MA  02454-9110

 dtheob...@brandeis.edu
 http://theobald.brandeis.edu/

 ^\
   /`  /^.  / /\
 / / /`/  / . /`
 / /  '   '
 '




 --
 Alan Cheung
 Gene Center
 Ludwig-Maximilians-University
 Feodor-Lynen-Str. 25
 81377 Munich
 Germany
 Phone:  +49-89-2180-76845
 Fax:  +49-89-2180-76999
 E-mail: che...@lmb.uni-muenchen.de



Re: [ccp4bb] P4132 vs. F23

2012-05-31 Thread Zbyszek Otwinowski
Space groups F23 and P4132 are not subgroups of each other (without
invoking pseudotranslational symmetry), so they cannot be related by
twinning. That is the end of the theoretical analysis.

Zbyszek Otwinowski





 You need to say what the cell dimensions are for these 2 options..
  Eleanor

 On 29 May 2012 15:59, Andrey Lebedev andrey.lebe...@stfc.ac.uk wrote:

 Hi Mike.

 I would be more careful about incorrect space group. Yes, sometimes
 auto-indexing gives strange results.
 However, in your case two sets of crystals differ by two factors,
 diffraction quality and space group.
 Therefore it seems more likely that you have two crystal forms.
 Could you please send me log-files from pointless or ctruncate? Then I
 would be able to say something more definite.

 Regards

 Andrey

 On 28 May 2012, at 08:46, Mike John wrote:

 Hi, All,

 We got many datasets from crystals of our protein. When a crystal has
 high quality, it will be indexed as F23 (the correct space group). When it
 diffracts more poorly, it will be indexed in the incorrect space group P4132.
 The crystals are twinned. My question is:
 Given twinning, why is the correct space group F23 indexed as the
 incorrect space group P4132 for most of our crystals (most crystals have
 poorer quality)? A theoretical analysis of the twin operator and symmetry
 operators connecting F23 and P4132 would be highly appreciated. Thanks

 Mike




Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] How to evaluate Fourier transform ripples

2011-07-06 Thread Zbyszek Otwinowski
The question about Fourier transform ripples has a straightforward
answer in a fairly typical situation:
A) data are collected to the resolution limit of diffraction,
B) phases are uniform in quality across the resolution range, which is
equivalent to R-free being uniform with respect to resolution within a
factor of 2 or so,
C) maps are not sharpened.

The ripples originate from not including unobserved structure factors. The
intensity of diffraction decreases rapidly past the measurability limit,
so, in the above situation, the unobserved diffraction contributes very
little. Consequently, the answer is that typically one should not see
ripples.
Ripples should not be confused with the effect of electron density maps
being smoothed by vibrations and other forms of disorder.
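
A toy one-dimensional illustration of this point (my own sketch, not from the original post): termination ripples are large when the Fourier coefficients are still sizable at the cutoff, and negligible when they have already decayed, as diffraction intensities do past the measurability limit.

    import numpy as np

    x = np.linspace(0, 1, 2000, endpoint=False)
    h = np.arange(1, 201)                      # series truncated at index 200

    def peak(damping):
        # cosine series for a peak at x = 0.5; 'damping' mimics a B-factor fall-off
        coeffs = np.exp(-damping * (h / h[-1]) ** 2)
        return (coeffs[:, None] * np.cos(2 * np.pi * h[:, None] * (x - 0.5))).sum(axis=0)

    for label, rho in (("sharp cutoff", peak(0.0)), ("decayed coefficients", peak(20.0))):
        ripple = np.abs(rho[x < 0.25]).max() / rho.max()   # ripple far from the peak, relative to peak
        print(f"{label}: relative ripple {ripple:.1e}")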

Zbyszek Otwinowski


 Dear All, Hi. I was asked in a manuscript revision to discuss
 about the possible effects of Fourier transformation ripples on the
 crystallographic results. Specifically, the reviewers question whether
 ripples may affect on the electron density around heavy metal center which
 has a Mo-S-As connection. From which angle or in which way this problem
 should be addressed most convincingly ? Thank you for any
 suggestion.Best,Conan


Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] Y-Chi2 running out of chart

2011-06-22 Thread Zbyszek Otwinowski
The two most likely possibilities are:

1. The beam position changed somewhat after the repair and the site file was
not updated with the new position. This could result in misindexing of the
diffraction pattern, with poor positional agreement (Chi2) as a
consequence. The diagnosis of misindexing is very simple, as it will not
produce acceptable merging statistics even in space group P1. The
correction is also simple: update the site file with the correct beam
position.

2. A non-ideal crystal with a complex spot shape in its diffraction
pattern. This could result, for example, from uneven cooling rates and
variability in the crystal lattice. Merging statistics should be
acceptable; however, they may not be perfect. Better cryo-cooling is likely
to help.

Zbyszek Otwinowski

 Dear Colleagues,

 I'm collecting a dataset on our recently repaired Rigaku home source.
 The crystal diffracts to 2.2 A. Indexing seems to be all fine. However,
 during integration, I notice that Y-Chi2 is increasing constantly (from 2 to
 4.5, almost linearly) within a 60 degree collection, whereas X-Chi2 stays
 the same. An image is attached. There are still another 60 degrees to go.
 Although the prediction fits the images well so far, I'm afraid the Y-Chi2
 will eventually run out of the chart.
 My question is: could it be related to any hardware malfunction, i.e.,
 goniometer, image plates, etc., which may be a side effect of the recent
 major repair? Or what else could it be?

 Thanks,
 Bing



Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] Change cell parameter

2011-06-08 Thread Zbyszek Otwinowski
You probably need to reindex your data:
h -> h
k -> -k
l -> -l
by using the command
hkl matrix
 1  0  0
 0 -1  0
 0  0 -1
in Scalepack.
In HKL2000 you should use the reindex menu or a dataset macro (not the overall
scaling macro). The dataset macro exists only in the newest version of
HKL2000.
The reindexing will change the beta angle automatically.
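
A small sketch of what this operator does (my own illustration, not part of the original reply): applied to the indices it flips k and l, and in real space it flips b and c, which turns beta into 180 - beta, i.e. 89.6 into 90.4 degrees here.

    import numpy as np

    M = np.diag([1, -1, -1])          # reindexing operator (h, k, l) -> (h, -k, -l)

    hkl = np.array([[3, 5, -2], [1, 0, 4]])
    print(hkl @ M)                    # [[ 3 -5  2]  [ 1  0 -4]]

    beta_old = 89.6                   # flipping b and c leaves alpha and gamma at 90
    print(180.0 - beta_old)           # 90.4, matching the native cell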

 Dear all,

 I have a P2 derivative dataset with beta=89.6. I try to change the beta
to 90.4 to be consistent with the native dataset. Should I do sth with
the HKL, like applying a matrix? Thanks a million!

 Best,
 Zhiyi



Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-04-01 Thread Zbyszek Otwinowski
The meaning of the B-factor is the (scaled) sum of all positional
uncertainties, and not just one of its contributors, the Atomic Displacement
Parameter, which describes the relative displacement of an atom in the
crystal lattice by a Gaussian function.
That meaning (the sum of all contributions) comes from the procedure that
calculates the B-factor in all PDB X-ray deposits, and not from an
arbitrary decision by a committee. All programs that refine B-factors
calculate an estimate of positional uncertainty, where the contributors can be
both Gaussian and non-Gaussian. For a non-Gaussian contributor, e.g.
multiple occupancy, the exact numerical contribution is a rather complex
function, but conceptually it is still an uncertainty estimate. Given the
resolution of typical data, we do not have a procedure to decouple the
Gaussian and non-Gaussian contributors, so we have to live with the
B-factor being defined by the refinement procedure. However, we should
still improve the estimates of the B-factor, e.g. by changing the
restraints. In my experience, Refmac's default restraints on the B-factors
of side chains are too tight, and I adjust them. Still, my preference would
be to have harmonic restraints on u (proportional to the square root of B,
i.e. the RMS displacement) rather than on the Bs themselves.
It is not we who cram too many meanings into the B-factor; it is a quite
fundamental limitation of crystallographic refinement.
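
As a quick numerical aside (my own illustration, not from the original post), the usual convention B = 8*pi^2*<u^2> makes it easy to see what a given B-factor implies for the RMS displacement it summarizes:

    import math

    def rms_displacement(b_factor):
        """RMS displacement (A) implied by an isotropic B-factor via B = 8*pi^2*<u^2>."""
        return math.sqrt(b_factor / (8 * math.pi ** 2))

    for b in (20.0, 60.0, 120.0):
        print(f"B = {b:5.1f} A^2  ->  sqrt(<u^2>) = {rms_displacement(b):.2f} A")
    # B = 20 A^2 corresponds to roughly 0.5 A; B = 120 A^2 to roughly 1.2 A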

Zbyszek Otwinowski

 The fundamental problem remains:  we're cramming too many meanings into
one number [B factor].  This the PDB could indeed solve, by giving us
another column.  (He said airily, blithely launching a totally new flame
war.)
 phx.



[ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-03-31 Thread Zbyszek Otwinowski
The B-factor in crystallography represents the convolution (sum) of two
types of uncertainties about the atom (electron cloud) position:

1) dispersion of atom positions in the crystal lattice
2) uncertainty of the experimenter's knowledge about the atom position.

In general, uncertainty need not be described by a Gaussian function.
However, communicating uncertainty using the second moment of its
distribution is a widely accepted practice, with the frequently implied
meaning that it corresponds to a Gaussian probability function. The B-factor
is simply a scaled (by 8 times pi squared) second moment of the uncertainty
distribution.

In the previous, long thread, confusion was generated by the additional
assumption that the B-factor also corresponds to a Gaussian probability
distribution and not just to a second moment of any probability
distribution. The crystallographic literature often implies the Gaussian
shape, so there is some justification for such an interpretation, where
the more complex probability distribution is represented by a sum of
displaced Gaussians, in which the area under each Gaussian component
corresponds to the occupancy of an alternative conformation.
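
In symbols (my notation, written out in one dimension for simplicity; this is not part of the original post): for a mixture of displaced Gaussians with occupancies occ_j, the second moment that the single B-factor summarizes is

    \[
    \bar{u} = \sum_j \mathrm{occ}_j\, u_j, \qquad
    \langle u^2 \rangle = \sum_j \mathrm{occ}_j \bigl[\sigma_j^2 + (u_j - \bar{u})^2\bigr], \qquad
    B = 8\pi^2 \langle u^2 \rangle .
    \]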

For data with a resolution typical of macromolecular crystallography,
such a multi-Gaussian description of the atom position's uncertainty is not
practical, as it would lead to instability in the refinement and/or
overfitting. Because of this, a simplified description of the atom's positional
uncertainty by just the second moment of the probability distribution is the
right approach. For this reason, the PDB format is highly suitable for the
description of positional uncertainties, the only difference from other
fields being the unusual form of squaring and then scaling up the standard
uncertainty. As this calculation can be easily inverted, there is no loss
of information. However, in teaching one should probably put more stress on this
unusual form of presenting the standard deviation.

A separate issue is the use of restraints on B-factor values, a subject
that probably needs a longer discussion.

With respect to the previous thread, representing poorly ordered (so
called 'disordered') side chains by the most likely conformer with
appropriately high B-factors is fully justifiable, and currently is
probably the best solution to a difficult problem.

Zbyszek Otwinowski



 - they all know what B is and how to look for regions of high B
 (with, say, pymol) and they know not to make firm conclusions about
 H-bonds
 to flaming red side chains.

But this knowledge may be quite wrong.  If the flaming red really
 indicates
large vibrational motion then yes, one whould not bet on stable H-bonds.
But if the flaming red indicates that a well-ordered sidechain was
 incorrectly
modeled at full occupancy when in fact it is only present at
 half-occupancy
then no, the H-bond could be strong but only present in that
 half-occupancy
conformation.  One presumes that the other half-occupancy location
 (perhaps
missing from the model) would have its own H-bonding network.


 I beg to differ.  If a side chain has 2 or more positions, one should be a
 bit careful about making firm conclusions based on only one of those, even
 if it isn't clear exactly why one should use caution.  Also, isn't the
 isotropic B we fit at medium resolution more of a spherical cow
 approximation to physical reality anyway?

   Phoebe





Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-03-31 Thread Zbyszek Otwinowski

Dale Tronrud wrote:

   While what you say here is quite true and is useful for us to
remember, your list is quite short.  I can add another

3) The systematic error introduced by assuming full occupancy for all 
sites.


You are right that structural heterogeneity is an additional factor.
Se-Met expression is one of the examples: the Se-Met residue is
often not fully incorporated, and therefore its side chains have a
composition mixed with Met.

Obviously, solvent molecules may have partial occupancies.
Also, in heavily exposed crystals chemical reactions result in the loss of
functional groups (e.g. by decarboxylation).
However, in most cases, even if side chains have multiple conformations,
their total occupancy is 1.0.




There are, of course, many other factors that we don't account for
that our refinement programs tend to dump into the B factors.

   The definition of that number in the PDB file, as listed in the mmCIF
dictionary, only includes your first factor --

http://mmcif.rcsb.org/dictionaries/mmcif_std.dic/Items/_atom_site.B_iso_or_equiv.html 



and these numbers are routinely interpreted as though that definition is
the law.  Certainly the whole basis of TLS refinement is that the B factors
are really Atomic Displacement Parameters.   In addition the stereochemical
restraints on B factors are derived from the assumption that these 
parameters

are ADPs.  Convoluting all these other factors with the ADPs causes serious
problems for those who analyze B factors as measures of motion.

   The fact that current refinement programs mix all these factors with the
ADP for an atom to produce a vaguely defined B factor should be 
considered
a flaw to be corrected and not an opportunity to pile even more factors 
into

this field in the PDB file.



B-factors describe the overall uncertainty of the current model. Refinement
programs that do not introduce or remove parts of the model (e.g. that are
not able to add additional conformations) intrinsically pile all
uncertainties into the B-factors. The solutions you would like to see
implemented require a model-building-like approach. The test of the
success of such an approach would be a substantial decrease in R-free
values. If anybody can show that, it would be great.


Zbyszek


Dale Tronrud






On 3/31/2011 9:06 AM, Zbyszek Otwinowski wrote:

The B-factor in crystallography represents the convolution (sum) of two
types of uncertainties about the atom (electron cloud) position:

1) dispersion of atom positions in crystal lattice
2) uncertainty of the experimenter's knowledge  about the atom position.

In general, uncertainty needs not to be described by Gaussian function.
However, communicating uncertainty using the second moment of its
distribution is a widely accepted practice, with frequently implied
meaning that it corresponds to a Gaussian probability function. B-factor
is simply a scaled (by 8 times pi squared) second moment of uncertainty
distribution.

In the previous, long thread, confusion was generated by the additional
assumption that B-factor also corresponds to a Gaussian probability
distribution and not just to a second moment of any probability
distribution. Crystallographic literature often implies the Gaussian
shape, so there is some justification for such an interpretation, where
the more complex probability distribution is represented by the sum of
displaced Gaussians, where the area under each Gaussian component
corresponds to the occupancy of an alternative conformation.

For data with a typical resolution for macromolecular crystallography,
such multi-Gaussian description of the atom position's uncertainty is not
practical, as it would lead to instability in the refinement and/or
overfitting. Due to this, a simplified description of the atom's position
uncertainty by just the second moment of probability distribution is the
right approach. For this reason, the PDB format is highly suitable for 
the

description of positional uncertainties,  the only difference with other
fields being the unusual form of squaring and then scaling up the 
standard

uncertainty. As this calculation can be easily inverted, there is no loss
of information. However, in teaching one should probably stress more this
unusual form of presenting the standard deviation.

A separate issue is the use of restraints on B-factor values, a subject
that probably needs a longer discussion.

With respect to the previous thread, representing poorly-ordered (so
called 'disordered') side chains by the most likely conformer with
appropriately high B-factors is fully justifiable, and currently is
probably the best solution to a difficult problem.

Zbyszek Otwinowski




- they all know what B is and how to look for regions of high B
(with, say, pymol) and they know not to make firm conclusions about
H-bonds
to flaming red side chains.


But this knowledge may be quite wrong.  If the flaming red really
indicates
large vibrational motion then yes, one whould not bet

Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-03-31 Thread Zbyszek Otwinowski
Regarding the closing statement about the best solution to poorly
ordered side chains:

I described in the previous e-mail the probabilistic interpretation of
B-factors. In the case of very high uncertainty, i.e. poorly ordered side
chains, I prefer to deposit the conformer representing the maximum a
posteriori, even if it does not represent all possible conformations.
The maximum a posteriori will have a significant contribution from the most
probable conformation of the side chain (prior knowledge) and should not
conflict with the likelihood (electron density map).
Thus, in practice I model the most probable conformation as long as it
lies in even very weak electron density, does not overlap significantly
with negative difference electron density, and does not clash with other
residues.


As a user of PDB files I much prefer the simplest and most
informative representation of the result. Removing the parts of side chains
that carry charges, as already mentioned, is not particularly helpful
for downstream uses. NMR-like deposits are not among my favorites,
either. Having multiple conformations with low occupancies increases the
potential for confusion, while the benefits are not clear to me.


Zbyszek

Frank von Delft wrote:
This is a lovely summary, and we should make our students read it. - But 
I'm afraid I do not see how it supports the closing statement in the 
last paragraph... phx.



On 31/03/2011 17:06, Zbyszek Otwinowski wrote:

The B-factor in crystallography represents the convolution (sum) of two
types of uncertainties about the atom (electron cloud) position:

1) dispersion of atom positions in crystal lattice
2) uncertainty of the experimenter's knowledge  about the atom position.

In general, uncertainty needs not to be described by Gaussian function.
However, communicating uncertainty using the second moment of its
distribution is a widely accepted practice, with frequently implied
meaning that it corresponds to a Gaussian probability function. B-factor
is simply a scaled (by 8 times pi squared) second moment of uncertainty
distribution.

In the previous, long thread, confusion was generated by the additional
assumption that B-factor also corresponds to a Gaussian probability
distribution and not just to a second moment of any probability
distribution. Crystallographic literature often implies the Gaussian
shape, so there is some justification for such an interpretation, where
the more complex probability distribution is represented by the sum of
displaced Gaussians, where the area under each Gaussian component
corresponds to the occupancy of an alternative conformation.

For data with a typical resolution for macromolecular crystallography,
such multi-Gaussian description of the atom position's uncertainty is not
practical, as it would lead to instability in the refinement and/or
overfitting. Due to this, a simplified description of the atom's position
uncertainty by just the second moment of probability distribution is the
right approach. For this reason, the PDB format is highly suitable for 
the

description of positional uncertainties,  the only difference with other
fields being the unusual form of squaring and then scaling up the 
standard

uncertainty. As this calculation can be easily inverted, there is no loss
of information. However, in teaching one should probably stress more this
unusual form of presenting the standard deviation.

A separate issue is the use of restraints on B-factor values, a subject
that probably needs a longer discussion.

With respect to the previous thread, representing poorly-ordered (so
called 'disordered') side chains by the most likely conformer with
appropriately high B-factors is fully justifiable, and currently is
probably the best solution to a difficult problem.

Zbyszek Otwinowski




- they all know what B is and how to look for regions of high B
(with, say, pymol) and they know not to make firm conclusions about
H-bonds
to flaming red side chains.

But this knowledge may be quite wrong.  If the flaming red really
indicates
large vibrational motion then yes, one whould not bet on stable 
H-bonds.

But if the flaming red indicates that a well-ordered sidechain was
incorrectly
modeled at full occupancy when in fact it is only present at
half-occupancy
then no, the H-bond could be strong but only present in that
half-occupancy
conformation.  One presumes that the other half-occupancy location
(perhaps
missing from the model) would have its own H-bonding network.

I beg to differ.  If a side chain has 2 or more positions, one should 
be a
bit careful about making firm conclusions based on only one of those, 
even

if it isn't clear exactly why one should use caution.  Also, isn't the
isotropic B we fit at medium resolution more of a spherical cow
approximation to physical reality anyway?

   Phoebe





Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353





--
Zbyszek

Re: [ccp4bb] data processing deviations chisq

2011-03-25 Thread Zbyszek Otwinowski
The most likely explanation is that you have a cracked crystal, or your
crystal has split from radiation damage around frame 70.

If you had a cracked crystal from the start, the spot overlap between
the crystals would change when the sample was rotated. In such cases, it may
happen that the program starts refining a different subset of the crystals
than the one that defines the crystal parameters. If these crystals are
isomorphous to each other, the scaling can correct for such variable
exposure/integration. If the crystals are not isomorphous, which sometimes
happens, you have a problem. In such a case it would probably be better to
restrict the scaling to the initial 70 frames. Sometimes it helps to
reprocess the data with a different spot integration size. There are two
opposing strategies that could be beneficial:
1) reduce the spot integration size to narrow the integration down to a
single crystal;
2) increase the spot integration size to integrate the diffraction from a
group of crystals with similar orientations. If these are uniformly
integrated, the phasing signal should still be preserved.

If your crystal has split from radiation damage, strategy 2 may help,
but frequently cracking induced by radiation damage is a very bad sign
(the crystal lattice has changed, so the crystal is no longer isomorphous with
its initial state).

Scaling provides the ultimate diagnostic of the problem's seriousness.
The integration statistics that you provided are secondary, but they may help
in pinpointing the source of the problem. One can disregard them if the
scaling is good.

The presence of iodine introduces additional factors. It increases X-ray
absorption and thus the radiation damage, but iodine is also a quencher of
radicals and tends to reduce the structural changes induced by radiation.

Hope that helps,
Zbyszek Otwinowski

 Hi all,

 I have collected an iodine-soaked dataset on our home source, and I am processing
 the data using HKL2000. The exposure time per frame is 5 min/1 degree.
 While processing I have noticed that the Chisq values, cell parameters, and
 rotation changes vs. frame are deviating strongly. Please follow the
 Picasa link to see the curves. I would be grateful if anyone could kindly
 suggest the possible cause of these deviations.

 https://picasaweb.google.com/118341875228875389610/Mar252011?authkey=Gv1sRgCJXe6-ncmt6KmwE#

 Thank you all for your suggestions,

 Sincerely,

 Debajyoti


Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] Merging data to increase multiplicity

2011-01-30 Thread Zbyszek Otwinowski
I concur with Kay, particularly with point d) and its consequences.

Sometimes it is obvious which result is better; often it is not. For
example, one of the HKL users was testing new options and found that all
the statistics (including the refinement R-free) were worse, but the
experimental and refinement maps were much better.

I myself test crystallographic programs written by others. For easy
cases they typically produce equivalent results, while for borderline cases
they tend to be sensitive to the input parameters, and sometimes one
program works better and on other data another works better. This even
applies to programs written by the same person (e.g. DM and Parrot),
particularly when adjusting input parameters.

The problems in real life are so diverse that it is not clear what would
be a representative set with which to test programs and draw general conclusions.

Zbyszek Otwinowski

 Am 20:59, schrieb Van Den Berg, Bert:
 I have heard this before. I’m wondering though, does anybody know of a
 systematic study where different data processing programs are compared
 with real-life, non-lysozyme data?

 Bert

 Bert,

 some time ago I tried to start something to this effect - take a look at
 the Quality Control article in XDSwiki.
 (http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Quality_Control).
 But it hasn't worked out, i.e. nobody participated (so far).
 Possible reasons include:
 a) it is considered politically incorrect (many years ago I wrote about
 a comparison that I did ... the reactions from a few people were rather
 harsh)
 b) for reasons un-intelligible to me, people do not like to make their
 raw data public (even if I ask directly)
 c) it does take time to do and document
 d) it's difficult to agree on the right methodology
 e) it's a question that seems to interest only specialists
 f) there's probably not a single answer
 g) the programs are being constantly improved

 Concerning the last point, a wiki seems to be a good place to collect
 the results (a table can be used to follow progress in a program, but
 also to see the differences between programs). But that brings me to my
 last point - a wiki article does not count as a paper.

 best,

 Kay



Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] Problem with finding of spots

2010-11-23 Thread Zbyszek Otwinowski
The default mode of autoindexing in HKL2000 and Denzo is to search for
unit-cell lengths producing spots that can be resolved at the data
collection distance and with the specified spot size (this can be changed in
the HKL2000 interface). If the unit cell of your crystal is significantly
longer, the program will not find it. Considering that your spots are not
well resolved, this is quite a likely possibility. Your diffraction extends
to less than 3 A, so pushing the detector back will not cause any loss of the
measured diffraction. The logic of limiting the unit-cell length in
autoindexing is that indexing will fail when data cannot be integrated due
to spot overlap resulting from a long unit cell.
Another possibility is that your crystal has some type of severe packing
disorder along the long axis. In such a case, there is probably no point
in collecting a dataset.

Autoindexing in HKL2000 is a multi-step process. Not all the found peaks
shown in your diffraction image will be used for autoindexing, as
they also have to pass the signal-to-noise cutoff in Denzo. The ones
accepted for autoindexing are shown in green in a subsequent window.
You can also use a resolution limit to eliminate peaks at higher
resolution during autoindexing and then extend the resolution during the
refinement. Sometimes this helps for very mosaic crystals.

Zbyszek Otwinowski

 Dear colleagues,

 I am working on one dataset that is hard to process. The data are about
3A of resolution. As we are not able to reproduce the experiment again,
I have
 to use this one, collected in a dirty way.
 The problem starts immediately with finding of spots. I have tried HKL2000,
 XDS, D*trek, ipmosflm, imosflm, but none of them gave a good read-out of
the
 images. All the programs find some spots in wrong positions and the real
spots are not covered. Here is an example:

 http://kolda.webz.cz/image-predictions.jpg

 The data were collected in-house, Saturn 944++ CCD, and all the
necessary information should be in the header properly. I checked the
distance, other
 parameters, but the problem is with finding of correct or real spots on
 the image. This should be even header-independent, should not? All the
programs fail (or even crash) in this routine. Does anyone have any
suggestion, please?

 Btw, we have several structures in the PDB from this experimental setup.
This is the first problem I have met.

 Many thanks for any response.

 Petr

 --
 Petr Kolenko
 petr.kole...@biochemtech.uni-halle.de
 http://kolda.webz.cz



Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] What makes the difference between 2 composite omit maps?

2010-11-02 Thread Zbyszek Otwinowski
I am not aware of this point being explicitly made. Maybe somebody else 
could point to the relevant reference?


However, the logic here is very simple:

(1) Take a model and generate Fc from it.
(2) Calculate the map with minimal rms error, m*Fo*exp(i*phiCalc).
(3) This map is biased with respect to errors in the model.
(4) To avoid this bias, one can subtract a part of the model
(Fc*coefficient) from the above map:
- the coefficient is chosen so that the electron density in the resulting map
does not change at the point where we added or subtracted an atom;
- conceptually, the above procedure, when we subtract an atom, is the
composite omit map;
- 'does not change' means here: within a first-order approximation; we
ignore second-order effects;
- for a sigmaA-weighted map this coefficient is D/2.
(5) The subtraction gives: (m*Fo - D/2*Fc)*exp(i*phiCalc)
(6) After multiplication by 2, we get (2m*Fo - D*Fc)*exp(i*phiCalc)
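
In compact form (my notation, not part of the original reply), the same chain of steps reads as follows; the second line is the unweighted analogue that gives the familiar 2Fo-Fc map:

    \[
    m F_{o} e^{i\varphi_{c}}
      \;\longrightarrow\;
      \Bigl(m F_{o} - \tfrac{D}{2} F_{c}\Bigr) e^{i\varphi_{c}}
      \;\xrightarrow{\times 2}\;
      \bigl(2 m F_{o} - D F_{c}\bigr) e^{i\varphi_{c}}
    \]
    \[
    F_{o} e^{i\varphi_{c}}
      \;\longrightarrow\;
      \Bigl(F_{o} - \tfrac{1}{2} F_{c}\Bigr) e^{i\varphi_{c}}
      \;\xrightarrow{\times 2}\;
      \bigl(2 F_{o} - F_{c}\bigr) e^{i\varphi_{c}}
    \]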



Hailiang Zhang wrote:

Thanks! Can you refer me some documents about your following statements:

derivation of sigmaa-weighted 2mFo-DFc formula is by calculating Fourier
coefficients of the following map:
Rescaled composite omit map, where minimal structural element (of the size
about the resolution element) is being omitted and the starting point is
the map with coefficients m*Fo*exp(i*phiCalc)

It seems the above was not involved in Read's publications about SIGMAA.

Thanks again!

Hailiang


sigmaa-weighted 2mFo-DFc is the _COMPOSIT_OMIT_ map. There is no point in
calculating omit map of an omit map

A brief explanation: derivation of sigmaa-weighted 2mFo-DFc formula is by
calculating Fourier coefficients of the following map:

Rescaled composite omit map, where minimal structural element (of the size
about the resolution element) is being omitted and the starting point is
the map with coefficients m*Fo*exp(i*phiCalc)

BTW, composite omit map of a map with coefficients Fo*exp(i*phiCalc) is
simply Fo-1/2Fc map that after factor of 2 scaling becomes 2Fo-Fc map


Hi,
I want to calculate the sigmaa-weighted 2mFo-DFc composite omit map, and

tried the following 2 scripts:

(1)
./omit hklin ${f}.mtz mapout ${f}.map <<EOF
LABI FP=mFo FC=DFC PHI=PHIC
RESO 29.50 3.22
SCAL 2.0 -1.0
EOF

(2)
./omit hklin ${f}.mtz mapout ${f}.map <<EOF
LABI FP=FWT FC=FC PHI=PHIC
RESO 29.50 3.22
SCAL 1.0 0.0
EOF

The output maps are just different, and I wonder why. I am also more

concerned about which one is more appropriate for the sigmaa-weighted
2mFo-DFc composite omit map.

(mFo is what I generated from the SIGMAA output)

Thanks for any suggestions!

Best Regards, Hailiang



Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353

--
Zbyszek Otwinowski
UT Southwestern Medical Center  
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu


Re: [ccp4bb] What makes the difference between 2 composite omit maps?

2010-11-01 Thread Zbyszek Otwinowski
The sigmaA-weighted 2mFo-DFc map is the _COMPOSITE_OMIT_ map. There is no point in
calculating an omit map of an omit map.

A brief explanation: the derivation of the sigmaA-weighted 2mFo-DFc formula is done by
calculating the Fourier coefficients of the following map:

a rescaled composite omit map, where a minimal structural element (of a size
about the resolution element) is being omitted and the starting point is
the map with coefficients m*Fo*exp(i*phiCalc).

BTW, the composite omit map of a map with coefficients Fo*exp(i*phiCalc) is
simply the Fo-1/2Fc map, which after scaling by a factor of 2 becomes the 2Fo-Fc map.

 Hi,
 I want to calculate the sigmaa-weighted 2mFo-DFc composite omit map, and
tried the following 2 scripts:

 (1)
 ./omit hklin ${f}.mtz mapout ${f}.map <<EOF
 LABI FP=mFo FC=DFC PHI=PHIC
 RESO 29.50 3.22
 SCAL 2.0 -1.0
 EOF

 (2)
 ./omit hklin ${f}.mtz mapout ${f}.map <<EOF
 LABI FP=FWT FC=FC PHI=PHIC
 RESO 29.50 3.22
 SCAL 1.0 0.0
 EOF

 The output maps are just different, and I wonder why. I am also more
concerned about which one is more appropriate for the sigmaa-weighted
2mFo-DFc composite omit map.

 (mFo is what I generated from the SIGMAA output)

 Thanks for any suggestions!

 Best Regards, Hailiang



Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] Calculating R-merge between 2 mtz files.

2009-12-16 Thread Zbyszek Otwinowski

There is an answer that requires using a non-CCP4 program:

export the intensity columns in Scalepack format, creating a separate file
for each dataset.

Then merge the files in Scalepack; it can read its own output, so provide both
files as input.
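
If the intensities have already been extracted from the two files to plain text (h k l I per line -- an assumed format, not something the MTZ files provide directly), the agreement between the two datasets can also be computed with a short script like the sketch below (my own illustration, not a CCP4 or HKL tool):

    import numpy as np

    def read_hkl_intensities(path):
        data = {}
        for line in open(path):
            h, k, l, i = line.split()[:4]
            data[(int(h), int(k), int(l))] = float(i)
        return data

    def r_merge_between(d1, d2):
        common = sorted(d1.keys() & d2.keys())
        i1 = np.array([d1[hkl] for hkl in common])
        i2 = np.array([d2[hkl] for hkl in common])
        k = i1.sum() / i2.sum()              # put the second dataset on the scale of the first
        mean = (i1 + k * i2) / 2.0
        # standard R-merge form applied to the two "measurements" of each reflection
        return (np.abs(i1 - mean) + np.abs(k * i2 - mean)).sum() / (i1 + k * i2).sum()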

Jason Porta wrote:

Hi everybody,

I would like to take two mtz files (which are very similar) and calculate
the R-merge between them. I tried looking into CCP4 and Phenix, but could
not find a direct path. Does anybody know how I can do this R-merge calculation?

Best regards,

Jason Porta




--
Zbyszek Otwinowski
UT Southwestern Medical Center  
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu


Re: [ccp4bb] data reduction

2009-10-29 Thread Zbyszek Otwinowski
Dear Eleanor,

Even though Denzo, Scalepack, and the HKL suite have been extensively developed,
their file formats have not changed in many years. However, there are
intermediate programs between Scalepack and the structure solution programs that
can potentially be the source of the problem. For example, ctruncate for some
time was rejecting centrosymmetric reflections when fed Scalepack output.
In this case, the error message suggests that the problem may be related
to asymmetric unit transformations, but, as stated earlier, Scalepack has
not changed the asymmetric unit it outputs. So I doubt the problem lies in
Scalepack as such.
Zbyszek Otwinowski

 Can you send a bit of your scalepack unmerged data - that would allow us
 to check format and pointless behavior..

 It sounds a bit like a scalepack problem though..
   Eleanor

 Alexandra Deaconescu wrote:
 Dear all:

 I am trying to solve a structure from apparently a hexagonal crystal.
 I indexed and scaled data  in P6 in Scalepack (with merging) then used
 Scalepack2mtz (with ensure unique reflections and add Rfree as well as
 the truncate procedure), and then attempted to run molecular
 replacement with Phaser. Now the problem appeared - Phaser immediately
 quits with the following error message FATAL RUNTIME ERROR:
 Reflections are not a unique set by symmetry. I do not understand
 this at all.

 I also tried running scalepack using the NO MERGE macro as people have
 indicated earlier on this bb (thank you again!, I also checked the
 scl.in that is written out and it had the NO MERGE statement), and
 then tried to run pointless to verify the spacegroup but the program
 complained the reflections are merged (that is impossible, I checked
 the number of reflections in the unmerged and merged files and they
 were different as one would expect). I repeated the procedures several
 times and I always get the same errors. I can't make any sense of this
 and I can't move forward. Any ideas?

 Many thanks,
 Alex




Re: [ccp4bb] Mosaicity beam divergence

2009-08-06 Thread Zbyszek Otwinowski

Richard Gillilan wrote:
Sorry, I meant to say does divergence add to the reported mosaicity 
value. If so, do actual mosaicity and divergence add in quadrature to 
give the reported value?




Yes, they do add in quadrature; the total is reported by Scalepack and
HKL2000 (if postrefinement is used). Denzo overestimates it somewhat.


--
Zbyszek Otwinowski
UT Southwestern Medical Center  
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu


Re: [ccp4bb] Scalepack error model?

2009-07-17 Thread Zbyszek Otwinowski

The scalepack log file gives the formula:

 Summary of reflections intensities and R-factors by intensity bins
 R linear = SUM ( ABS(I - <I>)) / SUM (I)
 R square = SUM ( (I - <I>) ** 2) / SUM (I ** 2)
 Chi**2   = SUM ( (I - <I>) ** 2) / (Error ** 2 * N / (N-1) ) )

which is equivalent to Jay Ponder's formula, with the important addition that
sigma_avg and I_avg represent the average of all _other_ measurements with the
same reduced hkl index.

All sigmas are calculated from the error model described in the publications.

Some of the error model parameters are at the moment defined by the user; they can
be refined iteratively by the experimenter by adjusting the parameters in subsequent
runs of Scalepack, but most of the time this is not required. A new version will
adjust all these parameters automatically.
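
One plausible reading of the chi^2 formula above, with the leave-one-out reference average made explicit, is sketched below (my own illustration, not HKL source code; the exact weighting and the N/(N-1) normalization in Scalepack may differ in detail):

    import numpy as np

    def chi2_for_group(I, sigma):
        """Chi^2 contribution of one unique hkl, given its unmerged intensities I
        and their sigmas; the reference <I> for each measurement is the weighted
        mean of all OTHER measurements of the same reflection."""
        I, sigma = np.asarray(I, float), np.asarray(sigma, float)
        n = len(I)
        if n < 2:
            return 0.0
        total = 0.0
        for j in range(n):
            others, s_others = np.delete(I, j), np.delete(sigma, j)
            w = 1.0 / s_others ** 2
            I_avg = (w * others).sum() / w.sum()       # weighted mean of the other measurements
            sig_avg_sq = 1.0 / w.sum()                 # variance of that mean
            total += (I[j] - I_avg) ** 2 / ((sigma[j] ** 2 + sig_avg_sq) * n / (n - 1))
        return total

    # e.g. three measurements of one reflection
    print(chi2_for_group([105.0, 98.0, 101.0], [4.0, 4.0, 5.0]))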


Zbyszek Otwinowski

Richard Gillilan wrote:

Thanks Joe and others.

Bits and pieces of this story appear in 11.4.8 of International Tables 
volume F, Borek et. al. Acta Cryst D59 (2003) and the Scalepack manual, 
but none are complete or have enough detail to follow easily. None of 
them give the expression for Chi-square for this problem.


I found a presentation by Jay Ponder online (for his Bio5325 course) 
that gives:


chi-2 = 1/N sum (I_avg - I_meas)^2/(sigma_avg^2 + sigma_meas^2)

where the sum probably runs over all reflections and the I_avg is the 
average of the appropriate group of symmetry-related reflections. 
Sigma_avg^2 should be the sigma computed from the error model below (not 
given in the presentation) I think and sigma_meas is the sigma^2 from 
the actual symmetry-related reflections.


One would then adjust the error parameters below to give chi-square 
approx unity and this leads to the proper scaling factors for 
intensities and sigmas.


One confusing hitch seems to be that (according to the International 
Tables F Eqs.(11.4.8.5) and (11.4.8.6)), the error model is also 
implicitly defined and must be solved iteratively ... though it's hard 
to see that from the text.


Does this sound right?


Richard



--
Zbyszek Otwinowski
UT Southwestern Medical Center  
5323 Harry Hines Blvd., Dallas, TX 75390-8816
(214) 645 6385 (phone) (214) 645 6353 (fax)
zbys...@work.swmed.edu