Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-04-01 Thread Robbie Joosten

Hi Frank,

  I described in the previous e-mail the probabilistic interpretation of
  B-factors. In the case of very high uncertainty = poorly ordered side
  chains, I prefer to deposit the conformer representing maximum a
  posteriori, even if it does not represent all possible conformations.
  Maximum a posteriori will have significant contribution from the most
  probable conformation of side chain (prior knowledge) and should not
  conflict with likelihood (electron density map).
  Thus, in practice I model the most probable conformation as long as it
  fits in even very weak electron density, does not overlap significantly
  with negative difference electron density and does not clash with other
  residues.
 If it's probability you're after, if there's no density to guide you 
 (very common!) you'd have to place all likely rotamers that don't 
 clash with anything, and set their occupancies to their probability (as 
 encoded in the rotamer library).
Which library? The one for all side chains of a specific type, or the one for a 
specific type with a given backbone conformation? These are quite different and 
change with the content of the PDB.
'Hacking' the occupancies is risky business in general: errors are made quite
easily. I frequently encounter side chains with partial occupancies but no
alternatives, how can I relate this to the experimental data? Even worse, I
also see cases where the occupancies of alternates sum up to values > 1.00.
What does that mean? Is that a local increase of DarkMatter accidentally
encoded in the occupancy?
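
The two occupancy problems described above can be screened for automatically. The following is a minimal sketch, not a validated tool: the function name `occupancy_problems` and the tolerance `tol` are made up for illustration, and it assumes standard fixed-column PDB ATOM/HETATM records from a single model. Partially occupied ligands or atoms on special positions can be legitimate, so a hit is only a prompt to look at the density.

```python
from collections import defaultdict

def occupancy_problems(pdb_lines, tol=0.01):
    """Flag atoms whose occupancies look inconsistent.

    Returns {(chain, resseq, icode, resname, atom_name): message}.
    Assumes fixed-column PDB records and a single model (hypothetical helper).
    """
    occupancies = defaultdict(dict)  # atom key -> {altloc: occupancy}
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        key = (line[21], line[22:26].strip(), line[26].strip(),
               line[17:20].strip(), line[12:16].strip())
        altloc = line[16].strip()               # column 17: alternate location
        occupancies[key][altloc] = float(line[54:60])  # columns 55-60

    problems = {}
    for key, by_altloc in occupancies.items():
        total = sum(by_altloc.values())
        if len(by_altloc) == 1 and total < 1.0 - tol:
            problems[key] = "partial occupancy without alternates"
        elif total > 1.0 + tol:
            problems[key] = "alternate occupancies sum above 1"
    return problems
```

Usage would be along the lines of `occupancy_problems(open("model.pdb"))`, where `model.pdb` is a placeholder file name.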

 This is now veering into data-free protein modeling territory... wasn't 
 the idea to present to the downstream user an atomic representation of 
 what the electron density shows us?
Yes, but what we see can be deceiving.

 Worse, what we're also doing is encoding multiple different things in 
 one place - what database people call poorly normalised, i.e. to 
 understand a data field requires further parsing and if statements. In 
 this case: to know whether there was no density, as end-user I'd have
 to second-guess what exactly those
 high-B-factor-variable-occupancy atoms mean.
 
 Until the PDB is expanded, the conventions need to be clear, and I 
 thought they were:
 High B-factor == means atom is there but density is weak
 Atom missing == no density to support it.
Unfortunately, it is not trivial to decide when there is 'no density'. We must 
have a good metric to do this, but I don't think it exists yet. Removing atoms 
is thus very subjective. This explains why I frequently find positive
difference density peaks near missing side chains. Leaving side chains in
sometimes gives negative difference density but refining them with proper
B-factor restraints reduces the problem a lot. There is still the problem of
radiation damage, but that is relatively small. At least refining the B-factor 
is more reproducible and less subjective than making the binary choice to keep 
or remove an atom.
 
Cheers,
Robbie

 
 Oh well...
 phx.
  

Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-04-01 Thread Frank von Delft

Hi Robbie

 If it's probability you're after, if there's no density to guide you
 (very common!) you'd have to place all likely rotamers that don't
 clash with anything, and set their occupancies to their probability (as
 encoded in the rotamer library).
Which library? The one for all side chains of a specific type, or the 
one for a specific type with a given backbone conformation? These are 
quite different and change with the content of the PDB.
'Hacking' the occupancies is risky business in general: errors are
made quite easily. I frequently encounter side chains with partial
occupancies but no alternatives, how can I relate this to the
experimental data? Even worse, I also see cases where the occupancies
of alternates sum up to values > 1.00. What does that mean? Is that a
local increase of DarkMatter accidentally encoded in the occupancy?
Actually, I wasn't advocating it - I was taking ZO's suggestion to its
logical conclusion to point out the problem, namely deciding what is 
most likely.  This you underline with your (very valid) question.




 Until the PDB is expanded, the conventions need to be clear, and I
 thought they were:
 High B-factor == means atom is there but density is weak
 Atom missing == no density to support it.
Unfortunately, it is not trivial to decide when there is 'no density'. 
We must have a good metric to do this, but I don't think it exists 
yet. Removing atoms is thus very subjective. This explains why I
frequently find positive difference density peaks near missing side
chains. Leaving side chains in sometimes gives negative difference
density but refining them with proper B-factor restraints reduces the
problem a lot. There is still the problem of radiation damage, but 
that is relatively small. At least refining the B-factor is more 
reproducible and less subjective than making the binary choice to keep 
or remove an atom.

(Radiation damage is NOT a relatively small problem.)

The fundamental problem remains:  we're cramming too many meanings into 
one number.  This the PDB could indeed solve, by giving us another 
column.  (He said airily, blithely launching a totally new flame war.)


phx.


Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-04-01 Thread Zbyszek Otwinowski
The meaning of B-factor is the (scaled) sum of all positional
uncertainties, and not just its one contributor, the Atomic Displacement
Parameter that describes the relative displacement of an atom in the
crystal lattice by a Gaussian function.
That meaning (the sum of all contributions) comes from the procedure that
calculates the B-factor in all PDB X-ray deposits, and not from an
arbitrary decision by a committee. All programs that refine B-factors
calculate an estimate of positional uncertainty, where contributors can be
both Gaussian and non-Gaussian. For a non-Gaussian contributor, e.g.
multiple occupancy, the exact numerical contribution is rather a complex
function, but conceptually it is still an uncertainty estimate. Given the
resolution of the typical data, we do not have a procedure to decouple
Gaussian and non-Gaussian contributors, so we have to live with the
B-factor being defined by the refinement procedure. However, we should
still improve the estimates of the B-factor, e.g. by changing the
restraints. In my experience, Refmac's default restraints on B-factors
in side chains are too tight and I adjust them. Still, my preference would
be to have harmonic restraints on U (square root of B) rather than on Bs
themselves.
It is not we who cram too many meanings on the B-factor, it is the quite
fundamental limitation of crystallographic refinement.

Zbyszek Otwinowski

 The fundamental problem remains:  we're cramming too many meanings into
one number [B factor].  This the PDB could indeed solve, by giving us
another column.  (He said airily, blithely launching a totally new flame
war.)
 phx.



Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-04-01 Thread Bernhard Rupp (Hofkristallrat a.D.)
 In my experience, Refmac's default restraints on B-factors in side chains
 are too tight and I adjust them.

Concur. See BMC p 640.

BR


Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-04-01 Thread James Holton
I'm not sure I entirely agree with ZO's assessment that a B factor is
a measure of uncertainty.  Pedantically, all it really is is an
instruction to the refinement program to build some electron density
with a certain width and height at a certain location.  The result is
then compared to the data, parameters are adjusted, etc.  I don't
think the B factor is somehow converted into an error bar on the
calculated electron density, is it?

For example, a B-factor of 500 on a carbon atom just means that the
peak to build is ~0.02 electron/A^3 tall, and ~3 A wide (full width
at half maximum).  By comparison, a carbon with B=20 is 1.6
electrons/A^3 tall and ~0.7 A wide (FWHM).  One of the bugs that
Dale referred to is the fact that most refinement programs do not
plot electron density more than 3 A away from each atomic center, so
a substantial fraction of the 6 electrons represented by a carbon with
B=500 will be sharply cut off, and missing from the FC calculation.
Then again, all 6 electrons will be missing if the atoms are simply
not modeled, or if the occupancy is zero.

The point I am trying to make here is that there is no B factor that
will make an atom go away, because the way B factors are implemented
is to always conserve the total number of electrons in the atom, but
just spread them out over more space.
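
To put rough numbers on the electron-conservation and 3 A cutoff points, here is a minimal sketch. It treats the atom as a point scatterer smeared only by an isotropic B, so the intrinsic width of the atomic form factor is ignored; that simplification and the function name `fraction_within_radius` are assumptions of this sketch, not how any particular refinement program does it.

```python
import math

def fraction_within_radius(b_factor, radius):
    """Fraction of an isotropic 3D Gaussian atom lying inside `radius` (in A).

    Point-atom assumption: the per-axis variance is B / (8 pi^2).
    """
    sigma = math.sqrt(b_factor / (8.0 * math.pi ** 2))
    x = radius / sigma
    # Cumulative distribution of the radial (Maxwell-type) density of a 3D Gaussian.
    return (math.erf(x / math.sqrt(2.0))
            - math.sqrt(2.0 / math.pi) * x * math.exp(-0.5 * x * x))

for b in (20.0, 80.0, 500.0):
    inside = fraction_within_radius(b, 3.0)
    print(f"B = {b:5.0f} A^2: fraction of electrons within 3 A = {inside:.2f}")
```

Under these assumptions only about 30% of a B=500 atom lies within 3 A of its center, while the fraction tends to 1 for any B as the radius grows: the electrons are spread out, never removed.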

Now, a peak height of 0.02 electrons/A^3 may sound like it might as
well be zero, especially when sitting next to a B=20 atom, but what if
all the atoms have high B factors?  For example, if the average
(Wilson) B factor is 80 (like it typically is for a ~4A structure),
then the average peak height of a carbon atom is 0.3 electrons/A^3,
and then 0.02 electrons/A^3 starts to become more significant.  If we
consider a ~11 A structure, then the average atomic B factor will be
around 500.  This B vs resolution relationship is something I
derived empirically from the PDB (Holton JSR 2009).  Specifically, the
average B factor for PDB files at a given resolution d is: B =
4*d^2+12.  Admittedly, this is on average, but the trend does make
physical sense: atoms with high B factors don't contribute very much
to high-angle spots.
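
The empirical relation B = 4*d^2+12 and its inverse are easy to tabulate. This is only a sketch of the formula as quoted above; the function names are made up, and the relation describes the PDB average rather than any individual structure.

```python
import math

def average_b_from_resolution(d):
    """Average B factor (A^2) at resolution d (A), using B = 4*d^2 + 12."""
    return 4.0 * d ** 2 + 12.0

def resolution_from_b(b):
    """Invert B = 4*d^2 + 12 to read a B factor as an 'atom-by-atom resolution'."""
    if b < 12.0:
        raise ValueError("the empirical relation has no solution for B < 12")
    return math.sqrt((b - 12.0) / 4.0)

for d in (2.0, 4.0, 11.0):
    print(f"d = {d:4.1f} A -> average B = {average_b_from_resolution(d):.0f} A^2")
```

This gives 76 for d = 4 A and 496 for d = 11 A, in line with the ~80 and ~500 quoted above.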

More formally, the problem with using a high B-factor as a flag is
that it is not resolution-general.  Dale has already pointed this out.

Personally, I prefer to think of B factors as an atom-by-atom
resolution rather than an error bar, and this is how I tell
students to interpret them (using the B = 4*d^2+12 formula).  The
problem I have with the error bar interpretation is that
heterogeneity and uncertainty are not the same thing.  That is, just
because the atom is jumping around does not mean you don't know
where the centroid of the distribution is.  The u_x in
B=8*pi^2*u_x^2 does reflect the standard error of atomic position in
a GIVEN unit cell, but since we are averaging over trillions of cells,
the error bar on the AVERAGE atomic position is actually a great
deal smaller than u.  I think this distinction is important because
what we are building is a model of the AVERAGE electron density, not a
single molecule.
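
The difference between the spread in one cell and the error bar on the average can be checked with a small simulation. This sketch assumes that each unit cell contributes an independent Gaussian displacement with standard deviation u_x, which is an idealisation; the function name and the cell and trial counts are arbitrary choices for illustration.

```python
import math
import random
import statistics

def spread_of_cell_average(u_x, n_cells, n_trials=500, seed=0):
    """Empirical spread (A) of the position averaged over n_cells unit cells.

    Assumes independent Gaussian displacements per cell (sketch assumption).
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, u_x) for _ in range(n_cells))
        for _ in range(n_trials)
    ]
    return statistics.pstdev(means)

u_x = math.sqrt(20.0 / (8.0 * math.pi ** 2))  # B = 20 A^2 gives u_x of about 0.5 A
observed = spread_of_cell_average(u_x, n_cells=400)
expected = u_x / math.sqrt(400)               # standard error of the mean
print(f"u_x = {u_x:.3f} A, spread of average = {observed:.4f} A, "
      f"u_x/sqrt(N) = {expected:.4f} A")
```

Even with only 400 cells the average is pinned down about 20 times better than u_x; with trillions of cells the same 1/sqrt(N) scaling makes the error on the centroid negligible next to u_x.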

Just my 0.02 electrons

-James Holton
MAD Scientist







Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-04-01 Thread Randy J. Read
In this case, I'm more on ZO's side. Let's say that the refinement program 
can't get an atom to the right position (for instance, to pick a reasonably 
realistic example, because you've put a leucine side chain in backwards). 
In that case, the B-factor for the atom nearest to where there should be 
one in the structure will get larger to smear out its density and put some 
in the right place. To a good approximation, the optimal increase in the 
B-factor will be the one you'd expect for a Gaussian probability 
distribution, i.e. 8Pi^2/3 times the positional error squared. So a refined 
B-factor does include a measure of the uncertainty or error in the atom's 
position.
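
As a worked example of the 8Pi^2/3 relation, the sketch below converts a positional error into the corresponding B-factor increase. The function name is made up; the factor 1/3 reflects that the total three-dimensional error squared is shared equally over three axes in the isotropic Gaussian case.

```python
import math

def b_increase_from_position_error(rms_error):
    """B-factor increase (A^2) for a total 3D rms positional error (A).

    Uses delta_B = (8 * pi^2 / 3) * error^2 for an isotropic Gaussian.
    """
    return 8.0 * math.pi ** 2 / 3.0 * rms_error ** 2

for err in (0.2, 0.5, 1.0):
    print(f"error = {err:.1f} A -> B increases by "
          f"{b_increase_from_position_error(err):.1f} A^2")
```

An atom misplaced by 1 A would, on this estimate, pick up roughly 26 A^2 of extra B.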


Best wishes,

Randy Read


Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-04-01 Thread Eric Bennett
Personally I think it is a _good_ thing that those missing atoms are a pain, 
because it helps ensure you are aware of the problem.  As somebody who is in 
the business of supplying non-structural people with models, and seeing how 
those models are sometimes (mis)interpreted, I think it's better to inflict 
that pain than it is to present a model that non-structural people are likely 
to over-interpret.  

The PDB provides various manipulated versions of crystal structures, such as 
biological assemblies.  I don't think it would necessarily be a bad idea to 
build missing atoms back into those sorts of processed files but for the main 
deposited entry the best way to make sure the model is not abused is to leave 
out atoms that can't be modeled accurately.

Just as an example since you mention surfaces, some of the people I work with 
calculate solvent accessible surface areas of individual residues for purposes 
such as engineering cysteines for chemical conjugation, and if residues are 
modeled into bogus positions just to say all the atoms are there, software that 
calculates per-residue SASA has to have a reliable way of knowing to ignore 
those atoms when calculating the area of neighboring residues.  Ad hoc 
solutions like putting very large values in the B column are not clear cut for 
such a software program to interpret.  Leaving the atom out completely is 
pretty unambiguous.

-Eric


On Mar 31, 2011, at 7:34 PM, Scott Pegan wrote:

 I agree with Zbyszek with the modeling of side chains and stress the 
 following points:
 
 1) It drives me nuts when I find that a PDB entry is missing atoms from side
 chains.  This requires me to rebuild them to get any use out of the PDB such as
 relevant surface renderings or electrostatic potential plots.   I am an experienced
 structural biologist so that I can immediately identify that they have been
 removed and  can rebuild them.  I feel sorry for my fellow scientists from
 other biological fields that can't perform this task readily, thus
 removing these atoms from a model limits their usefulness to a wider
 scientific audience.
 
 2)  Not sure if anyone has documented the percentage of actual side chains
 missing from radiation damage versus heterogeneity in conformation (i.e.
 dissolved a crystal after collection and sent it to Mass Spec).   Although
 the former likely happens occasionally, my gut tells me that the latter is
 significantly more predominant.  As a result, absence of atoms from a side
 chain in the PDB where the main chain is clearly visible in the electron
 density might make for the best statistics for an experimental model, but
 does not reflect reality.
 
 Scott
 


Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-03-31 Thread Frank von Delft
This is a lovely summary, and we should make our students read it. - But 
I'm afraid I do not see how it supports the closing statement in the 
last paragraph... phx.



On 31/03/2011 17:06, Zbyszek Otwinowski wrote:

The B-factor in crystallography represents the convolution (sum) of two
types of uncertainties about the atom (electron cloud) position:

1) dispersion of atom positions in crystal lattice
2) uncertainty of the experimenter's knowledge  about the atom position.

In general, uncertainty need not be described by a Gaussian function.
However, communicating uncertainty using the second moment of its
distribution is a widely accepted practice, with frequently implied
meaning that it corresponds to a Gaussian probability function. B-factor
is simply a scaled (by 8 times pi squared) second moment of uncertainty
distribution.

In the previous, long thread, confusion was generated by the additional
assumption that B-factor also corresponds to a Gaussian probability
distribution and not just to a second moment of any probability
distribution. Crystallographic literature often implies the Gaussian
shape, so there is some justification for such an interpretation, where
the more complex probability distribution is represented by the sum of
displaced Gaussians, where the area under each Gaussian component
corresponds to the occupancy of an alternative conformation.
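
To illustrate the second-moment reading with a sum of displaced Gaussians, the following sketch computes the B-factor equivalent to a mixture of alternative conformations along a single axis. It is a one-dimensional toy under stated assumptions (an isotropic B would average over three axes), and the names `equivalent_b` and `two_state` are made up for the example.

```python
import math

EIGHT_PI_SQ = 8.0 * math.pi ** 2

def equivalent_b(conformers):
    """Scaled second moment (A^2) of a 1D mixture of Gaussians.

    conformers: list of (occupancy, position_in_A, b_factor) tuples.
    """
    total_occ = sum(occ for occ, _, _ in conformers)
    mean = sum(occ * pos for occ, pos, _ in conformers) / total_occ
    # Variance of a mixture: within-component variance plus spread of the centers.
    variance = sum(
        occ * (b / EIGHT_PI_SQ + (pos - mean) ** 2) for occ, pos, b in conformers
    ) / total_occ
    return EIGHT_PI_SQ * variance

# Two half-occupied conformers 1 A apart, each with B = 20 A^2.
two_state = [(0.5, 0.0, 20.0), (0.5, 1.0, 20.0)]
print(f"equivalent B = {equivalent_b(two_state):.1f} A^2")
```

Here the single-Gaussian equivalent comes out near 39.7 A^2: the original 20 plus 8*pi^2 times the 0.25 A^2 variance of the two centers.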

For data with a typical resolution for macromolecular crystallography,
such multi-Gaussian description of the atom position's uncertainty is not
practical, as it would lead to instability in the refinement and/or
overfitting. Due to this, a simplified description of the atom's position
uncertainty by just the second moment of probability distribution is the
right approach. For this reason, the PDB format is highly suitable for the
description of positional uncertainties,  the only difference with other
fields being the unusual form of squaring and then scaling up the standard
uncertainty. As this calculation can be easily inverted, there is no loss
of information. However, in teaching one should probably stress more this
unusual form of presenting the standard deviation.
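
The inversion mentioned above is a one-liner in each direction. A minimal sketch, with made-up function names, assuming an isotropic B in A^2 and u as the per-axis standard deviation in A:

```python
import math

EIGHT_PI_SQ = 8.0 * math.pi ** 2

def b_to_u(b_factor):
    """Standard deviation u (A) from an isotropic B (A^2): u = sqrt(B / (8 pi^2))."""
    return math.sqrt(b_factor / EIGHT_PI_SQ)

def u_to_b(u):
    """Isotropic B (A^2) from a standard deviation u (A): B = 8 pi^2 u^2."""
    return EIGHT_PI_SQ * u ** 2

for b in (20.0, 80.0, 500.0):
    print(f"B = {b:5.0f} A^2 <-> u = {b_to_u(b):.2f} A")
```

Because the mapping is monotonic for non-negative values, converting the B column to u and back loses nothing beyond the precision of the stored number.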

A separate issue is the use of restraints on B-factor values, a subject
that probably needs a longer discussion.

With respect to the previous thread, representing poorly-ordered (so
called 'disordered') side chains by the most likely conformer with
appropriately high B-factors is fully justifiable, and currently is
probably the best solution to a difficult problem.

Zbyszek Otwinowski




- they all know what B is and how to look for regions of high B
(with, say, pymol) and they know not to make firm conclusions about
H-bonds to flaming red side chains.

But this knowledge may be quite wrong.  If the flaming red really indicates
large vibrational motion then yes, one would not bet on stable H-bonds.
But if the flaming red indicates that a well-ordered sidechain was incorrectly
modeled at full occupancy when in fact it is only present at half-occupancy
then no, the H-bond could be strong but only present in that half-occupancy
conformation.  One presumes that the other half-occupancy location (perhaps
missing from the model) would have its own H-bonding network.


I beg to differ.  If a side chain has 2 or more positions, one should be a
bit careful about making firm conclusions based on only one of those, even
if it isn't clear exactly why one should use caution.  Also, isn't the
isotropic B we fit at medium resolution more of a spherical cow
approximation to physical reality anyway?

   Phoebe





Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-03-31 Thread Hailiang Zhang
Dear Zbyszek:

Thanks a lot for your good summary. It is very interesting, but do you
know of any references with a more detailed description, especially
from a mathematical point of view, of how the B-factor relates to the
Gaussian probability distribution (the B-factor unit of A^2 is my first
doubt about the probability distribution description)? Thanks again for
your efforts!

Best Regards, Hailiang






Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-03-31 Thread Ethan Merritt
On Thursday, March 31, 2011 10:05:22 am Hailiang Zhang wrote:
 Dear Zbyszek:
 
 Thanks a lot for your good summary. It is very interesting, but do you
 know of any references with a more detailed description, especially
 from a mathematical point of view, of how the B-factor relates to the
 Gaussian probability distribution (the B-factor unit of A^2 is my first
 doubt about the probability distribution description)? Thanks again for
 your efforts!
 
 Best Regards, Hailiang

I already cited the IUCr standard once, but here it is again:
 Trueblood, et al, 1996; Acta Cryst. A52, 770-781 
 http://dx.doi.org/10.1107/S0108767396005697



-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-03-31 Thread Dale Tronrud

   While what you say here is quite true and is useful for us to
remember, your list is quite short.  I can add another

3) The systematic error introduced by assuming full occupancy for all sites.

There are, of course, many other factors that we don't account for
that our refinement programs tend to dump into the B factors.

   The definition of that number in the PDB file, as listed in the mmCIF
dictionary, only includes your first factor --

http://mmcif.rcsb.org/dictionaries/mmcif_std.dic/Items/_atom_site.B_iso_or_equiv.html

and these numbers are routinely interpreted as though that definition is
the law.  Certainly the whole basis of TLS refinement is that the B factors
are really Atomic Displacement Parameters.   In addition the stereochemical
restraints on B factors are derived from the assumption that these parameters
are ADPs.  Convoluting all these other factors with the ADPs causes serious
problems for those who analyze B factors as measures of motion.

   The fact that current refinement programs mix all these factors with the
ADP for an atom to produce a vaguely defined B factor should be considered
a flaw to be corrected and not an opportunity to pile even more factors into
this field in the PDB file.

Dale Tronrud


On 3/31/2011 9:06 AM, Zbyszek Otwinowski wrote:

The B-factor in crystallography represents the convolution (sum) of two
types of uncertainties about the atom (electron cloud) position:

1) dispersion of atom positions in crystal lattice
2) uncertainty of the experimenter's knowledge  about the atom position.

In general, uncertainty need not be described by a Gaussian function.
However, communicating uncertainty using the second moment of its
distribution is a widely accepted practice, with frequently implied
meaning that it corresponds to a Gaussian probability function. B-factor
is simply a scaled (by 8 times pi squared) second moment of uncertainty
distribution.

In the previous, long thread, confusion was generated by the additional
assumption that B-factor also corresponds to a Gaussian probability
distribution and not just to a second moment of any probability
distribution. Crystallographic literature often implies the Gaussian
shape, so there is some justification for such an interpretation, where
the more complex probability distribution is represented by the sum of
displaced Gaussians, where the area under each Gaussian component
corresponds to the occupancy of an alternative conformation.

For data at a resolution typical of macromolecular crystallography,
such a multi-Gaussian description of the uncertainty in an atom's position
is not practical, as it would lead to instability in the refinement and/or
overfitting. Because of this, a simplified description of the positional
uncertainty by just the second moment of the probability distribution is the
right approach. For this reason, the PDB format is highly suitable for the
description of positional uncertainties, the only difference from other
fields being the unusual form of squaring and then scaling up the standard
uncertainty. As this calculation can be easily inverted, there is no loss
of information. However, in teaching one should probably put more stress on
this unusual form of presenting the standard deviation.
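
The inversion mentioned above can be sketched in a few lines. This assumes
the isotropic relation B = 8 * pi^2 * <u^2> stated earlier in this message;
the function names b_to_rms and rms_to_b are made up for the example.

```python
import math

EIGHT_PI_SQ = 8.0 * math.pi ** 2


def b_to_rms(b):
    """Standard uncertainty sqrt(<u^2>) in A for an isotropic B in A^2."""
    if b < 0:
        raise ValueError("B must be non-negative")
    return math.sqrt(b / EIGHT_PI_SQ)


def rms_to_b(u):
    """Inverse operation: square the standard uncertainty and scale it up."""
    return EIGHT_PI_SQ * u ** 2


for b in (10.0, 40.0, 100.0):
    print(f"B = {b:6.1f} A^2  ->  standard uncertainty = {b_to_rms(b):.2f} A")
```

Because the mapping is one-to-one for non-negative values, going from B to
the standard uncertainty and back returns the original number.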

A separate issue is the use of restraints on B-factor values, a subject
that probably needs a longer discussion.

With respect to the previous thread, representing poorly ordered
(so-called 'disordered') side chains by the most likely conformer with
appropriately high B-factors is fully justifiable, and currently is
probably the best solution to a difficult problem.

Zbyszek Otwinowski




- they all know what B is and how to look for regions of high B
(with, say, pymol) and they know not to make firm conclusions about
H-bonds to flaming red side chains.


But this knowledge may be quite wrong.  If the flaming red really indicates
large vibrational motion then yes, one would not bet on stable H-bonds.
But if the flaming red indicates that a well-ordered sidechain was incorrectly
modeled at full occupancy when in fact it is only present at half-occupancy
then no, the H-bond could be strong but only present in that half-occupancy
conformation.  One presumes that the other half-occupancy location (perhaps
missing from the model) would have its own H-bonding network.



I beg to differ.  If a side chain has 2 or more positions, one should be a
bit careful about making firm conclusions based on only one of those, even
if it isn't clear exactly why one should use caution.  Also, isn't the
isotropic B we fit at medium resolution more of a spherical cow
approximation to physical reality anyway?

   Phoebe






Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353


Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-03-31 Thread Zbyszek Otwinowski

Dale Tronrud wrote:

   While what you say here is quite true and is useful for us to
remember, your list is quite short.  I can add another

3) The systematic error introduced by assuming full occupancy for all sites.


You are right that structural heterogeneity is an additional factor.
Se-Met expression is one example: the Se-Met residue is often not fully
incorporated, so the side chain at that position is a mixture of Se-Met
and Met.

Obviously, solvent molecules may have partial occupancies.
Also, in heavily exposed crystals, chemical reactions result in the loss of
functional groups (e.g. by decarboxylation).
However, in most cases, even if side chains have multiple conformations,
their total occupancy is 1.0.
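
As a rough illustration of that last point, the sketch below sums the
occupancies of each atom over its alternate locations and reports those
whose total differs from 1.0. It assumes the standard fixed-column PDB
ATOM/HETATM layout; the function names and the demo records are invented
for the example. Partially occupied solvent or ligands will be reported as
well, so the output is a list of things to inspect, not a list of errors.

```python
from collections import defaultdict


def occupancy_sums(pdb_lines):
    """Sum occupancies over alternate locations for every atom.

    Atoms are keyed by (chain, residue number + insertion code,
    residue name, atom name), read from fixed PDB columns.
    """
    sums = defaultdict(float)
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        key = (line[21], line[22:27], line[17:20], line[12:16].strip())
        sums[key] += float(line[54:60])  # occupancy, columns 55-60
    return sums


def suspicious_atoms(pdb_lines, tol=0.01):
    """Atoms whose summed occupancy differs from 1.0 by more than tol."""
    return {key: total for key, total in occupancy_sums(pdb_lines).items()
            if abs(total - 1.0) > tol}


def _atom(serial, name, alt, res, chain, seq, occ, b):
    """Build a demo ATOM record in fixed columns (coordinates set to zero)."""
    return (f"ATOM  {serial:5d} {name:<4s}{alt:1s}{res:>3s} {chain:1s}{seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{occ:6.2f}{b:6.2f}")


# Invented records: one fully occupied atom, one pair of alternates
# summing to 1.20, and one half-occupied atom with no alternate.
demo = [
    _atom(1, " N  ", " ", "MET", "A", 1, 1.00, 20.0),
    _atom(2, " OG ", "A", "SER", "A", 10, 0.60, 30.0),
    _atom(3, " OG ", "B", "SER", "A", 10, 0.60, 30.0),
    _atom(4, " NZ ", " ", "LYS", "A", 20, 0.50, 60.0),
]
flagged = suspicious_atoms(demo)
for key, total in sorted(flagged.items()):
    print(key, f"total occupancy = {total:.2f}")
```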




There are, of course, many other factors that we don't account for
that our refinement programs tend to dump into the B factors.

   The definition of that number in the PDB file, as listed in the mmCIF
dictionary, only includes your first factor --

http://mmcif.rcsb.org/dictionaries/mmcif_std.dic/Items/_atom_site.B_iso_or_equiv.html 



and these numbers are routinely interpreted as though that definition is
the law.  Certainly the whole basis of TLS refinement is that the B factors
are really Atomic Displacement Parameters.   In addition the stereochemical
restraints on B factors are derived from the assumption that these parameters
are ADPs.  Convoluting all these other factors with the ADPs causes serious
problems for those who analyze B factors as measures of motion.

   The fact that current refinement programs mix all these factors with the
ADP for an atom to produce a vaguely defined B factor should be considered
a flaw to be corrected and not an opportunity to pile even more factors into
this field in the PDB file.



B-factors describe the overall uncertainty of the current model. Refinement
programs that do not introduce or remove parts of the model (e.g. cannot
add alternative conformations) intrinsically pile all uncertainties into
the B-factors. The solutions you would like to see implemented require a
model-building-like approach. The test of the success of such an approach
would be a substantial decrease in R-free values. If anybody can show
that, it would be great.


Zbyszek


Dale Tronrud







Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains

2011-03-31 Thread Zbyszek Otwinowski
Regarding the closing statement about the best solution to poorly 
ordered side chains:


I described in the previous e-mail the probabilistic interpretation of
B-factors. In the case of very high uncertainty (i.e. poorly ordered side
chains), I prefer to deposit the conformer representing the maximum a
posteriori estimate, even if it does not represent all possible conformations.
The maximum a posteriori estimate will have a significant contribution from
the most probable conformation of the side chain (prior knowledge) and should
not conflict with the likelihood (electron density map).
Thus, in practice I model the most probable conformation as long as it
lies in even very weak electron density, does not overlap significantly
with negative difference electron density and does not clash with other
residues.


As a user of PDB files I much prefer the simplest and most
informative representation of the result. Removing parts of side chains
that carry charges, as already mentioned, is not particularly helpful
for downstream uses. NMR-like deposits are not among my favorites,
either. Having multiple conformations with low occupancies increases
the potential for confusion, while the benefits are not clear to me.


Zbyszek

Frank von Delft wrote:
This is a lovely summary, and we should make our students read it. - But 
I'm afraid I do not see how it supports the closing statement in the 
last paragraph... phx.



--
Zbyszek