[ccp4bb] Why TLS refinement may go wrong and why it is so often the case (was: Re: [ccp4bb] Calculation of generalised R-factor?)

Pavel Afonine Thu, 22 Dec 2016 14:04:18 -0800

Dear Ethan,

thanks for the comments. I do remember we talked about this last year at
the CCP4 school in Argonne, and indeed we did not seem to agree. Let me lay
out my arguments, as you have stated yours.

Refinement is an optimization problem that involves a model, data and
fitting tools (such as a target function and the means to minimize or
maximize it). In turn this means that refinement is a mathematical problem,
and when it comes to math, sloppy definitions are the last thing one wants.
TLS is one of many models used to describe crystal structures. Obviously,
any model is an approximation to reality, and as you rightly point out TLS
is no exception.

TLS approximates the motion of an atomic model as that of a rigid body. It
is indeed a crude and poor approximation if, and only if, the TLS model is
used alone to account for all motions that the molecule happens to undergo.
However, if one considers the total motion of the molecule to be a
superposition of several different motions arising at different structural
levels, such as vibration of the molecule as a whole, motions of individual
chains, libration of side chains around chi angles, individual atomic
vibrations, etc., then it is not too unnatural to assume that there will
always be a rigid-body component in this hierarchy of motions, and this is
the component that TLS is supposed to describe. By the way, this is exactly
the reason why we always recommend using TLS and individual isotropic ADP
refinement together.

Most of the time a model is expected to have a physical meaning, a
mathematical description, and computer code that puts it all into practice.
In the case of TLS, the physical meaning mandates that the parameters of
the TLS model describe rigid-body motions of a group of atoms. For
instance, I hope it is not too uncomfortable to assume that an atom group
is unlikely to vibrate or librate with amplitudes that would throw it out
of the unit cell, make it bump into neighbouring molecules, or move it so
far relative to adjacent TLS groups that the covalent bonds between them
are torn apart, to name but a few. Mathematically it all means that the T,
L and S matrices have to possess a number of very well defined properties.
Based on these properties one can tell something about the meaningfulness
of TLS parameters, and this is what Urzhumtsev et al. discuss.
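
As a minimal sketch of the simplest such property: T and L are mean-square
(covariance-type) matrices, so each must be symmetric and have no negative
eigenvalues. The function name and the numbers below are hypothetical and
only illustrate the idea; this is a necessary condition only, not the full
set of checks discussed in Urzhumtsev et al.

```python
import numpy as np

def is_symmetric_psd(m, tol=1e-9):
    """Return True if m is a symmetric 3x3 matrix with no eigenvalue below -tol."""
    m = np.asarray(m, dtype=float)
    if m.shape != (3, 3) or not np.allclose(m, m.T, atol=tol):
        return False
    # eigvalsh is meant for symmetric matrices and returns real eigenvalues.
    return bool(np.linalg.eigvalsh(m).min() >= -tol)

# Hypothetical values for illustration only (T in A^2, L in rad^2).
T_ok = np.diag([0.10, 0.08, 0.05])
L_ok = np.diag([0.002, 0.001, 0.0005])
# A negative mean-square libration amplitude cannot describe any motion.
L_bad = np.diag([0.002, -0.001, 0.0005])

print(is_symmetric_psd(T_ok), is_symmetric_psd(L_ok), is_symmetric_psd(L_bad))
# prints: True True False
```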

If we analyze the TLS matrices of all PDB entries (where available), we
find that about 85% of them do not make physical sense. Now, should we
care? Well, we do care that covalent bonds have sensible lengths, that
peptide phi/psi angles stay out of the forbidden areas of the Ramachandran
plot, that molecules pack meaningfully in the unit cell, etc., so why would
we not care about TLS being meaningful? Why would one fight to the death to
get great R factors, nice model geometry and such, but let nonsense into
the TLS records? After all, these records may be looked at by someone
attempting to extract some biological relevance. It just does not seem
logical or consistent to be strict about the validity of some model
parameters and let other parameters be junk.

So why may TLS refinement results (the refined elements of the T, L and S
matrices!) be bad? There are three fundamental reasons that I can think of.
The first is related to the choice of TLS model, such as the assumptions
about rigid groups. The second is related to how the TLS model is used,
such as whether it is used alone to account for all motions or in
combination with other motion descriptors. The third arises from a
technicality in how the parameters of the TLS model are optimized. While
the first two are more or less obvious, the third is more subtle and I will
expand on it.

Elements of TLS matrices represent parameters of the corresponding motions
(such as three amplitudes of libration about three orthogonal axes and
three amplitudes of translation along a possibly different set of three
orthogonal axes). However, the elements of TLS matrices are not the motion
parameters themselves but functions of them. To extract the motion
parameters (amplitudes, axes, etc.) from TLS matrices one needs to follow
the rather complex protocol shown in figure 1 of our paper:
http://journals.iucr.org/d/issues/2015/08/00/rr5096/rr5096.pdf

In other words, using TLS matrices makes it mathematically very easy to
encode information about motions into the structure factor formula, but the
price for this convenience is that it becomes difficult to extract the
motion parameters back from the TLS matrices.
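
The "easy" direction can be sketched in a few lines. In the standard
rigid-body formulation (Schomaker & Trueblood), the anisotropic U of an atom
follows from the group matrices as U = T + A L A^T + A S + S^T A^T, where A
is built from the atom position relative to the TLS origin. The function
name and the numbers below are hypothetical, chosen only for illustration.

```python
import numpy as np

def u_from_tls(T, L, S, xyz, origin=(0.0, 0.0, 0.0)):
    """Anisotropic U of one atom implied by the T, L, S matrices of its group.

    Uses U = T + A L A^T + A S + S^T A^T, with A built from the atom position
    relative to the TLS origin. Units: T in A^2, L in rad^2, S in A*rad.
    """
    T, L, S = (np.asarray(m, dtype=float) for m in (T, L, S))
    x, y, z = np.asarray(xyz, dtype=float) - np.asarray(origin, dtype=float)
    # A maps a small rotation vector to the displacement (rotation x position).
    A = np.array([[0.0,   z,  -y],
                  [ -z, 0.0,   x],
                  [  y,  -x, 0.0]])
    return T + A @ L @ A.T + A @ S + S.T @ A.T

# Hypothetical numbers: isotropic translation plus libration about z only.
T = 0.05 * np.eye(3)
L = np.diag([0.0, 0.0, 0.001])
S = np.zeros((3, 3))
U = u_from_tls(T, L, S, xyz=(10.0, 0.0, 0.0))
print(np.round(U, 4))
```

An atom 10 A out along x that librates about z is smeared along y, so only
the U22 element grows beyond the translational part. Going the other way,
from U or from T, L and S back to amplitudes and axes, has no such one-line
formula.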

The problem is that all refinement programs that I'm aware of (and that use
TLS) refine the elements of the TLS matrices and not the actual motion
parameters. To someone who has written a refinement program this makes it
crystal clear why there are so many bad TLS refinement outcomes: there is
no control over the refinement of the motion parameters. Basically, these
parameters are refined without any restraints! To see what that means,
just try to refine a model without any stereochemistry/geometry restraints:
you will get a model with atoms flying all over the unit cell! The same
sort of damage happens to TLS parameters, so it is no wonder that there are
so many failures.

The fundamental solution to this problem is to redesign the implementation
of TLS refinement so that the new procedure would a) refine and report the
parameters of the motions and not the elements of the TLS matrices, and b)
apply proper restraints to the refinable parameters. This is a lot of work,
though. I'd say it would take me a few months to do it properly. Needless
to say, it's next to impossible to get any funding for this kind of
exercise!
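
To illustrate the idea behind (a) with a deliberately simplified sketch: if
the refinable quantities are three libration amplitudes and three
orthonormal axes, then the L matrix assembled from them cannot have negative
eigenvalues, whatever values the amplitudes take. The function below is
hypothetical and covers L only; it ignores S, screw components and axis
positions, so it is not the decomposition described in the paper.

```python
import numpy as np

def libration_matrix(amplitudes_rad, axes):
    """Build L = R diag(d^2) R^T from libration amplitudes d (radians) and
    orthonormal libration axes given as the columns of R."""
    R = np.asarray(axes, dtype=float)
    if R.shape != (3, 3) or not np.allclose(R.T @ R, np.eye(3), atol=1e-8):
        raise ValueError("axes must be three orthonormal column vectors")
    d2 = np.square(np.asarray(amplitudes_rad, dtype=float))
    # Squared amplitudes are the eigenvalues, so L is positive semidefinite.
    return R @ np.diag(d2) @ R.T

# Hypothetical example: axes rotated by 30 degrees about z,
# libration amplitudes of 3, 2 and 1 degrees.
theta = np.radians(30.0)
axes = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                 [np.sin(theta),  np.cos(theta), 0.0],
                 [0.0,            0.0,           1.0]])
amplitudes = np.radians([3.0, 2.0, 1.0])
L = libration_matrix(amplitudes, axes)
print(np.linalg.eigvalsh(L))  # the squared amplitudes, in ascending order
```

Restraints would then act on the amplitudes and axes directly, which is
where physical limits are easiest to state.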

Regarding your remark

"""
The Urzhumtsev et al classification of "nonsensical" TLS matrices includes
many that make lots of sense but do not happen to describe a perfectly
rigid body.
That's OK, because proteins are not perfectly rigid bodies.
The TLS models are useful approximations that capture
essential features of a messy ensemble of protein atoms.
"""

I guess I see the source of the confusion.

Once again, we all agree that any model (and TLS is no exception) is only
an approximation to reality. However, there are two different key questions
here that people often confuse or inappropriately mix up:

a) How well does the TLS approximation explain the real motions of atomic
groups? Your remark above concerns this point, which is the validation of
the whole TLS approach, and clearly it is a valid point.

b) Do the particular descriptors of a TLS model comply with the "basic
axioms" of TLS theory? This is the question we address in Urzhumtsev et
al.

A can't-be-simpler example of "a)" and "b)" above is:
- Everybody agrees that using isotropic B-factors is a fine modeling tool
at certain resolutions - this is point (a) above.
- Everybody also agrees that negative B values in PDB models are not a
very good thing to have - this is point (b) above.
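
Point (b) for isotropic B-factors is a check one can write down in a few
lines, which is what makes the analogy useful. The sketch below uses the
standard relation B = 8*pi^2*U_iso; the function names and the B values are
made up for illustration.

```python
import math

def b_iso_from_u_iso(u_iso):
    """Convert an isotropic mean-square displacement (A^2) to a B-factor (A^2)."""
    return 8.0 * math.pi ** 2 * u_iso

def negative_b_indices(b_values):
    """Return the indices of atoms whose B-factor is negative."""
    return [i for i, b in enumerate(b_values) if b < 0.0]

# Made-up B values: the second atom violates point (b), however well the
# model as a whole may fit the data.
b_values = [25.3, -4.1, 60.0]
print(negative_b_indices(b_values))  # prints: [1]
```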

Atomic coordinates and ADPs are two of the several kinds of parameters used
to describe a crystal structure. Crystallographers put a lot of effort into
obtaining stereochemically correct models. We argue that the same should be
done for the other model parameters, such as TLS. Hirshfeld (1976)
pioneered this and you brought ADP validation to the macromolecular world.
Urzhumtsev et al. extend this deeper and further.

The TLS theory (Schomaker & Trueblood, 1968) is based on the assumption
that uncertainties in the atomic positions of a given group are described
by correlated vibrations and librations (which may be supplemented by
individual atomic corrections). The same theory shows that mathematically
it is possible to represent the motions of many random models (around their
mean) as an averaged motion that can be described by a few matrices (T, L
and S). Now, if the TLS matrices do not represent such an average, then
what is their meaning? Clearly, arguments like "they may decrease
R-factors", "it is better than nothing" or "it's ok since TLS is just an
approximation" do not appear very strong scientifically.

You say

"""
Complaining that in practice the refined TLS values deviate from those that
would hypothetically be obtained from fitting perfectly rigid groups is
beside the point
"""

Here you again refer to point (a) above and not (b). The U matrices
calculated from TLS may differ from the individual U values, regardless of
how much we want them to coincide. However, independently of the quality of
this fit, the TLS matrices always have to comply with the basic TLS axioms.

You say

"""
But a validation criterion that is so strict that it labels 85% of all
protein refinements as "nonsensical" is not a very useful test.
"""

Here we are talking about the validity of the TLS refinement results, which
are the T, L and S matrices, and not of the atomic model as a whole. Bad
TLS does not necessarily mean bad atomic coordinates, for example. In fact,
atomic models with bad TLS may still be fine; it's just that their TLS
parameters should not be taken seriously, and one should also realize that
the possibly lower R factors obtained as a result of using TLS are then
simply due to using TLS as a fudge factor and not as a physically
meaningful model.

Summarizing, I would say:
1) Using a TLS model is a great addition to (and not a replacement for) the
simple individual isotropic or anisotropic model of Atomic Displacement
Parameters (ADP). If used correctly, TLS is expected to improve the model.
However, this is not the case most of the time, for the reasons alluded to
above, which is unfortunate.
2) Needless to say, validation is important. This applies to all model
parameters, not just coordinates!
3) Someone would be better off investing some effort into redoing TLS
refinement protocols... just to stop adding nonsense to the database!

Happy holidays and all the best,
Pavel

On Tue, Dec 20, 2016 at 11:30 PM, Ethan Merritt <merr...@u.washington.edu>
wrote:

> On Tuesday, 20 December 2016 10:28:44 PM Pavel Afonine wrote:
> > Hi Dirk,
> >
> >
> > > I want to check the validity of the refinement of anisotropic B-factors
> > > vs. TLS + isotropic B-factors using the Hamilton R-value ratio test as
> > > described in Ethan Merritt's paper "To B or not to B", Acta Cryst. D,
> > > Vol 68, pp 468. This test uses the generalised R-factors (assuming unit
> > > weights), RG=(Sum(Fo-Fc)^2/Sum(Fo)^2)^1/2. Although Hamilton wrote that
> > > at the end of refinement, one could also use the similar ratio of the
> > > usual R-factors, I really would like to check the ratio of the RG-values
> > > after refinement. As far as I can see, this value is not reported by the
> > > usual refinement programs.
> >
> >
> >
> > R factor is a global metric that, if considered alone, is not going to
> > answer your question. Best is to consider all three:
> >
> > 1) Rfree;
> > 2) Rfree-Rwork;
>
> > 3) Meaningfulness of refined TLS matrices. Note, as we discovered and
> > documented recently, results of TLS refinements (TLS matrices) are
> > nonsensical in 85% of PDB entries (yes, eighty-five are bad, believe it
> > or not!):
> >
> > From deep TLS validation to ensembles of atomic models built from
> > elemental motions. A. Urzhumtsev, P. V. Afonine, A. H. Van Benschoten,
> > J. S. Fraser and P. D. Adams. Acta Cryst. (2015). D71, 1668-1683.
>
> As you know, I disagree on this point.
>
> The Urzhumtsev et al classification of "nonsensical" TLS matrices includes
> many that make lots of sense but do not happen to describe a perfectly
> rigid body.
> That's OK, because proteins are not perfectly rigid bodies.
> The TLS models are useful approximations that capture
> essential features of a messy ensemble of protein atoms.
> Complaining that in practice the refined TLS values deviate from those that
> would hypothetically be obtained from fitting perfectly rigid groups is
> beside the point.
>
> Of course some refinements really are bad and some models really are
> unreasonable.  Validation tests can help you catch these and fix your
> model or refinement.  But a validation criterion that is so strict that
> it labels 85% of all protein refinements as "nonsensical" is not a very
> useful test.
>
> >
> > I'd say if you pass "1-3)" you are more than good. If still in doubt, you
> > can make an extra effort and do what's described in
> >
> > Validation of crystallographic models containing TLS or other
> > descriptions of anisotropy
> > F. Zucker, P. C. Champ and E. A. Merritt
> > Acta Cryst. (2010). D66, 889-900
> >
> > which may reveal extra troubles.
>
> Note that the primary validation test described in the Zucker paper
> (we called it SKITTLS) is a check for the pairwise consistency of
> adjacent TLS groups.   It might flag as inconsistent two adjacent
> groups that both pass the criteria in Urzhumtsev et al, or conversely
> it might rate two groups that fail the Urzhumtsev criteria as being
> nevertheless consistent in their description of atoms they jointly
> apply to.
>
>                 Ethan
>
