Re: [ccp4bb] [3dem] Which resolution?

2020-03-12 Thread Petrus Zwart
Hi Jacob,

On Thu, Mar 12, 2020 at 9:13 AM Keller, Jacob 
wrote:

> I would think the most information-reflecting representation for
> systematic absences (or maybe for all reflections) would be not I/sig but
> the reflection's (|log|) ratio to the expected intensity in that shell
> (median intensity, say).


Xtriage does something like this as part of its space group assignment
algorithm. A choice of space group implies assigning each reflection the label
acentric, centric or absent. Each of these has its own prior distribution,
which can be convolved with a Gaussian error model to compute a likelihood for
that specific space-group hypothesis. It provides a decent way of assigning
space groups in an automated manner.
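To make that concrete, here is a minimal sketch (not the actual Xtriage code;
Wilson priors for acentric/centric reflections, a Gaussian error model, and
mean_i standing for the expected intensity in the reflection's resolution shell
are all assumptions of the sketch):

import numpy as np
from scipy import integrate, stats

def log_likelihood(i_obs, sig, label, mean_i):
    # Log-likelihood of one observation under the label implied by a space-group choice.
    if label == "absent":                   # true intensity is exactly zero
        return stats.norm.logpdf(i_obs, loc=0.0, scale=sig)
    if label == "acentric":                 # acentric Wilson prior: exp(-I/<I>)/<I>
        prior = lambda i: np.exp(-i / mean_i) / mean_i
    else:                                   # centric Wilson prior
        prior = lambda i: np.exp(-i / (2.0 * mean_i)) / np.sqrt(2.0 * np.pi * i * mean_i)
    lo, hi = max(0.0, i_obs - 10.0 * sig), i_obs + 10.0 * sig
    val, _ = integrate.quad(lambda i: stats.norm.pdf(i_obs, loc=i, scale=sig) * prior(i), lo, hi)
    return np.log(max(val, 1e-300))         # prior convolved with the Gaussian error

def score_space_group(i_obs, sigmas, labels, mean_i):
    # Sum of per-reflection log-likelihoods for one space-group hypothesis.
    return sum(log_likelihood(i, s, lab, mean_i) for i, s, lab in zip(i_obs, sigmas, labels))

Comparing this score across candidate space groups (each implying its own set
of labels) is essentially the automated assignment described above.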


> (...)
>


> Maybe more generally, should refinement incorporate weighting for these
> deviant spots? Or maybe it already does, but my understanding was that
> I/sig was the most salient for weighting.
>

The best option is to have a decent likelihood function that takes the
(almost) full uncertainty of the observation into account, as described by
Read & Pannu (https://bit.ly/2W6qmVR), including various numerical/mathematical
approaches to compute it (Read & McCoy https://bit.ly/2Qa6b5I; Perpendicular
Pronoun & Perryman https://bit.ly/2TKjJXH).

P






> JPK
>
> +
> Jacob Pearson Keller
> Research Scientist / Looger Lab
> HHMI Janelia Research Campus
> 19700 Helix Dr, Ashburn, VA 20147
> Desk: (571)209-4000 x3159
> Cell: (301)592-7004
> +
>
> The content of this email is confidential and intended for the recipient
> specified in message only. It is strictly forbidden to share any part of
> this message with any third party, without a written consent of the sender.
> If you received this message by mistake, please reply to this message and
> follow with its deletion, so that we can ensure such a mistake does not
> occur in the future.
>
> -Original Message-
> From: CCP4 bulletin board  On Behalf Of Kay
> Diederichs
> Sent: Tuesday, March 10, 2020 2:48 AM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] [3dem] Which resolution?
>
> I'd say that it depends on your state of knowledge, and on their I and
> sigma.
>
> - if you know the space group for sure before you do the measurement of
> the systematic absences, their I and sigma don't matter to you (because
> they don't influence your mental model of the experiment), so their
> information content is (close to) zero.
> - if the space group is completely unknown, some groups of reflections
> (e.g. h,k,l = 0,0,2n+1) can only be considered "potentially systematic
> absences". Then both I and sigma matter. "small" or "high" I/sigma for each
> member of such a group of reflections would indeed add quite some
> information in this situation, so an information content of up to 1 bit
> would be justified. "intermediate" I/sigma (say, 0.5 to 2) would be closer
> to zero bit, since it does not let you safely decide between "yes" or "no"
> (the recent paper by Randy Read and coworkers relates I and sigma to bits
> of information, but not in the context of decision making from potentially
> systematic absent reflections).
>
> So it is not quite straightforward, I think.
>
> best wishes,
> Kay
>
> On Tue, 10 Mar 2020 01:26:03 +0100, James Holton  wrote:
>
> >I'd say they are 1 bit each, since they are the answer to a yes-or-no
> >question.
> >
> >-James Holton
> >MAD Scientist
> >
> >On 2/27/2020 6:32 PM, Keller, Jacob wrote:
> >> How would one evaluate the information content of systematic absences?
> >>
> >> JPK
> >>
> >> On Feb 26, 2020 8:14 PM, James Holton  wrote:
> >> In my opinion the threshold should be zero bits.  Yes, this is where
> >> CC1/2 = 0 (or FSC = 0).  If there is correlation then there is
> >> information, and why throw out information if there is information to
> >> be had?  Yes, this information comes with noise attached, but that is
> >> why we have weights.
> >>
> >> It is also important to remember that zero intensity is still useful
> >> information.  Systematic absences are an excellent example.  They
> >> have no intensity at all, but they speak volumes about the structure.
> >> In a similar way, high-angle zero-intensity observations also tell us
> >> something.  Ever tried unrestrained B factor refinement at poor
> >> resolution?  It is hard to do nowadays because of all the safety
> >> catches in modern software, but you can get great R factors this way.

Re: [ccp4bb] [3dem] Which resolution?

2020-03-12 Thread Keller, Jacob
I would think the most information-reflecting representation for systematic 
absences (or maybe for all reflections) would be not I/sig but the reflection's 
(|log|) ratio to the expected intensity in that shell (median intensity, say). 
Thus, when the median intensity is 1000 counts, and one observes a spot of 1 
count, this would be quite information-rich even though its I/sig would be 
really small.

This approach would also reflect the lesser information contained in twinned 
data, where deviations from mean intensities are smaller, even though I/sig be 
quite large.

Regarding spots that are not systematic absence candidates, isn't it true that 
a very weak spot (e.g. 10 counts) of I/sig = 2 might contain more information 
than a strong spot (1000 counts) of I/sig = 20 in the same shell, if the median 
counts in the shell were 1000?

I used to hear rumors that maps calculated from the 1000 strongest spots were 
almost tantamount to using all reflections; maybe these flaky maps would be 
improved by using instead the 1000 reflections which deviate most from expected 
intensities? (i.e., both stronger and weaker.)

Maybe more generally, should refinement incorporate weighting for these deviant 
spots? Or maybe it already does, but my understanding was that I/sig was the 
most salient for weighting.

I guess the general idea is that the more unexpected the value is, the more it 
captures something unique about the thing being measured, thus more information.
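To make that concrete, a small sketch of the proposed score (illustrative only;
the function name and the equal-population resolution binning are made up for
the example):

import numpy as np

def log_deviation(intensities, d_spacings, n_bins=20):
    # |log(I / median I of the shell)| per reflection, using resolution bins.
    intensities = np.asarray(intensities, dtype=float)
    s2 = 1.0 / np.asarray(d_spacings, dtype=float) ** 2        # 1/d^2
    edges = np.quantile(s2, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.digitize(s2, edges) - 1, 0, n_bins - 1)
    medians = np.array([np.median(intensities[bins == b]) for b in range(n_bins)])
    ratio = intensities / medians[bins]
    return np.abs(np.log(np.clip(ratio, 1e-6, None)))          # clip keeps the log finite

By this score a 1-count spot in a shell whose median is 1000 counts comes out
at |log(0.001)| ~ 6.9, far above a 1000-count spot in the same shell, which
scores ~0.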

JPK

+
Jacob Pearson Keller
Research Scientist / Looger Lab
HHMI Janelia Research Campus
19700 Helix Dr, Ashburn, VA 20147
Desk: (571)209-4000 x3159
Cell: (301)592-7004
+

The content of this email is confidential and intended for the recipient 
specified in message only. It is strictly forbidden to share any part of this 
message with any third party, without a written consent of the sender. If you 
received this message by mistake, please reply to this message and follow with 
its deletion, so that we can ensure such a mistake does not occur in the future.

-Original Message-
From: CCP4 bulletin board  On Behalf Of Kay Diederichs
Sent: Tuesday, March 10, 2020 2:48 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] [3dem] Which resolution?

I'd say that it depends on your state of knowledge, and on their I and sigma.

- if you know the space group for sure before you do the measurement of the 
systematic absences, their I and sigma don't matter to you (because they don't 
influence your mental model of the experiment), so their information content is 
(close to) zero.
- if the space group is completely unknown, some groups of reflections (e.g. 
h,k,l = 0,0,2n+1) can only be considered "potentially systematic absences". 
Then both I and sigma matter. "small" or "high" I/sigma for each member of such 
a group of reflections would indeed add quite some information in this 
situation, so an information content of up to 1 bit would be justified. 
"intermediate" I/sigma (say, 0.5 to 2) would be closer to zero bit, since it 
does not let you safely decide between "yes" or "no" (the recent paper by Randy 
Read and coworkers relates I and sigma to bits of information, but not in the 
context of decision making from potentially systematically absent reflections). 

So it is not quite straightforward, I think.
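As a purely illustrative aside (not from Kay's message): with a 50/50 prior on
absent vs present, a Gaussian error model and an acentric Wilson prior for the
"present" case, one can put a number of bits on a single potential absence;
mean_i below stands for the expected intensity in the shell and is an
assumption of the sketch.

import numpy as np
from scipy import integrate, stats

def bits_gained(i_obs, sigma, mean_i):
    # How much one observation reduces the 1-bit absent/present uncertainty.
    like_absent = stats.norm.pdf(i_obs, loc=0.0, scale=sigma)
    integrand = lambda i: stats.norm.pdf(i_obs, loc=i, scale=sigma) * np.exp(-i / mean_i) / mean_i
    lo, hi = max(0.0, i_obs - 10.0 * sigma), i_obs + 10.0 * sigma
    like_present, _ = integrate.quad(integrand, lo, hi)
    p = like_absent / (like_absent + like_present)              # posterior P(absent)
    entropy = -sum(q * np.log2(q) for q in (p, 1.0 - p) if q > 0)
    return 1.0 - entropy

for i_over_sig in (0.1, 1.0, 5.0):
    print(i_over_sig, round(bits_gained(i_over_sig * 10.0, 10.0, 100.0), 2))

Intermediate I/sigma yields the least information, while a clearly strong
observation approaches the full 1 bit, which is the behaviour described above.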

best wishes,
Kay

On Tue, 10 Mar 2020 01:26:03 +0100, James Holton  wrote:

>I'd say they are 1 bit each, since they are the answer to a yes-or-no 
>question.
>
>-James Holton
>MAD Scientist
>
>On 2/27/2020 6:32 PM, Keller, Jacob wrote:
>> How would one evaluate the information content of systematic absences?
>>
>> JPK
>>
>> On Feb 26, 2020 8:14 PM, James Holton  wrote:
>> In my opinion the threshold should be zero bits.  Yes, this is where
>> CC1/2 = 0 (or FSC = 0).  If there is correlation then there is 
>> information, and why throw out information if there is information to 
>> be had?  Yes, this information comes with noise attached, but that is 
>> why we have weights.
>>
>> It is also important to remember that zero intensity is still useful 
>> information.  Systematic absences are an excellent example.  They 
>> have no intensity at all, but they speak volumes about the structure.  
>> In a similar way, high-angle zero-intensity observations also tell us 
>> something.  Ever tried unrestrained B factor refinement at poor 
>> resolution?  It is hard to do nowadays because of all the safety 
>> catches in modern software, but you can get great R factors this way.
>> A telltale sign of this kind of "over fitting" is remarkably large 
>> Fcalc values beyond the resolution cutoff.  These don't contribute to 
> >> the R factor, however, because Fobs is missing for these hkls.

Re: [ccp4bb] [3dem] Which resolution?

2020-03-12 Thread Gergely Katona
Hi,

I think through bond and through space B factor (+ sphericity) restraints 
primarily exist for pragmatic reasons: they are needed to maintain the 
numerical stability of the refinement. That is a separate issue from making 
physical sense. If one found consistent B-factor similarity between atoms 
connected through bonds and near each other in space, that would lend support 
to a physical interpretation. Unfortunately that is not the case: a CG atom in 
a lysine residue tends to have a higher Biso than a CB atom and a lower Biso 
than a CD atom, even though both CB and CD are covalently bonded to CG and are 
physically close to it. That tells me that the position of the atom within a 
residue predicts its relative B factor better than bonding connectivity does. 
Spherical targets also work against the evidence, as the probability of finding 
perfectly isotropic atoms in a high-resolution protein structure is close to 
zero. It is enough to take a look at the excellent PARVATI home page at the 
fraction of atoms at anisotropy=1: 
http://skuld.bmsc.washington.edu/parvati/parvati_survey.html
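For reference, the anisotropy plotted there is (as I understand it) the ratio
of the smallest to the largest eigenvalue of the ADP tensor, so 1.0 means a
perfectly isotropic atom; a minimal sketch:

import numpy as np

def anisotropy(u11, u22, u33, u12, u13, u23):
    # Anisotropy of one atom from its six ANISOU-style Uij components.
    u = np.array([[u11, u12, u13],
                  [u12, u22, u23],
                  [u13, u23, u33]], dtype=float)
    eig = np.linalg.eigvalsh(u)      # eigenvalues of the symmetric U tensor
    return eig.min() / eig.max()

An isotropic atom (U proportional to the identity) gives exactly 1.0; strongly
smeared atoms give values well below 1, which is where most atoms in
high-resolution structures sit.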

We did not try to put atoms into predetermined groups, but used clustering 
analysis of the ADP tensors in a set of high-resolution structures. We found 
that the ADPs of covalently bonded atoms are rarely the most similar. Only in 
weakly defined parts of the structure does bonding connectivity have strong 
predictive power, and I wonder whether that is not simply the effect of the 
restraints. 

What is really surprising is that the ADPs of chemically similar atoms tend to 
be the most similar even though they are located in completely different parts 
of the structure, and that includes similarity in displacement directions in 
the absence of obvious symmetry.
Different crystal structures have different disorder, and your analysis shows 
that resolution plays a role. Restraints might therefore need to be tailored to 
the actual type of disorder (for example, using TLS or not). I agree that when 
it comes to physically relevant ADP restraints, our toolbox may also be 
incomplete. 

Best wishes,

Gergely

-Original Message-
From: CCP4 bulletin board  On Behalf Of Bohdan Schneider
Sent: March 12, 2020 11:05
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] [3dem] Which resolution?

Hello,

B-factors actually do have a physical meaning which is at least to some extent 
reflected by the crystal structures as refined. This can be demonstrated with 
higher-resolution structures: when we created three tiers of structures, better 
than 1.9 Å, 1.9-2.4 Å, and 2.4-3.0 Å, structures in the first tier showed 
distinct distributions of B factors for the amino acid side-chain/main-chain 
atoms inside and outside the protein, for DNA bases and phosphates, and for 
water at the interface and on the biomolecule surface. The distinction is less 
clear for the 1.9-2.4 Å structures and is lost completely below that resolution 
limit.

We think that the distributions for the high-resolution structures can be 
developed into a meaningful set of constraints and/or validation criteria.

If interested, you can read more in our open access paper Acta Cryst. 
(2014). D70, 2413–2419 (doi:10.1107/S1399004714014631).

Best regards,

Bohdan, bs.structbio.org

On 2020-03-11 16:41, Gerard DVD Kleywegt wrote:
>>> If this is the case, why can't we use model B factors to validate 
>>> our structure? I know some people are skeptical about this approach 
>>> because B factors are refinable parameters.
>>>
>>> Rangana
>>
>> It is not clear to me exactly what you are asking.
>>
>> B factors _should_ be validated, precisely because they are refined 
>> parameters that are part of your model.   Where have you seen 
>> skepticism?
> 
> Rangana said that B-values should not be used *to validate 
> structures*, NOT that B-values themselves shouldn't be validated.
> 
> I suppose I am at least in part to blame for the former notion and the 
> reason for this (at least circa 1995 when the Angry Young Men from 
> Uppsala first started harping on about this) was that B-values 
> tend(ed) to be error sinks which could "absorb" all sorts of errors 
> and phenomena in addition to modelling atomic displacement (e.g., 
> unresolved disorder, unresolved NCS differences, incorrect restraints, 
> incorrect atom types modelled, partial occupancies, etc.).
> 
> --Gerard
> 
> **
>     Gerard J. Kleywegt
> 
>    http://xray.bmc.uu.se/gerard   mailto:ger...@xray.bmc.uu.se
> **
>     The opinions in this message are fictional.  Any similarity
>     to actual opinions, living or dead, is purely coincidental.
> *

Re: [ccp4bb] [3dem] Which resolution?

2020-03-12 Thread Bohdan Schneider

Hello,

B-factors actually do have a physical meaning which is at least to some 
extent reflected by the crystal structures as refined. This can be 
demonstrated with higher-resolution structures: when we created three 
tiers of structures, better than 1.9 Å, 1.9-2.4 Å, and 2.4-3.0 Å, 
structures in the first tier showed distinct distributions of B factors 
for the amino acid side-chain/main-chain atoms inside and outside the 
protein, for DNA bases and phosphates, and for water at the interface 
and on the biomolecule surface. The distinction is less clear for the 
1.9-2.4 Å structures and is lost completely below that resolution limit.


We think that the distributions for the high-resolution structures can 
be developed into a meaningful set of constraints and/or validation criteria.


If interested, you can read more in our open access paper Acta Cryst. 
(2014). D70, 2413–2419 (doi:10.1107/S1399004714014631).


Best regards,

Bohdan, bs.structbio.org

On 2020-03-11 16:41, Gerard DVD Kleywegt wrote:
If this is the case, why can't we use model B factors to validate our 
structure? I know some people are skeptical about this approach 
because B factors are refinable parameters.


Rangana


It is not clear to me exactly what you are asking.

B factors _should_ be validated, precisely because they are refined parameters 
that are part of your model.   Where have you seen skepticism?


Rangana said that B-values should not be used *to validate structures*, 
NOT that B-values themselves shouldn't be validated.


I suppose I am at least in part to blame for the former notion and the 
reason for this (at least circa 1995 when the Angry Young Men from 
Uppsala first started harping on about this) was that B-values tend(ed) 
to be error sinks which could "absorb" all sorts of errors and phenomena 
in addition to modelling atomic displacement (e.g., unresolved disorder, 
unresolved NCS differences, incorrect restraints, incorrect atom types 
modelled, partial occupancies, etc.).


--Gerard

**
    Gerard J. Kleywegt

   http://xray.bmc.uu.se/gerard   mailto:ger...@xray.bmc.uu.se
**
    The opinions in this message are fictional.  Any similarity
    to actual opinions, living or dead, is purely coincidental.
**
    Little known gastromathematical curiosity: let "z" be the
    radius and "a" the thickness of a pizza. Then the volume
     of that pizza is equal to pi*z*z*a !
**





Re: [ccp4bb] [3dem] Which resolution?

2020-03-12 Thread Nave, Colin (DLSLtd,RAL,LSCI)
…standard deviation for the background which takes account of the variation in 
the background which cannot be modelled.


10.   Contrast against background. – Our disordered sidechain or partially 
occupied substrate has to be distinguished from this background. A modest 
telescope can easily resolve the moons of Jupiter (2-10 arc minutes separation) 
on a dark night, but in the middle of Hong Kong (apparently the world's brightest 
city) it would struggle. For MX, increasing the exposure might help if there is 
a significant noise component. After this, the interpretability will not 
improve unless higher order Fourier terms become more significant, thereby 
allowing modelling to improve. Increased exposure could be counterproductive if 
radiation damage results.


11.   Dynamic range requirements for the image. Does one want to see the 
hydrogens on an electron density map in the presence of both noise and 
background? The contrast of the hydrogens will be low compared to the carbon, 
nitrogen and oxygen atoms. This is a similar situation to the Rose criterion 
for x-ray imaging where one wants to see a protein or organelle against the 
varying density of the cytosol. Another example would be to see both Jupiter 
and the fainter moons (e.g. moon number 5) in the presence of background from 
the sky.



12.   Interpretability – Despite the fact that our partially occupied substrate 
has a similar average density to the background, we often have some idea of the 
geometry of the substrate we wish to position. Individual atoms might be poorly 
defined if they are within the standard deviation of the background but a chain 
of them with plausible geometry might be identifiable. Machine learning 
might be able to accomplish this. Again, for x-ray imaging, filaments and 
membranes are easier to observe than single particles.



13.  Information content at a particular resolution – a far more useful metric 
than resolution. For the case of atoms of approximately equal density (e.g. 
C,N,O atoms) the contrast is high (e.g.  100% depending on how one defines 
contrast). For this case, a half bit FSC threshold might be of some use for 
predicting whether one would observe individual atoms on an electron density 
map.  It would also apply at lower resolution to distinguishing sidechains 
packing together in the ordered interior of proteins. Although the weighting 
schemes for electron density map calculations should be able to handle low 
information content data, it is not clear to me whether including data at an 
information content of 0.01 would result in a significant improvement in the 
maps. Perhaps it would for the case of difference maps where Fc is high and Fo 
is low. For refinement, including the low information content data is obviously 
very helpful.



14.   “Local resolution estimation - using small sub-volumes in low-res parts 
of maps.” I am sure Alexis is correct regarding fixed thresholds even though 
they may work for some cases. The “low resolution parts of the maps” does 
conflict somewhat with my harshly restricted use of the term resolution if it is 
restricted to instrument resolution. The same Fourier components contributed to 
these parts of the map as to the other parts. Even if the B factors in this 
part were high, one would still need to measure these Fourier components to 
identify this. Should one say “parts of the map with a low information content 
at this resolution?” Catchy, isn’t it?



15.   “Turn a target SNR into an FSC threshold” – Yes, this is exactly what I 
would like to see though I guess the N in SNR might conflict with the strict 
definition of noise.



16.   Information content in the above list. Hopefully very little as none of 
it is surprising. Not sure how one calculates misinformation, disinformation 
content etc.

I am not happy about some of the terms in this list. There must be a better 
phrase than “background in an electron density map” which avoids the term noise.
I also have to read the papers by Randy Read, Robert Oeffner & Airlie McCoy 
(http://journals.iucr.org/d/issues/2020/03/00/ba5308/index.html) and Alexis Rohou 
(https://www.biorxiv.org/content/10.1101/2020.03.01.972067v1). Some of the issues 
in the list above might be clarified in those articles.

Thanks all for the interesting discussions

Colin












From: CCP4 bulletin board  On Behalf Of Randy Read
Sent: 09 March 2020 12:37
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] [3dem] Which resolution?

Hi Alexis,

A brief summary of the relevant points in the paper that Pavel mentioned 
(https://journals.iucr.org/d/issues/2020/03/00/ba5308):

The paper is about how to estimate the amount of information gained by making a 
diffraction measurement, not really about defining resolution limits.  Based on 
the results presented, I would argue that it's not a good idea to define a 
traditional resolution cutoff based on average numbers in a resolution shell, 
because there will almost certainly be useful data

Re: [ccp4bb] [3dem] Which resolution?

2020-03-11 Thread Gerard DVD Kleywegt
If this is the case, why can't we use model B factors to validate our 
structure? I know some people are skeptical about this approach because B 
factors are refinable parameters.


Rangana


It is not clear to me exactly what you are asking.

B factors _should_ be validated, precisely because they are refined parameters
that are part of your model.   Where have you seen skepticism?


Rangana said that B-values should not be used *to validate structures*, NOT 
that B-values themselves shouldn't be validated.


I suppose I am at least in part to blame for the former notion and the reason 
for this (at least circa 1995 when the Angry Young Men from Uppsala first 
started harping on about this) was that B-values tend(ed) to be error sinks 
which could "absorb" all sorts of errors and phenomena in addition to 
modelling atomic displacement (e.g., unresolved disorder, unresolved NCS 
differences, incorrect restraints, incorrect atom types modelled, partial 
occupancies, etc.).


--Gerard

**
   Gerard J. Kleywegt

  http://xray.bmc.uu.se/gerard   mailto:ger...@xray.bmc.uu.se
**
   The opinions in this message are fictional.  Any similarity
   to actual opinions, living or dead, is purely coincidental.
**
   Little known gastromathematical curiosity: let "z" be the
   radius and "a" the thickness of a pizza. Then the volume
of that pizza is equal to pi*z*z*a !
**





Re: [ccp4bb] [3dem] Which resolution?

2020-03-10 Thread Kay Diederichs
 structure determination (by EM or MX). If it is used for
>>> these mature techniques it must be right!
>>>
>>> It is the adoption of the ½ bit threshold I worry about. I gave a
>>> rather weak example for MX which consisted of partial occupancy of
>>> side chains, substrates etc. For x-ray imaging a wide range of
>>> contrasts can occur and, if you want to see features with only a
>>> small contrast above the surroundings then I think the half bit
>>> threshold would be inappropriate.
>>>
>>> It would be good to see a clear message from the MX and EM
>>> communities as to why an information content threshold of ½ a bit is
>>> generally appropriate for these techniques and an acknowledgement
>>> that this threshold is technique/problem dependent.
>>>
>>> We might then progress from the bronze age to the iron age.
>>>
>>> Regards
>>>
>>> Colin
>>>
>>> From: CCP4 bulletin board  On Behalf Of Alexis Rohou
>>> Sent: 21 February 2020 16:35
>>> To: CCP4BB@JISCMAIL.AC.UK
>>> Subject: Re: [ccp4bb] [3dem] Which resolution?
>>>
>>> Hi all,
>>>
>>> For those bewildered by Marin's insistence that everyone's been
>>> messing up their stats since the bronze age, I'd like to offer what
>>> my understanding of the situation. More details in this thread from a
>>> few years ago on the exact same topic:
>>>
>>> https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003939.html
>>>
>>> https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html
>>>
>>> Notwithstanding notational problems (e.g. strict equations as opposed
>>> to approximation symbols, or omission of symbols to denote
>>> estimation), I believe Frank & Al-Ali and "descendent" papers (e.g.
>>> appendix of Rosenthal & Henderson 2003) are fine. The cross terms
>>> that Marin is agitated about indeed do in fact have an expectation
>>> value of 0.0 (in the ensemble; if the experiment were performed an
>>> infinite number of times with different realizations of noise). I
>>> don't believe Pawel or Jose Maria or any of the other authors really
>>> believe that the cross-terms are orthogonal.
>>>
>>> When N (the number of independent Fouier voxels in a shell) is large
>>> enough, mean(Signal x Noise) ~ 0.0 is only an approximation, but a
>>> pretty good one, even for a single FSC experiment. This is why, in my
>>> book, derivations that depend on Frank & Al-Ali are OK, under the
>>> strict assumption that N is large. Numerically, this becomes apparent
>>> when Marin's half-bit criterion is plotted - asymptotically it has
>>> the same behavior as a constant threshold.
>>>
>>> So, is Marin wrong to worry about this? No, I don't think so. There
>>> are indeed cases where the assumption of large N is broken. And under
>>> those circumstances, any fixed threshold (0.143, 0.5, whatever) is
>>> dangerous. This is illustrated in figures of van Heel & Schatz
>>> (2005). Small boxes, high-symmetry, small objects in large boxes, and
>>> a number of other conditions can make fixed thresholds dangerous.
>>>
>>> It would indeed be better to use a non-fixed threshold. So why am I
>>> not using the 1/2-bit criterion in my own work? While numerically it
>>> behaves well at most resolution ranges, I was not convinced by
>>> Marin's derivation in 2005. Philosophically though, I think he's
>>> right - we should aim for FSC thresholds that are more robust to the
>>> kinds of edge cases mentioned above. It would be the right thing to do.
>>>
>>> Hope this helps,
>>>
>>> Alexis
>>>
>>> On Sun, Feb 16, 2020 at 9:00 AM Penczek, Pawel A
>>> <pawel.a.penc...@uth.tmc.edu> wrote:
>>>
>>> Marin,
>>>
>>> The statistics in 2010 review is fine. You may disagree with
>>> assumptions, but I can assure you the “statistics” (as you call
>>> it) is fine. Careful reading of the paper would reveal to you
>>> this much.

Re: [ccp4bb] [3dem] Which resolution?

2020-03-09 Thread James Holton
I'd say they are 1 bit each, since they are the answer to a yes-or-no 
question.


-James Holton
MAD Scientist

On 2/27/2020 6:32 PM, Keller, Jacob wrote:

How would one evaluate the information content of systematic absences?

JPK

On Feb 26, 2020 8:14 PM, James Holton  wrote:
In my opinion the threshold should be zero bits.  Yes, this is where 
CC1/2 = 0 (or FSC = 0).  If there is correlation then there is 
information, and why throw out information if there is information to 
be had?  Yes, this information comes with noise attached, but that is 
why we have weights.


It is also important to remember that zero intensity is still useful 
information.  Systematic absences are an excellent example.  They have 
no intensity at all, but they speak volumes about the structure.  In a 
similar way, high-angle zero-intensity observations also tell us 
something.  Ever tried unrestrained B factor refinement at poor 
resolution?  It is hard to do nowadays because of all the safety 
catches in modern software, but you can get great R factors this way.  
A telltale sign of this kind of "over fitting" is remarkably large 
Fcalc values beyond the resolution cutoff.  These don't contribute to 
the R factor, however, because Fobs is missing for these hkls. So, 
including zero-intensity data suppresses at least some types of 
over-fitting.


The thing I like most about the zero-information resolution cutoff is 
that it forces us to address the real problem: what do you mean by 
"resolution" ?  Not long ago, claiming your resolution was 3.0 A meant 
that after discarding all spots with individual I/sigI < 3 you still 
have 80% completeness in the 3.0 A bin.  Now we are saying we have a 
3.0 A data set when we can prove statistically that a few 
non-background counts fell into the sum of all spot areas at 3.0 A.  
These are not the same thing.


Don't get me wrong, including the weak high-resolution information 
makes the model better, and indeed I am even advocating including all 
the noisy zeroes.  However, weak data at 3.0 A is never going to be as 
good as having strong data at 3.0 A.  So, how do we decide?  I 
personally think that the resolution assigned to the PDB deposition 
should remain the classical I/sigI > 3 at 80% rule.  This is really 
the only way to have meaningful comparison of resolution between very 
old and very new structures.  One should, of course, deposit all the 
data, but don't claim that cut-off as your "resolution".  That is just 
plain unfair to those who came before.


Oh yeah, and I also have a session on "interpreting low-resolution 
maps" at the GRC this year. 
https://www.grc.org/diffraction-methods-in-structural-biology-conference/2020/


So, please, let the discussion continue!

-James Holton
MAD Scientist

On 2/22/2020 11:06 AM, Nave, Colin (DLSLtd,RAL,LSCI) wrote:


Alexis

This is a very useful summary.

You say you were not convinced by Marin's derivation in 2005. Are you 
convinced now and, if not, why?


My interest in this is that the FSC with half bit thresholds have the 
danger of being adopted elsewhere because they are becoming standard 
for protein structure determination (by EM or MX). If it is used for 
these mature techniques it must be right!


It is the adoption of the ½ bit threshold I worry about. I gave a 
rather weak example for MX which consisted of partial occupancy of 
side chains, substrates etc. For x-ray imaging a wide range of 
contrasts can occur and, if you want to see features with only a 
small contrast above the surroundings then I think the half bit 
threshold would be inappropriate.


It would be good to see a clear message from the MX and EM 
communities as to why an information content threshold of ½ a bit is 
generally appropriate for these techniques and an acknowledgement 
that this threshold is technique/problem dependent.


We might then progress from the bronze age to the iron age.

Regards

Colin

From: CCP4 bulletin board  On Behalf Of Alexis Rohou
Sent: 21 February 2020 16:35
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] [3dem] Which resolution?

Hi all,

For those bewildered by Marin's insistence that everyone's been 
messing up their stats since the bronze age, I'd like to offer what 
my understanding of the situation. More details in this thread from a 
few years ago on the exact same topic:


https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003939.html 


https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html 


Notwithstanding notational problems (e.g. strict equations as opposed 
to approximation

Re: [ccp4bb] [3dem] Which resolution?

2020-03-09 Thread James Holton
Well, in the example I made, the individual atomic B factors, the Wilson 
B, and the "true B-factor" were all the same thing.  Keeps it simple.


But to be clear, yes: the B factors at the end of refinement are not the 
"true B-factors", they are our best estimation of them.  Given that no 
protein model has ever explained the data to within experimental error, 
there is clearly a difference between these two things.


I suppose suspicions of B factors arise historically because of how much 
they have been abused.  In all our efforts to fit Gaussian pegs into 
weird-shaped holes we have done a lot of crazy things with B factors. 
Observations/parameters and all that.  You can usually tell something is 
amiss when you see physically unreasonable things, like covalently 
bonded neighbors having drastically different Bs.  And of course, like 
all good validation methods these eventually turned into restraints.  
Thru-bond and thru-space B factor restraints are the default now.


That said, I do expect that the "average" B factor is accurate for most 
structures.  This is because even a small error in overall B is easily 
corrected in scaling.  The distribution of atomic B factors, however, is 
harder.  The connection to the Wilson B depends on which part of the 
Wilson plot you look at.  Any atoms with very large B factors (such as 
the waters in the bulk solvent) will not contribute significantly to 
high-angle structure factors.  So, if you are looking at the slope of 
the Wilson plot at high (-ish) angle you will not see anything coming 
from these atoms.  Similar is true for disordered side chains, etc.
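For what it's worth, "the slope of the Wilson plot at high-ish angle" amounts
to something like the sketch below (illustrative only: it ignores the
sum-of-scattering-factors term, so the number it returns is not a proper Wilson
B; d_max and d_min just pick the resolution range the slope is taken over):

import numpy as np

def wilson_b_estimate(intensities, d_spacings, d_max, d_min, n_bins=15):
    # Fit ln<I> vs (sin(theta)/lambda)^2 between d_max and d_min (Angstrom); B = -slope/2.
    intensities = np.asarray(intensities, dtype=float)
    d = np.asarray(d_spacings, dtype=float)
    sel = (d <= d_max) & (d >= d_min)
    s2 = (1.0 / (2.0 * d[sel])) ** 2                  # (sin(theta)/lambda)^2 = 1/(2d)^2
    edges = np.linspace(s2.min(), s2.max(), n_bins + 1)
    bins = np.clip(np.digitize(s2, edges) - 1, 0, n_bins - 1)
    x, y = [], []
    for b in range(n_bins):
        m = bins == b
        if m.any():
            x.append(s2[m].mean())
            y.append(np.log(intensities[sel][m].mean()))
    slope, _ = np.polyfit(x, y, 1)                    # ln<I> ~ const - 2*B*s^2
    return -slope / 2.0

Restricting d_max/d_min to the high-angle end, as described above, means atoms
with very large B factors simply never show up in the fit.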


A common way a refined B factor can go wrong is if the atom in question 
really should be split into two conformers.  In those cases the refined 
B factor will be too high, and too anisotropic.  You can also fill in 
little bits of noise or systematic error with low-occupancy water atoms 
that have B factors so sharp as to only contribute to a few map grid 
points.  These are probably not real. On the other hand, anything built 
into a wrong place or just general nothingness will often refine to a 
very large B factor.


So yes, all these are caveats in the model-vs-data difference. However, 
if you have a data set that extends to 1.8 A resolution those outermost 
spots will be due to the atoms in the structure with the lowest "true" B 
factors.  Atoms with high B factors don't really play a role in 
determining the outer resolution limit.


-James Holton
MAD Scientist

On 3/9/2020 1:07 AM, Dale Tronrud wrote:

Just a note: James Holton said "true B-factor" not "true B-factors".
  I believe he was talking about the overall B not the individual B's.

Dale Tronrud

On 3/8/2020 3:25 PM, Rangana Warshamanage wrote:

Sorry for not being clear enough.
If B-factors at the end of refinement are the "true B-factors" then they
represent a true property of data. They should be good enough to assess
the model quality directly. This is what I meant by B factor validation.
However, how far are the final B-factors similar to true B-factors is
another question.

Rangana


 On Sun, Mar 8, 2020 at 7:06 PM Ethan A Merritt <merr...@uw.edu> wrote:

 On Sunday, 8 March 2020 01:08:32 PDT Rangana Warshamanage wrote:
 > "The best estimate we have of the "true" B factor is the model B
 factors
 > we get at the end of refinement, once everything is converged,
 after we
 > have done all the building we can.  It is this "true B factor"
 that is a
 > property of the data, not the model, "
 >
 > If this is the case, why can't we use model B factors to validate our
 > structure? I know some people are skeptical about this approach
 because B
 > factors are refinable parameters.
 >
 > Rangana

 It is not clear to me exactly what you are asking.

 B factors _should_ be validated, precisely because they are refined
 parameters
 that are part of your model.   Where have you seen skepticism?

 Maybe you are thinking of the frequent question "should the averaged
 refined B
 equal the Wilson B reported by data processing?".  That discussion usually
 wanders off into explanations of why the Wilson B estimate is or is not
 reliable, what "average B" actually means, and so on.  For me the bottom
 line is that comparison of Bavg to the estimated Wilson B is an
 extremely
 weak validation test.  There are many better tests for model quality.

         Ethan








Re: [ccp4bb] [3dem] Which resolution?

2020-03-09 Thread Randy Read
dependent samples in a Fourier shell) is low, this bias is significant
> (2) thresholds in use today do not involve a significance test; they just 
> ignore the variance of the FSC as an estimator of SNR; to caricature, this is 
> like the whole field were satisfied with p values of ~0.5.
> (3) as far as I can tell, ignoring the bias and variance of the FSC as an 
> estimator of SNR is _mostly OK_ when doing global resolution estimates, when 
> the estimated resolution is pretty high (large n) and when the FSC curve has 
> a steep falloff. That's a lot of hand-waving, which I think we should aim to 
> dispense of.
> (4) when doing local resolution estimation using small sub-volumes in low-res 
> parts of maps, I'm convinced the fixed threshold are completely off.
> (5) I see no good reason to keep using fixed FSC thresholds, even for global 
> resolution estimates, but I still don't know whether Marin's 1/2-bit-based 
> FSC criterion is correct (if I had to bet, I'd say not). Aiming for 1/2-bit 
> information content per Fourier component may be the correct target to aim 
> for, and fixed threshold are definitely not the way to go, but I am not 
> convinced that the 2005 proposal is the correct way forward
> (6) I propose a framework for deriving non-fixed FSC thresholds based on 
> desired SNR and confidence levels. Under some conditions, my proposed 
> thresholds behave similarly to Marin's 1/2-bit-based curve, which convinces 
> me further that Marin really is onto something.
> 
> To re-iterate: the choice of target SNR (or information content) is 
> independent of the choice of SNR estimator and of statistical testing 
> framework.
> 
> Hope this helps,
> Alexis
> 
> 
> 
> On Sat, Feb 22, 2020 at 2:06 AM Nave, Colin (DLSLtd,RAL,LSCI) 
> <colin.n...@diamond.ac.uk> wrote:
> Alexis
> 
> This is a very useful summary.
> 
>  
> 
> You say you were not convinced by Marin's derivation in 2005. Are you 
> convinced now and, if not, why?
> 
>  
> 
> My interest in this is that the FSC with half bit thresholds have the danger 
> of being adopted elsewhere because they are becoming standard for protein 
> structure determination (by EM or MX). If it is used for these mature 
> techniques it must be right!
> 
>  
> 
> It is the adoption of the ½ bit threshold I worry about. I gave a rather weak 
> example for MX which consisted of partial occupancy of side chains, 
> substrates etc. For x-ray imaging a wide range of contrasts can occur and, if 
> you want to see features with only a small contrast above the surroundings 
> then I think the half bit threshold would be inappropriate.
> 
>  
> 
> It would be good to see a clear message from the MX and EM communities as to 
> why an information content threshold of ½ a bit is generally appropriate for 
> these techniques and an acknowledgement that this threshold is 
> technique/problem dependent.
> 
>  
> 
> We might then progress from the bronze age to the iron age.
> 
>  
> 
> Regards
> 
> Colin
> 
>  
> 
>  
> 
>  
> 
> From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> On Behalf Of Alexis Rohou
> Sent: 21 February 2020 16:35
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] [3dem] Which resolution?
> 
>  
> 
> Hi all,
> 
>  
> 
> For those bewildered by Marin's insistence that everyone's been messing up 
> their stats since the bronze age, I'd like to offer what my understanding of 
> the situation. More details in this thread from a few years ago on the exact 
> same topic: 
> 
> https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003939.html 
> https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html 
>  
> 
> Notwithstanding notational problems (e.g. strict equations as opposed to 
> approximation symbols, or omission of symbols to denote estimation), I 
> believe Frank & Al-Ali and "descendent" papers (e.g. appendix of Rosenthal & 
> Henderson 2003) are fine. The cross terms that Marin is agitated about indeed 
> do in fact have an expectation value of 0.0 (in the ensemble; if the 
> experiment were performed an infinite number of times with different 
> realizations of noise). I don't believe Pawel or Jose Maria or any of the 
> other authors really believe that the cross-terms are orthogonal.
> 
>  
> 
> When N (the number of independent Fouier voxels in a shell) is large enough, 
> mean(Signal x Noise) ~ 0.0 is only an approximation, but a pretty good one, 
> even for 

Re: [ccp4bb] [3dem] Which resolution?

2020-03-08 Thread Frank Von Delft
 as saying I did not fully understand the 
relationship itself and was a partial reason why I raised the issue in another 
message to this thread.
Who cares anyway about the headline resolution? Well, defining a resolution can 
be important if one wants to calculate the exposure required to see particular 
features and whether they are then degraded by radiation damage. This relates 
to the issue I raised concerning the Rose criterion. As an example one might 
have a virus particle with an average density of 1.1 embedded in an object (a 
biological cell) of density 1.0 (I am keeping the numbers simple). The virus 
has a diameter of 50nm. There are 5000 voxels in the image (the number 5000 was 
used by Rose when analysing images from televisions). This gives 5000 chances 
of a false alarm so, I want to ensure the signal to noise ratio in the image is 
sufficiently high. This is why Rose adopted a contrast to noise ratio of 5 
(Rose criterion K of 5). For each voxel in the image we need a noise level 
sufficiently low to identify the feature. For a Rose criterion of 5 and the 
contrast of 0.1 it means that we need an average (?) of 625 photons per Shannon 
reciprocal voxel (the “speckle” given by the object as a whole) at the required 
resolution (1/50nm) in order to achieve this. The expression for the required 
number of photons is (K/(2*C))**2. However, if we have already identified a 
candidate voxel for the virus (perhaps using labelled fluorescent methods) we 
can get away with a Rose criterion of 3 (equivalent to K=5 over 5000 pixels) 
and 225 photons will suffice. For this case, a signal to noise ratio of 3 
corresponds to a 0.0027 probability of the event occurring due to random 
noise. The information content is therefore -log2(0.0027), which is about 8.5 bits. I 
therefore have a real space information content of 8.5 bits and an average 225 
photons at the resolution limit. The question is to relate these and come up 
with the appropriate value for the FSC threshold so I can judge whether a 
particle with this low contrast can be identified. In the above example, the 
object (biological cell) as a whole has a defined boundary and forms a natural 
sharp-edged mask. The hard-edge mask (see van Heel and Schatz, section 4.7) is 
therefore present.
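The arithmetic in that paragraph, written out (nothing here beyond the numbers
quoted above):

import math

C = 0.1                                 # contrast of the virus over the cell
for K in (5, 3):                        # Rose criterion with and without a candidate voxel
    print(K, (K / (2 * C)) ** 2)        # required photons: 625.0 and 225.0

p_false = math.erfc(3 / math.sqrt(2))   # chance of a >3 sigma excursion from noise alone
print(p_false, -math.log2(p_false))     # ~0.0027 and ~8.5 bits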

I am sure Marin (or others) will put me right if there are mistakes in the 
above.

Finally, for those interested in the relationship between information content 
and probability, the article by Weaver (one of Shannon’s collaborators) gives a 
non-mathematical and perhaps philosophical description. It can be found at
http://www.mt-archive.info/50/SciAm-1949-Weaver.pdf

Sorry for the long reply – but at least some of it was requested!

Colin



From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> On Behalf Of colin.n...@diamond.ac.uk
Sent: 17 February 2020 11:26
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] FW: [ccp4bb] [3dem] Which resolution?


Dear all.
Would it help to separate out the issue of the FSC from the value of the 
threshold? My understanding is that the FSC addresses the spatial frequency at 
which there is a reliable information content in the image. This concept should 
apply to a wide variety of types of image. The issue is then what value of the 
threshold to use. For interpretation of protein structures (whether by x-ray or 
electron microscopy), a half bit threshold appears to be appropriate. However, 
for imaging the human brain (one of Marin’s examples) a higher threshold might 
be adopted as a range of contrasts might be present (axons for example have a 
similar density to the surroundings). For crystallography, if one wants to see 
lighter atoms (hydrogens in the presence of uranium or in proteins) a higher 
threshold might also be appropriate. I am not sure about this to be honest as a 
2 bit threshold (for example) would mean that there is information to higher 
resolution at a threshold of a half bit (unless one is at a diffraction or 
instrument limited resolution).

Most CCP4BBers will understand that a single number is not good enough. 
However, many users of the protein structure databases will simply search for 
the structure with the highest named resolution. It might be difficult to send 
these users to re-education camps.

Regards
Colin

From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> On Behalf Of Petrus Zwart
Sent: 16 February 2020 21:50
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] [3dem] Which resolution?

Hi All,

How is the 'correct' resolution estimation related to the estimated error on 
some observed hydrogen bond length of interest, or an error on the estimated 
occupancy of a ligand or conformation or anything else that has structural 
significance?

In crystallography, it isn't really (only in some very approximate fashion), 
and I doubt that in EM there is something to that effect. If you want

Re: [ccp4bb] [3dem] Which resolution?

2020-03-08 Thread Alexis Rohou
Hi Colin,

It sounds to me like you are mostly asking about whether 1/2-bit is the
"correct" target to aim for, the "correct" criterion for a resolution
claim. I have no view on that. I have yet to read Randy's work on the topic
- it sounds very informative.

What I do have a view on is, once one has decided one likes 1/2 bit
information content (equiv to SNR 0.207) or C_ref = 0.5, aka FSC=0.143
(equiv to SNR 0.167) as a criterion, how one should turn that into an FSC
threshold.

You say you were not convinced by Marin's derivation in 2005. Are you
> convinced now and, if not, why?


No. I was unable to follow Marin's derivation then, and last I tried (a
couple of years back), I was still unable to follow it. This is despite
being convinced that Marin is correct that fixed FSC thresholds are not
desirable. To be clear, my objections have nothing to do with whether
1/2-bit is an appropriate criterion, they're entirely about how you turn a
target SNR into an FSC threshold.
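For concreteness, the conversions involved look like this (FSC = SNR/(1+SNR) is
the Frank & Al-Ali relation discussed in this thread, C_ref = sqrt(2*FSC/(1+FSC))
is Rosenthal & Henderson's; the "bits" line is just one convention that
reproduces the equivalences quoted above, and should be read as an assumption
of the sketch rather than anyone's definition):

import math

def fsc_from_snr(snr):        # half-map SNR -> expected FSC
    return snr / (1.0 + snr)

def snr_from_fsc(fsc):        # inverse of the above
    return fsc / (1.0 - fsc)

def cref_from_fsc(fsc):       # correlation of the averaged map against the truth
    return math.sqrt(2.0 * fsc / (1.0 + fsc))

def bits_from_snr(snr):       # info per voxel if the full map has SNR = 2*snr
    return math.log2(1.0 + 2.0 * snr)

print(snr_from_fsc(0.143))    # ~0.167, as quoted for FSC = 0.143
print(cref_from_fsc(0.143))   # ~0.5, i.e. C_ref = 0.5
print(fsc_from_snr(0.207))    # ~0.17, the FSC equivalent of SNR 0.207 quoted above
print(bits_from_snr(0.207))   # ~0.5 bit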

A few years ago, an equivalent thread on 3DEM/CCPEM (I think CCP4BB was
spared) led me to re-examine the foundations of the use of the FSC in
general. You can read more details in the manuscript I posted to bioRxiv a
few days ago (https://www.biorxiv.org/content/10.1101/2020.03.01.972067v1),
but essentially I conclude that:

(1) fixed-threshold criteria are not always appropriate, because they rely
on a biased estimator of the SNR, and in cases where *n* (the number of
independent samples in a Fourier shell) is low, this bias is significant
(a toy simulation of this is sketched just after this list)
(2) thresholds in use today do not involve a significance test; they just
ignore the variance of the FSC as an estimator of SNR; to caricature, this
is like the whole field were satisfied with p values of ~0.5.
(3) as far as I can tell, ignoring the bias and variance of the FSC as an
estimator of SNR is _mostly OK_ when doing global resolution estimates,
when the estimated resolution is pretty high (large *n*) and when the FSC
curve has a steep falloff. That's a lot of hand-waving, which I think we
should aim to dispense of.
(4) when doing local resolution estimation using small sub-volumes in
low-res parts of maps, I'm convinced the fixed threshold are completely off.
(5) I see no good reason to keep using fixed FSC thresholds, even for
global resolution estimates, but I still don't know whether Marin's
1/2-bit-based FSC criterion is correct (if I had to bet, I'd say not).
Aiming for 1/2-bit information content per Fourier component may be the
correct target to aim for, and fixed threshold are definitely not the way
to go, but I am not convinced that the 2005 proposal is the correct way
forward
(6) I propose a framework for deriving non-fixed FSC thresholds based on
desired SNR and confidence levels. Under some conditions, my proposed
thresholds behave similarly to Marin's 1/2-bit-based curve, which convinces
me further that Marin really is onto something.
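A quick, purely illustrative simulation of points (1)-(4), not taken from the
manuscript: the FSC of a single shell with n independent complex components is
a noisy, and for small n biased, estimator of SNR/(1+SNR).

import numpy as np

rng = np.random.default_rng(0)

def one_fsc(n, snr):
    # One shell: two half-maps share a common signal of power snr; unit-power noise each.
    sig = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) * np.sqrt(snr / 2.0)
    x1 = sig + (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2.0)
    x2 = sig + (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2.0)
    num = np.real(np.sum(x1 * np.conj(x2)))
    return num / np.sqrt(np.sum(np.abs(x1) ** 2) * np.sum(np.abs(x2) ** 2))

# Half-map SNR 0.167, so the large-n expectation is ~0.143, the usual fixed threshold.
for n in (10, 100, 10000):
    fscs = np.array([one_fsc(n, 0.167) for _ in range(2000)])
    print(n, round(float(fscs.mean()), 3), round(float(fscs.std()), 3))

For n = 10 the scatter dwarfs the 0.143 threshold itself; for n = 10000 it is
negligible, which is the "large n" hand-waving referred to in (3).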

To re-iterate: the choice of target SNR (or information content) is
independent of the choice of SNR estimator and of statistical testing
framework.

Hope this helps,
Alexis



On Sat, Feb 22, 2020 at 2:06 AM Nave, Colin (DLSLtd,RAL,LSCI) <
colin.n...@diamond.ac.uk> wrote:

> Alexis
>
> This is a very useful summary.
>
>
>
> You say you were not convinced by Marin's derivation in 2005. Are you
> convinced now and, if not, why?
>
>
>
> My interest in this is that the FSC with half bit thresholds have the
> danger of being adopted elsewhere because they are becoming standard for
> protein structure determination (by EM or MX). If it is used for these
> mature techniques it must be right!
>
>
>
> It is the adoption of the ½ bit threshold I worry about. I gave a rather
> weak example for MX which consisted of partial occupancy of side chains,
> substrates etc. For x-ray imaging a wide range of contrasts can occur and,
> if you want to see features with only a small contrast above the
> surroundings then I think the half bit threshold would be inappropriate.
>
>
>
> It would be good to see a clear message from the MX and EM communities as
> to why an information content threshold of ½ a bit is generally appropriate
> for these techniques and an acknowledgement that this threshold is
> technique/problem dependent.
>
>
>
> We might then progress from the bronze age to the iron age.
>
>
>
> Regards
>
> Colin
>
>
>
>
>
>
>
> *From:* CCP4 bulletin board  *On Behalf Of *Alexis
> Rohou
> *Sent:* 21 February 2020 16:35
> *To:* CCP4BB@JISCMAIL.AC.UK
> *Subject:* Re: [ccp4bb] [3dem] Which resolution?
>
>
>
> Hi all,
>
>
>
> For those bewildered by Marin's insistence that everyone's been messing up
> their stats since the bronze age, I'd like to offer what my understanding
> of the situation. More details in this thread from a few years ago on the
>

Re: [ccp4bb] [3dem] Which resolution?

2020-03-08 Thread Dale Tronrud
   Just a note: James Holton said "true B-factor" not "true B-factors".
 I believe he was talking about the overall B not the individual B's.

Dale Tronrud

On 3/8/2020 3:25 PM, Rangana Warshamanage wrote:
> Sorry for not being clear enough.
> If B-factors at the end of refinement are the "true B-factors" then they
> represent a true property of data. They should be good enough to assess
> the model quality directly. This is what I meant by B factor validation.
> However, how far are the final B-factors similar to true B-factors is
> another question.
> 
> Rangana
> 
> 
> On Sun, Mar 8, 2020 at 7:06 PM Ethan A Merritt wrote:
> 
> On Sunday, 8 March 2020 01:08:32 PDT Rangana Warshamanage wrote:
> > "The best estimate we have of the "true" B factor is the model B
> factors
> > we get at the end of refinement, once everything is converged,
> after we
> > have done all the building we can.  It is this "true B factor"
> that is a
> > property of the data, not the model, "
> >
> > If this is the case, why can't we use model B factors to validate our
> > structure? I know some people are skeptical about this approach
> because B
> > factors are refinable parameters.
> >
> > Rangana
> 
> It is not clear to me exactly what you are asking.
> 
> B factors _should_ be validated, precisely because they are refined
> parameters
> that are part of your model.   Where have you seen skepticism?
> 
> Maybe you are thinking of the frequent question "should the averaged
> refined B
> equal the Wilson B reported by data processing?".  That discussion usually
> wanders off into explanations of why the Wilson B estimate is or is not
> reliable, what "average B" actually means, and so on.  For me the bottom
> line is that comparison of Bavg to the estimated Wilson B is an
> extremely
> weak validation test.  There are many better tests for model quality.
> 
>         Ethan
> 
> 
> 
> 
> 
> 
> 


Re: [ccp4bb] [3dem] Which resolution?

2020-03-08 Thread Rangana Warshamanage
Sorry for not being clear enough.
If B-factors at the end of refinement are the "true B-factors" then they
represent a true property of data. They should be good enough to assess the
model quality directly. This is what I meant by B factor validation.
However, how far are the final B-factors similar to true B-factors is
another question.

Rangana


On Sun, Mar 8, 2020 at 7:06 PM Ethan A Merritt  wrote:

> On Sunday, 8 March 2020 01:08:32 PDT Rangana Warshamanage wrote:
> > "The best estimate we have of the "true" B factor is the model B factors
> > we get at the end of refinement, once everything is converged, after we
> > have done all the building we can.  It is this "true B factor" that is a
> > property of the data, not the model, "
> >
> > If this is the case, why can't we use model B factors to validate our
> > structure? I know some people are skeptical about this approach because B
> > factors are refinable parameters.
> >
> > Rangana
>
> It is not clear to me exactly what you are asking.
>
> B factors _should_ be validated, precisely because they are refined
> parameters
> that are part of your model.   Where have you seen skepticism?
>
> Maybe you are thinking of the frequent question "should the averaged refined B
> equal the Wilson B reported by data processing?".  That discussion usually
> wanders off into explanations of why the Wilson B estimate is or is not
> reliable, what "average B" actually means, and so on.  For me the bottom
> line is that comparison of Bavg to the estimated Wilson B is an extremely
> weak validation test.  There are many better tests for model quality.
>
> Ethan
>
>
>
>
>





Re: [ccp4bb] [3dem] Which resolution?

2020-03-08 Thread Ethan A Merritt
On Sunday, 8 March 2020 01:08:32 PDT Rangana Warshamanage wrote:
> "The best estimate we have of the "true" B factor is the model B factors
> we get at the end of refinement, once everything is converged, after we
> have done all the building we can.  It is this "true B factor" that is a
> property of the data, not the model, "
> 
> If this is the case, why can't we use model B factors to validate our
> structure? I know some people are skeptical about this approach because B
> factors are refinable parameters.
> 
> Rangana

It is not clear to me exactly what you are asking.

B factors _should_ be validated, precisely because they are refined parameters
that are part of your model.   Where have you seen skepticism?

Maybe you are thinking of the frequent question "should the averaged refined B
equal the Wilson B reported by data processing?".  That discussion usually
wanders off into explanations of why the Wilson B estimate is or is not
reliable, what "average B" actually means, and so on.  For me the bottom
line is that comparison of Bavg to the estimated Wilson B is an extremely
weak validation test.  There are many better tests for model quality.

Ethan
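
For anyone who wants to run that "weak test" anyway, here is a minimal sketch
(my own illustration, not code from any of the programs mentioned) that computes
Bavg from a model, to be compared against the Wilson B reported by data
processing.  It assumes the standard fixed-column PDB format, with the B factor
in columns 61-66.

# minimal sketch: average model B from a PDB file (fixed-column format assumed)
def average_b(pdb_path):
    b_values = []
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith(("ATOM", "HETATM")):
                b_values.append(float(line[60:66]))  # columns 61-66 hold B
    return sum(b_values) / len(b_values)

# example: print("Bavg = %.2f" % average_b("model.pdb"))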





Re: [ccp4bb] [3dem] Which resolution?

2020-03-08 Thread Kay Diederichs
we can use model B factors to validate structures - see 

Analysis and validation of macromolecular B values
R. C. Masmaliyeva and G. N. Murshudov
Acta Cryst. (2019). D75, 505-518
https://doi.org/10.1107/S2059798319004807

HTH
Kay
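
As a rough, hedged illustration of that kind of B-value analysis (my own sketch,
not code from the paper, and assuming, as I understand it, that the paper models
B values with a shifted inverse-gamma distribution):

import numpy as np
from scipy import stats

def fit_b_distribution(b_values):
    # fit a shifted inverse-gamma to the model B values; "loc" is the shift
    b = np.asarray(b_values, dtype=float)
    shape, loc, scale = stats.invgamma.fit(b)
    return shape, loc, scale

# atoms that are extreme outliers relative to the fitted distribution are the
# ones worth a second look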


On Sun, 8 Mar 2020 09:08:32 +, Rangana Warshamanage  
wrote:

>"The best estimate we have of the "true" B factor is the model B factors
>we get at the end of refinement, once everything is converged, after we
>have done all the building we can.  It is this "true B factor" that is a
>property of the data, not the model, "
>
>If this is the case, why can't we use model B factors to validate our
>structure? I know some people are skeptical about this approach because B
>factors are refinable parameters.
>
>Rangana
>
>On Sat, Mar 7, 2020 at 8:01 PM James Holton  wrote:
>
>> Yes, that's right.  Model B factors are fit to the data.  That Boverall
>> gets added to all atomic B factors in the model before the structure is
>> written out, yes?
>>
>> The best estimate we have of the "true" B factor is the model B factors
>> we get at the end of refinement, once everything is converged, after we
>> have done all the building we can.  It is this "true B factor" that is a
>> property of the data, not the model, and it has the relationship to
>> resolution and map appearance that I describe below.  Does that make sense?
>>
>> -James Holton
>> MAD Scientist
>>
>> On 3/7/2020 10:45 AM, dusan turk wrote:
>> > James,
>> >
>> > The case you’ve chosen is not a good illustration of the relationship
>> between atomic B and resolution.   The problem is that during scaling of
>> Fcalc to Fobs also B-factor difference between the two sets of numbers is
>> minimized. In the simplest form  with two constants Koverall and Boverall
>> it looks like this:
>> >
>> > sum_to_be_minimized = sum (FOBS**2 -  Koverall * FCALC**2 * exp(-1/d**2
>> * Boverall) )
>> >
>> > Then one can include bulk solvent correction, anisotropic scaling, … In
>> PHENIX it gets quite complex.
>> >
>> > Hence, almost regardless of the average model B you will always get the
>> same map, because the “B" of the map will reflect the B of the FOBS.  When
>> all atomic Bs are equal then they are also equal to average B.
>> >
>> > best, dusan
>> >
>> >
>> >> On 7 Mar 2020, at 01:01, CCP4BB automatic digest system <
>> lists...@jiscmail.ac.uk> wrote:
>> >>
>> >>> On Thu, 5 Mar 2020 01:11:33 +0100, James Holton 
>> wrote:
>> >>>
>>  The funny thing is, although we generally regard resolution as a
>> primary
>>  indicator of data quality the appearance of a density map at the
>> classic
>>  "1-sigma" contour has very little to do with resolution, and
>> everything
>>  to do with the B factor.
>> 
>>  Seriously, try it. Take any structure you like, set all the B factors
>> to
>>  30 with PDBSET, calculate a map with SFALL or phenix.fmodel and have a
>>  look at the density of tyrosine (Tyr) side chains.  Even if you
>>  calculate structure factors all the way out to 1.0 A the holes in the
>>  Tyr rings look exactly the same: just barely starting to form.  This
>> is
>>  because the structure factors from atoms with B=30 are essentially
>> zero
>>  out at 1.0 A, and adding zeroes does not change the map.  You can
>> adjust
>>  the contour level, of course, and solvent content will have some
>> effect
>>  on where the "1-sigma" contour lies, but generally B=30 is the point
>>  where Tyr side chains start to form their holes.  Traditionally, this
>> is
>>  attributed to 1.8A resolution, but it is really at B=30.  The point
>>  where waters first start to poke out above the 1-sigma contour is at
>>  B=60, despite being generally attributed to d=2.7A.
>> 
>>  Now, of course, if you cut off this B=30 data at 3.5A then the Tyr
>> side
>>  chains become blobs, but that is equivalent to collecting data with
>> the
>>  detector way too far away and losing your high-resolution spots off
>> the
>>  edges.  I have seen a few people do that, but not usually for a
>>  published structure.  Most people fight very hard for those faint,
>>  barely-existing high-angle spots.  But why do we do that if the map is
>>  going to look the same anyway?  The reason is because resolution and B
>>  factors are linked.
>> 
>>  Resolution is about separation vs width, and the width of the density
>>  peak from any atom is set by its B factor.  Yes, atoms have an
>> intrinsic
>>  width, but it is very quickly washed out by even modest B factors (B >
>>  10).  This is true for both x-ray and electron form factors. To a very
>>  good approximation, the FWHM of C, N and O atoms is given by:
>>  FWHM= sqrt(B*log(2))/pi+0.15
>> 
>>  where "B" is the B factor assigned to the atom and the 0.15 fudge
>> factor
>>  accounts for its intrinsic width when B=0.  Now that we know the peak
>>  width, we can start to ask if two peaks are 

Re: [ccp4bb] [3dem] Which resolution?

2020-03-08 Thread Rangana Warshamanage
"The best estimate we have of the "true" B factor is the model B factors
we get at the end of refinement, once everything is converged, after we
have done all the building we can.  It is this "true B factor" that is a
property of the data, not the model, "

If this is the case, why can't we use model B factors to validate our
structure? I know some people are skeptical about this approach because B
factors are refinable parameters.

Rangana

On Sat, Mar 7, 2020 at 8:01 PM James Holton  wrote:

> Yes, that's right.  Model B factors are fit to the data.  That Boverall
> gets added to all atomic B factors in the model before the structure is
> written out, yes?
>
> The best estimate we have of the "true" B factor is the model B factors
> we get at the end of refinement, once everything is converged, after we
> have done all the building we can.  It is this "true B factor" that is a
> property of the data, not the model, and it has the relationship to
> resolution and map appearance that I describe below.  Does that make sense?
>
> -James Holton
> MAD Scientist
>
> On 3/7/2020 10:45 AM, dusan turk wrote:
> > James,
> >
> > The case you’ve chosen is not a good illustration of the relationship
> between atomic B and resolution.   The problem is that during scaling of
> Fcalc to Fobs also B-factor difference between the two sets of numbers is
> minimized. In the simplest form  with two constants Koverall and Boverall
> it looks like this:
> >
> > sum_to_be_minimized = sum (FOBS**2 -  Koverall * FCALC**2 * exp(-1/d**2
> * Boverall) )
> >
> > Then one can include bulk solvent correction, anisotropic scaling, … In
> PHENIX it gets quite complex.
> >
> > Hence, almost regardless of the average model B you will always get the
> same map, because the “B" of the map will reflect the B of the FOBS.  When
> all atomic Bs are equal then they are also equal to average B.
> >
> > best, dusan
> >
> >
> >> On 7 Mar 2020, at 01:01, CCP4BB automatic digest system <
> lists...@jiscmail.ac.uk> wrote:
> >>
> >>> On Thu, 5 Mar 2020 01:11:33 +0100, James Holton 
> wrote:
> >>>
>  The funny thing is, although we generally regard resolution as a
> primary
>  indicator of data quality the appearance of a density map at the
> classic
>  "1-sigma" contour has very little to do with resolution, and
> everything
>  to do with the B factor.
> 
>  Seriously, try it. Take any structure you like, set all the B factors
> to
>  30 with PDBSET, calculate a map with SFALL or phenix.fmodel and have a
>  look at the density of tyrosine (Tyr) side chains.  Even if you
>  calculate structure factors all the way out to 1.0 A the holes in the
>  Tyr rings look exactly the same: just barely starting to form.  This
> is
>  because the structure factors from atoms with B=30 are essentially
> zero
>  out at 1.0 A, and adding zeroes does not change the map.  You can
> adjust
>  the contour level, of course, and solvent content will have some
> effect
>  on where the "1-sigma" contour lies, but generally B=30 is the point
>  where Tyr side chains start to form their holes.  Traditionally, this
> is
>  attributed to 1.8A resolution, but it is really at B=30.  The point
>  where waters first start to poke out above the 1-sigma contour is at
>  B=60, despite being generally attributed to d=2.7A.
> 
>  Now, of course, if you cut off this B=30 data at 3.5A then the Tyr
> side
>  chains become blobs, but that is equivalent to collecting data with
> the
>  detector way too far away and losing your high-resolution spots off
> the
>  edges.  I have seen a few people do that, but not usually for a
>  published structure.  Most people fight very hard for those faint,
>  barely-existing high-angle spots.  But why do we do that if the map is
>  going to look the same anyway?  The reason is because resolution and B
>  factors are linked.
> 
>  Resolution is about separation vs width, and the width of the density
>  peak from any atom is set by its B factor.  Yes, atoms have an
> intrinsic
>  width, but it is very quickly washed out by even modest B factors (B >
>  10).  This is true for both x-ray and electron form factors. To a very
>  good approximation, the FWHM of C, N and O atoms is given by:
>  FWHM= sqrt(B*log(2))/pi+0.15
> 
>  where "B" is the B factor assigned to the atom and the 0.15 fudge
> factor
>  accounts for its intrinsic width when B=0.  Now that we know the peak
>  width, we can start to ask if two peaks are "resolved".
> 
>  Start with the classical definition of "resolution" (call it after
> Airy,
>  Rayleigh, Dawes, or whatever famous person you like), but essentially
> you
>  are asking the question: "how close can two peaks be before they merge
>  into one peak?".  For Gaussian peaks this is 0.849*FWHM. Simple
> enough.
>  However, when you look at the density of two atoms this 

Re: [ccp4bb] [3dem] Which resolution?

2020-03-07 Thread Robert Stroud
James' answer seems right, and makes sense - and makes sense by experience too.
Bob
Robert Stroud
str...@msg.ucsf.edu



> On Mar 7, 2020, at 12:01 PM, James Holton  wrote:
> 
> Yes, that's right.  Model B factors are fit to the data.  That Boverall gets 
> added to all atomic B factors in the model before the structure is written 
> out, yes?
> 
> The best estimate we have of the "true" B factor is the model B factors we 
> get at the end of refinement, once everything is converged, after we have 
> done all the building we can.  It is this "true B factor" that is a property 
> of the data, not the model, and it has the relationship to resolution and map 
> appearance that I describe below.  Does that make sense?
> 
> -James Holton
> MAD Scientist
> 
> On 3/7/2020 10:45 AM, dusan turk wrote:
>> James,
>> 
>> The case you’ve chosen is not a good illustration of the relationship 
>> between atomic B and resolution.   The problem is that during scaling of 
>> Fcalc to Fobs also B-factor difference between the two sets of numbers is 
>> minimized. In the simplest form  with two constants Koverall and Boverall it 
>> looks like this:
>> 
>> sum_to_be_minimized = sum (FOBS**2 -  Koverall * FCALC**2 * exp(-1/d**2 * 
>> Boverall) )
>> 
>> Then one can include bulk solvent correction, anisotropic scaling, … In
>> PHENIX it gets quite complex.
>> 
>> Hence, almost regardless of the average model B you will always get the same 
>> map, because the “B" of the map will reflect the B of the FOBS.  When all 
>> atomic Bs are equal then they are also equal to average B.
>> 
>> best, dusan
>> 
>> 
>>> On 7 Mar 2020, at 01:01, CCP4BB automatic digest system 
>>>  wrote:
>>> 
 On Thu, 5 Mar 2020 01:11:33 +0100, James Holton  wrote:
 
> The funny thing is, although we generally regard resolution as a primary
> indicator of data quality the appearance of a density map at the classic
> "1-sigma" contour has very little to do with resolution, and everything
> to do with the B factor.
> 
> Seriously, try it. Take any structure you like, set all the B factors to
> 30 with PDBSET, calculate a map with SFALL or phenix.fmodel and have a
> look at the density of tyrosine (Tyr) side chains.  Even if you
> calculate structure factors all the way out to 1.0 A the holes in the
> Tyr rings look exactly the same: just barely starting to form.  This is
> because the structure factors from atoms with B=30 are essentially zero
> out at 1.0 A, and adding zeroes does not change the map.  You can adjust
> the contour level, of course, and solvent content will have some effect
> on where the "1-sigma" contour lies, but generally B=30 is the point
> where Tyr side chains start to form their holes.  Traditionally, this is
> attributed to 1.8A resolution, but it is really at B=30.  The point
> where waters first start to poke out above the 1-sigma contour is at
> B=60, despite being generally attributed to d=2.7A.
> 
> Now, of course, if you cut off this B=30 data at 3.5A then the Tyr side
> chains become blobs, but that is equivalent to collecting data with the
> detector way too far away and losing your high-resolution spots off the
> edges.  I have seen a few people do that, but not usually for a
> published structure.  Most people fight very hard for those faint,
> barely-existing high-angle spots.  But why do we do that if the map is
> going to look the same anyway?  The reason is because resolution and B
> factors are linked.
> 
> Resolution is about separation vs width, and the width of the density
> peak from any atom is set by its B factor.  Yes, atoms have an intrinsic
> width, but it is very quickly washed out by even modest B factors (B >
> 10).  This is true for both x-ray and electron form factors. To a very
> good approximation, the FWHM of C, N and O atoms is given by:
> FWHM= sqrt(B*log(2))/pi+0.15
> 
> where "B" is the B factor assigned to the atom and the 0.15 fudge factor
> accounts for its intrinsic width when B=0.  Now that we know the peak
> width, we can start to ask if two peaks are "resolved".
> 
> Start with the classical definition of "resolution" (call it after Airy,
> Rayleigh, Dawes, or whatever famous person you like), but essentially you
> are asking the question: "how close can two peaks be before they merge
> into one peak?".  For Gaussian peaks this is 0.849*FWHM. Simple enough.
> However, when you look at the density of two atoms this far apart you
> will see the peak is highly oblong. Yes, the density has one maximum,
> but there are clearly two atoms in there.  It is also pretty obvious the
> long axis of the peak is the line between the two atoms, and if you fit
> two round atoms into this peak you recover the distance between them
> quite accurately.  Are they really not "resolved" if it is so clear

Re: [ccp4bb] [3dem] Which resolution?

2020-03-07 Thread dusan turk
James,

> On 7 Mar 2020, at 21:01, James Holton  wrote:
> 
> Yes, that's right.  Model B factors are fit to the data.  That Boverall gets 
> added to all atomic B factors in the model before the structure is written 
> out, yes?

Almost true. It depends on how the programs are written. In MAIN this is not
necessary.

> The best estimate we have of the "true" B factor is the model B factors we 
> get at the end of refinement, once everything is converged, after we have 
> done all the building we can.  It is this "true B factor" that is a property 
> of the data, not the model, and it has the relationship to resolution and map 
> appearance that I describe below.  Does that make sense?

This is how it almost always is. Sometimes the best fit is achieved when the
model Baverage is higher than the Fcalc-to-Fobs fit would suggest. In such
cases the difference is subtracted again during Fcalc-to-Fobs scaling. I did
not investigate this any further, but maybe someone else has an idea or an
already established solution.

best, dusan



> 
> -James Holton





Re: [ccp4bb] [3dem] Which resolution?

2020-03-07 Thread James Holton
Yes, that's right.  Model B factors are fit to the data.  That Boverall 
gets added to all atomic B factors in the model before the structure is 
written out, yes?


The best estimate we have of the "true" B factor is the model B factors 
we get at the end of refinement, once everything is converged, after we 
have done all the building we can.  It is this "true B factor" that is a 
property of the data, not the model, and it has the relationship to 
resolution and map appearance that I describe below.  Does that make sense?


-James Holton
MAD Scientist

On 3/7/2020 10:45 AM, dusan turk wrote:

James,

The case you’ve chosen is not a good illustration of the relationship between
atomic B and resolution.  The problem is that during scaling of Fcalc to Fobs,
the B-factor difference between the two sets of numbers is also minimized. In
the simplest form, with two constants Koverall and Boverall, it looks like this:

sum_to_be_minimized = sum (FOBS**2 -  Koverall * FCALC**2 * exp(-1/d**2 * 
Boverall) )

Then one can include bulk solvent correction, anisotropic scaling, … In PHENIX
it gets quite complex.

Hence, almost regardless of the average model B you will always get the same map, 
because the “B" of the map will reflect the B of the FOBS.  When all atomic Bs 
are equal then they are also equal to average B.

best, dusan



On 7 Mar 2020, at 01:01, CCP4BB automatic digest system 
 wrote:


On Thu, 5 Mar 2020 01:11:33 +0100, James Holton  wrote:


The funny thing is, although we generally regard resolution as a primary
indicator of data quality the appearance of a density map at the classic
"1-sigma" contour has very little to do with resolution, and everything
to do with the B factor.

Seriously, try it. Take any structure you like, set all the B factors to
30 with PDBSET, calculate a map with SFALL or phenix.fmodel and have a
look at the density of tyrosine (Tyr) side chains.  Even if you
calculate structure factors all the way out to 1.0 A the holes in the
Tyr rings look exactly the same: just barely starting to form.  This is
because the structure factors from atoms with B=30 are essentially zero
out at 1.0 A, and adding zeroes does not change the map.  You can adjust
the contour level, of course, and solvent content will have some effect
on where the "1-sigma" contour lies, but generally B=30 is the point
where Tyr side chains start to form their holes.  Traditionally, this is
attributed to 1.8A resolution, but it is really at B=30.  The point
where waters first start to poke out above the 1-sigma contour is at
B=60, despite being generally attributed to d=2.7A.
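
If you want to try this without PDBSET, here is a minimal sketch (mine, not from
the thread) that does the same first step by hand - overwrite every atomic B
factor with 30 - after which the map is calculated with SFALL or phenix.fmodel
exactly as described above.  It assumes fixed-column PDB records with B in
columns 61-66.

def set_all_b(pdb_in, pdb_out, b_new=30.0):
    with open(pdb_in) as fin, open(pdb_out, "w") as fout:
        for line in fin:
            if line.startswith(("ATOM", "HETATM")):
                # B factor lives in columns 61-66 of fixed-format PDB records
                line = line[:60] + "%6.2f" % b_new + line[66:]
            fout.write(line)

# set_all_b("input.pdb", "allB30.pdb")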

Now, of course, if you cut off this B=30 data at 3.5A then the Tyr side
chains become blobs, but that is equivalent to collecting data with the
detector way too far away and losing your high-resolution spots off the
edges.  I have seen a few people do that, but not usually for a
published structure.  Most people fight very hard for those faint,
barely-existing high-angle spots.  But why do we do that if the map is
going to look the same anyway?  The reason is because resolution and B
factors are linked.

Resolution is about separation vs width, and the width of the density
peak from any atom is set by its B factor.  Yes, atoms have an intrinsic
width, but it is very quickly washed out by even modest B factors (B >
10).  This is true for both x-ray and electron form factors. To a very
good approximation, the FWHM of C, N and O atoms is given by:
FWHM= sqrt(B*log(2))/pi+0.15

where "B" is the B factor assigned to the atom and the 0.15 fudge factor
accounts for its intrinsic width when B=0.  Now that we know the peak
width, we can start to ask if two peaks are "resolved".

Start with the classical definition of "resolution" (call it after Airy,
Rayleigh, Dawes, or whatever famous person you like), but essentially you
are asking the question: "how close can two peaks be before they merge
into one peak?".  For Gaussian peaks this is 0.849*FWHM. Simple enough.
However, when you look at the density of two atoms this far apart you
will see the peak is highly oblong. Yes, the density has one maximum,
but there are clearly two atoms in there.  It is also pretty obvious the
long axis of the peak is the line between the two atoms, and if you fit
two round atoms into this peak you recover the distance between them
quite accurately.  Are they really not "resolved" if it is so clear
where they are?
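
As a quick numerical check of the two relations above (a sketch using the
formulas exactly as written, nothing more):

import math

def fwhm_from_b(B):
    # FWHM (Angstrom) of a C/N/O density peak for an atom with B factor B
    return math.sqrt(B * math.log(2)) / math.pi + 0.15

def min_resolved_separation(B):
    # classical "two peaks merge" criterion for Gaussian peaks: 0.849*FWHM
    return 0.849 * fwhm_from_b(B)

for B in (10, 30, 60, 100):
    print("B=%5.1f  FWHM=%.2f A  resolved beyond %.2f A"
          % (B, fwhm_from_b(B), min_resolved_separation(B)))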

In such cases you usually want to sharpen, as that will make the oblong
blob turn into two resolved peaks.  Sharpening reduces the B factor and
therefore FWHM of every atom, making the "resolution" (0.849*FWHM) a
shorter distance.  So, we have improved resolution with sharpening!  Why
don't we always do this?  Well, the reason is because of noise.
Sharpening up-weights the noise of high-order Fourier terms and
therefore degrades the overall signal-to-noise (SNR) of the map.  This
is what I believe Colin would call reduced "contrast".  Of course, since
we view 

Re: [ccp4bb] [3dem] Which resolution?

2020-03-07 Thread dusan turk
James,

The case you’ve chosen is not a good illustration of the relationship between
atomic B and resolution.  The problem is that during scaling of Fcalc to Fobs,
the B-factor difference between the two sets of numbers is also minimized. In
the simplest form, with two constants Koverall and Boverall, it looks like this:

sum_to_be_minimized = sum (FOBS**2 -  Koverall * FCALC**2 * exp(-1/d**2 * 
Boverall) )

Then one can include bulk solvent correction, anisotropic scaling, … In PHENIX
it gets quite complex.  

Hence, almost regardless of the average model B you will always get the same 
map, because the “B" of the map will reflect the B of the FOBS.  When all 
atomic Bs are equal then they are also equal to average B.

best, dusan
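
For concreteness, a small sketch (mine, using only the simple two-parameter form
above; real programs add bulk solvent, anisotropic scaling and more) of fitting
Koverall and Boverall by least squares:

import numpy as np
from scipy.optimize import least_squares

def fit_k_b(fobs, fcalc, d):
    fobs, fcalc, d = (np.asarray(a, dtype=float) for a in (fobs, fcalc, d))

    def residuals(params):
        k, b = params
        # residual of the simple target: FOBS**2 - K * FCALC**2 * exp(-B/d**2)
        return fobs**2 - k * fcalc**2 * np.exp(-b / d**2)

    fit = least_squares(residuals, x0=[1.0, 0.0])
    return fit.x  # (Koverall, Boverall)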


> On 7 Mar 2020, at 01:01, CCP4BB automatic digest system 
>  wrote:
> 
>> On Thu, 5 Mar 2020 01:11:33 +0100, James Holton  wrote:
>> 
>>> The funny thing is, although we generally regard resolution as a primary
>>> indicator of data quality the appearance of a density map at the classic
>>> "1-sigma" contour has very little to do with resolution, and everything
>>> to do with the B factor.
>>> 
>>> Seriously, try it. Take any structure you like, set all the B factors to
>>> 30 with PDBSET, calculate a map with SFALL or phenix.fmodel and have a
>>> look at the density of tyrosine (Tyr) side chains.  Even if you
>>> calculate structure factors all the way out to 1.0 A the holes in the
>>> Tyr rings look exactly the same: just barely starting to form.  This is
>>> because the structure factors from atoms with B=30 are essentially zero
>>> out at 1.0 A, and adding zeroes does not change the map.  You can adjust
>>> the contour level, of course, and solvent content will have some effect
>>> on where the "1-sigma" contour lies, but generally B=30 is the point
>>> where Tyr side chains start to form their holes.  Traditionally, this is
>>> attributed to 1.8A resolution, but it is really at B=30.  The point
>>> where waters first start to poke out above the 1-sigma contour is at
>>> B=60, despite being generally attributed to d=2.7A.
>>> 
>>> Now, of course, if you cut off this B=30 data at 3.5A then the Tyr side
>>> chains become blobs, but that is equivalent to collecting data with the
>>> detector way too far away and losing your high-resolution spots off the
>>> edges.  I have seen a few people do that, but not usually for a
>>> published structure.  Most people fight very hard for those faint,
>>> barely-existing high-angle spots.  But why do we do that if the map is
>>> going to look the same anyway?  The reason is because resolution and B
>>> factors are linked.
>>> 
>>> Resolution is about separation vs width, and the width of the density
>>> peak from any atom is set by its B factor.  Yes, atoms have an intrinsic
>>> width, but it is very quickly washed out by even modest B factors (B >
>>> 10).  This is true for both x-ray and electron form factors. To a very
>>> good approximation, the FWHM of C, N and O atoms is given by:
>>> FWHM= sqrt(B*log(2))/pi+0.15
>>> 
>>> where "B" is the B factor assigned to the atom and the 0.15 fudge factor
>>> accounts for its intrinsic width when B=0.  Now that we know the peak
>>> width, we can start to ask if two peaks are "resolved".
>>> 
>>> Start with the classical definition of "resolution" (call it after Airy,
>>> Rayleigh, Dawes, or whatever famous person you like), but essentially you
>>> are asking the question: "how close can two peaks be before they merge
>>> into one peak?".  For Gaussian peaks this is 0.849*FWHM. Simple enough.
>>> However, when you look at the density of two atoms this far apart you
>>> will see the peak is highly oblong. Yes, the density has one maximum,
>>> but there are clearly two atoms in there.  It is also pretty obvious the
>>> long axis of the peak is the line between the two atoms, and if you fit
>>> two round atoms into this peak you recover the distance between them
>>> quite accurately.  Are they really not "resolved" if it is so clear
>>> where they are?
>>> 
>>> In such cases you usually want to sharpen, as that will make the oblong
>>> blob turn into two resolved peaks.  Sharpening reduces the B factor and
>>> therefore FWHM of every atom, making the "resolution" (0.849*FWHM) a
>>> shorter distance.  So, we have improved resolution with sharpening!  Why
>>> don't we always do this?  Well, the reason is because of noise.
>>> Sharpening up-weights the noise of high-order Fourier terms and
>>> therefore degrades the overall signal-to-noise (SNR) of the map.  This
>>> is what I believe Colin would call reduced "contrast".  Of course, since
>>> we view maps with a threshold (aka contour) a map with SNR=5 will look
>>> almost identical to a map with SNR=500. The "noise floor" is generally
>>> well below the 1-sigma threshold, or even the 0-sigma threshold
>>> (https://doi.org/10.1073/pnas.1302823110).  As you turn up the
>>> sharpening you will see blobs split apart and also see new peaks rising
>>> 

Re: [ccp4bb] [3dem] Which resolution?

2020-03-06 Thread Pavel Afonine
e is the possibility of
> >>> confusion regarding the use of the word threshold. I fully agree that
> >>> a half bit information threshold is inappropriate if it is taken to
> >>> mean that the data should be truncated at that resolution. The ever
> >>> more sophisticated refinement programs are becoming adept at handling
> >>> the noisy data.
> >>>
> >>> The half bit information threshold I was discussing refers to a
> >>> nominal resolution. This is not just for trivial reporting purposes.
> >>> The half bit threshold is being used to compare imaging methods and
> >>> perhaps demonstrate that significant information is present with a
> >>> dose below any radiation damage threshold (that word again). The
> >>> justification for doing this appears to come from the fact it has been
> >>> adopted for protein structure determination by single particle
> >>> electron microscopy. However, low contrast features might not be
> >>> visible at this nominal resolution.
> >>>
> >>> The analogy with protein crystallography might be to collect data
> >>> below an absorption edge to give a nominal resolution of 2 angstrom.
> >>> Then do it again well above the absorption edge. The second one gives
> >>> much greater Bijvoet differences despite the fact that the nominal
> >>> resolution is the same. I doubt whether anyone doing this would be
> >>> misled by this as they would examine the statistics for the Bijvoet
> >>> differences instead. However, it does indicate the relationship
> >>> between contrast and resolution.
> >>>
> >>> The question, if referring to an information threshold for nominal
> >>> resolution, could be “Is there significant information in the data at
> >>> the required contrast and resolution?”. Then “Can one obtain this
> >>> information at a dose below any radiation damage limit”
> >>>
> >>> Keep posting!
> >>>
> >>> Regards
> >>>
> >>> Colin
> >>>
> >>> *From:*James Holton 
> >>> *Sent:* 27 February 2020 01:14
> >>> *To:* CCP4BB@JISCMAIL.AC.UK
> >>> *Cc:* Nave, Colin (DLSLtd,RAL,LSCI) 
> >>> *Subject:* Re: [ccp4bb] [3dem] Which resolution?
> >>>
> >>> In my opinion the threshold should be zero bits.  Yes, this is where
> >>> CC1/2 = 0 (or FSC = 0).  If there is correlation then there is
> >>> information, and why throw out information if there is information to
> >>> be had?  Yes, this information comes with noise attached, but that is
> >>> why we have weights.
> >>>
> >>> It is also important to remember that zero intensity is still useful
> >>> information.  Systematic absences are an excellent example.  They have
> >>> no intensity at all, but they speak volumes about the structure.  In a
> >>> similar way, high-angle zero-intensity observations also tell us
> >>> something.  Ever tried unrestrained B factor refinement at poor
> >>> resolution?  It is hard to do nowadays because of all the safety
> >>> catches in modern software, but you can get great R factors this way.
> >>> A telltale sign of this kind of "over fitting" is remarkably large
> >>> Fcalc values beyond the resolution cutoff.  These don't contribute to
> >>> the R factor, however, because Fobs is missing for these hkls. So,
> >>> including zero-intensity data suppresses at least some types of
> >>> over-fitting.
> >>>
> >>> The thing I like most about the zero-information resolution cutoff is
> >>> that it forces us to address the real problem: what do you mean by
> >>> "resolution" ?  Not long ago, claiming your resolution was 3.0 A meant
> >>> that after discarding all spots with individual I/sigI < 3 you still
> >>> have 80% completeness in the 3.0 A bin.  Now we are saying we have a
> >>> 3.0 A data set when we can prove statistically that a few
> >>> non-background counts fell into the sum of all spot areas at 3.0 A.
> >>> These are not the same thing.
> >>>
> >>> Don't get me wrong, including the weak high-resolution information
> >>> makes the model better, and indeed I am even advocating including all
> >>> the noisy zeroes.  However, weak data at 3.0 A is never going to be as
> >>> good as having strong data at 3.0 A.  So, how 

Re: [ccp4bb] [3dem] Which resolution?

2020-03-06 Thread James Holton
kernel_d(r) = 4/3*pi/d**3*sinc3(2*pi*r/d)
sinc3(x) = (x==0?1:3*(sin(x)/x-cos(x))/(x*x))

where kernel_d(r) is the normalized weight given to a point "r" Angstrom
away from the center of each blurring operation, and "sinc3" is the
Fourier synthesis of a solid sphere.  That is, if you make an HKL file
with all F=1 and PHI=0 out to a resolution d, then effectively all hkls
beyond the resolution limit are zero. If you calculate a map with those
Fs, you will find the kernel_d(r) function at the origin.  What that
means is: by applying a resolution cutoff, you are effectively
multiplying your data by this sphere of unit Fs, and since a
multiplication in reciprocal space is a convolution in real space, the
effect is convoluting (blurring) with kernel_d(x).

For comparison, if you apply a B factor, the real-space blurring kernel
is this:
kernel_B(r) = (4*pi/B)**1.5*exp(-4*pi**2/B*r*r)

If you graph these two kernels (format is for gnuplot) you will find
that they have the same FWHM whenever B=80*(d/3)**2.  This "rule" is the
one I used for my resolution demonstration movie I made back in the late
20th century:
https://bl831.als.lbl.gov/~jamesh/movies/index.html#resolution

What I did then was set all atomic B factors to B = 80*(d/3)^2 and then
cut the resolution at "d".  Seemed sensible at the time.  I suppose I
could have used the PDB-wide average atomic B factor reported for
structures with resolution "d", which roughly follows:
B = 4*d**2+12
https://bl831.als.lbl.gov/~jamesh/pickup/reso_vs_avgB.png

The reason I didn't use this formula for the movie is because I didn't
figure it out until about 10 years later.  These two curves cross at
1.5A, but diverge significantly at poor resolution.  So, which one is
right?  It depends on how well you can measure really really faint
spots, and we've been getting better at that in recent decades.

So, what I'm trying to say here is that just because your data has CC1/2
or FSC dropping off to insignificance at 1.8 A doesn't mean you are
going to see holes in Tyr side chains.  However, if you measure your
weak, high-res data really well (high multiplicity), you might be able
to sharpen your way to a much clearer map.

-James Holton
MAD Scientist

On 2/27/2020 11:01 AM, Nave, Colin (DLSLtd,RAL,LSCI) wrote:

James

All you say seems sensible to me but there is the possibility of
confusion regarding the use of the word threshold. I fully agree that
a half bit information threshold is inappropriate if it is taken to
mean that the data should be truncated at that resolution. The ever
more sophisticated refinement programs are becoming adept at handling
the noisy data.

The half bit information threshold I was discussing refers to a
nominal resolution. This is not just for trivial reporting purposes.
The half bit threshold is being used to compare imaging methods and
perhaps demonstrate that significant information is present with a
dose below any radiation damage threshold (that word again). The
justification for doing this appears to come from the fact it has been
adopted for protein structure determination by single particle
electron microscopy. However, low contrast features might not be
visible at this nominal resolution.

The analogy with protein crystallography might be to collect data
below an absorption edge to give a nominal resolution of 2 angstrom.
Then do it again well above the absorption edge. The second one gives
much greater Bijvoet differences despite the fact that the nominal
resolution is the same. I doubt whether anyone doing this would be
misled by this as they would examine the statistics for the Bijvoet
differences instead. However, it does indicate the relationship
between contrast and resolution.

The question, if referring to an information threshold for nominal
resolution, could be “Is there significant information in the data at
the required contrast and resolution?”. Then “Can one obtain this
information at a dose below any radiation damage limit”

Keep posting!

Regards

Colin

*From:*James Holton 
*Sent:* 27 February 2020 01:14
*To:* CCP4BB@JISCMAIL.AC.UK
*Cc:* Nave, Colin (DLSLtd,RAL,LSCI) 
*Subject:* Re: [ccp4bb] [3dem] Which resolution?

In my opinion the threshold should be zero bits.  Yes, this is where
CC1/2 = 0 (or FSC = 0).  If there is correlation then there is
information, and why throw out information if there is information to
be had?  Yes, this information comes with noise attached, but that is
why we have weights.

It is also important to remember that zero intensity is still useful
information.  Systematic absences are an excellent example.  They have
no intensity at all, but they speak volumes about the structure.  In a
similar way, high-angle zero-intensity observations also tell us
something.  Ever tried unrestrained B factor refinement at poor
resolution?  It is hard to do nowadays because of all the safety
catches in modern software, but you can get great R factors this way.
A telltale sign of this kind of "over fit

Re: [ccp4bb] [3dem] Which resolution?

2020-03-05 Thread Kay Diederichs
>20th century:
>https://bl831.als.lbl.gov/~jamesh/movies/index.html#resolution
>
>What I did then was set all atomic B factors to B = 80*(d/3)^2 and then
>cut the resolution at "d".  Seemed sensible at the time.  I suppose I
>could have used the PDB-wide average atomic B factor reported for
>structures with resolution "d", which roughly follows:
>B = 4*d**2+12
>https://bl831.als.lbl.gov/~jamesh/pickup/reso_vs_avgB.png
>
>The reason I didn't use this formula for the movie is because I didn't
>figure it out until about 10 years later.  These two curves cross at
>1.5A, but diverge significantly at poor resolution.  So, which one is
>right?  It depends on how well you can measure really really faint
>spots, and we've been getting better at that in recent decades.
>
>So, what I'm trying to say here is that just because your data has CC1/2
>or FSC dropping off to insignificance at 1.8 A doesn't mean you are
>going to see holes in Tyr side chains.  However, if you measure your
>weak, high-res data really well (high multiplicity), you might be able
>to sharpen your way to a much clearer map.
>
>-James Holton
>MAD Scientist
>
>On 2/27/2020 11:01 AM, Nave, Colin (DLSLtd,RAL,LSCI) wrote:
>>
>> James
>>
>> All you say seems sensible to me but there is the possibility of
>> confusion regarding the use of the word threshold. I fully agree that
>> a half bit information threshold is inappropriate if it is taken to
>> mean that the data should be truncated at that resolution. The ever
>> more sophisticated refinement programs are becoming adept at handling
>> the noisy data.
>>
>> The half bit information threshold I was discussing refers to a
>> nominal resolution. This is not just for trivial reporting purposes.
>> The half bit threshold is being used to compare imaging methods and
>> perhaps demonstrate that significant information is present with a
>> dose below any radiation damage threshold (that word again). The
>> justification for doing this appears to come from the fact it has been
>> adopted for protein structure determination by single particle
>> electron microscopy. However, low contrast features might not be
>> visible at this nominal resolution.
>>
>> The analogy with protein crystallography might be to collect data
>> below an absorption edge to give a nominal resolution of 2 angstrom.
>> Then do it again well above the absorption edge. The second one gives
>> much greater Bijvoet differences despite the fact that the nominal
>> resolution is the same. I doubt whether anyone doing this would be
>> misled by this as they would examine the statistics for the Bijvoet
>> differences instead. However, it does indicate the relationship
>> between contrast and resolution.
>>
>> The question, if referring to an information threshold for nominal
>> resolution, could be “Is there significant information in the data at
>> the required contrast and resolution?”. Then “Can one obtain this
>> information at a dose below any radiation damage limit”
>>
>> Keep posting!
>>
>> Regards
>>
>> Colin
>>
>> *From:*James Holton 
>> *Sent:* 27 February 2020 01:14
>> *To:* CCP4BB@JISCMAIL.AC.UK
>> *Cc:* Nave, Colin (DLSLtd,RAL,LSCI) 
>> *Subject:* Re: [ccp4bb] [3dem] Which resolution?
>>
>> In my opinion the threshold should be zero bits.  Yes, this is where
>> CC1/2 = 0 (or FSC = 0).  If there is correlation then there is
>> information, and why throw out information if there is information to
>> be had?  Yes, this information comes with noise attached, but that is
>> why we have weights.
>>
>> It is also important to remember that zero intensity is still useful
>> information.  Systematic absences are an excellent example.  They have
>> no intensity at all, but they speak volumes about the structure.  In a
>> similar way, high-angle zero-intensity observations also tell us
>> something.  Ever tried unrestrained B factor refinement at poor
>> resolution?  It is hard to do nowadays because of all the safety
>> catches in modern software, but you can get great R factors this way. 
>> A telltale sign of this kind of "over fitting" is remarkably large
>> Fcalc values beyond the resolution cutoff.  These don't contribute to
>> the R factor, however, because Fobs is missing for these hkls. So,
>> including zero-intensity data suppresses at least some types of
>> over-fitting.
>>
>> The thing I like most about the zero-information resolution cutoff is
>> that it forces us to address the real problem: what do you mean by
>> "re

Re: [ccp4bb] [3dem] Which resolution?

2020-03-04 Thread James Holton
pening" again.


Why not use a filter that is non-Gaussian?  We do this all the time!  
Cutting off the data at a given resolution (d) is equivalent to blurring 
the map with this function:


kernel_d(r) = 4/3*pi/d**3*sinc3(2*pi*r/d)
sinc3(x) = (x==0?1:3*(sin(x)/x-cos(x))/(x*x))

where kernel_d(r) is the normalized weight given to a point "r" Angstrom 
away from the center of each blurring operation, and "sinc3" is the 
Fourier synthesis of a solid sphere.  That is, if you make an HKL file 
with all F=1 and PHI=0 out to a resolution d, then effectively all hkls 
beyond the resolution limit are zero. If you calculate a map with those 
Fs, you will find the kernel_d(r) function at the origin.  What that 
means is: by applying a resolution cutoff, you are effectively 
multiplying your data by this sphere of unit Fs, and since a 
multiplication in reciprocal space is a convolution in real space, the 
effect is convoluting (blurring) with kernel_d(x).


For comparison, if you apply a B factor, the real-space blurring kernel 
is this:

kernel_B(r) = (4*pi/B)**1.5*exp(-4*pi**2/B*r*r)

If you graph these two kernels (format is for gnuplot) you will find 
that they have the same FWHM whenever B=80*(d/3)**2.  This "rule" is the 
one I used for my resolution demonstration movie I made back in the late 
20th century:

https://bl831.als.lbl.gov/~jamesh/movies/index.html#resolution

What I did then was set all atomic B factors to B = 80*(d/3)^2 and then 
cut the resolution at "d".  Seemed sensible at the time.  I suppose I 
could have used the PDB-wide average atomic B factor reported for 
structures with resolution "d", which roughly follows:

B = 4*d**2+12
https://bl831.als.lbl.gov/~jamesh/pickup/reso_vs_avgB.png

The reason I didn't use this formula for the movie is because I didn't 
figure it out until about 10 years later.  These two curves cross at 
1.5A, but diverge significantly at poor resolution.  So, which one is 
right?  It depends on how well you can measure really really faint 
spots, and we've been getting better at that in recent decades.


So, what I'm trying to say here is that just because your data has CC1/2 
or FSC dropping off to insignificance at 1.8 A doesn't mean you are 
going to see holes in Tyr side chains.  However, if you measure your 
weak, high-res data really well (high multiplicity), you might be able 
to sharpen your way to a much clearer map.


-James Holton
MAD Scientist
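
For anyone who wants to reproduce the numbers, a sketch (mine) of the two
blurring kernels and the two B-versus-d rules quoted above, using the formulas
exactly as written:

import math

def sinc3(x):
    return 1.0 if x == 0 else 3.0 * (math.sin(x) / x - math.cos(x)) / (x * x)

def kernel_d(r, d):
    # real-space blurring implied by a sharp resolution cutoff at d Angstrom
    return 4.0 / 3.0 * math.pi / d**3 * sinc3(2.0 * math.pi * r / d)

def kernel_B(r, B):
    # real-space blurring implied by a B factor
    return (4.0 * math.pi / B)**1.5 * math.exp(-4.0 * math.pi**2 / B * r * r)

def b_matching_fwhm(d):
    return 80.0 * (d / 3.0)**2   # B whose kernel has the same FWHM as a cutoff at d

def b_pdb_average(d):
    return 4.0 * d**2 + 12.0     # rough PDB-wide average atomic B at resolution d

for d in (1.5, 2.0, 3.0):
    print("d=%.1f A: B(FWHM match)=%.1f  B(PDB average)=%.1f"
          % (d, b_matching_fwhm(d), b_pdb_average(d)))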

On 2/27/2020 11:01 AM, Nave, Colin (DLSLtd,RAL,LSCI) wrote:


James

All you say seems sensible to me but there is the possibility of 
confusion regarding the use of the word threshold. I fully agree that 
a half bit information threshold is inappropriate if it is taken to 
mean that the data should be truncated at that resolution. The ever 
more sophisticated refinement programs are becoming adept at handling 
the noisy data.


The half bit information threshold I was discussing refers to a 
nominal resolution. This is not just for trivial reporting purposes. 
The half bit threshold is being used to compare imaging methods and 
perhaps demonstrate that significant information is present with a 
dose below any radiation damage threshold (that word again). The 
justification for doing this appears to come from the fact it has been 
adopted for protein structure determination by single particle 
electron microscopy. However, low contrast features might not be 
visible at this nominal resolution.


The analogy with protein crystallography might be to collect data 
below an absorption edge to give a nominal resolution of 2 angstrom. 
Then do it again well above the absorption edge. The second one gives 
much greater Bijvoet differences despite the fact that the nominal 
resolution is the same. I doubt whether anyone doing this would be 
misled by this as they would examine the statistics for the Bijvoet 
differences instead. However, it does indicate the relationship 
between contrast and resolution.


The question, if referring to an information threshold for nominal 
resolution, could be “Is there significant information in the data at 
the required contrast and resolution?”. Then “Can one obtain this 
information at a dose below any radiation damage limit”


Keep posting!

Regards

Colin

*From:*James Holton 
*Sent:* 27 February 2020 01:14
*To:* CCP4BB@JISCMAIL.AC.UK
*Cc:* Nave, Colin (DLSLtd,RAL,LSCI) 
*Subject:* Re: [ccp4bb] [3dem] Which resolution?

In my opinion the threshold should be zero bits.  Yes, this is where 
CC1/2 = 0 (or FSC = 0).  If there is correlation then there is 
information, and why throw out information if there is information to 
be had?  Yes, this information comes with noise attached, but that is 
why we have weights.


It is also important to remember that zero intensity is still useful 
information.  Systematic absences are an excellent example.  They have 
no intensity at all, but they speak volumes about the structure.  In a 
similar way, h

Re: [ccp4bb] [3dem] Which resolution?

2020-02-27 Thread Keller, Jacob
How would one evaluate the information content of systematic absences?

JPK

On Feb 26, 2020 8:14 PM, James Holton  wrote:
In my opinion the threshold should be zero bits.  Yes, this is where CC1/2 = 0 
(or FSC = 0).  If there is correlation then there is information, and why throw 
out information if there is information to be had?  Yes, this information comes 
with noise attached, but that is why we have weights.

It is also important to remember that zero intensity is still useful 
information.  Systematic absences are an excellent example.  They have no 
intensity at all, but they speak volumes about the structure.  In a similar 
way, high-angle zero-intensity observations also tell us something.  Ever tried 
unrestrained B factor refinement at poor resolution?  It is hard to do nowadays 
because of all the safety catches in modern software, but you can get great R 
factors this way.  A telltale sign of this kind of "over fitting" is remarkably 
large Fcalc values beyond the resolution cutoff.  These don't contribute to the 
R factor, however, because Fobs is missing for these hkls. So, including 
zero-intensity data suppresses at least some types of over-fitting.

The thing I like most about the zero-information resolution cutoff is that it 
forces us to address the real problem: what do you mean by "resolution" ?  Not 
long ago, claiming your resolution was 3.0 A meant that after discarding all 
spots with individual I/sigI < 3 you still have 80% completeness in the 3.0 A 
bin.  Now we are saying we have a 3.0 A data set when we can prove 
statistically that a few non-background counts fell into the sum of all spot 
areas at 3.0 A.  These are not the same thing.

Don't get me wrong, including the weak high-resolution information makes the 
model better, and indeed I am even advocating including all the noisy zeroes.  
However, weak data at 3.0 A is never going to be as good as having strong data 
at 3.0 A.  So, how do we decide?  I personally think that the resolution 
assigned to the PDB deposition should remain the classical I/sigI > 3 at 80% 
rule.  This is really the only way to have meaningful comparison of resolution 
between very old and very new structures.  One should, of course, deposit all 
the data, but don't claim that cut-off as your "resolution".  That is just 
plain unfair to those who came before.

Oh yeah, and I also have a session on "interpreting low-resolution maps" at the 
GRC this year.  
https://www.grc.org/diffraction-methods-in-structural-biology-conference/2020/

So, please, let the discussion continue!

-James Holton
MAD Scientist

On 2/22/2020 11:06 AM, Nave, Colin (DLSLtd,RAL,LSCI) wrote:
Alexis
This is a very useful summary.

You say you were not convinced by Marin's derivation in 2005. Are you convinced 
now and, if not, why?

My interest in this is that the FSC with half bit thresholds have the danger of 
being adopted elsewhere because they are becoming standard for protein 
structure determination (by EM or MX). If it is used for these mature 
techniques it must be right!

It is the adoption of the ½ bit threshold I worry about. I gave a rather weak 
example for MX which consisted of partial occupancy of side chains, substrates 
etc. For x-ray imaging a wide range of contrasts can occur and, if you want to 
see features with only a small contrast above the surroundings then I think the 
half bit threshold would be inappropriate.

It would be good to see a clear message from the MX and EM communities as to 
why an information content threshold of ½ a bit is generally appropriate for 
these techniques and an acknowledgement that this threshold is 
technique/problem dependent.

We might then progress from the bronze age to the iron age.

Regards
Colin



From: CCP4 bulletin board <mailto:CCP4BB@JISCMAIL.AC.UK> 
On Behalf Of Alexis Rohou
Sent: 21 February 2020 16:35
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] [3dem] Which resolution?

Hi all,

For those bewildered by Marin's insistence that everyone's been messing up 
their stats since the bronze age, I'd like to offer what my understanding of 
the situation. More details in this thread from a few years ago on the exact 
same topic:
https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003939.html
https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html

Notwithstan

Re: [ccp4bb] [3dem] Which resolution?

2020-02-27 Thread Nave, Colin (DLSLtd,RAL,LSCI)
James
All you say seems sensible to me but there is the possibility of confusion 
regarding the use of the word threshold. I fully agree that a half bit 
information threshold is inappropriate if it is taken to mean that the data 
should be truncated at that resolution. The ever more sophisticated refinement 
programs are becoming adept at handling the noisy data.

The half bit information threshold I was discussing refers to a nominal 
resolution. This is not just for trivial reporting purposes. The half bit 
threshold is being used to compare imaging methods and perhaps demonstrate that 
significant information is present with a dose below any radiation damage 
threshold (that word again). The justification for doing this appears to come 
from the fact it has been adopted for protein structure determination by single 
particle electron microscopy. However, low contrast features might not be 
visible at this nominal resolution.

The analogy with protein crystallography might be to collect data below an 
absorption edge to give a nominal resolution of 2 angstrom. Then do it again 
well above the absorption edge. The second one gives much greater Bijvoet 
differences despite the fact that the nominal resolution is the same. I doubt 
whether anyone doing this would be misled by this as they would examine the 
statistics for the Bijvoet differences instead. However, it does indicate the 
relationship between contrast and resolution.

The question, if referring to an information threshold for nominal resolution, 
could be “Is there significant information in the data at the required contrast 
and resolution?”. Then “Can one obtain this information at a dose below any 
radiation damage limit”

Keep posting!
Regards
  Colin
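
Since the half-bit criterion keeps coming up, here is a hedged sketch (my own
illustration, not code from any package) of a per-shell FSC and the half-bit
threshold curve as I recall it from van Heel & Schatz (2005) - please check the
paper before relying on the coefficients.  Here n is the number of independent
Fourier voxels in the shell; masking and symmetry bookkeeping are deliberately
left out.

import numpy as np

def fsc_shell(f1, f2):
    # f1, f2: complex Fourier coefficients of the two half-maps in one shell
    num = np.real(np.sum(f1 * np.conj(f2)))
    den = np.sqrt(np.sum(np.abs(f1)**2) * np.sum(np.abs(f2)**2))
    return num / den

def half_bit_threshold(n):
    return (0.2071 + 1.9102 / np.sqrt(n)) / (1.2071 + 0.9102 / np.sqrt(n))
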
From: James Holton 
Sent: 27 February 2020 01:14
To: CCP4BB@JISCMAIL.AC.UK
Cc: Nave, Colin (DLSLtd,RAL,LSCI) 
Subject: Re: [ccp4bb] [3dem] Which resolution?

In my opinion the threshold should be zero bits.  Yes, this is where CC1/2 = 0 
(or FSC = 0).  If there is correlation then there is information, and why throw 
out information if there is information to be had?  Yes, this information comes 
with noise attached, but that is why we have weights.

It is also important to remember that zero intensity is still useful 
information.  Systematic absences are an excellent example.  They have no 
intensity at all, but they speak volumes about the structure.  In a similar 
way, high-angle zero-intensity observations also tell us something.  Ever tried 
unrestrained B factor refinement at poor resolution?  It is hard to do nowadays 
because of all the safety catches in modern software, but you can get great R 
factors this way.  A telltale sign of this kind of "over fitting" is remarkably 
large Fcalc values beyond the resolution cutoff.  These don't contribute to the 
R factor, however, because Fobs is missing for these hkls. So, including 
zero-intensity data suppresses at least some types of over-fitting.

The thing I like most about the zero-information resolution cutoff is that it 
forces us to address the real problem: what do you mean by "resolution" ?  Not 
long ago, claiming your resolution was 3.0 A meant that after discarding all 
spots with individual I/sigI < 3 you still have 80% completeness in the 3.0 A 
bin.  Now we are saying we have a 3.0 A data set when we can prove 
statistically that a few non-background counts fell into the sum of all spot 
areas at 3.0 A.  These are not the same thing.

Don't get me wrong, including the weak high-resolution information makes the 
model better, and indeed I am even advocating including all the noisy zeroes.  
However, weak data at 3.0 A is never going to be as good as having strong data 
at 3.0 A.  So, how do we decide?  I personally think that the resolution 
assigned to the PDB deposition should remain the classical I/sigI > 3 at 80% 
rule.  This is really the only way to have meaningful comparison of resolution 
between very old and very new structures.  One should, of course, deposit all 
the data, but don't claim that cut-off as your "resolution".  That is just 
plain unfair to those who came before.

Oh yeah, and I also have a session on "interpreting low-resolution maps" at the 
GRC this year.  
https://www.grc.org/diffraction-methods-in-structural-biology-conference/2020/

So, please, let the discussion continue!

-James Holton
MAD Scientist
On 2/22/2020 11:06 AM, Nave, Colin (DLSLtd,RAL,LSCI) wrote:
Alexis
This is a very useful summary.

You say you were not convinced by Marin's derivation in 2005. Are you convinced 
now and, if not, why?

My interest in this is that the FSC with half bit thresholds have the danger of 
being adopted elsewhere because they are becoming standard for protein 
structure determination (by EM or MX). If it is used for these mature 
techniques it must be right!

It is the adoption of the ½ bit threshold I worry about. I gave a rather weak 
example for MX which

Re: [ccp4bb] [3dem] Which resolution?

2020-02-26 Thread James Holton
In my opinion the threshold should be zero bits.  Yes, this is where 
CC1/2 = 0 (or FSC = 0).  If there is correlation then there is 
information, and why throw out information if there is information to be 
had?  Yes, this information comes with noise attached, but that is why 
we have weights.


It is also important to remember that zero intensity is still useful 
information.  Systematic absences are an excellent example.  They have 
no intensity at all, but they speak volumes about the structure.  In a 
similar way, high-angle zero-intensity observations also tell us 
something.  Ever tried unrestrained B factor refinement at poor 
resolution?  It is hard to do nowadays because of all the safety catches 
in modern software, but you can get great R factors this way.  A 
telltale sign of this kind of "over fitting" is remarkably large Fcalc 
values beyond the resolution cutoff.  These don't contribute to the R 
factor, however, because Fobs is missing for these hkls. So, including 
zero-intensity data suppresses at least some types of over-fitting.


The thing I like most about the zero-information resolution cutoff is 
that it forces us to address the real problem: what do you mean by 
"resolution" ?  Not long ago, claiming your resolution was 3.0 A meant 
that after discarding all spots with individual I/sigI < 3 you still 
have 80% completeness in the 3.0 A bin.  Now we are saying we have a 3.0 
A data set when we can prove statistically that a few non-background 
counts fell into the sum of all spot areas at 3.0 A. These are not the 
same thing.


Don't get me wrong, including the weak high-resolution information makes 
the model better, and indeed I am even advocating including all the 
noisy zeroes.  However, weak data at 3.0 A is never going to be as good 
as having strong data at 3.0 A.  So, how do we decide?  I personally 
think that the resolution assigned to the PDB deposition should remain 
the classical I/sigI > 3 at 80% completeness rule.  This is really the only way to 
have a meaningful comparison of resolution between very old and very new 
structures.  One should, of course, deposit all the data, but don't 
claim that cut-off as your "resolution".  That is just plain unfair to 
those who came before.


Oh yeah, and I also have a session on "interpreting low-resolution maps" 
at the GRC this year. 
https://www.grc.org/diffraction-methods-in-structural-biology-conference/2020/


So, please, let the discussion continue!

-James Holton
MAD Scientist

On 2/22/2020 11:06 AM, Nave, Colin (DLSLtd,RAL,LSCI) wrote:


Alexis

This is a very useful summary.

You say you were not convinced by Marin's derivation in 2005. Are you 
convinced now and, if not, why?


My interest in this is that the FSC with a half-bit threshold is in 
danger of being adopted elsewhere because it is becoming standard 
for protein structure determination (by EM or MX). If it is used for 
these mature techniques it must be right!


It is the adoption of the ½ bit threshold I worry about. I gave a 
rather weak example for MX which consisted of partial occupancy of 
side chains, substrates etc. For x-ray imaging a wide range of 
contrasts can occur and, if you want to see features with only a small 
contrast above the surroundings, then I think the half-bit threshold 
would be inappropriate.


It would be good to see a clear message from the MX and EM communities 
as to why an information content threshold of ½ a bit is generally 
appropriate for these techniques and an acknowledgement that this 
threshold is technique/problem dependent.


We might then progress from the bronze age to the iron age.

Regards

Colin

From: CCP4 bulletin board  On Behalf Of Alexis Rohou
Sent: 21 February 2020 16:35
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] [3dem] Which resolution?

Hi all,

For those bewildered by Marin's insistence that everyone's been 
messing up their stats since the bronze age, I'd like to offer my 
understanding of the situation. More details in this thread from a few 
years ago on the exact same topic:


https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003939.html

https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html

Notwithstanding notational problems (e.g. strict equations as opposed 
to approximation symbols, or omission of symbols to denote 
estimation), I believe Frank & Al-Ali and "descendent" papers (e.g. 
appendix of Rosenthal & Henderson 2003) are fine. The cross terms that 
Marin is agitated about indeed do in fact have an expectation value of 
0.0 (in the ensemble; if the experiment were performed an infinite 
number of times with different realizations of noise). I don't believe 
Pawel or Jose Maria or any of the other authors really believe that 
the cross-terms are orthogonal.


When N (the number of independent Fourier voxels in a shell) is large 
enough, mean(Signal x Noise) ~ 0.0 is only an approximation, but a

[ccp4bb] AW: [ccp4bb] [3dem] Which resolution?

2020-02-23 Thread Jon Hughes
absolutely.
jon

-Original Message-
From: CCP4 bulletin board  On Behalf Of Gerard Bricogne
Sent: Sunday, 23 February 2020 20:42
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] [3dem] Which resolution?

Gentlemen,

 Please consider for a moment that by such intemperate language and tone, 
you are making a topic of fundamental importance to both the MX and the EM 
communities into a no-go area. This cannot be good for anyone's reputation nor 
for the two fields in general. It has to be possible to discuss the topic of 
"resolution" in a dispassionate way, so as to jointly gain an improved and 
shared understanding of the matter, without feeling implicitly under pressure 
to support one side or the other. An acrimonious dispute like this one can only 
be putting people off getting involved in the discussion, which is exactly the 
opposite of what a thread on a scientific bulletin board should be doing.


 With best wishes,

  Gerard.

--
On Sun, Feb 23, 2020 at 08:15:34AM -0300, Marin van Heel wrote:
> Hi Carlos Oscar and Jose-Maria,
> 
> I choose to answer you guys first, because it will take little of my 
> time to counter your criticism and because I have long since been less 
> than amused by your published, ill-conceived criticism:
> 
> “*Marin, I always suffer with your reference to sloppy statistics. If 
> we take your paper of 2005 where the 1/2 bit criterion was proposed, 
> Eqs. 4 to
> 15 have completely ignored the fact that you are dealing with Fourier 
> components, that are complex numbers, and consequently you have to 
> deal with random variables that have TWO components, which moreover 
> the real and imaginary part are not independent and, in their turn, 
> they are not independent of the nearby Fourier coefficients so that 
> for computing radial averages you would need to account for the correlation 
> among coefficients*”
> 
> I had seen this argumentation against our (2005) paper in your 
> manuscript/paper years back. I was so stunned by the level of 
> misunderstanding expressed in your manuscript that I chose not to 
> spend any time reacting to those statements. Now that you choose to so 
> openly display your thoughts on the matter, I have no other choice 
> than to spell out your errors in public.
> 
> 
> 
> All complex arrays in our 2005 paper are Hermitian (since they are the 
> FTs of real data), and so are all their inner products. In all the 
> integrals over rings one always averages a complex Fourier-space voxel 
> with its Hermitian conjugate yielding *ONE* real value (times two)!  
> Without that Hermitian property, FRCs and FSCs, which are real 
> normalised correlation functions would not even have been possible. I 
> was - and still am - stunned by this level of misunderstanding!
> 
> 
> 
> This is a blatant blunder that you are propagating over years, a 
> blunder that does not do any good to your reputation, yet also a 
> blunder that has probably damaged our research income. The fact 
> that you can divulgate such rubbish and leave it out there for years 
> for referees to read (who are possibly not as well educated in physics 
> and mathematics) will do – and may already have done – damage to our 
> research.  An apology is appropriate but an apology is not enough.
> 
> 
> 
> Maybe you should ask your granting agencies how to transfer 25% of 
> your grant income to our research, in compensation of damages created 
> by your blunder!
> 
> 
> 
> Success with your request!
> 
> 
> 
> Marin
> 
> 
> 
> PS. You have also missed that our 2005 paper explicitly includes the 
> influence of the size of the object within the sampling box (your: 
> “*they are not independent of the nearby Fourier coefficients*”). I 
> remain flabbergasted.
> 
> On Fri, Feb 21, 2020 at 3:15 PM Carlos Oscar Sorzano 
> 
> wrote:
> 
> > Dear all,
> >
> > I always try to refrain from getting into these discussions, 
> > but I can no longer resist the temptation. Here are some more ideas 
> > that I hope bring more light than confusion:
> >
> > - There must be some functional relationship between the FSC and the 
> > SNR, but the exact analytical form of this relationship is unknown 
> > (I suspect that it must be at least monotonic, the worse the SNR, 
> > the worse FSC; but even this is difficult to prove). The 
> > relationship we normally use
> > FSC=SNR/(1+SNR) was derived in a context that does not apply to 
> > CryoEM (1D stationary signals in real space; our molecules are not 
> > stationary), and consequently any reasoning of any threshold based 
> > on this relationship is incorrect (see our review).
> >
> > - Still, as long as 

Re: [ccp4bb] [3dem] Which resolution?

2020-02-23 Thread Andreas Förster
A very good point, Gerard, but maybe too late.  It seems to me that a 
lot of microscopists have already given up on this abundantly discussed 
question.  They just call everything atomic resolution irrespective of 
whatever numerical value they arrive at by whatever means.


All best.


Andreas



On 23/02/2020 8:41, Gerard Bricogne wrote:

Gentlemen,

  Please consider for a moment that by such intemperate language and
tone, you are making a topic of fundamental importance to both the MX and
the EM communities into a no-go area. This cannot be good for anyone's
reputation nor for the two fields in general. It has to be possible to
discuss the topic of "resolution" in a dispassionate way, so as to jointly
gain an improved and shared understanding of the matter, without feeling
implicitly under pressure to support one side or the other. An acrimonious
dispute like this one can only be putting people off getting involved in the
discussion, which is exactly the opposite of what a thread on a scientific
bulletin board should be doing.


  With best wishes,

   Gerard.

--
On Sun, Feb 23, 2020 at 08:15:34AM -0300, Marin van Heel wrote:

Hi Carlos Oscar and Jose-Maria,

I choose to answer you guys first, because it will take little of my time
to counter your criticism and because I have long since been less than
amused by your published, ill-conceived criticism:

“*Marin, I always suffer with your reference to sloppy statistics. If we
take your paper of 2005 where the 1/2 bit criterion was proposed, Eqs. 4 to
15 have completely ignored the fact that you are dealing with Fourier
components, that are complex numbers, and consequently you have to deal
with random variables that have TWO components, which moreover the real and
imaginary part are not independent and, in their turn, they are not
independent of the nearby Fourier coefficients so that for computing radial
averages you would need to account for the correlation among coefficients*”

I had seen this argumentation against our (2005) paper in your
manuscript/paper years back. I was so stunned by the level of
misunderstanding expressed in your manuscript that I chose not to spend any
time reacting to those statements. Now that you choose to so openly display
your thoughts on the matter, I have no other choice than to spell out your
errors in public.



All complex arrays in our 2005 paper are Hermitian (since they are the FTs
of real data), and so are all their inner products. In all the integrals
over rings one always averages a complex Fourier-space voxel with its
Hermitian conjugate yielding *ONE* real value (times two)!  Without that
Hermitian property, FRCs and FSCs, which are real normalised correlation
functions would not even have been possible. I was - and still am - stunned
by this level of misunderstanding!



This is a blatant blunder that you are propagating over years, a blunder
that does not do any good to your reputation, yet also a blunder that has
probably damaged our research income. The fact that you can divulgate
such rubbish and leave it out there for years for referees to read (who are
possibly not as well educated in physics and mathematics) will do – and may
already have done – damage to our research.  An apology is appropriate but
an apology is not enough.



Maybe you should ask your granting agencies how to transfer 25% of your
grant income to our research, in compensation of damages created by your
blunder!



Success with your request!



Marin



PS. You have also missed that our 2005 paper explicitly includes the
influence of the size of the object within the sampling box (your: “*they
are not independent of the nearby Fourier coefficients*”). I remain
flabbergasted.

On Fri, Feb 21, 2020 at 3:15 PM Carlos Oscar Sorzano 
wrote:


Dear all,

I always try to refrain from getting into these discussions, but I
can no longer resist the temptation. Here are some more ideas that I hope
bring more light than confusion:

- There must be some functional relationship between the FSC and the SNR,
but the exact analytical form of this relationship is unknown (I suspect
that it must be at least monotonic, the worse the SNR, the worse FSC; but
even this is difficult to prove). The relationship we normally use
FSC=SNR/(1+SNR) was derived in a context that does not apply to CryoEM (1D
stationary signals in real space; our molecules are not stationary), and
consequently any reasoning of any threshold based on this relationship is
incorrect (see our review).

- Still, as long as we all use the same threshold, the reported
resolutions are comparable to each other. In that regard, I am happy that
we have set 0.143 (although any other number would have served the purpose)
as the standard.
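
For readers who want to see where the 0.143 number sits numerically, here is a small back-of-the-envelope sketch using the two relations quoted in this thread, FSC = SNR/(1+SNR) and the Rosenthal & Henderson (2003) half-map argument; whether the first relation applies to cryo-EM is exactly what is questioned above, so treat these as bookkeeping, not justification:

    # Back-of-the-envelope numbers behind the 0.143 convention (illustrative only).
    def snr_from_fsc(fsc):
        # invert FSC = SNR / (1 + SNR)
        return fsc / (1.0 - fsc)

    def cref_from_half_map_fsc(fsc_half):
        # Rosenthal & Henderson (2003): correlation of the full map against
        # a perfect reference, given the FSC between the two half maps.
        return (2.0 * fsc_half / (1.0 + fsc_half)) ** 0.5

    print(round(snr_from_fsc(0.143), 3))            # ~0.167 per half map
    print(round(cref_from_half_map_fsc(0.143), 3))  # ~0.5, hence the "0.143 <-> 0.5" reading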

- I totally agree with Steve that the full FSC is much more informative
than its crossing with the threshold. Especially because we should be much
more worried about its behavior when it has high values than when it has
low values. Before 

Re: [ccp4bb] [3dem] Which resolution?

2020-02-23 Thread Gerard Bricogne
Gentlemen,

 Please consider for a moment that by such intemperate language and
tone, you are making a topic of fundamental importance to both the MX and
the EM communities into a no-go area. This cannot be good for anyone's
reputation nor for the two fields in general. It has to be possible to
discuss the topic of "resolution" in a dispassionate way, so as to jointly
gain an improved and shared understanding of the matter, without feeling
implicitly under pressure to support one side or the other. An acrimonious
dispute like this one can only be putting people off getting involved in the
discussion, which is exactly the opposite of what a thread on a scientific
bulletin board should be doing.


 With best wishes,

  Gerard.

--
On Sun, Feb 23, 2020 at 08:15:34AM -0300, Marin van Heel wrote:
> Hi Carlos Oscar and Jose-Maria,
> 
> I choose to answer you guys first, because it will take little of my time
> to counter your criticism and because I have long since been less than
> amused by your published, ill-conceived criticism:
> 
> “*Marin, I always suffer with your reference to sloppy statistics. If we
> take your paper of 2005 where the 1/2 bit criterion was proposed, Eqs. 4 to
> 15 have completely ignored the fact that you are dealing with Fourier
> components, that are complex numbers, and consequently you have to deal
> with random variables that have TWO components, which moreover the real and
> imaginary part are not independent and, in their turn, they are not
> independent of the nearby Fourier coefficients so that for computing radial
> averages you would need to account for the correlation among coefficients*”
> 
> I had seen this argumentation against our (2005) paper in your
> manuscript/paper years back. I was so stunned by the level of
> misunderstanding expressed in your manuscript that I chose not to spend any
> time reacting to those statements. Now that you choose to so openly display
> your thoughts on the matter, I have no other choice than to spell out your
> errors in public.
> 
> 
> 
> All complex arrays in our 2005 paper are Hermitian (since they are the FTs
> of real data), and so are all their inner products. In all the integrals
> over rings one always averages a complex Fourier-space voxel with its
> Hermitian conjugate yielding *ONE* real value (times two)!  Without that
> Hermitian property, FRCs and FSCs, which are real normalised correlation
> functions would not even have been possible. I was - and still am - stunned
> by this level of misunderstanding!
> 
> 
> 
> This is a blatant blunder that you are propagating over years, a blunder
> that does not do any good to your reputation, yet also a blunder that has
> probably damaged our research income. The fact that you can divulgate
> such rubbish and leave it out there for years for referees to read (who are
> possibly not as well educated in physics and mathematics) will do – and may
> already have done – damage to our research.  An apology is appropriate but
> an apology is not enough.
> 
> 
> 
> Maybe you should ask your granting agencies how to transfer 25% of your
> grant income to our research, in compensation of damages created by your
> blunder!
> 
> 
> 
> Success with your request!
> 
> 
> 
> Marin
> 
> 
> 
> PS. You have also missed that our 2005 paper explicitly includes the
> influence of the size of the object within the sampling box (your: “*they
> are not independent of the nearby Fourier coefficients*”). I remain
> flabbergasted.
> 
> On Fri, Feb 21, 2020 at 3:15 PM Carlos Oscar Sorzano 
> wrote:
> 
> > Dear all,
> >
> > I always try to refrain from getting into these discussions, but I
> > can no longer resist the temptation. Here are some more ideas that I hope
> > bring more light than confusion:
> >
> > - There must be some functional relationship between the FSC and the SNR,
> > but the exact analytical form of this relationship is unknown (I suspect
> > that it must be at least monotonic, the worse the SNR, the worse FSC; but
> > even this is difficult to prove). The relationship we normally use
> > FSC=SNR/(1+SNR) was derived in a context that does not apply to CryoEM (1D
> > stationary signals in real space; our molecules are not stationary), and
> > consequently any reasoning of any threshold based on this relationship is
> > incorrect (see our review).
> >
> > - Still, as long as we all use the same threshold, the reported
> > resolutions are comparable to each other. In that regard, I am happy that
> > we have set 0.143 (although any other number would have served the purpose)
> > as the standard.
> >
> > - I totally agree with Steve that the full FSC is much more informative
> > than its crossing with the threshold. Especially because we should be much
> > more worried about its behavior when it has high values than when it has
> > low values. Before crossing the threshold it should be as high as possible,
> > and that is the "true measure" of goodness of the map. When it 

Re: [ccp4bb] [3dem] Which resolution?

2020-02-23 Thread Marin van Heel
Hi Carlos Oscar and Jose-Maria,

I choose to answer you guys first, because it will take little of my time
to counter your criticism and because I have long since been less than
amused by your published, ill-conceived criticism:

“*Marin, I always suffer with your reference to sloppy statistics. If we
take your paper of 2005 where the 1/2 bit criterion was proposed, Eqs. 4 to
15 have completely ignored the fact that you are dealing with Fourier
components, that are complex numbers, and consequently you have to deal
with random variables that have TWO components, which moreover the real and
imaginary part are not independent and, in their turn, they are not
independent of the nearby Fourier coefficients so that for computing radial
averages you would need to account for the correlation among coefficients*”

I had seen this argumentation against our (2005) paper in your
manuscript/paper years back. I was so stunned by the level of
misunderstanding expressed in your manuscript that I chose not to spend any
time reacting to those statements. Now that you choose to so openly display
your thoughts on the matter, I have no other choice than to spell out your
errors in public.



All complex arrays in our 2005 paper are Hermitian (since they are the FTs
of real data), and so are all their inner products. In all the integrals
over rings one always averages a complex Fourier-space voxel with its
Hermitian conjugate yielding *ONE* real value (times two)!  Without that
Hermitian property, FRCs and FSCs, which are real normalised correlation
functions would not even have been possible. I was - and still am - stunned
by this level of misunderstanding!



This is a blatant blunder that you are propagating over years, a blunder
that does not do any good to your reputation, yet also a blunder that has
probably damaged our research income. The fact that you can divulgate
such rubbish and leave it out there for years for referees to read (who are
possibly not as well educated in physics and mathematics) will do – and may
already have done – damage to our research.  An apology is appropriate but
an apology is not enough.



Maybe you should ask your granting agencies how to transfer 25% of your
grant income to our research, in compensation of damages created by your
blunder!



Success with your request!



Marin



PS. You have also missed that our 2005 paper explicitly includes the
influence of the size of the object within the sampling box (your: “*they
are not independent of the nearby Fourier coefficients*”). I remain
flabbergasted.

On Fri, Feb 21, 2020 at 3:15 PM Carlos Oscar Sorzano 
wrote:

> Dear all,
>
> I always try to refrain from getting into these discussions, but I
> can no longer resist the temptation. Here are some more ideas that I hope
> bring more light than confusion:
>
> - There must be some functional relationship between the FSC and the SNR,
> but the exact analytical form of this relationship is unknown (I suspect
> that it must be at least monotonic, the worse the SNR, the worse FSC; but
> even this is difficult to prove). The relationship we normally use
> FSC=SNR/(1+SNR) was derived in a context that does not apply to CryoEM (1D
> stationary signals in real space; our molecules are not stationary), and
> consequently any reasoning of any threshold based on this relationship is
> incorrect (see our review).
>
> - Still, as long as we all use the same threshold, the reported
> resolutions are comparable to each other. In that regard, I am happy that
> we have set 0.143 (although any other number would have served the purpose)
> as the standard.
>
> - I totally agree with Steve that the full FSC is much more informative
> than its crossing with the threshold. Especially because we should be much
> more worried about its behavior when it has high values than when it has
> low values. Before crossing the threshold it should be as high as possible,
> and that is the "true measure" of goodness of the map. When it crosses the
> threshold of 0.143 the SNR is too low and, by definition, that is a very
> unstable part of the FSC, resulting in relatively unstable reports of
> resolution. We ran some tests on the variability of the FSC (refining
> random splits of the dataset), trying to put in the error bars that Steve was
> asking for, and it turned out to be pretty reproducible (rather low
> variance except in the region where it crosses the threshold) as long as the
> dataset was large enough (which is the current state).
>
> - @Marin, I always suffer with your reference to sloppy statistics. If we
> take your paper of 2005 where the 1/2 bit criterion was proposed (
> https://www.sciencedirect.com/science/article/pii/S1047847705001292),
> Eqs. 4 to 15 have completely ignored the fact that you are dealing with
> Fourier components, that are complex numbers, and consequently you have to
> deal with random variables that have two components, which moreover the
> real and imaginary part are not 

Re: [ccp4bb] [3dem] Which resolution?

2020-02-22 Thread Nave, Colin (DLSLtd,RAL,LSCI)
Alexis
This is a very useful summary.

You say you were not convinced by Marin's derivation in 2005. Are you convinced 
now and, if not, why?

My interest in this is that the FSC with a half-bit threshold is in danger of 
being adopted elsewhere because it is becoming standard for protein 
structure determination (by EM or MX). If it is used for these mature 
techniques it must be right!

It is the adoption of the ½ bit threshold I worry about. I gave a rather weak 
example for MX which consisted of partial occupancy of side chains, substrates 
etc. For x-ray imaging a wide range of contrasts can occur and, if you want to 
see features with only a small contrast above the surroundings, then I think the 
half bit threshold would be inappropriate.

It would be good to see a clear message from the MX and EM communities as to 
why an information content threshold of ½ a bit is generally appropriate for 
these techniques and an acknowledgement that this threshold is 
technique/problem dependent.

We might then progress from the bronze age to the iron age.

Regards
Colin



From: CCP4 bulletin board  On Behalf Of Alexis Rohou
Sent: 21 February 2020 16:35
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] [3dem] Which resolution?

Hi all,

For those bewildered by Marin's insistence that everyone's been messing up 
their stats since the bronze age, I'd like to offer what my understanding of 
the situation. More details in this thread from a few years ago on the exact 
same topic:
https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003939.html
https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html

Notwithstanding notational problems (e.g. strict equations as opposed to 
approximation symbols, or omission of symbols to denote estimation), I believe 
Frank & Al-Ali and "descendent" papers (e.g. appendix of Rosenthal & Henderson 
2003) are fine. The cross terms that Marin is agitated about indeed do in fact 
have an expectation value of 0.0 (in the ensemble; if the experiment were 
performed an infinite number of times with different realizations of noise). I 
don't believe Pawel or Jose Maria or any of the other authors really believe 
that the cross-terms are orthogonal.

When N (the number of independent Fourier voxels in a shell) is large enough, 
mean(Signal x Noise) ~ 0.0 is only an approximation, but a pretty good one, 
even for a single FSC experiment. This is why, in my book, derivations that 
depend on Frank & Al-Ali are OK, under the strict assumption that N is large. 
Numerically, this becomes apparent when Marin's half-bit criterion is plotted - 
asymptotically it has the same behavior as a constant threshold.
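
That asymptotic behaviour is easy to check numerically. The sketch below uses the commonly quoted form of the van Heel & Schatz (2005) half-bit threshold; the coefficients should be verified against the paper before being relied on:

    # The 1/2-bit threshold as a function of N (independent Fourier voxels in the shell).
    # For small N it rises steeply; for large N it flattens towards ~0.17,
    # i.e. it behaves like a constant threshold, as noted above.
    from math import sqrt

    def half_bit_threshold(n_voxels):
        s = sqrt(n_voxels)
        return (0.2071 + 1.9102 / s) / (1.2071 + 0.9102 / s)

    for n in (10, 100, 1000, 10000, 100000):
        print(n, round(half_bit_threshold(n), 3))
    # 10 -> 0.543, 100 -> 0.307, 10000 -> 0.186, large-N limit ~0.172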

So, is Marin wrong to worry about this? No, I don't think so. There are indeed 
cases where the assumption of large N is broken. And under those circumstances, 
any fixed threshold (0.143, 0.5, whatever) is dangerous. This is illustrated in 
figures of van Heel & Schatz (2005). Small boxes, high-symmetry, small objects 
in large boxes, and a number of other conditions can make fixed thresholds 
dangerous.

It would indeed be better to use a non-fixed threshold. So why am I not using 
the 1/2-bit criterion in my own work? While numerically it behaves well at most 
resolution ranges, I was not convinced by Marin's derivation in 2005. 
Philosophically though, I think he's right - we should aim for FSC thresholds 
that are more robust to the kinds of edge cases mentioned above. It would be 
the right thing to do.

Hope this helps,
Alexis



On Sun, Feb 16, 2020 at 9:00 AM Penczek, Pawel A 
mailto:pawel.a.penc...@uth.tmc.edu>> wrote:
Marin,

The statistics in the 2010 review is fine. You may disagree with the assumptions, but I 
can assure you the “statistics” (as you call it) is fine. Careful reading of 
the paper would reveal to you this much.
Regards,
Pawel


On Feb 16, 2020, at 10:38 AM, Marin van Heel 
mailto:marin.vanh...@googlemail.com>> wrote:


Dear Pawel and All others 
This 2010 review is - unfortunately - largely based on the flawed statistics I 
mentioned before, namely on the a priori assumption that the inner product of a 
signal vector and a noise vector is ZERO (an orthogonality assumption).  The 
(Frank & Al-Ali 1975) paper we have refuted on a number of occasions (for 
example in 2005, and most recently in our BioRxiv paper) but you still take 
that as the correct relation between SNR and FRC (and you never cite the 
criticism...).
Sorry
Marin

On Thu, Feb 13, 2020 at 10:42 AM Penczek, Pawel A 
mailto:pawel.a.penc...@uth.tmc.edu>> wrote:
Dear Teige,

I am wondering whether you are familiar with

Resolution measures in molecular electron microscopy.
Penczek PA. Methods Enzymol. 2010.
Citation

Methods Enzymol. 2010;482:73-100. doi: 10.1016/S0076-6879(10)82003-8.

You will find there answers to all questions you asked and much more.

Regards,
Pawel Penczek

Regards,
Pawel

Re: [ccp4bb] [3dem] Which resolution?

2020-02-21 Thread Alexis Rohou
Hi all,

For those bewildered by Marin's insistence that everyone's been messing up
their stats since the bronze age, I'd like to offer my understanding
of the situation. More details in this thread from a few years ago on the
exact same topic:
https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003939.html
https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html

Notwithstanding notational problems (e.g. strict equations as opposed to
approximation symbols, or omission of symbols to denote estimation), I
believe Frank & Al-Ali and "descendent" papers (e.g. appendix of Rosenthal
& Henderson 2003) are fine. The cross terms that Marin is agitated about
indeed do in fact have an expectation value of 0.0 (in the ensemble; if the
experiment were performed an infinite number of times with different
realizations of noise). I don't believe Pawel or Jose Maria or any of the
other authors really believe that the cross-terms are orthogonal.

When N (the number of independent Fourier voxels in a shell) is large
enough, mean(Signal x Noise) ~ 0.0 is only an approximation, but a pretty
good one, even for a single FSC experiment. This is why, in my book,
derivations that depend on Frank & Al-Ali are OK, under the strict
assumption that N is large. Numerically, this becomes apparent when Marin's
half-bit criterion is plotted - asymptotically it has the same behavior as
a constant threshold.

So, is Marin wrong to worry about this? No, I don't think so. There are
indeed cases where the assumption of large N is broken. And under those
circumstances, any fixed threshold (0.143, 0.5, whatever) is dangerous.
This is illustrated in figures of van Heel & Schatz (2005). Small boxes,
high-symmetry, small objects in large boxes, and a number of other
conditions can make fixed thresholds dangerous.

It would indeed be better to use a non-fixed threshold. So why am I not
using the 1/2-bit criterion in my own work? While numerically it behaves
well at most resolution ranges, I was not convinced by Marin's derivation
in 2005. Philosophically though, I think he's right - we should aim for FSC
thresholds that are more robust to the kinds of edge cases mentioned above.
It would be the right thing to do.

Hope this helps,
Alexis



On Sun, Feb 16, 2020 at 9:00 AM Penczek, Pawel A <
pawel.a.penc...@uth.tmc.edu> wrote:

> Marin,
>
> The statistics in the 2010 review is fine. You may disagree with the assumptions,
> but I can assure you the “statistics” (as you call it) is fine. Careful
> reading of the paper would reveal to you this much.
>
> Regards,
> Pawel
>
> On Feb 16, 2020, at 10:38 AM, Marin van Heel 
> wrote:
>
> 
>
> Dear Pawel and All others 
>
> This 2010 review is - unfortunately - largely based on the flawed
> statistics I mentioned before, namely on the a priori assumption that the
> inner product of a signal vector and a noise vector is ZERO (an
> orthogonality assumption).  The (Frank & Al-Ali 1975) paper we have refuted
> on a number of occasions (for example in 2005, and most recently in our
> BioRxiv paper) but you still take that as the correct relation between SNR
> and FRC (and you never cite the criticism...).
> Sorry
> Marin
>
> On Thu, Feb 13, 2020 at 10:42 AM Penczek, Pawel A <
> pawel.a.penc...@uth.tmc.edu> wrote:
>
>> Dear Teige,
>>
>> I am wondering whether you are familiar with
>>
>> Resolution measures in molecular electron microscopy.
>> Penczek PA. Methods Enzymol. 2010.
>> Citation
>>
>> Methods Enzymol. 2010;482:73-100. doi: 10.1016/S0076-6879(10)82003-8.
>>
>> You will find there answers to all questions you asked and much more.
>>
>> Regards,
>> Pawel Penczek
>>
>>
>> Regards,
>> Pawel
>





Re: [ccp4bb] [3dem] Which resolution?

2020-02-20 Thread Nave, Colin (DLSLtd,RAL,LSCI)
Dear Randy
Yes this makes sense.
Certainly cut-offs are bad – I hope my post wasn’t implying one should cut 
off the data at some particular resolution shell. Some reflections in a shell 
will be weak and some stronger. Knowing which are which is of course 
information.
I will have a look at the 2019 CCP4 study weekend paper
Regards
Colin

From: Randy Read 
Sent: 20 February 2020 11:45
To: Nave, Colin (DLSLtd,RAL,LSCI) 
Cc: CCP4BB@jiscmail.ac.uk
Subject: Re: [ccp4bb] [3dem] Which resolution?

Dear Colin,

Over the last few years we've been implementing measures of information gain to 
evaluate X-ray diffraction data in our program Phaser. Some results in a paper 
that has been accepted for publication in the 2019 CCP4 Study Weekend special 
issue are relevant to this discussion.

First, looking at data deposited in the PDB, we see that the information gain 
in the highest resolution shell is typically about 0.5-1 bit per reflection 
(though we haven't done a comprehensive analysis yet).  A very rough 
calculation suggests that a half-bit resolution threshold is equivalent to 
something like an I/SIGI threshold of one.  So that would fit with the idea 
that a possible resolution limit measure would be the resolution where the 
average information per reflection drops to half a bit.

Second, even if the half-bit threshold is where the data are starting to 
contribute less to the image and to likelihood targets for tasks like molecular 
replacement and refinement, weaker data still contribute some useful signal 
down to limits as low as 0.01 bit per reflection.  So any number attached to 
the nominal resolution of a data set should not necessarily be applied as a 
resolution cutoff, at least as long as the refinement target (such as our 
log-likelihood-gain on intensity or LLGI score) accounts properly for large 
measurement errors.
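
A crude way to see why "about half a bit" and "I/SIGI of about one" land in the same place is to treat a measured intensity as a Gaussian channel with power signal-to-noise (I/sigI)^2 and apply Shannon's capacity formula. This is emphatically not the LLGI/information-gain calculation implemented in Phaser, just a back-of-the-envelope analogy that happens to echo the rough equivalence mentioned above:

    # Rough Gaussian-channel analogy (NOT the Phaser LLGI calculation).
    from math import log2

    def rough_bits_per_reflection(i_over_sig):
        # Shannon capacity of a Gaussian channel with power SNR = (I/sigI)^2
        return 0.5 * log2(1.0 + i_over_sig ** 2)

    for x in (0.5, 1.0, 2.0, 3.0):
        print(x, round(rough_bits_per_reflection(x), 2))
    # 0.5 -> 0.16, 1.0 -> 0.5, 2.0 -> 1.16, 3.0 -> 1.66 bits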

Best wishes,

Randy


On 20 Feb 2020, at 10:15, Nave, Colin (DLSLtd,RAL,LSCI) 
mailto:colin.n...@diamond.ac.uk>> wrote:

Dear all,
I have received a request to clarify what I mean by threshold in my 
contribution of 17 Feb  below and then post the clarification on CCP4BB. Being 
a loyal (but very sporadic) CCP4BBer I am now doing this. My musings in this 
thread are as much auto-didactic as didactic. In other words I am trying to 
understand it all myself.

Accepting that the FSC is a suitable metric (I believe it is) I think the most 
useful way of explaining the concept of the threshold is to refer to section 
4.2 and fig. 4 of van Heel and Schatz (2005), Journal of Structural Biology, 151, 
250-262. Figure 4C shows an FSC together with a half-bit information curve and 
figure 4D shows the FSC with a 3sigma curve.

The point I was trying to make in rather an obtuse fashion is that the choice 
of threshold will depend on what one is trying to see in the image. I will try 
and give an example related to protein structures rather than uranium hydride 
or axons in the brain. In general protein structures consist of atoms with 
similar scattering power (C, N, O with the hydrogens for the moment invisible) 
and high occupancy. When we can for example distinguish side chains along the 
backbone we have a good basis for starting to interpret the map as a particular 
structure. An FSC with a half bit threshold at the appropriate resolution 
appears to be a good guide to whether one can do this. However, if a particular 
sidechain is disordered with 2 conformations, or a substrate is only 50% 
occupied, the contribution in the electron density map is reduced and might be 
difficult to distinguish from the noise. A  higher threshold might be necessary 
to see these atoms but this would occur at a lower resolution than given by the 
half bit threshold. One could instead increase the exposure to improve the 
resolution but of course radiation damage lurks. For reporting structures, the 
obvious thing to do is to show the complete FSC curves together with a few 
threshold curves (e.g. half bit, one bit, 2 bits). This would enable people to 
judge whether the data is likely to meet their requirements. This of course 
departs significantly from the desire to have one number. A compromise might be 
to report FSC resolutions at several thresholds.
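
Reporting several thresholds is cheap to do once the full FSC curve itself is reported. A minimal sketch (illustrative only, not code from any existing package) that takes a tabulated FSC and any set of threshold curves, constant or N-dependent, and reports the first crossing of each:

    def crossing_frequencies(freq, fsc, thresholds):
        """freq, fsc: equal-length lists; thresholds: dict name -> list on the same grid."""
        out = {}
        for name, thr in thresholds.items():
            out[name] = None
            for f, c, t in zip(freq, fsc, thr):
                if c < t:
                    out[name] = f   # first frequency where the FSC drops below this threshold
                    break
        return out

    # Example with two constant thresholds on a toy FSC curve:
    freq = [0.05, 0.10, 0.15, 0.20, 0.25]
    fsc  = [0.98, 0.90, 0.60, 0.20, 0.05]
    print(crossing_frequencies(freq, fsc, {"0.5": [0.5] * 5, "0.143": [0.143] * 5}))
    # {'0.5': 0.2, '0.143': 0.25}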

I understand that fixed value thresholds (e.g. 0.143) were originally adopted 
for EM to conform to standards prevalent for crystallography at the time. This 
would have enabled comparison between the two techniques. For many cases (as 
stated in van Heel and Schatz) there will be little difference between the 
resolution given by a half bit and that given by 0.143. However, if the former 
is mathematically correct and easy to implement then why not use it for all 
techniques? The link to Shannon is a personal reason I have for preferring a 
threshold based on information content. If I had scientific “heroes” he would 
be one of them.


I have recently had a paper on x-ray imaging of biological cells accepted for 
publication. This includes

“In

Re: [ccp4bb] [3dem] Which resolution?

2020-02-20 Thread Randy Read
 be encouraged. Much effort has gone in to 
> doing this for fields such as macromolecular crystallography but it has to be 
> admitted that this is still an ongoing process.”
> 
> I think recent activity agrees with the last 6 words!
>  
> 
> Don’t read the next bit if not interested in the relationship between the 
> Rose criterion and FSC thresholds.
> 
> The recently submitted paper also includes
> 
> “A proper analysis of the relationship between the Rose criterion and FSC 
> thresholds is outside the scope of this paper and would need to take account 
> of factors such as the number of image voxels, whether one is in an atomicity 
> or uniform voxel regime and the contrast of features to be identified in the 
> image.”
> 
> This can justifiably be interpreted as saying I did not fully understand the 
> relationship itself and was a partial reason why I raised the issue in 
> another message to this thread. 
> 
> Who cares anyway about the headline resolution? Well, defining a resolution 
> can be important if one wants to calculate the exposure required to see 
> particular features and whether they are then degraded by radiation damage. 
> This relates to the issue I raised concerning the Rose criterion. As an 
> example one might have a virus particle with an average density of 1.1 
> embedded in an object (a biological cell) of density 1.0 (I am keeping the 
> numbers simple). The virus has a diameter of 50nm. There are 5000 voxels in 
> the image (the number 5000 was used by Rose when analysing images from 
> televisions). This gives 5000 chances of a false alarm so, I want to ensure 
> the signal to noise ratio in the image is sufficiently high. This is why Rose 
> adopted a contrast to noise ratio of 5 (Rose criterion K of 5). For each 
> voxel in the image we need a noise level sufficiently low to identify the 
> feature. For a Rose criterion of 5 and the contrast of 0.1 it means that we 
> need an average (?) of 625 photons per Shannon reciprocal voxel (the 
> “speckle” given by the object as a whole) at the required resolution (1/50nm) 
> in order to achieve this. The expression for the required number of photons 
> is (K/2C)**2. However, if we have already identified a candidate voxel for 
> the virus (perhaps using labelled fluorescent methods) we can get away with a 
> Rose criterion of 3 (equivalent to K=5 over 5000 pixels) and 225 photons will 
> suffice. For this case, a signal to noise ratio of 3 corresponds to a 0.0027 
> probability of the event occurring due to random noise. The information 
> content is therefore -log2(0.0027), which is 8.5 bits. I therefore have a real 
> space information content of 8.5 bits and an average 225 photons at the 
> resolution limit. The question is to relate these and come up with the 
> appropriate value for the FSC threshold so I can judge whether a particle 
> with this low contrast can be identified. In the above example, the object 
> (biological cell) as a whole has a defined boundary and forms a natural sharp 
> edged mask. The hard edge mask (see van Heel and Schatz, section 4.7) is 
> therefore present.
>  
> I am sure Marin (or others) will put me right if there are mistakes in the 
> above.
>  
> Finally, for those interested in the relationship between information content 
> and probability the article by Weaver (one of Shannon’s collaborators) gives 
> a non-mathematical and perhaps philosophical description. It can be found at
> http://www.mt-archive.info/50/SciAm-1949-Weaver.pdf 
> <http://www.mt-archive.info/50/SciAm-1949-Weaver.pdf>
>  
> Sorry for the long reply – but at least some of it was requested!
>  
> Colin
>  
>  
>  
> From: CCP4 bulletin board  On Behalf Of colin.n...@diamond.ac.uk
> Sent: 17 February 2020 11:26
> To: CCP4BB@JISCMAIL.AC.UK <mailto:CCP4BB@JISCMAIL.AC.UK>
> Subject: [ccp4bb] FW: [ccp4bb] [3dem] Which resolution?
>  
>  
> Dear all.
> Would it help to separate out the issue of the FSC from the value of the 
> threshold? My understanding is that the FSC addresses the spatial frequency 
> at which there is a reliable information content in the image. This concept 
> should apply to a wide variety of types of image. The issue is then what 
> value of the threshold to use. For interpretation of protein structures 
> (whether by x-ray or electron microscopy), a half bit threshold appears to be 
> appropriate. However, for imaging the human brain (one of Marin’s examples) a 
> higher threshold might be adopted as a range of contrasts might be present 
> (axons for example have a similar density to the surroundings). For 
> crystallography, if one wants to see lighter atoms (hydrogens

Re: [ccp4bb] [3dem] Which resolution?

2020-02-20 Thread Nave, Colin (DLSLtd,RAL,LSCI)
 need a noise level 
sufficiently low to identify the feature. For a Rose criterion of 5 and the 
contrast of 0.1 it means that we need an average (?) of 625 photons per Shannon 
reciprocal voxel (the “speckle” given by the object as a whole) at the required 
resolution (1/50nm) in order to achieve this. The expression for the required 
number of photons is (K/2C)**2. However, if we have already identified a 
candidate voxel for the virus (perhaps using labelled fluorescent methods) we 
can get away with a Rose criterion of 3 (equivalent to K=5 over 5000 pixels) 
and 225 photons will suffice. For this case, a signal to noise ratio of 3 
corresponds to a  0.0027 probability of the event occurring due to Random 
noise. The information content is therefore –log20.0027 which is 8.5 bits. I 
therefore have a real space information content of 8.5 bits and an average 225 
photons at the resolution limit. The question is to relate these and come up 
with the appropriate value for the FSC threshold so I can judge whether a 
particle with this low contrast can be identified. In the above example, the 
object (biological cell) as a whole has a defined boundary and forms a natural 
sharp-edged mask. The hard edge mask (see van Heel and Schatz, section 4.7) is 
therefore present.
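
The arithmetic in that paragraph can be spelled out in a few lines (this simply reproduces the numbers quoted above; the 0.0027 is the standard two-sided Gaussian tail probability at 3 sigma):

    from math import log2

    def rose_photons(K, C):
        # photons per Shannon voxel needed for contrast C at Rose criterion K
        return (K / (2.0 * C)) ** 2

    print(rose_photons(5, 0.1))     # 625 photons for K = 5, contrast 0.1
    print(rose_photons(3, 0.1))     # 225 photons for K = 3
    print(round(-log2(0.0027), 1))  # ~8.5 bits, as quoted above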

I am sure Marin (or others) will put me right if there are mistakes in the 
above.

Finally, for those interested in the relationship between information content 
and probability the article by Weaver (one of Shannon’s collaborators) gives a 
non-mathematical and perhaps philosophical description. It can be found at
http://www.mt-archive.info/50/SciAm-1949-Weaver.pdf

Sorry for the long reply – but at least some of it was requested!

Colin



From: CCP4 bulletin board  On Behalf Of 
colin.n...@diamond.ac.uk
Sent: 17 February 2020 11:26
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] FW: [ccp4bb] [3dem] Which resolution?


Dear all.
Would it help to separate out the issue of the FSC from the value of the 
threshold? My understanding is that the FSC addresses the spatial frequency at 
which there is a reliable information content in the image. This concept should 
apply to a wide variety of types of image. The issue is then what value of the 
threshold to use. For interpretation of protein structures (whether by x-ray or 
electron microscopy), a half bit threshold appears to be appropriate. However, 
for imaging the human brain (one of Marin’s examples) a higher threshold might 
be adopted as a range of contrasts might be present (axons for example have a 
similar density to the surroundings). For crystallography, if one wants to see 
lighter atoms (hydrogens in the presence of uranium or in proteins) a higher 
threshold might also be appropriate. I am not sure about this to be honest as a 
2 bit threshold (for example) would mean that there is information to higher 
resolution at a threshold of a half bit (unless one is at a diffraction or 
instrument limited resolution).

Most CCP4BBers will understand that a single number is not good enough. 
However, many users of the protein structure databases will simply search for 
the structure with the highest named resolution. It might be difficult to send 
these users to re-education camps.

Regards
Colin

From: CCP4 bulletin board  On Behalf Of Petrus Zwart
Sent: 16 February 2020 21:50
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] [3dem] Which resolution?

Hi All,

How is the 'correct' resolution estimation related to the estimated error on 
some observed hydrogen bond length of interest, or an error on the estimated 
occupancy of a ligand or conformation or anything else that has structural 
significance?

In crystallography it isn't, really (only in some very approximate fashion), 
and I doubt that in EM there is anything to that effect. If you want to use 
the resolution to get a gut feeling for how your maps look and how your data 
behave, it doesn't really matter what standard you use, as long as you are 
consistent in the metric you use. If you want to use this estimate 
to get at uncertainties of model parameters, you had better try something else.

Regards
Peter Zwart



On Sun, Feb 16, 2020 at 8:38 AM Marin van Heel 
<057a89ab08a1-dmarc-requ...@jiscmail.ac.uk<mailto:057a89ab08a1-dmarc-requ...@jiscmail.ac.uk>>
 wrote:
Dear Pawel and All others 
This 2010 review is - unfortunately - largely based on the flawed statistics I 
mentioned before, namely on the a priori assumption that the inner product of a 
signal vector and a noise vector are ZERO (an orthogonality assumption).  The 
(Frank & Al-Ali 1975) paper we have refuted on a number of occasions (for 
example in 2005, and most recently in our BioRxiv paper) but you still take 
that as the correct relation between SNR and FRC (and you never cite the 
criticism...).
Sorry
Marin

On Thu, Feb 13, 2020 at 10:42 AM Penczek, Pawel A 
mailto:pawel.a

Re: [ccp4bb] [3dem] Which resolution?

2020-02-18 Thread Marin van Heel
Hi Pawel,

We can indeed agree to disagree upon many basic things in life. We
apparently disagree on the basic assumptions upon which you choose to build
your science!   What I am criticising is the very foundation you use to
construct your science, namely the flawed Frank & Al-Ali (1975) formula
relating SNR and CCC! Agreed, their errors are not of your making and not
your responsibility!  It is, however, your choice and your judgement to
build upon that flawed foundation.  At the end of the day, your construct,
even when based on shaky foundations created by others, will remain your
responsibility!

Sorry,
Marin

On Sun, Feb 16, 2020 at 1:59 PM Penczek, Pawel A <
pawel.a.penc...@uth.tmc.edu> wrote:

> Marin,
>
> The statistics in the 2010 review is fine. You may disagree with the assumptions,
> but I can assure you the “statistics” (as you call it) is fine. Careful
> reading of the paper would reveal to you this much.
>
> Regards,
> Pawel
>
> On Feb 16, 2020, at 10:38 AM, Marin van Heel 
> wrote:
>
> 
>
> Dear Pawel and All others 
>
> This 2010 review is - unfortunately - largely based on the flawed
> statistics I mentioned before, namely on the a priori assumption that the
> inner product of a signal vector and a noise vector is ZERO (an
> orthogonality assumption).  The (Frank & Al-Ali 1975) paper we have refuted
> on a number of occasions (for example in 2005, and most recently in our
> BioRxiv paper) but you still take that as the correct relation between SNR
> and FRC (and you never cite the criticism...).
> Sorry
> Marin
>
> On Thu, Feb 13, 2020 at 10:42 AM Penczek, Pawel A <
> pawel.a.penc...@uth.tmc.edu> wrote:
>
>> Dear Teige,
>>
>> I am wondering whether you are familiar with
>>
>> Resolution measures in molecular electron microscopy.
>> Penczek PA. Methods Enzymol. 2010.
>> Citation
>>
>> Methods Enzymol. 2010;482:73-100. doi: 10.1016/S0076-6879(10)82003-8.
>>
>> You will find there answers to all questions you asked and much more.
>>
>> Regards,
>> Pawel Penczek
>>
>>
>> Regards,
>> Pawel
>>
>





Re: [ccp4bb] FW: [ccp4bb] [3dem] Which resolution?

2020-02-18 Thread Marin van Heel
Dear John,

I really like your lecture notes! My Imperial/Leiden lecture notes –
currently being updated – look a lot like yours. The earlier version:

(https://www.single-particles.org/methodology/MvH_Phase_Contrast.pdf)

And you are fully correct: “resolution of cryoEM imaging varies locally”!

That is exactly the point!

The existing cryo-EM dogmas, however,  PREVENT you from doing things right,
which seriously hampers progress (see our BioRxiv paper:
https://www.biorxiv.org/content/10.1101/224402v1 ).

Cheers,

Marin

On Tue, Feb 18, 2020 at 4:36 AM John R Helliwell 
wrote:

> Hi Colin,
> Yes I agree, see eg page 7 of
> https://www2.physics.ox.ac.uk/sites/default/files/2011-06-08/optics_notes_and_slides_part_5_pdf_63907.pdf
>  (and
> there may be better weblinks).
>
> The resolution of cryoEM imaging varies locally and so the “local scaling
> in a complex way” is what we have to get into in practice.
>
> Greetings,
> John
>
> Emeritus Professor John R Helliwell DSc
>
>
>
> On 17 Feb 2020, at 21:57, Nave, Colin (DLSLtd,RAL,LSCI) <
> colin.n...@diamond.ac.uk> wrote:
>
> 
>
> Hi John
>
> I agree that neutrons have a role to increase the contrast for certain
> atoms. The “water window” for x-ray imaging also fulfils a similar role.
> The “locally scaled in a complex way” is a bit beyond me.
>
>
>
> The relationship between “diffraction” errors and “imaging” errors is
> based on Parseval’s theorem applied to the errors for electron densities
> and structure factors.  See for example
> https://www-structmed.cimr.cam.ac.uk/Course/Fourier/Fourier.html and
> scroll down to Parseval’s theorem. Admittedly not a primary reference but I
> think Randy (and Parseval, not to be confused with Wagner’s opera), are
> unlikely to have got it wrong.
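
For completeness, the relation being referred to here is the crystallographic form of Parseval's theorem applied to the error terms (a standard textbook identity, written out rather than taken from the linked page):

    \[
      \int_V \lvert \Delta\rho(\mathbf{x}) \rvert^2 \, dV
      \;=\; \frac{1}{V} \sum_{\mathbf{h}} \lvert \Delta F(\mathbf{h}) \rvert^2 ,
      \qquad
      \Delta\rho(\mathbf{x}) \;=\; \frac{1}{V} \sum_{\mathbf{h}} \Delta F(\mathbf{h}) \,
      e^{-2\pi i\, \mathbf{h}\cdot\mathbf{x}} .
    \]
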
>
>
>
> Imaging (with both electrons and x-rays) can be lensless (as in MX, CDI
> and variants) or with an objective lens (electron microscopes have nice
> objective lenses). The physical processes are the same up to any lens but
> MX, CDI etc. use a computer to replace the lens. The computer algorithm
> might be imperfect resulting in visible termination errors. With a decent
> lens, one can also see diffraction ripples (round bright stars in a
> telescope image) due to the restricted lens aperture.
>
>
>
> Good debate though.
>
>
>
> Colin
>
> From: John R Helliwell 
> Sent: 17 February 2020 16:36
> To: Nave, Colin (DLSLtd,RAL,LSCI) 
> Cc: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] FW: [ccp4bb] [3dem] Which resolution?
>
>
>
> Hi Colin,
>
> Neutrons are applied to the uranyl hydrides so as to make their scattering
> lengths much more equal than with X-rays, and so sidestep the ripple effects
> of the uranium in the X-ray case, which obscure those nearby hydrogens.
>
> In terms of feature resolvability the email exchange (and there may be
> better ones):-
> http://www.phenix-online.org/pipermail/phenixbb/2017-March/023326.html
>
> refers to “locally scaled in a complex way”. So, is the physics of the
> visibility of features really comparable between the two methods of cryoEM
> and crystal structure analysis?
>
> Greetings,
>
> John
>
> Emeritus Professor John R Helliwell DSc
>
>
>
>
>
>
>
> On 17 Feb 2020, at 13:59, Nave, Colin (DLSLtd,RAL,LSCI) <
> colin.n...@diamond.ac.uk> wrote:
>
> 
>
> Hi John
>
> I agree that if I truncate the data at a high information content
> threshold (e.g. 2 bits)  series termination errors might hide the lighter
> atoms (e.g. the hydrogens in uranium hydride crystal structures). However,
> I think this is purely a limitation of producing electron density maps via
> Fourier transforms (i.e. not the physics). A variety of techniques are
> available for handling series termination including ones which are
> maximally non-committal with respect to the missing data. The issue is
> still there in some fields (see
> https://onlinelibrary.wiley.com/iucr/itc/Ha/ch4o8v0001/ ). For protein
> crystallography perhaps series termination errors have become less
> important as people are discouraged from applying some I/sigI type cut off.
>
>
>
> Cheers
>
> Colin
>
>
>
>
>
>
>
> From: John R Helliwell 
> Sent: 17 February 2020 12:09
> To: Nave, Colin (DLSLtd,RAL,LSCI) 
> Subject: Re: [ccp4bb] FW: [ccp4bb] [3dem] Which resolution?
>
>
>
> Hi Colin,
>
> I think the physics of the imaging and the crystal structure analysis,
> respectively without and with Fourier termination ripples, are different.
> For the MX re Fourier series for two types of difference map see our
> contribution:-
>
>
>
> http://scripts.iucr.org/c

Re: [ccp4bb] FW: [ccp4bb] [3dem] Which resolution?

2020-02-17 Thread John R Helliwell
Hi Colin,
Yes I agree, see eg page 7 of 
https://www2.physics.ox.ac.uk/sites/default/files/2011-06-08/optics_notes_and_slides_part_5_pdf_63907.pdf
 (and there may be better weblinks). 

The resolution of cryoEM imaging varies locally and so the “local scaling in a 
complex way” is what we have to get into in practice.

Greetings,
John

Emeritus Professor John R Helliwell DSc



> On 17 Feb 2020, at 21:57, Nave, Colin (DLSLtd,RAL,LSCI) 
>  wrote:
> 
> 
> Hi John
> I agree that neutrons have a role to increase the contrast for certain atoms. 
> The “water window” for x-ray imaging also fulfils a similar role. The 
> “locally scaled in a complex way” is a bit beyond me.
>  
> The relationship between “diffraction” errors and “imaging” errors is based 
> on Parseval’s theorem applied to the errors for electron densities and 
> structure factors.  See for example 
> https://www-structmed.cimr.cam.ac.uk/Course/Fourier/Fourier.html and scroll 
> down to Parseval’s theorem. Admittedly not a primary reference but I think 
> Randy (and Parseval, not to be confused with Wagner’s opera), are unlikely to 
> have got it wrong.
>  
> Imaging (with both electrons and x-rays) can be lensless (as in MX, CDI and 
> variants) or with an objective lens (electron microscopes have nice objective 
> lenses). The physical processes are the same up to any lens but MX, CDI etc. 
> use a computer to replace the lens. The computer algorithm might be imperfect 
> resulting in visible termination errors. With a decent lens, one can also see 
> diffraction ripples (round bright stars in a telescope image) due to the 
> restricted lens aperture.
>  
> Good debate though.
>  
> Colin
> From: John R Helliwell  
> Sent: 17 February 2020 16:36
> To: Nave, Colin (DLSLtd,RAL,LSCI) 
> Cc: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] FW: [ccp4bb] [3dem] Which resolution?
>  
> Hi Colin,
> Neutrons are applied to the uranyl hydrides so as to make their scattering 
> lengths much more equal than with X-rays, and so sidestep the ripple effects of 
> the uranium in the X-ray case, which obscure those nearby hydrogens.
> In terms of feature resolvability the email exchange (and there may be better 
> ones):- http://www.phenix-online.org/pipermail/phenixbb/2017-March/023326.html
> refers to “locally scaled in a complex way”. So, is the physics of the 
> visibility of features really comparable between the two methods of cryoEM 
> and crystal structure analysis?
> Greetings,
> John
> Emeritus Professor John R Helliwell DSc
>  
>  
> 
> 
> On 17 Feb 2020, at 13:59, Nave, Colin (DLSLtd,RAL,LSCI) 
>  wrote:
> 
> 
> Hi John
> I agree that if I truncate the data at a high information content threshold 
> (e.g. 2 bits)  series termination errors might hide the lighter atoms (e.g. 
> the hydrogens in uranium hydride crystal structures). However, I think this 
> is purely a limitation of producing electron density maps via Fourier 
> transforms (i.e. not the physics). A variety of techniques are available for 
> handling series termination including ones which are maximally non-committal 
> with respect to the missing data. The issue is still there in some fields 
> (see https://onlinelibrary.wiley.com/iucr/itc/Ha/ch4o8v0001/ ). For protein 
> crystallography perhaps series termination errors have become less important 
> as people are discouraged from applying some I/sigI type cut off.
>  
> Cheers
> Colin
>  
>  
>  
> From: John R Helliwell  
> Sent: 17 February 2020 12:09
> To: Nave, Colin (DLSLtd,RAL,LSCI) 
> Subject: Re: [ccp4bb] FW: [ccp4bb] [3dem] Which resolution?
>  
> Hi Colin,
> I think the physics of the imaging and the crystal structure analysis, 
> respectively without and with Fourier termination ripples, are different. For 
> the MX re Fourier series for two types of difference map see our 
> contribution:-
>  
> http://scripts.iucr.org/cgi-bin/paper?S0907444903004219
>  
> Greetings,
> John 
>  
> 
> Emeritus Professor John R Helliwell DSc
> https://www.crcpress.com/The-Whats-of-a-Scientific-Life/Helliwell/p/book/9780367233020
>  
>  
> 
> 
> 
> On 17 Feb 2020, at 11:26, "colin.n...@diamond.ac.uk" 
>  
> 
>  
> Dear all.
> Would it help to separate out the issue of the FSC from the value of the 
> threshold? My understanding is that the FSC addresses the spatial frequency 
> at which there is a reliable information content in the image. This concept 
> should apply to a wide variety of types of image. The issue is then what 
> value of the threshold to use. For interpretation of protein structures 
> (whether by x-ray or electron microscopy), a half bit threshold appears to be 
> appropriate. However, for imagi

Re: [ccp4bb] FW: [ccp4bb] [3dem] Which resolution?

2020-02-17 Thread Marin van Heel
Dear Colin

Great that you mention the Rose equation and its consequences for cryo-EM!
I actually wrote a paper on that topic some 40 years ago [Marin van Heel:
Detection of objects in quantum-noise limited images. Ultramicroscopy
8 (1982) 331-342]. I honestly have not thought about it for a long time,
but I have recently been thinking about revisiting the topic. It remains
one of the very first particle-picking papers ever, and certainly still one
of the very best (and reference-free) ones [Afanasyev 2017].

I remember I was very pleased when I realised one could calculate local
variances rapidly using fast convolutions! I remember the very moment in
December 1979, while visiting my parents in their house in Spain and
catching a bit of winter sunshine, leaning against the wall of their house
with my eyes half closed, that the “Aha-Erlebnis” struck.  Thank you for
reminding me to go back to that issue!
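
A minimal sketch of that local-variance trick in present-day numpy/scipy terms
(not the 1982 implementation; the box size and names are arbitrary choices):

# Local variance of an image over a sliding box, computed with fast
# convolutions: var = <x^2>_box - (<x>_box)^2.
import numpy as np
from scipy.signal import fftconvolve

def local_variance(image, box=15):
    """Local variance over a box x box neighbourhood, via FFT-based convolution."""
    kernel = np.ones((box, box)) / box**2
    local_mean = fftconvolve(image, kernel, mode="same")
    local_mean_sq = fftconvolve(image**2, kernel, mode="same")
    return np.clip(local_mean_sq - local_mean**2, 0.0, None)  # clip round-off negatives

# Peaks in the local-variance map flag candidate particle positions in a noisy
# micrograph, which is the reference-free picking idea described above.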

Cheers

Marin


Re: [ccp4bb] FW: [ccp4bb] [3dem] Which resolution?

2020-02-17 Thread Nave, Colin (DLSLtd,RAL,LSCI)
Hi John
I agree that neutrons have a role to increase the contrast for certain atoms. 
The “water window” for x-ray imaging also fulfils a similar role. The “locally 
scaled in a complex way” is a bit beyond me.

The relationship between “diffraction” errors and “imaging” errors is based
on Parseval’s theorem applied to the errors in the electron densities and
structure factors. See for example
https://www-structmed.cimr.cam.ac.uk/Course/Fourier/Fourier.html and scroll
down to Parseval’s theorem. Admittedly not a primary reference, but I think
Randy (and Parseval, not to be confused with Wagner’s opera) are unlikely to
have got it wrong.
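
A toy numerical check of that statement (illustrative only; the map, the error
and the array sizes below are made up):

# Parseval's theorem for a discrete map: the summed squared error in real space
# equals the summed squared error of the Fourier coefficients divided by N
# (numpy's unnormalised FFT convention).
import numpy as np

rng = np.random.default_rng(0)
rho = rng.normal(size=(32, 32, 32))        # stand-in "electron density" map
drho = 0.01 * rng.normal(size=rho.shape)   # stand-in map-space error

dF = np.fft.fftn(rho + drho) - np.fft.fftn(rho)   # corresponding structure-factor error

lhs = np.sum(drho ** 2)                    # error power summed over the map
rhs = np.sum(np.abs(dF) ** 2) / rho.size   # error power summed over Fourier space
print(lhs, rhs)                            # agree to numerical precision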

Imaging (with both electrons and x-rays) can be lensless (as in MX, CDI and 
variants) or with an objective lens (electron microscopes have nice objective 
lenses). The physical processes are the same up to any lens but MX, CDI etc. 
use a computer to replace the lens. The computer algorithm might be imperfect 
resulting in visible termination errors. With a decent lens, one can also see 
diffraction ripples (round bright stars in a telescope image) due to the 
restricted lens aperture.

Good debate though.

Colin

[ccp4bb] FW: [ccp4bb] [3dem] Which resolution?

2020-02-17 Thread colin.n...@diamond.ac.uk

Dear Marin

For electron microscopy, the Rose criterion (a measure of contrast/noise) is
sometimes used to distinguish low-contrast features within polymers (see for
example Libera, M. & Egerton, R. (2010). Polymer Reviews. 50, 321-339). A
particular value of the Rose criterion implies a particular information content.
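
A back-of-envelope version in its usual textbook form (the k ≈ 5 detection
threshold and the variable names below are the standard rule of thumb, not
taken from the Libera & Egerton paper):

# Rose criterion: a feature of fractional contrast C covering A pixels, imaged
# at an average dose of n quanta per pixel, has SNR ~ C * sqrt(n * A) under
# Poisson statistics; Rose's rule of thumb is SNR >= ~5 for reliable detection.
import math

def rose_snr(contrast, dose_per_pixel, feature_area_pixels):
    return contrast * math.sqrt(dose_per_pixel * feature_area_pixels)

def minimum_dose(contrast, feature_area_pixels, k=5.0):
    """Dose per pixel needed to reach the Rose threshold k at a given contrast."""
    return (k / contrast) ** 2 / feature_area_pixels

# Example: a 10% contrast feature covering 100 pixels needs ~25 quanta per pixel.
print(minimum_dose(contrast=0.10, feature_area_pixels=100))   # -> 25.0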

I think this can be directly related to a particular threshold for FSC or FRC. 
If you can comment on this in your Why-o-Why didactical crusade, I might even 
register for a twitter account!

Regards

Colin


Re: [ccp4bb] FW: [ccp4bb] [3dem] Which resolution?

2020-02-17 Thread John R Helliwell
Hi Colin,
Neutrons are applied to the uranyl hydrides so as to make the scattering
lengths much more equal than with X-rays, and so sidestep the ripple effects of
the uranium in the X-ray case, which obscure the nearby hydrogens.
In terms of feature resolvability, the email exchange at
http://www.phenix-online.org/pipermail/phenixbb/2017-March/023326.html
(and there may be better ones) refers to “locally scaled in a complex way”. So,
is the physics of the visibility of features really comparable between the two
methods of cryoEM and crystal structure analysis?
Greetings,
John
Emeritus Professor John R Helliwell DSc




Re: [ccp4bb] FW: [ccp4bb] [3dem] Which resolution?

2020-02-17 Thread colin.n...@diamond.ac.uk
Hi John
I agree that if I truncate the data at a high information-content threshold
(e.g. 2 bits), series termination errors might hide the lighter atoms (e.g. the
hydrogens in uranium hydride crystal structures). However, I think this is
purely a limitation of producing electron density maps via Fourier transforms
(i.e. not the physics). A variety of techniques are available for handling
series termination, including ones which are maximally non-committal with
respect to the missing data. The issue is still there in some fields (see
https://onlinelibrary.wiley.com/iucr/itc/Ha/ch4o8v0001/ ). For protein
crystallography, perhaps series termination errors have become less important as
people are discouraged from applying an I/sigI type cut-off.
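
A toy one-dimensional illustration of the point (the positions, widths and
weights are invented; this is not meant to model a real uranium hydride
structure):

# Series termination in 1-D: truncating the Fourier series of a
# "heavy atom + light atom" density produces ripples around the heavy atom
# that can be comparable to the light atom itself.
import numpy as np

x = np.linspace(0.0, 1.0, 512, endpoint=False)
heavy = 92.0 * np.exp(-((x - 0.30) / 0.01) ** 2)   # "uranium-like" peak
light = 1.0 * np.exp(-((x - 0.35) / 0.01) ** 2)    # nearby "hydrogen-like" peak
rho = heavy + light

F = np.fft.rfft(rho)
F_truncated = F.copy()
F_truncated[40:] = 0.0                             # crude resolution cut-off
rho_truncated = np.fft.irfft(F_truncated, n=rho.size)

ripple = np.abs(rho_truncated - rho)
# Ripple amplitude near the light atom versus the light atom's own peak height:
print(ripple[(x > 0.33) & (x < 0.37)].max(), light.max())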

Cheers
Colin



From: John R Helliwell 
Sent: 17 February 2020 12:09
To: Nave, Colin (DLSLtd,RAL,LSCI) 
Subject: Re: [ccp4bb] FW: [ccp4bb] [3dem] Which resolution?

Hi Colin,
I think the physics of the imaging and the crystal structure analysis, 
respectively without and with Fourier termination ripples, are different. For 
the MX re Fourier series for two types of difference map see our contribution:-

http://scripts.iucr.org/cgi-bin/paper?S0907444903004219

Greetings,
John

Emeritus Professor John R Helliwell DSc
https://www.crcpress.com/The-Whats-of-a-Scientific-Life/Helliwell/p/book/9780367233020





Re: [ccp4bb] [3dem] Which resolution?

2020-02-17 Thread Marin van Heel
Dear Petrus Zwart (and all other X-ray crystallographers and EM-ers)



Resolution in the sense of the Abbe Diffraction Limit or the Rayleigh Criterion
is part of what we would now call the theory of linear systems, and is
described by a “transfer function”. “Fourier Optics” covers the theory of
linear systems in Optics. These two (essentially identical) resolution
criteria state that the smallest detail (let us call that “A(r)” for
“Airy”) you can possibly observe in real space is inversely proportional to
the size of the maximum aperture in Fourier space, i.e., the extent of the
transfer function in Fourier space “T(f)”. This defines what “Instrumental
Resolution” one could possibly achieve in an experiment, “instrumental” to
differentiate it from the “Results Resolution” you actually managed to achieve
in your data collection/processing experiment [#Why-o-Why #10]. What a
linear imaging system will do to the image of the object (under the best of
circumstances) is described by a (Fourier-space) multiplication of the
Fourier transform of the object O(r) [= O(f)] with the (Fourier-space)
transfer function T(f) of the instrument, yielding O’(f), which you need to
transform back to real space to obtain the exit wave in the image plane;
that is: O’(f) = T(f)·O(f).



Note, however, that the properties of the sample, that is, of O(r), appear
nowhere in the transfer function T(f) or in its real-space version
A(r)! The very concept of (instrumental) resolution is exactly that it
does NOT depend on the object O(r)! The “results resolution” [#Why-o-Why
#10], on the other hand, obviously depends on the sample; on the illumination;
on the radiation dose; on the pH of the solvent; on the air humidity; and on
the mood of the person doing the work on the day of preparation…



The FRC/FSC “results resolution” measures we introduced in 1982/1986 fit
perfectly in the abstract framework of linear systems and Fourier optics.
The X-ray metrics like R-factor and phase-residuals and FOMs do NOT fit
into that clean mathematical framework. Unfortunately, my EM colleagues
started using X-ray metrics like “Differential Phase Residual” and “FOMs”
in EM based on some gut feeling that the X-ray scientists know it better
because they achieve a higher resolution than us EM blobologists. How wrong
my EM colleagues were: the quality of the resolution metric is totally
unrelated to the numerical resolution levels we operate at! Seeing 3mm
kidney stones in a patient’s tomogram can be equally important as seeing some
hydrogen bond length in a cryo-EM density. The FRC/FSC actually make more
sense than the indirect and hybrid X-ray ones. This misconception has
introduced a very tainted – and still ongoing – discussion in cryo-EM. Now
that the fields of X-ray crystallography and cryo-EM are merging it is time
to get things right!



I guess I cannot yet terminate my #Why-o-Why didactical crusade: I will
need at least one more on just this linear-transfer theory issue alone…



Marin van Heel, CNPEM/LNNano, Campinas, Brazil
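
A minimal sketch of that relation in code (a plain circular-aperture T(f) is
assumed here; no CTF oscillations, defocus or envelope functions):

# Linear-systems picture: O'(f) = T(f) * O(f), then back-transform to get the image.
import numpy as np

n = 256
obj = np.zeros((n, n))
obj[100:110, 100:110] = 1.0                          # toy object O(r)

fy = np.fft.fftfreq(n)[:, None]                      # spatial frequencies (cycles/pixel)
fx = np.fft.fftfreq(n)[None, :]
T = (np.sqrt(fx**2 + fy**2) <= 0.1).astype(float)    # aperture: pass |f| <= 0.1

O_f = np.fft.fft2(obj)                               # O(f)
image = np.fft.ifft2(T * O_f).real                   # O'(r) = FT^-1[ T(f) O(f) ]

# The smallest detail in `image` is set by the aperture radius (the instrumental
# resolution), independently of what the object itself looks like.
print(image.max(), image.min())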




[ccp4bb] FW: [ccp4bb] [3dem] Which resolution?

2020-02-17 Thread colin.n...@diamond.ac.uk

Dear all.
Would it help to separate out the issue of the FSC from the value of the 
threshold? My understanding is that the FSC addresses the spatial frequency at 
which there is a reliable information content in the image. This concept should 
apply to a wide variety of types of image. The issue is then what value of the 
threshold to use. For interpretation of protein structures (whether by x-ray or 
electron microscopy), a half bit threshold appears to be appropriate. However, 
for imaging the human brain (one of Marin’s examples) a higher threshold might 
be adopted as a range of contrasts might be present (axons for example have a 
similar density to the surroundings). For crystallography, if one wants to see 
lighter atoms (hydrogens in the presence of uranium or in proteins) a higher 
threshold might also be appropriate. I am not sure about this, to be honest, as a
2-bit threshold (for example) would mean that there is information to higher
resolution at a threshold of half a bit (unless one is at a diffraction- or
instrument-limited resolution).
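
For concreteness, a minimal sketch of how an FSC curve is computed in resolution
shells between two independent half-maps (half_map_a/half_map_b are placeholders;
the threshold to apply is deliberately left to the reader, since that choice is
exactly what is under discussion):

# Fourier shell correlation between two equally shaped 3-D maps.
import numpy as np

def fsc(map1, map2, n_shells=32):
    """Return (shell centre frequency, FSC) computed in shells of |f|."""
    F1, F2 = np.fft.fftn(map1), np.fft.fftn(map2)
    freqs = np.meshgrid(*[np.fft.fftfreq(s) for s in map1.shape], indexing="ij")
    radius = np.sqrt(sum(f**2 for f in freqs))       # |f| in cycles/voxel
    edges = np.linspace(0.0, 0.5, n_shells + 1)      # shells up to the Nyquist frequency
    shell = np.digitize(radius, edges) - 1
    centres, curve = [], []
    for s in range(n_shells):
        sel = shell == s
        num = np.sum(F1[sel] * np.conj(F2[sel])).real
        den = np.sqrt(np.sum(np.abs(F1[sel])**2) * np.sum(np.abs(F2[sel])**2))
        centres.append(0.5 * (edges[s] + edges[s + 1]))
        curve.append(num / den if den > 0 else 0.0)
    return np.array(centres), np.array(curve)

# Usage: freq, curve = fsc(half_map_a, half_map_b); report the first shell where
# `curve` drops below whichever threshold (0.143, half-bit, 3 sigma, ...) you argue for.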

Most CCP4BBers will understand that a single number is not good enough. 
However, many users of the protein structure databases will simply search for 
the structure with the highest named resolution. It might be difficult to send 
these users to re-education camps.

Regards
Colin


Re: [ccp4bb] [3dem] Which resolution?

2020-02-16 Thread Petrus Zwart
Hi All,

How is the 'correct' resolution estimation related to the estimated error
on some observed hydrogen bond length of interest, or an error on the
estimated occupancy of a ligand or conformation or anything else that has
structural significance?

In crystallography, it isn't really (only in some very approximate
fashion), and I doubt that in EM there is something to that effect. If you
want to use the resolution to get a gut feeling on how your maps look and
how your data behaves, it doesn't really matter what standard you use, as
long as you are consistent in the metric you choose. If you want to
use this estimate to get to uncertainties of model parameters, you had better
try something else.

Regards
Peter Zwart





-- 

P.H. Zwart
Staff Scientist
Molecular Biophysics and Integrated Bioimaging &
Center for Advanced Mathematics for Energy Research Applications
Lawrence Berkeley National Laboratories
1 Cyclotron Road, Berkeley, CA-94703, USA
Cell: 510 289 9246

PHENIX:   http://www.phenix-online.org
CAMERA: http://camera.lbl.gov/
-





Re: [ccp4bb] [3dem] Which resolution?

2020-02-16 Thread Marin van Heel
Dear Pawel and All others 

This 2010 review is - unfortunately - largely based on the flawed
statistics I mentioned before, namely on the a priori assumption that the
inner product of a signal vector and a noise vector is ZERO (an
orthogonality assumption). We have refuted the (Frank & Al-Ali 1975) paper
on a number of occasions (for example in 2005, and most recently in our
bioRxiv paper), but you still take that as the correct relation between SNR
and FRC (and you never cite the criticism...).
Sorry
Marin
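
The statistical point is easy to check numerically (a toy sketch, not the
analysis in the bioRxiv paper): for a finite number of coefficients the
signal-noise cross term is zero only on average, with fluctuations of order
1/sqrt(N) that propagate into any FRC/FSC-type correlation.

import numpy as np

rng = np.random.default_rng(1)
N = 10_000
signal = rng.normal(size=N)
# Normalised inner product of the fixed signal with independent noise vectors:
cross = [np.dot(signal, rng.normal(size=N)) / (np.linalg.norm(signal) * np.sqrt(N))
         for _ in range(2000)]
print(np.mean(cross), np.std(cross))   # mean ~ 0, spread ~ 1/sqrt(N) = 0.01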

On Thu, Feb 13, 2020 at 10:42 AM Penczek, Pawel A <
pawel.a.penc...@uth.tmc.edu> wrote:

> Dear Teige,
>
> I am wondering whether you are familiar with
>
> Resolution measures in molecular electron microscopy.
> Penczek PA. Methods Enzymol. 2010.
> Citation
>
> Methods Enzymol. 2010;482:73-100. doi: 10.1016/S0076-6879(10)82003-8.
>
> You will find there answers to all questions you asked and much more.
>
> Regards,
> Pawel Penczek
>
>
> Regards,
> Pawel


