Re: [ccp4bb] criteria to set resolution limit

2021-09-12 Thread Peat, Tom (Manufacturing, Clayton)
Thanks to Petr and Ian for their thoughtful replies.

One worry I have is that as a community we continue to debate what is the 
'proper' or 'correct' way to measure resolution, which I think is quite 
confusing to those who are early in their crystallographic (or general 
structural biology) careers. It is the scientific method to argue (sometimes ad 
nauseam) about what constitutes the best data, the best methods, etc., so this 
isn't really a surprise to those who have been around a while. But it might be 
nice to have a set of criteria which most people in the field agree to, updated 
on a regular basis as we move forward as a field. Not that we will ever get 
100% agreement on anything, as that is just unrealistic (and there are many 
posts to this BB already to show that). What I was thinking of was a set of 
standards that are reasonable and that, when broken (not all at once, but one 
standard at a time), one needs to explain why, instead of just hoping that a 
reviewer (or other scientist looking at the data) misses it. From the various 
posts, it seems that people generally agree that CC1/2 is a good criterion, 
that Rpim and Rfree are pretty good criteria, that I/sigI is reasonable at some 
level, and that completeness and multiplicity (or redundancy) are important as 
well. These are not all independent (Rpim clearly depends on the 
multiplicity/redundancy, etc.), but having some kind of standard set of numbers 
by which to judge one's own data as a first pass might be helpful (and I 
believe the original question was basically: what do I report as the 
resolution?)
Just to throw some numbers out as an example: CC1/2 at 0.3 (or 30%, depending 
on your reporting style), I/sigI at 1.0, completeness at 75% in the last 
resolution bin, and multiplicity/redundancy of at least 3.0 throughout 
(including the last shell). There is nothing magical in these numbers, but if 
you feel that your data are really good and the completeness just isn't there, 
you explain why, or something to that effect. I believe this is one of the 
reasons we always have a Table 1 in our publications: there is no one number 
that really gives us that sense of assurance that the model and data are good 
(or, in my own experience, good enough).
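The kind of first-pass check described above could be sketched in a few lines of code. This is only an illustration: the thresholds are just the example numbers from this email (not an agreed community standard), and the statistic names are made up for the sketch.

```python
# Illustrative first-pass screen of Table 1 values. Thresholds are the
# example numbers from the email above, NOT an agreed community standard.
THRESHOLDS = {
    "cc_half_outer": 0.3,          # CC1/2 in the outer shell
    "i_over_sigma_outer": 1.0,     # mean I/sigI in the outer shell
    "completeness_outer": 75.0,    # % completeness in the outer shell
    "multiplicity_overall": 3.0,   # redundancy, throughout and in the outer shell
}

def broken_criteria(stats):
    """Return the criteria that fall below threshold and so need an
    explicit explanation in the manuscript."""
    broken = []
    for key, limit in THRESHOLDS.items():
        value = stats.get(key)
        if value is not None and value < limit:
            broken.append((key, value, limit))
    return broken

# Example: good CC1/2 and I/sigI, but very low outer-shell completeness.
stats = {"cc_half_outer": 0.45, "i_over_sigma_outer": 2.1,
         "completeness_outer": 15.0, "multiplicity_overall": 3.4}
for key, value, limit in broken_criteria(stats):
    print(f"{key} = {value} (below {limit}): explain in the text")
```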

I guess what I am trying to 'solve' is an issue I come across regularly in 
reviewing papers: the authors are very interested in the biology of their 
system and spend a lot of time explaining what the system is, why it is 
important, etc. (all great stuff), and then fill in the table with a set of 
numbers that makes me wonder why they believe their own models. Often there is 
very low completeness, low redundancy/multiplicity, and a CC1/2 that varies 
from 0.99+ to almost zero, all in order to make the reported resolution sound 
good (and crazy numbers of decimal places: a resolution of 1.39623 AA with 15% 
completeness could more realistically be reported as 1.40 AA, or as 1.50 AA 
with 50% completeness, and I don't think the actual interpretation/electron 
density would change significantly). If it were stated explicitly in the 
manuscript, for example, that paired refinement was done, or that difference 
maps were calculated (or FEM or Polder or ?) at various resolutions and showed 
the area of interest more clearly, then readers and reviewers might be more 
assured that the authors weren't just reporting a semi-random number as 'the 
resolution'. Numbers in the table that are clearly (?) a bit relaxed, if 
actually explained in the paper, would then make more sense. As a community we 
have gone somewhat in this direction with the validation criteria given for 
deposited structures, which is a start, but it hasn't really tackled the thorny 
question of 'what is my resolution?'

As Ian mentioned, some programs and some criteria depend on relatively high 
completeness in the data because of the way they are calculated (CC1/2 is 
perfect when all data are set to zero). If a program 'fills in' missing data, 
then it will also be subject to issues when the data are very incomplete. One 
can always call on people to 'get better data', and of course it would be 
fantastic if every data set were complete with high CC1/2 and 
multiplicity/redundancy, but that isn't very realistic either.

Thanks again for the considered replies to the previous post, and if this 
sounds like a rant, it probably is.

cheers, tom

Tom Peat, PhD
Proteins Group
Biomedical Program, CSIRO
343 Royal Parade
Parkville, VIC, 3052
+613 9662 7304
+614 57 539 419
tom.p...@csiro.au



Re: [ccp4bb] criteria to set resolution limit

2021-09-12 Thread Tim Gruene
Dear Farhan,

did you possibly move the detector too far from the crystal, so that the
high-resolution spots landed only on the detector corners? That would
explain the good I/sigma at low completeness. In that case, there is no
reason to discard the data: your detector was simply not large enough to
capture the rest.

Best regards,
Tim

On Sat, 11 Sep 2021 21:25:23 +0530 Syed Farhan Ali
 wrote:

> Dear All,
> 
> I have a query regarding one of my datasets. When I run aimless with a
> high-resolution limit of 1.62 A, I get I/sigI = 2, but the data
> completeness is only around 22% in the outermost shell. If I instead
> cut the resolution at 1.8 A, I/sigI is 6.2 and the completeness is
> 82.4%. I have attached a screenshot of the result.
> What should be the criteria to set the resolution limit? Should I
> stick to I/sigI, or do I also have to consider the completeness of the
> data? And if completeness is also a guiding factor, what minimum
> completeness is acceptable in the highest resolution shell?
> 
> Regards,
> Farhan



-- 
Tim Gruene
Head of the Centre for X-ray Structure Analysis
Faculty of Chemistry
University of Vienna

Phone: +43-1-4277-70202

GPG Key ID = A46BEE1A



To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1




[ccp4bb] 10th International Conference of the Hellenic Crystallographic Association, Athens, Oct 15th-17th

2021-09-12 Thread Petros Giastas
Dear colleagues,

With this message I would like to announce that the *10th International
Conference of the Hellenic Crystallographic Association* (HeCrA) will take
place, in an in-person format, on *October 15-17, 2021* at the Conference
Center of the National Centre for Scientific Research "Demokritos" in Agia
Paraskevi, *Athens, Greece* (https://sites.google.com/view/hecra2020/home).
The call for abstracts is open, with a *deadline of 26 Sep 2021* (
https://sites.google.com/view/hecra2020/home/call-for-abstracts).
A limited number of IUCr bursaries for travel, accommodation and
subsistence expenses will be granted to eligible young students who travel
to Athens from abroad or from other Greek cities (
https://sites.google.com/view/hecra2020/home/bursaries).

On behalf of the organizing committee,

Best regards,
Petros Giastas





Re: [ccp4bb] criteria to set resolution limit

2021-09-12 Thread Petr Kolenko
Dear Tom,
You are absolutely right in your points. But let me explain my opinion a bit 
more. And be aware that it is my opinion! Not necessarily the truth. There may 
be other opinions in the community.
In paired refinement, you always have reference data. In the case of 
significantly decreasing completeness, you can always select a starting 
resolution at which the data are complete enough (e.g. more than 90%?); these 
are your reference data. As an increase in resolution improves your model (a 
drop in R-values, mainly R-free), you always compare the models against the 
reference data. Should we use as many observables as possible? I would do so, 
even if the completeness were very low.
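The decision rule described here can be illustrated with a toy sketch (the R-free numbers below are made up; a real run uses PAIREF driving your refinement program). The key point is that every candidate cutoff is judged by R-free computed against the same reference-resolution data:

```python
# Toy illustration of the paired-refinement decision rule.
# Hypothetical R-free values: each model was refined with data to the
# given cutoff (in Angstrom) but ALWAYS evaluated against the reference
# data (here the 2.0 A set, chosen because it is >90% complete).
rfree_vs_reference = {2.0: 0.245, 1.9: 0.241, 1.8: 0.240, 1.7: 0.243}

accepted = 2.0  # start from the complete reference resolution
for cutoff in (1.9, 1.8, 1.7):
    # extend the cutoff only while R-free against the reference improves
    if rfree_vs_reference[cutoff] < rfree_vs_reference[accepted]:
        accepted = cutoff
    else:
        break

print(f"suggested high-resolution cutoff: {accepted} A")  # 1.8 A
```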
Another thing is the statement that your data are processed to 1.1 AA when the 
completeness is as low as 2%. Of course. But this is why we have more cells in 
the so-called "Table 1". When judging a structure, one should go carefully 
through the whole table. And maybe more resolution shells should be reported in 
extreme cases; there is a possibility to do so during structure deposition. 
Here, I agree with you.
Low data completeness, both random and systematic, is usually a big problem. My 
personal experience is that it causes severe instability in structure 
refinement. But this is frequently reflected in the R-values, and as soon as 
any instability appears, paired refinement does not suggest using the higher 
resolution. As long as it follows the right trends, you should be fine.
Thanks for pointing out the high data completeness in our paper. We should run 
more analyses to get ready for such comments. ;-)
Best regards,
Petr

From: Peat, Tom (Manufacturing, Clayton) 
Sent: Sunday, September 12, 2021 5:02:04 AM
To: CCP4BB@JISCMAIL.AC.UK; Petr Kolenko
Subject: Re: [ccp4bb] criteria to set resolution limit

Hello Petr,

I would like to understand more completely your assertion in your last email 
regarding completeness: "I would not care about low data completeness in case 
when PAIREF shows improvement of your model."
In the papers you linked to, the data completeness was always 90+% even in the 
outer shells. In cases where this is not true, it isn't clear to me why 
completeness would not be important. In the ultimate thought experiment, or 
extreme case, where one has very few reflections at the resolution limit, just 
getting a 'better model' doesn't show me that the structure is now at 1.3 A (or 
whatever limit one wants to set). Models with no data are perfect, in the 
physical sense of not having clashes, Ramachandran outliers, etc.
As an example, I am aware of a deposition in the PDB where the outer resolution 
shell was approximately 2% complete, and I don't believe that the structure is 
really at the stated resolution: the features 'seen' in the electron density 
don't measure up to what I would expect, and the density looks a lot more like 
that at about 0.5 A lower resolution, where the completeness is a bit better 
than 50%.
So my 'bias' is that completeness of the data is still an important feature 
that needs to be taken into account when setting the 'resolution limit', but 
I'm absolutely willing to be shown that my bias is incorrect.

Best regards, tom

Tom Peat, PhD
Proteins Group
Biomedical Program, CSIRO
343 Royal Parade
Parkville, VIC, 3052
+613 9662 7304
+614 57 539 419
tom.p...@csiro.au


From: CCP4 bulletin board  on behalf of Petr Kolenko 

Sent: Sunday, September 12, 2021 5:43 AM
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] criteria to set resolution limit

Dear Farhan,
Your dataset does not seem that critically anisotropic to me. But of course, 
try the STARANISO server and make your own decision.
To me, the dataset seems to have been collected with a suboptimal data 
strategy. Although I do not know your setup, I would make the 
crystal-to-detector distance shorter next time. Or maybe rotate a bit more with 
the crystal? I do not know the details.
And now, to the point of the resolution. The optimal approach is to try paired 
refinement, or even better, paired refinement with the complete 
cross-validation protocol. This can be done with the program PAIREF, which can 
easily be installed into your CCP4 installation with the following commands:

ccp4-python -m ensurepip --user
ccp4-python -m pip install pairef --no-deps --upgrade --user

The easiest way to use PAIREF is via the GUI. Use the following command:

ccp4-python -m pairef --gui

To learn more about the program and the protocol, please read further.
The original work: 
https://journals.iucr.org/m/issues/2020/04/00/mf5044/index.html
Upgrade for PHENIX users: 
https://scripts.iucr.org/cgi-bin/paper?S2053230X21006129

We organized a webinar about PAIREF about half a year ago, and we made a video 
of it. The video covers a short introduction to paired refinement, installation 
of PAIREF, and a test case.

The link for the webinar is here: 
htt

Re: [ccp4bb] criteria to set resolution limit

2021-09-12 Thread Ian Tickle
Tom, the way I have always dealt with this (and the way it is currently
handled in Staraniso) is to simply count unmeasured intensities as zero in
the averaging of I/sigma(I).  This is the same as taking the mean
I/sigma(I) for all reflections in a bin = bin completeness x mean
I/sigma(I) for measured reflections in the bin, so bins with low
completeness are more likely to be cut.  This has the clear advantage that
you don't need to decide on separate arbitrary criteria for completeness
and measured mean I/sigma(I); you just need to decide one (still arbitrary)
criterion for the overall mean I/sigma(I).  For example, if the bin
completeness were only 2%, the mean I/sigma(I) for the measured reflections
in that bin would have to be > 50 times the threshold (e.g. > 50x1.5) in
order not to cut at that bin, which is extremely unlikely.
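A minimal sketch of that bookkeeping (my own illustration, not STARANISO code): counting every unmeasured reflection as I/sigma(I) = 0 makes the bin average equal to completeness times the measured mean, so even very strong spots in a 2% complete bin fail a typical cutoff.

```python
def effective_mean_i_over_sigma(measured, n_possible):
    """Mean I/sigma(I) over ALL reflections in a bin, counting each
    unmeasured reflection as contributing zero to the sum."""
    return sum(measured) / n_possible

# A 2% complete bin: 2 measured reflections out of 100 possible.
measured = [80.0, 80.0]  # very strong measured spots, I/sigma(I) = 80
print(effective_mean_i_over_sigma(measured, 100))  # 1.6
# Equivalently: completeness x measured mean = 0.02 * 80 = 1.6,
# barely above a 1.5 cutoff despite I/sigma(I) = 80 for what was measured.
```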

This makes sense statistically: since the true value of I is obviously
unknown for an unmeasured reflection, sigma(I) has to be very large, and
therefore I/sigma(I) for an unmeasured reflection will be much smaller
than that of its measured neighbours; its contribution to the mean won't
be greatly different from zero anyway.  I suppose one could have a more
sophisticated treatment where I/sigma(I) is estimated from the Wilson
prior; however, for that one needs to know the absolute scale and
anisotropy, and those can only be determined _after_ the cut-off has been
performed.  So one would need a bootstrap process, which would greatly
increase the complexity (and failure modes !) of the algorithm.  It's not
clear to me that the difference in the results would make the effort
worthwhile.

Unfortunately one can't pull the same trick when using CC_1/2 as the
cut-off criterion, because a zero intensity has perfect correlation with
another zero intensity !  This would cause CC_1/2 to increase at low
completeness (remember that one is correlating deviations from the mean
intensity for the bin, so zeros will have large deviations from the mean
and make a big contribution to the CC).  This is definitely not what one
wants !  There are other reasons too: the standard significance test for
the correlation coefficient assumes homoscedastic (i.e. uniform variance),
normally distributed data, but intensity data from area detectors follow a
Wilson distribution and are always strongly heteroscedastic, unless you
somehow contrive to collect the data so that all the sigmas are equal, as
I recall was possible with the 4-circle diffractometers equipped with a
single proportional counter that we had in the 60s-80s.  Also, CC_1/2 is
known to be biased by significant anisotropy.  So, all in all, I prefer to
use the mean I/sigma(I) criterion.
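The zero-fill inflation of CC_1/2 is easy to demonstrate with a toy example (made-up half-dataset intensities, not real data): adding zero-filled pairs to both half sets drives the correlation up sharply.

```python
def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Two half-dataset intensity estimates for 5 measured reflections:
half1 = [6.0, 2.0, 5.0, 1.0, 4.0]
half2 = [3.0, 4.0, 6.0, 2.0, 1.0]
cc_measured = pearson(half1, half2)            # ~0.28: weak agreement

# Zero-fill 45 unmeasured reflections in BOTH halves (10% completeness):
cc_filled = pearson(half1 + [0.0] * 45, half2 + [0.0] * 45)   # ~0.83
print(cc_measured, cc_filled)
```

The zeros sit far from the (now much smaller) bin mean in both halves, so they correlate perfectly with each other and dominate the coefficient, which is exactly the inflation described above.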

Cheers

-- Ian

