Re: [ccp4bb] criteria to set resolution limit
Thanks to Petr and Ian for their thoughtful replies. One worry I have is that as a community we continue to debate what is the 'proper' or 'correct' way to measure resolution, which I think is quite confusing to those who are early in their crystallographic (or general structural biology) careers. It is the scientific method to argue (sometimes ad nauseam) about what constitutes the best data, the best methods, etc., so this isn't really a surprise to those who have been around a while. But it might be nice to have a set of criteria that most people in the field agree to, updated on a regular basis as the field moves forward. Not that we will ever get 100% agreement on anything, as that is just unrealistic (and there are many posts on this BB already to show it).

What I was thinking of was a set of reasonable standards such that, when one is broken (not all at once, but one standard at a time), the authors explain why, instead of just hoping that a reviewer (or other scientist looking at the data) misses it. From various posts, it seems that people generally agree that CC1/2 is a good criterion, that Rpim and Rfree are pretty good criteria, that I/sigI is reasonable at some level, and that completeness and multiplicity (or redundancy) are important as well. These are not all independent (Rpim clearly depends on the multiplicity/redundancy, etc.), but having a standard set of numbers by which to judge one's own data as a first pass might be helpful (and I believe the original question was basically: what do I report as the resolution?).

Just to throw some numbers out as an example: CC1/2 at 0.3 (or 30%, depending on your reporting style), I/sigI at 1.0, completeness at 75% in the last resolution bin, and multiplicity/redundancy of at least 3.0 throughout (and in the last shell). There is nothing magical in these numbers, but if you feel that your data are really good yet the completeness isn't there, you just explain why, or something to that effect.
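To make the idea concrete, a first-pass screen against such thresholds could be a few lines of code. This is only an illustration of the proposal, assuming the example numbers above; none of these values or field names are an agreed community standard.

```python
# Hypothetical first-pass screen using the example thresholds from this
# post (CC1/2 >= 0.3, I/sigI >= 1.0, outer-shell completeness >= 75%,
# multiplicity >= 3.0). Any failed criterion would then need an explicit
# justification in the paper, rather than silence.
THRESHOLDS = {
    "cc_half_outer": 0.30,       # CC1/2 in the outer shell
    "i_over_sig_outer": 1.0,     # <I/sigma(I)> in the outer shell
    "completeness_outer": 75.0,  # % completeness in the outer shell
    "multiplicity_min": 3.0,     # minimum multiplicity, incl. outer shell
}

def flag_violations(stats):
    """Return the names of the criteria this dataset fails."""
    return [name for name, threshold in THRESHOLDS.items()
            if stats[name] < threshold]

# Farhan's 1.62 A cut: I/sigI = 2 but ~22% complete in the outer shell.
print(flag_violations({"cc_half_outer": 0.5, "i_over_sig_outer": 2.0,
                       "completeness_outer": 22.0, "multiplicity_min": 3.5}))
# -> ['completeness_outer']
```

The point is not the code but the workflow: one agreed checklist, and a written explanation for each box that isn't ticked.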
I believe this is one of the reasons we always have a Table 1 in our publications: there is no one number that really gives us the assurance that the model and data are good (or, in my own experience, good enough). What I am trying to 'solve' is an issue I come across regularly in reviewing papers: the authors are very interested in the biology of their system and spend a lot of time explaining what the system is, why it is important, etc. (all great stuff), and then fill in the table with a set of numbers that makes me wonder why they believe their own models. Often there is very low completeness, low redundancy/multiplicity, and a CC1/2 that varies from 0.99+ to almost zero, all in order to make the reported resolution sound good (and with crazy numbers of decimal places: a resolution of 1.39623 AA with 15% completeness could more realistically be reported as 1.40 AA, or as 1.50 AA with 50% completeness, and I don't think the actual interpretation or electron density would change significantly).

If it were stated explicitly in the manuscript, for example, that paired refinement was done, or that difference maps (or FEM, or Polder, or ?) were calculated at various resolutions and showed the area of interest more clearly, readers and reviewers might be more assured that the authors weren't just reporting a semi-random number as 'the resolution'. Numbers in the table that are clearly (?) a bit relaxed would then make more sense if actually explained in the paper. As a community we have gone somewhat in this direction with the validation criteria given for deposited structures, which is a start, but it hasn't really tackled the thorny question of 'what is my resolution?' As Ian mentioned, some programs and some criteria depend, by the way they are calculated, on relatively high completeness in the data (CC1/2 is perfect when all data are set to zero).
If a program 'fills in' missing data, then it too will be subject to issues when the data are very incomplete. One can always call on people to 'get better data', and of course it would be fantastic if every data set were complete, with high CC1/2 and multiplicity/redundancy, but that isn't very realistic either. Thanks again for the considered replies to the previous post, and if this sounds like a rant, it probably is.

cheers, tom

Tom Peat, PhD
Proteins Group
Biomedical Program, CSIRO
343 Royal Parade
Parkville, VIC, 3052
+613 9662 7304
+614 57 539 419
tom.p...@csiro.au
Re: [ccp4bb] criteria to set resolution limit
Dear Farhan, did you possibly move the detector too far from the crystal, so that the high-resolution spots landed on the detector corners? This would explain the good I/sigma at low completeness. In that case, there is no reason to discard the data. Your detector was simply not large enough to capture the rest.

Best regards, Tim

On Sat, 11 Sep 2021 21:25:23 +0530 Syed Farhan Ali wrote:
> Dear All,
> I have a query regarding one of my datasets. I am running aimless,
> keeping the highest resolution at 1.62 A, and getting I/SigI = 2, but
> data completeness is around 22% in the outermost shell. If I instead
> set the resolution cutoff at 1.8 A, then I/SigI is 6.2 and
> completeness is 82.4%. I have attached a screenshot of the result.
> What should be the criteria to set the resolution limit? Should I
> stick to I/SigI, or do I have to consider the completeness of the
> data? And if completeness is also a guiding factor, then what minimum
> completeness can I keep in the higher resolution shell?
> Regards, Farhan
>
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
> This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a
> mailing list hosted by www.jiscmail.ac.uk, terms & conditions are
> available at https://www.jiscmail.ac.uk/policyandsecurity/

--
Tim Gruene
Head of the Centre for X-ray Structure Analysis
Faculty of Chemistry
University of Vienna
Phone: +43-1-4277-70202
GPG Key ID = A46BEE1A
[ccp4bb] 10th International Conference of the Hellenic Crystallographic Association, Athens, Oct 15th-17th
Dear colleagues,

With this message I would like to announce that the *10th International Conference of the Hellenic Crystallographic Association* (HeCrA) will take place, in an in-person format, on *October 15-17, 2021* at the Conference Center of the National Centre for Scientific Research "Demokritos" in Agia Paraskevi, *Athens, Greece* (https://sites.google.com/view/hecra2020/home). The call for abstracts is open, with a *deadline of 26 Sep 2021* (https://sites.google.com/view/hecra2020/home/call-for-abstracts). A limited number of IUCr bursaries for travel, accommodation and subsistence expenses will be granted to eligible young students travelling to Athens from abroad or from other Greek cities (https://sites.google.com/view/hecra2020/home/bursaries).

On behalf of the organizing committee,
Best regards,
Petros Giastas
Re: [ccp4bb] criteria to set resolution limit
Dear Tom,

You are absolutely right with your points, but let me explain my opinion a bit more. And be aware that it is my opinion, not necessarily the truth; there may be other opinions in the community.

In paired refinement, you always have the reference data. When completeness drops significantly, you can always select a starting resolution that is complete enough (e.g. more than 90%?), and that becomes your reference. As an increase in resolution improves your model (a drop in R-values, mainly R-free), you always compare your models using the reference data. Should we use as many observables as possible? I would do so, even if the completeness were very low. Another thing entirely is the statement that your data are processed up to 1.1 AA when the completeness is as low as 2%. But this is why we have more cells in the so-called "Table 1": when judging a structure, one should go carefully through the whole table. And maybe more resolution shells should be reported in extreme cases; there is a possibility to do so during structure deposition.

Here I agree with you: low data completeness, whether random or systematic, is usually a big problem. My personal experience is that it causes severe instability in structure refinement, but this is frequently projected onto the R-values, and when any instability appears, paired refinement does not suggest using the higher resolution. As long as it follows the right trends, you should be fine. Thanks for pointing out the high data completeness in our paper. We should run more analyses to get ready for such comments.
;-)

Best regards,
Petr

From: Peat, Tom (Manufacturing, Clayton) Sent: Sunday, September 12, 2021 5:02:04 AM To: CCP4BB@JISCMAIL.AC.UK; Petr Kolenko Subject: Re: [ccp4bb] criteria to set resolution limit

Hello Petr,

I would like to understand more completely your assertion in the last email regarding completeness: "I would not care about low data completeness in case when PAIREF shows improvement of your model." In the papers you gave links to, the data completeness was always 90+%, even in the outer shells. In cases where this is not true, I'm not clear why completeness would not be important. Consider the ultimate thought experiment, or extreme case, where one has very few reflections at the resolution limit: just getting a 'better model' doesn't show me that the structure is now 1.3 A (or whatever limit one wants to set). Models with no data are perfect, in the physical sense of not having clashes, Ramachandran outliers, etc.

As an example, I am aware of a deposition in the PDB where the outer resolution shell was approximately 2% complete, and I don't believe that the structure is really at the stated resolution: the features 'seen' in the electron density don't measure up to what I would expect, and the density looks a lot more like that of about 0.5 A lower resolution, where the completeness is a bit better than 50%. So my 'bias' is that completeness of the data is still an important feature that needs to be taken into account when setting the 'resolution limit', but I'm absolutely willing to be shown that my bias is incorrect.

Best regards, tom

Tom Peat, PhD
Proteins Group
Biomedical Program, CSIRO
343 Royal Parade
Parkville, VIC, 3052
+613 9662 7304
+614 57 539 419
tom.p...@csiro.au

From: CCP4 bulletin board on behalf of Petr Kolenko Sent: Sunday, September 12, 2021 5:43 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] criteria to set resolution limit

Dear Farhan,

Your dataset does not seem to be that critically anisotropic to me.
But of course, try the STARANISO server and make your own decision. To me, the dataset seems to have been collected with a suboptimal strategy. Although I do not know your setup, I would make the crystal-to-detector distance shorter next time. Or maybe rotate the crystal a bit more? I do not know the details.

And now, to the point of the resolution. The optimal approach is to try paired refinement, or even better, paired refinement with the complete cross-validation protocol. This can be done with the program PAIREF, which is easy to install into your CCP4 installation with the following commands:

ccp4-python -m ensurepip --user
ccp4-python -m pip install pairef --no-deps --upgrade --user

The easiest way to use PAIREF is via the GUI:

ccp4-python -m pairef --gui

To learn more about the program and the protocol, please read further. The original work: https://journals.iucr.org/m/issues/2020/04/00/mf5044/index.html Upgrade for PHENIX users: https://scripts.iucr.org/cgi-bin/paper?S2053230X21006129 We organized a webinar about PAIREF about half a year ago and even made a video of it. The link for the webinar is here: htt
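For intuition, the accept/reject logic behind paired refinement can be sketched in a few lines. This is a toy illustration of the decision rule, not PAIREF itself, and the cutoffs and R-free values below are invented: each model is refined at a progressively higher resolution cutoff, but R-free is always evaluated against the same reference data, and a shell is kept only if R-free improves.

```python
# Toy sketch of the paired-refinement decision rule: walk outward in
# resolution, keeping a shell only while R-free (computed on common
# reference data, not on the newly added shell) keeps improving.
def accept_shells(rfree_at_reference):
    """rfree_at_reference: list of (high-res cutoff in A, R-free on the
    reference data), ordered from the reference cutoff outward."""
    accepted, best = rfree_at_reference[0]
    for cutoff, rfree in rfree_at_reference[1:]:
        if rfree < best:            # the extra shell genuinely helped
            accepted, best = cutoff, rfree
        else:
            break                   # stop at the first shell that hurts
    return accepted

# Invented numbers: extending 1.80 -> 1.70 A helps, 1.70 -> 1.62 A does not.
print(accept_shells([(1.80, 0.245), (1.70, 0.241), (1.62, 0.243)]))
# -> 1.7
```

The essential point from Petr's description is that the comparison is always made on the same reference reflections, so models refined against different cutoffs remain directly comparable.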
Re: [ccp4bb] criteria to set resolution limit
Tom, the way I have always dealt with this (and the way it is currently handled in Staraniso) is to simply count unmeasured intensities as zero in the averaging of I/sigma(I). This is the same as taking the mean I/sigma(I) for all reflections in a bin = bin completeness x mean I/sigma(I) for the measured reflections in the bin, so bins with low completeness are more likely to be cut. This has the clear advantage that you don't need to decide on separate arbitrary criteria for completeness and for the measured mean I/sigma(I); you need only decide one (still arbitrary) criterion for the overall mean I/sigma(I). For example, if the bin completeness were only 2%, the mean I/sigma(I) for the measured reflections in that bin would have to be > 50 times the threshold (e.g. > 50 x 1.5) in order not to cut at that bin, which is extremely unlikely.

This makes sense statistically: whatever the true value of I, since it is obviously unknown for an unmeasured reflection, sigma(I) has to be very large, so I/sigma(I) for an unmeasured reflection will be much smaller than that of its measured neighbours, and its value won't make a contribution to the mean that is greatly different from zero anyway. I suppose one could have a more sophisticated treatment where I/sigma(I) is estimated from the Wilson prior; however, for that one needs to know the absolute scale and anisotropy, and those can only be determined _after_ the cut-off has been performed. So one would need a bootstrap process, which would greatly increase the complexity (and failure modes!) of the algorithm. It's not clear to me that the difference in the results would make the effort worthwhile.

Unfortunately, one can't pull the same trick when using CC_1/2 as the cut-off criterion, because a zero intensity has perfect correlation with another zero intensity!
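Ian's rule is simple enough to sketch directly. The bin values below are invented for illustration; the threshold of 1.5 is just the example figure from his message.

```python
# Sketch of the completeness-weighted <I/sigma(I)> cutoff: counting
# unmeasured reflections as I/sigma(I) = 0 is equivalent to multiplying
# the measured mean by the bin completeness (as a fraction).
def effective_i_over_sig(mean_i_over_sig_measured, completeness):
    return completeness * mean_i_over_sig_measured

def outer_cutoff(bins, threshold=1.5):
    """bins: (d-spacing in A, measured <I/sigma(I)>, completeness),
    ordered from low to high resolution. Return the last d-spacing whose
    completeness-weighted <I/sigma(I)> still clears the threshold."""
    keep = None
    for d, i_sig, comp in bins:
        if effective_i_over_sig(i_sig, comp) >= threshold:
            keep = d
        else:
            break
    return keep

# Invented shells loosely modelled on Farhan's numbers: the 22%-complete
# outer shell fails (0.22 x 2.0 = 0.44), even though its measured
# <I/sigma(I)> of 2.0 looks acceptable on its own.
bins = [(1.90, 6.0, 0.95), (1.80, 4.0, 0.90),
        (1.70, 2.5, 0.60), (1.62, 2.0, 0.22)]
print(outer_cutoff(bins))  # -> 1.7
```

Note how the 2%-complete case from Ian's example falls out of the same formula: at 2% completeness the measured mean would need to exceed 50 x 1.5 = 75 to survive a threshold of 1.5.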
This would cause CC_1/2 to increase at low completeness (remember one is correlating deviations from the mean intensity for the bin, so zeros will have large deviations from the mean and make a big contribution to the CC). This is definitely not what one wants! There are other reasons too: for example, the standard significance test for the correlation coefficient assumes homoscedastic (i.e. uniform variance), normally distributed data, but intensity data from area detectors follow a Wilson distribution and are always strongly heteroscedastic, unless you somehow contrive to collect the data so that all the sigmas are equal, as I recall was possible with 4-circle diffractometers equipped with a single proportional counter, as we had in the 60s-80s. Also, CC_1/2 is known to be biased by significant anisotropy. So, all in all, I prefer to use the mean I/sigma(I) criterion.

Cheers

-- Ian
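Ian's warning about zeros and CC_1/2 is easy to demonstrate numerically. The simulation below is a toy model with invented parameters (exponentially distributed true intensities, Gaussian measurement noise), not real diffraction data: padding both half-dataset intensity lists with zeros for unmeasured reflections inflates the Pearson correlation, because every (0, 0) pair agrees perfectly and sits far from the bin mean.

```python
# Toy demonstration: zero-filled unmeasured reflections inflate CC1/2.
import random
random.seed(1)

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Simulated true intensities plus independent noise in each half-dataset.
true_i = [random.expovariate(1 / 100) for _ in range(200)]
half1 = [i + random.gauss(0, 80) for i in true_i]
half2 = [i + random.gauss(0, 80) for i in true_i]

cc_measured = pearson(half1, half2)

# Pretend the bin is only 50% complete and "fill in" zeros for the rest:
pad = [0.0] * 200
cc_padded = pearson(half1 + pad, half2 + pad)

print(round(cc_measured, 2), round(cc_padded, 2))
assert cc_padded > cc_measured  # zeros masquerade as perfect agreement
```

This is exactly why the completeness-weighting trick that works for mean I/sigma(I) cannot be transferred to CC_1/2: the zeros push the correlation up, not down.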