Re: [ccp4bb] AW: [ccp4bb] over-fitting? over-refinement?

2020-10-20 Thread Diana Tomchick
I too have seen some horrendous low-resolution models, with correspondingly bad 
validation statistics. A little time spent cleaning up the outliers (geometric 
and others) rarely results in large reductions in R(free) for these types of 
datasets & models, but ultimately we as a community need to emphasize that 
R(free) is not the be-all and end-all as a quality metric.

Diana

**
Diana R. Tomchick
Professor
Departments of Biophysics and Biochemistry
UT Southwestern Medical Center
5323 Harry Hines Blvd.
Rm. ND10.214A
Dallas, TX 75390-8816
diana.tomch...@utsouthwestern.edu
(214) 645-6383 (phone)
(214) 645-6353 (fax)
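For reference, the two R-factors discussed throughout this thread are conventionally defined as below. W is the working set of reflections used in refinement, T is the test (free) set that is set aside and never refined against (commonly 5-10% of the data), and k is a scale factor between observed and calculated amplitudes.

```latex
R_{\text{work}} = \frac{\sum_{hkl \in W} \big|\,|F_{\text{obs}}| - k\,|F_{\text{calc}}|\,\big|}{\sum_{hkl \in W} |F_{\text{obs}}|},
\qquad
R_{\text{free}} = \frac{\sum_{hkl \in T} \big|\,|F_{\text{obs}}| - k\,|F_{\text{calc}}|\,\big|}{\sum_{hkl \in T} |F_{\text{obs}}|}
```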

From: CCP4 bulletin board on behalf of Tristan Croll
Sent: Tuesday, October 20, 2020 7:11 AM
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] AW: [ccp4bb] over-fitting? over-refinement?

I'd like to append a very important caveat to this discussion: most of the talk 
on Rfree as protection against overfitting is perfectly correct, if your 
dataset is of high enough resolution. Remember that Rfree only provides protection 
against one form of overfitting: that is, fitting of atoms into random noise. 
What it doesn't protect well against is fitting the wrong atoms into real 
density. Remember, all your X-ray data ultimately says is "there are electrons 
here" - your R-factors don't care where those electrons come from, as long as 
they're present in about the right numbers (with some fudge-room for B-factors 
and occupancies). If you browse through the back catalogue of >3 Å models, 
you'll find some with horrendous geometry statistics but remarkably good 
R-factors (both work and free) - ultimately, I think, because the model atoms 
have been "overstuffed" into density that is real according to both the working 
and free data. In quite a few such cases I find that even after extensive 
reworking I'm unable to beat the original R-free, despite every other metric 
improving markedly.

Best regards,

Tristan
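A toy calculation can make this point concrete. It is a minimal sketch, not a refinement program: it assumes a hypothetical one-dimensional cell of 100 Å, point scatterers with no B-factors, and a scale factor of 1. It compares one 8-electron atom with the same 8 electrons split between two atoms 0.6 Å apart, and computes the R-factor between the two models' amplitudes out to two resolution limits (h_max = 30 is about 3.3 Å here, h_max = 100 is 1 Å).

```python
import cmath
import math


def structure_factor(atoms, h):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for 1D point scatterers (no B-factors)."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)


def r_between(model_a, model_b, h_max):
    """R = sum | |Fa| - |Fb| | / sum |Fa| over reflections h = 1..h_max (scale k = 1)."""
    fa = [abs(structure_factor(model_a, h)) for h in range(1, h_max + 1)]
    fb = [abs(structure_factor(model_b, h)) for h in range(1, h_max + 1)]
    return sum(abs(a - b) for a, b in zip(fa, fb)) / sum(fa)


CELL = 100.0  # hypothetical 1D cell length in Angstrom
one_atom = [(8.0, 0.300)]                  # one 8-electron atom at fractional x = 0.3
two_atoms = [(4.0, 0.297), (4.0, 0.303)]   # same 8 electrons, split 0.6 A apart

for h_max in (30, 100):
    r = r_between(one_atom, two_atoms, h_max)
    print(f"d_min = {CELL / h_max:.1f} A   R between models = {r:.3f}")
```

With these numbers R stays at a few percent at 3.3 Å but rises several-fold at 1 Å: at low resolution the amplitudes barely distinguish two different ways of placing the same electrons, so both R-work and R-free can look fine while the atoms are wrong.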

From: CCP4 bulletin board on behalf of Barone, Matthias
Sent: 20 October 2020 12:59
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] AW: [ccp4bb] over-fitting? over-refinement?


Eleanor raises a very important practical point here: "sidechains at the 
solvent interface have multiple conformations, and that as a result the water 
networks should also have partial occupancies". I was fighting with such a 
model for half a year and also tested XSHEL (there was a thread in here for 
that). Coupling partial occupancies of sidechains with waters and other 
sidechains is a horrendously time-consuming task, and in the end, as Eleanor 
said, "correcting these details does not change the Rfactors at all". You just 
get fed up with that puzzle and stop right there.

best, matthias


Dr. Matthias Barone

AG Kuehne, Rational Drug Design

Leibniz-Forschungsinstitut für Molekulare Pharmakologie (FMP)
Robert-Rössle-Strasse 10
13125 Berlin

Germany
Phone: +49 (0)30 94793-284
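One way to shrink this puzzle is to refine a single occupancy per alternate state rather than one per atom. The sketch below assumes the common convention that atoms sharing an altloc identifier, in side chains and waters alike, belong to the same state; the `Atom` class, residue labels and occupancies are hypothetical, and how a particular refinement program groups occupancies should be checked in its own documentation.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    residue: str  # hypothetical label, e.g. "LYS 45" or "HOH 301"
    name: str
    altloc: str   # "" means present in every alternate state
    occ: float = 1.0


def couple_occupancies(atoms, state_occ):
    """Give every atom with altloc X the single shared occupancy of state X."""
    if any(q < 0.0 for q in state_occ.values()) or sum(state_occ.values()) > 1.0 + 1e-6:
        raise ValueError("state occupancies must be non-negative and sum to at most 1")
    for atom in atoms:
        if atom.altloc:  # atoms without an altloc keep their own occupancy
            atom.occ = state_occ[atom.altloc]
    return atoms


atoms = [
    Atom("LYS 45", "NZ", "A"),
    Atom("LYS 45", "NZ", "B"),
    Atom("HOH 301", "O", "A"),  # water only compatible with side-chain state A
    Atom("HOH 302", "O", "B"),  # water only compatible with side-chain state B
    Atom("HOH 303", "O", ""),   # water present in both states
]
couple_occupancies(atoms, {"A": 0.6, "B": 0.4})
for a in atoms:
    print(a.residue, a.name, a.altloc or "-", a.occ)
```

Here two numbers (the A and B state occupancies) replace the per-atom occupancies, and a water that is only possible alongside one side-chain conformer automatically inherits that conformer's occupancy.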


From: CCP4 bulletin board on behalf of Eleanor Dodson <176a9d5ebad7-dmarc-requ...@jiscmail.ac.uk>
Sent: Tuesday, October 20, 2020 12:40:19 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] AW: [ccp4bb] over-fitting? over-refinement?

It is always hard to know when to stop tweaking a model. We know from high 
resolution studies that many sidechains at the solvent interface have multiple 
conformations, and that as a result the water networks should also have partial 
occupancies. But usually correcting these details does not change the Rfactors 
at all - nor contribute much to the biological relevance of your structure!
So often the point to stop is when you get fed up, Phil Evans said years ago - 
I spend 95% of my time on 5% of the structure, most of which is unimportant.
In practice I let the difference maps decide when to stop: for a 10 sigma peak, 
think why; lots of 5 sigma positive and negative ones are not so important.
Eleanor
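This stopping rule can be written down as a simple triage of difference-map peaks. It is a hypothetical helper, not part of any CCP4 program: peak heights are assumed to be in sigma (map rms) units, the sign is ignored because positive and negative peaks both matter, and the 10 sigma threshold is the rule of thumb above.

```python
def triage_difference_peaks(peaks, threshold=10.0):
    """Split difference-map peaks into 'think why' and 'not so important'.

    peaks: list of (label, height) pairs; height in sigma (map rms) units,
    positive or negative. The sign is ignored when comparing with the threshold.
    """
    think_why, not_so_important = [], []
    for label, height in peaks:
        if abs(height) >= threshold:
            think_why.append((label, height))
        else:
            not_so_important.append((label, height))
    # Biggest features first, whatever their sign
    think_why.sort(key=lambda peak: abs(peak[1]), reverse=True)
    return think_why, not_so_important


# Hypothetical peak list
peaks = [
    ("near LYS 45 NZ", 5.2),
    ("near ASP 12 OD1", -5.4),
    ("ligand pocket", 11.8),
    ("HOH 301", -10.3),
    ("surface", 4.9),
]
urgent, rest = triage_difference_peaks(peaks)
print("think why:", urgent)
print("not so important:", len(rest), "peaks")
```

A peak list from any peak-picking step could feed in here; everything in the first list deserves an explanation before stopping.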

On Tue, 20 Oct 2020 at 11:27, Schreuder, Herman /DE <herman.schreu...@sanofi.com> wrote:

A practice that was very popular before Rfree came around was to fit a 
water molecule into every noise peak. One would get spectacularly low Rfactors this 
way, but I cannot imagine that anyone would believe that this would be fitting 
and not over-fitting.



Best,

Herman
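This is the textbook case that R-free was designed to catch, and it can be reproduced with synthetic numbers. In this sketch the amplitudes, the noise level and the 5% free fraction are all made up, and the scale factor is taken as 1. The over-fitted model gets one adjustable parameter per working reflection, which plays the role of a water in every noise peak.

```python
import random


def r_factor(f_obs, f_calc, indices):
    """R = sum |Fo - Fc| / sum Fo over the given reflections (scale factor k = 1)."""
    num = sum(abs(f_obs[i] - f_calc[i]) for i in indices)
    den = sum(f_obs[i] for i in indices)
    return num / den


def split_work_free(n, free_fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the free (test) set."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    n_free = max(1, round(n * free_fraction))
    return sorted(order[n_free:]), sorted(order[:n_free])


rng = random.Random(42)
N = 2000
f_true = [rng.uniform(20.0, 100.0) for _ in range(N)]        # toy "true" amplitudes
f_obs = [max(0.0, f + rng.gauss(0.0, 5.0)) for f in f_true]  # plus measurement noise
work, free = split_work_free(N)

# Honest model: reproduces the signal, not the noise.
f_honest = list(f_true)

# Over-fitted model: one extra parameter per working reflection, set to absorb
# that reflection's noise (the toy analogue of a water in every noise peak).
f_overfit = list(f_honest)
for i in work:
    f_overfit[i] = f_obs[i]

for name, model in (("honest", f_honest), ("over-fitted", f_overfit)):
    print(f"{name:12s} Rwork = {r_factor(f_obs, model, work):.3f}  "
          f"Rfree = {r_factor(f_obs, model, free):.3f}")
```

R-work of the over-fitted model drops to exactly zero, while its R-free is identical to that of the honest model, because the extra parameters never saw the free reflections. In a real refinement the honest model is of course unknown; the toy only isolates the effect of fitting noise.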



From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> On Behalf Of Sam Tang
Sent: Tuesday, 20 October 2020 05:27
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] over-fitting? over-refinement?



Hi, the question may be a bit weird, but how do you define 'over-fitting' in 
the context of structure refinement? From the users' perspective, the practical 
aspect is to 'fit' the model into the density. So there comes this question 
from our juniors: fit is fit, how is a model over-fit?

BRS

Sam

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/