Re: [ccp4bb] should the final model be refined against full datset

2011-10-18 Thread Ed Pozharski
Selecting a test set that minimizes Rfree is so wrong on so many levels. Unless, of course, the only thing I know about Rfree is that it is the magic number that I need to make small by all means necessary. By using a simple genetic algorithm, I managed to get Rfree for a well-refined model as

Re: [ccp4bb] should the final model be refined against full datset

2011-10-17 Thread Tim Gruene
Dear Nicholas, for a data set with 5132 unique reflections you should flag 10.5% for Rfree, otherwise you could as well drop Rfree completely and use the whole data set for refinement. At least this is how I understand Axel Brunger's article about
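As a rough check on the numbers quoted in the snippet above, the size of the test set implied by a given fraction can be worked out directly. This is only a sketch: the helper name `free_set_size` is made up for illustration, and the 5% figure is included for comparison with the fraction discussed elsewhere in the thread.

```python
def free_set_size(n_unique: int, fraction: float) -> int:
    """Number of reflections flagged for Rfree at a given test-set fraction."""
    return round(n_unique * fraction)


n_unique = 5132  # unique reflections, as quoted in the message
for fraction in (0.05, 0.105):
    n_free = free_set_size(n_unique, fraction)
    print(f"{fraction:.1%} of {n_unique} reflections -> {n_free} in the test set")
```

With so few reflections, the difference between the two fractions is only a few hundred test reflections.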

Re: [ccp4bb] should the final model be refined against full datset

2011-10-17 Thread John R Helliwell
Dear Gerard, Tom and Bernhard, Thank you for highlighting the IUCr Diffraction Data Deposition Working Group and Forum. Dear Colleagues, I am travelling at present and apologise for not replying sooner to the CCP4bb, and also am with intermittent email access until later this week when I 'return

Re: [ccp4bb] should the final model be refined against full datset

2011-10-17 Thread Thomas C. Terwilliger
I think that we are using the test set for many things: 1. Determining and communicating to others whether our overall procedure is overfitting the data. 2. Identifying the optimal overall procedure in cases where very different options are being considered (e.g., should I use TLS). 3.

Re: [ccp4bb] should the final model be refined against full datset

2011-10-17 Thread Pavel Afonine
Yes, Rsleep seems to be just the right thing to use for this: Separating model optimization and model validation in statistical cross-validation as applied to crystallography G. J. Kleywegt Acta Cryst. (2007). D63, 939-940 Practically, it would mean that we split 10% of test reflections into 5%
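The split mentioned above might be sketched as follows. This assumes the truncated sentence means dividing the test reflections into two equal halves, one used for Rfree during model optimization and one held back as the "sleep" set for final validation. The function name `split_test_set` and the toy HKL list are hypothetical.

```python
import random


def split_test_set(test_hkls, seed=0):
    """Randomly split test reflections into an Rfree half and a held-back 'sleep' half."""
    shuffled = list(test_hkls)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])


# Toy test set of 500 distinct HKL indices (illustrative only, not real data).
test_hkls = [(h, k, l) for h in range(10) for k in range(10) for l in range(1, 6)]
free_set, sleep_set = split_test_set(test_hkls)
print(len(free_set), len(sleep_set))
```

The sleep half would not be looked at until all decisions about the refinement protocol have been made.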

Re: [ccp4bb] should the final model be refined against full datset

2011-10-16 Thread Ed Pozharski
On Sat, 2011-10-15 at 11:48 +0300, Nicholas M Glykos wrote: For structures with a small number of reflections, the statistical noise in the 5% sets can be very significant indeed. We have seen differences between Rfree values obtained from different sets reaching up to 4%. This
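The point about statistical noise in small 5% sets can be illustrated with a toy simulation. This is a sketch on synthetic amplitudes, not crystallographic data: the error model (a 25% Gaussian relative error) and all names are assumptions chosen for illustration, and the R value is the usual sum of |Fobs - Fcalc| over the sum of Fobs.

```python
import random
import statistics


def r_factor(f_obs, f_calc):
    """Conventional R value: sum |Fobs - Fcalc| / sum Fobs."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def rfree_spread(n_reflections, fraction=0.05, n_trials=50, seed=0):
    """Return (min, max, stdev) of R over different random test sets."""
    rng = random.Random(seed)
    # Synthetic amplitudes: Fcalc is arbitrary, Fobs is Fcalc with relative noise.
    f_calc = [rng.uniform(10.0, 100.0) for _ in range(n_reflections)]
    f_obs = [abs(c * (1.0 + rng.gauss(0.0, 0.25))) for c in f_calc]
    n_test = max(1, round(n_reflections * fraction))
    values = []
    for _ in range(n_trials):
        idx = rng.sample(range(n_reflections), n_test)  # a fresh random test set
        values.append(r_factor([f_obs[i] for i in idx], [f_calc[i] for i in idx]))
    return min(values), max(values), statistics.stdev(values)


for n in (2000, 100000):
    lo, hi, sd = rfree_spread(n)
    print(f"{n} reflections: R over test sets ranges {lo:.3f}-{hi:.3f} (stdev {sd:.4f})")
```

In this toy setting the spread shrinks roughly as one over the square root of the number of test reflections, which is why small datasets are the ones affected.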

Re: [ccp4bb] should the final model be refined against full datset

2011-10-16 Thread Pavel Afonine
Hi, On Sun, Oct 16, 2011 at 7:48 PM, Ed Pozharski epozh...@umaryland.edu wrote: On Sat, 2011-10-15 at 11:48 +0300, Nicholas M Glykos wrote: For structures with a small number of reflections, the statistical noise in the 5% sets can be very significant indeed. We have seen

Re: [ccp4bb] should the final model be refined against full datset

2011-10-15 Thread Nicholas M Glykos
Dear Ethan, List, Surely someone must have done this! But I can't recall ever reading an analysis of such a refinement protocol. Does anyone know of relevant reports in the literature? Total statistical cross validation is indeed what we should be doing, but for large structures the

Re: [ccp4bb] should the final model be refined against full datset

2011-10-15 Thread Anastassis Perrakis
For structures with a small number of reflections, the statistical noise in the 5% sets can be very significant indeed. We have seen differences between Rfree values obtained from different sets reaching up to 4%. This is very intriguing indeed! Is there something specific in these

Re: [ccp4bb] should the final model be refined against full datset

2011-10-15 Thread Nicholas M Glykos
For structures with a small number of reflections, the statistical noise in the 5% sets can be very significant indeed. We have seen differences between Rfree values obtained from different sets reaching up to 4%. This is very intriguing indeed! Is there something specific in these

[ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Ed Pozharski
This is a follow up (or a digression) to James comparing test set to missing reflections. I also heard this issue mentioned before but was always too lazy to actually pursue it. So. The role of the test set is to prevent overfitting. Let's say I have the final model and I monitored the Rfree

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Nat Echols
On Fri, Oct 14, 2011 at 12:52 PM, Ed Pozharski epozh...@umaryland.edu wrote: The second question is practical. Let's say I want to deposit the results of the refinement against the full dataset as my final model. Should I not report the Rfree and instead insert a remark explaining the

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Robbie Joosten
Hi Ed, This is a follow up (or a digression) to James comparing test set to missing reflections. I also heard this issue mentioned before but was always too lazy to actually pursue it. So. The role of the test set is to prevent overfitting. Let's say I have the final model and I

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Quyen Hoang
Sorry, I don't quite understand your reasoning for how the structure is rendered useless if one refined it with all data. Would your argument also apply to all the structures that were refined before R-free existed? Quyen You should enter the statistics for the model and data that you

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Craig A. Bingman
Recent experience indicates that the PDB is checking these statistics very closely for new depositions. The checks made by the PDB are intended to prevent accidents and oversights made by honest people from creeping into the database. Getting away with something seems to imply some intention

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Jan Dohnalek
Regarding refinement against all reflections: the main goal of our work is to provide the best possible representation of the experimental data in the form of the structure model. Once the structure building and refinement process is finished keeping the Rfree set separate does not make sense any

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Nat Echols
On Fri, Oct 14, 2011 at 1:20 PM, Quyen Hoang qqho...@gmail.com wrote: Sorry, I don't quite understand your reasoning for how the structure is rendered useless if one refined it with all data. Useless was too strong a word (it's Friday, sorry). I guess simulated annealing can address the

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Quyen Hoang
I still don't understand how a structure model refined with all data would negatively affect the determination and/or refinement of an isomorphous structure using a different data set (even without doing SA first). Quyen On Oct 14, 2011, at 4:35 PM, Nat Echols wrote: On Fri, Oct 14, 2011

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Phil Jeffrey
Let's say you have two isomorphous crystals of two different protein-ligand complexes. Same protein different ligand, same xtal form. Conventionally you'd keep the same free set reflections (hkl values) between the two datasets to reduce biasing. However if the first model had been refined
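Keeping the same free set between two isomorphous datasets amounts to carrying the flags across by HKL index. A minimal sketch, assuming flags are stored as a mapping from HKL to a boolean; the function name `transfer_free_flags` is hypothetical, and reflections missing from the reference are simply placed in the working set here, which is a simplification.

```python
def transfer_free_flags(reference_flags, new_hkls):
    """Copy free-set flags (True = test set) onto a new dataset by HKL index.

    Reflections absent from the reference default to the working set.
    """
    return {hkl: reference_flags.get(hkl, False) for hkl in new_hkls}


# Toy example: flags from the first complex applied to the second dataset.
reference_flags = {(1, 0, 0): True, (0, 1, 0): False}
new_hkls = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(transfer_free_flags(reference_flags, new_hkls))
```

In practice the reflections not present in the reference would normally be assigned flags at random in the same proportion rather than all going to the working set.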

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Felix Frolow
Recently we (I mean WE - community) frequently refine structures around 1 Angstrom resolution. This is not what Rfree was invented for. It was invented to get by with 3.0-2.8 Angstrom data at a time when people did not have facilities good enough to look at the electron density maps…

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Ed Pozharski
On Fri, 2011-10-14 at 13:07 -0700, Nat Echols wrote: You should enter the statistics for the model and data that you actually deposit, not statistics for some other model that you might have had at one point but which the PDB will never see. If you read my post carefully, you'll see that I

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Craig A. Bingman
We have obligations that extend beyond simply presenting a best model. In an ideal world, the PDB would accept two coordinate sets and two sets of statistics, one for the last step where the cross-validation set was valid, and a final model refined against all the data. Until there is a

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Quyen Hoang
Thanks for the clear explanation. I understood that. But I was trying to understand how this would negatively affect the initial model to render it useless or less useful. In the scenario that you presented, I would expect a better result (better model) if the initial model was refined with

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Ethan Merritt
On Friday, October 14, 2011 02:45:08 pm Ed Pozharski wrote: On Fri, 2011-10-14 at 13:07 -0700, Nat Echols wrote: The benefit of including those extra 5% of data is always minimal And so is probably the benefit of excluding when all the steps that require cross-validation have already

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Phil Evans
I just tried refining a finished structure turning off the FreeR set, in Refmac, and I have to say I can barely see any difference between the two sets of coordinates. From this n=1 trial, I can't see that it improves the model significantly, nor that it ruins the model irretrievably for

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Thomas C. Terwilliger
For those who have strong opinions on what data should be deposited... The IUCR is just starting a serious discussion of this subject. Two committees, the Data Deposition Working Group, led by John Helliwell, and the Commission on Biological Macromolecules (chaired by Xiao-Dong Su) are working on

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Gerard Bricogne
Dear Tom, I am not sure that I feel happy with your invitation that views on such crucial matters as these deposition issues be communicated to you off-list. It would seem much healthier if these views were aired out within the BB. Again!, some will say ... but the difference is that there

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread D Bonsor
I may be missing something, and I would be curious to hear why if I am wrong, but with a highly redundant dataset, wouldn't the difference made by refining the final model against the full dataset be small, given the random selection of reflections for Rfree?

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Thomas C. Terwilliger
Dear Gerard, I'm very happy for the discussion to be on the CCP4 list (or on the IUCR forums, or both). I was only trying to not create too much traffic. All the best, Tom T Dear Tom, I am not sure that I feel happy with your invitation that views on such crucial matters as these

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Edward A. Berry
Now it would be interesting to refine this structure to convergence, with the original free set. If I understood correctly Ian Tickle has done essentially this, and the Free R returns essentially to its original value: the minimum arrived at is independent of starting point, perhaps within

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread James Stroud
Each R-free flag corresponds to a particular HKL index. Redundancy refers to the number of times a reflection corresponding to a given HKL index is observed. The final structure factor of a given HKL can be thought of as an average of these redundant observations. Related to your question,
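The distinction between redundancy and the free-set flag can be sketched in a few lines. This is a toy illustration only: it uses an unweighted mean and ignores symmetry and error weighting, which real merging programs handle, and the name `merge_observations` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def merge_observations(observations):
    """Average repeated measurements per HKL; return {hkl: (mean value, redundancy)}."""
    grouped = defaultdict(list)
    for hkl, value in observations:
        grouped[hkl].append(value)
    return {hkl: (mean(vals), len(vals)) for hkl, vals in grouped.items()}


# Three raw observations of two unique reflections (made-up numbers).
observations = [((1, 0, 0), 100.0), ((1, 0, 0), 104.0), ((0, 1, 0), 50.0)]
merged = merge_observations(observations)
print(merged)

# A free flag is attached to the merged HKL, however many times it was observed.
free_flags = {hkl: False for hkl in merged}
```

However high the redundancy, each unique HKL ends up as a single merged value carrying a single flag.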

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Ed Pozharski
On Fri, 2011-10-14 at 23:41 +0100, Phil Evans wrote: I just tried refining a finished structure turning off the FreeR set, in Refmac, and I have to say I can barely see any difference between the two sets of coordinates. The amplitude of the shift, I presume, depends on the resolution and data

Re: [ccp4bb] should the final model be refined against full datset

2011-10-14 Thread Pavel Afonine
Hi, yes, shifts depend on resolution indeed. See pages 75-77 here: http://www.phenix-online.org/presentations/latest/pavel_refinement_general.pdf Pavel On Fri, Oct 14, 2011 at 7:34 PM, Ed Pozharski epozh...@umaryland.edu wrote: On Fri, 2011-10-14 at 23:41 +0100, Phil Evans wrote: I just