Re: [ccp4bb] Question about R/Rfree value difference
Hi Tom,

very nice tool! It would be good to get numerical values of the plotted distributions as well, like mean, median, standard deviation and so on.

Best regards,
Dirk.

On 08.07.10 15:20, Tom Oldfield wrote:

Sampath,

With regard to your question on what sort of statistics you should get within structure determination, you might find this service at the PDBe useful: http://www.ebi.ac.uk/pdbe-as/pdbestatistics/PDBeStatistics.jsp

You can view and manipulate distributions of R, Rfree and R-Rfree, along with many other data distributions from X-ray (also NMR/EM) structure determinations. There are also links (click the graph) that list all the depositions that have a particular value within the distribution.

I agree with Pavel that, from your quoted statistics, it would be unwise to deposit the structure in its current state of refinement, as there is clearly an issue.

Regards
Tom Oldfield

Hi Sampath,

this is what the distributions of Rwork, Rfree and Rfree-Rwork look like for 'all' PDB structures refined at around 2 A resolution. The "your structure" label indicates where your structure stands with respect to each distribution.
Histogram of Rwork for models in PDB at resolution 1.90-2.10 A:
  0.093 - 0.118 :    2
  0.118 - 0.143 :   35
  0.143 - 0.168 :  390
  0.168 - 0.193 : 1439
  0.193 - 0.218 : 1802  your structure
  0.218 - 0.242 :  785
  0.242 - 0.267 :  159
  0.267 - 0.292 :   14
  0.292 - 0.317 :    1
  0.317 - 0.342 :    1

Histogram of Rfree for models in PDB at resolution 1.90-2.10 A:
  0.149 - 0.170 :   10
  0.170 - 0.191 :  116
  0.191 - 0.213 :  534
  0.213 - 0.234 : 1166
  0.234 - 0.255 : 1417
  0.255 - 0.276 :  942
  0.276 - 0.297 :  343
  0.297 - 0.319 :   78
  0.319 - 0.340 :   17  your structure
  0.340 - 0.361 :    5

Histogram of Rfree-Rwork for all models in PDB at resolution 1.90-2.10 A:
  0.001 - 0.011 :   41
  0.011 - 0.021 :  230
  0.021 - 0.031 :  724
  0.031 - 0.041 : 1210
  0.041 - 0.050 : 1206
  0.050 - 0.060 :  654
  0.060 - 0.070 :  318
  0.070 - 0.080 :  160
  0.080 - 0.090 :   56
  0.090 - 0.100 :   29

So it seems your case is an example of typical overfitting, which means the model parameterization and/or the refinement strategy is not good for your data and model. If you send me the data and model files, I will (hopefully) be able to suggest a better refinement strategy or explain why it's not feasible with the available tools. All files will be kept confidential.

The histograms above were obtained using this command from the PHENIX family:

phenix.r_factor_statistics 2.0

Good luck!
Pavel.

On 7/7/10 10:17 PM, Sampath Natarajan wrote:

Dear all,

I have a question about the Rfree value. I refined a structure at 2A resolution. After model building and restrained refinement using the Refmac program, the average B factor was around 50 for all atoms and R/Rfree were around 22/34. Then I used TLS refinement, choosing the entire molecule. R/Rfree then decreased to 20/32, but the average B factor also dropped to 30. The R/Rfree difference in the final refinement is about 12%, which I feel is significantly too high. Could anyone suggest how to reduce Rfree further? Or is it acceptable to submit the data to the PDB with this 12% difference? Thanks for the suggestions.
Sincerely,
Sampath N

--
***
Dirk Kostrewa
Gene Center Munich, A5.07
Department of Biochemistry
Ludwig-Maximilians-Universität München
Feodor-Lynen-Str. 25
D-81377 Munich
Germany
Phone: +49-89-2180-76845
Fax: +49-89-2180-76999
E-mail: kostr...@genzentrum.lmu.de
WWW: www.genzentrum.lmu.de
***
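Dirk's request can be approximated from the binned output alone. The Python sketch below is a hypothetical helper, not part of PHENIX. It transcribes Pavel's three histograms and places every model at its bin centre to estimate the mean and standard deviation. It interpolates the median inside the bin that holds the middle model. It also counts how many models sit in or above the bin containing Sampath's reported values (Rwork 0.20, Rfree 0.32, gap 0.12). Because only bin edges and counts are available, every number it prints is an approximation. Exact statistics would need the per-entry data behind the plots.

```python
import math

# Hypothetical helper (not part of PHENIX). Bins are (lower edge, upper edge,
# number of models), transcribed from the phenix.r_factor_statistics 2.0
# histograms for PDB models at 1.90-2.10 A resolution.
RWORK = [(0.093, 0.118, 2), (0.118, 0.143, 35), (0.143, 0.168, 390),
         (0.168, 0.193, 1439), (0.193, 0.218, 1802), (0.218, 0.242, 785),
         (0.242, 0.267, 159), (0.267, 0.292, 14), (0.292, 0.317, 1),
         (0.317, 0.342, 1)]
RFREE = [(0.149, 0.170, 10), (0.170, 0.191, 116), (0.191, 0.213, 534),
         (0.213, 0.234, 1166), (0.234, 0.255, 1417), (0.255, 0.276, 942),
         (0.276, 0.297, 343), (0.297, 0.319, 78), (0.319, 0.340, 17),
         (0.340, 0.361, 5)]
GAP = [(0.001, 0.011, 41), (0.011, 0.021, 230), (0.021, 0.031, 724),
       (0.031, 0.041, 1210), (0.041, 0.050, 1206), (0.050, 0.060, 654),
       (0.060, 0.070, 318), (0.070, 0.080, 160), (0.080, 0.090, 56),
       (0.090, 0.100, 29)]


def binned_stats(bins):
    """Estimate (n, mean, sd, median) from (lo, hi, count) bins.

    Mean and SD place every model at its bin centre; the median is linearly
    interpolated inside the bin that contains the middle model.
    """
    n = sum(count for _, _, count in bins)
    centres = [((lo + hi) / 2, count) for lo, hi, count in bins]
    mean = sum(x * count for x, count in centres) / n
    sd = math.sqrt(sum(count * (x - mean) ** 2 for x, count in centres) / n)
    half, cumulative = n / 2, 0
    for lo, hi, count in bins:
        if count and cumulative + count >= half:
            median = lo + (half - cumulative) / count * (hi - lo)
            break
        cumulative += count
    return n, mean, sd, median


def models_in_or_above_bin(bins, value):
    """Number of models in the bin containing `value` plus all higher bins."""
    return sum(count for lo, hi, count in bins if hi > value)


# Sampath's reported final values: R/Rfree = 20/32, gap = 12%.
for name, bins, yours in (("Rwork", RWORK, 0.20), ("Rfree", RFREE, 0.32),
                          ("Rfree-Rwork", GAP, 0.12)):
    n, mean, sd, median = binned_stats(bins)
    worse = models_in_or_above_bin(bins, yours)
    print(f"{name:12s} n={n} mean={mean:.3f} sd={sd:.3f} median={median:.3f} "
          f"in/above bin of {yours:.2f}: {worse} ({100 * worse / n:.1f}%)")
```

All three histograms cover the same 4628 models. The reported Rfree of 0.32 falls in a bin that holds 22 of them together with the single bin above it (about 0.5%). The 0.12 gap lies beyond the last Rfree-Rwork bin (0.090-0.100) altogether.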
Re: [ccp4bb] Question about R/Rfree value difference
I do agree with Tim's reasoning in general, but, as Pavel also implied by offering the statistics, I would be worried not so much by the difference as by the unreasonably high absolute value of Rfree for 2.0 A resolution. I do not think that it's simple 'over-fitting', and my worry would not be just the very large difference between R and Rfree, but simply the high Rfree. I would more or less bet that if you check your model on the MolProbity server (http://molprobity.biochem.duke.edu/) it will also have appalling geometry scores(*).

Here is what I would suspect, and what I would do to check what is wrong:

- very incomplete or badly built model: validate in MolProbity, and then rebuild (Coot, obviously ...). While doing this, check the following, which just takes computer time, not yours:
- twinning: try any de-twinning tool; I would simply switch on twin refinement in REFMAC, sit back and read the log carefully.
- wrong space group: try the Zanuda server: http://www.ysbl.york.ac.uk/YSBLPrograms/index.jsp

regards - Tassos

(*) On the other hand, I did bet on Germany winning against Spain, so my betting skills are worse than those of a cephalopod mollusk named 'Paul', http://en.wikipedia.org/wiki/Paul_the_Octopus

PS When you use TLS the average B will always go down, since a lot of the movement is 'absorbed' by the TLS tensors that describe domain movement; what is in the PDB file is just the 'residual' B that cannot be explained by domain moves.

On Jul 8, 2010, at 8:32, Tim Gruene wrote:

Dear Sampath,

You are right, the gap between R and Rfree is significant and indicates that your model was overfitted. Without knowing your data or your model, some reasons for overfitting might be:

- you used automated placement of water molecules (e.g. through arpwaters or in Coot) and never checked the water molecules for chemical plausibility. How many residues are there in your structure, and how many water molecules?
- there might be a domain that, despite the resolution, is not resolved in your data, but you built something nevertheless.
- you built things into your model while contouring your map at too low a sigma level (approx. 1.0 sigma). In my experience, at too low a contour level you can often see what you _want_ to see, and not what is there.
- you screwed up the Rfree set and it's not independent anymore. However, in that case I would rather expect the difference or the ratio to be too small rather than too big.
- your data may be twinned.

That's just a first set of reasons, but there might be something one could only know by looking at your data.

Tim

On Wed, Jul 07, 2010 at 10:17:14PM -0700, Sampath Natarajan wrote: [...]

--
Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen
GPG Key ID = A46BEE1A
please don't print this e-mail unless you really need to

Anastassis (Tassos) Perrakis, Principal Investigator / Staff Member
Department of Biochemistry (B8)
Netherlands Cancer Institute,
Dept. B8, 1066 CX Amsterdam, The Netherlands
Tel: +31 20 512 1951 Fax: +31 20 512 1954 Mobile / SMS: +31 6 28 597791
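Tim's question about residue and water counts can be answered from the model file itself. The following is a minimal Python sketch, a hypothetical script rather than a CCP4 or PHENIX tool. It reads the fixed-column ATOM/HETATM records of a PDB-format file and counts distinct polymer residues and waters. It assumes that waters are named HOH and that polymer residues are written as ATOM records. Alternate conformations collapse onto one residue because they share chain ID, residue number and insertion code.

```python
import sys

# Assumption: waters are named HOH; add e.g. "WAT" or "DOD" if your file uses them.
WATER_NAMES = {"HOH"}


def count_residues_and_waters(lines):
    """Return (polymer residues, waters) from PDB-format ATOM/HETATM lines.

    A residue is identified by chain ID (column 22), residue number
    (columns 23-26) and insertion code (column 27), so alternate
    conformations count only once. Only ATOM records count as polymer
    residues; non-water HETATM groups (ligands, and e.g. MSE if written
    as HETATM) are ignored.
    """
    residues, waters = set(), set()
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        res_name = line[17:20].strip()
        key = (line[21:22], line[22:26].strip(), line[26:27])
        if res_name in WATER_NAMES:
            waters.add(key)
        elif record == "ATOM":
            residues.add(key)
    return len(residues), len(waters)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        n_res, n_wat = count_residues_and_waters(handle)
    print(f"residues: {n_res}  waters: {n_wat}  "
          f"waters per residue: {n_wat / max(n_res, 1):.2f}")
```

Saved under any name (for example a hypothetical count_waters.py), it is run as "python count_waters.py model.pdb". A waters-per-residue ratio that looks high for 2 A data would be one more reason to inspect the automatically placed waters by eye, as Tim suggests.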
Re: [ccp4bb] Question about R/Rfree value difference
"Badly built model" reminds me of another possible reason: if you have an old version of CCP4 installed (before 6.1??), refmac uses a default weight of 0.3 between data and restraints. In my experience this value is way too high for normal resolution and at the beginning of refinement, resulting, as Tassos described, in terrible model geometry. In recent versions of CCP4 the auto-weighting option in refmac is switched on, which usually does a good job.

Tim

P.S.: Because of this version question, one might add to the CCP4 netiquette: always state which program, and which version of it, was used when asking program-specific questions.

On Thu, Jul 08, 2010 at 10:14:56AM +0200, Anastassis Perrakis wrote: [...]
Re: [ccp4bb] Question about R/Rfree value difference
Something else to consider is your space group: could it be P212121 that is truly P21 with a twin fraction close to 0.5? That was one of my recent cases: 1.9 Å data, beautifully built and refined, but the Rwork/Rfree gap was 13 percent. After changing the space group and applying the twin law, the gap is 3 percent (18.4 and 21.3).

Jürgen

-
Jürgen Bosch
Johns Hopkins Bloomberg School of Public Health
Department of Biochemistry and Molecular Biology
Johns Hopkins Malaria Research Institute
615 North Wolfe Street, W8708
Baltimore, MD 21205
Phone: +1-410-614-4742
Lab: +1-410-614-4894
Fax: +1-410-955-3655
http://web.mac.com/bosch_lab/

On Jul 8, 2010, at 1:17 AM, Sampath Natarajan wrote: [...]
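Twinning comes up in Tassos's, Tim's and Jürgen's replies. One standard estimate of the twin fraction alpha for a given twin law is Yeates' H-test. For each pair of acentric intensities related by the twin operator it uses H = |I1 - I2| / (I1 + I2). If the underlying untwinned intensities are independent and follow acentric Wilson statistics, the mean of H equals 1/2 - alpha. The Python sketch below demonstrates this on simulated data only. It assumes the reflections have already been paired by the candidate twin law, which in practice a program such as phenix.xtriage does for you, alongside other twinning tests.

```python
import random


def h_test_alpha(pairs):
    """Estimate twin fraction alpha from twin-related (I1, I2) pairs,
    using mean(|I1 - I2| / (I1 + I2)) = 1/2 - alpha."""
    hs = [abs(i1 - i2) / (i1 + i2) for i1, i2 in pairs if i1 + i2 > 0]
    return 0.5 - sum(hs) / len(hs)


def simulate_twinned_pairs(n, alpha, seed=0):
    """Simulated data only: acentric Wilson intensities (exponentially
    distributed) for two twin-related reflections, mixed with fraction alpha."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        j1, j2 = rng.expovariate(1.0), rng.expovariate(1.0)
        pairs.append(((1 - alpha) * j1 + alpha * j2,
                      alpha * j1 + (1 - alpha) * j2))
    return pairs


for true_alpha in (0.0, 0.2, 0.45):
    estimate = h_test_alpha(simulate_twinned_pairs(20000, true_alpha))
    print(f"true alpha = {true_alpha:.2f}  estimated alpha = {estimate:.3f}")
```

As alpha approaches 0.5, the two twin-related intensities become almost equal. That is why near-perfectly twinned P21 data, as in Jürgen's case, can masquerade as P212121. With real data, measurement errors inflate H and so bias the estimate downwards.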