Re: [ccp4bb] Question about R/Rfree value difference

2010-07-09 Thread Dirk Kostrewa

 Hi Tom,

Very nice tool! It would be good to get numerical values for the plotted
distributions as well, such as the mean, median and standard deviation.
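
As a rough illustration of the kind of summary numbers meant here, the sketch
below estimates mean, median and standard deviation from binned counts. It is
not part of the PDBe service; it simply reuses the Rfree-Rwork histogram that
Pavel posted in this thread, and the function name binned_stats is made up for
the example. It assumes every entry sits at its bin midpoint, and it only sees
entries inside the printed range (0.001 to 0.100), so the values are
approximate.

```python
import math

# (low edge, high edge, count) copied from the Rfree-Rwork histogram quoted
# in this thread (PDB models at 1.90-2.10 A resolution).
BINS = [
    (0.001, 0.011, 41),
    (0.011, 0.021, 230),
    (0.021, 0.031, 724),
    (0.031, 0.041, 1210),
    (0.041, 0.050, 1206),
    (0.050, 0.060, 654),
    (0.060, 0.070, 318),
    (0.070, 0.080, 160),
    (0.080, 0.090, 56),
    (0.090, 0.100, 29),
]


def binned_stats(bins):
    """Approximate n, mean, median and SD from (low, high, count) bins."""
    total = sum(count for _, _, count in bins)
    # Assumption: all entries of a bin are placed at the bin midpoint.
    mean = sum(0.5 * (lo + hi) * c for lo, hi, c in bins) / total
    var = sum(c * (0.5 * (lo + hi) - mean) ** 2 for lo, hi, c in bins) / total
    # Median: find the bin holding the middle entry, interpolate linearly.
    half = total / 2.0
    seen = 0
    median = None
    for lo, hi, c in bins:
        if seen + c >= half:
            median = lo + (half - seen) / c * (hi - lo)
            break
        seen += c
    return total, mean, median, math.sqrt(var)


total, mean, median, sd = binned_stats(BINS)
print(f"n={total} mean={mean:.3f} median={median:.3f} sd={sd:.3f}")
```

With the raw per-entry values instead of bins, Python's statistics module
(mean, median, pstdev) would give exact numbers.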


Best regards,

Dirk.


--

***
Dirk Kostrewa
Gene Center Munich, A5.07
Department of Biochemistry
Ludwig-Maximilians-Universität München
Feodor-Lynen-Str. 25
D-81377 Munich
Germany
Phone:  +49-89-2180-76845
Fax:    +49-89-2180-76999
E-mail: kostr...@genzentrum.lmu.de
WWW:    www.genzentrum.lmu.de
***


Re: [ccp4bb] Question about R/Rfree value difference

2010-07-08 Thread Anastassis Perrakis
I do agree with Tim's reasoning in general, but, as Pavel also implied by
offering the statistics, I would be worried not about the difference but about
the unreasonably high absolute value of the free R for 2.0 A resolution.

I do not think that it's simple 'over-fitting', and my worry would not be
just the very large difference between R and Rfree, but simply the high Rfree.
I would more or less bet that if you check your model on the MolProbity server
(http://molprobity.biochem.duke.edu/) it will also have appalling geometry
scores (*).

Here is what I would suspect and what I would do to check what is wrong:

- very incomplete or badly built model: validate in MolProbity, and then
  rebuild (Coot, obviously ...).
  While doing this, check the following, which just takes computer time,
  not yours:
- twinning: try any de-twinning tool; I would simply switch on twin
  refinement in REFMAC, sit back and read the log carefully.
- wrong space group: try the Zanuda server:
  http://www.ysbl.york.ac.uk/YSBLPrograms/index.jsp

regards -

Tassos

(*) On the other hand, I did bet on Germany winning against Spain, so my
betting skills are worse than those of a cephalopod mollusk named 'Paul':
http://en.wikipedia.org/wiki/Paul_the_Octopus


PS When you use TLS the average B will always go down, since a lot of the
movement is 'absorbed' by the TLS tensors that describe domain movement - what
is in the PDB is just the 'residual' B that cannot be explained by domain
moves.
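
To put numbers on this, here is a small arithmetic sketch using the average B
values from the original question (50 before TLS, 30 after). It relies on the
standard relation B = 8*pi^2*<u^2> for isotropic displacement. The assumption
that the total displacement stays the same before and after TLS is made purely
for illustration; the real TLS contribution differs from atom to atom and
should be taken from the refinement output.

```python
import math


def b_to_msd(b_iso):
    """Convert an isotropic B factor (A^2) to a mean-square displacement (A^2)."""
    return b_iso / (8.0 * math.pi ** 2)


# Average B values quoted in the question.
b_before_tls = 50.0  # all-atom average before TLS refinement
b_residual = 30.0    # all-atom average written out after TLS refinement

# Assumption (illustration only): the total average B is unchanged, so the
# drop is the part now described by the TLS tensors.
b_tls = b_before_tls - b_residual

print(f"average B absorbed by TLS: {b_tls:.1f} A^2")
print(f"equivalent <u^2>:          {b_to_msd(b_tls):.3f} A^2")
```

Seen this way, the drop from 50 to 30 says nothing about the model becoming
more ordered; it only shows how the displacement is split between the TLS
tensors and the residual B.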


On Jul 8, 2010, at 8:32, Tim Gruene wrote:


Dear Sampath,

You are right, the gap between R and Rfree is significant and indicates that
your model was overfitted.
Without knowing your data or your model, some reasons for overfitting might be:
- you used automated placement of water molecules (e.g. through arpwaters or in
  coot) and never checked the water molecules for chemical reasonability. How
  many residues are there in your structure and how many water molecules?
- there might be a domain that - despite the resolution - does not resolve with
  your data but you built something nevertheless
- you built things into your model while using too low a sigma level (approx.
  1.0 sigma) for your map. At too low a contour level you can often see what
  you _want_ to see, in my experience, and not what is there
- you screwed up the Rfree set and it's not independent anymore. However, in
  that case I would rather expect the difference or the ratio to be too small
  rather than too big.
- Your data may be twinned.

That's just a first set of reasons, but there might be something one could only
know by looking at your data.
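
The question about residues versus water molecules in the first point can be
answered directly from the coordinate file. The sketch below is one possible
way to do it, relying only on the fixed-column layout of ATOM/HETATM records
in the PDB format. The function name and the few demo lines are invented for
illustration; treating both HOH and WAT as water names is an assumption.

```python
def count_residues_and_waters(pdb_lines):
    """Count distinct polymer residues and distinct water molecules."""
    residues = set()
    waters = set()
    for line in pdb_lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        res_name = line[17:20].strip()
        # Chain ID (col 22), residue number (cols 23-26), insertion code (col 27)
        key = (line[21:22], line[22:26], line[26:27])
        if res_name in ("HOH", "WAT"):  # assumed water residue names
            waters.add(key)
        elif record == "ATOM":
            residues.add(key)
    return len(residues), len(waters)


# Invented demo records, not taken from any real structure.
DEMO = [
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N",
    "ATOM      2  CA  MET A   1      12.560  13.300   2.300  1.00 20.00           C",
    "ATOM      3  N   GLY A   2      13.100  14.500   2.900  1.00 22.00           N",
    "HETATM    4  O   HOH A 101      20.000  21.000  22.000  1.00 35.00           O",
    "HETATM    5  O   HOH A 102      25.000  26.000  27.000  1.00 40.00           O",
]

n_res, n_wat = count_residues_and_waters(DEMO)
print(f"{n_res} residues, {n_wat} waters, {n_wat / n_res:.2f} waters per residue")
```

For a real model, pass the lines of the file instead of DEMO, for example
count_residues_and_waters(open("model.pdb")), where model.pdb is a placeholder
name.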

Tim




--
--
Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A



Anastassis (Tassos) Perrakis, Principal Investigator / Staff Member
Department of Biochemistry (B8)
Netherlands Cancer Institute,
Dept. B8, 1066 CX Amsterdam, The Netherlands
Tel: +31 20 512 1951 Fax: +31 20 512 1954 Mobile / SMS: +31 6 28 597791






Re: [ccp4bb] Question about R/Rfree value difference

2010-07-08 Thread Tim Gruene
'Badly built model' reminds me of another possible reason:
if you have an old version of ccp4 installed (before 6.1??), there is a default
weight of 0.3 for refmac between data and restraints.

This value is - in my experience - way too high at normal resolution and at
the beginning of refinement, resulting, as Tassos described, in terrible
model geometry.

With recent versions of ccp4, the auto-weighting option in refmac is switched
on, which usually does a good job.

Tim

P.S.: Because of this version question, one might add to the CCP4 netiquette
that one should always state which program was used, and which version of it,
when asking program-specific questions.

-- 
--
Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A





Re: [ccp4bb] Question about R/Rfree value difference

2010-07-08 Thread jbosch
Something else to consider: what is your space group?
P212121, but truly P21 with a twinning fraction close to 0.5?
That's one of my recent cases: 1.9 Å data, beautifully refined & built, but the
Rwork/Rfree gap was 13 percent. After changing the space group and applying the
twin law the gap is 3 percent (18.4 and 21.3).
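
Why a twin fraction near 0.5 can make P21 data look like a higher-symmetry
space group can be seen with the usual two-domain (hemihedral) twinning
model, sketched below. This is the textbook relation, not something taken from
the case described above; the function names and the intensities are made up.
Each observed intensity is a weighted sum of two twin-related true
intensities, and the algebraic inversion divides by (1 - 2*alpha), which
vanishes at alpha = 0.5.

```python
def twin(j1, j2, alpha):
    """Observed intensities of a twin-related pair for twin fraction alpha."""
    i1 = (1.0 - alpha) * j1 + alpha * j2
    i2 = alpha * j1 + (1.0 - alpha) * j2
    return i1, i2


def detwin(i1, i2, alpha):
    """Invert twin(); undefined for a perfect twin (alpha = 0.5)."""
    denom = 1.0 - 2.0 * alpha
    if abs(denom) < 1e-9:
        raise ValueError("perfect twin: intensities cannot be detwinned")
    j1 = ((1.0 - alpha) * i1 - alpha * i2) / denom
    j2 = ((1.0 - alpha) * i2 - alpha * i1) / denom
    return j1, j2


# Made-up true intensities for one twin-related pair of reflections.
true_pair = (100.0, 20.0)

partial = twin(*true_pair, alpha=0.3)
print("alpha=0.3 observed:", partial, "recovered:", detwin(*partial, alpha=0.3))

perfect = twin(*true_pair, alpha=0.5)
print("alpha=0.5 observed:", perfect)  # both values equal: extra apparent symmetry
```

At alpha = 0.5 the two observed values are identical, so the twin operation
looks like a crystallographic symmetry element. That is why the fix in such a
case is to refine in the true, lower-symmetry space group with the twin law,
rather than to detwin the data.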

Jürgen

-
Jürgen Bosch
Johns Hopkins Bloomberg School of Public Health
Department of Biochemistry & Molecular Biology
Johns Hopkins Malaria Research Institute
615 North Wolfe Street, W8708
Baltimore, MD 21205
Phone: +1-410-614-4742
Lab:  +1-410-614-4894
Fax:  +1-410-955-3655
http://web.mac.com/bosch_lab/




Re: [ccp4bb] Question about R/Rfree value difference

2010-07-08 Thread Tom Oldfield

Sampath

With regard to your question on what sort of statistics you should get in
structure determination, you might find this service at the PDBe useful:
http://www.ebi.ac.uk/pdbe-as/pdbestatistics/PDBeStatistics.jsp

You can view and manipulate distributions of R, Rfree and R-Rfree, along
with many other data distributions from X-ray (also NMR/EM) structure
determination. There are also links (clicking the graph) that list
all the depositions that have a particular value within the distribution.

I agree with Pavel that, from your quoted statistics, it would be unwise to
deposit the structure in the current state of refinement, as there is clearly
an issue.


Regards
Tom Oldfield




[ccp4bb] Question about R/Rfree value difference

2010-07-07 Thread Sampath Natarajan
Dear all,

I have a question about the Rfree value. I refined a structure at 2 A
resolution. After model building and restrained refinement using the Refmac
program, the average B factor was around 50 for all atoms. The R/Rfree were
around 22/34. I then used TLS refinement, choosing the entire molecule.
R/Rfree dropped to 20/32, but the average B factor also dropped, to 30. The
R/Rfree difference is about 12% in the final refinement. I feel this is
significantly too high.

Could anyone suggest how to reduce the Rfree value further? Or is it all right
to submit the data to the PDB with this 12% difference?

Thanks for the suggestions.

Sincerely,
Sampath N
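
For readers unfamiliar with the quantities in this question: R is
conventionally defined as sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|). Rwork is
computed over the reflections used in refinement and Rfree over the small test
set that was held out. The sketch below shows this with made-up amplitudes and
assumes the calculated amplitudes are already on the same scale as the
observed ones (scale factor 1); refinement programs determine that scale
themselves.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R factor (scale factor assumed to be 1)."""
    num = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    den = sum(abs(o) for o in f_obs)
    return num / den


def r_work_free(f_obs, f_calc, free_flags):
    """Return (Rwork, Rfree); free_flags marks the held-out test reflections."""
    work = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if not f]
    free = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if f]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in free], [c for _, c in free])
    return r_work, r_free


# Made-up amplitudes, for illustration only.
f_obs = [100.0, 50.0, 80.0, 40.0]
f_calc = [90.0, 55.0, 60.0, 52.0]
free = [False, False, True, True]

r_work, r_free = r_work_free(f_obs, f_calc, free)
print(f"Rwork={r_work:.3f} Rfree={r_free:.3f} gap={r_free - r_work:.3f}")

# The gap quoted in the question, as fractions rather than percent.
print(f"gap in the question: {0.32 - 0.20:.2f}")
```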


Re: [ccp4bb] Question about R/Rfree value difference

2010-07-07 Thread Pavel Afonine

Hi Sampath,

This is how the distributions of Rwork, Rfree and Rfree-Rwork look
for 'all' PDB structures refined at around 2 A resolution. The 'your structure'
label indicates where your structure stands with respect to each distribution.


Histogram of Rwork for models in PDB at resolution 1.90-2.10 A:
0.093 - 0.118  : 2
0.118 - 0.143  : 35
0.143 - 0.168  : 390
0.168 - 0.193  : 1439
0.193 - 0.218  : 1802  your structure
0.218 - 0.242  : 785
0.242 - 0.267  : 159
0.267 - 0.292  : 14
0.292 - 0.317  : 1
0.317 - 0.342  : 1
Histogram of Rfree for models in PDB at resolution 1.90-2.10 A:
0.149 - 0.170  : 10
0.170 - 0.191  : 116
0.191 - 0.213  : 534
0.213 - 0.234  : 1166
0.234 - 0.255  : 1417
0.255 - 0.276  : 942
0.276 - 0.297  : 343
0.297 - 0.319  : 78
0.319 - 0.340  : 17  your structure
0.340 - 0.361  : 5
Histogram of Rfree-Rwork for all model in PDB at resolution 1.90-2.10 A:
0.001 - 0.011  : 41
0.011 - 0.021  : 230
0.021 - 0.031  : 724
0.031 - 0.041  : 1210
0.041 - 0.050  : 1206
0.050 - 0.060  : 654
0.060 - 0.070  : 318
0.070 - 0.080  : 160
0.080 - 0.090  : 56
0.090 - 0.100  : 29

So, it seems your case is a typical example of overfitting, which
means the model parameterization and/or the refinement strategy is not
good for your data and model.
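
To make "where your structure stands" a little more quantitative, the sketch
below counts what fraction of the entries in the Rfree histogram fall in the
same bin as the quoted Rfree of 0.32 or in a higher one, and checks the quoted
gap of 0.12 against the top edge of the Rfree-Rwork histogram. The bin values
are copied from the output above; the helper name is made up, and this is not
something phenix.r_factor_statistics prints itself.

```python
# (low edge, high edge, count) copied from the Rfree histogram above.
RFREE_BINS = [
    (0.149, 0.170, 10),
    (0.170, 0.191, 116),
    (0.191, 0.213, 534),
    (0.213, 0.234, 1166),
    (0.234, 0.255, 1417),
    (0.255, 0.276, 942),
    (0.276, 0.297, 343),
    (0.297, 0.319, 78),
    (0.319, 0.340, 17),
    (0.340, 0.361, 5),
]
GAP_TOP_EDGE = 0.100  # highest edge of the Rfree-Rwork histogram above


def fraction_at_or_above(bins, value):
    """Fraction of entries in the bin containing value or in any higher bin."""
    total = sum(count for _, _, count in bins)
    # A bin counts if its upper edge lies above the value.
    above = sum(count for _, hi, count in bins if hi > value)
    return above / total


frac = fraction_at_or_above(RFREE_BINS, 0.32)
print(f"entries with Rfree in the same bin or higher: {100 * frac:.2f}%")

gap = 0.32 - 0.20
print(f"gap {gap:.2f} beyond the printed range: {gap > GAP_TOP_EDGE}")
```

The gap of 0.12 lies beyond the last printed bin, which would explain why no
bin of the Rfree-Rwork histogram carries the 'your structure' label.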


If you send me the data and model files then I will be able (hopefully)
to suggest a better refinement strategy or explain why it's not
feasible with available tools. All files will be kept confidential.


The histograms above were obtained using this command from the PHENIX family:

phenix.r_factor_statistics 2.0

Good luck!
Pavel.

