Re: [ccp4bb] Question: Refmac5 stats reported in pdb REMARK 3

2010-05-21 Thread Christine Zardecki

Dear Phil,

Your observation that the refinement details in PDB format REMARKs  
are difficult to interpret and compare is well taken. Each refinement  
package produces its own set of refinement results calculated in its  
own way. Both the calculation and the presentation of this information
in PDB format differ between programs, and even between program
versions. The lack of standardization in how refinement information
is reported is confusing to many PDB users.


In the spirit of supporting innovation, the PDB has historically
tried to accommodate this diversity by providing program- and
version-specific REMARK 3 formats. However, the field of structural
biology has matured considerably in the past few decades, and
time-tested, consensus, and best-practice approaches can now be
defined in many cases. In our view, adopting such approaches (rather
than accommodating every variant ever implemented) would be the best
way to serve the interests of both non-expert user communities and
the experimental structural biology community.


As an illustration, it is interesting to note that there are at least  
20 different types of R-values reported in the current archive. The  
subtle differences in these quantities may be of interest in  
understanding the evolution of refinement methodology. However, we  
believe that a smaller, common set of well-defined data items  
describing refinement results would be more useful to the broader  
community of PDB users.


To this end, the wwPDB maintains an Exchange Data Dictionary of  
community-vetted definitions and examples of each data item in the  
PDB archive. This is an extensible dictionary that grows with new  
technologies and science. For instance, wwPDB has used this  
extensibility to capture and define all the various R-values. While  
the dictionary technology provides a framework for definition and  
standardization, this only addresses part of the problem.


Even though we have precise definitions for the wide range of R-value
types, comparing R-values between entries is still complicated
because the values are not uniformly populated across the archive. To
fully address the problem, we not only need the standardization  
provided by the dictionary technology but also the cooperation of the  
software package developers in producing a common set of statistics  
and diagnostics. This does not preclude reporting new and novel data  
items, but these should be provided in addition to a common core of  
data results.


Further information about the PDB Exchange Data Dictionary can be  
found at our dictionary resource site, http://mmcif.pdb.org/


Correspondence information between our PDB Exchange Data Dictionary  
and items in the current PDB format is also available at

http://mmcif.pdb.org/dictionaries/pdb-correspondence/pdb2mmcif-2010.html

Sincerely,

Christine Zardecki
for the wwPDB




From: Phil Jeffrey pjeff...@princeton.edu
Date: May 19, 2010 4:02:22 PM EDT
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] Question: Refmac5 stats reported in pdb REMARK 3
Reply-To: Phil Jeffrey pjeff...@princeton.edu



Compare these two lines from phenix.refine:
REMARK   3   NUMBER OF REFLECTIONS : 46001
REMARK   3   FREE R VALUE TEST SET COUNT  : 2339

with those from refmac, ostensibly using the same data and start pdb:
REMARK   3   NUMBER OF REFLECTIONS :   43672
REMARK   3   FREE R VALUE TEST SET COUNT  :  2339


I know there are 46011 reflections with |F|>0 in the files I used.
phenix.refine removes 10 of these as outliers.  The 46001 remaining  
reported in REMARK 3 *include* the test set.


With REFMAC, 43672+2339=46011 so it appears that Refmac reports  
just the *working* set count in that first line, excluding the test  
set.


Is this a bug with one program or the other, or a bug in the PDB
definition of REMARK 3?
http://www.wwpdb.org/documentation/format23/remark3.html


This appears to be a source of inconsistency.

phenix.refine 1.6-289
refmac5 5.4.0077  (I'm apparently a Luddite)

Phil Jeffrey
Princeton





[ccp4bb] Question: Refmac5 stats reported in pdb REMARK 3

2010-05-19 Thread Phil Jeffrey

Compare these two lines from phenix.refine:
REMARK   3   NUMBER OF REFLECTIONS : 46001
REMARK   3   FREE R VALUE TEST SET COUNT  : 2339

with those from refmac, ostensibly using the same data and start pdb:
REMARK   3   NUMBER OF REFLECTIONS :   43672
REMARK   3   FREE R VALUE TEST SET COUNT  :  2339


I know there are 46011 reflections with |F|>0 in the files I used.
phenix.refine removes 10 of these as outliers.  The 46001 remaining 
reported in REMARK 3 *include* the test set.


With REFMAC, 43672+2339=46011 so it appears that Refmac reports just the 
*working* set count in that first line, excluding the test set.
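
The arithmetic above can be checked with a short script. This is only a sketch: the regular expression and the helper name parse_counts are illustrative assumptions, not part of either program or of the PDB format specification, and the input lines are the four REMARK 3 lines quoted in this message.

```python
import re

# Assumed pattern for a "REMARK   3   LABEL : integer" line (illustrative only).
REMARK3_COUNT = re.compile(
    r"^REMARK\s+3\s+(?P<label>[A-Z ]+?)\s*:\s*(?P<value>\d+)\s*$"
)

def parse_counts(lines):
    """Return {label: integer value} for every matching REMARK 3 line."""
    counts = {}
    for line in lines:
        match = REMARK3_COUNT.match(line)
        if match:
            counts[match.group("label").strip()] = int(match.group("value"))
    return counts

phenix = parse_counts([
    "REMARK   3   NUMBER OF REFLECTIONS : 46001",
    "REMARK   3   FREE R VALUE TEST SET COUNT  : 2339",
])
refmac = parse_counts([
    "REMARK   3   NUMBER OF REFLECTIONS :   43672",
    "REMARK   3   FREE R VALUE TEST SET COUNT  :  2339",
])

# If Refmac's first line is the working set only, adding the test set
# should recover the full reflection count.
refmac_total = (refmac["NUMBER OF REFLECTIONS"]
                + refmac["FREE R VALUE TEST SET COUNT"])

# If phenix.refine's first line already includes the test set, the
# difference between the two totals is the number of rejected outliers.
phenix_total = phenix["NUMBER OF REFLECTIONS"]
outliers_removed = refmac_total - phenix_total

print(refmac_total, phenix_total, outliers_removed)
```

Under those two readings the totals differ only by the reflections that phenix.refine rejected as outliers, which is consistent with the counts described above.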


Is this a bug with one program or the other, or a bug in the PDB
definition of REMARK 3?
http://www.wwpdb.org/documentation/format23/remark3.html


This appears to be a source of inconsistency.

phenix.refine 1.6-289
refmac5 5.4.0077  (I'm apparently a Luddite)

Phil Jeffrey
Princeton


Re: [ccp4bb] Question: Refmac5 stats reported in pdb REMARK 3

2010-05-19 Thread Ian Tickle

Phil,

I think the PDB documentation in this area (such as it is) is unclear,
and it's not easy to fathom what the original intent was.  The first
line you refer to:

REMARK   3   NUMBER OF REFLECTIONS :

appears in the section with the sub-heading:

REMARK   3  DATA USED IN REFINEMENT.

Arguably, and this was my initial impression, the test set is not
'used' in refinement, i.e. it has no effect on the refined parameters
(except possibly indirectly via adjustment of the weights).  So on
that basis, Refmac is correct, the number of reflections 'used' should
*exclude* the test set.

However the second line you refer to:

REMARK   3   FREE R VALUE TEST SET COUNT  :

appears in the next section with the other cross-validation info, with
the sub-heading:

REMARK   3  FIT TO DATA USED IN REFINEMENT.

and seems to contradict this, since its use of the word 'used' (i.e.
in the wider sense of the reflections merely being read in by the
refinement program and passing the various rejection tests) clearly
does imply that the test set is *included* in the count of 'used'
reflections.  So it all comes down to what is meant by 'used'.
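
The two readings of 'used' can be made explicit by converting a reported count into separate working and test counts. This is a minimal sketch: ReflectionCounts, normalize and the includes_test_set flag are hypothetical names chosen for illustration, and which reading applies to a given program is an assumption the caller has to supply.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReflectionCounts:
    working: int  # reflections that drive the refined parameters
    test: int     # free-R (cross-validation) reflections

    @property
    def total(self) -> int:
        return self.working + self.test

def normalize(reported: int, test: int, includes_test_set: bool) -> ReflectionCounts:
    """Split a reported NUMBER OF REFLECTIONS into working and test counts.

    includes_test_set=True  : 'used' means read in and accepted (wider sense).
    includes_test_set=False : 'used' means working set only (narrow sense).
    """
    working = reported - test if includes_test_set else reported
    if working < 0:
        raise ValueError("test set is larger than the reported count")
    return ReflectionCounts(working=working, test=test)

# Narrow reading applied to the Refmac numbers from this thread.
narrow = normalize(43672, 2339, includes_test_set=False)
# Wider reading applied to the phenix.refine numbers from this thread.
wide = normalize(46001, 2339, includes_test_set=True)

print(narrow.working, narrow.total)
print(wide.working, wide.total)
```

Once both outputs are expressed as working plus test counts they can be compared directly, whichever meaning of 'used' each program adopted.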

Cheers

-- Ian

On Wed, May 19, 2010 at 9:02 PM, Phil Jeffrey pjeff...@princeton.edu wrote:
 Compare these two lines from phenix.refine:
 REMARK   3   NUMBER OF REFLECTIONS             : 46001
 REMARK   3   FREE R VALUE TEST SET COUNT      : 2339

 with those from refmac, ostensibly using the same data and start pdb:
 REMARK   3   NUMBER OF REFLECTIONS             :   43672
 REMARK   3   FREE R VALUE TEST SET COUNT      :  2339


 I know there are 46011 reflections with |F|>0 in the files I used.
 phenix.refine removes 10 of these as outliers.  The 46001 remaining reported
 in REMARK 3 *include* the test set.

 With REFMAC, 43672+2339=46011 so it appears that Refmac reports just the
 *working* set count in that first line, excluding the test set.

 Is this a bug with one program or the other, or a bug in the PDB
 definition of REMARK 3?
 http://www.wwpdb.org/documentation/format23/remark3.html

 This appears to be a source of inconsistency.

 phenix.refine 1.6-289
 refmac5 5.4.0077      (I'm apparently a Luddite)

 Phil Jeffrey
 Princeton