[ccp4bb] Effect of NCS on estimate of data:parameter ratio

2010-09-18 Thread Florian Schmitzberger

Dear All,

I have a question regarding the effect of non-crystallographic
symmetry (NCS) on the data:parameter ratio in refinement.


I am working with X-ray data to a maximum resolution of 4.1-4.4
Angstroem, 79 % solvent content, in space group P6222, with 22 300
unique reflections, an expected 1132 amino acid residues in the
asymmetric unit and proper 2-fold rotational NCS (SAD phased; no
high-resolution molecular replacement or homology model available).


Assuming refinement of x, y, z and B for a polyalanine model (i.e. ca.
5700 atoms, hence ca. 22 800 parameters), this would give an
observation:parameter ratio of roughly 1:1. I think this would be
equivalent to a "normal" protein with 50 %
solvent content, diffracting to better than 3 Angstroem resolution  
(from the statistics I could find, at that resolution a mean  
data:parameter ratio of ca. 0.9:1 can be expected for refinement of  
x,y,z, and individual isotropic B; ignoring bond angle/length  
geometrical restraints at the moment).


My question is how I could factor in the 2-fold rotational NCS for the  
estimate of the observations, assuming tight NCS restraints (or even  
constraint). It is normally assumed that NCS reduces the noise by a
factor of the square root of the NCS order, but I am more interested in
how much it adds on the observation side (used as a restraint) or
reduction of the parameters (used as a constraint). I don't suppose it  
would be correct to assume that the 2-fold NCS would halve the number
of parameters to refine (assuming an NCS constraint)?
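The two ways of accounting for the NCS that the question contrasts can be compared with a rough Python sketch. It is only bookkeeping under stated assumptions, not a measure of information content: 5 atoms per polyalanine residue, 4 parameters per atom, 6 refinable rigid-body parameters for the NCS operator, and (a crude convention assumed here) one restraint per restrained x, y, z or B difference between the copies. Under a strict constraint the parameter count roughly halves, apart from the few parameters of the operator itself:

```python
# Two rough ways to fold a 2-fold NCS into the observation:parameter count.
# Assumptions (illustrative, not from the post): 5 atoms per polyalanine
# residue, 4 parameters per atom (x, y, z, B), 6 refinable rigid-body
# parameters per NCS operator, and one restraint per restrained x, y, z
# or B difference between the copies.

N_REFLECTIONS = 22_300
N_RESIDUES = 1132
NCS_ORDER = 2
ATOMS_PER_RESIDUE = 5
PARAMS_PER_ATOM = 4
OPERATOR_PARAMS = 6


def ncs_bookkeeping() -> dict:
    n_atoms = N_RESIDUES * ATOMS_PER_RESIDUE      # whole asymmetric unit
    n_params = n_atoms * PARAMS_PER_ATOM          # no NCS used at all
    atoms_per_copy = n_atoms // NCS_ORDER

    # Constraint: refine one copy plus the operator(s) generating the others.
    constrained_params = (atoms_per_copy * PARAMS_PER_ATOM
                          + (NCS_ORDER - 1) * OPERATOR_PARAMS)
    constraint_ratio = N_REFLECTIONS / constrained_params

    # Restraint: keep every parameter and count each copy-to-copy difference
    # as one extra "observation". This is a crude convention; how much a
    # restraint is really worth depends on its weight.
    n_restraints = (NCS_ORDER - 1) * atoms_per_copy * PARAMS_PER_ATOM
    restraint_ratio = (N_REFLECTIONS + n_restraints) / n_params

    return {
        "constrained_params": constrained_params,
        "constraint_ratio": constraint_ratio,
        "n_restraints": n_restraints,
        "restraint_ratio": restraint_ratio,
    }


if __name__ == "__main__":
    for key, value in ncs_bookkeeping().items():
        shown = f"{value:.2f}" if isinstance(value, float) else str(value)
        print(f"{key}: {shown}")
```

With these assumptions the constraint view gives about 1.97:1 and the restraint view about 1.48:1, even though an infinitely tight restraint and a constraint describe the same model. The difference comes purely from adding to the numerator versus subtracting from the denominator, which is one reason such ratios are best treated as rough guides.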


Regards,

Florian

---
Florian Schmitzberger
Biological Chemistry and Molecular Pharmacology
Harvard Medical School
250 Longwood Avenue, SGM 130
Boston, MA 02115, US
Tel: 001 617 432 5602


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-18 Thread Ethan Merritt
On Saturday 18 September 2010, Nicholas M Glykos wrote:
> 
> > it seems that we are trying to deposit one model to satisfy two 
> > different purposes - one for model validation and the other for model 
> > interpretation (use in docking etc), and what's good for one purpose 
> > might not be necessarily good for the other.
> 
> 
> This has been discussed before on this list, but allow me to repeat it: 
> You would have expected that the crystallographers' aim would be to 
> deposit the model that maximises the product (likelihood * prior). 
> Clearly, this is not what we do, 

I guess I have more faith that we do in fact aim for that.
Our data, programs, models, and insight are imperfect,
but we do our best with what we have.

> mainly because (a) the calculation of 
> likelihood is only based on a subset of the 'data' that are obtained from 
> an X-ray diffraction experiment (for example, we ignore diffuse scattering 
> as Ian pointed out), 

I do not think that is a valid criticism.  In any field of science 
one might hypothesize that conducting a different kind of experiment
and fitting it in accordance with a different theory would produce
a different model.  But that is only a hypothetical;  it does not
invalidate the analysis of the experiment you did do based on the
data you did collect.

> (b) we consciously avoid 'prior' because this would 
> make the models 'subjective', meaning that better informed people would 
> deposit (for the same data) different models than the less well informed, 

I don't know of anyone who consciously avoids using their prior
knowledge to inform their current work.  But yes, people with more
experience may in the end deposit better models than people with 
little experience.  That's why it is valuable to have automated tools
like Molprobity to check a proposed model against established prior
expectations.  It's also one way this bulletin board is value, because
it allows those with less experience to ask advice from those with
more experience.

> (c) the format of the PDB does not offer much room for 'creative 
> interpretations' of the electron density maps [for example, you can't have 
> discrete disorder on the backbone (or has this changed ?)]. 

Could you expand on this point?  
I am not aware of any restriction on multiple backbone conformations,
now or ever.   It is true that our refinement programs have not always
been very well suited to refine such a model, but that is not a fault
of the PDB format.

> I sense that 
> what is being deposited is not the 'best model' in any conceivable way, 
> but the model that 'best' accounts for the final 2mFo-DFc map within the 
> limitations of the program used for the final refinement.

That would be true if the refinement is conducted in real space.
However, it is nearly universal to do the final refinement in
reciprocal space.

If a maximum likelihood residual is used, the aim is to achieve the
"best model" in the generally accepted formal sense of being the
set of model parameter values that provide the most likely explanation
for the observed data.  The priors are imposed as restraints;
the partial residual R_crystallographic(Fo, Fc) encompasses the agreement
with the observed data.
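The correspondence between maximising (likelihood * prior) and a restrained reciprocal-space target can be sketched as follows. This assumes, for illustration, independent Gaussian priors on each restrained geometric quantity g_i (a bond length, an angle, an NCS difference) with ideal value g_i^0 and standard deviation sigma_i; p is the vector of model parameters, R_cryst plays the role of -ln L, and w is a relative weight that refinement programs typically expose or adjust, with w = 1 corresponding to the pure posterior:

```latex
\begin{aligned}
\hat{\mathbf{p}}
  &= \arg\max_{\mathbf{p}}\; L(\mathbf{F}_o \mid \mathbf{p})\, P(\mathbf{p})
   = \arg\min_{\mathbf{p}} \big[ -\ln L(\mathbf{F}_o \mid \mathbf{p}) - \ln P(\mathbf{p}) \big], \\
-\ln P(\mathbf{p})
  &= \sum_i \frac{\big(g_i(\mathbf{p}) - g_i^{0}\big)^2}{2\sigma_i^{2}} + \text{const}, \\
Q(\mathbf{p})
  &= R_{\mathrm{cryst}}\big(F_o, F_c(\mathbf{p})\big) + w\, R_{\mathrm{restr}}(\mathbf{p}),
\qquad R_{\mathrm{restr}}(\mathbf{p}) = \sum_i \frac{\big(g_i(\mathbf{p}) - g_i^{0}\big)^2}{2\sigma_i^{2}}.
\end{aligned}
```

Minimising Q is therefore the restrained-refinement form of maximising the posterior; the priors enter only through the restraint term.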

> My two cents,
> Nicholas

And mine in return :-) 
Ethan


Re: [ccp4bb] Matthews Coeff.

2010-09-18 Thread Randy Read
The main reason for reporting the Matthews coefficient might be historical, 
i.e. old fogeys are familiar with the numbers because that's the first way the 
cell content was reported.  People might have been reluctant to report the 
solvent content in the old days because it requires making some assumption 
about the protein having a certain partial specific volume.

Nonetheless, Phaser (and probably other programs) also reports the solvent 
content in the cell content analysis step, so you're free to choose the one you 
prefer.
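The partial-specific-volume assumption Randy mentions can be made concrete. The Matthews coefficient V_M = V_cell / (Z * M), in Å^3/Da, needs only the cell volume, the number of molecules Z in the cell and the molecular mass M, whereas the solvent fraction additionally needs an assumed partial specific volume vbar. A minimal Python sketch of the conversion, assuming the commonly used vbar = 0.74 cm^3/g as the default (the function names are illustrative, not any program's API):

```python
# Conversion between the Matthews coefficient V_M (A^3/Da) and the solvent
# fraction. Assumption: protein partial specific volume vbar in cm^3/g,
# with 0.74 as the commonly used default.

DA_TO_G = 1.66054e-24   # grams per dalton
CM3_TO_A3 = 1e24        # cubic angstroms per cubic centimetre


def protein_fraction(vm: float, vbar: float = 0.74) -> float:
    """Fraction of the cell volume occupied by protein for a given V_M."""
    # One dalton of protein occupies vbar * 1.66054 A^3.
    return vbar * DA_TO_G * CM3_TO_A3 / vm


def solvent_fraction(vm: float, vbar: float = 0.74) -> float:
    """Solvent fraction implied by V_M under the assumed vbar."""
    return 1.0 - protein_fraction(vm, vbar)


def matthews_from_solvent(solvent: float, vbar: float = 0.74) -> float:
    """Inverse conversion: V_M implied by a solvent fraction and vbar."""
    return vbar * DA_TO_G * CM3_TO_A3 / (1.0 - solvent)


if __name__ == "__main__":
    for vbar in (0.72, 0.74, 0.76):
        print(f"vbar={vbar}: V_M=2.4 -> solvent {solvent_fraction(2.4, vbar):.3f}")
```

With the default vbar this reproduces the familiar rule of thumb V_solv = 1 - 1.23/V_M (for example, V_M = 2.4 gives about 49 % solvent). Changing vbar shifts the reported solvent content while V_M itself is unaffected, which is the assumption the older convention avoided.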

Regards,

Randy Read

On 18 Sep 2010, at 19:52, Tim Gruene wrote:

> Hello,
> 
> why do people and programs (like phaser) use the Matthews coefficient instead
> of percentage of solvent content? The amount of information seems the same to
> me and the coefficient is very cumbersome, whereas a percentage is obvious and
> it's easy to imagine what it means.
> 
> Thanks for the discussion, Tim
> 
> -- 
> --
> Tim Gruene
> Institut fuer anorganische Chemie
> Tammannstr. 4
> D-37077 Goettingen
> 
> GPG Key ID = A46BEE1A
> 

--
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research  Tel: + 44 1223 336500
Wellcome Trust/MRC Building   Fax: + 44 1223 336827
Hills Road                   E-mail: rj...@cam.ac.uk
Cambridge CB2 0XY, U.K.   www-structmed.cimr.cam.ac.uk


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-18 Thread Nicholas M Glykos

> it seems that we are trying to deposit one model to satisfy two 
> different purposes - one for model validation and the other for model 
> interpretation (use in docking etc), and what's good for one purpose 
> might not be necessarily good for the other.


This has been discussed before on this list, but allow me to repeat it: 
You would have expected that the crystallographers' aim would be to 
deposit the model that maximises the product (likelihood * prior). 
Clearly, this is not what we do, mainly because (a) the calculation of 
likelihood is only based on a subset of the 'data' that are obtained from 
an X-ray diffraction experiment (for example, we ignore diffuse scattering 
as Ian pointed out), (b) we consciously avoid 'prior' because this would 
make the models 'subjective', meaning that better informed people would 
deposit (for the same data) different models than the less well informed, 
(c) the format of the PDB does not offer much room for 'creative 
interpretations' of the electron density maps [for example, you can't have 
discrete disorder on the backbone (or has this changed ?)]. I sense that 
what is being deposited is not the 'best model' in any conceivable way, 
but the model that 'best' accounts for the final 2mFo-DFc map within the 
limitations of the program used for the final refinement.

My two cents,
Nicholas

ps. May I say parenthetically that making the deposited models dependent
on their intended usage would possibly qualify as 'fraud' ;-)


-- 


  Dr Nicholas M. Glykos, Department of Molecular Biology
 and Genetics, Democritus University of Thrace, University Campus,
  Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/


[ccp4bb] Matthews Coeff.

2010-09-18 Thread Tim Gruene
Hello,

why do people and programs (like phaser) use the Matthews coefficient instead
of percentage of solvent content? The amount of information seems the same to me
and the coefficient is very cumbersome, whereas a percentage is obvious and it's
easy to imagine what it means.

Thanks for the discussion, Tim

-- 
--
Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A


