[ccp4bb] Effect of NCS on estimate of data:parameter ratio
Dear All,

I have a question regarding the effect of non-crystallographic symmetry (NCS) on the data:parameter ratio in refinement.

I am working with X-ray data to a maximum resolution of 4.1-4.4 Angstrom, 79 % solvent content, in space group P6222, with 22,300 unique reflections and an expected 1132 amino acid residues in the asymmetric unit, related by proper 2-fold rotational NCS (SAD phased; no high-resolution molecular replacement or homology model available).

Assuming refinement of x, y, z and B for a polyalanine model (i.e. ca. 5700 atoms), this would give an observation:parameter ratio of roughly 1:1. I think this is equivalent to a "normal" protein with 50 % solvent content diffracting to better than 3 Angstrom resolution (from the statistics I could find, at that resolution a mean data:parameter ratio of ca. 0.9:1 can be expected for refinement of x, y, z and individual isotropic B; ignoring bond length/angle geometrical restraints for the moment).

My question is how I could factor the 2-fold rotational NCS into the estimate of the observations, assuming tight NCS restraints (or even constraints). It is normally assumed that NCS reduces the noise by a factor of the square root of the NCS order, but I would be more interested in how much it adds on the observation side (when used as a restraint), or how much it reduces the parameters (when used as a constraint). I don't suppose it would be correct to assume that the 2-fold NCS would halve the number of parameters to refine (assuming an NCS constraint)?

Regards,

Florian

---
Florian Schmitzberger
Biological Chemistry and Molecular Pharmacology
Harvard Medical School
250 Longwood Avenue, SGM 130
Boston, MA 02115, US
Tel: 001 617 432 5602
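For concreteness, the bookkeeping in the post above can be sketched as follows. The numbers are the ones Florian quotes; treating a strict 2-fold NCS constraint as halving the per-atom parameters (plus ~6 parameters for the NCS operator itself) is exactly the assumption being asked about, not an established answer:

```python
# Rough observation:parameter bookkeeping for the case in the post.
n_reflections = 22_300      # unique reflections
n_atoms = 5_700             # poly-alanine model for ~1132 residues
params_per_atom = 4         # x, y, z, isotropic B

params_unconstrained = n_atoms * params_per_atom  # 22,800

# If a strict NCS constraint is imposed, only one protomer's parameters
# are independent; add ~6 for the rotation/translation of the operator.
ncs_order = 2
params_constrained = params_unconstrained // ncs_order + 6  # 11,406

print(f"free refinement: {n_reflections / params_unconstrained:.2f} obs/param")
print(f"NCS-constrained: {n_reflections / params_constrained:.2f} obs/param")
```

With NCS used as a restraint rather than a constraint, the parameter count stays at 22,800 and the restraints instead enter the target as extra "observations", so the effective ratio would lie somewhere between the two figures above.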
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Saturday 18 September 2010, Nicholas M Glykos wrote:

> > it seems that we are trying to deposit one model to satisfy two
> > different purposes - one for model validation and the other for model
> > interpretation (use in docking etc), and what's good for one purpose
> > might not be necessarily good for the other.
>
> This has been discussed before on this list, but allow me to repeat it:
> You would have expected that the crystallographers' aim would be to
> deposit the model that maximises the product (likelihood * prior).
> Clearly, this is not what we do,

I guess I have more faith that we do in fact aim for that. Our data, programs, models, and insight are imperfect, but we do our best with what we have.

> mainly because (a) the calculation of
> likelihood is only based on a subset of the 'data' that are obtained from
> an X-ray diffraction experiment (for example, we ignore diffuse scattering
> as Ian pointed out),

I do not think that is a valid criticism. In any field of science one might hypothesize that conducting a different kind of experiment, and fitting it in accordance with a different theory, would produce a different model. But that is only a hypothetical; it does not invalidate the analysis of the experiment you did do, based on the data you did collect.

> (b) we consciously avoid 'prior' because this would
> make the models 'subjective', meaning that better informed people would
> deposit (for the same data) different models than the less well informed,

I don't know of anyone who consciously avoids using their prior knowledge to inform their current work. But yes, people with more experience may in the end deposit better models than people with little experience. That's why it is valuable to have automated tools like MolProbity to check a proposed model against established prior expectations. It's also one way this bulletin board is valuable, because it allows those with less experience to ask advice from those with more experience.
> (c) the format of the PDB does not offer much room for 'creative
> interpretations' of the electron density maps [for example, you can't have
> discrete disorder on the backbone (or has this changed ?)].

Could you expand on this point? I am not aware of any restriction on multiple backbone conformations, now or ever. It is true that our refinement programs have not always been well suited to refining such a model, but that is not a fault of the PDB format.

> I sense that
> what is being deposited is not the 'best model' in any conceivable way,
> but the model that 'best' accounts for the final 2mFo-DFc map within the
> limitations of the program used for the final refinement.

That would be true if the refinement were conducted in real space. However, it is nearly universal to do the final refinement in reciprocal space. If a maximum likelihood residual is used, the aim is to achieve the "best model" in the generally accepted formal sense of being the set of model parameter values that provides the most likely explanation for the observed data. The priors are imposed as restraints; the partial residual R_crystallographic(Fo, Fc) encompasses the agreement with the observed data.

> My two cents,
> Nicholas

And mine in return :-)

Ethan
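The point that restraints play the role of -log(prior) can be made concrete with a toy one-dimensional example. The two Gaussians below are illustrative stand-ins for a likelihood and a geometric restraint, not any program's actual target; the observation is simply that maximising (likelihood * prior) and minimising the sum of their negative logs pick out the same model:

```python
import math

def likelihood(x):
    # Toy Gaussian "agreement with data", centred on 1.0.
    return math.exp(-0.5 * (x - 1.0) ** 2)

def prior(x):
    # Toy Gaussian restraint ("prior knowledge"), centred on 0.0.
    return math.exp(-0.5 * x ** 2)

# Scan a grid of candidate model values.
xs = [i / 1000 for i in range(-2000, 2001)]

# Bayesian view: maximise the posterior product.
best_posterior = max(xs, key=lambda x: likelihood(x) * prior(x))

# Refinement view: minimise -log(likelihood) + restraint penalty.
best_target = min(xs, key=lambda x: -math.log(likelihood(x)) - math.log(prior(x)))

# Both land on the same compromise between data and prior (x = 0.5 here).
assert best_posterior == best_target
```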
Re: [ccp4bb] Matthews Coeff.
The main reason for reporting the Matthews coefficient might be historical, i.e. old fogeys are familiar with the numbers because that's the first way the cell content was reported. People might have been reluctant to report the solvent content in the old days because it requires making an assumption about the protein having a certain partial specific volume. Nonetheless, Phaser (and probably other programs) also reports the solvent content in the cell content analysis step, so you're free to choose the one you prefer.

Regards,

Randy Read

On 18 Sep 2010, at 19:52, Tim Gruene wrote:

> Hello,
>
> why do people and programs (like phaser) use the Matthews coefficient instead
> of percentage of solvent content? The amount of information seems the same to
> me, and the coefficient is very cumbersome, whereas a percentage is obvious
> and it's easy to imagine what it means.
>
> Thanks for the discussion, Tim
>
> --
> Tim Gruene
> Institut fuer anorganische Chemie
> Tammannstr. 4
> D-37077 Goettingen
>
> GPG Key ID = A46BEE1A

--
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research    Tel: +44 1223 336500
Wellcome Trust/MRC Building                 Fax: +44 1223 336827
Hills Road                                  E-mail: rj...@cam.ac.uk
Cambridge CB2 0XY, U.K.                     www-structmed.cimr.cam.ac.uk
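The two quantities are indeed interconvertible, which is why the choice is largely a matter of taste. A minimal sketch of the standard conversion, where the 0.74 cm^3/g partial specific volume is the conventional assumption Randy mentions (the example values at the bottom are illustrative, not from this thread):

```python
def matthews(cell_volume, z, mw):
    """Matthews coefficient V_M in A^3/Da: unit-cell volume (A^3)
    divided by (number of protein copies in the cell * MW in Da)."""
    return cell_volume / (z * mw)

def solvent_fraction(vm, vbar=0.74):
    """Solvent fraction from a Matthews coefficient vm (A^3/Da).

    vbar is the assumed protein partial specific volume in cm^3/g;
    1.66054 converts Da * (cm^3/g) into A^3 (the inverse of Avogadro's
    number, in these units). With vbar = 0.74 the familiar factor
    1.23 ~ 1.66054 * 0.74 appears.
    """
    return 1.0 - 1.66054 * vbar / vm

# A typical crystal: V_M = 2.4 A^3/Da corresponds to roughly 49 % solvent.
print(f"{solvent_fraction(2.4):.0%}")
```

So a single number carries the same information either way; the percentage just bakes in the vbar assumption that the raw coefficient leaves explicit.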
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
> it seems that we are trying to deposit one model to satisfy two
> different purposes - one for model validation and the other for model
> interpretation (use in docking etc), and what's good for one purpose
> might not be necessarily good for the other.

This has been discussed before on this list, but allow me to repeat it: You would have expected that the crystallographers' aim would be to deposit the model that maximises the product (likelihood * prior). Clearly, this is not what we do, mainly because (a) the calculation of likelihood is only based on a subset of the 'data' that are obtained from an X-ray diffraction experiment (for example, we ignore diffuse scattering as Ian pointed out), (b) we consciously avoid the 'prior' because it would make the models 'subjective', meaning that better informed people would deposit (for the same data) different models than the less well informed, and (c) the format of the PDB does not offer much room for 'creative interpretations' of the electron density maps [for example, you can't have discrete disorder on the backbone (or has this changed?)]. I sense that what is being deposited is not the 'best model' in any conceivable way, but the model that 'best' accounts for the final 2mFo-DFc map within the limitations of the program used for the final refinement.

My two cents,
Nicholas

ps. May I say parenthetically that making the deposited models dependent on their intended usage would possibly qualify as 'fraud' ;-)

--
Dr Nicholas M. Glykos, Department of Molecular Biology and Genetics,
Democritus University of Thrace, University Campus, Dragana, 68100
Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext. 77620,
Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
[ccp4bb] Matthews Coeff.
Hello,

why do people and programs (like phaser) use the Matthews coefficient instead of percentage of solvent content? The amount of information seems the same to me, and the coefficient is very cumbersome, whereas a percentage is obvious and it's easy to imagine what it means.

Thanks for the discussion, Tim

--
Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A