Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-20 Thread Quyen Hoang

Hi Nicholas,

Thank you for your reply.



snip

it seems that we are trying to deposit one model to satisfy two
different purposes - one for model validation and the other for model
interpretation (use in docking etc), and what's good for one purpose
might not be necessarily good for the other.

/snip

This has been discussed before on this list, but allow me to repeat it:

You would have expected that the crystallographers' aim would be to
deposit the model that maximises the product (likelihood * prior).
Clearly, this is not what we do, mainly because (a) the calculation of
likelihood is only based on a subset of the 'data' that are obtained from
an X-ray diffraction experiment (for example, we ignore diffuse scattering
as Ian pointed-out), (b) we consciously avoid 'prior' because this would
make the models 'subjective', meaning that better informed people would
deposit (for the same data) different models than the less well informed,
(c) the format of the PDB does not offer much room for 'creative
interpretations' of the electron density maps [for example, you can't have
discrete disorder on the backbone (or has this changed ?)]. I sense that
what is being deposited is not the 'best model' in any conceivable way,
but the model that 'best' accounts for the final 2mFo-DFc map within the
limitations of the program used for the final refinement.


I don't quite understand your point. We currently deposit electron
densities and movies, so I don't see why depositing an energy-minimized
structure would be so difficult. It doesn't need to be in the same PDB
file as the model used in refinement, nor does it need to be deposited
on the PDB server; but even if it does, is it not possible to include it
as a new chain or a new atom type in the current PDB file format?




ps. May I say parenthetically that making the deposited models dependent
on their intended usage, would possibly qualify as 'fraud' ;-)


I don't quite understand this either. When I prepare a protein model  
for simulation, I would remove all alternative conformations, add  
hydrogens, and then minimize the structure. If I make such a minimized  
structure available for others to use with full disclosure, how would  
that constitute fraud? I was going to start offering minimized  
models on our future structures on our lab website, but if that  
constitutes fraud, then I might have to rethink.
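
For anyone who wants to try the first of the preparation steps described above, here is a minimal sketch (not any package's official procedure) of removing alternative conformations from PDB-format ATOM/HETATM records. The function name strip_altlocs and the choice to keep conformer 'A' are assumptions made for illustration; occupancies are left untouched, and adding hydrogens and minimizing would still need a separate program.

```python
def strip_altlocs(pdb_lines, keep="A"):
    """Keep atoms with a blank altLoc or the chosen conformer.

    Assumes fixed-column PDB records, where the alternate location
    indicator sits in column 17 (index 16). 'keep' is a hypothetical
    parameter naming the conformer to retain.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 16:
            altloc = line[16]
            if altloc not in (" ", keep):
                continue  # drop the other conformers
            # blank out the altLoc flag on the atoms we retain
            line = line[:16] + " " + line[17:]
        kept.append(line)
    return kept
```

A file would be processed with something like strip_altlocs(open(path).read().splitlines()); records other than ATOM/HETATM pass through unchanged.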


I don't know enough to argue with anyone here and that's not the
intention of my posts - I am just trying to help figure out a way to
resolve a significant problem that will likely resurface down the
road. It would be helpful if the more experienced people here could
start a discussion of 'how to resolve' the problems exposed by this
thread so far - assuming that you agree that it's a problem worth your
time.


Cheers,
Quyen

__
Quyen Hoang, Ph.D
Assistant Professor
Department of Biochemistry and Molecular Biology,
Stark Neurosciences Research Institute
Indiana University School of Medicine
635 Barnhill Drive, Room MS0013D
Indianapolis, Indiana 46202-5122

Phone: 317-274-4371
Fax: 317-274-4686
email: qqho...@iupui.edu



--


  Dr Nicholas M. Glykos, Department of Molecular Biology
 and Genetics, Democritus University of Thrace, University Campus,
  Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/









Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-19 Thread Nicholas M Glykos
Hi Ethan,

  mainly because (a) the calculation of likelihood is only based on a 
  subset of the 'data' that are obtained from an X-ray diffraction 
  experiment (for example, we ignore diffuse scattering as Ian 
  pointed-out),
 
 I do not think that is a valid criticism.  In any field of science 
 one might hypothesize that conducting a different kind of experiment
 and fitting it in accordance with a different theory would produce
 a different model.  But that is only a hypothetical;  it does not
 invalidate the analysis of the experiment you did do based on the
 data you did collect.

For the example I mentioned (diffuse scattering), the experiment would be 
identical. Although using only a subset of the available information may not 
invalidate the analysis performed, it is still not the best that can be 
done with the data in hand.


  (b) we consciously avoid 'prior' because this would make the models 
  'subjective', meaning that better informed people would deposit (for 
  the same data) different models than the less well informed,
 
 I don't know of anyone who consciously avoids using their prior 
 knowledge to inform their current work.  But yes, people with more 
 experience may in the end deposit better models than people with little 
 experience.  That's why it is valuable to have automated tools like 
 Molprobity to check a proposed model against established prior 
 expectations.  It's also one way this bulletin board is valuable, because 
 it allows those with less experience to ask advice from those with more 
 experience.

Most people would like to think that the models they deposit correspond to 
an 'objective' representation of the experimentally accessible physical 
reality. The validation tools, mainly by enforcing a uniformity of 
interpretation, discourage (rather than encourage) the incorporation into 
the model of prior knowledge about the problem at hand, and thus offer 
their users the safety of an 'objectively validated model'.



  (c) the format of the PDB does not offer much room for 'creative 
  interpretations' of the electron density maps [for example, you can't 
  have discrete disorder on the backbone (or has this changed ?)].
 
 Could you expand on this point?  
 I am not aware of any restriction on multiple backbone conformations,
 now or ever.   It is true that our refinement programs have not always
 been very well suited to refine such a model, but that is not a fault
 of the PDB format.

I stand corrected on that. It was probably just me :-)



  I sense that what is being deposited is not the 'best model' in any 
  conceivable way, but the model that 'best' accounts for the final 
  2mFo-DFc map within the limitations of the program used for the final 
  refinement.
 
 That would be true if the refinement is conducted in real space.
 However, it is nearly universal to do the final refinement in
 reciprocal space.

The emphasis of what I said was clearly on model building, and not on the 
refinement methodology. The reference to the refinement program was again 
model-centric (ranging from the treatment of hydrogens, to the bulk 
solvent model used).


Best regards,
Nicholas


-- 


  Dr Nicholas M. Glykos, Department of Molecular Biology
 and Genetics, Democritus University of Thrace, University Campus,
  Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-18 Thread Nicholas M Glykos
snip
 it seems that we are trying to deposit one model to satisfy two 
 different purposes - one for model validation and the other for model 
 interpretation (use in docking etc), and what's good for one purpose 
 might not be necessarily good for the other.
/snip

This has been discussed before on this list, but allow me to repeat it: 
You would have expected that the crystallographers' aim would be to 
deposit the model that maximises the product (likelihood * prior). 
Clearly, this is not what we do, mainly because (a) the calculation of 
likelihood is only based on a subset of the 'data' that are obtained from 
an X-ray diffraction experiment (for example, we ignore diffuse scattering 
as Ian pointed-out), (b) we consciously avoid 'prior' because this would 
make the models 'subjective', meaning that better informed people would 
deposit (for the same data) different models than the less well informed, 
(c) the format of the PDB does not offer much room for 'creative 
interpretations' of the electron density maps [for example, you can't have 
discrete disorder on the backbone (or has this changed ?)]. I sense that 
what is being deposited is not the 'best model' in any conceivable way, 
but the model that 'best' accounts for the final 2mFo-DFc map within the 
limitations of the program used for the final refinement.

My two cents,
Nicholas

ps. May I say parenthetically that making the deposited models dependent 
on their intended usage, would possibly qualify as 'fraud' ;-)


-- 


  Dr Nicholas M. Glykos, Department of Molecular Biology
 and Genetics, Democritus University of Thrace, University Campus,
  Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-18 Thread Ethan Merritt
On Saturday 18 September 2010, Nicholas M Glykos wrote:
 snip
  it seems that we are trying to deposit one model to satisfy two 
  different purposes - one for model validation and the other for model 
  interpretation (use in docking etc), and what's good for one purpose 
  might not be necessarily good for the other.
 /snip
 
 This has been discussed before on this list, but allow me to repeat it: 
 You would have expected that the crystallographers' aim would be to 
 deposit the model that maximises the product (likelihood * prior). 
 Clearly, this is not what we do, 

I guess I have more faith that we do in fact aim for that.
Our data, programs, models, and insight are imperfect,
but we do our best with what we have.

 mainly because (a) the calculation of 
 likelihood is only based on a subset of the 'data' that are obtained from 
 an X-ray diffraction experiment (for example, we ignore diffuse scattering 
 as Ian pointed-out), 

I do not think that is a valid criticism.  In any field of science 
one might hypothesize that conducting a different kind of experiment
and fitting it in accordance with a different theory would produce
a different model.  But that is only a hypothetical;  it does not
invalidate the analysis of the experiment you did do based on the
data you did collect.

 (b) we consciously avoid 'prior' because this would 
 make the models 'subjective', meaning that better informed people would 
 deposit (for the same data) different models than the less well informed, 

I don't know of anyone who consciously avoids using their prior
knowledge to inform their current work.  But yes, people with more
experience may in the end deposit better models than people with 
little experience.  That's why it is valuable to have automated tools
like Molprobity to check a proposed model against established prior
expectations.  It's also one way this bulletin board is valuable, because
it allows those with less experience to ask advice from those with
more experience.

 (c) the format of the PDB does not offer much room for 'creative 
 interpretations' of the electron density maps [for example, you can't have 
 discrete disorder on the backbone (or has this changed ?)]. 

Could you expand on this point?  
I am not aware of any restriction on multiple backbone conformations,
now or ever.   It is true that our refinement programs have not always
been very well suited to refine such a model, but that is not a fault
of the PDB format.

 I sense that 
 what is being deposited is not the 'best model' in any conceivable way, 
 but the model that 'best' accounts for the final 2mFo-DFc map within the 
 limitations of the program used for the final refinement.

That would be true if the refinement is conducted in real space.
However, it is nearly universal to do the final refinement in
reciprocal space.

If a maximum likelihood residual is used, the aim is to achieve the
best model in the generally accepted formal sense of being the
set of model parameter values that provide the most likely explanation
for the observed data.  The priors are imposed as restraints;
the partial residual R_crystallographic(Fo, Fc) encompasses the agreement
with the observed data.
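
As a rough illustration of "priors imposed as restraints": under the simplifying assumption of independent Gaussian errors (real maximum-likelihood refinement targets are more elaborate than this), maximising likelihood * prior is equivalent to minimising a data term plus a restraint term. The function and argument names below are hypothetical and do not come from any refinement program.

```python
def neg_log_posterior(f_obs, f_calc, sigma_f, geom, geom_ideal, sigma_geom):
    """-log(likelihood * prior), up to an additive constant.

    Assumption: Gaussian errors on amplitudes and Gaussian priors on
    geometric quantities (e.g. bond lengths), which is a simplification.
    """
    # agreement with the observed data (the "likelihood" part)
    data_term = sum(
        (fo - fc) ** 2 / (2.0 * s ** 2)
        for fo, fc, s in zip(f_obs, f_calc, sigma_f)
    )
    # agreement with prior expectations (the "restraint" part)
    restraint_term = sum(
        (g - g0) ** 2 / (2.0 * s ** 2)
        for g, g0, s in zip(geom, geom_ideal, sigma_geom)
    )
    return data_term + restraint_term
```

Tightening sigma_geom makes the prior dominate; loosening it lets the data dominate, which is the usual weighting question in restrained refinement.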

 My twocents,
 Nicholas

And mine in return :-) 
Ethan


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-17 Thread Dirk Kostrewa

 Hi Pavel,


Am 16.09.10 17:56, schrieb Pavel Afonine:

 Hi Dirk,

so, wouldn't the deposition of the final model's Fcalc, Phic (and 
their weights) along with the final coordinates be the best solution? 
The final Fcalc are our best model and can be used to reproduce the 
final statistics (which would remove the sfcheck annoyance) and to 
reproduce the final electron density maps, and the coordinates can be 
used for what ever purpose they are needed, irrespective of adding 
riding hydrogens or not.


it is a great idea and if you look in PDB deposited structure factors 
there is a number of them (but certainly not the majority) that are 
accompanied by Fcalc. However, a few things to keep in mind:


- Imagine a (not very uncommon, unfortunately) situation when someone 
obtains the final model and Fcalc, and then, right before the PDB 
deposition, does a final check in Coot and moves/removes a few atoms 
(a few waters, for instance) here and there. Or maybe does a 
real-space fit of a residue. Or removes H, if present. Or renames a 
ligand at the request of PDB staff and accidentally changes an atom 
parameter(s). All this in turn will invalidate the R-factors and make 
the previously calculated Fcalc inconsistent with such a manipulated model.
So, the bottom line is: having a model that you can use to reproduce 
the reported statistics is important (for validation and database 
sanity at least, if someone believes that such minor things wouldn't 
impair the biological interpretation - the ultimate goal of protein 
structures).

but this is exactly what one shouldn't do: manipulate the structure 
after the final refinement! And if you manipulate it for a good reason, 
do a last final refinement after that, before depositing coordinates 
and structure factors. Then, there will be no problems, as far as I can see.


Best regards,

Dirk

--

***
Dirk Kostrewa
Gene Center Munich, A5.07
Department of Biochemistry
Ludwig-Maximilians-Universität München
Feodor-Lynen-Str. 25
D-81377 Munich
Germany
Phone:  +49-89-2180-76845
Fax:+49-89-2180-76999
E-mail: kostr...@genzentrum.lmu.de
WWW:www.genzentrum.lmu.de
***


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-17 Thread Pavel Afonine

 Dirk,

- Imagine a (not very uncommon, unfortunately) situation when someone 
obtains the final model and Fcalc, and then, right before the PDB 
deposition, does a final check in Coot and moves/removes a few atoms 
(a few waters, for instance) here and there. Or maybe does a 
real-space fit of a residue. Or removes H, if present. Or renames a 
ligand at the request of PDB staff and accidentally changes an atom 
parameter(s). All this in turn will invalidate the R-factors and make 
the previously calculated Fcalc inconsistent with such a manipulated model.
So, the bottom line is: having a model that you can use to reproduce 
the reported statistics is important (for validation and database 
sanity at least, if someone believes that such minor things 
wouldn't impair the biological interpretation - the ultimate goal of 
protein structures).

but this is exactly what one shouldn't do: manipulate the structure 
after the final refinement! And if you manipulate it for a good 
reason, do a last final refinement after that, before depositing 
coordinates and structure factors. Then, there will be no problems, as 
far as I can see.


I apologize if what I wrote doesn't read clearly - this is exactly what 
I'm saying, in this particular reply and across the whole discussion. 
Note that I used the word 'unfortunately' above. Anyway, saying it again: 
what I mentioned is based on my (and not only my - see relevant papers) 
observations from running validation tools through the whole PDB and 
making note of such manipulated structures. It is a matter of fact that 
there are some intentionally or unintentionally manipulated models; it is 
very bad, it is unfortunate, and obviously I'm strictly against it. I'm 
against it to such a degree that I even bothered to write a paper on 
this matter, which I mentioned on this thread already:


J. Appl. Cryst. 2010, 43, 669-67.

Therefore it is important to have a model that you can use to reproduce 
the reported statistics (for validation, at least), although having 
Fcalc around wouldn't hurt.


Sorry again, if I wasn't clear in my previous reply.

All the best!
Pavel.


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-17 Thread Quyen Hoang
As a relatively inexperienced scientist, I find this discussion  
fascinating.
I wonder if NMR and EM people are also worried about depositing enough  
modeled info to allow back calculation of data.


Regarding the original discussion of whether to deposit riding  
hydrogens used in the refinement, it seems that we are trying to  
deposit one model to satisfy two different purposes - one for model  
validation and the other for model interpretation (use in docking  
etc), and what's good for one purpose might not be necessarily good  
for the other.
I wonder if it would help to deposit two different models; one  
precisely reflects the model used in refinement and the other an  
energy minimized model with predicted hydrogens and alternative  
conformations removed?


Cheers,
Quyen

__
Quyen Hoang, Ph.D
Assistant Professor
Department of Biochemistry and Molecular Biology,
Stark Neurosciences Research Institute
Indiana University School of Medicine
635 Barnhill Drive, Room MS0013D
Indianapolis, Indiana 46202-5122

Phone: 317-274-4371
Fax: 317-274-4686
email: qqho...@iupui.edu


On Sep 17, 2010, at 8:28 AM, Ian Tickle wrote:


Oh, goodness, I see: even here, we would need clear rules what the
calculated structure factors are, which weights were used, which bulk
solvent correction was applied ... a maze, too!


Fortunately the X-ray & restraint weights/target values are not an
issue here: varying them changes the refined model parameters of
course, but they do not appear in the structure factor formula, so
don't need to be specified in the mathematical model to obtain the
Fcalcs.  You would of course need to know all the weights & target
values (as well as the SF formula) to reproduce the refinement to get
the deposited model.

But could future programs really re-calculate the same structure factors
from the deposited model? Because of the expected development of more
advanced methods and algorithms, I have my doubts ... *sigh*


Yes, if the deposited mathematical model is completely specified in
terms of the SF formula used and the values of *all* the parameters
that go into it, then in principle future versions of software using
more advanced models will be able to reproduce the exact Fcalcs.  This
assumes that the advanced models will use the same 'core' formula but
with additional terms and adjustable parameters, so that the simple
model can be obtained from the advanced one by constraining the extra
parameters to fixed values.  However if the simple model is not
'nested' inside the more advanced model in this way, then no it will
not be possible to reproduce the Fcalcs.

However as I implied, the main issue is that we're rather lax at fully
specifying our models (both formulae & parameters): obviously if in
future you don't have all the information you need to reproduce the
calculation then you have no hope of getting the same Fcalcs!

Cheers

-- Ian







Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-17 Thread Kendall Nettles
Very interesting discussion. I wonder if the inexperienced user of PDB really 
exists? I don't know anyone off-hand who would really make use of information 
from hydrogen positions but not understand the issues. Although I hear they 
have been sighted in the Everglades  http://en.wikipedia.org/wiki/Skunk_ape

Kendall


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-16 Thread Dirk Kostrewa

 Dear Ian and contributors to this interesting thread,

(please, scroll down a little bit)

Am 15.09.10 23:34, schrieb Ian Tickle:

I should just like to point out that the main source of the
disagreement here seems to be that people have very different ideas
about what a 'model' is or should be.  Strictly a model is a purely
mathematical construct, in this case it consists of the appropriate
equation for the calculated structure factor and the best-fit values
of the various parameters (scattering factors, atomic positions,
occupancies, B factors, TLS parameters etc.) that appear in it. A
mathematical model is inevitably going to be an imperfect
representation of reality, but hopefully it's the best one we can come
up with, in the sense of best explaining the data without significant
overfitting.
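
To make the "mathematical model" concrete, here is a minimal sketch of the textbook structure factor sum, F(hkl) = sum_j occ_j * f_j * exp(-B_j * s^2 / 4) * exp(2*pi*i*(h*x_j + k*y_j + l*z_j)), with s = 1/d. It assumes isotropic B factors, fractional coordinates, and a scattering factor value f_j supplied by the caller for the resolution in question; symmetry, anisotropy, TLS and bulk solvent are all left out. The function name and the atom tuple layout are hypothetical.

```python
import cmath
import math


def structure_factor(hkl, atoms, inv_d_sq):
    """Complex structure factor for one reflection.

    atoms: iterable of (f, occupancy, b_iso, (x, y, z)) with fractional
    coordinates; inv_d_sq is 1/d^2 for this reflection (assumed inputs).
    """
    h, k, l = hkl
    total = 0j
    for f, occ, b_iso, (x, y, z) in atoms:
        # isotropic Debye-Waller factor: exp(-B * sin^2(theta) / lambda^2)
        damping = math.exp(-b_iso * inv_d_sq / 4.0)
        phase = 2.0 * math.pi * (h * x + k * y + l * z)
        total += occ * f * damping * cmath.exp(1j * phase)
    return total
```

Every quantity that enters this sum (scattering factors, positions, occupancies, B factors) has to be specified for the Fcalcs to be reproducible, which is the point made about riding hydrogens.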

The problem arises because many users of the PDB, and I suspect many
contributors to this BB, particularly non-crystallographers, don't see
it like that, because they view a PDB file as a physical model, i.e.
not as the best fit to the data (assuming that the
non-crystallographers even know what the data are!), but the closest
representation of reality.  The difference between the N-H bond
lengths that Ed referred to illustrates the distinction between the
mathematical and the physical model.  The mathematical model requires
that the bond length is 0.86 Ang because that value gives the best fit
of the assumed spherical scattering factor of H to the deformation
density of the X-H covalent bond.  The physical model requires that it
be 1.00 Ang because that is the internuclear distance found by
spectroscopic methods & predicted by QM calculations.  The same goes
for B factors and TLS: to a large extent they are a mathematical
construct whose purpose is to provide an optimal fit to the data.  The
connection of Bs & TLS with reality is tenuous at best, nevertheless
people obviously would like to have a physical interpretation such as
rigid-body correlated motion.  The fact that Bragg scattering provides
no information about correlated motion (you need to measure the
diffuse scattering for that) doesn't seem to deter them!

I have no doubt in my mind that it is the mathematical model that
should be published, because hopefully it's the best available
interpretation of the data.  Whether that involves publishing the
riding H atoms explicitly, or alternatively the formulae and
parameters that were used to calculate their positions I don't mind,
as long as I can faithfully reproduce the Fcalcs to check the validity
of the model.  Then users of the PDB are free to *interpret* the
mathematical models as physical models in an appropriate manner (e.g.
by adjusting the bond lengths to H), and crystallographers have the
untainted mathematical models needed to reproduce the Fcalcs.


so, wouldn't the deposition of the final model's Fcalc, Phic (and 
their weights) along with the final coordinates be the best solution? 
The final Fcalc are our best model and can be used to reproduce the 
final statistics (which would remove the sfcheck annoyance) and to 
reproduce the final electron density maps, and the coordinates can be 
used for what ever purpose they are needed, irrespective of adding 
riding hydrogens or not.


Best regards,

Dirk.

--

***
Dirk Kostrewa
Gene Center Munich, A5.07
Department of Biochemistry
Ludwig-Maximilians-Universität München
Feodor-Lynen-Str. 25
D-81377 Munich
Germany
Phone:  +49-89-2180-76845
Fax:+49-89-2180-76999
E-mail: kostr...@genzentrum.lmu.de
WWW:www.genzentrum.lmu.de
***


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-16 Thread Ethan Merritt
On Thursday 16 September 2010 01:25:12 am Dirk Kostrewa wrote:
 
 so, wouldn't the deposition of the final model's Fcalc, Phic (and 
 their weights) along with the final coordinates be the best solution? 
 The final Fcalc are our best model and can be used to reproduce the 
 final statistics (which would remove the sfcheck annoyance) and to 
 reproduce the final electron density maps, and the coordinates can be 
 used for what ever purpose they are needed, irrespective of adding 
 riding hydrogens or not.

Now I'm confused.  Isn't that already the recommended, if not required,
practice?

Ethan

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-16 Thread Pavel Afonine

 Hi Dirk,

so, wouldn't the deposition of the final model's Fcalc, Phic (and 
their weights) along with the final coordinates be the best solution? 
The final Fcalc are our best model and can be used to reproduce the 
final statistics (which would remove the sfcheck annoyance) and to 
reproduce the final electron density maps, and the coordinates can be 
used for what ever purpose they are needed, irrespective of adding 
riding hydrogens or not.


it is a great idea and if you look in PDB deposited structure factors 
there is a number of them (but certainly not the majority) that are 
accompanied by Fcalc. However, a few things to keep in mind:


- Imagine a (not very uncommon, unfortunately) situation when someone 
obtains the final model and Fcalc, and then, right before the PDB 
deposition, does a final check in Coot and moves/removes a few atoms (a 
few waters, for instance) here and there. Or maybe does a real-space fit 
of a residue. Or removes H, if present. Or renames a ligand at the request 
of PDB staff and accidentally changes an atom parameter(s). All this in 
turn will invalidate the R-factors and make the previously calculated Fcalc 
inconsistent with such a manipulated model.
So, the bottom line is: having a model that you can use to reproduce the 
reported statistics is important (for validation and database sanity at 
least, if someone believes that such minor things wouldn't impair the 
biological interpretation - the ultimate goal of protein structures).


- To reproduce typically the most used electron density maps, such as 
2mFo-DFc and mFo-DFc, you would also need to deposit coefficients m and 
D, or, alternatively, have a program and free-R flags handy to compute m 
and D yourself.


- Requiring Fcalc, you would have to make sure that this is actually the 
total structure factors Fmodel = scales*(Fcalc_atoms + F_bulk_solvent) 
with all other appropriate scales included. Although, this is easy to do 
by computing the R-factor and comparing it with the reported number.
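
The last two points can be sketched in a few lines, under stated assumptions: a single overall scale, complex structure factors held in plain Python lists, and per-reflection m and D already available. The function names are hypothetical and the map coefficient shown is the acentric 2mFo-DFc form only; this is not how any particular program implements it.

```python
import cmath


def total_model(scale, f_calc_atoms, f_bulk_solvent):
    """Fmodel = scale * (Fcalc_atoms + F_bulk_solvent), per reflection."""
    return [scale * (fa + fb) for fa, fb in zip(f_calc_atoms, f_bulk_solvent)]


def r_factor(f_obs, f_model):
    """R = sum | Fobs - |Fmodel| | / sum Fobs, over the given reflections."""
    numerator = sum(abs(fo - abs(fm)) for fo, fm in zip(f_obs, f_model))
    return numerator / sum(f_obs)


def map_coefficient(f_obs, f_model, m, d):
    """(2*m*Fobs - D*|Fmodel|) carrying the model phase (acentric case)."""
    amplitude = 2.0 * m * f_obs - d * abs(f_model)
    return cmath.rect(amplitude, cmath.phase(f_model))
```

If the R-factor recomputed from deposited Fmodel values does not match the reported one, that is a hint that the deposited column is a partial sum (for example atoms only) or is missing a scale.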


All the best!
Pavel.


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-16 Thread Dr. Mark Mayer

Ethan wrote


I believe that deposition of Fc Phic FOM should be required.
Certainly it should be the recommended practice.



For the same series of structures I just deposited, which started the 
riding H discussion, my mtz file had Fc Phic FOM + other data put 
out by Phenix - Pavel can elaborate. rcsb stripped almost all of this 
and the processed file has only:


HKL, Flag,  Fc, SigmaF and FOC :{

What's a structural biologist to do?


--

Mark


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-16 Thread Eric Larson

Hi Mark,

I assume you deposited the mtz?  This is what Ethan was referring to - the pdb 
does not do well with maintaining all the relevant columns when submitting the 
mtz file.  However, if you convert your mtz to cif yourself and make sure it 
has all the columns you would like to include and then submit this cif file to 
the pdb, all the information is retained.

Eric  
__

Eric Larson, PhD
Biomolecular Structure Center
Department of Biochemistry
Box 357742
University of Washington
Seattle, WA 98195

On Thu, 16 Sep 2010, Dr. Mark Mayer wrote:


Ethan wrote


I believe that deposition of Fc Phic FOM should be required.
Certainly it should be the recommended practice.



For the same series of structures I just deposited, which started the 
riding H discussion, my mtz file had Fc Phic FOM + other data put out by 
Phenix - Pavel can elaborate. rcsb stripped almost all of this and the 
processed file has only:


HKL, Flag,  Fc, SigmaF and FOC :{

What's a structural biologist to do?


--

Mark



Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-16 Thread Ethan Merritt
On Thursday 16 September 2010 09:56:14 am Dr. Mark Mayer wrote:
 Ethan wrote
 
 I believe that deposition of Fc Phic FOM should be required.
 Certainly it should be the recommended practice.
 
 
 For the same series of structures I just deposited, which started the 
 riding H discussion, my mtz file had Fc Phic FOM + other data put 
 out by Phenix - Pavel can elaborate. rcsb stripped almost all of this 
 and the processed file has only:
 
 HKL, Flag,  Fc, SigmaF and FOC :{

Huh?  That's not a cif fragment. What file are you looking at?
In my experience the PDB feeds back to you a cif format structure factor
file with a name like   rcsb054058-sf.cif
Near the top of that file you should find a description of the data
columns. The columns present depend on what you fed it, of course.

loop_
_refln.crystal_id
_refln.wavelength_id
_refln.scale_group_code
_refln.status
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas_au
_refln.F_meas_sigma_au
_refln.intensity_meas
_refln.intensity_sigma
_refln.F_calc
_refln.fom
_refln.phase_meas


Caveat:  
I have never tried to deposit a structure factor file from phenix; 
maybe that triggers some other processing pathway. Does anyone here know?

I would say that the simple, and almost guaranteed to work, procedure
is to do the cif conversion yourself and deposit the cif file.

I noted in another message that the auto-conversion script on
the PDB deposition site has a tendency to lose columns.
That's why it is better to do the conversion yourself.
I can't say that they _never_ lose columns in an uploaded cif file.
I have had that happen, but only once and quite a while ago.
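
One way to apply this advice is to check the processed structure factor file for the columns you expected to survive. Below is a minimal sketch that scans a CIF text for _refln.* tags and reports which wanted ones are absent. The default list of wanted items (F_calc, phase_calc, fom) is only an assumption about what one might want to verify, and the parsing is deliberately naive (one tag per line, no handling of multiple data blocks).

```python
def refln_columns(cif_text):
    """Return the _refln.* item names found in the text, in file order."""
    columns = []
    for line in cif_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("_refln."):
            tag = stripped.split()[0]
            columns.append(tag[len("_refln."):])
    return columns


def missing_columns(cif_text, wanted=("F_calc", "phase_calc", "fom")):
    """List the wanted _refln items that are not present in the text."""
    present = set(refln_columns(cif_text))
    return [name for name in wanted if name not in present]
```

Running missing_columns on the text of the returned -sf.cif file gives an empty list when nothing on the wanted list was dropped.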


 What's a structural biologist to do?

The empiricist's approach.
Experiment till you find a procedure that works, then stick to it :-)

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-16 Thread Tim Gruene
On Thu, Sep 16, 2010 at 10:19:14AM -0700, Ethan Merritt wrote:
 [...] 
  What's a structural biologist to do?
 
 The empiricist's approach.
 Experiment till you find a procedure that works, then stick to it :-)

... or the social approach: communicate with the person at the PDB responsible
for your deposition. So far that has worked great for me (plaudits for the
people at the PDB(e)).

Tim

 
 -- 
 Ethan A Merritt
 Biomolecular Structure Center,  K-428 Health Sciences Bldg
 University of Washington, Seattle 98195-7742

-- 
--
Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A





[ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-16 Thread Dr. Mark Mayer

Huh?  That's not a cif fragment. What file are you looking at?
In my experience the PDB feeds back to you a cif format structure factor
file with a name like   rcsb054058-sf.cif
Near the top of that file you should find a description of the data
columns. The columns present depend on what you fed it, of course.



Come on guys - give me a break ... all I posted was just a list of 
the columns in the sf file - here's a cut and paste of what rcsb 
actually generated


rcsb061284-sf.cif

data_r3om0sf
#
_audit.revision_id  1_0
_audit.creation_date  ?
_audit.update_record'Initial release'

loop_
_refln.wavelength_id
_refln.crystal_id
_refln.scale_group_code
_refln.index_h
_refln.index_k
_refln.index_l
_refln.status
_refln.F_meas_au
_refln.F_meas_sigma_au
_refln.fom
1 1 1   0 0  8 o 203.0 6.3  0.99
1 1 1   0 0 10 o 281.5 8.7  0.86
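
For anyone who wants to check which columns survived the round trip, the
loop_ header can be listed programmatically. What follows is only a minimal
sketch in Python, assuming the returned file puts one tag per line after
loop_, as in the fragment above. The function name refln_columns and the
shortened example text (including its single data row) are illustrative and
not part of any PDB or CCP4 tool.

```python
def refln_columns(cif_text):
    """Return the tag names of the first loop_ block found in the text."""
    tags = []
    in_loop = False
    for line in cif_text.splitlines():
        token = line.strip()
        if token == "loop_":
            in_loop = True
            continue
        if in_loop:
            if token.startswith("_"):
                tags.append(token.split()[0])
            elif tags:
                break  # first non-tag line after the tags: header is complete
    return tags


# Shortened, illustrative stand-in for a returned structure factor file.
example = """\
data_example
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.status
_refln.F_meas_au
_refln.F_meas_sigma_au
_refln.fom
0 0 8 o 203.0 6.3 0.99
"""

columns = refln_columns(example)
print(columns)
```

Comparing that list with the column labels of the deposited MTZ shows at a
glance what was dropped or renamed.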

Below is mtzdmp of what I actually deposited (as MTZ)


 Col Sort       Min       Max    Num       %      Mean      Mean   Resolution    Type Column
 num order                    Missing complete              abs.    Low    High       label

   1 ASC          0        46      0   100.00      17.7      17.7  31.88   1.40   H  H
   2 NONE         0        72      0   100.00      27.4      27.4  31.88   1.40   H  K
   3 NONE         0        81      0   100.00      30.5      30.5  31.88   1.40   H  L
   4 NONE       3.3    2160.3      0   100.00    162.89    162.89  31.88   1.40   F  FOBS
   5 NONE       0.9      60.0      0   100.00      5.36      5.36  31.88   1.40   Q  SIGFOBS
   6 NONE       0.0       1.0      0   100.00      0.05      0.05  31.88   1.40   I  R_FREE_FLAGS
   7 NONE       0.1    2253.6      0   100.00    157.73    157.73  31.88   1.40   F  FMODEL
   8 NONE    -180.0     180.0      0   100.00      2.65     90.13  31.88   1.40   P  PHIFMODEL
   9 NONE       0.0    5823.1      0   100.00    219.29    219.29  31.88   1.40   F  FCALC
  10 NONE    -180.0     180.0      0   100.00      3.24     90.09  31.88   1.40   P  PHIFCALC
  11 NONE       0.0   15330.0      0   100.00    141.04    141.04  31.88   1.40   F  FMASK
  12 NONE    -180.0     180.0      0   100.00      4.29     90.74  31.88   1.40   P  PHIFMASK
  13 NONE       0.0    6909.4      0   100.00     15.42     15.42  31.88   1.40   F  FBULK
  14 NONE    -180.0     180.0      0   100.00      4.29     90.74  31.88   1.40   P  PHIFBULK
  15 NONE     0.803     1.199      0   100.00     1.004     1.004  31.88   1.40   W  FB_CART
  16 NONE     0.001     1.000      0   100.00     0.877     0.877  31.88   1.40   W  FOM
  17 NONE     0.576     0.754      0   100.00     0.705     0.705  31.88   1.40   W  ALPHA
  18 NONE   277.388                0   100.00  5655.391  5655.391  31.88   1.40   W  BETA



--

Mark

Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-16 Thread Ethan Merritt
On Thursday 16 September 2010 10:34:14 am Dr. Mark Mayer wrote:
 Huh?  That's not a cif fragment. What file are you looking at?
 In my experience the PDB feeds back to you a cif format structure factor
 file with a name like   rcsb054058-sf.cif
 Near the top of that file you should find a description of the data
 columns. The columns present depend on what you fed it, of course.
 
 
 Come on guys - give me a break ... all I posted was just a list of 
 the columns in the sf file

I sincerely apologize.  
Believe it or not, I mistook your emoticon for part of a file syntax
that I was not familiar with.

 HKL, Flag,  Fc, SigmaF and FOC :{

I thought that colon + curly bracket was some funky data delimiter.

Ethan

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-15 Thread Ed Pozharski
On Wed, 2010-09-15 at 10:50 -0700, Pavel Afonine wrote:
 I wouldn't dare calling a model manipulation that typically changes
 the 
 R-factor by 0.5 ... ~2% as nothing.   Although, you are may be right
 - 
 who cares?

It's not a manipulation because no parameters were manipulated in the
model.  Don't you agree that using the riding model does not add
additional refinable parameters?

But your insistence has awakened my curiosity.  So I looked at hydrogens
as produced by phenix.refine for a 1.8A structure I randomly picked.
Just as George has pointed out, the covalent bonds are too short.  For
instance, when hydrogens are added, the average N-H distance is
1.1(5), but upon refinement the value is down to 0.85998(4).  I
won't even begin discussing the fact that some of these hydrogens added
to K,Y,S etc are placed in positions that are not justified by data (not
in definitely wrong positions either, it's just that there is no
evidence to support a particular torsion angle).  And that it is
unlikely that every histidine in the structure is fully protonated.
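
The bond-length check described above is easy to repeat. A minimal Python
sketch follows, assuming a standard fixed-column PDB file in which the
backbone amide hydrogen is named H. The helper names atom_line and
backbone_nh_distances are made up for this illustration, alternate
conformations are ignored, and the three-atom example is synthetic rather
than taken from the 1.8A structure.

```python
import math


def atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a fixed-column PDB ATOM record (up to the coordinates only)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, res_name, chain, res_seq, x, y, z)


def backbone_nh_distances(pdb_lines):
    """Return the N-H distance (Angstrom) for each residue that has both atoms."""
    atoms = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        if name not in ("N", "H"):
            continue
        residue = (line[21], line[22:27])  # chain ID, residue number + insertion code
        xyz = tuple(float(line[i:i + 8]) for i in (30, 38, 46))
        atoms.setdefault(residue, {})[name] = xyz
    # Residues lacking either atom (e.g. proline has no amide H) are skipped.
    return [math.dist(a["N"], a["H"]) for a in atoms.values() if len(a) == 2]


# Synthetic example: one residue with an N-H bond of about 0.86 Angstrom.
model = [
    atom_line(1, "N", "ALA", "A", 1, 0.000, 0.000, 0.000),
    atom_line(2, "CA", "ALA", "A", 1, 1.458, 0.000, 0.000),
    atom_line(3, "H", "ALA", "A", 1, -0.430, 0.745, 0.000),
]
distances = backbone_nh_distances(model)
print(["%.3f" % d for d in distances])
```

Running the same function on the model before and after refinement gives
the two averages being compared.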

Do you see the problem?  I fully understand your desire to be able to
reproduce the R-factors (although I don't necessarily share it), but if
I decide to deposit this model with hydrogens, am I essentially stating
that N-H bond is magically shortened to ~0.86A?  Sure, it is driver's
(PDB user's) responsibility to know the meaning of the red light (riding
hydrogens), but wouldn't depositing riding hydrogens be equivalent to
putting 70 mph sign at the ramp, just because all the cops know that
it's not the actual safe speed?  And then tell the accident victim that
there was fine print in the rule book?  I think this situation is
particularly problematic given that these days some enter the field the
same way many people (at least so it seems here in Baltimore) get their
driver's licenses, i.e. without ever learning the rules.

Cheers,

Ed.

-- 
I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-15 Thread Pavel Afonine

 Dear Ed,

On 9/15/10 12:54 PM, Ed Pozharski wrote:

On Wed, 2010-09-15 at 10:50 -0700, Pavel Afonine wrote:

I wouldn't dare calling a model manipulation that typically changes
the
R-factor by 0.5 ... ~2% as nothing.   Although, you are may be right
-
who cares?

It's not a manipulation because no parameters were manipulated in the
model.


I can't agree with this, sorry. A change to a model content (especially 
the one that changes Fcalc) is a model manipulation.
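
To make the dependence on Fcalc concrete, here is a minimal Python sketch of
the conventional R-factor, R = sum| |Fobs| - k|Fcalc| | / sum|Fobs|. The
overall scale k is taken here as sum|Fobs| / sum|Fcalc|, which is a
simplifying assumption (refinement programs scale more carefully), and the
amplitudes are invented numbers, not data from any structure.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor for paired lists of structure factor amplitudes."""
    scale = sum(f_obs) / sum(f_calc)  # assumed simple overall scale factor k
    residual = sum(abs(o - scale * c) for o, c in zip(f_obs, f_calc))
    return residual / sum(f_obs)


# Invented amplitudes: two slightly different sets of calculated values
# compared against the same observed values.
f_obs = [100.0, 50.0, 25.0]
f_calc_a = [95.0, 55.0, 25.0]
f_calc_b = [98.0, 52.0, 25.0]

r_a = r_factor(f_obs, f_calc_a)
r_b = r_factor(f_obs, f_calc_b)
print("R(a) = %.4f, R(b) = %.4f" % (r_a, r_b))
```

Any change to the atoms that contribute to Fcalc changes the numerator, so
the reported R-factor can only be reproduced from the same set of atoms.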


Pavel.


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-15 Thread Phil Jeffrey

On 9/15/10 3:54 PM, Ed Pozharski wrote:


Don't you agree that using the riding model does not add
additional refinable parameters?

(snip)

instance, when hydrogens are added, the average N-H distance is
1.1(5), but upon refinement the value is down to 0.85998(4).  I


So the riding hydrogen model is imperfect.  At least with phenix.refine 
you can measure it, unlike the default behavior of REFMAC.  (But you can 
tell it to write hydrogens out, I believe).


Obviously this question is not one amenable to a simple answer.  In some 
sense (as per George) riding hydrogens are merely a restraint.  In some 
other sense they are fundamentally a part of the model - they have very 
directional properties via bumping restraints that most certainly alter 
the atomic model for the heavy atoms in a very direct way via collision. 
 Since the nature of these atoms - locationally specific - differs from 
the more amorphous extended atom restraints (CH3E for methyl in CNS 
etc) it could make sense to include them in the model at deposition.


As far as I know we do not delete atoms from the final model that 
contribute to scattering and geometric restraints under any other 
circumstances, except perhaps in the nearly-as-contentious "how do I 
model my disordered side-chain" case.  Also not amenable to a simple answer.


Both approaches (REFMAC-esque and PHENIX-esque) have their merits.
I doubt I'm the only person here conflicted over what to do about it.
However this thread appears to have reached the point where not much new 
ground is being broken.


Phil Jeffrey
Princeton


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-15 Thread Ed Pozharski
On Wed, 2010-09-15 at 13:13 -0700, Pavel Afonine wrote:
 I can't agree with this, sorry. A change to a model content
 (especially 
 the one that changes Fcalc) is a model manipulation.
 
That is not what I asked.  Do you agree that using the riding model does
not add additional refinable parameters?


-- 
I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-15 Thread Ian Tickle
I should just like to point out that the main source of the
disagreement here seems to be that people have very different ideas
about what a 'model' is or should be.  Strictly a model is a purely
mathematical construct, in this case it consists of the appropriate
equation for the calculated structure factor and the best-fit values
of the various parameters (scattering factors, atomic positions,
occupancies, B factors, TLS parameters etc.) that appear in it. A
mathematical model is inevitably going to be an imperfect
representation of reality, but hopefully it's the best one we can come
up with, in the sense of best explaining the data without significant
overfitting.

The problem arises because many users of the PDB, and I suspect many
contributors to this BB, particularly non-crystallographers, don't see
it like that, because they view a PDB file as a physical model, i.e.
not as the best fit to the data (assuming that the
non-crystallographers even know what the data are!), but the closest
representation of reality.  The difference between the N-H bond
lengths that Ed referred to illustrates the distinction between the
mathematical and the physical model.  The mathematical model requires
that the bond length is 0.86 Ang because that value gives the best fit
of the assumed spherical scattering factor of H to the deformation
density of the X-H covalent bond.  The physical model requires that it
be 1.00 Ang because that is the internuclear distance found by
spectroscopic methods and predicted by QM calculations.  The same goes
for B factors and TLS: to a large extent they are a mathematical
construct whose purpose is to provide an optimal fit to the data.  The
connection of Bs and TLS with reality is tenuous at best, nevertheless
people obviously would like to have a physical interpretation such as
rigid-body correlated motion.  The fact that Bragg scattering provides
no information about correlated motion (you need to measure the
diffuse scattering for that) doesn't seem to deter them!

I have no doubt in my mind that it is the mathematical model that
should be published, because hopefully it's the best available
interpretation of the data.  Whether that involves publishing the
riding H atoms explicitly, or alternatively the formulae and
parameters that were used to calculate their positions I don't mind,
as long as I can faithfully reproduce the Fcalcs to check the validity
of the model.  Then users of the PDB are free to *interpret* the
mathematical models as physical models in an appropriate manner (e.g.
by adjusting the bond lengths to H), and crystallographers have the
untainted mathematical models needed to reproduce the Fcalcs.
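
As an illustration of that 'interpretation' step, a riding hydrogen can be
moved out along the existing bond direction to the internuclear distance.
This is a sketch only, in Python, using the 0.86 and 1.00 Ang values quoted
above. The function name set_bond_length and the coordinates are
hypothetical, and a real tool would also have to decide which atoms are
bonded to which.

```python
import math


def set_bond_length(heavy, hydrogen, target):
    """Return new H coordinates on the heavy-atom -> H line at distance `target`."""
    vec = [h - x for h, x in zip(hydrogen, heavy)]
    length = math.sqrt(sum(c * c for c in vec))
    if length == 0.0:
        raise ValueError("coincident atoms: bond direction is undefined")
    scale = target / length
    return tuple(x + scale * c for x, c in zip(heavy, vec))


# Hypothetical coordinates: N-H bond of 0.86 Ang along z (the 'mathematical' model).
n_xyz = (1.0, 2.0, 3.0)
h_xyz = (1.0, 2.0, 3.86)

# Stretch to the 1.00 Ang internuclear distance (the 'physical' model).
h_physical = set_bond_length(n_xyz, h_xyz, 1.00)
print(h_physical)
```

The deposited coordinates stay untouched; the adjustment is applied only to
the copy handed to downstream modelling software.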

Cheers

-- Ian

On Wed, Sep 15, 2010 at 9:13 PM, Pavel Afonine pafon...@lbl.gov wrote:
  Dear Ed,

 On 9/15/10 12:54 PM, Ed Pozharski wrote:

 On Wed, 2010-09-15 at 10:50 -0700, Pavel Afonine wrote:

 I wouldn't dare calling a model manipulation that typically changes
 the
 R-factor by 0.5 ... ~2% as nothing.   Although, you are may be right
 -
 who cares?

 It's not a manipulation because no parameters were manipulated in the
 model.

 I can't agree with this, sorry. A change to a model content (especially the
 one that changes Fcalc) is a model manipulation.

 Pavel.



Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-15 Thread Ed Pozharski
On Wed, 2010-09-15 at 16:26 -0400, Phil Jeffrey wrote:
 So the riding hydrogen model is imperfect.  At least with
 phenix.refine 
 you can measure it, unlike the default behavior of REFMAC.  (But you
 can 
 tell it to write hydrogens out, I believe).
 

My impression is that default behavior of phenix.refine is the same - I
had to change parameters to include hydrogens in the output.

Without breaking any new ground, there is really no conflict here.  Is
it a good idea to make a complete model description (including riding
hydrogens, input files, cif-files, special case restraints etc)
available for structures deposited in the PDB?  Absolutely.  But not in
this form, where the model implies that we know the protonation states of
all the atoms and has unreasonable geometry.  For the example that I
provided, the rmsd_bonds for that particular group is 0.14A, certainly
unacceptable.  Maybe one can use a different record type for these atoms,
say RIDING instead of ATOM.  Thus the complete model can be recovered and
at the same time the nature of these items is explicitly stated.  In this
way riding hydrogens are clearly distinguished from those that are
actually refined at ultrahigh resolution.
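
A rough sketch of how such tagging could look, in Python. RIDING is not a
record type in the PDB format; it is only the hypothetical name suggested
above, and it happens to fill the same six-character field as ATOM plus
padding. The helper names and the example atoms are likewise made up, and
hydrogens are identified from the element field in columns 77-78.

```python
def atom_line(serial, name, res_name, chain, res_seq, x, y, z, element):
    """Build a fixed-column ATOM record (occupancy 1.00, B-factor 20.00)."""
    body = "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, name, res_name, chain, res_seq, x, y, z, 1.0, 20.0)
    return body.ljust(76) + element.rjust(2)


def mark_riding_hydrogens(pdb_lines):
    """Relabel hydrogen ATOM records with the hypothetical RIDING record name."""
    marked = []
    for line in pdb_lines:
        is_hydrogen = line[76:78].strip().upper() == "H"
        if line.startswith("ATOM  ") and is_hydrogen:
            line = "RIDING" + line[6:]  # same width, so all other columns stay put
        marked.append(line)
    return marked


# Made-up two-atom example.
model = [
    atom_line(1, "N", "ALA", "A", 1, 0.0, 0.0, 0.0, "N"),
    atom_line(2, "H", "ALA", "A", 1, 0.86, 0.0, 0.0, "H"),
]
tagged = mark_riding_hydrogens(model)
for line in tagged:
    print(line)
```

Software that only reads ATOM and HETATM records would then skip the riding
hydrogens, while anything aware of the extra record could restore the
complete model.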

Cheers,

Ed.

-- 
I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-15 Thread Pavel Afonine

 Dear Ed,

On 9/15/10 2:47 PM, Ed Pozharski wrote:

On Wed, 2010-09-15 at 16:26 -0400, Phil Jeffrey wrote:

So the riding hydrogen model is imperfect.  At least with
phenix.refine
you can measure it, unlike the default behavior of REFMAC.  (But you
can
tell it to write hydrogens out, I believe).


My impression is that default behavior of phenix.refine is the same - I
had to change parameters to include hydrogens in the output.


No, if your input file contains H atoms, the output file will contain 
them too (in phenix.refine). You don't have to change any parameters for 
this.


Pavel.


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-15 Thread Ed Pozharski
Sure.  But if I start with a model that has no hydrogens, they will be
generated but not passed to the output, right?  Just like refmac.

On Wed, 2010-09-15 at 14:52 -0700, Pavel Afonine wrote:
 Dear Ed,
 
 On 9/15/10 2:47 PM, Ed Pozharski wrote:
  On Wed, 2010-09-15 at 16:26 -0400, Phil Jeffrey wrote:
  So the riding hydrogen model is imperfect.  At least with
  phenix.refine
  you can measure it, unlike the default behavior of REFMAC.  (But you
  can
  tell it to write hydrogens out, I believe).
 
  My impression is that default behavior of phenix.refine is the same - I
  had to change parameters to include hydrogens in the output.
 
 No, if your input file contains H atoms, the output file will contain 
 them too (in phenix.refine). You don't have to change any parameters for 
 this.
 
 Pavel.
 

-- 
I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs


Re: [ccp4bb] Deposition of riding H: R-factor is overrated

2010-09-15 Thread Pavel Afonine

 Dear Ed,

No, if you start with a model that has no hydrogens, they will not be 
generated internally.


Pavel.

On 9/15/10 2:58 PM, Ed Pozharski wrote:

Sure.  But if I start with a model that has no hydrogens, they will be
generated but not passed to the output, right?  Just like refmac.

On Wed, 2010-09-15 at 14:52 -0700, Pavel Afonine wrote:

Dear Ed,

On 9/15/10 2:47 PM, Ed Pozharski wrote:

On Wed, 2010-09-15 at 16:26 -0400, Phil Jeffrey wrote:

So the riding hydrogen model is imperfect.  At least with
phenix.refine
you can measure it, unlike the default behavior of REFMAC.  (But you
can
tell it to write hydrogens out, I believe).


My impression is that default behavior of phenix.refine is the same - I
had to change parameters to include hydrogens in the output.

No, if your input file contains H atoms, the output file will contain
them too (in phenix.refine). You don't have to change any parameters for
this.

Pavel.