Re: [ccp4bb] how to bring back the missing density for half of the structure

2007-08-01 Thread Fan, Hai-fu
Dear Eric,

For your question 2, the following paper provides some examples:

Acta Cryst. D63, 793-799 (2007).

In one of the examples, a partial model from Phaser containing ~20% of the
residues in the ASU is extended to more than 90% of the ASU by iterating
ARP/wARP-OASIS-DM. Detailed examples and scripts for using OASIS can be
found on the download page of

http://cryst.iphy.ac.cn
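
Schematically, the dual-space iteration can be pictured like this (a minimal
sketch; the three functions are no-op stand-ins for the actual ARP/wARP,
OASIS and DM runs, not real APIs):

    # No-op stand-ins for the real programs; only the control flow is meant
    # literally (each real step runs as a separate CCP4/OASIS job).
    def run_arp_warp(phases):          # build/extend a partial model
        return "partial model"

    def run_oasis(model, phases):      # direct-methods phase improvement
        return phases

    def run_dm(phases):                # density modification
        return phases

    def iterate_building(phases, n_cycles=5):
        model = None
        for _ in range(n_cycles):
            model = run_arp_warp(phases)
            phases = run_oasis(model, phases)
            phases = run_dm(phases)
        return model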

Regards,

Hai-fu


On 8/1/07, Eric Liu [EMAIL PROTECTED] wrote:

 Hi All,
 I would like to get some help here with a data set I recently worked on. I
 have been working on a new kinase that does not have a close homolog. The
 data were collected to 2.1 A resolution in space group P212121; however,
 the difference between a and b is only 0.5 A. If I index the data as P4,
 Rmerge increases from 13% to 39%. I used the closest homolog, which has
 about 37% sequence identity, as the search model for molecular
 replacement, and I seem to have a solution from Phaser using only the
 C-terminal part of the search model, with a long loop also removed. After
 changing the differing residues to the target sequence, the structure was
 refined to Rfree/R of 46% and 43% at 2.1 A resolution. The existing
 C-terminal structure has well-defined density, except that ~25 residues at
 the very C-terminal end lack well-connected density. The current model
 contains about 50% of the target residues. I can see some extended
 difference density for several residues going into the N-terminal part,
 and also extended density for several residues of the C-terminal loop. I
 also see tons of poorly connected difference density in the N-terminal
 region. There were no severe clashes between molecules after generating
 all symmetry-related molecules. My questions are the following:

 1. Have I got the correct solution from molecular replacement?
 2. How can I bring back the missing density for the N-terminal residues
 and the loop region?

 I would really appreciate any inputs or suggestions.

 Eric



Re: [ccp4bb] difference density ripples around Hg atoms

2007-08-01 Thread Eleanor Dodson
Well - there will be a ripple, but is it there in the difference map as
well? That is meant to be less affected.


REFMAC5 claims to be able to refine some atoms anisotropically, and that
would be a good place to start.


Maybe you will need to read the documentation! There is some way of
requesting the option.
The PDB does include structures with some anisotropic / some isotropic B
values, usually waters.

Eleanor


Klemens Wild wrote:

Dear friends of the Fourier transform,

I am refining a structure with 2 adjacent Hg atoms bound to cysteines
of different monomers in the crystal contacts, which means I need to
refine them as well. While the structure refines nicely (2.2 A data), I
cannot get rid of layers of negative density ripples next to them (-10
sigmas). My question: is this likely due to anisotropy of the soft
mercury atoms (anisotropic B refinement decreases the ripples), or is it
a Fourier series truncation effect, prominent for heavy atoms? Can I
just refine the mercuries anisotropically while I keep the rest
isotropic? I have never seen this in a PDB entry. Suggestions are very
welcome.


Greetings

Klemens Wild




Re: [ccp4bb] difference density ripples around Hg atoms

2007-08-01 Thread David J. Schuller
On Wed, 2007-08-01 at 09:35 +0200, Klemens Wild wrote:
 Can I just refine the mercuries anisotropically while I keep the rest
 isotropic?

Yes, that sounds worth a try. At 2.2 A you probably don't have the
data/parameter ratio to justify anisotropic refinement for the whole
molecule, but since you know the mercury atoms are not being treated
adequately, adding an extra ~10 parameters (two Hg atoms x five extra ADP
parameters each) to refine them anisotropically is well justified. Don't
expect it to completely eliminate the ripples, but hopefully you will get
some improvement in R/Rfree.

Cheers,

-- 
===
With the single exception of Cornell, there is not a college in the
United States where truth has ever been a welcome guest - R.G. Ingersoll
===
  David J. Schuller
  modern man in a post-modern world
  MacCHESS, Cornell University
  [EMAIL PROTECTED]


Re: [ccp4bb] difference density ripples around Hg atoms

2007-08-01 Thread Kay Diederichs



Dear Klemens,

the height of a Fourier ripple should not exceed about 12% of the peak 
itself (just look at the maxima of sin(x)/x, which is the Fourier 
transform of a truncation function). In reality it should be even lower, 
because the average temperature factor is > 0.
Thus, only if your Hg peaks were on the order of 80 sigmas (which I 
doubt) would it be justified to consider the 10 sigma peaks as ripples.
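
As a quick numerical check of that bound (a minimal numpy sketch: the first
side lobe is ~22% of the central peak for a 1-D sinc and ~9% for a sharp
3-D resolution sphere, bracketing the ~12% working figure):

    import numpy as np

    u = np.linspace(0.1, 20.0, 200001)

    # 1-D truncation: FT is sin(u)/u; first side lobe ~ -21.7% of the peak.
    print(abs((np.sin(u) / u).min()))                             # ~0.217

    # Sharp 3-D resolution sphere: G(u) = 3(sin u - u cos u)/u^3, G(0) = 1;
    # first side lobe ~ -8.6% of the peak.
    print(abs((3.0 * (np.sin(u) - u * np.cos(u)) / u**3).min()))  # ~0.086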


More likely, aniso refinement should be able to get rid of the ripples.


best,
Kay
--
Kay Diederichs  http://strucbio.biologie.uni-konstanz.de
email: [EMAIL PROTECTED]  Tel +49 7531 88 4049  Fax 3183
Fachbereich Biologie, Universität Konstanz, Box M647, D-78457 Konstanz




Re: [ccp4bb] difference density ripples around Hg atoms

2007-08-01 Thread Peter Adrian Meyer
You've most likely looked at this already, but if not it might be worthwhile
to check how these ripples behave while varying the low-resolution limit
used (20-2.2, 15-2.2, etc.).

Pete




Pete Meyer
Fu Lab
BMCB grad student
Cornell University


Re: [ccp4bb] difference density ripples around Hg atoms

2007-08-01 Thread Bart Hazes

Hi Klemens,

As friends of the Fourier transform we hate to see it truncated. 
Although others don't think this is your problem, I personally think it 
very well may be. To get a truncation effect you must first have 
truncated your data.


- Is the I/SigI of your highest resolution data in the 1-2 region or 
more like 3 or higher?


- Second, truncation ripples are just that: oscillating negative and 
positive shells of density around the central atom density. The first 
negative ripple will be the strongest, but if you contour lower you may 
be able to see a second, positive one at a slightly greater distance 
(you do say "ripple layers", so you may already have spotted it).


The bad news is that, as far as I know, there is no remedy. The ripples 
are not due to your model, so no refinement trick can help you out (even 
with perfect experimental phases you would still see the ripples).
You can apply a de-sharpening B-factor to the data to weaken the high 
resolution terms. That would dampen the ripples but also harm the rest 
of your data.
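
In numpy terms, the de-sharpening described here is a one-liner (a minimal
sketch; the blur B of 20 A^2 is purely illustrative):

    import numpy as np

    def desharpen(f_obs, d_spacing, b_blur=20.0):
        """Dampen amplitudes with the Debye-Waller factor exp(-B s^2 / 4),
        where s = 1/d; this weakens the high-resolution terms that carry
        the series-termination ripples."""
        s = 1.0 / np.asarray(d_spacing, dtype=float)
        return np.asarray(f_obs, dtype=float) * np.exp(-b_blur * s**2 / 4.0)

    # At 2.2 A with B = 20 A^2 the amplitudes are scaled by ~0.36
    print(desharpen([100.0], [2.2]))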


The good news is that the ripples don't really affect your model or the 
biological conclusions you derive from it. In the paper you will just 
have to confess that you didn't do your data collection properly and 
then get on with the show. Unfortunately, there are far too many native 
data sets that were not collected to the diffraction limit. I think we 
need a "Save the Native Structure Factor" action group to protect the 
endangered high-resolution native reflections. Cutting data short is 
ALWAYS bad (the exception is for experimental phasing data sets), but 
only when you have a heavy atom do you see the ripples (I have had it 
myself with an ion as light as copper).


W.r.t. Kay's reply, I think the argument does not hold, since it depends 
on how badly the data are truncated: a data set truncated near the limit 
of diffraction will give few ripples, whereas one truncated at an I/SigI 
of 5 will have much more serious effects.


Bart




--

==

Bart Hazes (Assistant Professor)
Dept. of Medical Microbiology & Immunology
University of Alberta
1-15 Medical Sciences Building
Edmonton, Alberta
Canada, T6G 2H7
phone:  1-780-492-0042
fax:1-780-492-7521

==


Re: [ccp4bb] how to bring back the missing density for half of the structure

2007-08-01 Thread Peter Adrian Meyer
 If your Phaser results show a high Z-score (> 8) AND high LLG AND your
 solution packs without clashes AND refines (even though the starting
 R/Rfree is high) AND reproduces density for the modelled portion AND
 produces some Fo-Fc density for the missing portion, most probably your
 solution is correct.

AND the Z-score for your solution stands out from the Z-scores for
incorrect (/other) solutions. I've gotten Z-scores > 8 for a known
incorrect solution while testing (searching for a domain not present in
the crystal, so this test was probably unrealistically difficult). The
highest and second-highest Z-scores for the incorrect domain were roughly
equal (~8.7/~8.2); for the correct domain they were ~35 and ~7.
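
A toy version of that contrast check, using the numbers above:

    def tfz_contrast(z_scores):
        """Gap between the best and runner-up Z-scores; a correct solution
        should stand well clear of the rest."""
        best, runner_up = sorted(z_scores, reverse=True)[:2]
        return best - runner_up

    print(tfz_contrast([35.0, 7.0]))   # correct domain: huge gap (~28)
    print(tfz_contrast([8.7, 8.2]))    # incorrect domain: essentially none (~0.5)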

So as long as you're checking phaser statistics, this is another one to
check.

Pete


Pete Meyer
Fu Lab
BMCB grad student
Cornell University


Re: [ccp4bb] difference density ripples around Hg atoms

2007-08-01 Thread George M. Sheldrick
Although I would certainly try refining just the Hg anisotropically, and 
think that truncation ripples are very likely, you should also take into 
account that mercury derivatives are particularly sensitive to radiation 
damage. Often the Hg atoms have departed (but may still be in the 
vicinity) before the rest of the structure shows signs of radiation 
damage. Since different reflections are measured at different times, this 
generally gives a mess in the difference map, and there is not much you 
can do about it, though it might be worth refining the Hg occupancies. 
Normally one refines only against the native data and so does not see 
the mess.

George

Prof. George M. Sheldrick FRS
Dept. Structural Chemistry, 
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-2582




Re: [ccp4bb] how to bring back the missing density for half of the structure

2007-08-01 Thread Eric Liu
Hi All,

Here is a summary of the answers to my questions:

1. Try using ARP/wARP to build the missing part of the structure.
2. Build as much as possible of the missing part and of the current
C-terminal domain, using a contour level as low as 0.5 sigma for the
2Fo-Fc density. Generate a mask, then do averaging and density
modification using DM/RESOLVE/Pirate/Buccaneer.
3. Align the C-terminal parts of the closest other kinases to the current
model, then try to find which N-terminal domain matches the difference
density best by eyeballing.
4. Look into the possibility of twinning (a quick sanity check is
sketched below).
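
For point 4, a minimal sketch of the standard second-moment test (in
practice it is computed per resolution shell, e.g. by TRUNCATE; values near
2.0 suggest untwinned acentric data, values toward 1.5 a perfect twin):

    import numpy as np

    def second_moment(intensities):
        """<I^2> / <I>^2 for acentric reflections: ~2.0 untwinned, ~1.5 twinned."""
        i = np.asarray(intensities, dtype=float)
        return (i**2).mean() / i.mean()**2

    # Exponentially distributed intensities (ideal untwinned Wilson statistics)
    rng = np.random.default_rng(0)
    print(second_moment(rng.exponential(size=100000)))   # ~2.0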

Thanks,

Eric




[ccp4bb] PDB format survey?

2007-08-01 Thread Joe Krahn
So, I am thinking about putting up a survey somewhere to get a measure
of the user community's interests, because the RCSB and wwPDB seem
uninterested in doing so. Maybe a group result would be more useful in
influencing the standards. I am hoping that the wwPDB can become a
better home for format standards than the RCSB, which is kept busy
handling new data.

In addition to questions about the PDB standard, it is probably
important to consider mmCIF. One thing I don't like about it is that
columns can be randomized (i.e. X, Y, and Z can be in any column), but
the mmCIF standards people have no interest in defining a stricter
standard that would require files to be as human-readable as the RCSB's
mmCIF files.

Does this sound useful, or have most people given up on having any
influence on standards? Or should the structural biology software
developers get together and just make our own OpenPDB format?

Joe Krahn


[ccp4bb] pseudo-translation vectors in molrep vs other programs

2007-08-01 Thread Savvas Savvides
Dear colleagues,
I would like to thank J. Murray, J. Wright, K. Futterer, E. Dodson, A. Forster,
and F. Long for responding to my posting of two days ago on pseudo-translation
vectors in molrep vs other programs (see original posting at the end of this
message).
I should have said at the outset that we are dealing with a limiting data set
(see stats below), but since this is the only data we were ever able to collect
on this membrane protein, we have no option but to milk it as much as we can.

P21 with 104.82  151.28  109.49   90.00  118.13   90.00
Resolution: 30-4.2 angs (4.3-4.2)
Rmeas=0.15 (0.380)
I/sigma: 7.2 (1.9)
Completeness=93% (75%)
Redundancy= 2.3 (2.1)
Mosaicity= 1.1 deg
High data anisotropy, primarily along the K reciprocal axis.

The comments from Eleanor Dodson and Klaus Futterer prompted me to take
another look at the data, frame by frame. I concluded that in several frames
there were a few reflections in the 40-30 angs range that obviously did not
fit my spot-integration strategy very well. After failing repeatedly to get
them to integrate acceptably without compromising the rest of the data too
much, I decided to exclude all reflections between 40 and 30 angs resolution.

This has resulted in three important improvements:

(1) Better data integration and scaling statistics across the board.
(2) The spurious peaks clustering around the origin in the native Patterson
are fewer, and those that do remain have a peak height around 10-12% of the
origin.
(3) The new data set has yielded unambiguous peaks in the self-rotation
function consistent with a 2-fold NCS axis.

I have now used this SRF peak in MolRep and came up with a reasonable MR
solution. I will soon try to use this SRF information in PHASER as well,
via the "rotate around" option.

Best regards
Savvas


Savvas N. Savvides
Unit for Structural Biology and Biophysics
Laboratory for Protein Biochemistry - Ghent University
K.L. Ledeganckstraat 35
9000 Ghent, BELGIUM
Phone: +32-(0)9-264.51.24 ; +32-(0)472-92.85.19
Email: [EMAIL PROTECTED]
http://www.eiwitbiochemie.ugent.be/units_en/structbio_en.html



  Dear colleagues,
 
  For a particular MR problem I am dealing with, 'analyse_mr' suggests
  that there may be a pseudo-translation vector, as evidenced by the very
  significant non-origin peaks in the native Patterson, e.g.
 
  GRID  80 112  80
  CELL  104.8290  151.2840  109.4910   90.  118.1310   90.
  ATOM1   Ano   0.  0.  0.  181.08  0.0 BFAC  20.0
  ATOM2   Ano   0.9483  0.  0.0106   46.89  0.0 BFAC  20.0
  ATOM3   Ano   0.0517  0.  0.9875   46.89  0.0 BFAC  20.0
  ATOM4   Ano   0.9494  0.9911  0.0090   40.66  0.0 BFAC  20.0
  ATOM5   Ano   0.0506  0.9911  0.9875   40.66  0.0 BFAC  20.0
  ATOM6   Ano   0.0572  0.9911  0.   37.26  0.0 BFAC  20.0
 
  BALBES also reports a pseudo-translation vector at 0.951 0.000 0.007,
  i.e. very similar to the output from 'analyse_mr'.
 
  Yet, Molrep fails to recognize this possibility (in 'auto' mode for
  the PST), claiming that the 0.125 limit for the peak height relative to
  the origin has not been reached. When I look at the output from
  'analyse_mr' it is quite clear the peak is at 0.25 of the origin peak.
 
  Why is there such a discrepancy in the interpretation of the native
  Patterson map?
 
  Best regards
  Savvas
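
For reference, the ratios implied by the peak list quoted above are quick
to verify:

    # Peak heights from the analyse_mr output above.
    origin = 181.08
    for peak in (46.89, 40.66, 37.26):
        print(round(peak / origin, 3))   # 0.259, 0.225, 0.206

All three sit well above Molrep's default 0.125 cutoff, consistent with the
~0.25 estimate from 'analyse_mr'.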


Re: [ccp4bb] pdb-l: Stop the new PDB format!

2007-08-01 Thread Frances C. Bernstein
 I was present at the creation of what is called the PDB
format in the mid-1970s, and HOH was always HETATM. The only
thing special about HOH was that we felt that it was not
necessary to include a HET record in (virtually) every entry
to define HOH.

 We felt that it would be useful to be able to compute
the total number of each type of atom in an entry; this
can be done by summing the residues listed on SEQRES, subtracting
the appropriate number of waters, and then adding in the formulae
for the HETATMs.
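
As a toy sketch of that bookkeeping (the two-residue composition table and
the helper are illustrative only, not part of any PDB tooling):

    # Non-hydrogen atom counts per residue type; illustrative entries only
    # (a real table would cover all standard residues).
    RESIDUE_ATOMS = {"ALA": 5, "GLY": 4, "HOH": 1}

    def total_atoms(seqres_counts, n_waters, het_formula_atoms):
        """Mirror of the recipe above: sum the SEQRES residue compositions,
        adjust for waters, and add the atoms given by the HET formulae."""
        protein = sum(RESIDUE_ATOMS[res] * n for res, n in seqres_counts.items())
        return protein - n_waters * RESIDUE_ATOMS["HOH"] + het_formula_atoms

    print(total_atoms({"ALA": 10, "GLY": 5}, n_waters=0, het_formula_atoms=8))  # 78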

 [For those of you interested in ancient history, there
actually was a format before the PDB format that was used
for the first 100 or so entries. It was based on the output
format of Bob Diamond's real-space refinement program.]

   Frances Bernstein

=
Bernstein + Sons
*   *   Information Systems Consultants
5 Brewster Lane, Bellport, NY 11713-2803
*   * ***
 *Frances C. Bernstein
  *   ***  [EMAIL PROTECTED]
 *** *
  *   *** 1-631-286-1339FAX: 1-631-286-1999
=

On Wed, 1 Aug 2007, Eric Pettersen wrote:

 Well, you're right that originally water was in the list of standard
 residues that supposedly would use ATOM records.  But water has been
 in HETATM records for many years now, and for that same amount of
 time ATOM records have been used exclusively for standard polymer
 residues and HETATM for everything else (including MSE).  So my point
 is that this particular complaint isn't a v2.3 vs. v3 issue per se.
 It wasn't really directed at the main thrust of your post, the
 opportunity for feedback.

 --Eric

 On Aug 1, 2007, at 1:50 PM, Joe Krahn wrote:

  Eric Pettersen wrote:
  On Jul 21, 2007, at 11:12 AM, Joe Krahn wrote:
 
   Another problem is that the original meaning of HET groups continues
   to be corrupted. ATOM records are for commonly occurring residues
   from a list of standard residues.
 
   No, they're for commonly occurring _polymer_ residues. Two
   consecutive residues contained in ATOM records are implied to be
   connected to each other barring an intervening TER card. I imagine
   this is the principal reason that water residues use HETATM records.
 
  --Eric
 
 
   The idea that ATOM is only for _polymer_ residues was not part of the
   original format, and is specifically one of the changes that I am
   asserting is wrong. The original PDB format stated that ATOM is for
   standard residues, which are defined by a list of residue names given
   in the PDB format documentation, and the list of standard residues
   included water. Non-standard residues must define themselves with
   extra HET records. With RCSB's database, HETs must be completely
   defined as well, which makes it easy for them to forget that the whole
   idea of HETATM is to allow unknown residue types to be displayed.
 
   RCSB has added the concept of HETs being non-polymers, but also keeps
   this concept mixed up by not including Se-Met (MSE), which is certainly
   standard enough not to be a HET group. So the idea that ATOM implies
   some polymerization linkage is dysfunctional. What the PDB format
   should include is an INIT record as the counterpart of the TER record.
 
   The bigger point of my post, however, was that the interests of the
   non-database user community are, in my opinion, being ignored,
   particularly with the PDB format. Structural biology is so diverse
   that it really needs input from the whole community to do the right
   thing.
 
   The problem is that when the PDB 3.0 format was announced 3 months
   ago, it was done deliberately without allowing time to consider
   problems and alternatives raised by the user community.
 
  Joe Krahn
 





Re: [ccp4bb] PDB format survey?

2007-08-01 Thread Ethan Merritt
On Wednesday 01 August 2007 14:10, Joe Krahn wrote:
 In addition to questions about the PDB standard, it is probably
 important to consider mmCIF. One thing I don't like about it is that
 columns can be randomized (i.e. X, Y, and Z can be in any column), but
 the mmCIF standards people have no interest in defining a more strict
 standard that would require files to be as human readable as RCSB's
 mmCIF files.

The important thing about mmCIF is not the precise file format,
which is ultimately of no interest except as a parsable exchange
medium, but rather the existence of the mmCIF dictionaries.

A more productive discussion may be to revisit the definition
of what information we as a community expect to be captured in the
PDB database.  The question of export formats is secondary.
 
 Does this sound useful, or have most people given up on having any
 influence on standards? Or, should the structural biology software
 developers get together and just make our own OpenPDB format?

As discussed at the PDB group discussion at the ACA meeting, some new
depositions are not representable in the PDB format (including v3).

Examples include:
- very large structures, for which the current 80-column PDB format
  runs out of space for atom numbers (5 columns - max 99999)
  or for chain ids (1 column - single char A-Z 0-9)
  [don't ask me why they don't want lower case]
  (a base-36 sketch after this list illustrates the space at stake)
- new classes of experiment (SAXS, EM)
- new classes of model (TLS or normal-mode displacements,
  ensemble models, envelope representations)
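
On the serial overflow: the same five characters read as base 36 would hold
36^5 - 1 = 60,466,175 atoms instead of 99,999. A throwaway sketch (not an
official PDB convention, shown only to illustrate the space available in
the existing columns):

    import string

    DIGITS = string.digits + string.ascii_uppercase   # 0-9, A-Z: base 36

    def encode36(n, width=5):
        """Render n in base 36, zero-padded to `width` characters
        (illustrative only; not an official PDB convention)."""
        out = ""
        while n:
            n, r = divmod(n, 36)
            out = DIGITS[r] + out
        return out.rjust(width, "0")

    print(36**5 - 1)         # 60466175 serials fit in 5 base-36 chars vs 99999
    print(encode36(100000))  # '0255S': the first serial that overflows 5 digits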

I am inclined to say that there should be a fork into two distinct
formats, used for different purposes.

The 80 column PDB format should be frozen, preferably at the
pre-version3 state. Freezing it would allow legacy programs to continue
to read old PDB files without modification. These programs will not be
able to handle certain classes of new structures, but this would be true
in any case for legacy code.  Churn in the 80 column PDB format would
aggravate rather than relieve this limitation. This branch would serve
the general community who are primarily viewers of previously deposited
structures, and any programs not currently being maintained.

Currently-maintained programs should move to mmCIF or XML, whichever
is convenient.  These formats are intrinsically open-ended, and can
handle the problematic structures mentioned above so long as the
corresponding mmCIF dictionaries are updated to define the relevant
entities.

The wwPDB database is already capable of exporting to any PDB, XML,
or mmCIF format. So this would really be a change on the user
side more than on the database side. 

The barrier to converting programs to mmCIF is lower than you
might think.  Several mmCIF parsing libraries are available to
allow currently maintained programs to offer mmCIF input/output
if they do not already do so.  One such is the mmlib library
developed by Jay Painter and hosted on SourceForge:

http://pymmlib.sourceforge.net/

J Painter and EA Merritt
J. Appl. Cryst. 37, 174-178 (2004).
mmLib Python toolkit for manipulating annotated structural
models of biological macromolecules.
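
To give a feel for why tag-based columns keep parsing simple, here is a toy
_atom_site loop reader (it ignores CIF quoting and multi-line values, so it
is far from a real parser):

    def parse_atom_site(cif_lines):
        """Toy _atom_site loop reader: columns are located by tag name,
        so their order in the file never matters."""
        tags, rows = [], []
        for raw in cif_lines:
            line = raw.strip()
            if line.startswith("_atom_site."):
                tags.append(line.split(".", 1)[1])
            elif tags:
                if not line or line.startswith(("_", "#", "loop_")):
                    break                      # end of the loop body
                rows.append(dict(zip(tags, line.split())))
        return rows

    demo = """\
    loop_
    _atom_site.Cartn_y
    _atom_site.Cartn_x
    _atom_site.Cartn_z
    _atom_site.label_atom_id
    4.56 1.23 7.89 CA
    """.splitlines()

    for row in parse_atom_site(demo):
        print(row["label_atom_id"], row["Cartn_x"], row["Cartn_y"], row["Cartn_z"])
    # CA 1.23 4.56 7.89

Because fields are looked up by tag name, the scrambled column order in the
demo is harmless.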

-- 
Ethan A Merritt


Re: [ccp4bb] PDB format survey?

2007-08-01 Thread Thomas Stout

I suspect this will be throwing fuel on the fire, but what is so great about
the PDB format (any version) besides familiarity? It seems to me to be
outdated, inadequate and generally mis-used by all. I say scrap it, make a
clean break and devote everyone's energies to making a format that will work
for everyone (granted, it is inexcusable for the RCSB to be developing new
formats without input from the affected parties). mmCIF seems like a good
idea that has not gotten the attention it needs (and deserves) to be
formulated to meet everyone's needs. As for the legacy-program argument:
that's what translation programs like OpenBabel are for (or even a very
simple python/perl/your-favorite-hammer script). Perhaps even the RCSB could
be convinced to offer several formats for download... oh, wait - they
already do.

Ducking behind my asbestos-free, all-natural organic firewall,
-Tom



Re: [ccp4bb] PDB format survey?

2007-08-01 Thread Joe Krahn
Ethan Merritt wrote:
 Examples include:
 - very large structures, for which the current 80-column PDB format
   runs out of space for atom numbers (5 columns - max 99999)
   or for chain ids (1 column - single char A-Z 0-9)
   [don't ask me why they don't want lower case]
 - new classes of experiment (SAXS, EM)
 - new classes of model (TLS or normal-mode displacements,
   ensemble models, envelope representations)
It would be trivial to update the PDB format to handle large structures.
In fact, such extensions are already being planned. Atom serial numbers
can simply be handled by truncating them; the sequential record layout of
PDB files makes them redundant anyway.
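
A toy sketch of the reading side (illustrative only; it ignores the stored
serial and renumbers sequentially, wrapping past 99999 as a crude
"truncation"):

    def renumber_atoms(pdb_lines):
        """Reassign the serial field (columns 7-11) on the fly; the
        sequential order of ATOM/HETATM records already carries this
        information, so the stored value can be ignored without loss."""
        serial = 0
        for line in pdb_lines:
            if line.startswith(("ATOM  ", "HETATM")):
                serial += 1
                yield line[:6] + str(serial % 100000).rjust(5) + line[11:]
            else:
                yield line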

As for other kinds of experiment, like SAXS or EM, I think the PDB
format should continue to be used only for atomic coordinates. Using it
as a complete data reference has never been a good idea.

...
 Currently-maintained programs should move to mmCIF or XML, whichever
 is convenient.  These formats are intrinsically open-ended, and can
 handle the problematic structures mentioned above so long as the
 corresponding mmCIF dictionaries are updated to define the relevant
 entities.
Being intrinsically open-ended is an advantage for parsing, but it still
takes a lot of work to actually make use of new data: the software still
has to be updated to handle it. Formats like mmCIF and XML only address
part of the 'file format' issue. One problem is that mmCIF can be too
open-ended, depending on how the schema is managed.

I would be much more willing to work toward switching to mmCIF if RCSB
showed more interest in collaborating with the user community. If we
can't even get involvement in something as simple as the PDB format, why
should we think working with mmCIF will be any better?

Joe Krahn