Re: [ccp4bb] how to bring back the missing density for half of the structure
Dear Eric,

For your question 2, the following paper provides some examples: Acta Cryst. D63, 793-799 (2007). In one of the examples, a partial model from Phaser containing ~20% of the residues in the ASU is extended to more than 90% of the ASU by iterating ARP/wARP-OASIS-DM. Detailed examples and scripts for using OASIS can be found on the download page of http://cryst.iphy.ac.cn

Regards, Hai-fu

On 8/1/07, Eric Liu [EMAIL PROTECTED] wrote:

Hi All, I would like to get some help from here for a data set I recently worked on. I have been working on a new kinase data set which does not have a close homolog. The data were collected to 2.1 A resolution in space group P212121; however, the difference between a and b is only 0.5 A. If I index the data as P4, Rmerge increases from 13% to 39%. I used the closest homolog, which has about 37% sequence identity, as the search model for molecular replacement, and it seemed I got a solution using Phaser with only the C-terminal part of the search model, with a long loop also removed. After changing the differing residues back to the target protein, the structure was refined to Rfree/R of 46%/43% at 2.1 A resolution. The existing C-terminal structure has well-defined density, except that ~25 residues at the very C-terminal end do not have well-connected density. The current model contains about 50% of the overall target residues. I can see some extended difference density for several residues going into the N-terminal part, and also extended density for several residues of the C-terminal loop. I also see tons of poorly connected difference density in the N-terminal region. There were no severe clashes between molecules after generating all symmetry-related molecules.

My questions are the following:
1. Have I got the correct solution for the molecular replacement?
2. How can I bring back the missing density for the N-terminal residues and the loop region?

I would really appreciate any input or suggestions. Eric
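[Editorial aside: Eric's Rmerge jump on re-indexing in P4 is the classic signature of merging reflections that are not actually symmetry-equivalent. The statistic itself is simple to sketch; the function below is the standard Rmerge definition, but the toy intensities are invented for illustration and are not Eric's data.]

```python
def rmerge(groups):
    """Rmerge = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i,
    where each group holds the repeated observations of one
    symmetry-unique reflection."""
    num = den = 0.0
    for obs in groups:
        mean = sum(obs) / len(obs)
        num += sum(abs(i - mean) for i in obs)
        den += sum(obs)
    return num / den

# Invented observations: equivalents agree well when grouped under
# the true point group...
good = [[100.0, 104.0, 98.0], [50.0, 52.0]]
# ...but merging under too high a symmetry mixes unrelated
# intensities into one group and inflates Rmerge.
bad = [[100.0, 104.0, 98.0, 50.0, 52.0]]

print(round(rmerge(good), 3))  # 0.021
print(round(rmerge(bad), 3))   # 0.295
```

A modest Rmerge in the lower symmetry and a large one in the higher symmetry, as here, is consistent with Eric's P212121 choice being correct.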
Re: [ccp4bb] difference density ripples around Hg atoms
Well - there will be a ripple, but is it there in the difference map as well? That is meant to be less affected.

REFMAC5 claims to be able to refine some atoms anisotropically, and that would be a good place to start. Maybe you will need to read the documentation! There is some way of requesting the option. The PDB does include structures with some anisotropic/some isotropic B values, usually waters.

Eleanor

Klemens Wild wrote:

Dear friends of the Fourier transform, I am refining a structure with 2 adjacent Hg atoms bound to cysteines of different monomers in the crystal contacts, which means I need to refine them as well. While the structure refines nicely (2.2 A data), I cannot get rid of negative density ripple layers next to them (-10 sigma). My question: is this likely due to anisotropy of the soft mercury atoms (anisotropic B refinement decreases the ripples), or is this likely a summation truncation effect prominent for heavy atoms? Can I just anisotropically refine the mercuries while I keep the rest isotropic? I never saw this in a PDB entry. Suggestions are very welcome. Greetings, Klemens Wild
Re: [ccp4bb] difference density ripples around Hg atoms
On Wed, 2007-08-01 at 09:35 +0200, Klemens Wild wrote: [original question snipped] ... Can I just anisotropically refine the mercuries while I keep the rest isotropic?

Yes, that sounds worth a try. At 2.2 A you probably don't have the data/parameter ratio to justify anisotropic refinement for the whole molecule, but since you know the mercury atoms are not being treated adequately, adding an extra ~10 parameters to refine them as anisotropic is worth a try. Don't expect it to completely eliminate the ripples, but hopefully you can get some improvement in R/Rfree.

Cheers,
--
===
With the single exception of Cornell, there is not a college in the United States where truth has ever been a welcome guest - R.G. Ingersoll
===
David J. Schuller
modern man in a post-modern world
MacCHESS, Cornell University
[EMAIL PROTECTED]
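[Editorial aside: the "~10 parameters" figure follows from simple bookkeeping - an isotropic atom carries x, y, z and one B (4 parameters), while an anisotropic atom replaces the single B with six U_ij terms, i.e. 5 extra per atom. A minimal sketch, with an invented atom count:]

```python
def refinement_parameters(n_atoms, n_aniso=0):
    """x, y, z + B(iso) = 4 parameters per atom; switching an atom
    to anisotropic replaces its one B with six U_ij terms (+5)."""
    return 4 * n_atoms + 5 * n_aniso

# Two mercuries refined anisotropically add only 10 parameters,
# regardless of how big the rest of the model is:
delta = refinement_parameters(3000, n_aniso=2) - refinement_parameters(3000)
print(delta)  # 10
```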
Re: [ccp4bb] difference density ripples around Hg atoms
Klemens Wild schrieb: [original question snipped] ... is this likely a summation truncation effect prominent for heavy atoms?

Dear Klemens, the height of a Fourier ripple should not exceed about 12% of the peak itself (just look at the maxima of sin(x)/x, which is the Fourier transform of a truncation function). In reality it should be even lower, because the average temperature factor is greater than 0. Thus, only if your Hg peaks are on the order of 80 sigmas (which I doubt) does it appear justified to consider the 10 sigma peaks as ripples. It is more likely that aniso refinement will be able to get rid of the ripples.

best, Kay
--
Kay Diederichs    http://strucbio.biologie.uni-konstanz.de
email: [EMAIL PROTECTED]    Tel +49 7531 88 4049    Fax 3183
Fachbereich Biologie, Universität Konstanz, Box M647, D-78457 Konstanz
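[Editorial aside: Kay's sin(x)/x argument is easy to check numerically. The sketch below simply samples the 1D sinc function and picks out its side lobes: the first (negative) lobe is about 22% of the central peak and the second (positive) about 13%, so the ~12% figure matches the second lobe; in real maps, form-factor and B-factor falloff damp the effective ripples further, consistent with Kay's point.]

```python
import math

def sinc(x):
    return math.sin(x) / x if x else 1.0

# Sample sin(x)/x finely and pick out the local extrema beyond the
# central peak; their heights bound the relative size of 1D
# series-termination ripples.
xs = [i * 1e-3 for i in range(1, 20000)]   # 0 < x < 20
ys = [sinc(x) for x in xs]
extrema = [b for a, b, c in zip(ys, ys[1:], ys[2:]) if (b - a) * (c - b) < 0]

# First side lobe is about -22% of the peak, the second about +13%.
print([round(e, 3) for e in extrema[:2]])
```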
Re: [ccp4bb] difference density ripples around Hg atoms
You've most likely looked at this, but if not it might be worthwhile to check how these ripples behave while varying the low-resolution limit used (20-2.2, 15-2.2, etc).

Pete

[Klemens's original question snipped]

Pete Meyer
Fu Lab
BMCB grad student
Cornell University
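[Editorial aside: Pete's test is easy to script. The sketch below only generates REFMAC5 keyword blocks for a series of low-resolution cutoffs; RESO is the standard REFMAC5 keyword for the resolution window, but the cycle count and the surrounding job setup are illustrative assumptions to adapt to your own runs.]

```python
# Build keyword blocks for a series of low-resolution cutoffs
# (20-2.2, 15-2.2, 10-2.2) to compare how the Hg ripples respond.
# NCYC 5 is an arbitrary illustrative cycle count.
low_limits = [20.0, 15.0, 10.0]
high_limit = 2.2

def refmac_keywords(low, high):
    return "\n".join([
        f"RESO {low:.1f} {high:.1f}",   # resolution window to test
        "NCYC 5",
        "END",
    ])

scripts = {low: refmac_keywords(low, high_limit) for low in low_limits}
print(scripts[15.0].splitlines()[0])  # RESO 15.0 2.2
```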
Re: [ccp4bb] difference density ripples around Hg atoms
Hi Klemens,

As friends of the Fourier transform, we hate to see it truncated. Although others don't think this is your problem, I personally think it very well may be. To get a truncation effect you must first have truncated your data.

- Is the I/SigI of your highest-resolution data in the 1-2 region, or more like 3 or higher?
- Second, truncation ripples are just that: oscillating negative and positive shells of density around the central atom density. The first negative ripple will be strongest, but if you contour lower you may be able to see a second positive one at a slightly greater distance (you do say "ripple layers", so you may already have spotted it).

The bad news is that, as far as I know, there is no remedy. The ripples are not due to your model, so no refinement trick can help you out (if you had perfect experimental phases you would still see the ripples). You can apply a de-sharpening B-factor to the data to weaken the high-resolution terms. That would dampen the ripples but also harm the rest of your data.

The good news is that the ripples don't really affect your model or the biological conclusions you derive from it. In the paper you will just have to confess that you didn't do your data collection properly and then get on with the show. Unfortunately, there are far too many papers with native data sets that were not collected to the diffraction limit. I think we need a "Save the Native Structure Factor" action group to protect the endangered high-resolution native reflections. This is ALWAYS bad (the exception is for experimental phasing data sets), but only when you have a heavy atom do you see the ripples (I have had it myself with an ion as light as copper).

W.r.t. Kay's reply, I think the argument does not hold, since it depends on how badly the data are truncated. E.g. data truncated near the limit of diffraction will give few ripples, whereas a data set truncated at I/SigI of 5 will have much more serious effects.

Bart

Kay Diederichs wrote: [quoted text snipped]

--
==
Bart Hazes (Assistant Professor)
Dept. of Medical Microbiology & Immunology
University of Alberta
1-15 Medical Sciences Building
Edmonton, Alberta
Canada, T6G 2H7
phone: 1-780-492-0042
fax: 1-780-492-7521
==
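[Editorial aside: Bart's "de-sharpening B-factor" amounts to damping each amplitude by exp(-B*s^2/4) with s = 1/d. A minimal sketch with invented reflections - note how a B of 20 barely touches the low-resolution terms but substantially weakens the 2.2 A term:]

```python
import math

def desharpen(amplitude, d_spacing, b_add):
    """Scale |F| by exp(-B * s^2 / 4), s = 1/d.  A positive b_add
    damps high-resolution terms and hence the termination ripples."""
    s2 = 1.0 / d_spacing ** 2
    return amplitude * math.exp(-b_add * s2 / 4.0)

# Invented reflections: (|F|, d in angstroms)
data = [(1000.0, 10.0), (500.0, 3.0), (200.0, 2.2)]
damped = [desharpen(f, d, b_add=20.0) for f, d in data]

# The 10 A term keeps ~95% of its amplitude; the 2.2 A term ~36%.
print([round(x, 1) for x in damped])
```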
Re: [ccp4bb] how to bring back the missing density for half of the structure
If your Phaser results show a high Z-score (> 8) AND a high LLG AND your solution packs without clashes AND refines (even though the starting R/Rfree is high) AND reproduces density for the modelled portion AND produces some Fo-Fc density for the missing portion, most probably your solution is correct.

AND the Z-score for your solution stands out from the Z-scores for incorrect (/other) solutions. I've gotten Z-scores > 8 for a known incorrect solution while testing (searching for a domain not present in the crystal, so this test was probably unrealistically difficult). The highest/second-highest Z-scores for the incorrect domain were roughly equal (~8.7/~8.2); for the correct domain they were ~35/7. So as long as you're checking Phaser statistics, this is another one to check.

Pete

Pete Meyer
Fu Lab
BMCB grad student
Cornell University
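[Editorial aside: Pete's "does the top solution stand out" check can be automated by pulling the Z-scores out of Phaser's solution output. The "RFZ=.../TFZ=..." annotation style below is an assumption modelled on Phaser .sol files - check it against your own output before relying on the pattern; the numbers are invented.]

```python
import re

# Hypothetical Phaser solution lines (format assumed, values invented).
sol_text = """\
SOLU SET RFZ=4.5 TFZ=35.0 PAK=0 LLG=210
SOLU SET RFZ=4.1 TFZ=7.2 PAK=0 LLG=95
"""

def tfz_scores(text):
    """Extract translation-function Z-scores in file order."""
    return [float(m) for m in re.findall(r"TFZ=([0-9.]+)", text)]

tfz = tfz_scores(sol_text)
# Pete's point: the top solution should stand well clear of the rest.
print(tfz[0] / tfz[1] > 2)  # True
```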
Re: [ccp4bb] difference density ripples around Hg atoms
Although I would certainly try refining just the Hg anisotropically, and think that truncation ripples are very likely, you should also take into account that mercury derivatives are particularly sensitive to radiation damage. Often the Hg atoms have departed (but may still be in the vicinity) before the rest of the structure shows signs of the radiation damage. Since different reflections are measured at different times, this in general gives a mess in the difference map, and there is not much you can do about it, though it might be worth refining the Hg occupancies. Normally one refines only against the native data and so does not see the mess.

George

Prof. George M. Sheldrick FRS
Dept. Structural Chemistry, University of Goettingen,
Tammannstr. 4, D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-2582

On Wed, 1 Aug 2007, Bart Hazes wrote: [quoted text snipped]
Re: [ccp4bb] how to bring back the missing density for half of the structure
Hi All,

Here is a summary of all the answers to my questions:
1. Try using ARP/wARP to build the missing part of the structure.
2. Build as much as possible of the missing part and the current C-terminal domain, using a contour level as low as 0.5 on the 2Fo-Fc density. Generate a mask, then do averaging and density modification using DM/Resolve/Pirate/Buccaneer.
3. Align the C-terminal part of the other closest kinases to the current model, then try to find which N-terminal domain matches the difference density best by eyeballing.
4. Look into the possibility of twinning.

Thanks, Eric

On 7/31/07, Eric Liu [EMAIL PROTECTED] wrote: [original question snipped]
[ccp4bb] PDB format survey?
So, I am thinking about putting up a survey somewhere to get a measure of the user community's interests, because the RCSB and wwPDB seem uninterested in doing so. Maybe a group result would be more useful in influencing the standards. I am hoping that the wwPDB can become a better place for format standards, instead of the RCSB, which keeps busy handling new data.

In addition to questions about the PDB standard, it is probably important to consider mmCIF. One thing I don't like about it is that columns can appear in any order (i.e. X, Y, and Z can be in any column), but the mmCIF standards people have no interest in defining a stricter standard that would require files to be as human-readable as the RCSB's mmCIF files.

Does this sound useful, or have most people given up on having any influence on standards? Or should the structural biology software developers get together and just make our own OpenPDB format?

Joe Krahn
[ccp4bb] pseudo-translation vectors in molrep vs other programs
Dear colleagues,

I would like to thank J. Murray, J. Wright, K. Futterer, E. Dodson, A. Forster, and F. Long for responding to my posting of two days ago on pseudo-translation vectors in Molrep vs other programs (see original posting at the end of this message). I should have said at the outset that we are dealing with a limiting data set (see stats below), but since this is the only data we were ever able to collect on this membrane protein, we have no option but to milk it as much as we can.

P21 with 104.82 151.28 109.49 90.00 118.13 90.00
Resolution: 30-4.2 angs (4.3-4.2)
Rmeas = 0.15 (0.380)
I/sigma: 7.2 (1.9)
Completeness = 93% (75%)
Redundancy = 2.3 (2.1)
Mosaicity = 1.1 deg
High data anisotropy, primarily along the K reciprocal axis.

The comments from Eleanor Dodson and Klaus Futterer prompted me to take another look at the data frame by frame. I concluded that in several frames there were a few reflections in the 40-30 angs range that obviously did not fit my spot-integration strategy very well. After failing repeatedly to get them to integrate acceptably without compromising the rest of the data too much, I decided to exclude all reflections between 40 and 30 angs resolution. This has resulted in three important improvements: (1) better data integration and scaling statistics across the board; (2) the spurious peaks clustering around the origin in the native Patterson are fewer, and those that do remain have a peak height around 10-12% of the origin; (3) the new data set has yielded unambiguous peaks in the self-rotation function consistent with a 2-fold NCS axis.

I have now used this SRF peak in Molrep and came up with a reasonable MR solution. I will soon try to use this SRF info in PHASER as well, via the "rotate around" option.

Best regards,
Savvas

Savvas N. Savvides
Unit for Structural Biology and Biophysics
Laboratory for Protein Biochemistry - Ghent University
K.L. Ledeganckstraat 35
9000 Ghent, BELGIUM
Phone: +32-(0)9-264.51.24 ; +32-(0)472-92.85.19
Email: [EMAIL PROTECTED]
http://www.eiwitbiochemie.ugent.be/units_en/structbio_en.html

Dear colleagues,

For a particular MR problem I am dealing with, 'analyse_mr' suggests that there may be a pseudo-translation vector, as evidenced by the very significant non-origin peaks in the native Patterson, e.g.:

GRID 80 112 80
CELL 104.8290 151.2840 109.4910 90. 118.1310 90.
ATOM1 Ano 0. 0. 0. 181.08 0.0 BFAC 20.0
ATOM2 Ano 0.9483 0. 0.0106 46.89 0.0 BFAC 20.0
ATOM3 Ano 0.0517 0. 0.9875 46.89 0.0 BFAC 20.0
ATOM4 Ano 0.9494 0.9911 0.0090 40.66 0.0 BFAC 20.0
ATOM5 Ano 0.0506 0.9911 0.9875 40.66 0.0 BFAC 20.0
ATOM6 Ano 0.0572 0.9911 0. 37.26 0.0 BFAC 20.0

BALBES also reports a pseudo-translation vector at 0.951 0.000 0.007, i.e. very similar to the output from 'analyse_mr'. Yet Molrep fails to recognize this possibility (in 'auto' mode for the PST), claiming that the 0.125 limit for the peak height relative to the origin has not been reached. When I look at the output from 'analyse_mr' it is quite clear the peak is at 0.25 of the origin peak. Why is there such a discrepancy in the interpretation of the native Patterson map?

Best regards,
Savvas
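[Editorial aside: the peak-height arithmetic in Savvas's question checks out directly from the 'analyse_mr' peak list he posted:]

```python
# Peak heights copied from the 'analyse_mr' peak list above:
# the origin peak (ATOM1) and the strongest non-origin peaks (ATOM2/3).
origin_height = 181.08
pseudo_height = 46.89

ratio = pseudo_height / origin_height
print(round(ratio, 3))  # 0.259 -- i.e. ~0.25 of the origin, as stated
```

The ratio is indeed well above Molrep's default 0.125 cutoff, so the disagreement presumably lies in how each program normalises or filters the Patterson map, not in the peak heights themselves.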
Re: [ccp4bb] pdb-l: Stop the new PDB format!
I was present at the creation of what is called the PDB format in the mid-1970s, and HOH was always HETATM. The only thing special about HOH was that we felt it was not necessary to include a HET record in (virtually) every entry to define HOH. We felt that it would be useful to be able to compute the total number of each type of atom in an entry, and this can be done by summing the residues listed on SEQRES, subtracting the appropriate number of waters, and then adding in the formulae for the HETATMs.

[For those of you interested in ancient history, there actually was a format before the PDB format that was used for the first 100 or so entries. It was based on the output format of Bob Diamond's real-space refinement program.]

Frances C. Bernstein

Bernstein + Sons
Information Systems Consultants
5 Brewster Lane, Bellport, NY 11713-2803
[EMAIL PROTECTED]
1-631-286-1339    FAX: 1-631-286-1999

On Wed, 1 Aug 2007, Eric Pettersen wrote:

Well, you're right that originally water was in the list of standard residues that supposedly would use ATOM records. But water has been in HETATM records for many years now, and for that same amount of time ATOM records have been used exclusively for standard polymer residues and HETATM for everything else (including MSE). So my point is that this particular complaint isn't a v2.3 vs. v3 issue per se. It wasn't really directed at the main thrust of your post, the opportunity for feedback. --Eric

On Aug 1, 2007, at 1:50 PM, Joe Krahn wrote:

Eric Pettersen wrote: On Jul 21, 2007, at 11:12 AM, Joe Krahn wrote: Another problem is that the original meaning of HET groups continues to be corrupted. ATOM records are for commonly occurring residues from a list of standard residues. No, they're for commonly occurring _polymer_ residues. Two consecutive residues contained in ATOM records are implied to be connected to each other barring an intervening TER card. I imagine this is the principal reason that water residues use HETATM records. --Eric

The idea that ATOM is only for _polymer_ residues was not part of the original format, and is specifically one of the changes that I am asserting is wrong. The original PDB format stated that ATOM is for standard residues, which are defined by a list of residue names given in the PDB format documentation, and the list of standard residues included water. Non-standard residues must define themselves with extra HET records. With the RCSB's database, HETs must be completely defined as well, which makes it easy for them to forget that the whole idea of HETATM is to allow unknown residue types to be displayed. The RCSB has added the concept of HETs being non-polymers, but also keeps this concept mixed up by not including Se-Met (MSE), which is certainly common enough not to be a HET group. So, the idea that ATOM implies some polymerization linkage is dysfunctional. What the PDB format should include is an INIT record that is the counterpart of the TER record.

The bigger point of my post, however, was that the interests of the non-database user community are, in my opinion, being ignored, particularly with the PDB format. Structural biology is so diverse that it really needs input from the whole community to do the right thing. The problem is that when the PDB 3.0 format was announced 3 months ago, it was done with the intent of not allowing time to consider problems and alternatives posed by the user community.

Joe Krahn

TO UNSUBSCRIBE OR CHANGE YOUR SUBSCRIPTION OPTIONS, please see https://lists.sdsc.edu/mailman/listinfo.cgi/pdb-l .
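[Editorial aside: the ATOM/HETATM split and the water bookkeeping Frances describes are easy to see in code. A minimal fixed-column tally over invented records; column positions follow the usual PDB convention (record name in columns 1-6, residue name in 18-20):]

```python
# Minimal fixed-column PDB record tally over invented records.
pdb_lines = [
    "ATOM      1  N   ALA A   1      11.104  13.207   2.100  1.00 20.00           N",
    "ATOM      2  CA  ALA A   1      12.560  13.207   2.100  1.00 20.00           C",
    "HETATM    3 HG    HG A 201       5.000   5.000   5.000  1.00 30.00          HG",
    "HETATM    4  O   HOH A 301       8.000   8.000   8.000  1.00 25.00           O",
]

counts = {"ATOM": 0, "HETATM": 0}
waters = 0
for line in pdb_lines:
    record = line[:6].strip()          # columns 1-6: record name
    counts[record] += 1
    if record == "HETATM" and line[17:20].strip() == "HOH":
        waters += 1                    # columns 18-20: residue name

print(counts, waters)
```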
Re: [ccp4bb] PDB format survey?
On Wednesday 01 August 2007 14:10, Joe Krahn wrote: In addition to questions about the PDB standard, it is probably important to consider mmCIF. One thing I don't like about it is that columns can be randomized (i.e. X, Y, and Z can be in any column), but the mmCIF standards people have no interest in defining a more strict standard that would require files to be as human readable as RCSB's mmCIF files.

The important thing about mmCIF is not the precise file format, which is ultimately not of interest except as a parsible exchange medium, but rather the existence of the mmCIF dictionaries. A more productive discussion may be to revisit the definition of what information we as a community expect to be captured in the PDB database. The question of export formats is secondary.

Does this sound useful, or have most people given up on having any influence on standards? Or, should the structural biology software developers get together and just make our own OpenPDB format?

As discussed at the PDB group discussion at the ACA meeting, some new depositions are not representable in the PDB format (including v3). Examples include:
- very large structures, for which the current 80-column PDB format runs out of space for atom numbers (4 columns) or for chain ids (1 column - single char A-Z 0-9) [don't ask me why they don't want lower case]
- new classes of experiment (SAXS, EM)
- new classes of model (TLS or normal-mode displacements, ensemble models, envelope representations)

I am inclined to say that there should be a fork into two distinct formats, used for different purposes. The 80-column PDB format should be frozen, preferably at the pre-version-3 state. Freezing it would allow legacy programs to continue to read old PDB files without modification. These programs will not be able to handle certain classes of new structures, but this would be true in any case for legacy code. Churn in the 80-column PDB format would aggravate rather than relieve this limitation. This branch would serve the general community, who are primarily viewers of previously deposited structures, and any programs not currently being maintained.

Currently maintained programs should move to mmCIF or XML, whichever is convenient. These formats are intrinsically open-ended, and can handle the problematic structures mentioned above so long as the corresponding mmCIF dictionaries are updated to define the relevant entities. The wwPDB database is already capable of exporting to any PDB, XML, or mmCIF format. So this would really be a change on the user side more than on the database side.

The barrier to converting programs to mmCIF is lower than you might think. Several mmCIF parsing libraries are available to allow currently maintained programs to offer mmCIF input/output if they do not already do so. One such is the mmLib library developed by Jay Painter and hosted on SourceForge: http://pymmlib.sourceforge.net/

J Painter and EA Merritt, J. Appl. Cryst. 37, 174-178 (2004). mmLib: Python toolkit for manipulating annotated structural models of biological macromolecules.

--
Ethan A Merritt
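[Editorial aside: Joe's complaint about column order and Ethan's "parsible exchange medium" point both come down to how mmCIF loops work: the _atom_site header lines, not fixed column positions, define the field order, so a reader looks fields up by name. A toy loop_ reader over an invented but syntactically conventional fragment:]

```python
# Toy mmCIF loop_ reader: field order is defined by the headers.
cif = """\
loop_
_atom_site.label_atom_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
CA 12.001 8.500 3.250
N  11.104 7.900 2.100
"""

lines = [l for l in cif.splitlines() if l and l != "loop_"]
headers = [l.split(".", 1)[1] for l in lines if l.startswith("_atom_site.")]
rows = [dict(zip(headers, l.split())) for l in lines if not l.startswith("_")]

print(rows[0]["Cartn_x"])  # 12.001 -- found by name, not by position
```

Reordering the header lines (and the data tokens with them) would leave this reader's output unchanged, which is exactly why mmCIF column order is allowed to vary.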
Re: [ccp4bb] PDB format survey?
I suspect this will be throwing fuel on the fire, but what is so great about the PDB format (any version) besides familiarity? It seems to me to be outdated, inadequate, and generally misused by all. I say scrap it, make a clean break, and devote everyone's energies to making a format that will work for everyone (granted: it is inexcusable for the RCSB to be developing new formats without input from the affected parties). mmCIF seems like a good idea that has not gotten the attention it needs (and deserves) to be formulated to meet everyone's needs.

As for the legacy-program argument: that's what translation programs like OpenBabel are for (or even a very simple python/perl/your-favorite-hammer script). Perhaps even the RCSB could be convinced to offer several formats for download... oh, wait - they already do.

Ducking behind my asbestos-free, all-natural organic firewall,
-Tom

-----Original Message-----
From: CCP4 bulletin board on behalf of Ethan Merritt
Sent: Wed 8/1/2007 3:06 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] PDB format survey?

[quoted text snipped]
Re: [ccp4bb] PDB format survey?
Ethan Merritt wrote: Examples include: - very large structures, for which the current 80-column PDB format runs out of space for atom numbers (4 columns) or for chain ids (1 column - single char A-Z 0-9) - new classes of experiment (SAXS, EM) - new classes of model (TLS or normal-mode displacements, ensemble models, envelope representations)

It would be trivial to update the PDB format to handle large structures. In fact, such extensions are already being planned. Atom numbers can simply be handled by truncating them; the serial design of PDB files makes them redundant. As for other experiments, like SAXS or EM, I think the PDB format should continue to be used only for atomic coordinates. Using it as a complete data reference has never been good.

... Currently maintained programs should move to mmCIF or XML, whichever is convenient. These formats are intrinsically open-ended, and can handle the problematic structures mentioned above so long as the corresponding mmCIF dictionaries are updated to define the relevant entities.

Being intrinsically open-ended is an advantage for parsing, but it still takes a lot of work to actually make use of new data. The software still has to be updated to handle the data. Formats like mmCIF and XML only handle part of the 'file format' issue. One problem is that mmCIF can be too open-ended, depending on how the schema is managed. I would be much more willing to work toward switching to mmCIF if the RCSB showed more interest in collaborating with the user community. If we can't even get involvement in something as simple as the PDB format, why should we think working with mmCIF will be any better?

Joe Krahn
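[Editorial aside: Joe's point that atom serial numbers are recoverable from record order, even when the fixed-width field overflows, can be sketched as follows. The "*****" overflow marker is one convention some programs use on overflow; it is shown here purely as an illustration, not as part of any standard.]

```python
# Since PDB records are serial, the position of a record in the file
# recovers its atom number even if the fixed-width serial field
# overflows and has to be blanked or marked.
def assign_serials(n_atoms, width=5):
    serials = []
    for i in range(1, n_atoms + 1):
        text = str(i) if len(str(i)) <= width else "*" * width  # overflow
        serials.append(text)
    return serials

s = assign_serials(100001)
print(s[0], s[99998], s[100000])  # 1 99999 *****
```

A reader that trusts record order can ignore the field entirely, which is the redundancy Joe is pointing at.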