Re: [ccp4bb] mmCIF as working format?
Dear David and Kaiser: While the PDB format is (thankfully--to those used to it) around, it seems to me it is certainly a rather poor deterrent to the enjoyment of AWK. For fixed-field format input, the designers of AWK suggested a useful solution: the function substr(s,p,n), i.e., return the substring of s of length n starting at position p (Aho et al. The AWK Programming Language. Addison-Wesley, 1988, pp. 42, 43, 72). The solution I've used, though, is to use GNU awk (gawk) with the format definition as follows: BEGIN {FIELDWIDTHS="6 5 1 4 1 3 1 1 4 1 3 8 8 8 6 6 10 2 2";} --hope you'd find that useful too. As for Perl, somebody put it nicely that one should comment programs bearing in mind that the person reading them later is always a different one from the one who wrote them; that includes the programmer, as she/he will always be in a different state of mind her/himself. Best regards, Navdeep --- On Tue, Aug 06, 2013 at 08:07:22AM -0400, David A Case wrote: An awk script with /^ATOM/ as its selection is actually easier to write than the corresponding script for a PDB ATOM record, since the line can be split on white space. On Mon, Aug 05, 2013 at 03:10:55AM -0700, kaiser wrote: Yes, using grep on mmcif files is awkward (but perfectly possible); awk on the other hand works much better. It's actually more of a pain to use it on pdb files. And perl, well perl can handle anything and it will always look nice while you write it and never look nice when you look back at it... --- Navdeep Sidhu University of Goettingen ---
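For anyone who would rather do the same thing in Python, here is a minimal sketch of fixed-width splitting that reuses the 19 widths from the gawk FIELDWIDTHS definition above. The helper name split_fixed and the sample ATOM record are assumptions made up for illustration; they do not come from the original post or from any library.

```python
# Column widths copied from the gawk FIELDWIDTHS definition (they sum to 80).
WIDTHS = [6, 5, 1, 4, 1, 3, 1, 1, 4, 1, 3, 8, 8, 8, 6, 6, 10, 2, 2]


def split_fixed(line, widths=WIDTHS):
    """Cut `line` into consecutive fixed-width fields, like gawk's FIELDWIDTHS."""
    # Pad short lines so that every slice below has the expected width.
    line = line.rstrip("\n").ljust(sum(widths))
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width])
        start += width
    return fields


# Hypothetical ATOM record, assembled field by field so the columns are explicit.
sample = "".join([
    "ATOM  ", "    1", " ", " CA ", " ", "ALA", " ", "A", "  10", " ", "   ",
    "  11.104", "   6.134", "  -6.504", "  1.00", " 20.00", " " * 10, " C", "  ",
])

fields = split_fixed(sample)
record, atom, resname, chain = fields[0].strip(), fields[3].strip(), fields[5], fields[7]
x, y, z = (float(value) for value in fields[11:14])
print(record, atom, resname, chain, x, y, z)
```

The field indices are zero-based here, so gawk's $4 (atom name) is fields[3] and $12 to $14 (the coordinates) are fields[11:14].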
[ccp4bb] calculation of shape complementarity of different protein-ligand complexes
Dear CCP4bb, I would like to calculate the shape complementarity of several protein-ligand complexes (crystal structures with ligand available). This involves a set of different proteins and also different ligands. The ligands are similar in size, but not in chemical composition. I have looked into the program sc (originally developed to calculate shape complementarity for protein-protein interfaces), but since the interfaces are rather small - as pointed out by Mike Lawrence - it might not be suitable for this type of problem. Has anyone done something similar before? There are some mutants available, so it would be good to quantify the change in shape complementarity for different mutations/ligands for one protein, but also to be able to compare the different protein-ligand complexes to one another. Thanks in advance, Tobias. -- ___ Dr. Tobias Beck ETH Zurich Laboratory of Organic Chemistry Wolfgang-Pauli-Str. 10, HCI F 322 8093 Zurich, Switzerland phone: +41 44 632 68 65 fax:+41 44 632 14 86 web: http://www.protein.ethz.ch/people/tobias ___
Re: [ccp4bb] calculation of shape complementarity of different protein-ligand complexes
VROCS, http://www.eyesopen.com Jürgen On Aug 7, 2013, at 9:03 AM, Tobias Beck wrote: Dear CCP4bb, I would like to calculate the shape complementarity of several protein-ligand complexes (crystal structures with ligand available). This involves a set of different proteins and also different ligands. The ligands are similar in size, but not in chemical composition. I have looked into the program sc (originally developed to calculate shape complementarity for protein-protein interfaces), but since the interfaces are rather small - as pointed out by Mike Lawrence - it might not be suitable for this type of problem. Has anyone done something similar before? There are some mutants available, so it would be good to quantify the change in shape complementarity for different mutations/ligands for one protein, but also to be able to compare the different protein-ligand complexes to one another. Thanks in advance, Tobias. -- ___ Dr. Tobias Beck ETH Zurich Laboratory of Organic Chemistry Wolfgang-Pauli-Str. 10, HCI F 322 8093 Zurich, Switzerland phone: +41 44 632 68 65 fax:+41 44 632 14 86 web: http://www.protein.ethz.ch/people/tobias ___ .. Jürgen Bosch Johns Hopkins University Bloomberg School of Public Health Department of Biochemistry Molecular Biology Johns Hopkins Malaria Research Institute 615 North Wolfe Street, W8708 Baltimore, MD 21205 Office: +1-410-614-4742 Lab: +1-410-614-4894 Fax: +1-410-955-2926 http://lupo.jhsph.edu
[ccp4bb] Problems with SANS data analysis
Dear CCP4bb, I have a few questions concerning SANS data recently collected that I'm having trouble analyzing. The data was collected at 2 different detector distances (4m, 2.5m) to achieve higher q-range, but I worry that the curves don't overlap enough at intermediate q, which might indicate a problem with the data. The links below are pictures of the corresponding datasets, before truncating the 4m high-q data and merging them into one. Is there a problem evident with the data, or am I imagining a problem? http://postimg.org/image/qb00y20qr/ http://postimg.org/image/8trbp7akj/ http://postimg.org/image/hni86axj7/ http://postimg.org/image/3sjxnu343/ http://postimg.org/image/4ysj0dgsj/ http://postimg.org/image/9ypz8bmf7/ http://postimg.org/image/m358pazb7/ http://postimg.org/image/jzuthmzib/ My second question concerns the values obtained in the analysis of the final scattering curves. The second sample in my experiment shows serious deviation in the values obtained for I(0) and Rg by Guinier analysis compared to the values obtained by the P(r) analysis. In other words, either the P(r) values match the Guinier and the P(r) fit is terrible, or else the P(r) fit is good but doesn't match the Guinier at all (5-10 difference in Rg, 2x difference in I(0)). I've checked to make sure the buffer subtraction algorithm was OK, and I'm pretty certain that the buffers were exact matches, so I don't know how to explain this variation. There's no evidence of aggregation or polydispersity to throw off the values, either. Does anyone know how this can happen?
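For reference, the Guinier values discussed above come from a straight-line fit: in the low-q region the Guinier approximation is I(q) ≈ I(0) exp(-q² Rg² / 3), so ln I plotted against q² has slope -Rg²/3 and intercept ln I(0). The following is only a sketch in plain Python on synthetic, noise-free numbers; the function name guinier_fit and all values are made up, and real data would also need error weighting and a check that the fitted range stays at roughly q·Rg < 1.3.

```python
import math


def guinier_fit(q, intensity):
    """Unweighted least-squares line through (q^2, ln I); returns (Rg, I0)."""
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; Rg is undefined")
    # slope = -Rg^2 / 3 and intercept = ln I(0)
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic data that follow the Guinier law exactly (made-up values).
true_rg, true_i0 = 20.0, 100.0
q = [0.005 + 0.005 * k for k in range(12)]  # highest point has q * Rg = 1.2
intensity = [true_i0 * math.exp(-((qi * true_rg) ** 2) / 3.0) for qi in q]

rg, i0 = guinier_fit(q, intensity)
print(round(rg, 3), round(i0, 3))
```

Running the same fit separately on each detector distance within the overlap region is one simple way to see whether the two curves agree before merging.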
Re: [ccp4bb] Problems with SANS data analysis
This question may be better suited to a more small-angle-oriented forum, e.g. http://www.saxier.org/forum/ On 08/07/2013 11:22 AM, Remec, Mark wrote: Dear CCP4bb, I have a few questions concerning SANS data recently collected that I'm having trouble analyzing. The data was collected at 2 different detector distances (4m, 2.5m) to achieve higher q-range, but I worry that the curves don't overlap enough at intermediate q, which might indicate a problem with the data. The links below are pictures of the corresponding datasets, before truncating the 4m high-q data and merging them into one. Is there a problem evident with the data, or am I imagining a problem? http://postimg.org/image/qb00y20qr/ http://postimg.org/image/8trbp7akj/ http://postimg.org/image/hni86axj7/ http://postimg.org/image/3sjxnu343/ http://postimg.org/image/4ysj0dgsj/ http://postimg.org/image/9ypz8bmf7/ http://postimg.org/image/m358pazb7/ http://postimg.org/image/jzuthmzib/ My second question concerns the values obtained in the analysis of the final scattering curves. The second sample in my experiment shows serious deviation in the values obtained for I(0) and Rg by Guinier analysis compared to the values obtained by the P(r) analysis. In other words, either the P(r) values match the Guinier and the P(r) fit is terrible, or else the P(r) fit is good but doesn't match the Guinier at all (5-10 difference in Rg, 2x difference in I(0)). I've checked to make sure the buffer subtraction algorithm was OK, and I'm pretty certain that the buffers were exact matches, so I don't know how to explain this variation. There's no evidence of aggregation or polydispersity to throw off the values, either. Does anyone know how this can happen? -- Oh, suddenly throwing a giraffe into a volcano to make water is crazy? Julian, King of Lemurs
[ccp4bb] Workshop Drug Target Crystallography and SBDD
Dear All, In light of the recent studies emphasizing the need for careful analysis and validation of protein-ligand complex structures, Ruben Abagyan from UCSD and myself are conducting an intense 2-day workshop on drug target/ligand structure determination and the use of such X-ray models in structure guided drug discovery/design. Also covered will be presentation of complex models via interactive 3D documents, embedded in presentations and also on mobile devices. It takes place in sunny San Diego (always worth visiting) on Mo, Oct 7 and Tue, Oct 8, 2013. Costs are modest and one lucky participant will receive a free copy of my book. Details can be found on: http://www.ruppweb.org/workshops/Molsoft_2013.htm Hope to see you in San Diego and best wishes, BR - Bernhard Rupp 001 (925) 209-7429 +43 (676) 571-0536 b...@ruppweb.org hofkristall...@gmail.com http://www.ruppweb.org/ - A little revolution now and then is a healthy thing -
Re: [ccp4bb] mmCIF as working format?
Are all the APIs open source? I was under the impression that CCP4 had moved away from that, which might justifiably reduce interest in any limited-availability API. Phil Jeffrey Princeton From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of James Stroud [xtald...@gmail.com] Sent: Wednesday, August 07, 2013 1:51 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] mmCIF as working format? On Aug 5, 2013, at 4:33 AM, Eugene Krissinel wrote: I just hope that one day we all will be discussing a sort of universal API to read/write structural information instead of referencing to raw formats, and routines to query MX data, which would be more appropriate than grep (would many SB students/postdocs use grep these days? but many of them would need to inspect files somehow). This, in essence, is similar to discussing read/write primitives in C/C++/Fortran rather than I/O functions of BIOS and HDD/BUS commands that they drive. I just want to reinforce this point by quoting it verbatim and also emphasize that it was not lost on some of us. In the long term, the MM structure community should perhaps get its inspiration from SQL, which focuses on the scope of data and the semantics of its manipulation, rather than how the data is encoded beneath the surface. James
Re: [ccp4bb] mmCIF as working format?
This is to confirm very publicly that CCP4 libraries (of which the APIs are one example) are open source and free to use. There are no plans to change this and, on the contrary, there is a common consensus that it should stay as is. Eugene On 7 Aug 2013, at 19:16, Jeffrey, Philip D. wrote: Are all the APIs open source? I was under the impression that CCP4 had moved away from that, which might justifiably reduce interest in any limited-availability API. Phil Jeffrey Princeton From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of James Stroud [xtald...@gmail.com] Sent: Wednesday, August 07, 2013 1:51 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] mmCIF as working format? On Aug 5, 2013, at 4:33 AM, Eugene Krissinel wrote: I just hope that one day we all will be discussing a sort of universal API to read/write structural information instead of referencing to raw formats, and routines to query MX data, which would be more appropriate than grep (would many SB students/postdocs use grep these days? but many of them would need to inspect files somehow). This, in essence, is similar to discussing read/write primitives in C/C++/Fortran rather than I/O functions of BIOS and HDD/BUS commands that they drive. I just want to reinforce this point by quoting it verbatim and also emphasize that it was not lost on some of us. In the long term, the MM structure community should perhaps get its inspiration from SQL, which focuses on the scope of data and the semantics of its manipulation, rather than how the data is encoded beneath the surface. James
Re: [ccp4bb] mmCIF as working format?
On 08/07/2013 01:51 PM, James Stroud wrote: In the long term, the MM structure community should perhaps get its inspiration from SQL For this to work, a particular interface must monopolize access to structural data. Then maintainers of that victorious interface could change the underlying format whichever way they want while supplying the never ending stream of useful features. And all other programs would be just frontends to the interface. As long as data format remains easily readable and there is more than one person willing to fiddle with code, persistence or at the very least backward compatibility of the data format will remain a (minor to me) issue. It is also important that it is much easier to write a pdb parser in your favourite language than to implement general purpose relational database management system. For full disclosure, I personally do not share the apocalyptic feeling about transition to mmCIF. Cheers, Ed. -- Oh, suddenly throwing a giraffe into a volcano to make water is crazy? Julian, King of Lemurs
Re: [ccp4bb] mmCIF as working format?
On Aug 7, 2013, at 1:06 PM, Ed Pozharski wrote: On 08/07/2013 01:51 PM, James Stroud wrote: In the long term, the MM structure community should perhaps get its inspiration from SQL For this to work, a particular interface must monopolize access to structural data. Not necessarily, although the alternative pathway might be more idealistic and hence unrealistic. All that needs to happen is that the community agree on 1. What is the finite set of essential/useful attributes of macromolecular structural data. 2. What is the syntax of (a) accessing and (b) modifying those attributes. 3. What is the syntax of selecting subsets of structural data based on those attributes. The resulting syntax (i.e. language) itself should be terse, easy to learn, easy to use, and preferably easy to implement. If such a standard is created, then I believe awk-ing/grep-ing/sed-ing/etc PDBs and mmCIFs would quickly become historical. James
Re: [ccp4bb] mmCIF as working format?
On Wed, Aug 7, 2013 at 12:54 PM, James Stroud xtald...@gmail.com wrote: All that needs to happen is that the community agree on 1. What is the finite set of essential/useful attributes of macromolecular structural data. 2. What is the syntax of (a) accessing and (b) modifying those attributes. 3. What is the syntax of selecting subsets of structural data based on those attributes. The resulting syntax (i.e. language) itself should be terse, easy to learn, easy to use, and preferably easy to implement. Ah, but the nice thing about mmCIF is that it isn't truly finite - the PDB may limit what tags are actually included in the distributed files, but there is nothing preventing other developers from including their own tags, and there is a community process for extending the officially defined tags. Item (2) is very well-established, unlike the current chaos of REMARK records. I think (3) will be left to the various libraries to deal with. -Nat
Re: [ccp4bb] mmCIF as working format?
Nobody has addressed the fact that mmCIF is a format that allows for many ways of presenting the same data. The recent discussions seem to be based on the assumption that all mmCIF files will look like those currently prepared by the PDB. Any code that reads an mmCIF file should be prepared to read any file that meets the mmCIF specifications. This requires the use of software tools and it may not be possible to use a simple script that works against PDB mmCIF entries to read arbitrary mmCIF files. Or are people saying/hoping/redefining that mmCIF will turn into a fixed column/field format? Frances Bernstein -- Bernstein + Sons, Information Systems Consultants, 5 Brewster Lane, Bellport, NY 11713-2803. Frances C. Bernstein f...@bernstein-plus-sons.com 1-631-286-1339 FAX: 1-631-286-1999
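To make the point above concrete, here is a small Python sketch showing the same two atoms written in two valid layouts: different column order, a quoted value, and rows wrapped over two lines. A script that assumes "column 5 is the chain" breaks on the second layout, whereas a reader that takes the column order from the loop_ header does not. Everything here is an illustrative assumption: the function names are invented, the two snippets are made up, and the reader deliberately ignores multi-line text fields and other CIF corner cases that a real library handles.

```python
import re

# One CIF value: 'single quoted', "double quoted", or a bare word.
# A closing quote only counts when followed by whitespace or end of line.
_TOKEN = re.compile(r"""'(.*?)'(?=\s|$)|"(.*?)"(?=\s|$)|(\S+)""")


def tokenize(line):
    """Split one line into values, stripping simple CIF quoting."""
    return [next(g for g in m.groups() if g is not None)
            for m in _TOKEN.finditer(line)]


def read_atom_site(text):
    """Return the first _atom_site loop as a list of {tag: value} dicts."""
    state, tags, values = "seek", [], []
    for raw in text.splitlines():
        line = raw.strip()
        if state == "seek":
            if line == "loop_":
                state, tags = "header", []
        elif state == "header":
            if line.startswith("_"):
                tags.append(line.split()[0])
            elif tags and all(t.startswith("_atom_site.") for t in tags):
                state = "data"  # this line is the first data line
            else:
                state, tags = ("header" if line == "loop_" else "seek"), []
        if state == "data":
            if not line or line.startswith(("#", "_", "loop_", "data_")):
                break
            values.extend(tokenize(line))
    if not values:
        return []
    # Values are chunked by the number of tags, so wrapped rows are fine.
    n = len(tags)
    return [dict(zip(tags, values[i:i + n])) for i in range(0, len(values), n)]


LAYOUT_A = """data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.Cartn_x
ATOM 1 CA ALA A 11.5
ATOM 2 "O5'" A B 3.25
#
"""

LAYOUT_B = """data_demo
loop_
_atom_site.label_asym_id
_atom_site.Cartn_x
_atom_site.label_atom_id
_atom_site.id
_atom_site.group_PDB
_atom_site.label_comp_id
A 11.5 CA
  1 ATOM ALA
B 3.25 "O5'"
  2 ATOM A
"""

rows_a = read_atom_site(LAYOUT_A)
rows_b = read_atom_site(LAYOUT_B)
print(rows_a == rows_b)
print([row["_atom_site.label_asym_id"] for row in rows_b])
```

Looking values up by tag name rather than by position is the part that carries over to real files; the tokenizing itself is better left to an existing CIF library.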
Re: [ccp4bb] mmCIF as working format?
The flexibility of CIF is indeed infinite. Even the names of the unit-cell dimensions are different in mmCIF and (small molecule) core CIF. One of the main reasons why I had to bring out a new version of SHELXL recently (SHELXL-2013 to replace SHELXL-97) was that in the meantime the COMCIFS committee of the IUCr had changed many of the names. George On 08/07/2013 10:02 PM, Nat Echols wrote: On Wed, Aug 7, 2013 at 12:54 PM, James Stroud xtald...@gmail.com wrote: All that needs to happen is that the community agree on 1. What is the finite set of essential/useful attributes of macromolecular structural data. 2. What is the syntax of (a) accessing and (b) modifying those attributes. 3. What is the syntax of selecting subsets of structural data based on those attributes. The resulting syntax (i.e. language) itself should be terse, easy to learn, easy to use, and preferably easy to implement. Ah, but the nice thing about mmCIF is that it isn't truly finite - the PDB may limit what tags are actually included in the distributed files, but there is nothing preventing other developers from including their own tags, and there is a community process for extending the officially defined tags. Item (2) is very well-established, unlike the current chaos of REMARK records. I think (3) will be left to the various libraries to deal with. -Nat -- Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-33021 or -33068 Fax. +49-551-39-22582
Re: [ccp4bb] mmCIF as working format?
On 08/07/2013 03:54 PM, James Stroud wrote: On Aug 7, 2013, at 1:06 PM, Ed Pozharski wrote: On 08/07/2013 01:51 PM, James Stroud wrote: In the long term, the MM structure community should perhaps get its inspiration from SQL For this to work, a particular interface must monopolize access to structural data. Not necessarily, although the alternative pathway might be more idealistic and hence unrealistic. All that needs to happen is that the community agree on 1. What is the finite set of essential/useful attributes of macromolecular structural data. 2. What is the syntax of (a) accessing and (b) modifying those attributes. 3. What is the syntax of selecting subsets of structural data based on those attributes. The resulting syntax (i.e. language) itself should be terse, easy to learn, easy to use, and preferably easy to implement. If such a standard is created, then I believe awk-ing/grep-ing/sed-ing/etc PDBs and mmCIFs would quickly become historical. James James, frankly, I am not sure which part of your description is not equivalent to monopolistic interface. If I understand your proposal and reference to SQL correctly, you want some scripting language that sounds like simple English. Is the advantage over existing APIs here that one does not need to learn Python, C++, (or, heaven forbid, FORTRAN)? I.e. programs would look like this --- GRAB protein FROM FILE best_model_ever.cif; SELECT CHAIN A FROM protein AS chA; SET chA BFACTORS TO 30.0; GRAB data FROM FILE best_data_ever.cif; BIND protein TO data; REFINE protein USING BUSTER WITH TLS+ANISO; DROP protein INTO FILE better_model_yet.cif; --- Not necessarily a bad idea but now through the fog of time I remember something oddly reminiscent... ah, CNS! (for those googling for it it's not the central nervous system :). Cheers, Ed. -- Oh, suddenly throwing a giraffe into a volcano to make water is crazy? Julian, King of Lemurs
Re: [ccp4bb] mmCIF as working format?
Ed Pozharski wrote: [snip] If I understand your proposal and reference to SQL correctly, you want some scripting language that sounds like simple English. Is the advantage over existing APIs here that one does not need to learn Python, C++, (or, heaven forbid, FORTRAN)? I.e. programs would look like this XML DOM is probably a better example of a standardized API to shoot for than SQL in this case. Regardless of which language or library you use, getChildNodes still does the same thing (at least conceptually). If the recommendation is that crystallographers should be using an API for data stored in a standardized format instead of parsing it themselves, then it would seem to make sense to me that the API should also be standardized (ideally with a well-documented reference implementation). In some sense this is monopolistic - but hopefully it'd be a benevolent monopoly. If I remember correctly, there was a time when the creator of Python referred to himself as the benevolent dictator for life of the project; and it turned out pretty well. [snip] Not necessarily a bad idea but now through the fog of time I remember something oddly reminiscent... ah, CNS! (for those googling for it it's not the central nervous system :). I'm still impressed by the fact that a useful scripting language was implemented in fortran. Pete
Re: [ccp4bb] mmCIF as working format?
On Aug 7, 2013, at 2:35 PM, Ed Pozharski wrote: If I understand your proposal and reference to SQL correctly, you want some scripting language that sounds like simple English. I didn't say anything about being English-like. English and other natural languages are ill-adapted to describing the well-defined operations one might perform on a data structure. Is the advantage over existing APIs here that one does not need to learn Python, C++, (or, heaven forbid, FORTRAN)? Anyone can learn Python in an hour and a half. That's not an issue (except for whitespace nuts). If one wants to use Python to modify PDB structural data, I recommend starting with the tutorial I wrote for CCTBX: http://cctbxwiki.bravais.net/CCTBX_Wiki#Working_with_pdb_Files The advantage of a language over an API is that an API requires coding overhead and must (by the definition of API) be part of an Application. SQL has no such requirement and neither would an ideal language for *selecting* and *modifying* macromolecular structural data. In SQL, one can make selections and modifications without importing libraries, defining a main function, declaring variables, etc. Low overhead is probably the reason so many crystallographers (myself not included) are fluent in the likes of awk. I.e. programs would look like this --- GRAB protein FROM FILE best_model_ever.cif; SELECT CHAIN A FROM protein AS chA; SET chA BFACTORS TO 30.0; GRAB data FROM FILE best_data_ever.cif; BIND protein TO data; REFINE protein USING BUSTER WITH TLS+ANISO; DROP protein INTO FILE better_model_yet.cif; --- Not necessarily a bad idea but now through the fog of time I remember something oddly reminiscent... ah, CNS! (for those googling for it it's not the central nervous system :). Although a little too much like natural language, it is not a bad idea. But, where is the link describing the layer of CNS that looks like that? In my X-Plor 3.1 manual (Yale University Press, 1987) I see nothing remotely like what you describe. 
CNS, according to the most recent tutorial for 1.3, looks like this:

    topology
      evaluate ($counter=1)
      evaluate ($done=false)
      while ( $done = false ) loop read
        if ( exist_topology_infile_$counter = true ) then
          if ( BLANK%topology_infile_$counter = false ) then
            @@topology_infile_$counter
          end if
        else
          evaluate ($done=true)
        end if
        evaluate ($counter=$counter+1)
      end loop read
    end

This example makes a point about the problems of APIs. Namely, they require loops and tests, and lack a true selection mechanism, except perhaps for the scripting layer of CNS. But even with CNS, once you have a selection, you must loop over it to modify the data. Although it is likely the best library for working with structural data, CCTBX requires a loop just to change a specific chain ID (to the best of my knowledge):

    from iotbx import pdb

    pdb_inp = pdb.input(file_name="best-model.pdb")
    hierarchy = pdb_inp.construct_hierarchy()
    # Walk every model and chain; rename chain A to B.
    for model in hierarchy.models():
        for chain in model.chains():
            if chain.id == "A":
                chain.id = "B"

I don't intend to pick on CCTBX specifically (because the CCTBX developers have specific needs to which they program), but loop/test mechanisms are awkward for selecting and modifying structural data, and get much more awkward as selections get more complex (e.g. selecting the C-alpha of every alanine of chain A, etc.). James
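As a rough illustration of what a selection-first interface could feel like, here is a sketch in plain Python. It is not CCTBX, CNS, or any existing library: the functions select and update, and the toy model made of dicts, are invented for this example. The loops have not gone away, of course; they are just hidden inside two small helpers so that the caller states what to select rather than how to walk the hierarchy.

```python
def select(atoms, **criteria):
    """Return the atoms whose attributes equal every keyword criterion."""
    return [atom for atom in atoms
            if all(atom.get(key) == wanted for key, wanted in criteria.items())]


def update(atoms, **changes):
    """Set the given attributes on every atom of a selection, in place."""
    for atom in atoms:
        atom.update(changes)
    return atoms


# Made-up toy model: each atom is a plain dict of attributes.
model = [
    {"chain": "A", "resname": "ALA", "resseq": 1, "name": "N", "b": 12.0},
    {"chain": "A", "resname": "ALA", "resseq": 1, "name": "CA", "b": 13.5},
    {"chain": "A", "resname": "GLY", "resseq": 2, "name": "CA", "b": 15.1},
    {"chain": "B", "resname": "ALA", "resseq": 1, "name": "CA", "b": 20.2},
]

# "The C-alpha of every alanine of chain A", stated as one expression.
ca_ala_a = select(model, chain="A", resname="ALA", name="CA")
update(ca_ala_a, b=30.0)

# Renaming a chain is a select-then-update as well.
update(select(model, chain="A"), chain="B")
print([(atom["chain"], atom["resseq"], atom["name"], atom["b"]) for atom in model])
```

Equality tests are the only criteria supported in this sketch; ranges, negation, and distance-based selections are exactly where an agreed selection syntax would have to do real work.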
Re: [ccp4bb] mmCIF as working format?
On Wed, Aug 7, 2013 at 2:36 PM, James Stroud xtald...@gmail.com wrote: Although it is likely the best library for working with structural data, CCTBX requires a loop just to change a specific chain ID (to the best of my knowledge): ... I don't intend to pick on CCTBX specifically (because the CCTBX developers have specific needs to which they program), but loop/test mechanisms are awkward for selecting and modifying structural data, and get much more awkward as selections get more complex (e.g. selecting the C-alpha of every alanine of chain A, etc.). True - it's really an issue of what purpose the libraries were designed for. CCTBX wasn't intended to be a general-purpose tool for users to perform quick manipulations of a model; the goal was to build large, complex, and more-or-less automated crystallography applications on top of it. (The same applies to the CCP4 libraries, mmdb, clipper, etc.; BioPython I guess is designed for bioinformatics.) The design of CNS (for example) reflects an era where it was much more likely that the average crystallographer knew some programming, worked exclusively on the command line, built new models manually, and didn't have access to a large number of convenient tools for purposes like this. (Or so I've heard; I was still in high school.) Personally, if I need to change a chain ID, I can use Coot or pdbset or many other tools. Writing code for this should only be necessary if you're processing large numbers of models, or have a spectacularly misformatted PDB file. Again, I'll repeat what I said before: if it's truly necessary to view or edit a model by hand or with custom shell scripts, this often means that the available software is deficient. PLEASE tell the developers what you need to get your job done; we can't read minds. -Nat
Re: [ccp4bb] mmCIF as working format?
The cctbx provides comprehensive tools for handling mmcif files (and indeed all types of cif files - it is not fussy), freely available under the BSD-style cctbx licence. Cheers, Richard On 7 Aug 2013, at 19:16, Jeffrey, Philip D. pjeff...@princeton.edu wrote: Are all the APIs open source? I was under the impression that CCP4 had moved away from that, which might justifiably reduce interest in any limited-availability API. Phil Jeffrey Princeton From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of James Stroud [xtald...@gmail.com] Sent: Wednesday, August 07, 2013 1:51 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] mmCIF as working format? On Aug 5, 2013, at 4:33 AM, Eugene Krissinel wrote: I just hope that one day we all will be discussing a sort of universal API to read/write structural information instead of referencing to raw formats, and routines to query MX data, which would be more appropriate than grep (would many SB students/postdocs use grep these days? but many of them would need to inspect files somehow). This, in essence, is similar to discussing read/write primitives in C/C++/Fortran rather than I/O functions of BIOS and HDD/BUS commands that they drive. I just want to reinforce this point by quoting it verbatim and also emphasize that it was not lost on some of us. In the long term, the MM structure community should perhaps get its inspiration from SQL, which focuses on the scope of data and the semantics of its manipulation, rather than how the data is encoded beneath the surface. James
Re: [ccp4bb] mmCIF as working format?
James, On 08/07/2013 05:36 PM, James Stroud wrote: Anyone can learn Python in an hour and a half. Isn't this a bit of an exaggeration? Python is designed to be easy to learn, but we are probably talking about different definitions of learning and anyone. I.e. programs would look like this --- GRAB protein FROM FILE best_model_ever.cif; SELECT CHAIN A FROM protein AS chA; SET chA BFACTORS TO 30.0; GRAB data FROM FILE best_data_ever.cif; BIND protein TO data; REFINE protein USING BUSTER WITH TLS+ANISO; DROP protein INTO FILE better_model_yet.cif; --- Not necessarily a bad idea but now through the fog of time I remember something oddly reminiscent... ah, CNS! (for those googling for it it's not the central nervous system :). Although a little too much like natural language, it is not a bad idea. But, where is the link describing the layer of CNS that looks like that? I should probably use tongue-in-cheek/tongue-in-check markup next time to prevent my poor attempt at a humorous tribute to CNS from being understood so literally. At the very least you might agree that CNS is the closest thing we ever had to an MX-oriented general-purpose interpreter. Your quote is also from the below-the-magic-line-do-not-change area of a CNS script. Cheers, Ed. -- Oh, suddenly throwing a giraffe into a volcano to make water is crazy? Julian, King of Lemurs
Re: [ccp4bb] mmCIF as working format?
On 08/07/2013 05:54 PM, Nat Echols wrote: Personally, if I need to change a chain ID, I can use Coot or pdbset or many other tools. Writing code for this should only be necessary if you're processing large numbers of models, or have a spectacularly misformatted PDB file. Again, I'll repeat what I said before: if it's truly necessary to view or edit a model by hand or with custom shell scripts, this often means that the available software is deficient. PLEASE tell the developers what you need to get your job done; we can't read minds. Nat, I don't think anyone here really means that the only way to change a chain ID is to write, say, a perl script. But an interpreter of the kind advocated by James (as much as I have hijacked/misinterpreted his vision) could indeed be very useful for people pursuing simple bioinformatics projects and new ways to analyse structural models. While I understand your view that everyone should seek assistance from developers with every problem encountered, I also recall some reasonable idea about self-sufficiency that should cover scientific research (something like give a man a fish and you feed him for a day, teach him to fish and he starts paying taxes... something along these lines ;). There is a difference between tools that make it easy to perform useful non-standard analysis and highly specialized tools that strive to cover every situation imaginable. Cheers, Ed. -- Oh, suddenly throwing a giraffe into a volcano to make water is crazy? Julian, King of Lemurs
Re: [ccp4bb] mmCIF as working format?
On Wednesday, August 07, 2013 04:00:16 pm Ed Pozharski wrote: On 08/07/2013 05:54 PM, Nat Echols wrote: Personally, if I need to change a chain ID, I can use Coot or pdbset or many other tools. Writing code for this should only be necessary if you're processing large numbers of models, or have a spectacularly misformatted PDB file. Again, I'll repeat what I said before: if it's truly necessary to view or edit a model by hand or with custom shell scripts, this often means that the available software is deficient. PLEASE tell the developers what you need to get your job done; we can't read minds. Nat, I don't think anyone here really means that the only way to change a chain ID is to write, say, a perl script. But an interpreter of the kind advocated by James (as much as I have hijacked/misinterpreted his vision) could indeed be very useful for people pursuing simple bioinformatics projects and new ways to analyse structural models. We tackled this a while back for the then-current incarnation of mmCIF. http://www.bmsc.washington.edu/parvati/mmLib.pdf I suppose it will all have to be revisited so that it knows the quirks, features, and foibles of the new and improved mmCIF. Ethan While I understand your view that everyone should seek assistance from developers with every problem encountered, I also recall some reasonable idea about self-sufficiency that should cover scientific research (something like give a man a fish and you feed him for a day, teach him to fish and he starts paying taxes... something along these lines ;). There is a difference between tools that make it easy to perform useful non-standard analysis and highly specialized tools that strive to cover every situation imaginable. Cheers, Ed. -- Ethan A Merritt Biomolecular Structure Center, K-428 Health Sciences Bldg University of Washington, Seattle 98195-7742
Re: [ccp4bb] mmCIF as working format?
I.e. programs would look like this

---
GRAB protein FROM FILE best_model_ever.cif;
SELECT CHAIN A FROM protein AS chA;
SET chA BFACTORS TO 30.0;
GRAB data FROM FILE best_data_ever.cif;
BIND protein TO data;
REFINE protein USING BUSTER WITH TLS+ANISO;
DROP protein INTO FILE better_model_yet.cif;
---

This brings to mind James Holton's Elves program(s): http://bl831.als.lbl.gov/~jamesh/elves/

Phil Jeffrey Princeton
Re: [ccp4bb] mmCIF as working format?
Nat Echols wrote:

Personally, if I need to change a chain ID, I can use Coot or pdbset or many other tools. Writing code for this should only be necessary if you're processing large numbers of models, or have a spectacularly misformatted PDB file.

Problem. Coot is bad at the chain label aspect. Create a pdb file containing residues A1-A20 and X101-X120 - non-overlapping numbering. Try to change the chain label of X to A. I get

WARNING:: CONFLICT: chain id already exists in this molecule

This is (IMHO) a bizarre feature because this is exactly the sort of thing you do when building structures. Therefore I do one of two things:

1. Open it in (x)emacs, replace X with A and Bob's your uncle.
2. Start Peek2 - that's my interactive program for doing simple and stupid things like this. I type "read test.pdb" and "chain", and Peek2 prompts me at perceived chain breaks (change in chain label, CA-CA breaks, ATOM/HETATM transitions, etc.), and then "write test.pdb". Takes less than 10 seconds. CCP4i would probably still be launching, as would Phenix.

The reason I do #1 or #2 is not to be a Luddite, but to do something trivial and boring quickly so I can get back to something interesting like building structures, or beating subjects to death on CCP4bb.

What's lacking is an interactive, or just plain fast, way (in any guise) of doing simple PDB manipulations that we do tons of times when building protein structures. I've used Peek2 thousands of times for this purpose, which is the only reason it still exists because it's a fairly stupid program. A truly interactive version of PDBSET would be splendid. But, again, it always runs in batch mode.

mmCIF looked promising, apropos emacs, when I looked at the spec page at: http://www.iucr.org/__data/iucr/cifdic_html/2/cif_mm.dic/Catom_site.html because that ATOM data is column-formatted. Cool.
However looking at 6LYZ.cif from RCSB's site revealed that the XYZ's were LEFT-justified: http://www.rcsb.org/pdb/files/6LYZ.cif which makes me recoil in horror and resolve to use PDB format until someone puts a gun to my head. Really, guys, if you can put multiple successive spaces to the RIGHT of the number, why didn't you put them to the LEFT of it instead ? Same parsing, better readability. Phil Jeffrey Princeton (using the vernacular but deathly serious about protein structure)
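The "same parsing" remark can be illustrated with a short sketch. It assumes a parser that simply splits each row on whitespace; real mmCIF values may be quoted and contain spaces, which this ignores. The two rows and the helper names `tokens` and `xyz` are made up for illustration and are not copied from 6LYZ.cif.

```python
# Hypothetical atom_site-style rows (invented values, not taken from 6LYZ.cif).
# The same numbers are padded on the right in one row and on the left in the other.
left_justified = "ATOM 1 N N LYS A 1 3.287  10.092 10.329"
right_justified = "ATOM 1 N N LYS A 1  3.287 10.092 10.329"


def tokens(row):
    """Split a row on runs of whitespace, as a whitespace-delimited parser would."""
    return row.split()


def xyz(row):
    """Return the last three tokens of a row as floats (assumed to be x, y, z)."""
    return tuple(float(t) for t in tokens(row)[-3:])
```

Both rows yield the same token list, so for such a parser the choice between padding on the left or on the right only affects how the file reads to a human.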
Re: [ccp4bb] mmCIF as working format?
Quoting Jeffrey, Philip D. pjeff...@princeton.edu:

Nat Echols wrote:

Personally, if I need to change a chain ID, I can use Coot or pdbset or many other tools. Writing code for this should only be necessary if you're processing large numbers of models, or have a spectacularly misformatted PDB file.

Problem. Coot is bad at the chain label aspect. Create a pdb file containing residues A1-A20 and X101-X120 - non-overlapping numbering. Try to change the chain label of X to A. I get

WARNING:: CONFLICT: chain id already exists in this molecule

Having had to show this to a student today, it does work fine if you select the Use Residue Range option rather than changing the whole chain. Not quite so convenient, but at least it makes the user think.
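For anyone who would rather script the residue-range variant than click through it, here is a minimal sketch. It is not how Coot does it; it only relies on the fixed PDB columns (chain ID in column 22, residue number in columns 23-26). The function name `rename_chain` and its parameters are hypothetical, and only ATOM/HETATM records are touched, so TER, ANISOU, LINK and header records would still need attention.

```python
def rename_chain(lines, old, new, first=None, last=None):
    """Return a copy of PDB lines with chain `old` renamed to `new`.

    Only ATOM/HETATM records are changed. If `first`/`last` are given,
    only residues with first <= resSeq <= last are renamed.
    """
    renamed = []
    for line in lines:
        # Column 22 (index 21) holds the chain ID in coordinate records.
        if line.startswith(("ATOM  ", "HETATM")) and line[21:22] == old:
            resseq = int(line[22:26])  # columns 23-26: residue sequence number
            in_range = (first is None or resseq >= first) and (
                last is None or resseq <= last
            )
            if in_range:
                # Replace the single chain character, leaving all other columns alone.
                line = line[:21] + new + line[22:]
        renamed.append(line)
    return renamed
```

Called as `rename_chain(lines, "X", "A", first=101, last=120)` it would relabel X101-X120 as chain A and leave every other record as it was.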
Re: [ccp4bb] mmCIF as working format?
On Wednesday, August 07, 2013 04:54:39 pm Jeffrey, Philip D. wrote:

Nat Echols wrote:

Personally, if I need to change a chain ID, I can use Coot or pdbset or many other tools. Writing code for this should only be necessary if you're processing large numbers of models, or have a spectacularly misformatted PDB file.

Problem. Coot is bad at the chain label aspect. Create a pdb file containing residues A1-A20 and X101-X120 - non-overlapping numbering. Try to change the chain label of X to A. I get

WARNING:: CONFLICT: chain id already exists in this molecule

That would be a bug. But it hasn't been true for any version of coot that I have used. As you say, this is a common thing to do and I am certain I would have noticed if it didn't work. I just checked that it isn't true for 0.7.1-pre.

What _is_ true is that renaming X to A in this case will not re-order the residues in the file. So if you had A1-100 followed by B1-10 followed by X101-200 there would not be a peptide link between A100 and A(old X)101 after the renaming. To fix this you need to write out the file and use an editor to move the records for A101-200 to immediately after the records for A1-100.

This does illustrate the point that expecting all tools to handle all possible manipulations is unrealistic. I think there will always be a need for a separate tool that can do anything imaginable, whether that tool is vi or emacs or some spiffy new mmCIF editing GUI.

The problem with this is that any tool capable of arbitrarily editing your file is also capable of subtly mangling your file. The current PDB format is horribly sensitive to this. For example if you reorder/renumber/relabel ATOM records in a PDB file then references to them in the header records (TLS, SITE, etc) and LINK/CONECT records will now point to the wrong atoms.
I am not convinced that the new mmCIF format has gotten this quite right either, at least in the examples given, but it does have the flexibility to attach such links or properties directly to the ATOM record where it is more likely to be carried along correctly if moved. That by itself is IMHO enough to justify the switch from PDB to mmCIF.

Ethan

This is (IMHO) a bizarre feature because this is exactly the sort of thing you do when building structures. Therefore I do one of two things:

1. Open it in (x)emacs, replace X with A and Bob's your uncle.
2. Start Peek2 - that's my interactive program for doing simple and stupid things like this. I type "read test.pdb" and "chain", and Peek2 prompts me at perceived chain breaks (change in chain label, CA-CA breaks, ATOM/HETATM transitions, etc.), and then "write test.pdb". Takes less than 10 seconds. CCP4i would probably still be launching, as would Phenix.

The reason I do #1 or #2 is not to be a Luddite, but to do something trivial and boring quickly so I can get back to something interesting like building structures, or beating subjects to death on CCP4bb.

What's lacking is an interactive, or just plain fast, way (in any guise) of doing simple PDB manipulations that we do tons of times when building protein structures. I've used Peek2 thousands of times for this purpose, which is the only reason it still exists because it's a fairly stupid program. A truly interactive version of PDBSET would be splendid. But, again, it always runs in batch mode.

mmCIF looked promising, apropos emacs, when I looked at the spec page at: http://www.iucr.org/__data/iucr/cifdic_html/2/cif_mm.dic/Catom_site.html because that ATOM data is column-formatted. Cool. However looking at 6LYZ.cif from RCSB's site revealed that the XYZ's were LEFT-justified: http://www.rcsb.org/pdb/files/6LYZ.cif which makes me recoil in horror and resolve to use PDB format until someone puts a gun to my head.
Really, guys, if you can put multiple successive spaces to the RIGHT of the number, why didn't you put them to the LEFT of it instead ? Same parsing, better readability. Phil Jeffrey Princeton (using the vernacular but deathly serious about protein structure) -- Ethan A Merritt Biomolecular Structure Center, K-428 Health Sciences Bldg University of Washington, Seattle 98195-7742
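The re-ordering step described in the reply above (moving the records for A101-200 to sit right after A1-100) can be sketched in a few lines, as an alternative to doing it in an editor. This assumes the input list holds only ATOM/HETATM records; the function name `group_by_chain` is hypothetical. It leaves atom serial numbers untouched and does nothing about header, LINK or CONECT records, which is exactly where the mangling risk mentioned above comes in.

```python
def group_by_chain(atom_lines):
    """Make each chain's records contiguous, keeping chains in first-seen order.

    Assumes every line is an ATOM/HETATM record with the chain ID in
    column 22 (index 21). Order within a chain is preserved.
    """
    first_seen = {}
    for line in atom_lines:
        # Remember the rank at which each chain ID first appears.
        first_seen.setdefault(line[21], len(first_seen))
    # sorted() is stable, so records of the same chain keep their file order.
    return sorted(atom_lines, key=lambda line: first_seen[line[21]])
```

With records ordered A1-100, B1-10, A101-200 (after the X to A rename), this would return A1-100, A101-200, B1-10.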