Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Navdeep Sidhu
Dear David and Kaiser:

While the PDB format is (thankfully--to those used to it) around, it seems to 
me it is certainly a rather poor deterrent to the enjoyment of AWK:

For fixed-field format input, the designers of AWK suggested a useful solution: 
the function substr(s,p,n), i.e., return substring of s of length n starting 
at position p (Aho et al. The AWK Programming Language. Addison-Wesley, 1988, 
pp. 42, 43, 72).

The solution I've used, though, is to use gnu awk (gawk) with the format 
definition as follows:
BEGIN {FIELDWIDTHS="6 5 1 4 1 3 1 1 4 1 3 8 8 8 6 6 10 2 2";}
--hope you'd find that useful too.
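
For anyone without gawk at hand, the same fixed-column split can be sketched in Python. This is only an illustration under stated assumptions: the helper name split_fixed and the sample ATOM record are made up here, and the widths are simply copied from the FIELDWIDTHS line above.

```python
from itertools import accumulate

# Same widths as the FIELDWIDTHS string above (they add up to 80 columns).
WIDTHS = [6, 5, 1, 4, 1, 3, 1, 1, 4, 1, 3, 8, 8, 8, 6, 6, 10, 2, 2]

def split_fixed(line, widths=WIDTHS):
    """Cut a line into fixed-width fields, much as gawk does with FIELDWIDTHS."""
    ends = list(accumulate(widths))
    starts = [0] + ends[:-1]
    padded = line.rstrip("\n").ljust(ends[-1])  # pad short lines to full width
    return [padded[s:e] for s, e in zip(starts, ends)]

# Hypothetical ATOM record, built with a PDB-style format string (illustration only).
record = ("%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f          %2s%2s"
          % ("ATOM", 1, " N", "", "ALA", "A", 1, "",
             11.104, 6.134, -6.504, 1.00, 20.00, "N", ""))

fields = split_fixed(record)
atom_name = fields[3].strip()                # gawk's $4
chain_id = fields[7]                         # gawk's $8
x, y, z = (float(f) for f in fields[11:14])  # gawk's $12, $13, $14
```

Note that Python slices count from 0 while gawk fields count from 1, so fields[3] here corresponds to $4 in the gawk script.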

As for Perl, somebody put it nicely that one should comment programs bearing in 
mind that the person reading them later is always a different one from the one 
who wrote them; that includes the programmer as she/he will always be in a 
different state of mind her/himself.

Best regards,
Navdeep


---
On Tue, Aug 06, 2013 at 08:07:22AM -0400, David A Case wrote:
 
 An awk script with /^ATOM/ as its selection is actually easier to write
 than the corresponding script for a PDB ATOM record, since the line can
 be split on white space.

On Mon, Aug 05, 2013 at 03:10:55AM -0700, kaiser wrote:
   Yes, using grep on mmcif files is awkward (but perfectly possible); awk 
 on the other hand works much better. It's actually more of a pain to use it 
 on pdb files. And perl, well perl can handle anything and it will always look 
 nice while you write it and never look nice when you look back at it...


---
Navdeep Sidhu
University of Goettingen
---


[ccp4bb] calculation of shape complementarity of different protein-ligand complexes

2013-08-07 Thread Tobias Beck
Dear CCP4bb,

I would like to calculate the shape complementarity of several
protein-ligand complexes (crystal structures with ligand available). This
involves a set of different proteins and also different ligands. The
ligands are similar in size, but not in chemical composition.

I have looked into the program sc (originally developed to calculate shape
complementarity for protein-protein interfaces), but since the interfaces
are rather small - as pointed out by Mike Lawrence - it might not be
suitable for this type of problem.

Has anyone done something similar before? There are some mutants available,
so it would be good to quantify the change in shape complementarity for
different mutations/ligands for one protein, but also to be able to compare
the different protein-ligand complexes to one another.

Thanks in advance,

Tobias.

-- 
___

Dr. Tobias Beck
ETH Zurich
Laboratory of Organic Chemistry
Wolfgang-Pauli-Str. 10, HCI F 322
8093 Zurich, Switzerland
phone:   +41 44 632 68 65
fax:+41 44 632 14 86
web:  http://www.protein.ethz.ch/people/tobias
___


Re: [ccp4bb] calculation of shape complementarity of different protein-ligand complexes

2013-08-07 Thread Bosch, Juergen
VROCS, www.eyesopen.com

Jürgen


..
Jürgen Bosch
Johns Hopkins University
Bloomberg School of Public Health
Department of Biochemistry & Molecular Biology
Johns Hopkins Malaria Research Institute
615 North Wolfe Street, W8708
Baltimore, MD 21205
Office: +1-410-614-4742
Lab:  +1-410-614-4894
Fax:  +1-410-955-2926
http://lupo.jhsph.edu






[ccp4bb] Problems with SANS data analysis

2013-08-07 Thread Remec, Mark
Dear CCP4bb,

I have a few questions concerning SANS data recently collected that I'm having 
trouble analyzing. The data was collected at 2 different detector distances 
(4m, 2.5m) to achieve higher q-range, but I worry that the curves don't overlap 
enough at intermediate q, which might indicate a problem with the data. The 
links below are pictures of the corresponding datasets, before truncating the 
4m high-q data and merging them into one. Is there a problem evident with the 
data, or am I imagining a problem?

http://postimg.org/image/qb00y20qr/

http://postimg.org/image/8trbp7akj/

http://postimg.org/image/hni86axj7/

http://postimg.org/image/3sjxnu343/

http://postimg.org/image/4ysj0dgsj/

http://postimg.org/image/9ypz8bmf7/

http://postimg.org/image/m358pazb7/

http://postimg.org/image/jzuthmzib/

My second question concerns the values obtained in the analysis of the final 
scattering curves. The second sample in my experiment shows serious deviation 
in the values obtained for I(0) and Rg by Guinier analysis compared to the 
values obtained by the P(r) analysis. In other words, either the P(r) values 
match the Guinier and the P(r) fit is terrible, or else the P(r) fit is good 
but doesn't match the Guinier at all (5-10 difference in Rg, 2x difference in 
I(0)). I've checked to make sure the buffer subtraction algorithm was OK, and 
I'm pretty certain that the buffers were exact matches, so I don't know how to 
explain this variation. There's no evidence of aggregation or polydispersity to 
throw off the values, either. Does anyone know how this can happen?
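
For reference, the Guinier estimate mentioned above comes from a straight-line fit of ln I(q) against q^2, using ln I(q) = ln I(0) - (Rg^2 / 3) q^2 at small q. The following is only a sketch on synthetic, noise-free data; the function name guinier_fit and all numbers are assumptions for illustration and are not taken from the measurements described here.

```python
import math

def guinier_fit(q, intensity):
    """Least-squares line through ln I versus q^2; returns (Rg, I0)."""
    x = [qi * qi for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    # slope = -Rg^2 / 3 and intercept = ln I(0)
    return math.sqrt(-3.0 * slope), math.exp(intercept)

# Synthetic example: Rg = 20 (in units of 1/q) and I(0) = 100.
true_rg, true_i0 = 20.0, 100.0
q = [0.01 + 0.005 * k for k in range(10)]  # q*Rg stays below about 1.3
intensity = [true_i0 * math.exp(-(qi * true_rg) ** 2 / 3.0) for qi in q]

rg, i0 = guinier_fit(q, intensity)
```

With real data the result depends on which q-range is fitted, which is one common reason for Guinier and P(r) values to disagree.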




Re: [ccp4bb] Problems with SANS data analysis

2013-08-07 Thread Ed Pozharski
This question may be better suited to a more small-angle-oriented forum, 
e.g.

http://www.saxier.org/forum/



--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs



[ccp4bb] Workshop Drug Target Crystallography and SBDD

2013-08-07 Thread Bernhard Rupp
Dear All,

In light of the recent studies emphasizing the need for careful analysis and
validation of protein-ligand complex structures, Ruben Abagyan from UCSD and I
are conducting an intense 2-day workshop on drug target/ligand structure
determination and the use of such X-ray models in structure-guided drug
discovery/design. Also covered will be the presentation of complex models via
interactive 3D documents, embedded in presentations and also on mobile devices.

It takes place in sunny San Diego (always worth visiting) on Mon, Oct 7 and
Tue, Oct 8, 2013. Costs are modest and one lucky participant will receive a
free copy of my book.

Details can be found on:
http://www.ruppweb.org/workshops/Molsoft_2013.htm


Hope to see you in San Diego and best wishes, BR
-
Bernhard Rupp
001 (925) 209-7429
+43 (676) 571-0536
b...@ruppweb.org
hofkristall...@gmail.com
http://www.ruppweb.org/
-
A little revolution now and then
is a healthy thing
-


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Jeffrey, Philip D.
Are all the APIs open source?  I was under the impression that CCP4 had moved 
away from that, which might justifiably reduce interest in any 
limited-availability API.

Phil Jeffrey
Princeton

From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of James Stroud 
[xtald...@gmail.com]
Sent: Wednesday, August 07, 2013 1:51 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] mmCIF as working format?

On Aug 5, 2013, at 4:33 AM, Eugene Krissinel wrote:
 I just hope that one day we all will be discussing a sort of universal API to 
 read/write structural information instead of referencing to raw formats, and 
 routines to query MX data, which would be more appropriate than grep (would 
 many SB students/postdocs use grep these days? but many of them would need to 
 inspect files somehow). This, in essence, is similar to discussing read/write 
 primitives in C/C++/Fortran rather than I/O functions of BIOS and HDD/BUS 
 commands that they drive.

I just want to reinforce this point by quoting it verbatim and also emphasize 
that it was not lost on some of us.

In the long term, the MM structure community should perhaps get its inspiration 
from SQL, which focuses on the scope of data and the semantics of its 
manipulation, rather than how the data is encoded beneath the surface.

James


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Eugene Krissinel
This is to confirm very publicly that CCP4 libraries (of which the APIs are one 
example) are open source and free to use. There are no plans to change this 
and, on the contrary, there is a consensus that it should stay as it is.

Eugene






Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ed Pozharski

On 08/07/2013 01:51 PM, James Stroud wrote:

In the long term, the MM structure community should perhaps get its inspiration 
from SQL

For this to work, a particular interface must monopolize access to 
structural data.  Then maintainers of that victorious interface could 
change the underlying format whichever way they want while supplying the 
never-ending stream of useful features.  And all other programs would be 
just frontends to the interface.  As long as the data format remains easily 
readable and there is more than one person willing to fiddle with code, 
persistence or at the very least backward compatibility of the data 
format will remain a (minor to me) issue.  It is also important that it 
is much easier to write a pdb parser in your favourite language than to 
implement a general-purpose relational database management system.


For full disclosure, I personally do not share the apocalyptic feeling 
about transition to mmCIF.


Cheers,

Ed.


--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread James Stroud
On Aug 7, 2013, at 1:06 PM, Ed Pozharski wrote:
 On 08/07/2013 01:51 PM, James Stroud wrote:
 In the long term, the MM structure community should perhaps get its 
 inspiration from SQL
 For this to work, a particular interface must monopolize access to structural 
 data.

Not necessarily, although the alternative pathway might be more idealistic and 
hence unrealistic.

All that needs to happen is that the community agree on

1. What is the finite set of essential/useful attributes of macromolecular 
structural data.
2. What is the syntax of (a) accessing and (b) modifying those attributes.
3. What is the syntax of selecting subsets of structural data based on those 
attributes.

The resulting syntax (i.e. language) itself should be terse, easy to learn, 
easy to use, and preferably easy to implement.

If such a standard is created, then I believe awk-ing/grep-ing/sed-ing/etc PDBs 
and mmCIFs would quickly become historical.

James


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Nat Echols
On Wed, Aug 7, 2013 at 12:54 PM, James Stroud xtald...@gmail.com wrote:

 All that needs to happen is that the community agree on

 1. What is the finite set of essential/useful attributes of macromolecular
 structural data.
 2. What is the syntax of (a) accessing and (b) modifying those attributes.
 3. What is the syntax of selecting subsets of structural data based on
 those attributes.

 The resulting syntax (i.e. language) itself should be terse, easy to
 learn, easy to use, and preferably easy to implement.


Ah, but the nice thing about mmCIF is that it isn't truly finite - the
PDB may limit what tags are actually included in the distributed files, but
there is nothing preventing other developers from including their own tags,
and there is a community process for extending the officially defined
tags.  Item (2) is very well-established, unlike the current chaos of
REMARK records.  I think (3) will be left to the various libraries to deal
with.

-Nat


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Frances C. Bernstein

 Nobody has addressed the fact that mmCIF is a format
that allows for many ways of presenting the same data.  The
recent discussions seem to be based on the assumption that
all mmCIF files will look like those currently prepared by
the PDB.

 Any code that reads an mmCIF file should be prepared to
read any file that meets the mmCIF specifications.  This
requires the use of software tools and it may not be possible
to use a simple script that works against PDB mmCIF entries
to read arbitrary mmCIF files.
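
 To illustrate the point, here is a minimal sketch of reading
atom_site rows by the tag names in the loop_ header rather than by
column position.  The function name read_atom_site and both sample
files are made up for this example.  It is not a full CIF parser: it
ignores multi-line text fields, and shlex quoting rules only
approximate CIF quoting (an unquoted value such as O5' would fail).

```python
import shlex

def read_atom_site(lines):
    """Yield one dict per _atom_site row, keyed by the tags in the loop header."""
    tags, reading_tags, reading_rows = [], False, False
    for raw in lines:
        line = raw.strip()
        if line == "loop_":
            tags, reading_tags, reading_rows = [], True, False
        elif reading_tags and line.startswith("_"):
            tags.append(line)
        elif (reading_tags or reading_rows) and line and not line.startswith(("#", "_")):
            reading_tags, reading_rows = False, True
            if tags and tags[0].startswith("_atom_site."):
                yield dict(zip(tags, shlex.split(line)))
        else:
            reading_tags = reading_rows = False

# Two hypothetical files carrying the same data with different column order.
pdb_style = """\
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
ATOM 1 N ALA A
ATOM 2 CA ALA A
#
""".splitlines()

other_style = """\
loop_
_atom_site.label_asym_id
_atom_site.label_comp_id
_atom_site.label_atom_id
_atom_site.id
_atom_site.group_PDB
A ALA N 1 ATOM
A ALA CA 2 ATOM
""".splitlines()

rows_a = list(read_atom_site(pdb_style))
rows_b = list(read_atom_site(other_style))
```

 A script that takes "the third field" would give different answers
for these two files; looking values up by tag gives the same rows for
both.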

 Or are people saying/hoping/redefining that mmCIF will
turn into a fixed column/field format?

Frances Bernstein

=
Bernstein + Sons
*   *   Information Systems Consultants
5 Brewster Lane, Bellport, NY 11713-2803
*   * ***
 *Frances C. Bernstein
  *   ***  f...@bernstein-plus-sons.com
 *** *
  *   *** 1-631-286-1339FAX: 1-631-286-1999
=


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread George Sheldrick
The flexibility of CIF is indeed infinite. Even the names of the 
unit-cell dimensions are different in mmCIF and (small molecule) core CIF.
One of the main reasons why I had to bring out a new version of SHELXL 
recently (SHELXL-2013 to replace SHELXL-97) was that in the meantime the 
COMCIFS committee of the IUCr had changed many of the names.

George



--
Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-33021 or -33068
Fax. +49-551-39-22582




Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ed Pozharski

James,

frankly, I am not sure which part of your description is not equivalent 
to a monopolistic interface.


If I understand your proposal and reference to SQL correctly, you want 
some scripting language that sounds like simple English.  Is the 
advantage over existing APIs here that one does not need to learn 
Python, C++, (or, heaven forbid, FORTRAN)?  I.e. programs would look 
like this


---
GRAB protein FROM FILE best_model_ever.cif;
SELECT CHAIN A FROM protein AS chA;
SET chA BFACTORS TO 30.0;
GRAB data FROM FILE best_data_ever.cif;
BIND protein TO data;
REFINE protein USING BUSTER WITH TLS+ANISO;
DROP protein INTO FILE better_model_yet.cif;
---

Not necessarily a bad idea but now through the fog of time I remember 
something oddly reminiscent... ah, CNS! (for those googling for it it's 
not the central nervous system :).


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Pete Meyer

Ed Pozharski wrote:
[snip]
If I understand your proposal and reference to SQL correctly, you want 
some scripting language that sounds like simple English.  Is the 
advantage over existing APIs here that one does not need to learn 
Python, C++, (or, heaven forbid, FORTRAN)?  I.e. programs would look 
like this


XML DOM is probably a better example of a standardized API to shoot for 
than SQL in this case.  Regardless of which language or library you use, 
getChildNodes still does the same thing (at least conceptually).
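
As a small sketch of what I mean, here is the same child-node operation in Python's standard xml.dom.minidom, where it is spelled childNodes.  The XML layout below is made up for illustration and is not PDBML or any real schema.

```python
from xml.dom import minidom

# Hypothetical XML layout, for illustration only.
doc = minidom.parseString(
    "<model><chain id='A'><residue name='ALA'/><residue name='GLY'/></chain></model>"
)

chain = doc.documentElement.firstChild  # the <chain> element
chain_id = chain.getAttribute("id")
# childNodes is the DOM operation that other bindings expose as getChildNodes().
residue_names = [node.getAttribute("name") for node in chain.childNodes]
```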


If the recommendation is that crystallographers should be using an API 
for data stored in a standardized format instead of parsing it 
themselves, then it would seem to make sense to me that the API should 
also be standardized (ideally with a well-documented reference 
implementation).


In some sense this is monopolistic - but hopefully it'd be a benevolent 
monopoly.  If I remember correctly, there was a time when the creator of 
Python referred to himself as the benevolent dictator for life of the 
project; and it turned out pretty well.


[snip]
Not necessarily a bad idea but now through the fog of time I remember 
something oddly reminiscent... ah, CNS! (for those googling for it it's 
not the central nervous system :).


I'm still impressed by the fact that a useful scripting language was 
implemented in fortran.


Pete


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread James Stroud
On Aug 7, 2013, at 2:35 PM, Ed Pozharski wrote:
 If I understand your proposal and reference to SQL correctly, you want some 
 scripting language that sounds like simple English.

I didn't say anything about being English-like. English and other natural 
languages are ill-adapted to describing the well-defined operations one might 
perform on a data structure.

 Is the advantage over existing APIs here that one does not need to learn 
 Python, C++, (or, heaven forbid, FORTRAN)?

Anyone can learn Python in an hour and a half. That's not an issue (except for 
whitespace nuts). If one wants to use Python to modify PDB structural data, I 
recommend starting with the tutorial I wrote for CCTBX: 
http://cctbxwiki.bravais.net/CCTBX_Wiki#Working_with_pdb_Files

The advantage of a language over an API is that an API requires coding overhead 
and must (by the definition of API) be part of an Application. SQL has no 
such requirement and neither would an ideal language for *selecting* and 
*modifying* macromolecular structural data. In SQL, one can make selections and 
modifications without importing libraries, defining a main function, declaring 
variables, etc. Low overhead is probably the reason so many crystallographers 
(myself not included) are fluent in the likes of awk.

 I.e. programs would look like this
 
 ---
 GRAB protein FROM FILE best_model_ever.cif;
 SELECT CHAIN A FROM protein AS chA;
 SET chA BFACTORS TO 30.0;
 GRAB data FROM FILE best_data_ever.cif;
 BIND protein TO data;
 REFINE protein USING BUSTER WITH TLS+ANISO;
 DROP protein INTO FILE better_model_yet.cif;
 ---
 
 Not necessarily a bad idea but now through the fog of time I remember 
 something oddly reminiscent... ah, CNS! (for those googling for it it's not 
 the central nervous system :).

Although a little too much like natural language, it is not a bad idea. But, 
where is the link describing the layer of CNS that looks like that? In my 
X-Plor 3.1 manual (Yale University Press, 1987) I see nothing remotely like 
what you describe. CNS, according to the most recent tutorial for 1.3, looks 
like this:

topology
  evaluate ($counter=1)
  evaluate ($done=false)
  while ( $done = false ) loop read
    if ( exist_topology_infile_$counter = true ) then
      if ( BLANK%topology_infile_$counter = false ) then
        @@topology_infile_$counter
      end if
    else
      evaluate ($done=true)
    end if
    evaluate ($counter=$counter+1)
  end loop read
end

This example makes a point about the problems of APIs. Namely, they require 
loops and tests, and lack a true selection mechanism, except perhaps for the 
scripting layer of CNS. But even with CNS, once you have a selection, you must 
loop over it to modify the data.

Although it is likely the best library for working with structural data, 
CCTBX requires a loop just to change a specific chain ID (to the best of my 
knowledge):

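from iotbx import pdb  # CCTBX's PDB I/O module

# Read the model and build the model/chain/residue/atom hierarchy.
pdb_inp = pdb.input(file_name="best-model.pdb")
hierarchy = pdb_inp.construct_hierarchy()
# Rename chain A to B by visiting every chain of every model.
for model in hierarchy.models():
  for chain in model.chains():
    if chain.id == "A":
      chain.id = "B"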

I don't intend to pick on CCTBX specifically (because the CCTBX developers have 
specific needs to which they program), but loop/test mechanisms are awkward for 
selecting and modifying structural data, and get much more awkward as 
selections get more complex (e.g. selecting the C-alpha of every alanine of 
chain A, etc.).
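
To make the contrast concrete, here is a rough sketch of the SQL-style alternative, using Python's built-in sqlite3 module only as a convenient way to run the statements. The table layout, column names and rows are hypothetical and chosen for illustration; no existing package stores structures this way as far as this example assumes.

```python
import sqlite3

# Hypothetical table layout and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE atom (serial INTEGER, name TEXT, resname TEXT, chain TEXT, resseq INTEGER)"
)
conn.executemany(
    "INSERT INTO atom VALUES (?, ?, ?, ?, ?)",
    [
        (1, "N", "ALA", "A", 1),
        (2, "CA", "ALA", "A", 1),
        (3, "CA", "GLY", "A", 2),
        (4, "CA", "ALA", "B", 1),
    ],
)

# Selection: the C-alpha of every alanine of chain A, with no explicit loop or test.
ca_ala_a = conn.execute(
    "SELECT serial FROM atom WHERE chain = 'A' AND resname = 'ALA' AND name = 'CA'"
).fetchall()

# Modification: rename chain A to C in a single statement.
conn.execute("UPDATE atom SET chain = 'C' WHERE chain = 'A'")
chains = [row[0] for row in conn.execute("SELECT DISTINCT chain FROM atom ORDER BY chain")]
```

The selection and the modification are each one declarative statement, and a more complex selection only lengthens the WHERE clause.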

James

Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Nat Echols
On Wed, Aug 7, 2013 at 2:36 PM, James Stroud xtald...@gmail.com wrote:

 Although it is likely the best library for working with structural data,
 CCTBX requires a loop just to change a specific chain ID (to the best of my
 knowledge):

 ...

 I don't intend to pick on CCTBX specifically (because the CCTBX developers
 have specific needs to which they program), but loop/test mechanisms are
 awkward for selecting and modifying structural data, and get much more
 awkward as selections get more complex (e.g. selecting the C-alpha of every
 alanine of chain A, etc.).


True - it's really an issue of what purpose the libraries were designed
for.  CCTBX wasn't intended to be a general-purpose tool for users to
perform quick manipulations of a model; the goal was to build large,
complex, and more-or-less automated crystallography applications on top of
it.  (The same applies to the CCP4 libraries, mmdb, clipper, etc.;
BioPython I guess is designed for bioinformatics.)  The design of CNS (for
example) reflects an era where it was much more likely that the average
crystallographer knew some programming, worked exclusively on the command
line, built new models manually, and didn't have access to a large number
of convenient tools for purposes like this.  (Or so I've heard; I was
still in high school.)

Personally, if I need to change a chain ID, I can use Coot or pdbset or
many other tools.  Writing code for this should only be necessary if you're
processing large numbers of models, or have a spectacularly misformatted
PDB file.  Again, I'll repeat what I said before: if it's truly necessary
to view or edit a model by hand or with custom shell scripts, this often
means that the available software is deficient.  PLEASE tell the developers
what you need to get your job done; we can't read minds.

-Nat


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Richard Gildea
The cctbx provides comprehensive tools for handling mmcif files (and indeed all 
types of cif files - it is not fussy), freely available under the BSD-style 
cctbx licence.

Cheers,

Richard





Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ed Pozharski

James,

On 08/07/2013 05:36 PM, James Stroud wrote:
Anyone can learn Python in an hour and a half. 


Isn't this a bit of an exaggeration?  Python is designed to be easy to 
learn, but we are probably talking about different definitions of 
"learning" and "anyone".



I.e. programs would look like this

---
GRAB protein FROM FILE best_model_ever.cif;
SELECT CHAIN A FROM protein AS chA;
SET chA BFACTORS TO 30.0;
GRAB data FROM FILE best_data_ever.cif;
BIND protein TO data;
REFINE protein USING BUSTER WITH TLS+ANISO;
DROP protein INTO FILE better_model_yet.cif;
---

Not necessarily a bad idea but now through the fog of time I remember something oddly 
reminiscent... ah, CNS! (for those googling for it it's not the central nervous 
system :).
Although a little too much like natural language, it is not a bad idea. But, 
where is the link describing the layer of CNS that looks like that?


I should probably use <tongue-in-cheek>...</tongue-in-cheek> markup next 
time to prevent my poor attempt at a humorous tribute to CNS from being 
understood so literally.  At the very least you might agree that CNS is 
the closest thing we ever had to an MX-oriented general-purpose 
interpreter.  Your quote is also from the 
below-the-magic-line-do-not-change area of a CNS script.


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ed Pozharski

On 08/07/2013 05:54 PM, Nat Echols wrote:
Personally, if I need to change a chain ID, I can use Coot or pdbset 
or many other tools.  Writing code for this should only be necessary 
if you're processing large numbers of models, or have a spectacularly 
misformatted PDB file.  Again, I'll repeat what I said before: if it's 
truly necessary to view or edit a model by hand or with custom shell 
scripts, this often means that the available software is deficient.  
PLEASE tell the developers what you need to get your job done; we 
can't read minds.


Nat,

I don't think anyone here really means that the only way to change a 
chain ID is to write, say, a perl script.  But an interpreter of the 
kind advocated by James (as much as I have hijacked/misinterpreted his 
vision) could indeed be very useful for people pursuing simple 
bioinformatics projects and new ways to analyse structural models. While 
I understand your view that everyone should seek assistance from 
developers with every problem encountered, I also recall some 
reasonable idea about self-sufficiency that should cover scientific 
research (something like give a man a fish and you feed him for a day, 
teach him to fish and he starts paying taxes... something along these 
lines ;).  There is a difference between tools that allow one to easily 
perform useful non-standard analysis and highly specialized tools that 
strive to cover every situation imaginable.


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ethan Merritt
On Wednesday, August 07, 2013 04:00:16 pm Ed Pozharski wrote:
 On 08/07/2013 05:54 PM, Nat Echols wrote:
  Personally, if I need to change a chain ID, I can use Coot or pdbset 
  or many other tools.  Writing code for this should only be necessary 
  if you're processing large numbers of models, or have a spectacularly 
  misformatted PDB file.  Again, I'll repeat what I said before: if it's 
  truly necessary to view or edit a model by hand or with custom shell 
  scripts, this often means that the available software is deficient.  
  PLEASE tell the developers what you need to get your job done; we 
  can't read minds.
 
 Nat,
 
 I don't think anyone here really means that the only way to change a 
 chain ID is to write, say, a perl script.  But an interpreter of the 
 kind advocated by James (as much as I have hijacked/misinterpreted his 
 vision) could indeed be very useful for people pursuing simple 
 bioinformatics projects and new ways to analyse structural models. 

We tackled this a while back for the then-current incarnation of mmCIF.

   http://www.bmsc.washington.edu/parvati/mmLib.pdf

I suppose it will all have to be revisited so that it knows the quirks,
features, and foibles of the new and improved mmCIF.

Ethan


 While 
 I understand your view that everyone should seek assistance from 
 developers with every problem encountered, I also recall some 
 reasonable idea about self-sufficiency that should cover scientific 
 research (something like give a man a fish and you feed him for a day, 
 teach him to fish and he starts paying taxes... something along these 
 lines ;).  There is a difference between tools that allow one to easily 
 perform useful non-standard analysis and highly specialized tools that 
 strive to cover every situation imaginable.
 
 Cheers,
 
 Ed.
 
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Jeffrey, Philip D.
 I.e. programs would look like this

 ---
 GRAB protein FROM FILE best_model_ever.cif;
 SELECT CHAIN A FROM protein AS chA;
 SET chA BFACTORS TO 30.0;
 GRAB data FROM FILE best_data_ever.cif;
 BIND protein TO data;
 REFINE protein USING BUSTER WITH TLS+ANISO;
 DROP protein INTO FILE better_model_yet.cif;
 ---

This brings to mind James Holton's Elves program(s):
http://bl831.als.lbl.gov/~jamesh/elves/

Phil Jeffrey
Princeton


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Jeffrey, Philip D.
 Nat Echols wrote:
 Personally, if I need to change a chain ID, I can use Coot or pdbset or many 
 other tools.  Writing code for
 this should only be necessary if you're processing large numbers of models, 
 or have a spectacularly
 misformatted PDB file.

Problem.  Coot is bad at the chain label aspect.
Create a pdb file containing residues A1-A20 and X101-X120 - non-overlapping 
numbering.
Try to change the chain label of X to A.
I get WARNING:: CONFLICT: chain id already exists in this molecule

This is (IMHO) a bizarre feature because this is exactly the sort of thing you 
do when building structures.

Therefore I do one of two things:
1.  Open it in (x)emacs, replace " X " with " A " and Bob's your uncle.
2.  Start Peek2 - that's my interactive program for doing simple and stupid 
things like this.  I type "read test.pdb" and "chain", Peek2 prompts me at 
perceived chain breaks (change in chain label, CA-CA breaks, ATOM/HETATM 
transitions, etc.), and then I type "write test.pdb".  Takes less than 10 
seconds.  CCP4i would probably still be launching, as would Phenix.

The reason I do #1 or #2 is not to be a Luddite, but to do something trivial 
and boring quickly so I can get back to something interesting like building 
structures, or beating subjects to death on CCP4bb.
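
For anyone who would rather script option #1, here is a minimal Python sketch 
of the same relabel.  It is not Peek2 and not what Coot does internally; the 
function names are made up for illustration.  It assumes standard fixed-column 
PDB records (chain ID in column 22, residue number plus insertion code in 
columns 23-27) and one string per record.  It refuses to merge two chains 
only when their residue numbering actually overlaps.

```python
COORD_RECORDS = ("ATOM", "HETATM", "ANISOU", "TER")


def residue_keys(lines, chain):
    """Return the residue number + insertion code strings used by `chain`."""
    return {
        line[22:27]
        for line in lines
        if line.startswith(COORD_RECORDS) and len(line) >= 27 and line[21] == chain
    }


def rename_chain(lines, old, new):
    """Return a copy of `lines` with chain `old` relabelled as `new`."""
    if len(old) != 1 or len(new) != 1:
        raise ValueError("PDB chain IDs are a single character")
    clash = residue_keys(lines, old) & residue_keys(lines, new)
    if clash:
        raise ValueError(f"residue numbers used by both chains: {sorted(clash)}")
    out = []
    for line in lines:
        # Chain ID lives in column 22, i.e. index 21 of the record.
        if line.startswith(COORD_RECORDS) and len(line) > 21 and line[21] == old:
            line = line[:21] + new + line[22:]
        out.append(line)
    return out
```

With residues A1-A20 and X101-X120, rename_chain(lines, "X", "A") goes 
through, since the numbering does not overlap.  Records are relabelled in 
place and not reordered.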

What's lacking is an interactive, or just plain fast method in any guise, way 
of doing simple PDB manipulations that we do tons of times when building 
protein structures.  I've used Peek2 thousands of times for this purpose, which 
is the only reason it still exists because it's a fairly stupid program.  A 
truly interactive version of PDBSET would be splendid.  But, again, it always 
runs in batch mode.

mmCIF looked promising, apropos emacs, when I looked at the spec page at:
http://www.iucr.org/__data/iucr/cifdic_html/2/cif_mm.dic/Catom_site.html
because that ATOM data is column-formatted.  Cool.  However looking at 6LYZ.cif 
from RCSB's site revealed that the XYZ's were LEFT-justified: 
http://www.rcsb.org/pdb/files/6LYZ.cif
which makes me recoil in horror and resolve to use PDB format until someone 
puts a gun to my head.

Really, guys, if you can put multiple successive spaces to the RIGHT of the 
number, why didn't you put them to the LEFT of it instead ?  Same parsing, 
better readability.
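
The "same parsing" claim is easy to check with a short Python sketch.  It 
assumes simple whitespace-delimited rows with no quoted values containing 
spaces (real mmCIF allows quoted strings, which a plain split() would break). 
The helper name right_justify is hypothetical, and the example values are 
made up, not taken from 6LYZ.

```python
def right_justify(rows):
    """Re-pad whitespace-delimited rows so every column is right-aligned."""
    table = [row.split() for row in rows]
    # Column width = longest token in that column (rows assumed equal length).
    widths = [max(len(cell) for cell in column) for column in zip(*table)]
    return [
        " ".join(cell.rjust(width) for cell, width in zip(cells, widths))
        for cells in table
    ]


left_padded = [
    "ATOM 1 N 3.287   10.092  10.329 ",
    "ATOM 10 CA 13.5    -1.25   0.5    ",
]
realigned = right_justify(left_padded)

# A whitespace split sees identical tokens either way.
assert [row.split() for row in realigned] == [row.split() for row in left_padded]

for row in realigned:
    print(row)
```

Only the padding moves; any parser that splits on whitespace cannot tell the 
two layouts apart.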

Phil Jeffrey
Princeton
(using the vernacular but deathly serious about protein structure)








Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Andrew Purkiss-Trew

Quoting Jeffrey, Philip D. pjeff...@princeton.edu:


 Nat Echols wrote:
Personally, if I need to change a chain ID, I can use Coot or  
pdbset or many other tools.  Writing code for
this should only be necessary if you're processing large numbers of  
models, or have a spectacularly

misformatted PDB file.


Problem.  Coot is bad at the chain label aspect.
Create a pdb file containing residues A1-A20 and X101-X120 -  
non-overlapping numbering.

Try to change the chain label of X to A.
I get WARNING:: CONFLICT: chain id already exists in this molecule



Having had to show this to a student today, it does work fine if you  
select the "Use Residue Range" option rather than changing the whole  
chain. Not quite so convenient, but at least it makes the user think.






Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ethan Merritt
On Wednesday, August 07, 2013 04:54:39 pm Jeffrey, Philip D. wrote:
  Nat Echols wrote:
  Personally, if I need to change a chain ID, I can use Coot or pdbset or 
  many other tools.  Writing code for
  this should only be necessary if you're processing large numbers of models, 
  or have a spectacularly
  misformatted PDB file.
 
 Problem.  Coot is bad at the chain label aspect.
 Create a pdb file containing residues A1-A20 and X101-X120 - non-overlapping 
 numbering.
 Try to change the chain label of X to A.
 I get WARNING:: CONFLICT: chain id already exists in this molecule

That would be a bug.  But it hasn't been true for any version of coot
that I have used.  As you say, this is a common thing to do and I am
certain I would have noticed if it didn't work. I just checked that
it isn't true for 0.7.1-pre.

What _is_ true is that renaming X to A in this case will not re-order
the residues in the file.  So if you had A1-100 followed by B1-10
followed by X101-200 there would not be a peptide link between A100 and
A(old X)101 after the renaming.
To fix this you need to write out the file and use an editor to move the
records for A101-200 to immediately after the records for A1-100.
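
The editor step can also be scripted.  A minimal Python sketch, assuming the 
input holds only ATOM/HETATM records in fixed-column PDB format (chain ID in 
column 22) and that records within each chain are already in the desired 
order; group_chains is a made-up name:

```python
def group_chains(lines):
    """Regroup coordinate records so each chain's records are contiguous."""
    groups = {}  # dicts keep insertion order: chains stay in order of first appearance
    for line in lines:
        groups.setdefault(line[21], []).append(line)
    return [line for chain_lines in groups.values() for line in chain_lines]
```

For A1-100, B1-10, A101-200 this yields A1-100, A101-200, B1-10.  Atom serial 
numbers are left untouched, so they will be out of sequence afterwards.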

This does illustrate the point that expecting all tools to handle all
possible manipulations is unrealistic.  I think there will always be a
need for a separate tool that can do anything imaginable, whether that
tool is vi or emacs or some spiffy new mmCIF editing GUI.

The problem with this is that any tool capable of arbitrarily editing
your file is also capable of subtly mangling your file.  The current PDB
format is horribly sensitive to this.  For example if you
reorder/renumber/relabel ATOM records in a PDB file then references to them
in the header records (TLS, SITE, etc) and LINK/CONECT records will now point
to the wrong atoms.   I am not convinced that the new mmCIF format has gotten
this quite right either, at least in the examples given, but it does have the
flexibility to attach such links or properties directly to the ATOM record
where it is more likely to be carried along correctly if moved. 
That by itself is IMHO enough to justify the switch from PDB to mmCIF.
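
To make the PDB-format bookkeeping concrete, here is a minimal Python sketch 
of what a renumbering tool has to do just for CONECT records.  It assumes one 
string per record with no trailing newline, unique atom serial numbers in 
columns 7-11, and CONECT serials in 5-column fields starting at column 7. 
The function name is made up, and TLS, SITE, LINK, ANISOU and TER records are 
deliberately not handled here.

```python
def renumber_with_conect(lines):
    """Renumber ATOM/HETATM serials from 1 and rewrite CONECT records to match."""
    atom_tags = ("ATOM", "HETATM")

    # First pass: map each old serial to its new position in the file.
    mapping = {}
    for line in lines:
        if line.startswith(atom_tags):
            mapping[int(line[6:11])] = len(mapping) + 1

    # Second pass: rewrite serial numbers wherever they are referenced.
    out = []
    for line in lines:
        if line.startswith(atom_tags):
            line = line[:6] + f"{mapping[int(line[6:11])]:5d}" + line[11:]
        elif line.startswith("CONECT"):
            body = line.rstrip()
            fields = [body[i:i + 5] for i in range(6, len(body), 5)]
            # A KeyError here means CONECT refers to an atom that no longer exists.
            serials = [mapping[int(field)] for field in fields if field.strip()]
            line = "CONECT" + "".join(f"{serial:5d}" for serial in serials)
        out.append(line)
    return out
```

Skipping the second branch is exactly the subtle mangling described above: 
the file still parses, but the bonds point at the wrong atoms.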

Ethan


 
 This is (IMHO) a bizarre feature because this is exactly the sort of thing 
 you do when building structures.
 
 Therefore I do one of two things:
 1.  Open it in (x)emacs, replace " X " with " A " and Bob's your uncle.
 2.  Start Peek2 - that's my interactive program for doing simple and stupid 
 things like this.  I type "read test.pdb" and "chain", Peek2 prompts me at 
 perceived chain breaks (change in chain label, CA-CA breaks, ATOM/HETATM 
 transitions, etc.), and then I type "write test.pdb".  Takes less than 10 
 seconds.  CCP4i would probably still be launching, as would Phenix.
 
 The reason I do #1 or #2 is not to be a Luddite, but to do something trivial 
 and boring quickly so I can get back to something interesting like building 
 structures, or beating subjects to death on CCP4bb.
 
 What's lacking is an interactive, or just plain fast method in any guise, way 
 of doing simple PDB manipulations that we do tons of times when building 
 protein structures.  I've used Peek2 thousands of times for this purpose, 
 which is the only reason it still exists because it's a fairly stupid 
 program.  A truly interactive version of PDBSET would be splendid.  But, 
 again, it always runs in batch mode.
 
 mmCIF looked promising, apropos emacs, when I looked at the spec page at:
 http://www.iucr.org/__data/iucr/cifdic_html/2/cif_mm.dic/Catom_site.html
 because that ATOM data is column-formatted.  Cool.  However looking at 
 6LYZ.cif from RCSB's site revealed that the XYZ's were LEFT-justified: 
 http://www.rcsb.org/pdb/files/6LYZ.cif
 which makes me recoil in horror and resolve to use PDB format until someone 
 puts a gun to my head.
 
 Really, guys, if you can put multiple successive spaces to the RIGHT of the 
 number, why didn't you put them to the LEFT of it instead ?  Same parsing, 
 better readability.
 
 Phil Jeffrey
 Princeton
 (using the vernacular but deathly serious about protein structure)
 
 
 
 
 
 
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742