Re: [ccp4bb] mmCIF as working format?

2013-08-08 Thread Tim Gruene

Dear Paul,

my email was meant to be provocative but neither insulting nor offensive
(having provoked quite a few responses when I last used the word
'offense', let me add that this email does not suffer from a subtle
misinterpretation on my part from not writing in my mother tongue. The
German 'Provokation' is something a scientist would welcome, since it is
criticism expressed in a rhetorically pleasing or intellectually amusing
way, and criticism drives science forward).

I am very grateful never to have met a non-helpful developer when I
addressed one with a request or suggestion and I fully agree with you.

I rather meant to point out that most developers are usually
overwhelmed with work, suggestions, or ideas for improvements, and for
that reason I think having formats that allow users to help themselves
or each other (while of course they still can suggest their ideas to
the developers) is a good thing, and having a format that only allows
access through some API (or help from a developer) is not.

I would also like to point out that my initial fear that we were
moving away from such a format with the replacement of PDB with mmCIF
has been soothed by this discussion, and hence the content of my
previous paragraph is deprecated w.r.t. this thread's context.

Regards,
Tim

On 08/08/2013 05:29 PM, Paul Adams wrote:
> Tim,
> 
> I'm sure your email was tongue-in-cheek, but it's provocative 
> nevertheless. I suspect that Nat's point was that scientific
> software developers (who are predominantly scientists of course)
> are helpful people who want to see their field of research be
> successful. If it is possible to spend an hour writing a tool that
> helps several thousand researchers to do their work that's probably
> a valuable use of time. An enlightened funding agency might even
> see the value! Sometimes it's a challenge to figure out exactly
> what would be of most help, hence Nat's plea for input. I don't
> know about other software development efforts, but we're very happy
> to get ideas and suggestions from researchers - just don't assume
> that we can implement them all (by tomorrow)!
> 
> Cheers, Paul
> 
> On Aug 8, 2013, at 12:17 AM, Tim Gruene  
> wrote:
> 
> 
> 
> On 08/07/2013 11:54 PM, Nat Echols wrote:
> 
> >> PLEASE tell the developers what you need to get your job
> >> done; we can't read minds.
> >>
> >> -Nat
> >>
> 
> Dear Nat,
> 
> I have a student working for me until the end of the month. I asked
>  her to calculate the mean ratio of U(H)/U(X) where X is the atom
> the corresponding hydrogen is bound to. I would like her to group 
> together as follows:
> 
> 1) all N-H and O-H within that protein
> 2) all Calpha-Halpha
> 3) all remaining C-H bonds
> 4) all O-H from the H2O and H3O in the structure.
> 
> I am not sure whom to address this request to, so please forward
> it to the developer. If the code would actually work on a shelxl 
> res-file it would be brilliant. I shall not ask George for this 
> software because as a scientist he has much more important and
> much more general problems to work on than this.
> 
> At the moment the person is doing it by hand which might take a
> day. So if you could return the code by tomorrow that would be
> nice.
> 
> Out of the tens of thousands of crystallographers coming up with 
> funny ideas (because, yes, you cannot read minds) you might
> receive such requests several times a day. And you seriously think
> this is the way we should go? Bless your funding agencies.
> 
> Cheers, Tim
> 
> P.S.: I found this discussion about mmCIF quite interesting, and 
> since I was reminded that mmCIF is still kind of line oriented, I
> am pretty relieved. I just don't think that a 'universal' API
> exists - the student I am talking about does not know any
> programming language at all, and the next student might require an
> API in scheme, ruby, java, C#++-3.141, fortran-123, ...
> 

-- 
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A


Re: [ccp4bb] mmCIF as working format?

2013-08-08 Thread Paul Adams
Tim,

  I'm sure your email was tongue-in-cheek, but it's provocative nevertheless. I 
suspect that Nat's point was that scientific software developers (who are 
predominantly scientists of course) are helpful people who want to see their 
field of research be successful. If it is possible to spend an hour writing a 
tool that helps several thousand researchers to do their work that's probably a 
valuable use of time. An enlightened funding agency might even see the value! 
Sometimes it's a challenge to figure out exactly what would be of most help, 
hence Nat's plea for input. I don't know about other software development 
efforts, but we're very happy to get ideas and suggestions from researchers - 
just don't assume that we can implement them all (by tomorrow)! 

  Cheers,
Paul

On Aug 8, 2013, at 12:17 AM, Tim Gruene  wrote:

> 
> 
> 
> On 08/07/2013 11:54 PM, Nat Echols wrote:
> 
>> PLEASE tell the developers what you need to get your job done; we
>> can't read minds.
>> 
>> -Nat
>> 
> 
> Dear Nat,
> 
> I have a student working for me until the end of the month. I asked
> her to calculate the mean ratio of U(H)/U(X) where X is the atom the
> corresponding hydrogen is bound to. I would like her to group together
> as follows:
> 
> 1) all N-H and O-H within that protein
> 2) all Calpha-Halpha
> 3) all remaining C-H bonds
> 4) all O-H from the H2O and H3O in the structure.
> 
> I am not sure whom to address this request to, so please forward it to
> the developer. If the code would actually work on a shelxl res-file
> it would be brilliant. I shall not ask George for this software
> because as a scientist he has much more important and much more
> general problems to work on than this.
> 
> At the moment the person is doing it by hand which might take a day.
> So if you could return the code by tomorrow that would be nice.
> 
> Out of the tens of thousands of crystallographers coming up with funny
> ideas (because, yes, you cannot read minds) you might receive such
> requests several times a day. And you seriously think this is the way
> we should go? Bless your funding agencies.
> 
> Cheers,
> Tim
> 
> P.S.: I found this discussion about mmCIF quite interesting, and
> since I was reminded that mmCIF is still kind of line oriented, I am
> pretty relieved. I just don't think that a 'universal' API exists -
> the student I am talking about does not know any programming language
> at all, and the next student might require an API in scheme, ruby,
> java, C#++-3.141, fortran-123, ...
> -- 
> Dr Tim Gruene
> Institut fuer anorganische Chemie
> Tammannstr. 4
> D-37077 Goettingen
> 
> GPG Key ID = A46BEE1A
> 

-- 
Paul Adams
Deputy Division Director, Physical Biosciences Division, Lawrence Berkeley Lab
Division Deputy for Biosciences, Advanced Light Source, Lawrence Berkeley Lab
Adjunct Professor, Department of Bioengineering, U.C. Berkeley
Vice President for Technology, the Joint BioEnergy Institute
Laboratory Research Manager, ENIGMA Science Focus Area

Building 64, Room 248
Building 80, Room 247
Building 978, Room 4126
Tel: 1-510-486-4225, Fax: 1-510-486-5909
http://cci.lbl.gov/paul

Lawrence Berkeley Laboratory
1 Cyclotron Road
BLDG 64R0121
Berkeley, CA 94720, USA.

Executive Assistant: Louise Benvenue [ lbenve...@lbl.gov ][ 1-510-495-2506 ]
--


Re: [ccp4bb] mmCIF as working format?

2013-08-08 Thread Phil Jeffrey

On 8/7/13 8:27 PM, Ethan Merritt wrote:

> That would be a bug.  But it hasn't been true for any version of coot
> that I have used.  As you say, this is a common thing to do and I am
> certain I would have noticed if it didn't work. I just checked that
> it isn't true for 0.7.1-pre.


Thanks.
Turns out I'm using 0.7 and 0.7-pre on the octacore Mac and the laptop I 
use for building - slightly different versions updated at different 
times.  I'll change versions.


Apropos the other point, I invariably do segment reordering via xemacs 
cut and paste, although clearly Peek2 needs a "reorder" command.


Phil


Re: [ccp4bb] mmCIF as working format?

2013-08-08 Thread Robbie Joosten
Apart from editors we also need tools to validate mmCIF files for integrity, 
similar to what W3C has for (x)html and css.

I've mostly dealt with mmCIF reflection files, so my experience with what can go 
wrong is limited. So far, I have encountered these 'issues', which could be flagged:

1) Data items given twice with different values. This is ambiguous; I suppose most 
parsers will use the last value given.

2) Values that should not occur for a specific data item. E.g. 19 in 
_refln.status

3) Proper closing of text blocks.

4) Things that can go in one loop, should go in one loop. I've seen examples 
where the Fmean and sigF are in one loop and I+ and I- are in another. It's not 
wrong, but annoying.

5) Proper space delimited values in loops.

6) Wrapping. Should this be allowed or not? I'm not a fan...

7) Data given in plain text or in new data items, even though proper data items 
exist.

8) Silly data such as negative amplitudes, suspiciously high values for h,k,l 
(such as 999), intensities between -180 and 180

There must be more things that could be checked.
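Two of the checks listed above (items 1 and 3) are simple enough to sketch in a few lines of Python. This is a minimal, line-based illustration under stated assumptions (a single data block, tags at the start of their line, comments ignored), not a replacement for a real CIF parser or validator; the function name `check_cif_text` is hypothetical.

```python
from collections import Counter


def check_cif_text(text):
    """Flag data items given more than once (item 1) and unclosed
    semicolon text fields (item 3) in the text of one CIF data block.

    Sketch only: assumes one data block, tags at the start of a line,
    and does not treat '#' comments specially.
    """
    warnings = []
    tag_counts = Counter()
    open_text_line = None  # line number of a text field not yet closed
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith(";"):
            # A ';' in column 1 opens a text field, or closes the open one.
            open_text_line = None if open_text_line else lineno
            continue
        if open_text_line:
            continue  # inside a text field: its contents are not tags
        stripped = line.strip()
        if stripped.startswith("_"):
            # CIF tags are case-insensitive, so count them in lower case.
            tag_counts[stripped.split()[0].lower()] += 1
    for tag, count in tag_counts.items():
        if count > 1:
            warnings.append(f"data item {tag} given {count} times")
    if open_text_line:
        warnings.append(f"text field opened at line {open_text_line} is never closed")
    return warnings
```

Tags in loop headers are counted too, so an item that appears both in a loop and as a single key-value pair is also reported.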

Cheers,
Robbie


-Original message-
From: Ethan Merritt
Sent: 8-8-2013 2:28
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] mmCIF as working format?



Re: [ccp4bb] mmCIF as working format?

2013-08-08 Thread Tim Gruene



On 08/07/2013 11:54 PM, Nat Echols wrote:

> PLEASE tell the developers what you need to get your job done; we
> can't read minds.
> 
> -Nat
> 

Dear Nat,

I have a student working for me until the end of the month. I asked
her to calculate the mean ratio of U(H)/U(X) where X is the atom the
corresponding hydrogen is bound to. I would like her to group together
as follows:

1) all N-H and O-H within that protein
2) all Calpha-Halpha
3) all remaining C-H bonds
4) all O-H from the H2O and H3O in the structure.
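For the record, the grouping itself is short once the atoms have been read. A minimal sketch, assuming the hydrogens have already been extracted from the .res file into dictionaries with hypothetical keys (`resname`, `parent_name`, `parent_element`, `u_h`, `u_parent`); parsing the SHELXL file is not shown, and the residue names used to recognise water and hydronium are an assumption.

```python
from statistics import mean

# Residue names treated as water/hydronium: an assumption for this sketch.
WATER_RESNAMES = {"HOH", "WAT", "H3O"}


def classify(h):
    """Assign one hydrogen to one of the four requested groups."""
    if h["resname"] in WATER_RESNAMES:
        return "4) O-H in H2O/H3O"
    if h["parent_name"] == "CA":
        return "2) Calpha-Halpha"
    if h["parent_element"] in ("N", "O"):
        return "1) protein N-H and O-H"
    if h["parent_element"] == "C":
        return "3) remaining C-H"
    return "other"  # e.g. S-H


def mean_u_ratios(hydrogens):
    """Mean U(H)/U(X) per group, X being the atom the hydrogen is bound to."""
    ratios = {}
    for h in hydrogens:
        ratios.setdefault(classify(h), []).append(h["u_h"] / h["u_parent"])
    return {group: mean(values) for group, values in ratios.items()}
```

Riding hydrogens whose Uiso is written in the .res file as a negative number (e.g. -1.2) are constrained by SHELXL to that multiple of the Ueq of the preceding non-hydrogen atom, so their ratios are fixed by construction and would probably need to be excluded.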

I am not sure whom to address this request to, so please forward it to
the developer. If the code would actually work on a shelxl res-file
it would be brilliant. I shall not ask George for this software
because as a scientist he has much more important and much more
general problems to work on than this.

At the moment the person is doing it by hand which might take a day.
So if you could return the code by tomorrow that would be nice.

Out of the tens of thousands of crystallographers coming up with funny
ideas (because, yes, you cannot read minds) you might receive such
requests several times a day. And you seriously think this is the way
we should go? Bless your funding agencies.

Cheers,
Tim

P.S.: I found this discussion about mmCIF quite interesting, and
since I was reminded that mmCIF is still kind of line oriented, I am
pretty relieved. I just don't think that a 'universal' API exists -
the student I am talking about does not know any programming language
at all, and the next student might require an API in scheme, ruby,
java, C#++-3.141, fortran-123, ...
-- 
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A



Re: [ccp4bb] mmCIF as working format?

2013-08-08 Thread Phil Evans
I hope that some [X]Emacs expert can rewrite Charlie Bond's wonderful pdb-mode 
to work with mmCIF files (or at least the coordinate bits)

… for exactly the reasons Phil Jeffrey points out

Phil

On 8 Aug 2013, at 00:54, "Jeffrey, Philip D."  wrote:

>  Nat Echols wrote:
> > Personally, if I need to change a chain ID, I can use Coot or pdbset or 
> > many other tools.  Writing code for 
> > this should only be necessary if you're processing large numbers of models, 
> > or have a spectacularly 
> > misformatted PDB file.
> 
> Problem.  Coot is bad at the chain label aspect.
> Create a pdb file containing residues A1-A20 and X101-X120 - non-overlapping 
> numbering.
> Try to change the chain label of X to A.
> I get "WARNING:: CONFLICT: chain id already exists in this molecule"
> 
> This is (IMHO) a bizarre feature because this is exactly the sort of thing 
> you do when building structures.
> 
> Therefore I do one of two things:
> 1.  Open it in (x)emacs, replace " X " with " A " and Bob's your uncle.
> 2.  Start Peek2 - that's my interactive program for doing simple and stupid 
> things like this.  I type "read test.pdb" and "chain" and Peek2 prompts me at 
> perceived chain breaks (change in chain label, CA-CA breaks, ATOM/HETATM 
> transitions &c) and then "write test.pdb".   Takes less than 10 seconds.  
> CCP4i would probably still be launching, as would Phenix.
> 
> The reason I do #1 or #2 is not to be a Luddite, but to do something trivial 
> and boring quickly so I can get back to something interesting like building 
> structures, or beating subjects to death on CCP4bb.
> 
> What's lacking is an interactive, or just plain fast method in any guise, way 
> of doing simple PDB manipulations that we do tons of times when building 
> protein structures.  I've used Peek2 thousands of times for this purpose, 
> which is the only reason it still exists because it's a fairly stupid 
> program.  A truly interactive version of PDBSET would be splendid.  But, 
> again, it always runs in batch mode.
> 
> mmCIF looked promising, apropos emacs, when I looked at the spec page at:
> http://www.iucr.org/__data/iucr/cifdic_html/2/cif_mm.dic/Catom_site.html
> because that ATOM data is column-formatted.  Cool.  However looking at 
> 6LYZ.cif from RCSB's site revealed that the XYZ's were LEFT-justified: 
> http://www.rcsb.org/pdb/files/6LYZ.cif
> which makes me recoil in horror and resolve to use PDB format until someone 
> puts a gun to my head.
> 
> Really, guys, if you can put multiple successive spaces to the RIGHT of the 
> number, why didn't you put them to the LEFT of it instead ?  Same parsing, 
> better readability.
> 
> Phil Jeffrey
> Princeton
> (using the vernacular but deathly serious about protein structure)


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ethan Merritt
On Wednesday, August 07, 2013 04:54:39 pm Jeffrey, Philip D. wrote:
>  Nat Echols wrote:
> > Personally, if I need to change a chain ID, I can use Coot or pdbset or 
> > many other tools.  Writing code for
> > this should only be necessary if you're processing large numbers of models, 
> > or have a spectacularly
> > misformatted PDB file.
> 
> Problem.  Coot is bad at the chain label aspect.
> Create a pdb file containing residues A1-A20 and X101-X120 - non-overlapping 
> numbering.
> Try to change the chain label of X to A.
> I get "WARNING:: CONFLICT: chain id already exists in this molecule"

That would be a bug.  But it hasn't been true for any version of coot
that I have used.  As you say, this is a common thing to do and I am
certain I would have noticed if it didn't work. I just checked that
it isn't true for 0.7.1-pre.

What _is_ true is that renaming X to A in this case will not re-order
the residues in the file.  So if you had A1-100 followed by B1-10
followed by X101-200 there would not be a peptide link between A100 and
A(old X)101 after the renaming.
To fix this you need to write out the file and use an editor to move the
records for A101-200 to immediately after the records for A1-100.
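The renaming and the reordering step described above can also be done together in a short script. A minimal sketch, assuming fixed-column PDB ATOM/HETATM records (chain ID in column 22, residue number in columns 23-26); the function names are hypothetical, insertion codes are only kept in file order, and TER/END records are not regenerated.

```python
def rename_chain(lines, old, new, first=None, last=None):
    """Change the chain ID (column 22) of ATOM/HETATM records from old to new.

    Unlike a global search-and-replace, only the chain-ID column is touched.
    Optionally restrict the change to residue numbers first..last (inclusive).
    """
    out = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")) and line[21] == old:
            res_seq = int(line[22:26])
            if (first is None or res_seq >= first) and (last is None or res_seq <= last):
                line = line[:21] + new + line[22:]
        out.append(line)
    return out


def reorder_by_chain(lines):
    """Make each chain contiguous (chains in order of first appearance) and
    sort its residues by number, keeping file order within a residue.

    Only ATOM/HETATM records are returned; TER/END would need re-adding.
    """
    chains = {}
    for index, line in enumerate(lines):
        if line.startswith(("ATOM  ", "HETATM")):
            chains.setdefault(line[21], []).append((int(line[22:26]), index, line))
    ordered = []
    for records in chains.values():
        records.sort()  # by residue number, then by original position
        ordered.extend(line for _, _, line in records)
    return ordered
```

For the example above, `reorder_by_chain(rename_chain(lines, "X", "A"))` places the old X101-200 records directly after A1-100 and before chain B.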

This does illustrate the point that expecting all tools to handle all
possible manipulations is unrealistic.  I think there will always be a
need for a separate tool that can do anything imaginable, whether that
tool is vi or emacs or some spiffy new mmCIF editing GUI.

The problem with this is that any tool capable of arbitrarily editing
your file is also capable of subtly mangling your file.  The current PDB
format is horribly sensitive to this.  For example if you
reorder/renumber/relabel ATOM records in a PDB file then references to them
in the header records (TLS, SITE, etc) and LINK/CONECT records will now point
to the wrong atoms.   I am not convinced that the new mmCIF format has gotten
this quite right either, at least in the examples given, but it does have the
flexibility to attach such links or properties directly to the ATOM record
where it is more likely to be carried along correctly if moved. 
That by itself is IMHO enough to justify the switch from PDB to mmCIF.
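The CONECT part of this problem can be illustrated directly: CONECT records refer to atoms by serial number, so renumbering ATOM records without updating them leaves the links pointing at the wrong atoms. A minimal sketch, assuming unique serial numbers and that every serial referenced in a CONECT record belongs to an ATOM/HETATM record in the same file; header records such as TLS or SITE are not handled, and `renumber_atoms` is a hypothetical name.

```python
def renumber_atoms(lines):
    """Renumber ATOM/HETATM serials from 1 and rewrite CONECT records to match.

    CONECT lists atom serial numbers in 5-character fields starting at
    column 7; if they were left alone they would point at the wrong atoms.
    Assumes unique serials and that every referenced serial exists.
    """
    mapping = {}  # old serial -> new serial
    renumbered = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            new_serial = len(mapping) + 1
            mapping[int(line[6:11])] = new_serial
            line = line[:6] + f"{new_serial:5d}" + line[11:]
        renumbered.append(line)

    result = []
    for line in renumbered:
        if line.startswith("CONECT"):
            body = line[6:].rstrip()
            fields = [body[i:i + 5] for i in range(0, len(body), 5)]
            line = "CONECT" + "".join(
                f"{mapping[int(f)]:5d}" for f in fields if f.strip()
            )
        result.append(line)
    return result
```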

Ethan



-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Andrew Purkiss-Trew

Quoting "Jeffrey, Philip D." :


>  Nat Echols wrote:
> > Personally, if I need to change a chain ID, I can use Coot or
> > pdbset or many other tools.  Writing code for
> > this should only be necessary if you're processing large numbers of
> > models, or have a spectacularly
> > misformatted PDB file.
>
> Problem.  Coot is bad at the chain label aspect.
> Create a pdb file containing residues A1-A20 and X101-X120 -
> non-overlapping numbering.
> Try to change the chain label of X to A.
> I get "WARNING:: CONFLICT: chain id already exists in this molecule"



I had to show this to a student today: it does work fine if you  
select the "Use Residue Range" option rather than changing the whole  
chain. Not quite so convenient, but at least it makes the user think.






Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Jeffrey, Philip D.
 Nat Echols wrote:
> Personally, if I need to change a chain ID, I can use Coot or pdbset or many 
> other tools.  Writing code for
> this should only be necessary if you're processing large numbers of models, 
> or have a spectacularly
> misformatted PDB file.

Problem.  Coot is bad at the chain label aspect.
Create a pdb file containing residues A1-A20 and X101-X120 - non-overlapping 
numbering.
Try to change the chain label of X to A.
I get "WARNING:: CONFLICT: chain id already exists in this molecule"

This is (IMHO) a bizarre feature because this is exactly the sort of thing you 
do when building structures.

Therefore I do one of two things:
1.  Open it in (x)emacs, replace " X " with " A " and Bob's your uncle.
2.  Start Peek2 - that's my interactive program for doing simple and stupid 
things like this.  I type "read test.pdb" and "chain" and Peek2 prompts me at 
perceived chain breaks (change in chain label, CA-CA breaks, ATOM/HETATM 
transitions &c) and then "write test.pdb".   Takes less than 10 seconds.  CCP4i 
would probably still be launching, as would Phenix.
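The break detection Peek2 performs can be approximated in a few lines. This is not Peek2's code, which is not shown here, but a minimal sketch assuming fixed-column PDB records: it flags a new segment at a change of chain ID, at an ATOM/HETATM switch, or when consecutive C-alpha atoms are more than an assumed 4.2 Å apart (consecutive C-alphas in a trans peptide are about 3.8 Å apart). Only C-alpha atoms are examined, and `chain_breaks` is a hypothetical name.

```python
import math

# Assumed cutoff: consecutive CA atoms are ~3.8 A apart in a trans peptide.
MAX_CA_CA = 4.2


def chain_breaks(lines):
    """Yield (previous, current) residue labels wherever a new segment starts:
    a change of chain ID, an ATOM/HETATM switch, or a long CA-CA distance."""
    prev = None
    for line in lines:
        # " CA " (alpha carbon) rather than "CA  " (calcium) in columns 13-16.
        if not line.startswith(("ATOM  ", "HETATM")) or line[12:16] != " CA ":
            continue
        record = line[:6]
        chain = line[21]
        label = f"{chain}{int(line[22:26])}"
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if prev is not None:
            p_record, p_chain, p_label, p_xyz = prev
            if (chain != p_chain or record != p_record
                    or math.dist(xyz, p_xyz) > MAX_CA_CA):
                yield p_label, label
        prev = (record, chain, label, xyz)
```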

The reason I do #1 or #2 is not to be a Luddite, but to do something trivial 
and boring quickly so I can get back to something interesting like building 
structures, or beating subjects to death on CCP4bb.

What's lacking is an interactive, or just plain fast method in any guise, way 
of doing simple PDB manipulations that we do tons of times when building 
protein structures.  I've used Peek2 thousands of times for this purpose, which 
is the only reason it still exists because it's a fairly stupid program.  A 
truly interactive version of PDBSET would be splendid.  But, again, it always 
runs in batch mode.

mmCIF looked promising, apropos emacs, when I looked at the spec page at:
http://www.iucr.org/__data/iucr/cifdic_html/2/cif_mm.dic/Catom_site.html
because that ATOM data is column-formatted.  Cool.  However looking at 6LYZ.cif 
from RCSB's site revealed that the XYZ's were LEFT-justified: 
http://www.rcsb.org/pdb/files/6LYZ.cif
which makes me recoil in horror and resolve to use PDB format until someone 
puts a gun to my head.

Really, guys, if you can put multiple successive spaces to the RIGHT of the 
number, why didn't you put them to the LEFT of it instead ?  Same parsing, 
better readability.
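The "same parsing" point is easy to check: whitespace-delimited loop values tokenise identically however they are padded, so right-justifying numbers is purely cosmetic for a parser. A minimal sketch that pads loop rows into aligned columns, assuming every row has the same number of values and that no quoted value contains a space (real mmCIF allows such values); `align_loop_rows` is a hypothetical name and the rows in the check below are made up.

```python
def align_loop_rows(rows):
    """Pad whitespace-separated loop rows so every column lines up, with
    numbers right-justified and other tokens left-justified.

    Assumes equal column counts and no quoted values containing spaces.
    """
    table = [row.split() for row in rows]
    widths = [max(len(r[i]) for r in table) for i in range(len(table[0]))]

    def is_number(token):
        try:
            float(token)
            return True
        except ValueError:
            return False

    out = []
    for r in table:
        cells = [t.rjust(w) if is_number(t) else t.ljust(w)
                 for t, w in zip(r, widths)]
        out.append(" ".join(cells).rstrip())
    return out
```

Splitting an aligned row on whitespace returns exactly the tokens of the original row, which is the sense in which the parsing is unchanged.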

Phil Jeffrey
Princeton
(using the vernacular but deathly serious about protein structure)








Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Jeffrey, Philip D.
> I.e. programs would look like this
>
> ---
> GRAB protein FROM FILE "best_model_ever.cif";
> SELECT CHAIN A FROM protein AS chA;
> SET chA BFACTORS TO 30.0;
> GRAB data FROM FILE "best_data_ever.cif";
> BIND protein TO data;
> REFINE protein USING BUSTER WITH TLS+ANISO;
> DROP protein INTO FILE "better_model_yet.cif";
> ---

This brings to mind James Holton's Elves program(s):
http://bl831.als.lbl.gov/~jamesh/elves/

Phil Jeffrey
Princeton


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ethan Merritt
On Wednesday, August 07, 2013 04:00:16 pm Ed Pozharski wrote:
> On 08/07/2013 05:54 PM, Nat Echols wrote:
> > Personally, if I need to change a chain ID, I can use Coot or pdbset 
> > or many other tools.  Writing code for this should only be necessary 
> > if you're processing large numbers of models, or have a spectacularly 
> > misformatted PDB file.  Again, I'll repeat what I said before: if it's 
> > truly necessary to view or edit a model by hand or with custom shell 
> > scripts, this often means that the available software is deficient.  
> > PLEASE tell the developers what you need to get your job done; we 
> > can't read minds.
> 
> Nat,
> 
> I don't think anyone here really means that the only way to change a 
> chain ID is to write, say, a perl script.  But an interpreter of the 
> kind advocated by James (as much as I have hijacked/misinterpreted his 
> vision) could indeed be very useful for people pursuing simple 
> bioinformatics projects and new ways to analyse structural models. 

We tackled this a while back for the then-current incarnation of mmCIF.

   http://www.bmsc.washington.edu/parvati/mmLib.pdf

I suppose it will all have to be revisited so that it knows the quirks,
features, and foibles of the new and improved mmCIF.

Ethan





Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ed Pozharski

On 08/07/2013 05:54 PM, Nat Echols wrote:
Personally, if I need to change a chain ID, I can use Coot or pdbset 
or many other tools.  Writing code for this should only be necessary 
if you're processing large numbers of models, or have a spectacularly 
misformatted PDB file.  Again, I'll repeat what I said before: if it's 
truly necessary to view or edit a model by hand or with custom shell 
scripts, this often means that the available software is deficient.  
PLEASE tell the developers what you need to get your job done; we 
can't read minds.


Nat,

I don't think anyone here really means that the only way to change a 
chain ID is to write, say, a perl script.  But an interpreter of the 
kind advocated by James (as much as I have hijacked/misinterpreted his 
vision) could indeed be very useful for people pursuing simple 
bioinformatics projects and new ways to analyse structural models. While 
I understand your view that everyone should seek assistance from 
"developers" with every problem encountered, I also recall some 
reasonable idea about self-sufficiency that should cover scientific 
research (something like "give a man a fish and you feed him for a day, 
teach him to fish and he starts paying taxes"... something along these 
lines ;).  There is a difference between tools that make it easy to 
perform useful non-standard analyses and highly specialized tools that 
strive to cover every situation imaginable.


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ed Pozharski

James,

On 08/07/2013 05:36 PM, James Stroud wrote:
Anyone can learn Python in an hour and a half. 


Isn't this a bit of an exaggeration?  Python is designed to be easy to 
learn, but we are probably talking about different definitions of "learning" 
and "anyone".



I.e. programs would look like this

---
GRAB protein FROM FILE "best_model_ever.cif";
SELECT CHAIN A FROM protein AS chA;
SET chA BFACTORS TO 30.0;
GRAB data FROM FILE "best_data_ever.cif";
BIND protein TO data;
REFINE protein USING BUSTER WITH TLS+ANISO;
DROP protein INTO FILE "better_model_yet.cif";
---

Not necessarily a bad idea but now through the fog of time I remember something oddly 
reminiscent... ah, CNS! (for those googling for it it's not the "central nervous 
system" :).
Although a little too much like natural language, it is not a bad idea. But, 
where is the link describing the layer of CNS that looks like that?


I should probably use  markup next 
time to prevent my poor attempt at humorous tribute to CNS from being 
understood so literally.  At the very least you might agree that CNS is 
the closest thing we ever had to an MX-oriented general-purpose 
interpreter.  Your quote is also from the 
"below-the-magic-line-do-not-change" area of a CNS script.


Cheers,

Ed.



Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Richard Gildea
The cctbx provides comprehensive tools for handling mmCIF files (and indeed all 
types of CIF files; it is not fussy), freely available under the BSD-style 
cctbx licence.

Cheers,

Richard

On 7 Aug 2013, at 19:16, "Jeffrey, Philip D."  wrote:

> Are all the APIs open source ?  I was under the impression that CCP4 had 
> moved away from that, which might justifiably reduce interest in any 
> limited-availability API.
> 
> Phil Jeffrey
> Princeton
> 
> From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of James Stroud 
> [xtald...@gmail.com]
> Sent: Wednesday, August 07, 2013 1:51 PM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] mmCIF as working format?
> 
> On Aug 5, 2013, at 4:33 AM, Eugene Krissinel wrote:
>> I just hope that one day we all will be discussing a sort of universal API 
>> to read/write structural information instead of referencing to raw formats, 
>> and routines to query MX data, which would be more appropriate than grep 
> (would many SB students/postdocs use grep these days? but many of them would
>> need to inspect files somehow). This, in essence, is similar to discussing 
>> read/write primitives in C/C++/Fortran rather than I/O functions of BIOS and 
>> HDD/BUS commands that they drive.
> 
> I just want to reinforce this point by quoting it verbatim and also emphasize 
> that it was not lost on some of us.
> 
> In the long term, the MM structure community should perhaps get its 
> inspiration from SQL, which focuses on the scope of data and the semantics 
> of its manipulation, rather than how the data is encoded beneath the surface.
> 
> James
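James's SQL comparison can be tried today with Python's built-in `sqlite3` module. A minimal sketch, assuming atom records have already been read into tuples; the table layout is invented for illustration (its column names are borrowed from the mmCIF `_atom_site` category), and this is not an existing crystallographic tool or API.

```python
import sqlite3


def load_atoms(rows):
    """Load (id, atom, residue, chain, resnum, B) tuples into an in-memory
    table whose column names borrow from the mmCIF _atom_site category."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE atom_site (id INTEGER PRIMARY KEY, label_atom_id TEXT,"
        " label_comp_id TEXT, auth_asym_id TEXT, auth_seq_id INTEGER,"
        " B_iso_or_equiv REAL)"
    )
    db.executemany("INSERT INTO atom_site VALUES (?, ?, ?, ?, ?, ?)", rows)
    return db


db = load_atoms([
    (1, "N", "ALA", "A", 1, 20.0),
    (2, "CA", "ALA", "A", 1, 21.0),
    (3, "CA", "GLY", "A", 2, 22.0),
    (4, "CA", "ALA", "X", 101, 25.0),
])

# "The C-alpha of every alanine of chain A", with no explicit loop or test.
ca_ala_a = db.execute(
    "SELECT id FROM atom_site WHERE auth_asym_id = 'A'"
    " AND label_comp_id = 'ALA' AND label_atom_id = 'CA'"
).fetchall()

# Relabel chain X as A, then set the B-factors of chain A, as set operations.
db.execute("UPDATE atom_site SET auth_asym_id = 'A' WHERE auth_asym_id = 'X'")
db.execute("UPDATE atom_site SET B_iso_or_equiv = 30.0 WHERE auth_asym_id = 'A'")
```

The selection and both modifications are stated declaratively; writing the result back to a coordinate file is not shown.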




Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Nat Echols
On Wed, Aug 7, 2013 at 2:36 PM, James Stroud  wrote:

> Although it is likely the "best" library for working with structural data,
> CCTBX requires a loop just to change a specific chain ID (to the best of my
> knowledge):
>
> ...
>
> I don't intend to pick on CCTBX specifically (because the CCTBX developers
> have specific needs to which they program), but loop/test mechanisms are
> awkward for selecting and modifying structural data, and get much more
> awkward as selections get more complex (e.g. selecting the C-alpha of every
> alanine of chain A, etc.).
>

True - it's really an issue of what purpose the libraries were designed
for.  CCTBX wasn't intended to be a general-purpose tool for users to
perform quick manipulations of a model; the goal was to build large,
complex, and more-or-less automated crystallography applications on top of
it.  (The same applies to the CCP4 libraries, mmdb, clipper, etc.;
BioPython I guess is designed for bioinformatics.)  The design of CNS (for
example) reflects an era where it was much more likely that the average
crystallographer knew some programming, worked exclusively on the command
line, built new models manually, and didn't have access to a large number
of convenient tools for purposes like this.  (Or so I've heard; I was
still in high school.)

Personally, if I need to change a chain ID, I can use Coot or pdbset or
many other tools.  Writing code for this should only be necessary if you're
processing large numbers of models, or have a spectacularly misformatted
PDB file.  Again, I'll repeat what I said before: if it's truly necessary
to view or edit a model by hand or with custom shell scripts, this often
means that the available software is deficient.  PLEASE tell the developers
what you need to get your job done; we can't read minds.
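Nat's batch case (many models, or a badly misformatted file) is where a few lines of code make sense. Below is a minimal sketch in plain Python that assumes well-formed fixed-column records. In ATOM, HETATM, ANISOU and TER records the chain identifier is column 22 (string index 21). The function name `rename_chain` is hypothetical. The sketch does not touch record types that store chain IDs in other columns, such as SEQRES.

```python
# Record types whose chain identifier sits in column 22 (string index 21).
CHAIN_COLUMN_RECORDS = ("ATOM  ", "HETATM", "ANISOU", "TER   ")


def rename_chain(lines, old, new):
    """Return a copy of PDB-format lines with chain `old` renamed to `new`."""
    if len(old) != 1 or len(new) != 1:
        raise ValueError("PDB chain IDs are a single character")
    renamed = []
    for line in lines:
        # Only touch records that carry the chain ID in column 22.
        if line[:6] in CHAIN_COLUMN_RECORDS and len(line) > 21 and line[21] == old:
            line = line[:21] + new + line[22:]
        renamed.append(line)
    return renamed
```

To apply this to a file, call `rename_chain(open("model.pdb").read().splitlines(), "A", "B")`, where `model.pdb` stands in for a real file. Looping that call over a directory handles the "large numbers of models" case.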

-Nat


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread James Stroud
On Aug 7, 2013, at 2:35 PM, Ed Pozharski wrote:
> If I understand your proposal and reference to SQL correctly, you want some 
> scripting language that sounds like simple English.

I didn't say anything about being English-like. English and other natural 
languages are ill-adapted to describing the well-defined operations one might 
perform on a data structure.

> Is the advantage over existing APIs here that one does not need to learn 
> Python, C++, (or, heaven forbid, FORTRAN)?

Anyone can learn Python in an hour and a half. That's not an issue (except for 
whitespace nuts). If one wants to use Python to modify PDB structural data, I 
recommend starting with the tutorial I wrote for CCTBX: 
http://cctbxwiki.bravais.net/CCTBX_Wiki#Working_with_pdb_Files

The advantage of a language over an API is that an API requires coding overhead 
and must (by the definition of "API") be part of an "Application". SQL has no 
such requirement and neither would an ideal language for *selecting* and 
*modifying* macromolecular structural data. In SQL, one can make selections and 
modifications without importing libraries, defining a main function, declaring 
variables, etc. Low overhead is probably the reason so many crystallographers 
(myself not included) are fluent in the likes of awk.
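The sketch below makes the SQL analogy concrete using Python's standard `sqlite3` module. It loads a handful of atom records into an in-memory table. After that, a selection and a modification each take one declarative statement, with no loop written by the user. The table layout and sample rows are invented for illustration and do not come from any crystallographic library.

```python
import sqlite3


def load_atoms(rows):
    """Load (chain, resname, resseq, name, b) tuples into an in-memory table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE atom (chain TEXT, resname TEXT, resseq INTEGER, name TEXT, b REAL)"
    )
    db.executemany("INSERT INTO atom VALUES (?, ?, ?, ?, ?)", rows)
    return db


db = load_atoms([
    ("A", "ALA", 1, "N", 20.0),
    ("A", "ALA", 1, "CA", 21.0),
    ("A", "GLY", 2, "CA", 22.0),
    ("B", "ALA", 1, "CA", 23.0),
])

# Selection: the C-alpha of every alanine in chain A, with no explicit loop.
ala_ca = db.execute(
    "SELECT resseq, b FROM atom WHERE chain = 'A' AND resname = 'ALA' AND name = 'CA'"
).fetchall()

# Modification: reset the B-factors of chain A in a single statement.
db.execute("UPDATE atom SET b = 30.0 WHERE chain = 'A'")
```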

> I.e. programs would look like this
> 
> ---
> GRAB protein FROM FILE "best_model_ever.cif";
> SELECT CHAIN A FROM protein AS chA;
> SET chA BFACTORS TO 30.0;
> GRAB data FROM FILE "best_data_ever.cif";
> BIND protein TO data;
> REFINE protein USING BUSTER WITH TLS+ANISO;
> DROP protein INTO FILE "better_model_yet.cif";
> ---
> 
> Not necessarily a bad idea but now through the fog of time I remember 
> something oddly reminiscent... ah, CNS! (for those googling for it it's not 
> the "central nervous system" :).

Although a little too much like natural language, it is not a bad idea. But, 
where is the link describing the layer of CNS that looks like that? In my 
X-Plor 3.1 manual (Yale University Press, 1992) I see nothing remotely like 
what you describe. CNS, according to the most recent tutorial for 1.3, looks 
like this:

topology
  evaluate ($counter=1)
  evaluate ($done=false)
  while ( $done = false ) loop read
    if ( &exist_topology_infile_$counter = true ) then
      if ( &BLANK%topology_infile_$counter = false ) then
        @@&topology_infile_$counter
      end if
    else
      evaluate ($done=true)
    end if
    evaluate ($counter=$counter+1)
  end loop read
end

This example makes a point about the problems of APIs. Namely, they require 
loops and tests, and lack a true selection mechanism, except perhaps for the 
scripting layer of CNS. But even with CNS, once you have a selection, you must 
loop over it to modify the data.

Although it is likely the "best" library for working with structural data, 
CCTBX requires a loop just to change a specific chain ID (to the best of my 
knowledge):

from iotbx import pdb

pdb_inp = pdb.input(file_name="best-model.pdb")
hierarchy = pdb_inp.construct_hierarchy()
# Rename chain A to B in every model of the hierarchy.
for model in hierarchy.models():
  for chain in model.chains():
    if chain.id == "A":
      chain.id = "B"

I don't intend to pick on CCTBX specifically (because the CCTBX developers have 
specific needs to which they program), but loop/test mechanisms are awkward for 
selecting and modifying structural data, and get much more awkward as 
selections get more complex (e.g. selecting the C-alpha of every alanine of 
chain A, etc.).
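For comparison with the CCTBX loop above, here is a sketch of the selection-style interface James is asking for, in plain Python. The loop and the tests are written once inside two helpers. The caller only states criteria, such as "the C-alpha of every alanine in chain A". `Atom`, `select` and `update` are hypothetical names, not CCTBX API, and only equality tests are supported.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Atom:
    chain: str
    resname: str
    resseq: int
    name: str
    b: float


def matches(atom, criteria):
    """True if every attribute named in `criteria` has the given value."""
    return all(getattr(atom, key) == value for key, value in criteria.items())


def select(atoms, **criteria):
    """Return the atoms matching all keyword criteria, e.g. chain="A"."""
    return [a for a in atoms if matches(a, criteria)]


def update(atoms, where, **changes):
    """Return a new list in which atoms matching `where` carry `changes`."""
    return [replace(a, **changes) if matches(a, where) else a for a in atoms]
```

For example, `select(atoms, chain="A", resname="ALA", name="CA")` picks the alanine C-alphas. `update(atoms, {"chain": "A"}, chain="B")` renames the chain and leaves the original list untouched.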

James

Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Pete Meyer

Ed Pozharski wrote:
[snip]
If I understand your proposal and reference to SQL correctly, you want 
some scripting language that sounds like simple English.  Is the 
advantage over existing APIs here that one does not need to learn 
Python, C++, (or, heaven forbid, FORTRAN)?  I.e. programs would look 
like this


XML DOM is probably a better example of a standardized API to shoot for 
than SQL in this case.  Regardless of which language or library you use, 
getChildNodes still does the same thing (at least conceptually).
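Pete's point is that DOM calls mean the same thing whichever language implements them. In Python's standard `xml.dom.minidom`, Java's `getChildNodes()` appears as the `childNodes` attribute. The sketch below walks a tiny model using only generic DOM calls. The XML layout (`model`, `chain`, `residue`, `atom` elements) is made up for illustration and is not an agreed schema for structural data.

```python
from xml.dom import Node, minidom

# Invented layout for a tiny model; not a standard schema.
MODEL_XML = """<model>
  <chain id="A">
    <residue name="ALA" seq="1"><atom name="N"/><atom name="CA"/></residue>
    <residue name="GLY" seq="2"><atom name="CA"/></residue>
  </chain>
</model>"""


def element_children(node):
    """DOM childNodes, minus the whitespace text nodes between elements."""
    return [n for n in node.childNodes if n.nodeType == Node.ELEMENT_NODE]


def list_atoms(xml_text):
    """Walk model -> chain -> residue -> atom using only generic DOM calls."""
    root = minidom.parseString(xml_text).documentElement
    found = []
    for chain in element_children(root):
        for residue in element_children(chain):
            for atom in element_children(residue):
                found.append((chain.getAttribute("id"),
                              residue.getAttribute("name"),
                              atom.getAttribute("name")))
    return found
```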


If the recommendation is that crystallographers should be using an API 
for data stored in a standardized format instead of parsing it 
themselves, then it would seem to make sense to me that the API should 
also be standardized (ideally with a well-documented reference 
implementation).


In some sense this is monopolistic - but hopefully it'd be a benevolent 
monopoly.  If I remember correctly, there was a time when the creator of 
Python referred to himself as the "benevolent dictator for life" of the 
project; and it turned out pretty well.


[snip]
Not necessarily a bad idea but now through the fog of time I remember 
something oddly reminiscent... ah, CNS! (for those googling for it it's 
not the "central nervous system" :).


I'm still impressed by the fact that a useful scripting language was 
implemented in fortran.


Pete


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ed Pozharski

On 08/07/2013 03:54 PM, James Stroud wrote:

On Aug 7, 2013, at 1:06 PM, Ed Pozharski wrote:

On 08/07/2013 01:51 PM, James Stroud wrote:

In the long term, the MM structure community should perhaps get its inspiration 
from SQL

For this to work, a particular interface must monopolize access to structural 
data.

Not necessarily, although the alternative pathway might be more idealistic and 
hence unrealistic.

All that needs to happen is that the community agree on

1. What is the finite set of essential/useful attributes of macromolecular 
structural data.
2. What is the syntax of (a) accessing and (b) modifying those attributes.
3. What is the syntax of selecting subsets of structural data based on those 
attributes.

The resulting syntax (i.e. language) itself should be terse, easy to learn, 
easy to use, and preferably easy to implement.

If such a standard is created, then I believe awk-ing/grep-ing/sed-ing/etc PDBs 
and mmCIFs would quickly become historical.

James

James,

frankly, I am not sure which part of your description is not equivalent 
to "monopolistic interface".


If I understand your proposal and reference to SQL correctly, you want 
some scripting language that sounds like simple English.  Is the 
advantage over existing APIs here that one does not need to learn 
Python, C++, (or, heaven forbid, FORTRAN)?  I.e. programs would look 
like this


---
GRAB protein FROM FILE "best_model_ever.cif";
SELECT CHAIN A FROM protein AS chA;
SET chA BFACTORS TO 30.0;
GRAB data FROM FILE "best_data_ever.cif";
BIND protein TO data;
REFINE protein USING BUSTER WITH TLS+ANISO;
DROP protein INTO FILE "better_model_yet.cif";
---

Not necessarily a bad idea but now through the fog of time I remember 
something oddly reminiscent... ah, CNS! (for those googling for it it's 
not the "central nervous system" :).


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread George Sheldrick
The flexibility of CIF is indeed infinite. Even the names of the 
unit-cell dimensions are different in mmCIF and (small molecule) core CIF.
One of the main reasons why I had to bring out a new version of SHELXL 
recently (SHELXL-2013 to replace SHELXL-97) was that in the meantime the
COMCIFS committee of the IUCr had changed many of the names.
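George's example in miniature: the first cell length is `_cell.length_a` in mmCIF, where a period separates category and attribute, but `_cell_length_a` in core CIF. The sketch below reads both spellings. It assumes each item sits on its own line as an unquoted number, so loops and quoted values are not handled. It also drops a standard uncertainty written in parentheses, as in `50.0(1)`. The function name `read_cell` is hypothetical.

```python
import re

CELL_ITEMS = ["length_a", "length_b", "length_c",
              "angle_alpha", "angle_beta", "angle_gamma"]


def read_cell(text):
    """Return (a, b, c, alpha, beta, gamma) from mmCIF or core CIF text."""
    found = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        tag, value = parts[0].lower(), parts[1]  # CIF data names are case-insensitive
        for item in CELL_ITEMS:
            # mmCIF writes _cell.length_a, core CIF writes _cell_length_a
            if tag in ("_cell." + item, "_cell_" + item):
                # drop a trailing standard uncertainty such as "(1)"
                found[item] = float(re.sub(r"\(\d+\)$", "", value))
    return tuple(found[item] for item in CELL_ITEMS)
```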

George





On 08/07/2013 10:02 PM, Nat Echols wrote:
On Wed, Aug 7, 2013 at 12:54 PM, James Stroud wrote:


All that needs to happen is that the community agree on

1. What is the finite set of essential/useful attributes of
macromolecular structural data.
2. What is the syntax of (a) accessing and (b) modifying those
attributes.
3. What is the syntax of selecting subsets of structural data
based on those attributes.

The resulting syntax (i.e. language) itself should be terse, easy
to learn, easy to use, and preferably easy to implement.


Ah, but the nice thing about mmCIF is that it isn't truly "finite" - 
the PDB may limit what tags are actually included in the distributed 
files, but there is nothing preventing other developers from including 
their own tags, and there is a community process for extending the 
officially defined tags.  Item (2) is very well-established, unlike 
the current chaos of REMARK records.  I think (3) will be left to the 
various libraries to deal with.


-Nat



--
Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-33021 or -33068
Fax. +49-551-39-22582




Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Frances C. Bernstein

 Nobody has addressed the fact that mmCIF is a format
that allows for many ways of presenting the same data.  The
recent discussions seem to be based on the assumption that
all mmCIF files will look like those currently prepared by
the PDB.

 Any code that reads an mmCIF file should be prepared to
read any file that meets the mmCIF specifications.  This
requires the use of software tools and it may not be possible
to use a simple script that works against PDB mmCIF entries
to read arbitrary mmCIF files.
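Quoting alone shows why a script tuned to PDB-distributed mmCIF files can fail on other valid files. A CIF value may be bare, single-quoted or double-quoted. Under the CIF 1.1 rules, a quote closes a value only when whitespace or the end of the line follows it, so `'it's a test'` is a single value. The sketch below is a one-line tokenizer that follows that rule. `cif_tokens` is a hypothetical name. It deliberately leaves out semicolon-delimited text fields and `#` comments, which a complete reader must also handle.

```python
def cif_tokens(line):
    """Split one CIF data line into values, following CIF 1.1 quoting."""
    tokens, i, n = [], 0, len(line)
    while i < n:
        if line[i].isspace():
            i += 1
            continue
        if line[i] in "'\"":
            quote = line[i]
            j = i + 1
            # A quote closes the value only if whitespace or end of line follows.
            while j < n and not (line[j] == quote and (j + 1 == n or line[j + 1].isspace())):
                j += 1
            tokens.append(line[i + 1:j])
            i = j + 1
        else:
            j = i
            while j < n and not line[j].isspace():
                j += 1
            tokens.append(line[i:j])
            i = j
    return tokens
```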

 Or are people saying/hoping/redefining that mmCIF will
turn into a fixed column/field format?

Frances Bernstein

=
Bernstein + Sons
*   *   Information Systems Consultants
5 Brewster Lane, Bellport, NY 11713-2803
*   * ***
 *Frances C. Bernstein
  *   ***  f...@bernstein-plus-sons.com
 *** *
  *   *** 1-631-286-1339FAX: 1-631-286-1999
=


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Nat Echols
On Wed, Aug 7, 2013 at 12:54 PM, James Stroud  wrote:

> All that needs to happen is that the community agree on
>
> 1. What is the finite set of essential/useful attributes of macromolecular
> structural data.
> 2. What is the syntax of (a) accessing and (b) modifying those attributes.
> 3. What is the syntax of selecting subsets of structural data based on
> those attributes.
>
> The resulting syntax (i.e. language) itself should be terse, easy to
> learn, easy to use, and preferably easy to implement.
>

Ah, but the nice thing about mmCIF is that it isn't truly "finite" - the
PDB may limit what tags are actually included in the distributed files, but
there is nothing preventing other developers from including their own tags,
and there is a community process for extending the officially defined
tags.  Item (2) is very well-established, unlike the current chaos of
REMARK records.  I think (3) will be left to the various libraries to deal
with.

-Nat


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread James Stroud
On Aug 7, 2013, at 1:06 PM, Ed Pozharski wrote:
> On 08/07/2013 01:51 PM, James Stroud wrote:
>> In the long term, the MM structure community should perhaps get its 
>> inspiration from SQL
> For this to work, a particular interface must monopolize access to structural 
> data.

Not necessarily, although the alternative pathway might be more idealistic and 
hence unrealistic.

All that needs to happen is that the community agree on

1. What is the finite set of essential/useful attributes of macromolecular 
structural data.
2. What is the syntax of (a) accessing and (b) modifying those attributes.
3. What is the syntax of selecting subsets of structural data based on those 
attributes.

The resulting syntax (i.e. language) itself should be terse, easy to learn, 
easy to use, and preferably easy to implement.
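Point 3 can be tiny and still be useful. The toy sketch below compiles a selection string such as `chain A and resname ALA and name CA` into a predicate over atom records. The grammar (space-separated `attribute value` pairs joined by `and`) is invented here for illustration. A real selection language would also need `or`, `not`, ranges and wildcards.

```python
def parse_selection(text):
    """Compile e.g. 'chain A and resname ALA and name CA' into a predicate."""
    clauses = []
    for clause in text.split(" and "):
        words = clause.split()
        if len(words) != 2:
            raise ValueError(f"expected '<attribute> <value>', got {clause!r}")
        clauses.append((words[0], words[1]))

    def predicate(atom):
        # atom is a mapping of attribute name -> value; compare as text
        return all(str(atom.get(key)) == value for key, value in clauses)

    return predicate
```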

If such a standard is created, then I believe awk-ing/grep-ing/sed-ing/etc PDBs 
and mmCIFs would quickly become historical.

James


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ed Pozharski

On 08/07/2013 01:51 PM, James Stroud wrote:

In the long term, the MM structure community should perhaps get its inspiration 
from SQL
For this to work, a particular interface must monopolize access to 
structural data.  Then maintainers of that victorious interface could 
change the underlying format whichever way they want while supplying the 
never ending stream of useful features.  And all other programs would be 
just frontends to the interface.  As long as data format remains easily 
readable and there is more than one person willing to fiddle with code, 
persistence or at the very least backward compatibility of the data 
format will remain a (minor to me) issue.  It is also important that it 
is much easier to write a pdb parser in your favourite language than to 
implement a general-purpose relational database management system.


For full disclosure, I personally do not share the apocalyptic feeling 
about transition to mmCIF.


Cheers,

Ed.


--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Eugene Krissinel
This is to confirm very publicly that CCP4 libraries (of which the APIs are one 
example) are open source and free to use. There are no plans to change this 
and, on the contrary, there is a common consensus that it should stay as is.

Eugene


On 7 Aug 2013, at 19:16, Jeffrey, Philip D. wrote:

> Are all the APIs open source ?  I was under the impression that CCP4 had 
> moved away from that, which might justifiably reduce interest in any 
> limited-availability API.
> 
> Phil Jeffrey
> Princeton
> 
> From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of James Stroud 
> [xtald...@gmail.com]
> Sent: Wednesday, August 07, 2013 1:51 PM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] mmCIF as working format?
> 
> On Aug 5, 2013, at 4:33 AM, Eugene Krissinel wrote:
>> I just hope that one day we all will be discussing a sort of universal API 
>> to read/write structural information instead of referencing to raw formats, 
>> and routines to query MX data, which would be more appropriate than grep 
>> (would many SB students/postdocs use grep these days? but many of them would 
>> need to inspect files somehow). This, in essence, is similar to discussing 
>> read/write primitives in C/C++/Fortran rather than I/O functions of BIOS and 
>> HDD/BUS commands that they drive.
> 
> I just want to reinforce this point by quoting it verbatim and also emphasize 
> that it was not lost on some of us.
> 
> In the long term, the MM structure community should perhaps get its 
> inspiration from SQL, which focuses on the scope of data and the semantics 
> of its manipulation, rather than how the data is encoded beneath the surface.
> 
> James





Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Jeffrey, Philip D.
Are all the APIs open source ?  I was under the impression that CCP4 had moved 
away from that, which might justifiably reduce interest in any 
limited-availability API.

Phil Jeffrey
Princeton

From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of James Stroud 
[xtald...@gmail.com]
Sent: Wednesday, August 07, 2013 1:51 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] mmCIF as working format?

On Aug 5, 2013, at 4:33 AM, Eugene Krissinel wrote:
> I just hope that one day we all will be discussing a sort of universal API to 
> read/write structural information instead of referencing to raw formats, and 
> routines to query MX data, which would be more appropriate than grep (would 
> many SB students/postdocs use grep these days? but many of them would need to 
> inspect files somehow). This, in essence, is similar to discussing read/write 
> primitives in C/C++/Fortran rather than I/O functions of BIOS and HDD/BUS 
> commands that they drive.

I just want to reinforce this point by quoting it verbatim and also emphasize 
that it was not lost on some of us.

In the long term, the MM structure community should perhaps get its inspiration 
from SQL, which focuses on the scope of data and the semantics of its 
manipulation, rather than how the data is encoded beneath the surface.

James


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread James Stroud
On Aug 5, 2013, at 4:33 AM, Eugene Krissinel wrote:
> I just hope that one day we all will be discussing a sort of universal API to 
> read/write structural information instead of referencing to raw formats, and 
> routines to query MX data, which would be more appropriate than grep (would 
> many SB students/postdocs use grep these days? but many of them would need to 
> inspect files somehow). This, in essence, is similar to discussing read/write 
> primitives in C/C++/Fortran rather than I/O functions of BIOS and HDD/BUS 
> commands that they drive.

I just want to reinforce this point by quoting it verbatim and also emphasize 
that it was not lost on some of us.

In the long term, the MM structure community should perhaps get its inspiration 
from SQL, which focuses on the scope of data and the semantics of its 
manipulation, rather than how the data is encoded beneath the surface.

James


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Navdeep Sidhu
Dear David and Kaiser:

While the PDB format is (thankfully--to those used to it) around, it seems to 
me it is certainly a rather poor deterrent to the enjoyment of AWK:

For fixed-field format input, the designers of AWK suggested a useful solution: 
the function substr(s,p,n), i.e., "return substring of s of length n starting 
at position p" (Aho et al. The AWK Programming Language. Addison-Wesley, 1988, 
pp. 42, 43, 72).

The solution I've used, though, is to use gnu awk (gawk) with the format 
definition as follows:
BEGIN {FIELDWIDTHS="6 5 1 4 1 3 1 1 4 1 3 8 8 8 6 6 10 2 2";}
--hope you'd find that useful too.
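Here is the same fixed-column split outside gawk: a Python sketch that reuses Navdeep's field widths, which add up to the 80 columns of an ATOM/HETATM record. Slicing is the Python counterpart of awk's `substr`, so `substr($0, 31, 8)` (the x coordinate) becomes `line[30:38]`. The helper names are hypothetical. The sketch assumes trailing blank columns may have been trimmed from the line.

```python
# Column widths of a PDB ATOM/HETATM record, as in the gawk FIELDWIDTHS above.
PDB_ATOM_WIDTHS = [6, 5, 1, 4, 1, 3, 1, 1, 4, 1, 3, 8, 8, 8, 6, 6, 10, 2, 2]


def split_fixed(line, widths=PDB_ATOM_WIDTHS):
    """Cut a fixed-column line into raw fields, blanks included."""
    line = line.rstrip("\n").ljust(sum(widths))  # restore trimmed trailing columns
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width])
        start += width
    return fields


def parse_atom(line):
    """Pick out the commonly used fields of an ATOM/HETATM record."""
    f = split_fixed(line)
    return {
        "name": f[3].strip(),
        "altloc": f[4].strip(),
        "resname": f[5].strip(),
        "chain": f[7],
        "resseq": int(f[8]),
        "xyz": (float(f[11]), float(f[12]), float(f[13])),
        "occupancy": float(f[14]),
        "b": float(f[15]),
        "element": f[17].strip(),
    }
```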

As for Perl, somebody put it nicely that one should comment programs bearing in 
mind that the person reading them later is always a different one from the one 
who wrote them; that includes the programmer as she/he will always be in a 
different state of mind her/himself.

Best regards,
Navdeep


---
On Tue, Aug 06, 2013 at 08:07:22AM -0400, David A Case wrote:
> 
> An awk script with /^ATOM/ as its selection is actually easier to write
> than the corresponding script for a PDB ATOM record, since the line can
> be split on white space.

On Mon, Aug 05, 2013 at 03:10:55AM -0700, kaiser wrote:
>   Yes, using grep on mmcif files is "awk"ward (but perfectly possible); awk 
> on the other hand works much better. It's actually more of a pain to use it 
> on pdb files. And perl, well perl can handle anything and it will always look 
> nice while you write it and never look nice when you look back at it...


---
Navdeep Sidhu
University of Goettingen
---


Re: [ccp4bb] mmCIF as working format?

2013-08-06 Thread Eugene Krissinel
- this has nothing to do with advantages or disadvantages of mmCIF. It took 
almost 20 years of discussions around mmCIF, so it is not fair to say that 
absolutely nothing was done. However, 20 years is long enough to realise that a 
100% ideal solution is not reachable and, since there is no time left, a solution 
is indeed needed now. We will try to minimise the impact on end-users, who use the 
software to solve structures, and if you can anticipate that something 
particular will be severely impaired by format change, please let us know.

Is there something I forgot in this list?

Many thanks to everybody,

Eugene


On 6 Aug 2013, at 03:10, Herbert J. Bernstein wrote:

> Dear Colleagues,
> 
> This exchange is a wonderful illustration of the simple fact that different
> scientists work differently, favoring different approaches and different
> tools. For some, the latest and greatest formats and support systems are
> what they need to be productive. For a surprisingly large number of others,
> change to new methods is a pointless distraction from doing good science.
> What we need to do as a community is not to tell one another how they
> _must_ do their work, but to listen to one another, being helpful where we
> can, and showing mutual respect where we cannot.
> 
> To this end, Frances and I have revived an old idea from 2006 of creating a
> format that looks much like the old PDB format but is 132 columns wide, with
> more characters allotted to fields that need them. We re-enabled the WPDB
> server at http://biomol.dowling.edu/wpdb which can produce either a
> 132-column 'PDB' entry or an 80-column PDB entry based on the mmCIF files
> on the wwPDB server. This allows people who work best with tools such as
> grep and a simple fixed-field format to have most of the newer, larger PDB
> entries in a wide version of the PDB format. If you don't need it, or don't
> like it, you should not use it. If you have need for it, and need some
> things changed, send us an email, and we'll see what we can do to oblige.
> 
> Right now it is on an old, slow server. If there is significant use, I'll
> move it to something bigger and faster.
> 
> Regards,
> Herbert and Frances Bernstein
> 
> 
> On 8/5/13 4:05 PM, Boaz Shaanan wrote:
>> 
>> 
>> Boaz Shaanan, Ph.D.
>> Dept. of Life Sciences
>> Ben-Gurion University of the Negev
>> Beer-Sheva 84105
>> Israel
>> 
>> E-mail: bshaa...@bgu.ac.il
>> Phone: 972-8-647-2220 Skype: boaz.shaanan
>> Fax: 972-8-647-2992 or 972-8-646-1710
>> 
>> *From:* Nat Echols [nathaniel.ech...@gmail.com]
>> *Sent:* Monday, August 05, 2013 10:45 PM
>> *To:* בעז שאנן
>> *Cc:* CCP4BB@JISCMAIL.AC.UK
>> *Subject:* Re: [ccp4bb] mmCIF as working format?
>> 
>> On Mon, Aug 5, 2013 at 12:37 PM, Boaz Shaanan <bshaa...@bgu.ac.il> wrote:
>> 
>>    There seems to be some kind of a gap between users and developers
>>    as far as the eagerness to abandon PDB in favour of mmCIF is
>>    concerned. I myself fully agree with Jeffrey about the ease of
>>    manipulating PDB's during work, particularly when encountering
>>    unusual circumstances (and there are many of those, as we all
>>    know). And how about non-crystallographers that are using PDB's for
>>    visualization and understanding how their proteins work? I teach
>>    many such students and it's fairly easy to explain to them where to
>>    look in the PDB for particular pieces of information relevant to the
>>    structure. I can't imagine how they'll cope with the cryptic mmCIF
>>    format.
>> 
>> 
>> >I think the only gap is between developers and *expert* users - most of the 
>> >community simply wants tools and formats that work with a minimum of 
>> >fiddling.
>> 
>> That assumes that you can offer such software, but can you? I doubt that 
>> this goal is reachable (in fact our daily experience proves just that), with 
>> all due respect to you developers.
>> 
>> >Again, if users are having to examine the raw PDB records visually to find 
>> >information, this is a failure of the software.
>> It's not raw, it's easily readable text, very easy to interpret with very 
>> little effort.
>> 
>> Anyway, this discussion is a waste of time. The decision has been taken, 
>> mmCIF will prevail and we (expert and non-expert users) have to swallow the 
>> pill.
>> 
>> Boaz
>> 
>> -Nat



Re: [ccp4bb] mmCIF as working format?

2013-08-06 Thread David A Case
On Mon, Aug 05, 2013, Boaz Shaanan wrote:

> I teach many such students and it's fairly easy to explain to them where
> to look in the PDB for particular pieces of information relevant to the
> structure. I can't imagine how they'll cope with the cryptic mmCIF format.

For many purposes, it's not all that hard: the ATOM records in an mmCIF
file look like a space-delimited version of the ATOM records in a PDB file:

ATOM   1   O "O5'" . DC  A 1 1  ? 18.935 34.195 25.617  1.00 64.35 ? ? ? ? ? ?
1   DC  A "O5'" 1
ATOM   2   C "C5'" . DC  A 1 1  ? 19.130 33.921 24.219  1.00 44.69 ? ? ? ? ? ?
1   DC  A "C5'" 1

An awk script with /^ATOM/ as its selection is actually easier to write
than the corresponding script for a PDB ATOM record, since the line can
be split on white space.  And hand-editing seems no harder than with PDB
files.

Note that if a student wonders what all the "?" entries mean, the answer is
right there a few lines above in the mmCIF file.  Seems easier (to me!) than
having to memorize what goes in column 22 of a PDB record.
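David Case's observation that the column meanings are listed just above the data also describes how a program should read the file. It should take the column order from the `loop_` header rather than hard-coding positions. It should then group the whitespace-separated values by the number of columns, so that a row wrapped onto a second line is still read correctly. The sketch below does this, with hypothetical helper names. It assumes no value contains embedded whitespace and no semicolon text field occurs in the loop.

```python
def unquote(value):
    """Strip one pair of matching CIF quotes from a value, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


def read_atom_site(lines):
    """Return the _atom_site loop as a list of dicts keyed by column name."""
    tags, values, state = [], [], "search"
    for raw in lines:
        line = raw.strip()
        if state == "search":
            if line == "loop_":
                state = "header"
            continue
        if state == "header":
            if line.startswith("_atom_site."):
                tags.append(line[len("_atom_site."):])
                continue
            if not tags:
                state = "search"  # a loop for some other category
                continue
            state = "rows"
        # In the rows: stop at the first line that is not loop data.
        if not line or line.startswith(("_", "#", "loop_", "data_")):
            break
        values.extend(line.split())
    if not tags:
        return []
    width = len(tags)
    if len(values) % width:
        raise ValueError("number of values is not a multiple of the column count")
    return [
        {tag: unquote(v) for tag, v in zip(tags, values[i:i + width])}
        for i in range(0, len(values), width)
    ]
```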

Beyond this, everyone on this list will be long dead before the current PDB
format really goes away.  [And I think that includes those just entering
the field.]

...dave case


Re: [ccp4bb] mmCIF as working format?

2013-08-05 Thread Herbert J. Bernstein

Dear Colleagues,

This exchange is a wonderful illustration of the simple fact that
different scientists work differently, favoring different approaches and
different tools. For some, the latest and greatest formats and support
systems are what they need to be productive. For a surprisingly large
number of others, change to new methods is a pointless distraction from
doing good science. What we need to do as a community is not to tell one
another how they _must_ do their work, but to listen to one another,
being helpful where we can, and showing mutual respect where we cannot.

To this end, Frances and I have revived an old idea from 2006 of
creating a format that looks much like the old PDB format but is 132
columns wide, with more characters allotted to fields that need them. We
re-enabled the WPDB server at http://biomol.dowling.edu/wpdb which can
produce either a 132-column 'PDB' entry or an 80-column PDB entry based
on the mmCIF files on the wwPDB server. This allows people who work best
with tools such as grep and a simple fixed-field format to have most of
the newer, larger PDB entries in a wide version of the PDB format. If
you don't need it, or don't like it, you should not use it. If you have
need for it, and need some things changed, send us an email, and we'll
see what we can do to oblige.

Right now it is on an old, slow server. If there is significant use,
I'll move it to something bigger and faster.

Regards,
Herbert and Frances Bernstein


On 8/5/13 4:05 PM, Boaz Shaanan wrote:



Boaz Shaanan, Ph.D.
Dept. of Life Sciences
Ben-Gurion University of the Negev
Beer-Sheva 84105
Israel

E-mail: bshaa...@bgu.ac.il
Phone: 972-8-647-2220 Skype: boaz.shaanan
Fax: 972-8-647-2992 or 972-8-646-1710

*From:* Nat Echols [nathaniel.ech...@gmail.com]
*Sent:* Monday, August 05, 2013 10:45 PM
*To:* בעז שאנן
*Cc:* CCP4BB@JISCMAIL.AC.UK
*Subject:* Re: [ccp4bb] mmCIF as working format?

On Mon, Aug 5, 2013 at 12:37 PM, Boaz Shaanan <bshaa...@bgu.ac.il> wrote:


There seems to be some kind of a gap between users and developers
as far as the eagerness to abandon PDB in favour of mmCIF is concerned.
I myself fully agree with Jeffrey about the ease of manipulating PDB's
during work, particularly when encountering unusual circumstances
(and there are many of those, as we all know). And how about
non-crystallographers that are using PDB's for visualization and
understanding how their proteins work? I teach many such students
and it's fairly easy to explain to them where to look in the PDB
for particular pieces of information relevant to the structure. I
can't imagine how they'll cope with the cryptic mmCIF format.


>I think the only gap is between developers and *expert* users - most
>of the community simply wants tools and formats that work with a
>minimum of fiddling.


That assumes that you can offer such software, but can you? I doubt 
that this goal is reachable (in fact our daily experience proves just 
that), with all due respect to you developers.


>Again, if users are having to examine the raw PDB records visually to
>find information, this is a failure of the software.
It's not raw, it's easily readable text, very easy to interpret with 
very little effort.


Anyway, this discussion is a waste of time. The decision has been 
taken, mmCIF will prevail and we (expert and non-expert users) have to 
swallow the pill.


Boaz

-Nat


Re: [ccp4bb] mmCIF as working format?

2013-08-05 Thread Boaz Shaanan






 
 
Boaz Shaanan, Ph.D.
Dept. of Life Sciences
Ben-Gurion University of the Negev
Beer-Sheva 84105
Israel

E-mail: bshaa...@bgu.ac.il
Phone: 972-8-647-2220  Skype: boaz.shaanan
Fax:   972-8-647-2992 or 972-8-646-1710

From: Nat Echols [nathaniel.ech...@gmail.com]
Sent: Monday, August 05, 2013 10:45 PM
To: בעז שאנן
Cc: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] mmCIF as working format?

On Mon, Aug 5, 2013 at 12:37 PM, Boaz Shaanan <bshaa...@bgu.ac.il> wrote:

There seems to be some kind of a gap between users and developers as far as the eagerness to abandon PDB in favour of mmCIF is concerned. I myself fully agree with Jeffrey about the ease of manipulating PDB's during work, particularly when encountering unusual circumstances (and there are many of those, as we all know). And how about non-crystallographers that are using PDB's for visualization and understanding how their proteins work? I teach many such students and it's fairly easy to explain to them where to look in the PDB for particular pieces of information relevant to the structure. I can't imagine how they'll cope with the cryptic mmCIF format.

>I think the only gap is between developers and *expert* users - most of the community simply wants tools and formats that work with a minimum of fiddling.

That assumes that you can offer such software, but can you? I doubt that this goal is reachable (in fact our daily experience proves just that), with all due respect to you developers.

>Again, if users are having to examine the raw PDB records visually to find information, this is a failure of the software.
It's not raw, it's easily readable text, very easy to interpret with very little effort.

Anyway, this discussion is a waste of time.  The decision has been taken, mmCIF will prevail and we (expert and non-expert users) have to swallow the pill.

Boaz

-Nat









Re: [ccp4bb] mmCIF as working format?

2013-08-05 Thread Nat Echols
On Mon, Aug 5, 2013 at 12:37 PM, Boaz Shaanan  wrote:

>  There seems to be some kind of a gap between users and developers as far as
> the eagerness to abandon PDB in favour of mmCIF is concerned. I myself fully agree with
> Jeffrey about the ease of manipulating PDB's during work, particularly when
> encountering unusual circumstances (and there are many of those, as we all
> know). And how about non-crystallographers that are using PDB's for
> visualization and understanding how their proteins work? I teach many such
> students and it's fairly easy to explain to them where to look in the PDB
> for particular pieces of information relevant to the structure. I can't
> imagine how they'll cope with the cryptic mmCIF format.
>

I think the only gap is between developers and *expert* users - most of the
community simply wants tools and formats that work with a minimum of
fiddling.  Again, if users are having to examine the raw PDB records
visually to find information, this is a failure of the software.

-Nat


Re: [ccp4bb] mmCIF as working format?

2013-08-05 Thread Boaz Shaanan



There seems to be some kind of a gap between users and developers as far as the eagerness to abandon PDB in favour of mmCIF is concerned. I myself fully agree with Jeffrey about the ease of manipulating PDB's during work, particularly when encountering unusual circumstances (and there are many of those, as we all know). And how about non-crystallographers that are using PDB's for visualization and understanding how their proteins work? I teach many such students and it's fairly easy to explain to them where to look in the PDB for particular pieces of information relevant to the structure. I can't imagine how they'll cope with the cryptic mmCIF format.


     Boaz



 
 
Boaz Shaanan, Ph.D.

Dept. of Life Sciences  
Ben-Gurion University of the Negev  
Beer-Sheva 84105    
Israel  
    
E-mail: bshaa...@bgu.ac.il
Phone: 972-8-647-2220  Skype: boaz.shaanan  
Fax:   972-8-647-2992 or 972-8-646-1710
 
 










Re: [ccp4bb] mmCIF as working format?

2013-08-05 Thread Nat Echols
On Mon, Aug 5, 2013 at 11:11 AM, Phil Jeffrey wrote:

> While alternative programs exist to do almost everything, I prefer
> something that works well, works quickly, and provides instant visual
> feedback.  CCP4 and Phenix are stuck in a batch processing paradigm that I
> don't find useful for these manipulations.
>

Speaking as a developer, it's probably much easier and faster for us to
write software that *does* do what you want, instead of piling on hacks to
keep the PDB format alive another 30+ years.

> While PDB is limited and has a lot of redundant information, it is for the
> latter reason a rather useful format for quickly making changes in a
> text editor.  It's certainly far faster than using any GUI, and it's also
> faster than the command line in many instances - and I have my own command
> line programs for hacking PDB files (and ultimately whatever formats come
> next).
>

Most complaints of this sort seem to be based on an unrealistic expectation
that your own experiences and skills are representative of the rest of the
community.  The vast majority of crystallographers don't have their own
command-line programs, aren't familiar with the intricacies of PDB format,
and as often as not botch the job when they attempt to edit their PDB files
by hand.  (I get a lot of bug reports like this.)  They're not going to
care whether they can use 'awk' on their structures.
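As an illustration of how a hand edit typically goes wrong (an editor's sketch, not code from any of the programs mentioned): the PDB coordinate fields are fixed columns, x in 31-38, y in 39-46 and z in 47-54, each written as F8.3, so adding or deleting a single character earlier on the line silently shifts every value. A few lines of Python can flag records whose decimal points are no longer where the format expects them. The function name `find_misaligned_records` is hypothetical.

```python
# PDB ATOM/HETATM coordinate fields as 0-based, end-exclusive slices:
# x = columns 31-38, y = 39-46, z = 47-54, each written as F8.3.
COORD_SLICES = (slice(30, 38), slice(38, 46), slice(46, 54))


def find_misaligned_records(lines):
    """Return 1-based line numbers of ATOM/HETATM records whose
    coordinate fields are not where the fixed-column layout expects."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # only coordinate records are checked
        for field in (line[s] for s in COORD_SLICES):
            # An F8.3 value is 8 characters wide with its decimal point at
            # offset 4; a character added or removed earlier moves it.
            ok = len(field) == 8 and field[4] == "."
            if ok:
                try:
                    float(field)
                except ValueError:
                    ok = False
            if not ok:
                bad.append(lineno)
                break
    return bad


good = ("ATOM  " + "    1" + "  N   MET A" + "   1" + "    "
        + "  11.104  13.207   2.100  1.00 20.00")
shifted = good[:21] + good[22:]  # chain ID deleted by hand
print(find_misaligned_records([good, shifted, "REMARK   1 not checked"]))  # prints [2]
```

Checking the position of the decimal point, rather than only whether the field parses as a number, is what catches the one-column shift: `float(" 11.104 ")` still succeeds, but the point is no longer at offset 4.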


> Using mmCIF as an archive format makes sense, but I doubt it's going to
> make building structures any easier except for particularly large
> structures where some extended-PDB format might work just as well or better.
>

There is a lot of information that can't easily be stored simply by making
the ATOM records wider.  Right now some of this gets crammed into the
REMARK section, but usually in an unstructured and/or poorly documented
format.  This isn't just problematic for archival - it limits what
information can be transferred between programs.  mmCIF has none of these
limitations.  I have some reservations about the current specification (for
instance, the fact that the original R-free flags are not stored separately
in deposited structure factor files, and are instead mixed into the
"status" flag, which can have multiple other meanings), but at least there
is a clear process for extending this in a way that does not (or should
not, anyway) break existing parsers.

-Nat


Re: [ccp4bb] mmCIF as working format?

2013-08-05 Thread Phil Jeffrey
Questionable practice is writing an interpretation program for 
operations that can be handled simply at the command line.  Programs 
that use the API that Eugene implicitly refers to are no panacea, e.g. 
Coot has strange restrictions on things like changing the chain label 
that can be fixed in a matter of seconds by editing the PDB file in e.g. 
xemacs.  Which means that when I'm building a large structure with 
multiple chain fragments present during the build process, I've edited 
those intermediate PDB files tens of times in a single day.
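For comparison, the chain relabelling described here is also a few lines of script once the column layout is known: in ATOM, HETATM, ANISOU and TER records the chain ID is the single character in column 22, and the residue number occupies columns 23-26. This is a minimal sketch with a hypothetical `relabel_chain` helper; it changes nothing else (for example, it does not check for a clash with an existing chain of the same name).

```python
# Record types that carry a chain ID in column 22 (index 21).
CHAIN_RECORDS = ("ATOM  ", "HETATM", "ANISOU", "TER   ")


def relabel_chain(lines, old, new, first=None, last=None):
    """Return a copy of PDB lines with chain `old` renamed to `new`.

    If `first` and/or `last` are given, only residues whose number
    (columns 23-26) lies in that inclusive range are relabelled.
    """
    if len(new) != 1:
        raise ValueError("a PDB chain ID is a single character")
    lo = float("-inf") if first is None else first
    hi = float("inf") if last is None else last
    out = []
    for line in lines:
        if line.startswith(CHAIN_RECORDS) and len(line) >= 26 and line[21] == old:
            try:
                resseq = int(line[22:26])
            except ValueError:
                resseq = None  # malformed residue number: leave the line alone
            if resseq is not None and lo <= resseq <= hi:
                line = line[:21] + new + line[22:]
        out.append(line)
    return out
```

Replacing exactly one character keeps every other column in place, which is the property a hand edit in a text editor can easily break.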


While alternative programs exist to do almost everything, I prefer 
something that works well, works quickly, and provides instant visual 
feedback.  CCP4 and Phenix are stuck in a batch processing paradigm that 
I don't find useful for these manipulations.


While PDB is limited and has a lot of redundant information, it is for the 
latter reason a rather useful format for quickly making changes in 
a text editor.  It's certainly far faster than using any GUI, and it's 
also faster than the command line in many instances - and I have my own 
command line programs for hacking PDB files (and ultimately whatever 
formats come next).


Using mmCIF as an archive format makes sense, but I doubt it's going to 
make building structures any easier except for particularly large 
structures where some extended-PDB format might work just as well or better.


Phil Jeffrey
Princeton

On 8/5/13 9:53 AM, Pavel Afonine wrote:

> Editing files by hand (PDB files, for example) is a questionable practice.
> If you know programming, either use existing reliable parsers (available
> for both PDB and CIF) or write your own jiffy.


Re: [ccp4bb] mmCIF as working format?

2013-08-05 Thread Frank von Delft
I always assumed there was a broad consensus that the PDB format was 
ancient and by now profoundly rubbish.


Ho hum, live and learn.


On 05/08/2013 14:53, Pavel Afonine wrote:

> Tim,
>
> The PDB file format is good because of its simplicity, and that's perhaps
> it. However, it cannot accommodate the wealth of information that is
> available at the end of refinement. Of course one can keep creating
> remarks for the PDB file, etc., but I guess mmCIF is simply a better way
> of doing it than uglifying the existing PDB format with countless
> decorators.
>
> Editing files by hand (PDB files, for example) is a questionable practice.
> If you know programming, either use existing reliable parsers (available
> for both PDB and CIF) or write your own jiffy.
>
> phenix.refine and surrounding tools can input and output both PDB and
> mmCIF (both model and reflections).
>
> Pavel



Re: [ccp4bb] mmCIF as working format?

2013-08-05 Thread Pavel Afonine
Tim,

The PDB file format is good because of its simplicity, and that's perhaps it.
However, it cannot accommodate the wealth of information that is available at
the end of refinement. Of course one can keep creating remarks for the PDB file,
etc., but I guess mmCIF is simply a better way of doing it than uglifying the
existing PDB format with countless decorators.

Editing files by hand (PDB files, for example) is a questionable practice. If
you know programming, either use existing reliable parsers (available for both
PDB and CIF) or write your own jiffy.

phenix.refine and surrounding tools can input and output both PDB and
mmCIF (both model and reflections).

Pavel



Re: [ccp4bb] mmCIF as working format?

2013-08-05 Thread Tim Gruene
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Ian,

yes, as Andrew also pointed out, I meant to refer to the ratio of XML
vs. CIF rather than CIF vs. PDB.

My worry was that future versions of programs like refmac, phenix, or
buster-TNT would only write large files (in xml-format) and no more
PDB files, so that I would have to work with xml or cif on my desktop
computer. I am not worried about what format I download from the PDB
itself.

If that's not the case, I am pleased; but even if it is the case, and
if that future change is majority-driven, I guess I will have to live
with it, even though I have objections to most of the replies to the
thread I opened (except for the file size).

Best,
Tim

P.S.: I would hold that bet against you, but I don't think this is the
right place to discuss this.


On 08/05/2013 03:35 PM, Ian Clifton wrote:
> On 05/08/13 09:03, Tim Gruene wrote:
> 
>> Reading Gerard Kleywegt's latest announcement on the wwPDB Workshop
>> (1st August) made me wonder whether it is planned to introduce mmCIF
>> as a working format for users in addition to using it at e.g. the
>> PDB, because I think that would make life unnecessarily complicated.
> 
> There’s nothing to stop you using your /own/ working format—it’s
> easy to extract a simpler file from the full archive file—but the
> archive file obviously has to contain the full set of metadata, and
> to be useful, that metadata has to be easily parsable.
> 
> 
>> The example mmCIF file for GroEL is about 7.5 times bigger than
>> its PDB file. I know that disk space is 'cheap' nowadays, but
>> that does not make it fast.
>> 
>> And personally I find mmCIF very awkward to work with, since it
>> is not line-oriented. 'grep', 'awk', 'perl' etc. do not work well
>> on XML-like files. Instead of using mmCIF, one could, e.g.
>> introduce a free format PDB format, with space holders for
>> non-assigned entities, and maybe a line continuation character.
> 
> Are you sure you’re talking about the CIF‐based mmCIF format here,
> not the XML‐based PDBx format? mmCIF shouldn’t be much bigger than
> PDB.
> 
>> If mmCIF is not going to be the working format for MX
>> (refinement) programs, I would be happy to be reassured;
>> otherwise I would appreciate some comments about the benefits of
>> an XML file format over a line-oriented free format for the
>> scientists who work with structural data. In my opinion, using
>> XML (or mmCIF) for structural information is an attempt by
>> programmers to make themselves more indispensable to scientists,
>> rather than a scientific necessity.
> 
> Even when searching the “simple” PDB format, you’re likely to
> encounter problems with line endings. Imagine trying to find all
> files containing PEG: your script must reliably recognise something
> like:
> 
> REMARK 280 CRYSTALLIZATION CONDITIONS: 1.0M LITHIUM SULPHATE, 100MM POLY
> REMARK 280   ETHYLENE GLYCOL
> 
> —in fact this sort of thing is much /easier/ to do, given the
> proper tools, in a format like XML.
> 
> With file formats, the devil is always in the details. If you set
> out to create a “line‐oriented, free format” PDB replacement, and
> you carefully ironed out all the potential ambiguities and awkward
> corner cases, I bet you’d come up with something close to mmCIF.

- -- 
- --
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iD8DBQFR/61NUxlJ7aRr7hoRAvpyAJ4oq9fWcHA657hZNCix7xoK4ktxgQCgrlx2
C+7EqGgVGKo1J3+6tZHMSqk=
=mdO9
-END PGP SIGNATURE-


Re: [ccp4bb] mmCIF as working format?

2013-08-05 Thread Ian Clifton

On 05/08/13 09:03, Tim Gruene wrote:


> Reading Gerard Kleywegt's latest announcement on the wwPDB Workshop
> (1st August) made me wonder whether it is planned to introduce mmCIF as
> a working format for users in addition to using it at e.g. the PDB,
> because I think that would make life unnecessarily complicated.


There’s nothing to stop you using your /own/ working format—it’s easy to 
extract a simpler file from the full archive file—but the archive file 
obviously has to contain the full set of metadata, and to be useful, 
that metadata has to be easily parsable.
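A sketch of what extracting a simpler working file might look like at the record level: given one row of the mmCIF `_atom_site` category as a dictionary keyed by item name (for example `auth_atom_id`, `Cartn_x`, `B_iso_or_equiv`), write the corresponding fixed-column PDB line. The function name is hypothetical, and the atom-name alignment rule is a simplification noted in the comments; a real converter would also have to deal with multi-character chain IDs, which do not fit in column 22.

```python
def _blank_if_unset(value):
    """mmCIF uses '.' and '?' for inapplicable/unknown values."""
    return "" if value in (".", "?") else value


def atom_site_to_pdb(row):
    """Format one mmCIF _atom_site row (a dict keyed by item name, without
    the '_atom_site.' prefix) as a fixed-column PDB ATOM/HETATM record."""
    name = row["auth_atom_id"]
    # Simplification: names shorter than 4 characters start in column 14,
    # which is right for one-letter elements (C, N, O, S) but not for
    # two-letter elements such as FE or ZN.
    if len(name) < 4:
        name = " " + name
    alt = _blank_if_unset(row.get("label_alt_id", "."))
    icode = _blank_if_unset(row.get("pdbx_PDB_ins_code", "?"))
    return (
        f"{row['group_PDB']:<6}{int(row['id']):>5} {name:<4}{alt:1}"
        f"{row['auth_comp_id']:>3} {row['auth_asym_id']:1}"
        f"{int(row['auth_seq_id']):>4}{icode:1}   "
        f"{float(row['Cartn_x']):8.3f}{float(row['Cartn_y']):8.3f}"
        f"{float(row['Cartn_z']):8.3f}"
        f"{float(row['occupancy']):6.2f}{float(row['B_iso_or_equiv']):6.2f}"
        f"{'':10}{row['type_symbol']:>2}"
    )
```

The direction of the conversion matters: going from the richer archive record to the narrower working format only drops information, so it can be done mechanically, whereas the reverse would have to guess what the PDB columns never recorded.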




> The example mmCIF file for GroEL is about 7.5 times bigger than its PDB
> file.
> I know that disk space is 'cheap' nowadays, but that does not make it fast.
>
> And personally I find mmCIF very awkward to work with, since it is not
> line-oriented. 'grep', 'awk', 'perl' etc. do not work well on XML-like
> files.
> Instead of using mmCIF, one could, e.g. introduce a free format PDB
> format, with space holders for non-assigned entities, and maybe a line
> continuation character.


Are you sure you’re talking about the CIF‐based mmCIF format here, not 
the XML‐based PDBx format? mmCIF shouldn’t be much bigger than PDB.



> If mmCIF is not going to be the working format for MX (refinement)
> programs, I would be happy to be reassured; otherwise I would
> appreciate some comments about the benefits of an XML file format over a
> line-oriented free format for the scientists who work with structural data.
> In my opinion, using XML (or mmCIF) for structural information is an
> attempt by programmers to make themselves more indispensable to
> scientists, rather than a scientific necessity.


Even when searching the “simple” PDB format, you’re likely to encounter 
problems with line endings. Imagine trying to find all files containing 
PEG: your script must reliably recognise something like:


REMARK 280 CRYSTALLIZATION CONDITIONS: 1.0M LITHIUM SULPHATE, 100MM POLY
REMARK 280   ETHYLENE GLYCOL

—in fact this sort of thing is much /easier/ to do, given the proper 
tools, in a format like XML.
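A minimal sketch of the search described above, assuming the free text of each REMARK 280 record starts at column 12: join the continuation records first, normalise whitespace, and only then search. A line-by-line grep for `POLY ETHYLENE GLYCOL` misses this example because the phrase is split across two records. The function names and the exact pattern are illustrative only, and a word broken in the middle across records would still be missed.

```python
import re

# 'PEG' followed by a non-letter (e.g. 'PEG 3350', 'PEG3350') or the
# spelled-out name, optionally written 'POLY ETHYLENE' or 'POLY(ETHYLENE'.
PEG_PATTERN = re.compile(r"\bPEG(?![A-Z])|\bPOLY[ (-]?ETHYLENE GLYCOL")


def crystallization_text(lines):
    """Join the free text of all REMARK 280 records into one string.

    'REMARK 280' fills columns 1-10 and the text starts at column 12
    (index 11); joining with spaces and collapsing whitespace lets a
    phrase that was wrapped onto the next record be matched again.
    """
    parts = [line[11:].strip() for line in lines if line.startswith("REMARK 280")]
    return re.sub(r"\s+", " ", " ".join(parts))


def mentions_peg(lines):
    return PEG_PATTERN.search(crystallization_text(lines).upper()) is not None


remark = [
    "REMARK 280 CRYSTALLIZATION CONDITIONS: 1.0M LITHIUM SULPHATE, 100MM POLY",
    "REMARK 280   ETHYLENE GLYCOL",
]
print(any("POLY ETHYLENE GLYCOL" in line for line in remark))  # prints False
print(mentions_peg(remark))  # prints True
```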


With file formats, the devil is always in the details. If you set out to 
create a “line‐oriented, free format” PDB replacement, and you carefully 
ironed out all the potential ambiguities and awkward corner cases, I bet 
you’d come up with something close to mmCIF.

--
Ian ◎


Re: [ccp4bb] mmCIF as working format?

2013-08-05 Thread Andrew Leslie
Hi Tim,

   I just downloaded GroEL entry 4KI8 in PDB format and CIF format from 
RCSB. The PDB format was 4.7 MB and the CIF format was 5.9 MB, which doesn't seem 
such a big difference to me. Which example were you looking at?

Andrew



Re: [ccp4bb] mmCIF as working format?

2013-08-05 Thread Eugene Krissinel
I just hope that one day we will all be discussing a sort of universal API to 
read/write structural information, instead of referring to raw formats, and 
routines to query MX data, which would be more appropriate than grep (would 
many SB students/postdocs use grep these days? Yet many of them would need to 
inspect files somehow). This, in essence, is similar to discussing read/write 
primitives in C/C++/Fortran rather than the BIOS I/O functions and HDD/bus 
commands that they drive.

No format can suit everybody, especially given the complexity of 
macromolecular data. Contrary to Tim's conspiracy theory, the transition to 
mmCIF is actually driven by scientists rather than programmers. PDB has 
limitations (e.g. limits on the number of atoms, residues and chains), which 
make it unsuitable in many cases today and definitely for the future. This 
has been discussed many times, and measures have to be taken one day; it seems 
that day is coming.
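The limits mentioned here follow directly from the fixed column widths of the ATOM/HETATM record: five columns for the atom serial number, four for the residue number and one for the chain ID. A minimal checker might look like the sketch below; the function name is hypothetical, and the 62-symbol chain alphabet (A-Z, a-z, 0-9) is an assumed convention rather than a hard rule of the format.

```python
MAX_SERIAL = 99999           # atom serial number: columns 7-11 (5 characters)
RESSEQ_RANGE = (-999, 9999)  # residue number: columns 23-26 (4 characters)
MAX_CHAINS = 62              # assumed chain alphabet: A-Z, a-z, 0-9


def pdb_format_problems(n_atoms, residue_numbers, chain_ids):
    """List the ways a model would not fit the classic PDB columns."""
    problems = []
    if n_atoms > MAX_SERIAL:
        problems.append(f"{n_atoms} atoms exceed the 5-column serial field")
    lo, hi = RESSEQ_RANGE
    bad = [r for r in residue_numbers if not lo <= r <= hi]
    if bad:
        problems.append(f"{len(bad)} residue number(s) do not fit in 4 columns")
    long_ids = sorted({c for c in chain_ids if len(c) > 1})
    if long_ids:
        problems.append(f"multi-character chain IDs: {long_ids}")
    n_chains = len(set(chain_ids))
    if n_chains > MAX_CHAINS:
        problems.append(f"{n_chains} chains exceed {MAX_CHAINS} one-character IDs")
    return problems
```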

But I would really like to stress that working with raw formats, whether 
through grep or otherwise, should increasingly become an unnecessary 
habit, to say the least. Formats will evolve whatever happens, and the only 
proper way to cope with such a moving target is to use a maintained API.
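To illustrate the API idea (a sketch only, not the interface of any existing library): callers ask for atoms, and the reader decides how to parse the file. Here a hypothetical `read_atoms` uses the presence of a `data_` line as a heuristic to choose between a tiny mmCIF `_atom_site` reader and the fixed-column PDB reader, and both return the same `Atom` records. The mmCIF branch assumes a loop with one row per line and the `auth_*` items present.

```python
import shlex
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Atom:
    chain: str
    resseq: int
    name: str
    xyz: Tuple[float, float, float]


def _atoms_from_pdb(lines):
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            yield Atom(line[21], int(line[22:26]), line[12:16].strip(),
                       (float(line[30:38]), float(line[38:46]), float(line[46:54])))


def _atoms_from_mmcif(lines):
    # Sketch: reads the _atom_site loop, assuming one row per line.
    tags, rows_started = [], False
    for raw in lines:
        line = raw.strip()
        if line.startswith("_atom_site."):
            tags.append(line[len("_atom_site."):])
        elif tags and line and not line.startswith(("#", "_", "loop_", "data_")):
            rows_started = True
            row = dict(zip(tags, shlex.split(line)))
            yield Atom(row["auth_asym_id"], int(row["auth_seq_id"]), row["auth_atom_id"],
                       (float(row["Cartn_x"]), float(row["Cartn_y"]), float(row["Cartn_z"])))
        elif rows_started:
            break  # first non-row line ends the loop


def read_atoms(text) -> List[Atom]:
    """Single entry point: callers never see which format was on disk."""
    lines = text.splitlines()
    if any(line.startswith("data_") for line in lines):
        return list(_atoms_from_mmcif(lines))
    return list(_atoms_from_pdb(lines))
```

Code written against `read_atoms` would not need to change if the on-disk format did; only the private readers would, which is the point being made about a maintained API versus grep.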

Eugene


Re: [ccp4bb] mmCIF as working format?

2013-08-05 Thread kaiser
Tim,
  Having not read Gerard Kleywegt's announcement, and not considering myself a 
programmer, I have to disagree with the majority of your statement. Yes, using 
grep on mmcif files is "awk"ward (but perfectly possible); awk on the other 
hand works much better. It's actually more of a pain to use it on pdb files. 
And perl, well perl can handle anything and it will always look nice while you 
write it and never look nice when you look back at it...
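In the same spirit, a `loop_` in mmCIF is essentially a table whose column names are declared up front, so columns can be selected by name rather than by position as with awk's `$5`. The sketch below (hypothetical `read_loop` helper) uses Python's `shlex` to honour quoted values such as `"O5'"`, which plain whitespace splitting would return with the quotes still attached. It assumes the `loop_` form with one row per line, which is typical of files from the PDB but not required by the CIF syntax.

```python
import shlex


def read_loop(lines, category):
    """Return {item_name: [values, ...]} for one mmCIF category's loop_.

    Sketch only: handles the loop_ form with one row per line.
    """
    prefix = f"_{category}."
    tags, columns, rows_started = [], {}, False
    for raw in lines:
        line = raw.strip()
        if line.startswith(prefix):
            tags.append(line[len(prefix):])
        elif tags and line and not line.startswith(("#", "_", "loop_", "data_")):
            rows_started = True
            # shlex keeps a quoted value such as "O5'" together and removes
            # the surrounding quotes, where str.split() would keep them.
            for tag, value in zip(tags, shlex.split(line)):
                columns.setdefault(tag, []).append(value)
        elif rows_started:
            break  # first non-row line after the data ends the loop
    return columns
```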
 
Just my 2 cents,

Jens


[ccp4bb] mmCIF as working format?

2013-08-05 Thread Tim Gruene
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Dear all,

Reading Gerard Kleywegt's latest announcement on the wwPDB Workshop
(1st August) made me wonder whether it is planned to introduce mmCIF as
a working format for users in addition to using it at e.g. the PDB,
because I think that would make life unnecessarily complicated.

The example mmCIF file for GroEL is about 7.5 times bigger than its PDB
file.
I know that disk space is 'cheap' nowadays, but that does not make it fast.

And personally I find mmCIF very awkward to work with, since it is not
line-oriented. 'grep', 'awk', 'perl' etc. do not work well on XML-like
files.
Instead of using mmCIF, one could, e.g. introduce a free format PDB
format, with space holders for non-assigned entities, and maybe a line
continuation character.
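The suggestion above can be sketched as a reader for a purely hypothetical format (nothing like it was specified in the thread): whitespace-separated fields, `.` as the placeholder for an unassigned entity, and a trailing backslash as the line continuation character. Both of those symbol choices are assumptions made for the example.

```python
def read_free_format(lines, placeholder=".", continuation="\\"):
    """Parse a hypothetical free-format, line-oriented coordinate file.

    Each logical record is a list of whitespace-separated fields; a line
    ending in `continuation` is joined with the next one, and a field equal
    to `placeholder` (an unassigned entity) is returned as None.
    """
    records, pending = [], []

    def flush():
        records.append([None if tok == placeholder else tok for tok in pending])
        pending.clear()

    for raw in lines:
        line = raw.rstrip()
        continued = line.endswith(continuation)
        if continued:
            line = line[: -len(continuation)]
        pending.extend(line.split())
        if not continued and pending:
            flush()
    if pending:  # the file ended on a continuation line
        flush()
    return records
```

Note that a record continued onto a second line has the same property as the wrapped REMARK 280 example discussed elsewhere in this thread: a purely line-by-line grep sees only part of it.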

If mmCIF is not going to be the working format for MX (refinement)
programs, I would be happy to be reassured; otherwise I would
appreciate some comments about the benefits of an XML file format over a
line-oriented free format for the scientists who work with structural data.
In my opinion, using XML (or mmCIF) for structural information is an
attempt by programmers to make themselves more indispensable to
scientists, rather than a scientific necessity.

Best,
Tim

- -- 
- --
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iD8DBQFR/1xbUxlJ7aRr7hoRAkLNAKClH9RpAA7NJsH3YFOTguOo9kjwoQCZAf/m
JF1oyJNuq+8b+VsywDupElo=
=bvb3
-END PGP SIGNATURE-