Re: [ccp4bb] [phenixbb] Calculate average B-factor?

2014-10-06 Thread Tim Gruene
Dear Jose,

the question came up again because I did not receive an answer to my
question. The thread discussed the benefits and drawbacks of PDB vs. mmCIF,
which was not my question.
This time, Nat Echols gave a very reasonable answer (at least for
phenix) on the phenixbb, i.e., that there are no plans to abandon the
PDB format (as a working format), but that a smooth transition will very
likely take place. I guess this will be slower than the PDB's enforcement
of uploading PDBx/mmCIF files for archiving. I agree that for
archiving mmCIF is a reasonable format, but I guess less than 1% of all
structures in the PDB hit the limits of the PDB format.

I greatly appreciate Nat's answer and I would appreciate an answer from
those responsible for the other refinement programs.

Best,
Tim


Re: [ccp4bb] [phenixbb] Calculate average B-factor?

2014-10-06 Thread Dale Tronrud



On 10/6/2014 4:20 AM, Tim Gruene wrote:
 Dear Jose,
 
 the question came up again because I did not receive an answer to
 my question. The thread discussed the benefits and drawbacks of PDB vs.
 mmCIF, which was not my question. This time, Nat Echols gave a
 very reasonable answer (at least for phenix) on the phenixbb,
 i.e., that there are no plans to abandon the PDB format (as a
 working format), but that a smooth transition will very likely take
 place. I guess this will be slower than the PDB's enforcement of
 uploading PDBx/mmCIF files for archiving. I agree that for
 archiving mmCIF is a reasonable format, but I guess less than 1% of
 all structures in the PDB hit the limits of the PDB format.
 
That's odd. I've found that just about every structure I've worked
on in the last couple of decades could not be expressed in the
PDB format without loss of information. A primary example? Try
expressing a pair of side chains that have alternative conformations in
a PDB file. Okay, one conformation is A and the other is B. That
allows me a total of twelve pairs of side chains before I run out of
upper case letters. Most people hack their model by reusing A and
B, but of course that is ambiguous about where you mean the A's are
the same and where they are different. A realistic model of the
surface of a protein cannot be expressed in the PDB format.

How many models are refined with TLS B factors? There still is no
way to describe TLS in the PDB format. Don't tell me it's stuffed in
REMARK! What kind of a file format is that?

I believe that 100% of the models that we should be building can't
be described in the PDB file format, and that has been true for a
great many years.

Dale Tronrud

 I greatly appreciate Nat's answer and I would appreciate an answer
 from those responsible for the other refinement programs.
 
 Best, Tim
 

Re: [ccp4bb] [phenixbb] Calculate average B-factor?

2014-10-05 Thread Tim Gruene
Hi all,

reading this beauty I would like to ask a question to the respective
developers:
Will the PDB format remain the working format for the users and only
upon deposition will it be converted to PDBml for archiving purposes, or
are the refinement programs (et al.) going to abandon PDB, too?

Best,
Tim

On 10/04/2014 10:32 PM, Ed Pozharski wrote:
 grep ^ATOM   filename.pdb | cut -c 61-66 | awk '{s+=$1;} END {print s/NR;}'
 
 Nobody likes a show off, Private
Skipper
 
 Cheers
 
 
 
 -------- Original message --------
 From: Chen Zhao c.z...@yale.edu
 Date: 10/04/2014 4:03 PM (GMT-05:00)
 To: PHENIX user mailing list pheni...@phenix-online.org
 Subject: [phenixbb] Calculate average B-factor?
 
 Dear all,
 
 I am just wondering whether there is a command line tool in phenix that 
 calculates the average B-factor of a PDB file? Can it deal with the ANISOU 
 records (from TLS refinement or not) properly? I looked into previous posts 
 but the  --show-adp-statistics option in phenix.pdbtools seems to be no 
 longer available in the version (1.9-1678) I installed.
 
 Thank you so much,
 Chen
 
 
 
 

-- 
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A





Re: [ccp4bb] [phenixbb] Calculate average B-factor?

2014-10-05 Thread Jose Manuel Duarte

Well, if you simply replace that beauty by this one:

grep ^ATOM filename.cif | awk '{print $15}' | awk '{s+=$1;} END {print s/NR;}'


You will achieve exactly the same result (the b-factors are in the 15th 
field of the _atom_site section in deposited mmCIF files). I'm not an 
expert in awk, but I'm sure that can be made even shorter ;)


It is important to keep in mind that mmCIF files are designed to be 
usable with grep-like tools, so I don't see any problems in moving 
forward to that format, whereas I see a lot of problems in staying with 
the classic PDB format.


Cheers

Jose
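
As a sketch of the shorter form hinted at above, the grep and both awk stages can be folded into a single awk call. It keeps the same assumption as the one-liner, namely that the B-factor is the 15th whitespace-separated field of every row starting with ATOM, which holds only for files laid out like the deposited ones. The function name avg_b_cif_field15 is purely illustrative, and the output is rounded to two decimals rather than printed in awk's default format.

```shell
# Average of the 15th whitespace-separated field over all rows starting with ATOM.
# Assumes the column order of deposited mmCIF files (an assumption, not a guarantee).
# Prints nothing if no ATOM rows are found, instead of dividing by zero.
avg_b_cif_field15() {
    awk '/^ATOM/ { s += $15; n++ } END { if (n) printf "%.2f\n", s / n }' "$1"
}

# Usage: avg_b_cif_field15 filename.cif
```

Counting rows in n rather than relying on NR also makes it easy to widen the pattern later, for example to include HETATM rows.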






Re: [ccp4bb] [phenixbb] Calculate average B-factor?

2014-10-05 Thread Frances C. Bernstein

mmCIF is a very general format with tag-value pairs, and loops
so that tags do not need to be repeated endlessly.  It was
designed so that there is the flexibility of defining new terms
easily and presenting the data in any order and with any kind
of spacing.

I understand that there are 10+ files in cyberspace prepared
by the PDB and that they all have the 'same' format.

It is tempting to write software that treats these files as fixed
format and to hope that all software packages that generate coordinate
files will use the same fixed format.  But that loses the generality
and flexibility of mmCIF, and software written that way will fail
when some field requires more characters or a new field is added.
There are software tools that allow one to read and extract data from
any mmCIF file; using these is more complicated than using grep, but
it ensures that one's software will not fail when it encounters
a data file that is not exactly what the PDB is currently producing.

Note that mmCIF was defined when the limitations of the fixed-format
PDB format became apparent with large structures.  Let's not repeat
the mistakes of the past.

Frances

=
Bernstein + Sons
Information Systems Consultants
5 Brewster Lane, Bellport, NY 11713-2803
Frances C. Bernstein
f...@bernstein-plus-sons.com
1-631-286-1339   FAX: 1-631-286-1999
=
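
One way to reduce the fragility described above without leaving awk is to look up the B-factor column from the loop_ header instead of hard-coding a field number. The following is only a sketch under stated assumptions: the function name avg_b_cif_by_tag is made up for illustration, each atom is assumed to sit on a single line, the first column is assumed to be the ATOM/HETATM group, and no value is assumed to contain quoted blanks. It survives added or reordered columns and wider fields, but not line breaks or quoting, so a real mmCIF parser remains the robust route.

```shell
# Average _atom_site.B_iso_or_equiv, finding its column from the loop_ header.
# Simplifications (assumptions): one atom per line, group_PDB is the first column,
# and no quoted values containing blanks.
avg_b_cif_by_tag() {
    awk '
        # Header lines of the atom_site loop: count them to learn the column order.
        /^_atom_site\./ { ncol++; if ($1 == "_atom_site.B_iso_or_equiv") bcol = ncol; next }
        # Data rows: only use them once the B-factor column is known.
        bcol && ($1 == "ATOM" || $1 == "HETATM") { s += $bcol; n++ }
        END { if (n) printf "%.2f\n", s / n }
    ' "$1"
}

# Usage: avg_b_cif_by_tag filename.cif
```

The pattern `^_atom_site\.` requires the dot directly after the category name, so tags of other categories such as _atom_site_anisotrop are not counted.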

On Sun, 5 Oct 2014, Tim Gruene wrote:


Hi Jose,

I see. In the example on page
http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Categories/atom_site.html,
it is in field 12, though, and I would have thought that mmCIF allows
line breaks.

But as long as all developers writing PDBx/mmCIF with their programs
follow the PDB constraints (``styling plans'' in their FAQ), everything
is fine.

Cheers,
Tim





Re: [ccp4bb] [phenixbb] Calculate average B-factor?

2014-10-05 Thread Kay Diederichs
On Sun, 5 Oct 2014 13:13:14 +0200, Jose Manuel Duarte jose.dua...@psi.ch 
wrote:

Well, if you simply replace that beauty by this one:

grep ^ATOM filename.cif | awk '{print $15}' | awk '{s+=$1;} END {print s/NR;}'


the problem with this is that it will break if there is any B-factor of 100 or 
higher, because then the blank will be missing ...

Kay




Re: [ccp4bb] [phenixbb] Calculate average B-factor?

2014-10-05 Thread Kay Diederichs
On Sun, 5 Oct 2014 18:54:16 +0100, Kay Diederichs 
kay.diederi...@uni-konstanz.de wrote:

grep ^ATOM filename.cif | awk '{print $15}' | awk '{s+=$1;} END {print s/NR;}'


the problem with this is that it will break if there is any B-factor of 100 or 
higher, because then the blank will be missing ...


sorry, I didn't read properly - this would break with PDB files, but the 
message was about CIF files - I didn't check their format.

Kay
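
To illustrate the point about PDB files: in the fixed-column PDB layout the occupancy occupies columns 55-60 and the B-factor columns 61-66, so a B-factor of 100.00 or more fills its field completely and touches the occupancy, and whitespace splitting then sees a single token such as 1.00120.00. Reading by column position avoids this. The sketch below uses the hypothetical name avg_b_pdb and includes HETATM records as well as ATOM records, which is a choice made here and not something the thread prescribes. ANISOU records are skipped simply because they match neither prefix; whether the ATOM B column holds the total or the residual B after TLS refinement depends on the refinement program and is not addressed here.

```shell
# Average the B-factor (columns 61-66) over ATOM and HETATM records of a PDB file.
# Reads by column position, so it still works when B >= 100 leaves no blank
# between the occupancy and B-factor fields.
avg_b_pdb() {
    awk '/^(ATOM  |HETATM)/ { s += substr($0, 61, 6); n++ }
         END { if (n) printf "%.2f\n", s / n }' "$1"
}

# Usage: avg_b_pdb filename.pdb
```

Dropping HETATM from the pattern restricts the average to ATOM records, matching the grep ^ATOM selection used in the thread.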


Re: [ccp4bb] [phenixbb] Calculate average B-factor?

2014-10-05 Thread Jose Manuel Duarte
Thanks Frances for the explanation. Indeed the mmCIF format is a lot more 
complicated and grep can be a dangerous tool to use with it. But for 
most cases it can do the job and thus it maintains some sort of 
backwards compatibility. I can't agree more that using specialised tools 
(for either PDB files or mmCIF files) that deal with the formats 
properly is the best solution (see for instance 
http://mmcif.wwpdb.org/docs/software-resources.html for some of the 
mmCIF readers).


In any case I find it most surprising that this topic came up yet again on 
this BB, when it was thoroughly discussed last year in this thread:


https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1308&L=ccp4bb&D=0&P=26939

I'm not sure why these urban legends about the evilness of the mmCIF 
format keep coming back to the list...


As explained there and elsewhere endless times, the PDB format is 
inadequate to represent the complexity of macromolecules and has 
needed a replacement for a long time. The decision to move on to mmCIF 
has been made and in my opinion the sooner we move forward the better.


Cheers

Jose


