Re: [ccp4bb] [phenixbb] Calculate average B-factor?
Dear Jose,

the question came up again because I did not receive an answer to my question. The thread discussed the benefits and drawbacks of PDB vs. mmCIF, which was not my question.

This time, Nat Echols gave a very reasonable answer (at least for phenix) on the phenixbb, i.e., that there are no plans to abandon the PDB format (as working format), but that very likely a smooth transition will take place - I guess this will happen more slowly than the PDB's enforcement of PDBx/mmCIF uploads for archiving.

I agree that for archiving mmCIF is a reasonable format, but I guess less than 1% of all structures in the PDB hit the limits of the PDB format.

I greatly appreciate Nat's answer and I would appreciate an answer from those responsible for the other refinement programs.

Best,
Tim

On 10/05/2014 08:05 PM, Jose Manuel Duarte wrote:
> Thanks Frances for the explanation. Indeed the mmCIF format is a lot more complicated and grep can be a dangerous tool to use with it. But for most cases it can do the job and thus it maintains some sort of backwards compatibility. I can't agree more that using specialised tools (for either PDB files or mmCIF files) that deal with the formats properly is the best solution (see for instance http://mmcif.wwpdb.org/docs/software-resources.html for some of the mmCIF readers).
>
> In any case I find it most surprising that this topic came up yet again on this BB, when it was thoroughly discussed last year in this thread:
>
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1308&L=ccp4bb&D=0&P=26939
>
> I'm not sure why this kind of urban legend about the evilness of the mmCIF format keeps coming back to the list... As explained there and elsewhere endless times, the PDB format is inadequate to represent the complexity of macromolecules and has needed a replacement for a long time. The decision to move on to mmCIF has been made and in my opinion the sooner we move forward the better.
>
> Cheers
> Jose
Re: [ccp4bb] [phenixbb] Calculate average B-factor?
On 10/6/2014 4:20 AM, Tim Gruene wrote:
> Dear Jose,
>
> the question came up again because I did not receive an answer to my question. The thread discussed the benefits and drawbacks of PDB vs. mmCIF, which was not my question.
>
> This time, Nat Echols gave a very reasonable answer (at least for phenix) on the phenixbb, i.e., that there are no plans to abandon the PDB format (as working format), but that very likely a smooth transition will take place - I guess this will happen more slowly than the PDB's enforcement of PDBx/mmCIF uploads for archiving.
>
> I agree that for archiving mmCIF is a reasonable format, but I guess less than 1% of all structures in the PDB hit the limits of the PDB format.

That's odd. I've found that just about every structure I've worked on in the last couple of decades could not be expressed in the PDB format without loss of information.

A primary example? Try expressing a pair of side chains that have alternative conformations in a PDB file. Okay, one conformation is A and the other is B. That allows me a total of twelve pairs of side chains before I run out of upper case letters. Most people hack their model by reusing A and B, but of course that leaves it ambiguous where the A's are meant to be the same and where they are different. A realistic model of the surface of a protein cannot be expressed in the PDB format.

How many models are refined with TLS B factors? There still is no way to describe TLS in the PDB format. Don't tell me it's stuffed in REMARK! What kind of a file format is that?

I believe that 100% of the models that we should be building can't be described in the PDB file format, and that has been true for a great many years.

Dale Tronrud

> I greatly appreciate Nat's answer and I would appreciate an answer from those responsible for the other refinement programs.
>
> Best,
> Tim
Re: [ccp4bb] [phenixbb] Calculate average B-factor?
Hi all,

reading this beauty I would like to ask a question to the respective developers: Will the PDB format remain the working format for the users and only upon deposition will it be converted to PDBml for archiving purposes, or are the refinement programs (et al.) going to abandon PDB, too?

Best,
Tim

On 10/04/2014 10:32 PM, Ed Pozharski wrote:
> grep ^ATOM filename.pdb | cut -c 61-66 | awk '{s+=$1;} END {print s/NR;}'
>
> Nobody likes a show off, Private Skipper
>
> Cheers
>
> -------- Original message --------
> From: Chen Zhao c.z...@yale.edu
> Date: 10/04/2014 4:03 PM (GMT-05:00)
> To: PHENIX user mailing list pheni...@phenix-online.org
> Subject: [phenixbb] Calculate average B-factor?
>
> Dear all,
>
> I am just wondering whether there is a command line tool in phenix that calculates the average B-factor of a PDB file? Can it deal with the ANISOU records (from TLS refinement or not) properly? I looked into previous posts but the --show-adp-statistics option in phenix.pdbtools seems to be no longer available in the version (1.9-1678) I installed.
>
> Thank you so much,
> Chen
>
> _______________________________________________
> phenixbb mailing list
> pheni...@phenix-online.org
> http://phenix-online.org/mailman/listinfo/phenixbb

--
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen
GPG Key ID = A46BEE1A
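Ed's grep/cut/awk pipeline can also be folded into a single awk call. What follows is only a sketch, assuming a file in the fixed-column PDB layout with the isotropic B-factor in columns 61-66. The function name avg_b_pdb is invented for illustration, and counting HETATM records alongside ATOM is a choice the thread itself does not make. It reads the B column of ATOM/HETATM records only and ignores ANISOU records, so it does not settle Chen's second question.

```shell
# avg_b_pdb FILE: hypothetical helper, not part of phenix or CCP4.
# Prints the mean isotropic B-factor over ATOM and HETATM records,
# reading PDB columns 61-66 so that fused fields (B >= 100) are harmless.
avg_b_pdb() {
    awk '
        /^(ATOM  |HETATM)/ {
            s += substr($0, 61, 6)   # tempFactor field, fixed columns 61-66
            n++
        }
        END {
            if (n == 0) exit 1       # no atom records: report failure
            printf "%.2f\n", s / n
        }
    ' "$1"
}
```

It would be called as avg_b_pdb filename.pdb; a file without any atom records gives a non-zero exit status instead of a division by zero.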
Re: [ccp4bb] [phenixbb] Calculate average B-factor?
Well, if you simply replace that beauty by this one:

grep ^ATOM filename.cif | awk '{print $15}' | awk '{s+=$1;} END {print s/NR;}'

you will achieve exactly the same result (the B-factors are in the 15th field of the _atom_site section in deposited mmCIF files). I'm not an expert in awk, but I'm sure that can be made even shorter ;)

It is important to keep in mind that mmCIF files are designed to be usable with grep-like tools, so I don't see any problems in moving forward to that format, whilst I see a lot of problems in staying with the classic PDB format.

Cheers
Jose

On 05.10.2014 11:18, Tim Gruene wrote:
> Hi all,
>
> reading this beauty I would like to ask a question to the respective developers: Will the PDB format remain the working format for the users and only upon deposition will it be converted to PDBml for archiving purposes, or are the refinement programs (et al.) going to abandon PDB, too?
>
> Best,
> Tim
Re: [ccp4bb] [phenixbb] Calculate average B-factor?
mmCIF is a very general format with tag-value pairs, and loops so that tags do not need to be repeated endlessly. It was designed so that there is the flexibility of defining new terms easily and presenting the data in any order and with any kind of spacing.

I understand that there are 10+ files in cyberspace prepared by the PDB and that they all have the 'same' format. It is tempting to write software that treats these files as fixed format and hope that all software packages that generate coordinate files will use the same fixed format. But that loses the generality and flexibility of mmCIF, and software written that way will fail when some field requires more characters or a new field is added.

There are software tools that allow one to read and extract data from any mmCIF file; using these is more complicated than using grep, but it ensures that one's software will not fail when it encounters a data file that is not exactly what the PDB is currently producing.

Note that mmCIF was defined when the limitations of the fixed-format PDB format became apparent with large structures. Let's not repeat the mistakes of the past.

Frances

=====================================================
Bernstein + Sons
Information Systems Consultants
5 Brewster Lane, Bellport, NY 11713-2803
Frances C. Bernstein
f...@bernstein-plus-sons.com
1-631-286-1339   FAX: 1-631-286-1999
=====================================================

On Sun, 5 Oct 2014, Tim Gruene wrote:
> Hi Jose,
>
> I see. In the example on page http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Categories/atom_site.html, it is in field 12, though, and I would have thought that mmCIF allows line breaks. But as long as all developers writing PDBx/mmCIF with their programs follow the PDB constraints (``styling plans'' in their FAQ), everything is fine.
>
> Cheers,
> Tim
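The gap between "field 15" in deposited files and "field 12" in the dictionary example suggests looking up the column from the loop header rather than hard-coding it. A minimal sketch of that idea in awk follows. It is not a real CIF parser: it assumes one atom per line, no quoted values containing whitespace, and that rows of the _atom_site loop start with ATOM or HETATM. The function name avg_b_cif is made up for illustration. For anything beyond a quick check, the dedicated mmCIF readers mentioned in the thread remain the safer route.

```shell
# avg_b_cif FILE: hypothetical helper, not a general CIF parser.
# Finds the position of _atom_site.B_iso_or_equiv in the loop header,
# then averages that column over the ATOM/HETATM rows that follow.
avg_b_cif() {
    awk '
        # A new loop starts: forget any column layout seen so far.
        $1 == "loop_" { ncol = 0; bcol = 0; next }

        # Header lines of the _atom_site loop, one tag per line.
        /^_atom_site\./ {
            ncol++
            if ($1 == "_atom_site.B_iso_or_equiv") bcol = ncol
            next
        }

        # Data rows, only once the B-factor column is known.
        bcol && ($1 == "ATOM" || $1 == "HETATM") {
            s += $bcol
            n++
        }

        END {
            if (n == 0) exit 1       # tag or atom rows not found
            printf "%.2f\n", s / n
        }
    ' "$1"
}
```

Called as avg_b_cif filename.cif, this gives the same number as the $15 pipeline on a file with the usual deposited column order, and keeps working if a writer orders or extends the _atom_site tags differently.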
Re: [ccp4bb] [phenixbb] Calculate average B-factor?
On Sun, 5 Oct 2014 13:13:14 +0200, Jose Manuel Duarte jose.dua...@psi.ch wrote:
> Well, if you simply replace that beauty by this one:
>
> grep ^ATOM filename.cif | awk '{print $15}' | awk '{s+=$1;} END {print s/NR;}'

the problem with this is that it will break if there is any B-factor of 100 or higher, because then the blank will be missing ...

Kay
Re: [ccp4bb] [phenixbb] Calculate average B-factor?
On Sun, 5 Oct 2014 18:54:16 +0100, Kay Diederichs kay.diederi...@uni-konstanz.de wrote:
>> grep ^ATOM filename.cif | awk '{print $15}' | awk '{s+=$1;} END {print s/NR;}'
>
> the problem with this is that it will break if there is any B-factor of 100 or higher, because then the blank will be missing ...

sorry, I didn't read properly - this would break with PDB files, but the message was about CIF files - I didn't check their format.

Kay
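Kay's point about PDB files can be reproduced with a single synthetic ATOM record. The sketch below is illustrative only: the coordinates are invented, the helper names make_atom_line, b_by_field and b_by_column are hypothetical, and printf is used so that the fixed PDB column widths (occupancy in columns 55-60, B-factor in 61-66) are respected.

```shell
# Build one synthetic PDB ATOM record (values invented for illustration)
# with occupancy 1.00 and the B-factor given as the first argument.
make_atom_line() {
    printf 'ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f\n' \
        1 ' CA ' ' ' ALA A 1 ' ' 11.104 6.134 -6.504 1.00 "$1"
}

# Whitespace splitting: the B-factor is expected as the 11th token.
b_by_field() {
    make_atom_line "$1" | awk '{print $11}'
}

# Fixed columns 61-66: independent of the blanks between fields.
b_by_column() {
    make_atom_line "$1" | cut -c 61-66
}
```

With a B-factor of 20.00 both approaches return the value. With 150.00 the occupancy and B-factor run together as 1.00150.00, so the 11th whitespace-separated field is empty while columns 61-66 still hold 150.00. This is why the cut -c 61-66 variant is the safer one for PDB files, and why whitespace splitting is only reasonable for mmCIF, where values are separated tokens.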
Re: [ccp4bb] [phenixbb] Calculate average B-factor?
Thanks Frances for the explanation. Indeed the mmCIF format is a lot more complicated and grep can be a dangerous tool to use with it. But for most cases it can do the job and thus it maintains some sort of backwards compatibility. I can't agree more that using specialised tools (for either PDB files or mmCIF files) that deal with the formats properly is the best solution (see for instance http://mmcif.wwpdb.org/docs/software-resources.html for some of the mmCIF readers).

In any case I find it most surprising that this topic came up yet again on this BB, when it was thoroughly discussed last year in this thread:

https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1308&L=ccp4bb&D=0&P=26939

I'm not sure why this kind of urban legend about the evilness of the mmCIF format keeps coming back to the list... As explained there and elsewhere endless times, the PDB format is inadequate to represent the complexity of macromolecules and has needed a replacement for a long time. The decision to move on to mmCIF has been made and in my opinion the sooner we move forward the better.

Cheers
Jose

On 05.10.2014 15:52, Frances C. Bernstein wrote:
> mmCIF is a very general format with tag-value pairs, and loops so that tags do not need to be repeated endlessly. It was designed so that there is the flexibility of defining new terms easily and presenting the data in any order and with any kind of spacing.
>
> I understand that there are 10+ files in cyberspace prepared by the PDB and that they all have the 'same' format. It is tempting to write software that treats these files as fixed format and hope that all software packages that generate coordinate files will use the same fixed format. But that loses the generality and flexibility of mmCIF, and software written that way will fail when some field requires more characters or a new field is added.
>
> There are software tools that allow one to read and extract data from any mmCIF file; using these is more complicated than using grep, but it ensures that one's software will not fail when it encounters a data file that is not exactly what the PDB is currently producing.
>
> Note that mmCIF was defined when the limitations of the fixed-format PDB format became apparent with large structures. Let's not repeat the mistakes of the past.
>
> Frances