Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Ed Pozharski
Francois,

I did not realize Phil Evans is god (perhaps a minor one as he did not
yet earn a capital G).

I do concur that insertion code is evil.  I had to re-refine an old
antibody structure recently and it messes up coot sequence window and
breaks refmac bond restraints.  Evil, evil, evil.

Cheers,

Ed.

On Wed, 2012-12-05 at 16:58 +0900, Francois Berenger wrote:
 Especially the renumber command that changes
 residue insertion codes into an increment of
 the impacted residue numbers.
 
 Regards,
 F.

-- 
I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs


Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Phil Evans
Not god! I don't think I wrote that bit!

Phil



Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Ian Tickle
The last time I tried the pdbset renumber command, because of issues with
insertion codes in certain programs, it failed to also renumber the LINK,
SSBOND and CISPEP records.  Needless to say, thanking god (or even God) was
not my first thought! (It was more along the lines of "why can't software
developers stick to the agreed standards?")

I haven't tried it with the latest version, maybe it's fixed now.
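
For illustration only, here is a minimal Python sketch of a renumbering pass
that carries the same mapping over to LINK, SSBOND and CISPEP records. It is
not pdbset's algorithm: the function names and the RESIDUE_FIELDS table are
hypothetical, residues are simply numbered consecutively per chain in file
order, and the column positions are taken from the fixed-width PDB format
(written 0-based below).

```python
# Hypothetical sketch, not the pdbset implementation.
# record name -> list of (chain column, first resSeq column), 0-based.
# The resSeq field is 4 columns wide and is followed by the 1-column iCode.
RESIDUE_FIELDS = {
    "ATOM  ": [(21, 22)],
    "HETATM": [(21, 22)],
    "ANISOU": [(21, 22)],
    "TER   ": [(21, 22)],
    "LINK  ": [(21, 22), (51, 52)],
    "SSBOND": [(15, 17), (29, 31)],
    "CISPEP": [(15, 17), (29, 31)],
}


def residue_key(line, chain_col, seq_start):
    """Return (chain, resSeq text, iCode) read from fixed columns."""
    return (line[chain_col], line[seq_start:seq_start + 4], line[seq_start + 4])


def renumber(lines, start=1):
    """Renumber residues per chain in file order and return the new lines."""
    padded = [ln.rstrip("\n").ljust(80) for ln in lines]

    # Pass 1: assign new numbers in order of first appearance in ATOM/HETATM.
    mapping = {}
    last_number = {}
    for ln in padded:
        if ln[:6] in ("ATOM  ", "HETATM"):
            key = residue_key(ln, 21, 22)
            if key not in mapping:
                chain = key[0]
                last_number[chain] = last_number.get(chain, start - 1) + 1
                mapping[key] = last_number[chain]

    # Pass 2: rewrite every residue reference for which a mapping exists.
    out = []
    for ln in padded:
        for chain_col, seq_start in RESIDUE_FIELDS.get(ln[:6], []):
            key = residue_key(ln, chain_col, seq_start)
            if key in mapping:
                new = f"{mapping[key]:>4d} "  # 4-column number, blank iCode
                ln = ln[:seq_start] + new + ln[seq_start + 5:]
        out.append(ln.rstrip())
    return out
```

Calling renumber(open("in.pdb").readlines()) would return the rewritten
lines. The sketch ignores residue numbers above 9999 and any other record
type that refers to residues (for example SITE or HELIX), which a real tool
would also need to handle.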

-- Ian





Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Robbie Joosten
Hi Ian,

It's easy to forget about LINK records and such when dealing with the
coordinates (I recently had to fix a bug in my own code for that).
The problem with insertion codes is that they are very poorly defined in the
PDB standard. Does 128A come before or after 128? There is no strict rule
for that; instead they are used in order of appearance. This makes it hard
for programmers to stick to agreed standards. Instead, people tend to ignore
insertion codes altogether. They are really poorly supported by many
programs. Perhaps switching to mmCIF gets rid of the problem.

Cheers,
Robbie



Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Ian Tickle
I had always assumed that ASCII sort order was the standard so ' 128A'
comes after ' 128 ' in the collating sequence, and indeed the PDB
documentation seems to make it clear that it comes after, e.g. in the
section describing the ATOM record:


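 REFERENCE PROTEIN NUMBERING    HOMOLOGOUS PROTEIN NUMBERING
 ---------------------------    ----------------------------
             59                              59
             60                              60
             61
             62                              62

 REFERENCE PROTEIN NUMBERING    HOMOLOGOUS PROTEIN NUMBERING
 ---------------------------    ----------------------------
             85                              85
             86                              86
                                             86A
                                             86B
             87                              87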


But does it actually matter if the insertion comes before?  Surely the
sequence is completely defined by the file order, regardless of the residue
numbering, not by the alphanumeric sorting order?  So if 86A comes
immediately before 86 in the file then you must assume that 86A C is linked
to 86 N (assuming of course that the bond length is sensible), if after
then it's 86 C to 86A N.
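
As a rough illustration of the file-order argument, the Python sketch below
groups ATOM records into residues in the order they appear and links
consecutive residues only when the C-N distance is plausible. The function
names and the 1.7 A cutoff are assumptions made for this example, not taken
from any particular program. (On the collating question, the Python
expression " 128 " < " 128A" is True, since a space sorts before 'A' in
ASCII.)

```python
import math


def residues_in_file_order(lines):
    """Group ATOM records into residues, preserving file order."""
    residues = []  # list of (residue id, {atom name: (x, y, z)})
    for ln in lines:
        if not ln.startswith("ATOM"):
            continue
        res_id = (ln[21], ln[22:27])  # chain, resSeq + iCode as 5-char text
        xyz = (float(ln[30:38]), float(ln[38:46]), float(ln[46:54]))
        if not residues or residues[-1][0] != res_id:
            residues.append((res_id, {}))
        residues[-1][1][ln[12:16].strip()] = xyz
    return residues


def peptide_links(residues, max_cn=1.7):
    """Yield (id_a, id_b) for consecutive residues with a sensible C-N bond.

    max_cn is an assumed, fairly generous cutoff in Angstrom.
    """
    for (id_a, atoms_a), (id_b, atoms_b) in zip(residues, residues[1:]):
        if id_a[0] != id_b[0] or "C" not in atoms_a or "N" not in atoms_b:
            continue
        if math.dist(atoms_a["C"], atoms_b["N"]) <= max_cn:
            yield id_a, id_b
```

With this approach a file in which 86A precedes 86 yields the link
86A C to 86 N, and the reverse file order yields 86 C to 86A N; the residue
labels are never sorted.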

Cheers

-- Ian





Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Peter Keller
Hi Robbie,

On Wed, 2012-12-05 at 17:02 +0100, Robbie Joosten wrote:
 Hi Ian,
 
 It's easy to forget about LINK records and such when dealing with the
 coordinates (I recently had to fix a bug in my own code for that). 
 The problem with insertion codes is that they are very poorly defined in the
 PDB standard. Does 128A come before or after 128? There is no strict rule
 for that, instead they are used in order of appearance. This makes it hard
 for programmers to stick to agreed standards. Instead people rather ignore
 insertion codes altogether. They are really poorly supported by many
 programs. Perhaps switching to mmCIF gets rid of the problem.

Properly used, the PDB exchange dictionary for mmCIF can indeed sort
this out. In addition to the PDB-style residue number + insertion code,
it has an item for the residue sequence number in the chain (running
from 1 .. n). The relevant item names are:

  _atom_site.pdbx_PDB_residue_no
  _atom_site.pdbx_PDB_ins_code

and:
  _entity_poly_seq.num

One thing to be careful of is the case where the insertion code is a digit
(which does happen sometimes). I have seen code many times where an
assumption is made that the insertion code is not a digit, and this
assumption is used to separate the residue number from the insertion
code (e.g. a user is asked to enter a residue number + insertion code as
a single item). If the insertion code is a digit, this won't work.

This is easy to handle in the fixed-width PDB format:

   85
   851
   852
   86

but if it gets written to mmCIF incorrectly as:

loop_
_atom_site.pdbx_PDB_residue_no
_atom_site.pdbx_PDB_ins_code
   85  .
   851 .
   852 .
   86  .

instead of the correct:

loop_
_atom_site.pdbx_PDB_residue_no
_atom_site.pdbx_PDB_ins_code
   85  .
   85  1
   85  2
   86  .

it can be really hard to sort out later on.
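
A minimal Python sketch of the position-based split described above: in the
fixed-width PDB format the residue number is in columns 23-26 and the
insertion code in column 27, so no guess about "digit or letter" is needed.
The function names are hypothetical and the row format simply mirrors the
two-column loop shown above.

```python
def split_residue_field(atom_line):
    """Return (residue number, insertion code) from a PDB ATOM/HETATM line.

    The split is purely positional (columns 23-26 and column 27), so it
    still works when the insertion code is a digit.
    """
    res_no = int(atom_line[22:26])
    ins_code = atom_line[26].strip()
    return res_no, ins_code


def mmcif_rows(atom_lines):
    """Format one 'residue_no ins_code' row per residue, in file order."""
    rows, seen = [], set()
    for ln in atom_lines:
        key = (ln[21], ln[22:27])  # chain, resSeq + iCode
        if key in seen:
            continue
        seen.add(key)
        res_no, ins_code = split_residue_field(ln)
        # '.' marks an absent insertion code, as in the example above.
        rows.append(f"{res_no:<4d}{ins_code or '.'}")
    return rows
```

A parser that instead reads columns 23-27 as one token and peels off a
trailing non-digit would turn "  851" into residue 851 with no insertion
code, which is exactly the incorrect mmCIF output shown above.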

Regards,
Peter.

-- 
Peter Keller Tel.: +44 (0)1223 353033
Global Phasing Ltd., Fax.: +44 (0)1223 366889
Sheraton House,
Castle Park,
Cambridge CB3 0AX
United Kingdom


Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Ed Pozharski
On Wed, 2012-12-05 at 17:02 +0100, Robbie Joosten wrote:
 Does 128A come before or after 128? 

Robbie,

shouldn't it simply depend on which residue record comes first in the
pdb file?

Ed.

-- 
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Robbie Joosten
Hi Ian,

The 'standard' you describe is more of a suggestion than a rule. The
PDB does not enforce a numbering scheme, which is particularly annoying when
dealing with engineered proteins with linkers or domains of different
proteins (they come with all sorts of numbering schemes). Of course, when
you use the ATOM records and distance criteria you should be able to work
out what is connected and where the gaps are. Unfortunately, this is not
always properly implemented in software (I had a nice recent case with a gap
in an insertion in a nucleic acid that caused problems working out the
connectivity). When dealing with ranges of residues, e.g. in TLS group
descriptions, numbering issues with (or without) insertion codes can be a
real pain because ranges can be somewhat ambiguous.
In theory, it is easy and insertion codes (or other numbering issues) should
not be a problem at all. In practice, as Ed pointed out, it is a big mess.
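
To make the range ambiguity concrete, here is a small Python sketch that
resolves a residue range by position in the file rather than by comparing
residue numbers. The function name and the (number, insertion code) tuples
are assumptions for this example; it does not reflect how any particular
refinement program parses TLS group definitions.

```python
def select_range(residue_ids, first, last):
    """Return the residues from `first` to `last` inclusive, by file position.

    residue_ids is the list of (residue number, insertion code) tuples of
    one chain, in the order the residues appear in the file.
    """
    i = residue_ids.index(first)
    j = residue_ids.index(last)
    if j < i:
        raise ValueError("range end appears before range start in the file")
    return residue_ids[i:j + 1]


# Residues of one chain in file order, with two inserted residues.
chain = [(85, ""), (86, ""), (86, "A"), (86, "B"), (87, "")]

# By position, "86 to 86" is a single residue ...
by_position = select_range(chain, (86, ""), (86, ""))
# ... whereas a purely numeric test also picks up 86A and 86B.
by_number = [r for r in chain if 86 <= r[0] <= 86]
```

The two selections differ for the same nominal range, which is the kind of
ambiguity that has to be settled by convention in each program.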

Cheers,
Robbie 



Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Robbie Joosten
Hi Peter,

Thanks for the info. I'd better go check whether my code assumes insertion
codes are not digits.

Cheers,
Robbie 
