Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Robbie Joosten
Hi Peter,

Thanks for the info. I'd better go check whether my code assumes insertion
codes are not digits.

Cheers,
Robbie 

> Date: Wed, 5 Dec 2012 17:57:58 +
> From: pkel...@globalphasing.com
> Subject: Re: [ccp4bb] thanks god for pdbset
> To: CCP4BB@JISCMAIL.AC.UK
> 
> Hi Robbie,
> 
> On Wed, 2012-12-05 at 17:02 +0100, Robbie Joosten wrote:
> > Hi Ian,
> > 
> > It's easy to forget about LINK records and such when dealing with the
> > coordinates (I recently had to fix a bug in my own code for that). 
> > The problem with insertion codes is that they are very poorly defined in the
> > PDB standard. Does 128A come before or after 128? There is no strict rule
> > for that, instead they are used in order of appearance. This makes it hard
> > for programmers to stick to agreed standards. Instead people rather ignore
> insertion codes altogether. They are really poorly supported by many
> > programs. Perhaps switching to mmCIF gets rid of the problem.
> 
> Properly used, the PDB exchange dictionary for mmCIF can indeed sort
> this out. In addition to the PDB-style residue number + insertion code,
> it has an item for the residue sequence number in the chain (running
> from 1 .. n). The relevant item names are:
> 
>   _atom_site.pdbx_PDB_residue_no
>   _atom_site.pdbx_PDB_ins_code
> 
> and:
>   _entity_poly_seq.num
> 
> One thing to be careful of is the case where the insertion code is a digit
> (which does happen sometimes). I have seen code many times where an
> assumption is made that the insertion code is not a digit, and this
> assumption is used to separate the residue number from the insertion
> code (e.g. a user is asked to enter a residue number + insertion code as
> a single item). If the insertion code is a digit, this won't work.
> 
> This is easy to handle in the fixed-width PDB format:
> 
>    85
>    851
>    852
>    86
> 
> but if it gets written to mmCIF incorrectly as:
> 
> loop_
> _atom_site.pdbx_PDB_residue_no
> _atom_site.pdbx_PDB_ins_code
>    85  .
>    851 .
>    852 .
>    86  .
> 
> instead of the correct:
> 
> loop_
> _atom_site.pdbx_PDB_residue_no
> _atom_site.pdbx_PDB_ins_code
>    85  .
>    85  1
>    85  2
>    86  .
> 
> it can be really hard to sort out later on.
> 
> Regards,
> Peter.
> 
> -- 
> Peter Keller Tel.: +44 (0)1223 353033
> Global Phasing Ltd., Fax.: +44 (0)1223 366889
> Sheraton House,
> Castle Park,
> Cambridge CB3 0AX
> United Kingdom
  

Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Robbie Joosten
Hi Ian,

The 'standard' you describe below is more of a suggestion than a rule. The
PDB does not enforce a numbering scheme which is particularly annoying when
dealing with engineered proteins with linkers or domains of different
proteins (they come with all sorts of numbering schemes). Of course, when
you use the ATOM records and distance criteria you should be able to work
out what is connected and where the gaps are. Unfortunately, this is not
always properly implemented in software (I had a nice recent case with a gap
in an insertion in a nucleic acid that caused problems working out the
connectivity). When dealing with ranges of residues, e.g. in TLS group
descriptions, numbering issues with (or without) insertion codes can be a
real pain because ranges can be somewhat ambiguous.
In theory, it is easy and insertion codes (or other numbering issues) should
not be a problem at all. In practice, as Ed pointed out, it is a big mess. 
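
The range ambiguity can be illustrated with a minimal Python sketch. It
assumes residues are held as (number, insertion code) pairs in file order;
the function names select_by_number and select_by_position are hypothetical
and not taken from any program mentioned in this thread.

```python
# Residues of one chain in file order: (residue number, insertion code).
residues = [(85, ""), (86, ""), (86, "A"), (86, "B"), (87, "")]


def select_by_number(residues, first, last):
    """Numeric range that ignores insertion codes (the ambiguous reading)."""
    return [r for r in residues if first <= r[0] <= last]


def select_by_position(residues, start, end):
    """Range defined by two full residue identifiers and file order."""
    i = residues.index(start)
    j = residues.index(end)
    if i > j:
        raise ValueError("range start comes after range end in the file")
    return residues[i:j + 1]


# "85-86" read numerically silently pulls in the inserted residues ...
print(select_by_number(residues, 85, 86))
# ... while a range given by explicit end points stops at 86 itself.
print(select_by_position(residues, (85, ""), (86, "")))
```

The second form only works if both end points are given with their
insertion codes, which is exactly the information a bare "85-86" lacks.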

Cheers,
Robbie 

> -----Original Message-----
> From: Ian Tickle [mailto:ianj...@gmail.com]
> Sent: Wednesday, December 05, 2012 17:26
> To: Robbie Joosten
> Cc: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] thanks god for pdbset
> 
> I had always assumed that ASCII sort order was the standard so ' 128A'
> comes after ' 128 ' in the collating sequence, and indeed the PDB
> documentation seems to make it clear that it comes after, e.g. in the
> section describing the ATOM record:
> 
>  REFERENCE PROTEIN NUMBERING    HOMOLOGOUS PROTEIN NUMBERING
>  ---------------------------    ----------------------------
>                           59                              59
>                           60                              60
>                           61
>                           62                              62
> 
>  REFERENCE PROTEIN NUMBERING    HOMOLOGOUS PROTEIN NUMBERING
>  ---------------------------    ----------------------------
>                           85                              85
>                           86                              86
>                                                           86A
>                                                           86B
>                           87                              87
> 
> But does it actually matter if the insertion comes before?  Surely the
> sequence is completely defined by the file order, regardless of the
> residue numbering, not by the alphanumeric sorting order?  So if 86A comes
> immediately before 86 in the file then you must assume that 86A C is
> linked to 86 N (assuming of course that the bond length is sensible), if
> after then it's 86 C to 86A N.
> 
> Cheers
> 
> -- Ian


Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Ed Pozharski
On Wed, 2012-12-05 at 17:02 +0100, Robbie Joosten wrote:
> Does 128A come before or after 128? 

Robbie,

shouldn't it simply depend on which residue record comes first in the
pdb file?

Ed.

-- 
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Peter Keller
Hi Robbie,

On Wed, 2012-12-05 at 17:02 +0100, Robbie Joosten wrote:
> Hi Ian,
> 
> It's easy to forget about LINK records and such when dealing with the
> coordinates (I recently had to fix a bug in my own code for that). 
> The problem with insertion codes is that they are very poorly defined in the
> PDB standard. Does 128A come before or after 128? There is no strict rule
> for that, instead they are used in order of appearance. This makes it hard
> for programmers to stick to agreed standards. Instead people rather ignore
> insertion codes altogether. They are really poorly supported by many
> programs. Perhaps switching to mmCIF gets rid of the problem.

Properly used, the PDB exchange dictionary for mmCIF can indeed sort
this out. In addition to the PDB-style residue number + insertion code,
it has an item for the residue sequence number in the chain (running
from 1 .. n). The relevant item names are:

  _atom_site.pdbx_PDB_residue_no
  _atom_site.pdbx_PDB_ins_code

and:
  _entity_poly_seq.num

One thing to be careful of is the case where the insertion code is a digit
(which does happen sometimes). I have seen code many times where an
assumption is made that the insertion code is not a digit, and this
assumption is used to separate the residue number from the insertion
code (e.g. a user is asked to enter a residue number + insertion code as
a single item). If the insertion code is a digit, this won't work.

This is easy to handle in the fixed-width PDB format:

   85
   851
   852
   86

but if it gets written to mmCIF incorrectly as:

loop_
_atom_site.pdbx_PDB_residue_no
_atom_site.pdbx_PDB_ins_code
   85  .
   851 .
   852 .
   86  .

instead of the correct:

loop_
_atom_site.pdbx_PDB_residue_no
_atom_site.pdbx_PDB_ins_code
   85  .
   85  1
   85  2
   86  .

it can be really hard to sort out later on.
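
A minimal Python sketch of the point above, assuming the standard PDB
column layout for ATOM/HETATM records (residue number in columns 23-26,
insertion code in column 27). The function names and the example line are
made up for illustration.

```python
def split_residue_id(atom_line):
    """Read residue number and insertion code from their fixed columns."""
    res_seq = int(atom_line[22:26])       # columns 23-26: residue number
    i_code = atom_line[26:27].strip()     # column 27: may legally be a digit
    return res_seq, i_code


def naive_split(token):
    """Fragile: splits '86A' by assuming the insertion code is a letter."""
    digits = token.rstrip("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    return int(digits), token[len(digits):]


# Hypothetical CA atom of residue 85 with insertion code '1' in chain A.
line = "ATOM    100  CA  GLY A  851"

print(split_residue_id(line))   # column-based: number 85, insertion code '1'
print(naive_split("86A"))       # works only because 'A' is a letter
print(naive_split("851"))       # misread as residue 851 with no insertion code
```

Once the two fields have been concatenated into a single token, the
digit case cannot be recovered without extra information, which is why
keeping them as separate items matters.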

Regards,
Peter.

-- 
Peter Keller Tel.: +44 (0)1223 353033
Global Phasing Ltd., Fax.: +44 (0)1223 366889
Sheraton House,
Castle Park,
Cambridge CB3 0AX
United Kingdom


Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Ian Tickle
I had always assumed that ASCII sort order was the standard so ' 128A'
comes after ' 128 ' in the collating sequence, and indeed the PDB
documentation seems to make it clear that it comes after, e.g. in the
section describing the ATOM record:


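 REFERENCE PROTEIN NUMBERING    HOMOLOGOUS PROTEIN NUMBERING
 ---------------------------    ----------------------------
                          59                              59
                          60                              60
                          61
                          62                              62

 REFERENCE PROTEIN NUMBERING    HOMOLOGOUS PROTEIN NUMBERING
 ---------------------------    ----------------------------
                          85                              85
                          86                              86
                                                          86A
                                                          86B
                          87                              87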


But does it actually matter if the insertion comes before?  Surely the
sequence is completely defined by the file order, regardless of the residue
numbering, not by the alphanumeric sorting order?  So if 86A comes
immediately before 86 in the file then you must assume that 86A C is linked
to 86 N (assuming of course that the bond length is sensible), if after
then it's 86 C to 86A N.
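
As a rough Python sketch of "file order defines the sequence": the code
below collects residue identifiers in order of first appearance and
compares that with a plain sort. It assumes the standard ATOM column
layout (chain in column 22, residue number in 23-26, insertion code in
27); make_atom and residues_in_file_order are hypothetical helper names,
and no bond-length check is attempted here.

```python
def make_atom(serial, chain, res_seq, i_code=" "):
    """Build the first 27 columns of a CA ATOM record (illustration only)."""
    return f"ATOM  {serial:5d}  CA  GLY {chain}{res_seq:4d}{i_code}"


def residues_in_file_order(pdb_lines):
    """Return (chain, number, insertion code) keys in order of appearance."""
    seen = set()
    order = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            key = (line[21], int(line[22:26]), line[26:27])
            if key not in seen:
                seen.add(key)
                order.append(key)
    return order


# A file in which 86A is placed *before* 86.
lines = [
    make_atom(1, "A", 85),
    make_atom(2, "A", 86, "A"),
    make_atom(3, "A", 86),
    make_atom(4, "A", 87),
]

file_order = residues_in_file_order(lines)
print(file_order)            # 86A precedes 86, as written in the file
print(sorted(file_order))    # sorting would swap them (' ' sorts before 'A')
```

A program that sorts residues before working out connectivity would
therefore link the wrong neighbours for this file.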

Cheers

-- Ian


On 5 December 2012 16:02, Robbie Joosten wrote:

> Hi Ian,
>
> It's easy to forget about LINK records and such when dealing with the
> coordinates (I recently had to fix a bug in my own code for that).
> The problem with insertion codes is that they are very poorly defined in
> the
> PDB standard. Does 128A come before or after 128? There is no strict rule
> for that, instead they are used in order of appearance. This makes it hard
> for programmers to stick to agreed standards. Instead people rather ignore
> insertion codes altogether. They are really poorly supported by many
> programs. Perhaps switching to mmCIF gets rid of the problem.
>
> Cheers,
> Robbie
>
> > -----Original Message-----
> > From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
> > Ian Tickle
> > Sent: Wednesday, December 05, 2012 16:39
> > To: CCP4BB@JISCMAIL.AC.UK
> > Subject: Re: [ccp4bb] thanks god for pdbset
> >
> > The last time I tried the pdbset renumber command because of issues with
> > insertion codes in certain programs, it failed to also renumber the LINK,
> > SSBOND & CISPEP records.  Needless to say, thanking god (or even God) was
> > not my first thought! (more along the lines of "why can't software
> > developers stick to the agreed standards?").
> >
> > I haven't tried it with the latest version, maybe it's fixed now.
> >
> > -- Ian
> >
> >
> >
> > On 5 December 2012 07:58, Francois Berenger wrote:
> >
> >
> >   Especially the renumber command that changes
> >   residue insertion codes into an increment of
> >   the impacted residue numbers.
> >
> >   Regards,
> >   F.
> >
> >
>


Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Robbie Joosten
Hi Ian,

It's easy to forget about LINK records and such when dealing with the
coordinates (I recently had to fix a bug in my own code for that). 
The problem with insertion codes is that they are very poorly defined in the
PDB standard. Does 128A come before or after 128? There is no strict rule
for that; instead, they are used in order of appearance. This makes it hard
for programmers to stick to agreed standards. Instead, people tend to ignore
insertion codes altogether. They are really poorly supported by many
programs. Perhaps switching to mmCIF gets rid of the problem.

Cheers,
Robbie

> -----Original Message-----
> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
> Ian Tickle
> Sent: Wednesday, December 05, 2012 16:39
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] thanks god for pdbset
> 
> The last time I tried the pdbset renumber command because of issues with
> insertion codes in certain programs, it failed to also renumber the LINK,
> SSBOND & CISPEP records.  Needless to say, thanking god (or even God) was
> not my first thought! (more along the lines of "why can't software
> developers stick to the agreed standards?").
> 
> I haven't tried it with the latest version, maybe it's fixed now.
> 
> -- Ian
> 
> 
> 
> On 5 December 2012 07:58, Francois Berenger wrote:
> 
> 
>   Especially the renumber command that changes
>   residue insertion codes into an increment of
>   the impacted residue numbers.
> 
>   Regards,
>   F.
> 
> 


Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Ian Tickle
The last time I tried the pdbset renumber command because of issues with
insertion codes in certain programs, it failed to also renumber the LINK,
SSBOND & CISPEP records.  Needless to say, thanking god (or even God) was
not my first thought! (more along the lines of "why can't software
developers stick to the agreed standards?").

I haven't tried it with the latest version, maybe it's fixed now.
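
To make the requirement concrete, here is a minimal Python sketch of a
renumbering pass that also rewrites LINK, SSBOND and CISPEP. It is not
pdbset's algorithm: it simply renumbers each chain from 1 in file order
and drops insertion codes, and it assumes the standard PDB column
positions for the residue fields of each record type. RESIDUE_FIELDS and
renumber are hypothetical names.

```python
# 0-based (chain column, residue-number start) for every residue reference
# in a record; the insertion code sits right after the 4-column number.
RESIDUE_FIELDS = {
    "ATOM  ": [(21, 22)],
    "HETATM": [(21, 22)],
    "ANISOU": [(21, 22)],
    "LINK  ": [(21, 22), (51, 52)],
    "SSBOND": [(15, 17), (29, 31)],
    "CISPEP": [(15, 17), (29, 31)],
}


def renumber(lines):
    """Renumber residues per chain from 1 and update cross-references."""
    mapping, next_number = {}, {}
    # Pass 1: assign new numbers in order of first appearance.
    for line in lines:
        if line[:6] in ("ATOM  ", "HETATM"):
            chain, old_id = line[21], line[22:27]
            if (chain, old_id) not in mapping:
                next_number[chain] = next_number.get(chain, 0) + 1
                mapping[(chain, old_id)] = next_number[chain]
    # Pass 2: rewrite every residue field, in every record type that has one.
    renumbered = []
    for line in lines:
        for chain_at, seq_at in RESIDUE_FIELDS.get(line[:6], []):
            line = line.ljust(seq_at + 5)
            key = (line[chain_at], line[seq_at:seq_at + 5])
            if key in mapping:
                new_id = f"{mapping[key]:4d} "   # blank insertion code
                line = line[:seq_at] + new_id + line[seq_at + 5:]
        renumbered.append(line)
    return renumbered
```

The design point is that a single old-to-new mapping drives every record
type, so coordinates and connectivity annotations cannot drift apart.
Other records that carry residue numbers (HELIX, SHEET, SITE, TER and so
on) would need entries in the same table.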

-- Ian


On 5 December 2012 07:58, Francois Berenger wrote:

> Especially the renumber command that changes
> residue insertion codes into an increment of
> the impacted residue numbers.
>
> Regards,
> F.
>


Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Phil Evans
not god & I don't think I wrote that bit!

Phil

On 5 Dec 2012, at 15:06, Ed Pozharski wrote:

> Francois,
> 
> I did not realize Phil Evans is god (perhaps a minor one as he did not
> yet earn a capital G).
> 
> I do concur that insertion code is evil.  I had to re-refine an old
> antibody structure recently and it messes up coot sequence window and
> breaks refmac bond restraints.  Evil, evil, evil.
> 
> Cheers,
> 
> Ed.
> 
> On Wed, 2012-12-05 at 16:58 +0900, Francois Berenger wrote:
>> Especially the renumber command that changes
>> residue insertion codes into an increment of
>> the impacted residue numbers.
>> 
>> Regards,
>> F.
> 
> -- 
> "I'd jump in myself, if I weren't so good at whistling."
>   Julian, King of Lemurs


Re: [ccp4bb] thanks god for pdbset

2012-12-05 Thread Ed Pozharski
Francois,

I did not realize Phil Evans is god (perhaps a minor one as he did not
yet earn a capital G).

I do concur that insertion code is evil.  I had to re-refine an old
antibody structure recently and it messes up coot sequence window and
breaks refmac bond restraints.  Evil, evil, evil.

Cheers,

Ed.

On Wed, 2012-12-05 at 16:58 +0900, Francois Berenger wrote:
> Especially the renumber command that changes
> residue insertion codes into an increment of
> the impacted residue numbers.
> 
> Regards,
> F.

-- 
"I'd jump in myself, if I weren't so good at whistling."
   Julian, King of Lemurs


[ccp4bb] thanks god for pdbset

2012-12-04 Thread Francois Berenger

Especially the renumber command that changes
residue insertion codes into an increment of
the impacted residue numbers.

Regards,
F.