Re: [ccp4bb] thanks god for pdbset
Hi Peter,

Thanks for the info. I'd better go check whether my code assumes insertion codes are not digits.

Cheers,
Robbie

> Date: Wed, 5 Dec 2012 17:57:58 +
> From: pkel...@globalphasing.com
> Subject: Re: [ccp4bb] thanks god for pdbset
> To: CCP4BB@JISCMAIL.AC.UK
>
> Hi Robbie,
>
> On Wed, 2012-12-05 at 17:02 +0100, Robbie Joosten wrote:
> > Hi Ian,
> >
> > It's easy to forget about LINK records and such when dealing with the
> > coordinates (I recently had to fix a bug in my own code for that).
> > The problem with insertion codes is that they are very poorly defined in the
> > PDB standard. Does 128A come before or after 128? There is no strict rule
> > for that; instead they are used in order of appearance. This makes it hard
> > for programmers to stick to agreed standards. Instead people rather ignore
> > insertion codes altogether. They are really poorly supported by many
> > programs. Perhaps switching to mmCIF gets rid of the problem.
>
> Properly used, the PDB exchange dictionary for mmCIF can indeed sort
> this out. In addition to the PDB-style residue number + insertion code,
> it has an item for the residue sequence number in the chain (running
> from 1 .. n). The relevant item names are:
>
>     _atom_site.pdbx_PDB_residue_no
>     _atom_site.pdbx_PDB_ins_code
>
> and:
>
>     _entity_poly_seq.num
>
> One thing to be careful of is cases where the insertion code is a digit
> (which does happen sometimes). I have seen code many times where an
> assumption is made that the insertion code is not a digit, and this
> assumption is used to separate the residue number from the insertion
> code (e.g. a user is asked to enter a residue number + insertion code as
> a single item). If the insertion code is a digit, this won't work.
>
> This is easy to handle in the fixed-width PDB format:
>
>     85
>     851
>     852
>     86
>
> but if it gets written to mmCIF incorrectly as:
>
>     loop_
>     _atom_site.pdbx_PDB_residue_no
>     _atom_site.pdbx_PDB_ins_code
>     85 .
>     851 .
>     852 .
>     86 .
>
> instead of the correct:
>
>     loop_
>     _atom_site.pdbx_PDB_residue_no
>     _atom_site.pdbx_PDB_ins_code
>     85 .
>     85 1
>     85 2
>     86 .
>
> it can be really hard to sort out later on.
>
> Regards,
> Peter.
>
> --
> Peter Keller                     Tel.: +44 (0)1223 353033
> Global Phasing Ltd.,             Fax.: +44 (0)1223 366889
> Sheraton House,
> Castle Park,
> Cambridge CB3 0AX
> United Kingdom
Re: [ccp4bb] thanks god for pdbset
Hi Ian,

The 'standard' you describe below is more of a suggestion than a rule. The PDB does not enforce a numbering scheme, which is particularly annoying when dealing with engineered proteins with linkers or domains of different proteins (they come with all sorts of numbering schemes).

Of course, when you use the ATOM records and distance criteria you should be able to work out what is connected and where the gaps are. Unfortunately, this is not always properly implemented in software (I had a nice recent case with a gap in an insertion in a nucleic acid that caused problems working out the connectivity). When dealing with ranges of residues, e.g. in TLS group descriptions, numbering issues with (or without) insertion codes can be a real pain because ranges can be somewhat ambiguous.

In theory, it is easy and insertion codes (or other numbering issues) should not be a problem at all. In practice, as Ed pointed out, it is a big mess.

Cheers,
Robbie

> -----Original Message-----
> From: Ian Tickle [mailto:ianj...@gmail.com]
> Sent: Wednesday, December 05, 2012 17:26
> To: Robbie Joosten
> Cc: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] thanks god for pdbset
>
> I had always assumed that ASCII sort order was the standard so ' 128A' comes
> after ' 128 ' in the collating sequence, and indeed the PDB documentation
> seems to make it clear that it comes after, e.g. in the section describing the
> ATOM record:
>
>     REFERENCE PROTEIN NUMBERING    HOMOLOGOUS PROTEIN NUMBERING
>     ---------------------------    ----------------------------
>     59                             59
>     60                             60
>     61
>     62                             62
>
>     REFERENCE PROTEIN NUMBERING    HOMOLOGOUS PROTEIN NUMBERING
>     ---------------------------    ----------------------------
>     85                             85
>     86                             86
>                                    86A
>                                    86B
>     87                             87
>
> But does it actually matter if the insertion comes before? Surely the
> sequence is completely defined by the file order, regardless of the residue
> numbering, not by the alphanumeric sorting order? So if 86A comes
> immediately before 86 in the file then you must assume that 86A C is linked
> to 86 N (assuming of course that the bond length is sensible), if after then it's
> 86 C to 86A N.
>
> Cheers
>
> -- Ian
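To make the point about ambiguous residue ranges concrete, here is a minimal Python sketch. It is an illustration only, not how any particular refinement program reads TLS groups: the function names are hypothetical, and residues are represented as simple (number, insertion code) tuples taken in file order. It contrasts a range resolved on residue numbers alone with the same range resolved by position in the file.

```python
# Residues as (residue number, insertion code), in the order they appear in the file.
# The numbering follows the 85, 86, 86A, 86B, 87 example from the PDB documentation.
FILE_ORDER = [(85, ""), (86, ""), (86, "A"), (86, "B"), (87, "")]


def range_by_number(residues, first, last):
    """Select residues whose number lies in [first, last], ignoring insertion codes."""
    return [res for res in residues if first <= res[0] <= last]


def range_by_file_order(residues, start, end):
    """Select residues from start to end (inclusive) by their position in the file."""
    i = residues.index(start)
    j = residues.index(end)
    if i > j:
        raise ValueError("range end appears before range start in the file")
    return residues[i:j + 1]


# "85 to 86" on numbers alone silently pulls in 86A and 86B:
print(range_by_number(FILE_ORDER, 85, 86))
# The same range by file order stops at residue 86 itself:
print(range_by_file_order(FILE_ORDER, (85, ""), (86, "")))
```

The two selections differ by two residues, which is the kind of ambiguity a range description has to resolve one way or the other.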
Re: [ccp4bb] thanks god for pdbset
On Wed, 2012-12-05 at 17:02 +0100, Robbie Joosten wrote:
> Does 128A come before or after 128?

Robbie,

shouldn't it simply depend on which residue record comes first in the pdb file?

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
                                                Julian, King of Lemurs
Re: [ccp4bb] thanks god for pdbset
Hi Robbie,

On Wed, 2012-12-05 at 17:02 +0100, Robbie Joosten wrote:
> Hi Ian,
>
> It's easy to forget about LINK records and such when dealing with the
> coordinates (I recently had to fix a bug in my own code for that).
> The problem with insertion codes is that they are very poorly defined in the
> PDB standard. Does 128A come before or after 128? There is no strict rule
> for that; instead they are used in order of appearance. This makes it hard
> for programmers to stick to agreed standards. Instead people rather ignore
> insertion codes altogether. They are really poorly supported by many
> programs. Perhaps switching to mmCIF gets rid of the problem.

Properly used, the PDB exchange dictionary for mmCIF can indeed sort this out. In addition to the PDB-style residue number + insertion code, it has an item for the residue sequence number in the chain (running from 1 .. n). The relevant item names are:

    _atom_site.pdbx_PDB_residue_no
    _atom_site.pdbx_PDB_ins_code

and:

    _entity_poly_seq.num

One thing to be careful of is cases where the insertion code is a digit (which does happen sometimes). I have seen code many times where an assumption is made that the insertion code is not a digit, and this assumption is used to separate the residue number from the insertion code (e.g. a user is asked to enter a residue number + insertion code as a single item). If the insertion code is a digit, this won't work.

This is easy to handle in the fixed-width PDB format:

    85
    851
    852
    86

but if it gets written to mmCIF incorrectly as:

    loop_
    _atom_site.pdbx_PDB_residue_no
    _atom_site.pdbx_PDB_ins_code
    85 .
    851 .
    852 .
    86 .

instead of the correct:

    loop_
    _atom_site.pdbx_PDB_residue_no
    _atom_site.pdbx_PDB_ins_code
    85 .
    85 1
    85 2
    86 .

it can be really hard to sort out later on.

Regards,
Peter.

--
Peter Keller                     Tel.: +44 (0)1223 353033
Global Phasing Ltd.,             Fax.: +44 (0)1223 366889
Sheraton House,
Castle Park,
Cambridge CB3 0AX
United Kingdom
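As a sketch of why the fixed-width case is easy, the Python below reads the residue number and the insertion code from their own columns of an ATOM line (chain ID in column 22, residue number in columns 23-26, insertion code in column 27), so a digit insertion code is never confused with part of the number. The function names and the made-up CA atom are assumptions for illustration, and '.' is used for "no insertion code" as in the mmCIF fragments above.

```python
def parse_residue_id(atom_line):
    """Split an ATOM/HETATM line into (chain ID, residue number, insertion code).

    Only fixed columns are used, so no guess is made about whether the
    insertion code is a letter or a digit.
    """
    chain_id = atom_line[21]           # column 22
    res_seq = int(atom_line[22:26])    # columns 23-26
    ins_code = atom_line[26].strip()   # column 27, "" when blank
    return chain_id, res_seq, ins_code


def mmcif_fields(res_seq, ins_code):
    """Write the number and the insertion code as two separate mmCIF values."""
    return f"{res_seq} {ins_code or '.'}"


def make_atom_line(res_seq, ins_code):
    """Hypothetical helper: build a minimal ATOM line for a made-up CA atom."""
    return (f"ATOM  {1:5d}  CA  ALA A{res_seq:4d}{ins_code:1s}   "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


for number, code in [(85, ""), (85, "1"), (85, "2"), (86, "")]:
    chain, res_seq, ins_code = parse_residue_id(make_atom_line(number, code))
    print(mmcif_fields(res_seq, ins_code))
# prints:
# 85 .
# 85 1
# 85 2
# 86 .
```

Once the two values have been joined into a single token such as 851, there is no reliable way to tell residue 85 with insertion code 1 from residue 851, which is why the split has to happen at the column level.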
Re: [ccp4bb] thanks god for pdbset
I had always assumed that ASCII sort order was the standard so ' 128A' comes after ' 128 ' in the collating sequence, and indeed the PDB documentation seems to make it clear that it comes after, e.g. in the section describing the ATOM record:

    REFERENCE PROTEIN NUMBERING    HOMOLOGOUS PROTEIN NUMBERING
    ---------------------------    ----------------------------
    59                             59
    60                             60
    61
    62                             62

    REFERENCE PROTEIN NUMBERING    HOMOLOGOUS PROTEIN NUMBERING
    ---------------------------    ----------------------------
    85                             85
    86                             86
                                   86A
                                   86B
    87                             87

But does it actually matter if the insertion comes before? Surely the sequence is completely defined by the file order, regardless of the residue numbering, not by the alphanumeric sorting order? So if 86A comes immediately before 86 in the file then you must assume that 86A C is linked to 86 N (assuming of course that the bond length is sensible), if after then it's 86 C to 86A N.

Cheers

-- Ian

On 5 December 2012 16:02, Robbie Joosten wrote:
> Hi Ian,
>
> It's easy to forget about LINK records and such when dealing with the
> coordinates (I recently had to fix a bug in my own code for that).
> The problem with insertion codes is that they are very poorly defined in
> the PDB standard. Does 128A come before or after 128? There is no strict rule
> for that; instead they are used in order of appearance. This makes it hard
> for programmers to stick to agreed standards. Instead people rather ignore
> insertion codes altogether. They are really poorly supported by many
> programs. Perhaps switching to mmCIF gets rid of the problem.
>
> Cheers,
> Robbie
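The file-order argument can be sketched in a few lines of Python: take residues in the order they appear, and link each one to the next only if the C to N distance is plausible for a peptide bond. The data layout, the function name and the 1.5 Å cutoff are all assumptions for illustration (a peptide C-N bond is roughly 1.33 Å), and the coordinates are invented.

```python
import math

# Assumed cutoff in Angstrom; a peptide C-N bond is about 1.33 A, so this leaves some slack.
MAX_PEPTIDE_BOND = 1.5


def linked_pairs(residues):
    """Return (label, next_label) for consecutive residues joined by a C-N bond.

    residues is a list of (label, n_xyz, c_xyz) tuples in file order; the
    residue labels themselves play no part in deciding what is connected.
    """
    pairs = []
    for (label_a, _, c_a), (label_b, n_b, _) in zip(residues, residues[1:]):
        if math.dist(c_a, n_b) <= MAX_PEPTIDE_BOND:
            pairs.append((label_a, label_b))
    return pairs


# Invented coordinates: 86A precedes 86 in the file and is bonded to it,
# while 87 is too far from 86 to be connected (a gap).
residues = [
    ("86A", (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ("86", (2.33, 0.0, 0.0), (3.3, 0.0, 0.0)),
    ("87", (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)),
]
print(linked_pairs(residues))  # [('86A', '86')]
```

Here 86A C is linked to 86 N purely because of file order and geometry, with no sorting of the residue labels involved.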
Re: [ccp4bb] thanks god for pdbset
Hi Ian,

It's easy to forget about LINK records and such when dealing with the coordinates (I recently had to fix a bug in my own code for that).

The problem with insertion codes is that they are very poorly defined in the PDB standard. Does 128A come before or after 128? There is no strict rule for that; instead they are used in order of appearance. This makes it hard for programmers to stick to agreed standards. Instead people rather ignore insertion codes altogether. They are really poorly supported by many programs. Perhaps switching to mmCIF gets rid of the problem.

Cheers,
Robbie

> -----Original Message-----
> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
> Ian Tickle
> Sent: Wednesday, December 05, 2012 16:39
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] thanks god for pdbset
>
> The last time I tried the pdbset renumber command because of issues with
> insertion codes in certain programs, it failed to also renumber the LINK,
> SSBOND & CISPEP records. Needless to say, thanking god (or even God) was
> not my first thought! (more along the lines of "why can't software
> developers stick to the agreed standards?").
>
> I haven't tried it with the latest version, maybe it's fixed now.
>
> -- Ian
>
> On 5 December 2012 07:58, Francois Berenger wrote:
> > Especially the renumber command that changes
> > residue insertion codes into an increment of
> > the impacted residue numbers.
> >
> > Regards,
> > F.
Re: [ccp4bb] thanks god for pdbset
The last time I tried the pdbset renumber command because of issues with insertion codes in certain programs, it failed to also renumber the LINK, SSBOND & CISPEP records. Needless to say, thanking god (or even God) was not my first thought! (more along the lines of "why can't software developers stick to the agreed standards?").

I haven't tried it with the latest version, maybe it's fixed now.

-- Ian

On 5 December 2012 07:58, Francois Berenger wrote:
> Especially the renumber command that changes
> residue insertion codes into an increment of
> the impacted residue numbers.
>
> Regards,
> F.
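To illustrate what keeping the connectivity records in step would involve, here is a hedged Python sketch that applies an old-to-new residue numbering map to an SSBOND record. It is not pdbset's code. The function name and the mapping are hypothetical, and the column positions (residue numbers in columns 18-21 and 32-35, insertion codes in columns 22 and 36) are taken from the PDB format description as I understand it. LINK and CISPEP records would need the same treatment with their own column positions.

```python
def renumber_ssbond(line, mapping):
    """Rewrite both residue identifiers of an SSBOND record.

    mapping takes (old residue number, old insertion code) to a new residue
    number; insertion codes are blanked because the new numbering is plain.
    Chain IDs are left untouched in this sketch.
    """
    padded = line.rstrip("\n").ljust(36)
    chars = list(padded)
    # (slice holding the residue number, index of the insertion code), 0-based
    for seq_slice, icode_index in ((slice(17, 21), 21), (slice(31, 35), 35)):
        key = (int(padded[seq_slice]), padded[icode_index].strip())
        chars[seq_slice] = f"{mapping[key]:4d}"
        chars[icode_index] = " "
    return "".join(chars)


# A made-up disulphide between residue 86A and residue 120 of chain A,
# assembled piece by piece so the columns are easy to check.
ssbond = "SSBOND " + "  1" + " CYS A " + "  86" + "A" + "   CYS A " + " 120" + " "
new_numbers = {(86, "A"): 87, (120, ""): 122}   # hypothetical renumbering
print(renumber_ssbond(ssbond, new_numbers))
```

The same mapping that is used for the ATOM records has to be the one applied here; otherwise the SSBOND record ends up pointing at residues that no longer exist under those numbers.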
Re: [ccp4bb] thanks god for pdbset
not god & I don't think I wrote that bit!

Phil

On 5 Dec 2012, at 15:06, Ed Pozharski wrote:

> Francois,
>
> I did not realize Phil Evans is god (perhaps a minor one as he did not
> yet earn a capital G).
>
> I do concur that insertion code is evil. I had to re-refine an old
> antibody structure recently and it messes up coot sequence window and
> breaks refmac bond restraints. Evil, evil, evil.
>
> Cheers,
>
> Ed.
>
> On Wed, 2012-12-05 at 16:58 +0900, Francois Berenger wrote:
>> Especially the renumber command that changes
>> residue insertion codes into an increment of
>> the impacted residue numbers.
>>
>> Regards,
>> F.
>
> --
> "I'd jump in myself, if I weren't so good at whistling."
>                                Julian, King of Lemurs
Re: [ccp4bb] thanks god for pdbset
Francois,

I did not realize Phil Evans is god (perhaps a minor one as he did not yet earn a capital G).

I do concur that insertion code is evil. I had to re-refine an old antibody structure recently and it messes up coot sequence window and breaks refmac bond restraints. Evil, evil, evil.

Cheers,

Ed.

On Wed, 2012-12-05 at 16:58 +0900, Francois Berenger wrote:
> Especially the renumber command that changes
> residue insertion codes into an increment of
> the impacted residue numbers.
>
> Regards,
> F.

--
"I'd jump in myself, if I weren't so good at whistling."
                               Julian, King of Lemurs
[ccp4bb] thanks god for pdbset
Especially the renumber command that changes
residue insertion codes into an increment of
the impacted residue numbers.

Regards,
F.
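One way to picture "insertion codes into an increment of the impacted residue numbers" is the Python sketch below. It is an assumption about the general idea, not pdbset's actual algorithm: walking the residues in file order, every residue carrying an insertion code pushes the numbering up by one, and all later residues shift by the accumulated offset. The function name is hypothetical.

```python
def renumber_map(residue_ids):
    """Map (residue number, insertion code) to a plain number without insertion codes.

    residue_ids must be in file order. Each inserted residue adds one to a
    running offset, so existing gaps in the numbering are preserved.
    """
    mapping = {}
    offset = 0
    for res_seq, ins_code in residue_ids:
        key = (res_seq, ins_code)
        if key in mapping:      # the same residue seen again (e.g. once per atom)
            continue
        if ins_code:
            offset += 1
        mapping[key] = res_seq + offset
    return mapping


file_order = [(85, ""), (86, ""), (86, "A"), (86, "B"), (87, "")]
for old, new in renumber_map(file_order).items():
    print(old, "->", new)
# (85, '') -> 85
# (86, '') -> 86
# (86, 'A') -> 87
# (86, 'B') -> 88
# (87, '') -> 89
```

This sketch assumes an inserted residue follows its base residue in the file; a file where 86A comes before 86 would need a different rule to avoid assigning the same number twice.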