Re: [COOT] pdb file problem with duplicate amino acid

2017-02-01 Thread Dale Tronrud


On 2/1/2017 4:58 AM, Paul Emsley wrote:
> On 31/01/2017 20:09, Dale Tronrud wrote:
>> On 1/31/2017 11:51 AM, Paul Emsley wrote:
>>> On 31/01/17 17:54, Edwin Pozharski wrote:
>>>> Whatever the rationale was, there is a structure in the PDB that has
>>>> alternate conformer of a residue listed with different residue type -
>>>> A is arginine and B is glutamine.  Coot fails to load the model
>>>> complaining in the command window
>>>>
>>>> WARNING:: Error reading small-molecule cif "/home/epo/coot/foo.pdb"
>>>> There was an error reading /home/epo/coot/foo.pdb.
>>>> ERROR 42 READ: Duplicate sequence number and insertion code.
>>>>  LINE #1571
>>>>  ATOM   1666  N  BGLN B  93      24.448  28.340 -33.325  0.50  9.34           N
>>>>
>>>> No Spacegroup found for this PDB file
>>>> There was a coordinates read error
>>>>
>>>
>>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1503&L=COOT&F=&S=&P=25056
>>>
>>>
>>
>>I think this is a poor solution.
> 
> We agree, I think, that it is not a good solution.
> 
>>  Microheterogeneity is not a duplicate residue number.
> 
> Isn't it? My understanding is that using the same residue number for a
> different residue type *is* how microheterogeneity is specified.
> 
>>  Not any more so than the alternative
>> conformation that is also indicated with "alt loc" letters.
> 
> I don't follow this, but if you mean that to describe microheterogeneity
> by using an altloc is the Wrong Way, then I agree with you.

   I don't consider this a "duplicate residue number" because there is
only one residue being described.  At one time the PDB renamed
inhibitors bound to a protein to residue "1" despite the first residue
of the protein also being named "1".  Now that's a duplicate and it's
very confusing!

   With microheterogeneity there is only a single "slot" in the sequence
that is being described by the residue number.  It just happens that in
some of the unit cells that slot is occupied by, for example, an Ala
while the rest of the unit cells contain Ser.  This situation is not
much different from the one where the residue is always of type Ser but
is g+ in some unit cells and g- in the rest.  In both cases you have a
single location in the molecule with mutually exclusive models whose
occupancies cannot sum to more than one.  The "alt loc" indicators can
be used to specify either situation, and that is how they are used in
the PDB.

Here is an example from a Crambin model (1JXT).  Note that there are
three "alt loc"s, two rotamers of Ser and one of Pro.  I do not consider
this a "duplicate residue number" since it is a description of just one
place in the sequence of the protein.

ATOM    378  N  BSER A  22       4.886  12.647  -3.137  0.25  2.69           N
ATOM    379  N  CSER A  22       4.886  12.647  -3.137  0.20  2.69           N
ATOM    380  CA BSER A  22       6.014  13.445  -2.619  0.25  2.65           C
ATOM    381  CA CSER A  22       6.014  13.445  -2.619  0.20  2.65           C
ATOM    382  C  BSER A  22       6.335  13.134  -1.171  0.25  2.69           C
ATOM    383  C  CSER A  22       6.335  13.134  -1.171  0.20  2.69           C
ATOM    384  O  BSER A  22       5.447  12.947  -0.321  0.25  3.16           O
ATOM    385  O  CSER A  22       5.447  12.947  -0.321  0.20  3.16           O
ATOM    386  CB BSER A  22       5.771  14.977  -2.622  0.25  2.42           C
ATOM    387  CB CSER A  22       5.303  14.879  -2.638  0.20  1.48           C
ATOM    388  OG BSER A  22       4.801  15.169  -1.599  0.25  3.52           O
ATOM    389  OG CSER A  22       6.657  15.597  -2.270  0.20  4.90           O
ATOM    390  HB2BSER A  22       6.591  15.476  -2.342  0.25  9.94           H
ATOM    391  HB2CSER A  22       5.428  15.332  -3.776  0.20  4.73           H
ATOM    392  HB3BSER A  22       5.454  15.386  -3.472  0.25  5.08           H
ATOM    393  HB3CSER A  22       4.730  15.085  -2.351  0.20  1.05           H
ATOM    394  N  APRO A  22       4.886  12.647  -3.137  0.55  2.69           N
ATOM    395  CA APRO A  22       6.014  13.445  -2.619  0.55  2.65           C
ATOM    396  C  APRO A  22       6.335  13.134  -1.171  0.55  2.69           C
ATOM    397  O  APRO A  22       5.447  12.947  -0.321  0.55  3.16           O
ATOM    398  CB APRO A  22       5.553  14.873  -2.888  0.55  3.49           C
ATOM    399  CG APRO A  22       4.590  14.806  -4.078  0.55  3.51           C
ATOM    400  CD APRO A  22       3.870  13.470  -3.919  0.55  2.50           C
ATOM    401  HA APRO A  22       6.812  13.232  -3.136  0.55  1.43           H
ATOM    402  HB2APRO A  22       5.167  15.299  -2.198  0.55  3.89           H
ATOM    403  HB3APRO A  22       6.300  15.439  -3.105  0.55 13.74           H
ATOM    404  HG2APRO A  22       4.045  15.436  -4.032  0.55  3.61           H
ATOM    405  HG3APRO A  22       5.063  14.827  -4.928  0.55  3.64           H
ATOM    406  HD2APRO A  22       3.057  13.566  -3.405  0.55  5.81           H
ATOM    407  HD3APRO A  22       3.657  12.978  -4.727  0.55  8.33           H
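
The "single slot" view can be sketched in a few lines of Python. This is only an illustration, assuming fixed-column PDB ATOM records: atoms are grouped by (chain, residue number, insertion code), and each alt loc found in a slot is recorded with its residue name and occupancy. The names `altloc_summary` and `classify` are made up for this example and are not part of Coot or mmdb; the three records are the N atoms of residue 22 in 1JXT, laid out in standard columns.

```python
from collections import defaultdict

# N atoms of the three alt locs at residue 22 of 1JXT, in fixed PDB columns.
RECORDS = [
    "ATOM    378  N  BSER A  22       4.886  12.647  -3.137  0.25  2.69           N",
    "ATOM    379  N  CSER A  22       4.886  12.647  -3.137  0.20  2.69           N",
    "ATOM    394  N  APRO A  22       4.886  12.647  -3.137  0.55  2.69           N",
]


def altloc_summary(lines):
    """Map (chain, resSeq, iCode) -> {altLoc: (resName, occupancy)}."""
    slots = defaultdict(dict)
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        alt = line[16]
        if alt == " ":
            continue  # atom carries no alternate location indicator
        key = (line[21], int(line[22:26]), line[26])
        # Assumption: every atom of one alt loc in a residue has the same
        # occupancy, so the first atom seen is taken as representative.
        slots[key].setdefault(alt, (line[17:20], float(line[54:60])))
    return dict(slots)


def classify(alts):
    """Label a slot and total the occupancies of its alt locs."""
    names = {res for res, _ in alts.values()}
    total = sum(occ for _, occ in alts.values())
    kind = "microheterogeneity" if len(names) > 1 else "alternate conformers"
    return kind, round(total, 2)


for key, alts in altloc_summary(RECORDS).items():
    print(key, classify(alts), alts)
```

Keying on the residue number and insertion code alone, and treating the residue name as a property of each alt loc, is one way a reader could accept both situations without seeing a duplicate.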

Dale Tronrud

> 
>> Both come up quite often in the PDB,
> 
> One man's quite often is another man's very rarely.
> 
>

Re: [COOT] pdb file problem with duplicate amino acid

2017-02-01 Thread Paul Emsley

On 31/01/2017 20:09, Dale Tronrud wrote:
> On 1/31/2017 11:51 AM, Paul Emsley wrote:
>> On 31/01/17 17:54, Edwin Pozharski wrote:
>>> Whatever the rationale was, there is a structure in the PDB that has
>>> alternate conformer of a residue listed with different residue type -
>>> A is arginine and B is glutamine.  Coot fails to load the model
>>> complaining in the command window
>>>
>>> WARNING:: Error reading small-molecule cif "/home/epo/coot/foo.pdb"
>>> There was an error reading /home/epo/coot/foo.pdb.
>>> ERROR 42 READ: Duplicate sequence number and insertion code.
>>>  LINE #1571
>>>  ATOM   1666  N  BGLN B  93      24.448  28.340 -33.325  0.50  9.34           N
>>>
>>> No Spacegroup found for this PDB file
>>> There was a coordinates read error
>>
>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1503&L=COOT&F=&S=&P=25056
>
>    I think this is a poor solution.

We agree, I think, that it is not a good solution.

> Microheterogeneity is not a duplicate residue number.

Isn't it? My understanding is that using the same residue number for a different residue
type *is* how microheterogeneity is specified.

> Not any more so than the alternative
> conformation that is also indicated with "alt loc" letters.

I don't follow this, but if you mean that to describe microheterogeneity by using an altloc
is the Wrong Way, then I agree with you.

> Both come up quite often in the PDB,

One man's quite often is another man's very rarely.

> and microheterogeneity probably should be put
> in models more often than it currently is.

I don't doubt that you are right.

> Many modelers simply don't
> realize it is a possibility.

I agree.

> Your users are rarely going to know about the
> need to put this option into their startup file.

Indeed, if Ed has to ask the list, then I need to reconsider how I arrange this
problem/work-around. (Maybe mmdb2 (or coot?) no longer has the problem with atom selection
in models that have duplicate sequence numbers).


Paul.


Re: [COOT] pdb file problem with duplicate amino acid

2017-01-31 Thread Edwin Pozharski
Thanks - sorry, I should have searched the archives first.  As a suggestion,
could you alter the warning text to include the fix description (i.e.
"include this in your .coot.py")?  This way users would know what to do
right away.

On Tue, Jan 31, 2017 at 2:51 PM, Paul Emsley wrote:

> On 31/01/17 17:54, Edwin Pozharski wrote:
>
>> Whatever the rationale was, there is a structure in the PDB that has
>> alternate conformer of a residue listed with different residue type - A is
>> arginine and B is glutamine.  Coot fails to load the model complaining in
>> the command window
>>
>> WARNING:: Error reading small-molecule cif "/home/epo/coot/foo.pdb"
>> There was an error reading /home/epo/coot/foo.pdb.
>> ERROR 42 READ: Duplicate sequence number and insertion code.
>>  LINE #1571
>>  ATOM   1666  N  BGLN B  93      24.448  28.340 -33.325  0.50  9.34           N
>>
>> No Spacegroup found for this PDB file
>> There was a coordinates read error
>>
>
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1503&L=COOT&F=&S=&P=25056
>
> Paul.
>


Re: [COOT] pdb file problem with duplicate amino acid

2017-01-31 Thread Dale Tronrud
On 1/31/2017 11:51 AM, Paul Emsley wrote:
> On 31/01/17 17:54, Edwin Pozharski wrote:
>> Whatever the rationale was, there is a structure in the PDB that has
>> alternate conformer of a residue listed with different residue type -
>> A is arginine and B is glutamine.  Coot fails to load the model
>> complaining in the command window
>>
>> WARNING:: Error reading small-molecule cif "/home/epo/coot/foo.pdb"
>> There was an error reading /home/epo/coot/foo.pdb.
>> ERROR 42 READ: Duplicate sequence number and insertion code.
>>  LINE #1571
>>  ATOM   1666  N  BGLN B  93      24.448  28.340 -33.325  0.50  9.34           N
>>
>> No Spacegroup found for this PDB file
>> There was a coordinates read error
>>
> 
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1503&L=COOT&F=&S=&P=25056
> 
> Paul.

   I think this is a poor solution.  Microheterogeneity is not a
duplicate residue number.  Not any more so than the alternative
conformation that is also indicated with "alt loc" letters.  Both come
up quite often in the PDB, and microheterogeneity probably should be put
in models more often than it currently is.  Many modelers simply don't
realize it is a possibility.  Your users are rarely going to know about
the need to put this option into their startup file.

Dale


Re: [COOT] pdb file problem with duplicate amino acid

2017-01-31 Thread Paul Emsley

On 31/01/17 17:54, Edwin Pozharski wrote:
> Whatever the rationale was, there is a structure in the PDB that has
> alternate conformer of a residue listed with different residue type -
> A is arginine and B is glutamine.  Coot fails to load the model
> complaining in the command window
>
> WARNING:: Error reading small-molecule cif "/home/epo/coot/foo.pdb"
> There was an error reading /home/epo/coot/foo.pdb.
> ERROR 42 READ: Duplicate sequence number and insertion code.
>  LINE #1571
>  ATOM   1666  N  BGLN B  93      24.448  28.340 -33.325  0.50  9.34           N
>
> No Spacegroup found for this PDB file
> There was a coordinates read error



https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1503&L=COOT&F=&S=&P=25056

Paul.


Re: [COOT] pdb file problem with duplicate amino acid

2017-01-31 Thread Dale Tronrud
   This is an occurrence of microheterogeneity and it is not all that
uncommon.  See Crambin as a classic prototype.  Coot should be able to
handle this.

   The work-around you suggest creates a very different model.  Residue
93A lies between 93 and 94 so you are actually inserting an entire
residue into the chain.
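
To illustrate the point about 93A, here is a small Python sketch. It assumes residues are ordered by number and then by insertion code, which is the usual convention; the helper `residue_sort_key` is hypothetical and not a Coot function. Renaming the second conformer to 93A turns one sequence position into two.

```python
def residue_sort_key(res_id):
    """Split an ID like '93A' into (93, 'A') so it sorts after '93' and before '94'."""
    digits = res_id.rstrip("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    return (int(digits), res_id[len(digits):])


# Microheterogeneity: Arg and Gln share slot 93, so this stretch has three positions.
as_deposited = ["92", "93", "94"]
# Work-around: the Gln conformer is renamed 93A and becomes a residue of its own.
after_rename = sorted(as_deposited + ["93A"], key=residue_sort_key)

print(after_rename)
print(len(as_deposited), len(after_rename))
```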

Dale Tronrud

On 1/31/2017 9:54 AM, Edwin Pozharski wrote:
> Whatever the rationale was, there is a structure in the PDB that has
> alternate conformer of a residue listed with different residue type - A
> is arginine and B is glutamine.  Coot fails to load the model
> complaining in the command window
> 
> WARNING:: Error reading small-molecule cif "/home/epo/coot/foo.pdb"
> There was an error reading /home/epo/coot/foo.pdb.
> ERROR 42 READ: Duplicate sequence number and insertion code.
>  LINE #1571
>  ATOM   1666  N  BGLN B  93      24.448  28.340 -33.325  0.50  9.34           N
> 
> No Spacegroup found for this PDB file
> There was a coordinates read error
> 
> 
> One way to deal with it is to take the second conformer and manually add
> a sequence modifier (make it 93A), and that pdb file loads just fine.
> 
> This is observed with Coot-0.8.8-pre, rev.6506.
> 
> This is only a minor nuisance, of course, so I completely understand if
> no fix is made to load such strange models.
> 
> Cheers,
> 
> Ed.
> 
> ---
> Coot verendus est
> 
> 


[COOT] pdb file problem with duplicate amino acid

2017-01-31 Thread Edwin Pozharski
Whatever the rationale was, there is a structure in the PDB that has
alternate conformer of a residue listed with different residue type - A is
arginine and B is glutamine.  Coot fails to load the model complaining in
the command window

> WARNING:: Error reading small-molecule cif "/home/epo/coot/foo.pdb"
> There was an error reading /home/epo/coot/foo.pdb.
> ERROR 42 READ: Duplicate sequence number and insertion code.
>  LINE #1571
>  ATOM   1666  N  BGLN B  93      24.448  28.340 -33.325  0.50  9.34           N
>
> No Spacegroup found for this PDB file
> There was a coordinates read error
>

One way to deal with it is to take the second conformer and manually add a
sequence modifier (make it 93A), and that pdb file loads just fine.

This is observed with Coot-0.8.8-pre, rev.6506.

This is only a minor nuisance, of course, so I completely understand if no
fix is made to load such strange models.

Cheers,

Ed.

---
Coot verendus est