Re: [COOT] pdb file problem with duplicate amino acid
On 2/1/2017 4:58 AM, Paul Emsley wrote:
> On 31/01/2017 20:09, Dale Tronrud wrote:
>> On 1/31/2017 11:51 AM, Paul Emsley wrote:
>>> On 31/01/17 17:54, Edwin Pozharski wrote:
>>>> Whatever the rationale was, there is a structure in the PDB that has
>>>> an alternate conformer of a residue listed with a different residue type:
>>>> A is arginine and B is glutamine. Coot fails to load the model,
>>>> complaining in the command window:
>>>>
>>>> WARNING:: Error reading small-molecule cif "/home/epo/coot/foo.pdb"
>>>> There was an error reading /home/epo/coot/foo.pdb.
>>>> ERROR 42 READ: Duplicate sequence number and insertion code.
>>>> LINE #1571
>>>> ATOM 1666 N BGLN B 93 24.448 28.340 -33.325 0.50 9.34 N
>>>>
>>>> No Spacegroup found for this PDB file
>>>> There was a coordinates read error
>>>
>>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1503&L=COOT&F=&S=&P=25056
>>
>> I think this is a poor solution.
>
> We agree, I think, that it is not a good solution.
>
>> Microheterogeneity is not a duplicate residue number.
>
> Isn't it? My understanding is that using the same residue number for a
> different residue type *is* how microheterogeneity is specified.
>
>> Not any more so than the alternative
>> conformation that is also indicated with "alt loc" letters.
>
> I don't follow this, but if you mean that to describe microheterogeneity
> by using an altloc is the Wrong Way, then I agree with you.

I don't consider this a "duplicate residue number" because there is only one residue being described. At one time the PDB renumbered inhibitors bound to a protein as residue "1" even though the first residue of the protein was also numbered "1". Now that's a duplicate, and it's very confusing! With microheterogeneity there is only a single "slot" in the sequence that is being described by the residue number. It just happens that in some of the unit cells that slot is occupied by, for example, an Ala, while the rest of the unit cells contain Ser.
This situation is not much different from the case where the residue is always a Ser but is g+ in some unit cells and g- in the rest. In both cases you have a single location in the molecule with mutually exclusive models whose occupancies cannot sum to more than one. The "alt loc" indicators can be used to specify either situation, and that is how they are used in the PDB.

Here is an example from a Crambin model (1JXT). Note that there are three "alt loc"s: two rotamers of Ser and one of Pro. I do not consider this a "duplicate residue number" since it is a description of just one place in the sequence of the protein.

ATOM    378  N  BSER A  22       4.886  12.647  -3.137  0.25  2.69           N
ATOM    379  N  CSER A  22       4.886  12.647  -3.137  0.20  2.69           N
ATOM    380  CA BSER A  22       6.014  13.445  -2.619  0.25  2.65           C
ATOM    381  CA CSER A  22       6.014  13.445  -2.619  0.20  2.65           C
ATOM    382  C  BSER A  22       6.335  13.134  -1.171  0.25  2.69           C
ATOM    383  C  CSER A  22       6.335  13.134  -1.171  0.20  2.69           C
ATOM    384  O  BSER A  22       5.447  12.947  -0.321  0.25  3.16           O
ATOM    385  O  CSER A  22       5.447  12.947  -0.321  0.20  3.16           O
ATOM    386  CB BSER A  22       5.771  14.977  -2.622  0.25  2.42           C
ATOM    387  CB CSER A  22       5.303  14.879  -2.638  0.20  1.48           C
ATOM    388  OG BSER A  22       4.801  15.169  -1.599  0.25  3.52           O
ATOM    389  OG CSER A  22       6.657  15.597  -2.270  0.20  4.90           O
ATOM    390  HB2BSER A  22       6.591  15.476  -2.342  0.25  9.94           H
ATOM    391  HB2CSER A  22       5.428  15.332  -3.776  0.20  4.73           H
ATOM    392  HB3BSER A  22       5.454  15.386  -3.472  0.25  5.08           H
ATOM    393  HB3CSER A  22       4.730  15.085  -2.351  0.20  1.05           H
ATOM    394  N  APRO A  22       4.886  12.647  -3.137  0.55  2.69           N
ATOM    395  CA APRO A  22       6.014  13.445  -2.619  0.55  2.65           C
ATOM    396  C  APRO A  22       6.335  13.134  -1.171  0.55  2.69           C
ATOM    397  O  APRO A  22       5.447  12.947  -0.321  0.55  3.16           O
ATOM    398  CB APRO A  22       5.553  14.873  -2.888  0.55  3.49           C
ATOM    399  CG APRO A  22       4.590  14.806  -4.078  0.55  3.51           C
ATOM    400  CD APRO A  22       3.870  13.470  -3.919  0.55  2.50           C
ATOM    401  HA APRO A  22       6.812  13.232  -3.136  0.55  1.43           H
ATOM    402  HB2APRO A  22       5.167  15.299  -2.198  0.55  3.89           H
ATOM    403  HB3APRO A  22       6.300  15.439  -3.105  0.55 13.74           H
ATOM    404  HG2APRO A  22       4.045  15.436  -4.032  0.55  3.61           H
ATOM    405  HG3APRO A  22       5.063  14.827  -4.928  0.55  3.64           H
ATOM    406  HD2APRO A  22       3.057  13.566  -3.405  0.55  5.81           H
ATOM    407  HD3APRO A  22       3.657  12.978  -4.727  0.55  8.33           H

Dale Tronrud

>
>> Both come up quite often in the PDB,
>
> One man's quite often is another man's very rarely.
>
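Dale's constraint, that the mutually exclusive models at one slot have occupancies summing to at most one, is easy to check mechanically. Below is a minimal Python sketch, not code from Coot or mmdb; the function name `slot_occupancy` and the (alt loc, residue name, occupancy) tuple layout are assumptions for illustration, and the values are taken from the N and CA records of 1JXT residue A 22 listed above.

```python
from collections import defaultdict


def slot_occupancy(atoms):
    """Total occupancy at one residue slot, plus which alt locs each residue type uses.

    `atoms` holds (altloc, resname, occupancy) tuples. Each alt loc is counted
    once (its largest atom occupancy), so atoms of the same conformer are not
    added together.
    """
    per_altloc = {}
    altlocs_by_name = defaultdict(set)
    for altloc, resname, occ in atoms:
        per_altloc[altloc] = max(per_altloc.get(altloc, 0.0), occ)
        altlocs_by_name[resname].add(altloc)
    total = sum(per_altloc.values())
    return total, {name: sorted(alts) for name, alts in altlocs_by_name.items()}


# N and CA atoms of Crambin (1JXT) residue A 22: Ser rotamers B and C, Pro in A.
crambin_22 = [
    ("B", "SER", 0.25), ("C", "SER", 0.20),  # N
    ("B", "SER", 0.25), ("C", "SER", 0.20),  # CA
    ("A", "PRO", 0.55), ("A", "PRO", 0.55),  # N, CA
]

total, by_residue = slot_occupancy(crambin_22)
print(f"{total:.2f}", by_residue)  # 1.00 {'SER': ['B', 'C'], 'PRO': ['A']}
assert total <= 1.0 + 1e-6, "alternatives for one slot cannot exceed full occupancy"
```

The same check applies unchanged to a rotamer-only slot (one residue name, several alt locs) and to Ed's Arg/Gln pair at 0.50 each, which is Dale's point: one alt loc mechanism describes both kinds of alternative.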
Re: [COOT] pdb file problem with duplicate amino acid
On 31/01/2017 20:09, Dale Tronrud wrote:
> On 1/31/2017 11:51 AM, Paul Emsley wrote:
>> On 31/01/17 17:54, Edwin Pozharski wrote:
>>> Whatever the rationale was, there is a structure in the PDB that has
>>> an alternate conformer of a residue listed with a different residue type:
>>> A is arginine and B is glutamine. Coot fails to load the model,
>>> complaining in the command window:
>>>
>>> WARNING:: Error reading small-molecule cif "/home/epo/coot/foo.pdb"
>>> There was an error reading /home/epo/coot/foo.pdb.
>>> ERROR 42 READ: Duplicate sequence number and insertion code.
>>> LINE #1571
>>> ATOM 1666 N BGLN B 93 24.448 28.340 -33.325 0.50 9.34 N
>>>
>>> No Spacegroup found for this PDB file
>>> There was a coordinates read error
>>
>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1503&L=COOT&F=&S=&P=25056
>
> I think this is a poor solution.

We agree, I think, that it is not a good solution.

> Microheterogeneity is not a duplicate residue number.

Isn't it? My understanding is that using the same residue number for a different residue type *is* how microheterogeneity is specified.

> Not any more so than the alternative conformation that is also indicated with "alt loc" letters.

I don't follow this, but if you mean that to describe microheterogeneity by using an altloc is the Wrong Way, then I agree with you.

> Both come up quite often in the PDB,

One man's quite often is another man's very rarely.

> and microheterogeneity probably should be put in models more often than it currently is.

I don't doubt that you are right.

> Many modelers simply don't realize it is a possibility.

I agree.

> Your users are rarely going to know about the need to put this option into their startup file.

Indeed, if Ed has to ask the list, then I need to reconsider how I arrange this problem/work-around. (Maybe mmdb2, or Coot, no longer has the problem with atom selection in models that have duplicate sequence numbers.)

Paul.
Re: [COOT] pdb file problem with duplicate amino acid
Thanks - sorry, I should have searched the archives first. As a suggestion, could you alter the warning text to include the fix description (i.e. "include this in your .coot.py")? This way users would know what to do right away.

On Tue, Jan 31, 2017 at 2:51 PM, Paul Emsley wrote:
> On 31/01/17 17:54, Edwin Pozharski wrote:
>
> Whatever the rationale was, there is a structure in the PDB that has
> an alternate conformer of a residue listed with a different residue type:
> A is arginine and B is glutamine. Coot fails to load the model,
> complaining in the command window:
>
>> WARNING:: Error reading small-molecule cif "/home/epo/coot/foo.pdb"
>> There was an error reading /home/epo/coot/foo.pdb.
>> ERROR 42 READ: Duplicate sequence number and insertion code.
>> LINE #1571
>> ATOM 1666 N BGLN B 93 24.448 28.340 -33.325 0.50 9.34 N
>>
>> No Spacegroup found for this PDB file
>> There was a coordinates read error
>>
>
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1503&L=COOT&F=&S=&P=25056
>
> Paul.
Re: [COOT] pdb file problem with duplicate amino acid
On 1/31/2017 11:51 AM, Paul Emsley wrote:
> On 31/01/17 17:54, Edwin Pozharski wrote:
>> Whatever the rationale was, there is a structure in the PDB that has
>> an alternate conformer of a residue listed with a different residue type:
>> A is arginine and B is glutamine. Coot fails to load the model,
>> complaining in the command window:
>>
>> WARNING:: Error reading small-molecule cif "/home/epo/coot/foo.pdb"
>> There was an error reading /home/epo/coot/foo.pdb.
>> ERROR 42 READ: Duplicate sequence number and insertion code.
>> LINE #1571
>> ATOM 1666 N BGLN B 93 24.448 28.340 -33.325 0.50 9.34 N
>>
>> No Spacegroup found for this PDB file
>> There was a coordinates read error
>>
>
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1503&L=COOT&F=&S=&P=25056
>
> Paul.

I think this is a poor solution. Microheterogeneity is not a duplicate residue number. Not any more so than the alternative conformation that is also indicated with "alt loc" letters. Both come up quite often in the PDB, and microheterogeneity probably should be put in models more often than it currently is. Many modelers simply don't realize it is a possibility. Your users are rarely going to know about the need to put this option into their startup file.

Dale
Re: [COOT] pdb file problem with duplicate amino acid
On 31/01/17 17:54, Edwin Pozharski wrote:
> Whatever the rationale was, there is a structure in the PDB that has
> an alternate conformer of a residue listed with a different residue type:
> A is arginine and B is glutamine. Coot fails to load the model,
> complaining in the command window:
>
> WARNING:: Error reading small-molecule cif "/home/epo/coot/foo.pdb"
> There was an error reading /home/epo/coot/foo.pdb.
> ERROR 42 READ: Duplicate sequence number and insertion code.
> LINE #1571
> ATOM 1666 N BGLN B 93 24.448 28.340 -33.325 0.50 9.34 N
>
> No Spacegroup found for this PDB file
> There was a coordinates read error

https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1503&L=COOT&F=&S=&P=25056

Paul.
Re: [COOT] pdb file problem with duplicate amino acid
This is an occurrence of microheterogeneity, and it is not all that uncommon. See Crambin as a classic prototype. Coot should be able to handle this.

The work-around you suggest creates a very different model. Residue 93A lies between 93 and 94, so you are actually inserting an entire residue into the chain.

Dale Tronrud

On 1/31/2017 9:54 AM, Edwin Pozharski wrote:
> Whatever the rationale was, there is a structure in the PDB that has
> an alternate conformer of a residue listed with a different residue type:
> A is arginine and B is glutamine. Coot fails to load the model,
> complaining in the command window:
>
> WARNING:: Error reading small-molecule cif "/home/epo/coot/foo.pdb"
> There was an error reading /home/epo/coot/foo.pdb.
> ERROR 42 READ: Duplicate sequence number and insertion code.
> LINE #1571
> ATOM 1666 N BGLN B 93 24.448 28.340 -33.325 0.50 9.34 N
>
> No Spacegroup found for this PDB file
> There was a coordinates read error
>
> One way to deal with it is to take the second conformer and manually add
> a sequence modifier (make it 93A), and that pdb file loads just fine.
>
> This is observed with Coot-0.8.8-pre, rev.6506.
>
> This is only a minor nuisance, of course, so I completely understand if
> no fix is made to load such strange models.
>
> Cheers,
>
> Ed.
>
> ---
> Coot verendus est
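Dale's objection to the 93A work-around can be illustrated by treating each residue as identified by its (sequence number, insertion code) pair. In the usual PDB convention a residue carrying an insertion code follows the residue without one, so 93A sorts between 93 and 94. The Python sketch below is illustrative only: the helper `residue_ids` is hypothetical, and the neighbouring residues Ala 92 and Gly 94 are invented for the example.

```python
def residue_ids(records):
    """Return the distinct (resSeq, iCode) residue identifiers in sorted order.

    `records` are (resSeq, iCode, altloc, resname) tuples. A blank insertion
    code is the empty string, which sorts before "A", so 93 < 93A < 94.
    """
    return sorted({(seq, icode) for seq, icode, _alt, _name in records})


# Microheterogeneity: slot 93 described by two alt locs (92 and 94 are invented).
as_deposited = [
    (92, "", "", "ALA"),
    (93, "", "A", "ARG"),
    (93, "", "B", "GLN"),
    (94, "", "", "GLY"),
]
# Work-around: the B conformer is given insertion code "A".
with_insertion = [
    (92, "", "", "ALA"),
    (93, "", "A", "ARG"),
    (93, "A", "B", "GLN"),
    (94, "", "", "GLY"),
]

print(residue_ids(as_deposited))    # [(92, ''), (93, ''), (94, '')]
print(residue_ids(with_insertion))  # [(92, ''), (93, ''), (93, 'A'), (94, '')]
```

The second model has one more residue in the chain than the first, which is the "very different model" Dale describes: the Gln is no longer an alternative for slot 93 but a new residue after it.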
[COOT] pdb file problem with duplicate amino acid
Whatever the rationale was, there is a structure in the PDB that has an alternate conformer of a residue listed with a different residue type: A is arginine and B is glutamine. Coot fails to load the model, complaining in the command window:

WARNING:: Error reading small-molecule cif "/home/epo/coot/foo.pdb"
There was an error reading /home/epo/coot/foo.pdb.
ERROR 42 READ: Duplicate sequence number and insertion code.
LINE #1571
ATOM 1666 N BGLN B 93 24.448 28.340 -33.325 0.50 9.34 N

No Spacegroup found for this PDB file
There was a coordinates read error

One way to deal with it is to take the second conformer and manually add a sequence modifier (make it 93A), and that pdb file loads just fine.

This is observed with Coot-0.8.8-pre, rev.6506.

This is only a minor nuisance, of course, so I completely understand if no fix is made to load such strange models.

Cheers,

Ed.

---
Coot verendus est
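The distinction drawn elsewhere in this thread, between microheterogeneity (one slot, several alt locs with different residue names) and a genuine duplicate (two residues sharing a number), can be sketched as a check over ATOM/HETATM records. This is a hypothetical Python illustration, not how mmdb or Coot actually decide; it assumes the standard fixed PDB columns (alt loc in column 17, residue name 18-20, chain 22, residue number 23-26, insertion code 27), and the function names, the `atom_line` helper, the dummy coordinates and the "INH" ligand are all invented for the example.

```python
from collections import defaultdict


def parse_atom(line):
    """Pull the residue-identifying fields out of a fixed-column ATOM/HETATM record."""
    return (
        line[21],             # chain identifier, column 22
        int(line[22:26]),     # residue sequence number, columns 23-26
        line[26].strip(),     # insertion code, column 27 ("" if blank)
        line[16].strip(),     # alt loc indicator, column 17 ("" if blank)
        line[17:20].strip(),  # residue name, columns 18-20
    )


def classify_slots(lines):
    """Label every (chain, resSeq, iCode) slot that carries more than one residue name."""
    slots = defaultdict(set)
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            chain, seq, icode, alt, name = parse_atom(line)
            slots[(chain, seq, icode)].add((alt, name))
    report = {}
    for key, variants in slots.items():
        if len({name for _alt, name in variants}) < 2:
            continue  # one residue type at this slot: nothing to report
        # Every variant carries an alt loc: alternatives for a single slot.
        # Any variant without one: two residues really share the number.
        if all(alt for alt, _name in variants):
            report[key] = "microheterogeneity"
        else:
            report[key] = "duplicate"
    return report


def atom_line(record, serial, name, alt, res, chain, seq, occ, element):
    """Hypothetical helper: format a record in PDB columns with dummy coordinates."""
    return (f"{record:<6}{serial:5d}  {name:<3}{alt:1}{res:>3} {chain:1}{seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{occ:6.2f}{0.0:6.2f}          {element:>2}")


# Ed's case: chain B residue 93 is Arg in alt loc A and Gln in alt loc B
# (serial 1660 for the Arg atom is invented).
ed_case = [
    atom_line("ATOM", 1660, "N", "A", "ARG", "B", 93, 0.50, "N"),
    atom_line("ATOM", 1666, "N", "B", "GLN", "B", 93, 0.50, "N"),
]
# Hypothetical genuine duplicate: a ligand numbered 1 alongside protein residue 1.
duplicate_case = [
    atom_line("ATOM", 1, "N", "", "THR", "A", 1, 1.00, "N"),
    atom_line("HETATM", 900, "C1", "", "INH", "A", 1, 1.00, "C"),
]

print(classify_slots(ed_case))         # {('B', 93, ''): 'microheterogeneity'}
print(classify_slots(duplicate_case))  # {('A', 1, ''): 'duplicate'}
```

Keying on (chain, resSeq, iCode) alone is what makes the two cases look alike; looking at whether every variant carries an alt loc is one simple way to tell them apart, though a real reader would need further checks (for example, the same alt loc letter used for two different residue names).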