Apologies all -- but I am still having problems with this.
In this earlier thread:
https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03485.html
"As far as I understood, the PDB reader assigns bond orders to the amino
acids in a protein, but if a ligand is present it puts all bonds of it to
SINGLE bonds as auto bond-type perception is not trivial (see Roger's
comments)."
However, I am unable to get bond orders for the protein side. Am I doing
something wrong, or is this the intended behaviour?
I imagine I can use AssignBondOrdersFromTemplate() for the 20 amino acids
and set these myself -- or is there a better way to do this?
Also, is there a way to make AssignBondOrdersFromTemplate assign bond
orders to all matches?
>>> import rdkit
>>> from rdkit import Chem
>>> temp = Chem.MolFromSmiles('C=O')
>>> mol = Chem.MolFromSmiles('C(O)CC(O)')
>>> from rdkit.Chem import AllChem
>>> m2 = AllChem.AssignBondOrdersFromTemplate(temp, mol)
[12:24:56] WARNING: More than one matching pattern found - picking one
>>> print(Chem.MolToSmiles(m2))  # was expecting O=CCC=O
O=CCCO
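On the "all matches" question: the helper below is a minimal sketch of one possible workaround, not an RDKit function. `assign_bond_orders_to_all_matches` is a hypothetical name; it flattens the template to single bonds, finds every match with GetSubstructMatches(), copies the template's bond orders onto each match, and then sanitizes. It assumes the matches do not overlap and it ignores formal charges. The same idea could in principle apply one residue template (say, for each of the 20 amino acids) to every copy of that residue.

```python
from rdkit import Chem


def assign_bond_orders_to_all_matches(template, mol):
    """Hypothetical helper (not an RDKit function): copy the bond orders of
    `template` onto every match of its single-bonded skeleton in `mol`."""
    # Query = template with every bond forced to SINGLE and aromatic flags
    # cleared, so it can match a molecule whose bond orders are missing.
    query = Chem.RWMol(template)
    for bond in query.GetBonds():
        bond.SetBondType(Chem.BondType.SINGLE)
        bond.SetIsAromatic(False)
    for atom in query.GetAtoms():
        atom.SetIsAromatic(False)

    result = Chem.RWMol(mol)
    # GetSubstructMatches() uniquifies by default, so each set of atoms is
    # reported once even if the template can map onto it in several ways.
    for match in result.GetSubstructMatches(query):
        for bond in template.GetBonds():
            target = result.GetBondBetweenAtoms(match[bond.GetBeginAtomIdx()],
                                                match[bond.GetEndAtomIdx()])
            target.SetBondType(bond.GetBondType())
            target.SetIsAromatic(bond.GetIsAromatic())
        for atom in template.GetAtoms():
            result.GetAtomWithIdx(match[atom.GetIdx()]).SetIsAromatic(
                atom.GetIsAromatic())

    # Recompute implicit hydrogens and check valences; raises on failure.
    Chem.SanitizeMol(result)
    return result.GetMol()


template = Chem.MolFromSmiles("C=O")
mol = Chem.MolFromSmiles("C(O)CC(O)")
fixed = assign_bond_orders_to_all_matches(template, mol)
print(Chem.MolToSmiles(fixed))  # O=CCC=O
```

If matches overlap, all of them are applied, which can produce an impossible valence; SanitizeMol() then raises an error instead of returning a broken molecule.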
Another thing I don't quite understand: in the code below I get a
"WARNING: More than one matching pattern found - picking one", but how can my
template match multiple times when it is not symmetrical?
# (Using RDKit_2013_09_1)
import rdkit
from rdkit import Chem
from rdkit.Chem import AllChem
ligand_mol = Chem.MolFromPDBBlock("""HETATM    1  C1  MRC A1993      30.994  82.769  82.139  1.00 18.68           C
HETATM    2  C2  MRC A1993      29.949  82.382  81.280  1.00 18.38           C
HETATM    3  C3  MRC A1993      28.809  83.090  80.875  1.00 16.44           C
HETATM    4  C4  MRC A1993      27.794  82.511  79.886  1.00 17.11           C
HETATM    5  C5  MRC A1993      26.268  82.360  79.965  1.00 16.74           C
HETATM    6  C6  MRC A1993      25.256  81.832  78.911  1.00 17.00           C
HETATM    7  C7  MRC A1993      23.832  81.867  79.556  1.00 17.45           C
HETATM    8  C8  MRC A1993      23.758  81.056  80.927  1.00 16.89           C
HETATM    9  C9  MRC A1993      23.820  79.467  80.419  1.00 17.84           C
HETATM   10  C10 MRC A1993      22.833  78.610  79.550  1.00 19.48           C
HETATM   11  C11 MRC A1993      22.999  78.593  78.193  1.00 20.56           C
HETATM   12  C12 MRC A1993      21.733  78.839  77.305  1.00 20.86           C
HETATM   13  C13 MRC A1993      21.779  78.052  75.821  1.00 20.74           C
HETATM   14  C14 MRC A1993      20.323  77.662  75.537  1.00 22.44           C
HETATM   15  C15 MRC A1993      28.456  84.523  81.348  1.00 12.97           C
HETATM   16  C16 MRC A1993      24.899  81.634  81.814  1.00 16.07           C
HETATM   17  C1' MRC A1993      38.561  75.401  83.188  1.00 53.39           C
HETATM   18  O1P MRC A1993      39.367  74.705  83.841  1.00 53.58           O
HETATM   19  O1Q MRC A1993      38.963  76.034  82.185  1.00 52.93           O
HETATM   20  C2' MRC A1993      37.074  75.480  83.615  1.00 51.57           C
HETATM   21  C3' MRC A1993      36.915  75.997  85.071  1.00 48.41           C
HETATM   22  C4' MRC A1993      35.513  76.588  85.323  1.00 45.07           C
HETATM   23  C5' MRC A1993      35.443  78.068  84.897  1.00 41.55           C
HETATM   24  C6' MRC A1993      34.033  78.631  85.167  1.00 37.19           C
HETATM   25  C7' MRC A1993      33.490  79.356  83.929  1.00 34.17           C
HETATM   26  C8' MRC A1993      33.454  80.886  84.151  1.00 31.34           C
HETATM   27  C9' MRC A1993      32.082  81.519  83.803  1.00 27.63           C
HETATM   28  O1A MRC A1993      32.056  81.880  82.413  1.00 22.28           O
HETATM   29  O1B MRC A1993      31.044  83.885  82.667  1.00 20.31           O
HETATM   30  O5  MRC A1993      26.209  81.625  81.183  1.00 16.19           O
HETATM   31  O7  MRC A1993      23.503  83.224  79.735  1.00 14.98           O
HETATM   32  O6  MRC A1993      25.399  82.787  77.821  1.00 15.00           O
HETATM   33  O10 MRC A1993      22.868  77.384  78.981  1.00 21.90           O
HETATM   34  C17 MRC A1993      21.395  80.405  77.027  1.00 20.53           C
HETATM   35  O13 MRC A1993      22.524  76.868  75.987  1.00 21.25           O
TER
END""")
template_ligand_mol = Chem.MolFromSmiles(
    "C[C@H](O)[C@H](C)[C@@H]1O[C@H]1C[C@H]2CO[C@@H](C/C(C)=C/C(=O)OCCCCCCCCC(O)=O)[C@@H](O)[C@H]2O")
ligand_mol_with_bonds = AllChem.AssignBondOrdersFromTemplate(
    template_ligand_mol, ligand_mol)
# [12:33:39] WARNING: More than one matching pattern found - picking one
print(Chem.MolToSmiles(ligand_mol))
# CC(CC(O)OCCCCCCCCC(O)O)CC1OCC(CC2OC2C(C)C(C)O)C(O)C1O
print(Chem.MolToSmiles(ligand_mol_with_bonds))
# CC(=CC(=O)OCCCCCCCCC(=O)O)CC1OCC(CC2OC2C(C)C(C)O)C(O)C1O
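A possible explanation for the warning, offered as a sketch rather than a definitive answer: when the ligand has no bond orders, AssignBondOrdersFromTemplate() compares it against a copy of the template with every bond set to single. At that point the carboxylic acid at the end of the template (`...CCCCCCCCC(O)=O`) has two interchangeable terminal oxygens, so the template can map onto the ligand in two ways even though the molecule as a whole is not symmetrical. This only produces two "matches" if the function calls GetSubstructMatches() with uniquify=False; as far as I know the current rdkit.Chem.AllChem source does, but treat that as an assumption for RDKit_2013_09_1. The snippet reproduces the effect on acetic acid; `without_bond_orders` is a hypothetical helper written for this example.

```python
from rdkit import Chem


def without_bond_orders(mol):
    """Hypothetical helper: copy of `mol` with every bond SINGLE and no
    aromatic flags, roughly what a PDB ligand looks like before templating."""
    flat = Chem.RWMol(mol)
    for bond in flat.GetBonds():
        bond.SetBondType(Chem.BondType.SINGLE)
        bond.SetIsAromatic(False)
    for atom in flat.GetAtoms():
        atom.SetIsAromatic(False)
    flat.UpdatePropertyCache(strict=False)
    return flat.GetMol()


# Acetic acid is not symmetrical, but with all bonds single its two oxygens
# (one from C=O, one from O-H) become interchangeable.
acid = Chem.MolFromSmiles("CC(=O)O")
query = without_bond_orders(acid)
target = without_bond_orders(acid)

all_mappings = target.GetSubstructMatches(query, uniquify=False)
unique_atom_sets = target.GetSubstructMatches(query)  # uniquify=True default
print(len(all_mappings), len(unique_atom_sets))  # 2 1
```

Both mappings cover the same atoms, so whichever one is picked should give the same bond orders for this group.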
Any help would be greatly appreciated.
Thanks,
JP
On 13 January 2014 21:02, JP <jeanpaul.ebe...@inhibox.com> wrote:
>
> Thanks All - I think I am in a good place now.
>
> I can get the SMILES from Paul's mmCIF links and then use Sereina's magic
> three lines to do what I want. I'd cross my fingers, but with RDKit you
> don't need to.
> This works for all Chemical Components (or whatever other fashionable name
> they go by these days) in the PDB.
>
> For posterity: I found a thread on the mailing list, started by James, that
> sheds some light on this:
> https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03481.html
>
> On 13 January 2014 19:46, sereina riniker <sereina.rini...@gmail.com> wrote:
>>
>> Hi JP,
>>
>> If you also have a SMILES for the molecule you want to read from the PDB
>> file, you can assign the bond orders based on this template:
>>
>> from rdkit import Chem
>> from rdkit.Chem import AllChem
>> tmp = Chem.MolFromPDBFile(yourfilename)
>> template = Chem.MolFromSmiles(yoursmiles)
>> mol = AllChem.AssignBondOrdersFromTemplate(template, tmp)
>>
>> Is this what you're looking for?
>>
>> Best,
>> Sereina
>>
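Sereina's recipe as a self-contained sketch. No real PDB file is used here: the ligand is acetic acid written out as HETATM records with invented coordinates and a made-up residue name `LIG`, and `hetatm_line` is a hypothetical helper that produces the fixed-column record layout. For real data, read your file with MolFromPDBFile() and use your own SMILES as the template.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def hetatm_line(serial, name, element, x, y, z, resname="LIG"):
    """Hypothetical helper: one fixed-column PDB HETATM record (chain A, residue 1)."""
    record = ("HETATM{:5d} {:<4s} {:>3s} A{:4d}    {:8.3f}{:8.3f}{:8.3f}"
              "{:6.2f}{:6.2f}").format(serial, " " + name, resname, 1,
                                       x, y, z, 1.0, 0.0)
    # Columns 67-76 are blank; the element symbol goes in columns 77-78.
    return record + " " * 10 + "{:>2s}".format(element)


# Invented acetic acid coordinates (Angstrom), for illustration only.
atoms = [("C1", "C", 0.000, 0.000, 0.000),
         ("C2", "C", 1.520, 0.000, 0.000),
         ("O1", "O", 2.140, 1.060, 0.000),
         ("O2", "O", 2.170, -1.130, 0.000)]
pdb_block = "\n".join(hetatm_line(i + 1, *atom) for i, atom in enumerate(atoms))
pdb_block += "\nEND\n"

tmp = Chem.MolFromPDBBlock(pdb_block)     # bonds from proximity, all SINGLE
template = Chem.MolFromSmiles("CC(=O)O")  # SMILES of the same ligand
mol = AllChem.AssignBondOrdersFromTemplate(template, tmp)

print(Chem.MolToSmiles(tmp))
print(Chem.MolToSmiles(mol))  # CC(=O)O
```

Because the acid's two oxygens are interchangeable once bond orders are stripped, this example will likely also log the "More than one matching pattern found - picking one" warning; either choice gives the same molecule here.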
>>
>> 2014/1/13 JP <jeanpaul.ebe...@inhibox.com>
>>>
>>> RDKitters!
>>>
>>> Finally back on the mailing list!
>>>
>>> I am sure we've been through this at the UGM (my mind must have wandered
>>> off!), but a quick question about the PDB reader and bond perception: is
>>> this supported by the current PDB reader? I remember that someone (PaulE,
>>> perhaps?) was saying bond perception was painful, but that there is a
>>> dictionary for PDB ligands which helps (any idea what this dictionary is
>>> called?).
>>>
>>> To the technical details.
>>>
>>> I am reading in the following PDB file with a simple MolFromPDBFile() call:
>>>
>>> HETATM    1  O1P 84T A1862     -27.016   9.387 -72.564  1.00 20.81           O
>>> HETATM    2  P   84T A1862     -27.282   9.818 -73.968  1.00 19.65           P
>>> HETATM    3  O2P 84T A1862     -27.881  11.176 -74.182  1.00 21.49           O
>>> HETATM    4  N   84T A1862     -25.869   9.583 -74.813  1.00 19.78           N
>>> HETATM    5  C   84T A1862     -25.759  10.010 -76.075  1.00 19.97           C
>>> HETATM    6  CA  84T A1862     -24.493   9.748 -76.807  1.00 19.75           C
>>> HETATM    7  CB  84T A1862     -24.794   8.678 -77.847  1.00 19.73           C
>>> HETATM    8  CG  84T A1862     -23.571   8.324 -78.681  1.00 19.70           C
>>> HETATM    9  CD2 84T A1862     -23.309   9.519 -79.611  1.00 18.49           C
>>> HETATM   10  CD1 84T A1862     -23.863   6.932 -79.305  1.00 18.60           C
>>> HETATM   11  OHB 84T A1862     -25.210   7.467 -77.223  1.00 19.17           O
>>> HETATM   12  OH  84T A1862     -23.549   9.127 -75.984  1.00 20.33           O
>>> HETATM   13  O   84T A1862     -26.672  10.517 -76.692  1.00 20.26           O
>>> HETATM   14  O5' 84T A1862     -28.377   8.861 -74.619  1.00 19.39           O
>>> HETATM   15  C5' 84T A1862     -28.002   7.536 -74.954  1.00 18.47           C
>>> HETATM   16  C4' 84T A1862     -28.909   7.000 -76.012  1.00 18.24           C
>>> HETATM   17  C3' 84T A1862     -28.901   7.826 -77.298  1.00 18.28           C
>>> HETATM   18  C2' 84T A1862     -30.318   7.610 -77.768  1.00 18.69           C
>>> HETATM   19  O2' 84T A1862     -30.789   8.641 -78.581  1.00 19.64           O
>>> HETATM   20  O4' 84T A1862     -30.262   6.951 -75.529  1.00 18.80           O
>>> HETATM   21  C1' 84T A1862     -31.152   7.470 -76.521  1.00 19.01           C
>>> HETATM   22  N9  84T A1862     -31.753   8.732 -76.009  1.00 20.08           N
>>> HETATM   23  C4  84T A1862     -33.033   9.013 -76.158  1.00 21.10           C
>>> HETATM   24  N3  84T A1862     -34.018   8.339 -76.786  1.00 21.58           N
>>> HETATM   25  C2  84T A1862     -35.263   8.846 -76.830  1.00 21.95           C
>>> HETATM   26  C8  84T A1862     -31.223   9.701 -75.291  1.00 20.27           C
>>> HETATM   27  N7  84T A1862     -32.173  10.618 -75.019  1.00 21.28           N
>>> HETATM   28  C5  84T A1862     -33.315  10.213 -75.563  1.00 21.81           C
>>> HETATM   29  C6  84T A1862     -34.624  10.702 -75.627  1.00 22.85           C
>>> HETATM   30  N1  84T A1862     -35.550  10.010 -76.285  1.00 22.44           N
>>> HETATM   31  N6  84T A1862     -35.008  11.862 -75.052  1.00 23.86           N
>>> TER
>>> END
>>>
>>> But I am losing all the double bond (and aromatic) information:
>>>
>>> import sys
>>> from rdkit import Chem
>>> m = Chem.MolFromPDBFile(sys.argv[1])
>>> print(Chem.MolToSmiles(m))
>>>
>>> Gives me:
>>>
>>> CC(C)C(O)C(O)C(O)NP(O)(O)OCC1CC(O)C(N2CNC3C2NCNC3N)O1
>>>
>>> As usual, many thanks for your time,
>>>
>>> -
>>> Jean-Paul Ebejer
>>> Early Stage Researcher
>>>
>>>
------------------------------------------------------------------------------
>>> CenturyLink Cloud: The Leader in Enterprise Cloud Services.
>>> Learn Why More Businesses Are Choosing CenturyLink Cloud For
>>> Critical Workloads, Development Environments & Everything In Between.
>>> Get a Quote or Start a Free Trial Today.
>>>
http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>
>