Hi IIllimar,
The RDKit PDB reader only recognize standard amino acids and, after the
PR I did on Saturday https://github.com/rdkit/rdkit/pull/2850 will be
merged, nucleic acid bases.
Anything else will not have the correct hybridization/bond orders
perceived, as those are not encoded in the PDB format and the PDB reader
does not have any functionality to do that.
The 1ARJ case is peculiar, as it has an ARG residue which would be
recognized if it were in the ATOM records, but not in the HETATM
section, for which no attempt to perceive the correct hybridization/bond
is made.
My suggestion, if you are using standard PDB files, is to download the
SDF file:
https://www.rcsb.org/pdb/download/downloadLigandFiles.do?ligandIdList=A2F&structIdList=3GOT&instanceType=all&excludeUnobserved=false&includeHydrogens=false
and construct your RDKit molecule from that.
You should be able to automate that without too much effort either
constructing URLs using the template above or using the PDB REST API.
Cheers,
p.
On 16/12/2019 18:24, Illimar Hugo Rekand wrote:
Thanks, Paolo, for a good and clear example.
I adapted your code into my workflow to calculate some Lipinski-properties of
RNA pdb-structures, and ran into some issues. I'm not sure if I should make a
new thread or throw this onto this one I already created?
I used the following code under
from rdkit import Chem
from rdkit.Chem import rdmolops, Lipinski
from urllib.request import urlopen
import gzip
import pprint
pp = pprint.PrettyPrinter(indent=4)
Lipinski_dic = {'FractionCSP3':Lipinski.FractionCSP3,
'HeavyAtomCount':Lipinski.HeavyAtomCount,
'NHOHCount': Lipinski.NHOHCount,
"NOCount":Lipinski.NOCount,
"NumAliphaticCarbocycles": Lipinski.NumAliphaticCarbocycles,
"NumAliphaticHeterocycles" : Lipinski.NumAliphaticHeterocycles,
'NumAliphaticRings' : Lipinski.NumAliphaticRings,
'NumAromaticCarbocycles' : Lipinski.NumAromaticCarbocycles,
'NumAromaticHeterocycles' : Lipinski.NumAromaticHeterocycles,
'NumAromaticRings' : Lipinski.NumAromaticRings,
'NumHAcceptors' : Lipinski.NumHAcceptors,
'NumHDonors' : Lipinski.NumHDonors,
'NumHeteroatoms' : Lipinski.NumHeteroatoms,
'NumRotatableBonds' : Lipinski.NumRotatableBonds,
'NumSaturatedCarbocycles' : Lipinski.NumSaturatedCarbocycles,
'NumSaturatedHeterocycles' : Lipinski.NumSaturatedHeterocycles,
'NumSaturatedRings' : Lipinski.NumSaturatedRings,
'RingCount' : Lipinski.RingCount
}
url = "https://files.rcsb.org/download/1arj.pdb.gz"
pdb_data = gzip.decompress(urlopen(url).read())
mol = Chem.RWMol(Chem.MolFromPDBBlock(pdb_data))
bonds_to_cleave = {(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in
mol.GetBonds() if b.GetBeginAtom().GetPDBResidueInfo().GetIsHeteroAtom() ^
b.GetEndAtom().GetPDBResidueInfo().GetIsHeteroAtom()}
[mol.RemoveBond(*b) for b in bonds_to_cleave]
hetatm_frags = [f for f in rdmolops.GetMolFrags(mol, asMols=True,
sanitizeFrags=True) if f.GetNumAtoms() and
f.GetAtomWithIdx(0).GetPDBResidueInfo().GetIsHeteroAtom()]
for hetatm in hetatm_frags:
res_name = hetatm.GetAtomWithIdx(0).GetPDBResidueInfo().GetResidueName()
calculated_props = {}
for prop in Lipinski_dic:
function = Lipinski_dic[prop]
x = function(hetatm)
calculated_props[prop] = x
pp.pprint(calculated_props)
and as you can see the properties of the ligand doesn't match up with what is
expected (The number of SP3-atoms doesn't match up). When parsing through the
structure 3got, it fails to recognize the aromatic rings of the ligand A2F. I'm
assuming this is caused by RDKit not assigning bond orders correctly when
reading in RNA and DNA pdb files (something which I have reported in earlier on
this mailing list)?
Running hetatm.UpdatePropertyCache(strict=True) does not remedy this problem.
Is there a clever way I can fix this quickly without waiting for this to be
fixed in a future version?
Illimar Rekand
Ph.D. candidate,
Brenk-lab, Haug-lab
Department of Biomedicine
Department of Chemistry
University of Bergen
________________________________
From: Illimar Hugo Rekand
Sent: Monday, December 16, 2019 5:55:56 PM
To: Paolo Tosco
Subject: Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand
Hey, Paolo,
thanks for a good and clear example!
all the best,
Illimar Rekand
Ph.D. candidate,
Brenk-lab, Haug-lab
Department of Biomedicine
Department of Chemistry
University of Bergen
________________________________
From: Paolo Tosco <paolo.tosco.m...@gmail.com>
Sent: Monday, December 16, 2019 5:52:18 PM
To: Illimar Hugo Rekand; rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand
Hi Illimar,
this gist:
https://gist.github.com/ptosco/2ee199b219b27e01052a7a1433b3bd22
shows a way to achieve this.
Cheers,
p.
On 16/12/2019 16:07, Illimar Hugo Rekand wrote:
Hello, everyone
Is there a simple way to create a mol object from just the HETATM/ligand lines
from a pdb-file?
Would it be viable to create a function where you could create a mol object
from specific lines within a pdb-file?
Illimar Rekand
Ph.D. candidate,
Brenk-lab, Haug-lab
Department of Biomedicine
Department of Chemistry
University of Bergen
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss