Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

Lukas Pravda Tue, 17 Dec 2019 12:11:50 -0800

Hi IIllimar,

I don’t really know what your use case is, so it may be completely useless. 
However, just to add my two cents, we've created a package that builds on the 
top of rdkit and parses PDB ligand definitions from cif files. You can find the 
package here: https://gitlab.ebi.ac.uk/pdbe/ccdutils and the documentation can 
be found here: https://pdbe.gitdocs.ebi.ac.uk/ccdutils/


Let me know if this is helpful or you need further help.

Best,
Lukas

On 16/12/2019, 20:03, "Paolo Tosco" <[email protected]> wrote:

    Hi IIllimar,
    
    The RDKit PDB reader only recognize standard amino acids and, after the 
    PR I did on Saturday https://github.com/rdkit/rdkit/pull/2850 will be 
    merged, nucleic acid bases.
    
    Anything else will not have the correct hybridization/bond orders 
    perceived, as those are not encoded in the PDB format and the PDB reader 
    does not have any functionality to do that.
    
    The 1ARJ case is peculiar, as it has an ARG residue which would be 
    recognized if it were in the ATOM records, but not in the HETATM 
    section, for which no attempt to perceive the correct hybridization/bond 
    is made.
    
    My suggestion, if you are using standard PDB files, is to download the 
    SDF file:
    
    
https://www.rcsb.org/pdb/download/downloadLigandFiles.do?ligandIdList=A2F&structIdList=3GOT&instanceType=all&excludeUnobserved=false&includeHydrogens=false
    
    and construct your RDKit molecule from that.
    
    You should be able to automate that without too much effort either 
    constructing URLs using the template above or using the PDB REST API.
    
    Cheers,
    p.
    
    On 16/12/2019 18:24, Illimar Hugo Rekand wrote:
    > Thanks, Paolo, for a good and clear example.
    >
    >
    > I adapted your code into my workflow to calculate some 
Lipinski-properties of RNA pdb-structures, and ran into some issues. I'm not 
sure if I should make a new thread or throw this onto this one I already 
created?
    >
    >
    > I used the following code under
    >
    >
    > from rdkit import Chem
    > from rdkit.Chem import rdmolops, Lipinski
    > from urllib.request import urlopen
    > import gzip
    > import pprint
    > pp = pprint.PrettyPrinter(indent=4)
    >
    >
    > Lipinski_dic = {'FractionCSP3':Lipinski.FractionCSP3,
    >                  'HeavyAtomCount':Lipinski.HeavyAtomCount,
    >                  'NHOHCount': Lipinski.NHOHCount,
    >                  "NOCount":Lipinski.NOCount,
    >                  "NumAliphaticCarbocycles": 
Lipinski.NumAliphaticCarbocycles,
    >                  "NumAliphaticHeterocycles" : 
Lipinski.NumAliphaticHeterocycles,
    >                  'NumAliphaticRings' :  Lipinski.NumAliphaticRings,
    >                  'NumAromaticCarbocycles' : 
Lipinski.NumAromaticCarbocycles,
    >                  'NumAromaticHeterocycles' : 
Lipinski.NumAromaticHeterocycles,
    >                  'NumAromaticRings' : Lipinski.NumAromaticRings,
    >                  'NumHAcceptors' : Lipinski.NumHAcceptors,
    >                  'NumHDonors' : Lipinski.NumHDonors,
    >                  'NumHeteroatoms' : Lipinski.NumHeteroatoms,
    >                  'NumRotatableBonds' : Lipinski.NumRotatableBonds,
    >                  'NumSaturatedCarbocycles' : 
Lipinski.NumSaturatedCarbocycles,
    >                  'NumSaturatedHeterocycles' : 
Lipinski.NumSaturatedHeterocycles,
    >                  'NumSaturatedRings' : Lipinski.NumSaturatedRings,
    >                  'RingCount' : Lipinski.RingCount
    >                  }
    >
    > url =  "https://files.rcsb.org/download/1arj.pdb.gz";
    > pdb_data = gzip.decompress(urlopen(url).read())
    > mol = Chem.RWMol(Chem.MolFromPDBBlock(pdb_data))
    > bonds_to_cleave = {(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in 
mol.GetBonds() if b.GetBeginAtom().GetPDBResidueInfo().GetIsHeteroAtom() ^ 
b.GetEndAtom().GetPDBResidueInfo().GetIsHeteroAtom()}
    > [mol.RemoveBond(*b) for b in bonds_to_cleave]
    > hetatm_frags = [f for f in rdmolops.GetMolFrags(mol, asMols=True, 
sanitizeFrags=True) if f.GetNumAtoms() and 
f.GetAtomWithIdx(0).GetPDBResidueInfo().GetIsHeteroAtom()]
    > for hetatm in hetatm_frags:
    >      res_name = 
hetatm.GetAtomWithIdx(0).GetPDBResidueInfo().GetResidueName()
    >      calculated_props = {}
    >      for prop in Lipinski_dic:
    >          function = Lipinski_dic[prop]
    >          x = function(hetatm)
    >          calculated_props[prop] = x
    >      pp.pprint(calculated_props)
    >
    >
    > and as you can see the properties of the ligand doesn't match up with 
what is expected (The number of SP3-atoms doesn't match up). When parsing 
through the structure 3got, it fails to recognize the aromatic rings of the 
ligand A2F. I'm assuming this is caused by RDKit not assigning bond orders 
correctly when reading in RNA and DNA pdb files (something which I have 
reported in earlier on this mailing list)?
    >
    >
    > Running hetatm.UpdatePropertyCache(strict=True) does not remedy this 
problem. Is there a clever way I can fix this quickly without waiting for this 
to be fixed in a future version?
    >
    >
    > Illimar Rekand
    > Ph.D. candidate,
    > Brenk-lab, Haug-lab
    > Department of Biomedicine
    > Department of Chemistry
    > University of Bergen
    >
    >
    > ________________________________
    > From: Illimar Hugo Rekand
    > Sent: Monday, December 16, 2019 5:55:56 PM
    > To: Paolo Tosco
    > Subject: Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand
    >
    >
    > Hey, Paolo,
    >
    >
    > thanks for a good and clear example!
    >
    >
    > all the best,
    >
    >
    > Illimar Rekand
    > Ph.D. candidate,
    > Brenk-lab, Haug-lab
    > Department of Biomedicine
    > Department of Chemistry
    > University of Bergen
    >
    >
    > ________________________________
    > From: Paolo Tosco <[email protected]>
    > Sent: Monday, December 16, 2019 5:52:18 PM
    > To: Illimar Hugo Rekand; [email protected]
    > Subject: Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand
    >
    > Hi Illimar,
    >
    > this gist:
    >
    > https://gist.github.com/ptosco/2ee199b219b27e01052a7a1433b3bd22
    >
    > shows a way to achieve this.
    >
    > Cheers,
    > p.
    >
    > On 16/12/2019 16:07, Illimar Hugo Rekand wrote:
    >> Hello, everyone
    >>
    >>
    >> Is there a simple way to create a mol object from just the HETATM/ligand 
lines from a pdb-file?
    >>
    >> Would it be viable to create a function where you could create a mol 
object from specific lines within a pdb-file?
    >>
    >>
    >> Illimar Rekand
    >> Ph.D. candidate,
    >> Brenk-lab, Haug-lab
    >> Department of Biomedicine
    >> Department of Chemistry
    >> University of Bergen
    >>
    >>
    >>
    >> _______________________________________________
    >> Rdkit-discuss mailing list
    >> [email protected]
    >> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
    
    
    _______________________________________________
    Rdkit-discuss mailing list
    [email protected]
    https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
    




_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

Reply via email to