Hi IIllimar, I don’t really know what your use case is, so it may be completely useless. However, just to add my two cents, we've created a package that builds on the top of rdkit and parses PDB ligand definitions from cif files. You can find the package here: https://gitlab.ebi.ac.uk/pdbe/ccdutils and the documentation can be found here: https://pdbe.gitdocs.ebi.ac.uk/ccdutils/
Let me know if this is helpful or you need further help. Best, Lukas On 16/12/2019, 20:03, "Paolo Tosco" <paolo.tosco.m...@gmail.com> wrote: Hi IIllimar, The RDKit PDB reader only recognize standard amino acids and, after the PR I did on Saturday https://github.com/rdkit/rdkit/pull/2850 will be merged, nucleic acid bases. Anything else will not have the correct hybridization/bond orders perceived, as those are not encoded in the PDB format and the PDB reader does not have any functionality to do that. The 1ARJ case is peculiar, as it has an ARG residue which would be recognized if it were in the ATOM records, but not in the HETATM section, for which no attempt to perceive the correct hybridization/bond is made. My suggestion, if you are using standard PDB files, is to download the SDF file: https://www.rcsb.org/pdb/download/downloadLigandFiles.do?ligandIdList=A2F&structIdList=3GOT&instanceType=all&excludeUnobserved=false&includeHydrogens=false and construct your RDKit molecule from that. You should be able to automate that without too much effort either constructing URLs using the template above or using the PDB REST API. Cheers, p. On 16/12/2019 18:24, Illimar Hugo Rekand wrote: > Thanks, Paolo, for a good and clear example. > > > I adapted your code into my workflow to calculate some Lipinski-properties of RNA pdb-structures, and ran into some issues. I'm not sure if I should make a new thread or throw this onto this one I already created? > > > I used the following code under > > > from rdkit import Chem > from rdkit.Chem import rdmolops, Lipinski > from urllib.request import urlopen > import gzip > import pprint > pp = pprint.PrettyPrinter(indent=4) > > > Lipinski_dic = {'FractionCSP3':Lipinski.FractionCSP3, > 'HeavyAtomCount':Lipinski.HeavyAtomCount, > 'NHOHCount': Lipinski.NHOHCount, > "NOCount":Lipinski.NOCount, > "NumAliphaticCarbocycles": Lipinski.NumAliphaticCarbocycles, > "NumAliphaticHeterocycles" : Lipinski.NumAliphaticHeterocycles, > 'NumAliphaticRings' : Lipinski.NumAliphaticRings, > 'NumAromaticCarbocycles' : Lipinski.NumAromaticCarbocycles, > 'NumAromaticHeterocycles' : Lipinski.NumAromaticHeterocycles, > 'NumAromaticRings' : Lipinski.NumAromaticRings, > 'NumHAcceptors' : Lipinski.NumHAcceptors, > 'NumHDonors' : Lipinski.NumHDonors, > 'NumHeteroatoms' : Lipinski.NumHeteroatoms, > 'NumRotatableBonds' : Lipinski.NumRotatableBonds, > 'NumSaturatedCarbocycles' : Lipinski.NumSaturatedCarbocycles, > 'NumSaturatedHeterocycles' : Lipinski.NumSaturatedHeterocycles, > 'NumSaturatedRings' : Lipinski.NumSaturatedRings, > 'RingCount' : Lipinski.RingCount > } > > url = "https://files.rcsb.org/download/1arj.pdb.gz" > pdb_data = gzip.decompress(urlopen(url).read()) > mol = Chem.RWMol(Chem.MolFromPDBBlock(pdb_data)) > bonds_to_cleave = {(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds() if b.GetBeginAtom().GetPDBResidueInfo().GetIsHeteroAtom() ^ b.GetEndAtom().GetPDBResidueInfo().GetIsHeteroAtom()} > [mol.RemoveBond(*b) for b in bonds_to_cleave] > hetatm_frags = [f for f in rdmolops.GetMolFrags(mol, asMols=True, sanitizeFrags=True) if f.GetNumAtoms() and f.GetAtomWithIdx(0).GetPDBResidueInfo().GetIsHeteroAtom()] > for hetatm in hetatm_frags: > res_name = hetatm.GetAtomWithIdx(0).GetPDBResidueInfo().GetResidueName() > calculated_props = {} > for prop in Lipinski_dic: > function = Lipinski_dic[prop] > x = function(hetatm) > calculated_props[prop] = x > pp.pprint(calculated_props) > > > and as you can see the properties of the ligand doesn't match up with what is expected (The number of SP3-atoms doesn't match up). When parsing through the structure 3got, it fails to recognize the aromatic rings of the ligand A2F. I'm assuming this is caused by RDKit not assigning bond orders correctly when reading in RNA and DNA pdb files (something which I have reported in earlier on this mailing list)? > > > Running hetatm.UpdatePropertyCache(strict=True) does not remedy this problem. Is there a clever way I can fix this quickly without waiting for this to be fixed in a future version? > > > Illimar Rekand > Ph.D. candidate, > Brenk-lab, Haug-lab > Department of Biomedicine > Department of Chemistry > University of Bergen > > > ________________________________ > From: Illimar Hugo Rekand > Sent: Monday, December 16, 2019 5:55:56 PM > To: Paolo Tosco > Subject: Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand > > > Hey, Paolo, > > > thanks for a good and clear example! > > > all the best, > > > Illimar Rekand > Ph.D. candidate, > Brenk-lab, Haug-lab > Department of Biomedicine > Department of Chemistry > University of Bergen > > > ________________________________ > From: Paolo Tosco <paolo.tosco.m...@gmail.com> > Sent: Monday, December 16, 2019 5:52:18 PM > To: Illimar Hugo Rekand; rdkit-discuss@lists.sourceforge.net > Subject: Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand > > Hi Illimar, > > this gist: > > https://gist.github.com/ptosco/2ee199b219b27e01052a7a1433b3bd22 > > shows a way to achieve this. > > Cheers, > p. > > On 16/12/2019 16:07, Illimar Hugo Rekand wrote: >> Hello, everyone >> >> >> Is there a simple way to create a mol object from just the HETATM/ligand lines from a pdb-file? >> >> Would it be viable to create a function where you could create a mol object from specific lines within a pdb-file? >> >> >> Illimar Rekand >> Ph.D. candidate, >> Brenk-lab, Haug-lab >> Department of Biomedicine >> Department of Chemistry >> University of Bergen >> >> >> >> _______________________________________________ >> Rdkit-discuss mailing list >> Rdkit-discuss@lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss _______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss _______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss