Thanks Paolo, this works brilliantly. Let's hope astatine inhibitors won't gain in popularity 😉
Best,
Jenke

________________________________
From: Paolo Tosco <paolo.tosco.m...@gmail.com>
Sent: 30 October 2019 13:25
To: SCHEEN Jenke <j.sch...@sms.ed.ac.uk>; RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] fingerprint a molecule with pseudoatoms denoted by 'Du'

Hi Jenke,

I have put together a small gist showing a slightly hacky way to round-trip a molecule containing dummy atoms through a PDB block (assuming that your molecules do not contain astatine).

If your dummy atoms are called "DU" rather than " *", you can simply change the replace() expression to something that fits your needs.

HTH, cheers
p.

On 10/30/19 12:06, SCHEEN Jenke wrote:

Hi RDKitters,

I'm trying to use RDKit to generate molecular fingerprints (such as AP or ECFP) for molecules that have non-interacting pseudoatoms ('dummy atoms', denoted by Du). I attached a sample PDB file containing the dummy atoms at positions 21-24. Reading this file (Chem.rdmolfiles.MolFromPDBFile("test.pdb", sanitize=False)) throws a post-condition violation because the element 'Du' isn't recognised, which makes sense.

I've been searching online and haven't been able to find any workarounds. Do you have any suggestions?

Some notes:

* I'm hoping that once RDKit is able to read in the PDB file, the mol object can be parsed without the FP constructor (e.g. AllChem.GetMorganFingerprint) complaining.
* The term 'dummy atoms' here should not be confused with the dummy atoms RDKit uses when fragmenting molecules (where * is the SMILES notation).
* For this project all I aim to do is generate structural fingerprints for these types of ligands. This means I won't have to worry about defining chemical properties for Du.
* The context for this issue is that we're aiming to featurise the ligands for an ML protocol where the dummy atoms are one of the major descriptors of the problem.
* I thought manually inserting a 119th element in atomic_data.cpp might resolve the issue, but I've been unable to locate the file in my conda installation.
* The ODDT Python API seems to parse the Du element without any issues but is limited in its FP generator diversity.

Best,
Jenke

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
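For readers without access to the gist, the workaround described in the thread might look roughly like the sketch below. It is not Paolo's code. It assumes the dummy atoms carry "Du" (or "DU") in the PDB element column (columns 77-78), that no real astatine is present, and that RDKit reads the element from that column. The helper names swap_dummy_element and mol_with_dummies are hypothetical. The element is swapped for astatine in the text so the parser accepts it, and the At atoms are then reset to atomic number 0 (RDKit's dummy atom).

```python
def swap_dummy_element(pdb_block, dummy="Du", placeholder="At"):
    """Replace the element column (77-78) of ATOM/HETATM records.

    Hypothetical helper: only the element column is touched, atom names
    and all other records are left unchanged.
    """
    out = []
    for line in pdb_block.splitlines():
        is_atom_record = line.startswith(("ATOM", "HETATM"))
        # Slicing a short line yields "", so no length check is needed.
        if is_atom_record and line[76:78].strip().upper() == dummy.upper():
            line = line[:76] + placeholder.rjust(2) + line[78:]
        out.append(line)
    return "\n".join(out) + "\n"


def mol_with_dummies(pdb_block):
    """Parse a PDB block whose 'Du' atoms should become RDKit dummy atoms."""
    # Imported here so the text helper above is usable without RDKit.
    from rdkit import Chem

    patched = swap_dummy_element(pdb_block)
    mol = Chem.MolFromPDBBlock(patched, sanitize=False, removeHs=False)
    if mol is None:
        raise ValueError("RDKit could not parse the patched PDB block")
    for atom in mol.GetAtoms():
        # Assumption: every astatine atom is a swapped-in placeholder.
        if atom.GetAtomicNum() == 85:
            atom.SetAtomicNum(0)
    return mol
```

The returned molecule is unsanitized, so a fingerprint function such as AllChem.GetMorganFingerprint may still need the molecule's ring information and property cache initialised first. Bonds to the dummy atoms come from RDKit's proximity bonding on the placeholder element, so it is worth checking that they match what the fingerprint is meant to encode.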