Hi all,

I am writing a function that removes atoms with only one bond, and that can be applied repeatedly in order to find the scaffold of a molecule. The function works in most cases, but I have observed that it loses information when aromatic rings are involved. This is an example:

    from rdkit import Chem

    smiles = "CCN1CCC(N(CCCCCCc2ccc(C(F)(F)F)cc2)C(=O)Cn2c(CCc3cccc(F)c3F)cc(=O)c3ccccc32)CC1"
    Chem.MolFromSmiles(smiles.replace('@', ''))

[image: image.png]

    mol = Chem.MolFromSmiles(smiles.replace('@', ''))
    rwmol = Chem.RWMol(mol)
    # Note that in this case, I am iterating through the atoms only once
    # but the idea is to repeat this iteration until all single-bonded
    # atoms have been removed
    for a in list(rwmol.GetAtoms()):
        if len(a.GetBonds()) == 1:
            nbr = a.GetNeighbors()[0]
            rwmol.RemoveBond(a.GetIdx(), nbr.GetIdx())
            rwmol.RemoveAtom(a.GetIdx())
    rwmol

[image: image.png]

If you compare the top aromatic ring in the starting molecule with the result after the first iteration, you can already see that removing the two fluorines messes up the aromatic configuration of the ring.

Calling Chem.SanitizeMol at the end of the loop does not fix the problem:

    KekulizeException: Can't kekulize mol. Unkekulized atoms: 21 30 31 32 33 34 35 36 37

I have also tried different sanitizeOps, and tried increasing the number of hydrogens on the atoms attached to those I remove, but I could not find a solution. Clearly, if I carry on removing atoms (i.e., I apply the removal again), the molecule gets messed up completely:

[image: image.png]

Can anyone help me understand what's missing?

Thanks,
Giammy
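
One possible way around this, as a minimal sketch rather than a definitive fix: after the nine removals in the first pass, the indices listed in the KekulizeException (21 and 30 to 37) appear to correspond to the carbons of the fused quinolone ring system, not to the difluorophenyl ring. Once the exocyclic =O is removed, that aromatic carbon no longer has a double bond to account for, and the ring cannot be kekulized as written. The sketch below assumes that kekulizing the molecule before editing is acceptable, so that every removal acts on explicit single and double bonds and sanitization can re-perceive aromaticity at the end. The function name strip_terminal_atoms is hypothetical and is not part of RDKit.

```python
from rdkit import Chem


def strip_terminal_atoms(mol):
    """Repeatedly remove atoms that have exactly one bond.

    Hypothetical helper (not an RDKit function). It edits a kekulized
    copy, so no atom keeps an aromatic flag it can no longer satisfy.
    """
    rwmol = Chem.RWMol(mol)
    # Convert aromatic bonds to explicit single/double bonds and clear
    # the aromatic flags before removing anything.
    Chem.Kekulize(rwmol, clearAromaticFlags=True)

    while True:
        # Collect the terminal atoms first, then delete from the highest
        # index down so the remaining indices stay valid.
        terminal = [a.GetIdx() for a in rwmol.GetAtoms() if a.GetDegree() == 1]
        if not terminal:
            break
        for idx in sorted(terminal, reverse=True):
            rwmol.RemoveAtom(idx)  # also removes the atom's bonds

    result = rwmol.GetMol()
    # Recompute implicit hydrogens, ring information and aromaticity.
    Chem.SanitizeMol(result)
    return result


smiles = "CCN1CCC(N(CCCCCCc2ccc(C(F)(F)F)cc2)C(=O)Cn2c(CCc3cccc(F)c3F)cc(=O)c3ccccc32)CC1"
scaffold = strip_terminal_atoms(Chem.MolFromSmiles(smiles))
print(Chem.MolToSmiles(scaffold))
```

Note that with this approach a ring carbon that loses an exocyclic double bond becomes a saturated CH2, so that ring is no longer reported as aromatic; whether that is the desired scaffold definition is a separate choice. A fully acyclic input would be stripped down to a single atom or to an empty molecule.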