Hi Jameed, I don't think your approach will work, which means I likely didn't explain myself well enough.
Let's say I start with: Cc1cc2ccccc2c(=O)o1 - https://cactus.nci.nih.gov/chemical/structure/Cc1cc2ccccc2c(=O)o1/image I want to break the aromatic bond between the aromatic 'c' (also double-bonded to the O) and the aromatic 'o'. This is possible because a Kekule interpretation of that molecule is: >>> from rdkit import Chem >>> mol = Chem.MolFromSmiles("Cc1cc2ccccc2c(=O)o1") >>> Chem.Kekulize(mol) >>> Chem.MolToSmiles(mol) 'Cc1cc2ccccc2c(=O)o1' >>> Chem.MolToSmiles(mol, kekuleSmiles=True) 'CC1=CC2=CC=CC=C2C(=O)O1' and I could break it by simply inserting a "." in the right spot, like this: CC1=CC2=CC=CC=C2C(=O).O1 >>> mol2 = Chem.MolFromSmiles("CC1=CC2=CC=CC=C2C(=O).O1") >>> Chem.MolToSmiles(mol2) 'CC(O)=Cc1ccccc1C=O' The resulting structure still has an aromatic ring, but not all of the atoms which were flagged as aromatic are still aromatic. With the approach you outlined, I don't know how to restored the modified structure so I can generate the final SMILES I expect. The Kekulize() approach finds a valid Kekule interpretation, but there may be multiple interpretations, allowing both a single and double bond for a given bond position. I think I want to generate both possibilities. So far the only solution I've come up with - and it's a real brute-force one - is to do complete enumeration, along the lines of the code after my signature. % python get_kekule_bonds.py 'Cc1cc2ccccc2c(=O)o1' bond frag atoms allowed 1 c:c 1,2 DOUBLE 2 c:c 2,3 SINGLE 3 c:c 3,4 SINGLE DOUBLE 4 c:c 4,5 SINGLE DOUBLE 5 c:c 5,6 SINGLE DOUBLE 6 c:c 6,7 SINGLE DOUBLE 7 c:c 7,8 SINGLE DOUBLE 8 c:c 8,9 SINGLE 10 c:o 9,11 SINGLE 11 c:o 11,1 SINGLE 12 c:c 8,3 SINGLE DOUBLE (To be usable for what I want, I also need to figure out where those bond terms are in the Kekeule SMILES, so I can either replace it with a "." or remove the ring closure, then re-perceive.) This can work for what I'm looking for, since I have <=14 heavies, but I'm hoping for something rather less crude. Andrew da...@dalkescientific.com # List possible Kekule bond assignments for aromatic bonds in a SMILES from rdkit import Chem import itertools import sys def get_kekule_bonds(mol): """mol -> possible bond assignment table Returns a table mapping bond index (for aromatic bonds) to possible Kekule bond assignments for that bond. """ # Figure out which bonds are aromatic aromatic_bond_indices = {} for bond in mol.GetBonds(): if bond.GetIsAromatic(): a1, a2 = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx() if a1 > a2: a1, a2 = a2, a1 aromatic_bond_indices[(a1, a2)] = bond.GetIdx() if not aromatic_bond_indices: return {} # Look for assignments which result in the same Kekeule SMILES # (with aromatic bonds indicated as ':') expected_kekule_smiles = Chem.MolToSmiles(mol, kekuleSmiles=True) # Initialize the table of bond assignments possible_bond_assignments = dict((idx, set()) for idx in aromatic_bond_indices.values()) # Figure out how to get from the canonical Kekule SMILES output order # back to the moleule order. s = mol.GetProp("_smilesAtomOutputOrder") assert s[0] == "[" and s[-2:] == ",]", s invert_order = dict(enumerate(map(int, s[1:-2].split(",")))) # Figure out the places where I can replace a ":" with a "-" or "=" parts = expected_kekule_smiles.split(":") N = len(parts)-1 assert N == len(aromatic_bond_indices) terms = [None] * (2*N+1) terms[::2] = parts # Generate all possible bond assignments (generates 2**N assignments) for bond_assignment in itertools.product( *(("-=",)*N) ): # Replace the places where the ":" used to be (or rather, the previous assignment) # and interpret as a new SMILES terms[1::2] = bond_assignment new_kekule_smiles = "".join(terms) new_mol = Chem.MolFromSmiles(new_kekule_smiles) if new_mol is None: continue # Was there a match? canonical_kekule_smiles = Chem.MolToSmiles(new_mol, kekuleSmiles=True) if canonical_kekule_smiles != expected_kekule_smiles: continue # Yes! That means we can figure out what the actual bond assignments # were. Re-process the Kekule SMILES, but don't do aromaticity perception. raw_mol = Chem.MolFromSmiles(new_kekule_smiles, sanitize=False) for bond in raw_mol.GetBonds(): # Need to map back to the original bonds original_a1 = invert_order[bond.GetBeginAtomIdx()] original_a2 = invert_order[bond.GetEndAtomIdx()] if original_a1 > original_a2: original_a1, original_a2 = original_a2, original_a1 key = original_a1, original_a2 original_bond_index = aromatic_bond_indices.get(key, None) #print(key, original_bond_index) if original_bond_index is not None: possible_bond_assignments[original_bond_index].add(bond.GetBondType()) return possible_bond_assignments def disable_rdkit_error_messages(): from rdkit import RDLogger logger = RDLogger.logger() logger.setLevel(RDLogger.CRITICAL) def main(): import argparse parser = argparse.ArgumentParser("identify possible Kekule bonds") parser.add_argument("--errors", action="store_true", help="show RDKit warning and error messages") parser.add_argument("smiles", nargs="?", default="c1ccccc1O") args = parser.parse_args() mol = Chem.MolFromSmiles(args.smiles) if mol is None: parser.error("Cannot parse SMILES: %s" % (args.smiles,)) smiles = Chem.MolToSmiles(mol) if smiles != args.smiles: sys.stderr.write("Using SMILES: %s\n" % (smiles,)) mol = Chem.MolFromSmiles(smiles) if mol is None: parser.error("Cannot re-interpret SMILES: %s" % (smiles,)) if not args.errors: disable_rdkit_error_messages() possible_bond_assignments = get_kekule_bonds(mol) print("bond", "frag", "atoms", "allowed", sep="\t") for bond_index, possible_assignments in sorted(possible_bond_assignments.items()): bond = mol.GetBondWithIdx(bond_index) atom_indices = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx() fragment_smiles = Chem.MolFragmentToSmiles(mol, atom_indices, [bond_index], allBondsExplicit=True) print(bond_index, fragment_smiles, "%d,%d" % atom_indices, " ".join(str(x) for x in sorted(possible_assignments)), sep="\t") if __name__ == "__main__": main() > On Aug 21, 2019, at 09:59, Jameed Hussain <jameed.huss...@dotmatics.com> > wrote: > > Hi Andrew, > > You could substitute the aromatic atoms into non-organic atoms (eg. aromatic > carbon to calcium [Ca], aromatic nitrogen to sodium [Na] and so on with > single bonds between them), and use the atom deleting procedure as normal. > The structures should still be valid so should be able to use the usual rdkit > functions on them. > > Thanks > Jameed _______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss