Re: [Rdkit-discuss] aromatic bonds and graph edit distance

Andrew Dalke Wed, 21 Aug 2019 05:35:19 -0700

Hi Jameed,

  I don't think your approach will work, which means I likely didn't explain 
myself well enough.


Let's say I start with:

   Cc1cc2ccccc2c(=O)o1 - 
https://cactus.nci.nih.gov/chemical/structure/Cc1cc2ccccc2c(=O)o1/image

I want to break the aromatic bond between the aromatic 'c' (also double-bonded 
to the O) and the aromatic 'o'.

This is possible because a Kekule interpretation of that molecule is: 

>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles("Cc1cc2ccccc2c(=O)o1")
>>> Chem.Kekulize(mol)
>>> Chem.MolToSmiles(mol)
'Cc1cc2ccccc2c(=O)o1'
>>> Chem.MolToSmiles(mol, kekuleSmiles=True)
'CC1=CC2=CC=CC=C2C(=O)O1'

and I could break it by simply inserting a "." in the right spot, like this:

  CC1=CC2=CC=CC=C2C(=O).O1

>>> mol2 = Chem.MolFromSmiles("CC1=CC2=CC=CC=C2C(=O).O1")
>>> Chem.MolToSmiles(mol2)
'CC(O)=Cc1ccccc1C=O'

The resulting structure still has an aromatic ring, but not all of the atoms 
which were flagged as aromatic are still aromatic.

With the approach you outlined, I don't know how to restored the modified 
structure so I can generate the final SMILES I expect.

The Kekulize() approach finds a valid Kekule interpretation, but there may be 
multiple interpretations, allowing both a single and double bond for a given 
bond position. I think I want to generate both possibilities.

So far the only solution I've come up with - and it's a real brute-force one - 
is to do complete enumeration, along the lines of the code after my signature.


% python get_kekule_bonds.py 'Cc1cc2ccccc2c(=O)o1'
bond    frag    atoms   allowed
1       c:c     1,2     DOUBLE
2       c:c     2,3     SINGLE
3       c:c     3,4     SINGLE DOUBLE
4       c:c     4,5     SINGLE DOUBLE
5       c:c     5,6     SINGLE DOUBLE
6       c:c     6,7     SINGLE DOUBLE
7       c:c     7,8     SINGLE DOUBLE
8       c:c     8,9     SINGLE
10      c:o     9,11    SINGLE
11      c:o     11,1    SINGLE
12      c:c     8,3     SINGLE DOUBLE

(To be usable for what I want, I also need to figure out where those bond terms 
are in the Kekeule SMILES, so I can either replace it with a "." or remove the 
ring closure, then re-perceive.)

This can work for what I'm looking for, since I have <=14 heavies, but I'm 
hoping for something rather less crude.

                                Andrew
                                da...@dalkescientific.com


# List possible Kekule bond assignments for aromatic bonds in a SMILES
from rdkit import Chem
import itertools
import sys

def get_kekule_bonds(mol):
    """mol -> possible bond assignment table

    Returns a table mapping bond index (for aromatic bonds) to
    possible Kekule bond assignments for that bond.
    """
    # Figure out which bonds are aromatic
    aromatic_bond_indices = {}
    for bond in mol.GetBonds():
        if bond.GetIsAromatic():
            a1, a2 = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
            if a1 > a2:
                a1, a2 = a2, a1
            aromatic_bond_indices[(a1, a2)] = bond.GetIdx()
            
    if not aromatic_bond_indices:
        return {}

    # Look for assignments which result in the same Kekeule SMILES
    # (with aromatic bonds indicated as ':')
    expected_kekule_smiles = Chem.MolToSmiles(mol, kekuleSmiles=True)

    # Initialize the table of bond assignments
    possible_bond_assignments = dict((idx, set()) for idx in 
aromatic_bond_indices.values())

    # Figure out how to get from the canonical Kekule SMILES output order
    # back to the moleule order.
    s = mol.GetProp("_smilesAtomOutputOrder")
    assert s[0] == "[" and s[-2:] == ",]", s
    invert_order = dict(enumerate(map(int, s[1:-2].split(","))))

    # Figure out the places where I can replace a ":" with a "-" or "="
    parts = expected_kekule_smiles.split(":")
    N = len(parts)-1
    assert N == len(aromatic_bond_indices)
    terms = [None] * (2*N+1)
    terms[::2] = parts

    # Generate all possible bond assignments (generates 2**N assignments)
    for bond_assignment in itertools.product( *(("-=",)*N) ):
        # Replace the places where the ":" used to be (or rather, the previous 
assignment)
        # and interpret as a new SMILES
        terms[1::2] = bond_assignment
        new_kekule_smiles = "".join(terms)
        new_mol = Chem.MolFromSmiles(new_kekule_smiles)
        if new_mol is None:
            continue

        # Was there a match?
        canonical_kekule_smiles = Chem.MolToSmiles(new_mol, kekuleSmiles=True)
        if canonical_kekule_smiles != expected_kekule_smiles:
            continue

        # Yes! That means we can figure out what the actual bond assignments
        # were. Re-process the Kekule SMILES, but don't do aromaticity 
perception.
        raw_mol = Chem.MolFromSmiles(new_kekule_smiles, sanitize=False)
        for bond in raw_mol.GetBonds():
            # Need to map back to the original bonds
            original_a1 = invert_order[bond.GetBeginAtomIdx()]
            original_a2 = invert_order[bond.GetEndAtomIdx()]
            if original_a1 > original_a2:
                original_a1, original_a2 = original_a2, original_a1
            key = original_a1, original_a2
            original_bond_index = aromatic_bond_indices.get(key, None)
            #print(key, original_bond_index)
            if original_bond_index is not None:
                
possible_bond_assignments[original_bond_index].add(bond.GetBondType())
    return possible_bond_assignments

def disable_rdkit_error_messages():
    from rdkit import RDLogger
    logger = RDLogger.logger()
    logger.setLevel(RDLogger.CRITICAL)
    
def main():
    import argparse
    parser = argparse.ArgumentParser("identify possible Kekule bonds")
    parser.add_argument("--errors", action="store_true",
                            help="show RDKit warning and error messages")
    parser.add_argument("smiles", nargs="?", default="c1ccccc1O")
    
    args = parser.parse_args()
    
    mol = Chem.MolFromSmiles(args.smiles)
    if mol is None:
        parser.error("Cannot parse SMILES: %s" % (args.smiles,))
    smiles = Chem.MolToSmiles(mol)
    if smiles != args.smiles:
        sys.stderr.write("Using SMILES: %s\n" % (smiles,))
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            parser.error("Cannot re-interpret SMILES: %s" % (smiles,))
        

    if not args.errors:
        disable_rdkit_error_messages()

    possible_bond_assignments = get_kekule_bonds(mol)
    
    print("bond", "frag", "atoms", "allowed", sep="\t")
    for bond_index, possible_assignments in 
sorted(possible_bond_assignments.items()):
        bond = mol.GetBondWithIdx(bond_index)
        atom_indices = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        fragment_smiles = Chem.MolFragmentToSmiles(mol, atom_indices, 
[bond_index], allBondsExplicit=True)
        print(bond_index, fragment_smiles, "%d,%d" % atom_indices,
                  " ".join(str(x) for x in sorted(possible_assignments)), 
sep="\t")

if __name__ == "__main__":
    main()



> On Aug 21, 2019, at 09:59, Jameed Hussain <jameed.huss...@dotmatics.com> 
> wrote:
> 
> Hi Andrew,
> 
> You could substitute the aromatic atoms into non-organic atoms (eg. aromatic 
> carbon to calcium [Ca], aromatic nitrogen to sodium [Na] and so on with 
> single bonds between them), and use the atom deleting procedure as normal. 
> The structures should still be valid so should be able to use the usual rdkit 
> functions on them.
> 
> Thanks
> Jameed





_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] aromatic bonds and graph edit distance

Reply via email to