I'm working on a project where I cut a molecule along certain single bonds, to 
find a core structure and one or more R-groups.

In yesterday's email, I mentioned a problem I have in creating a canonical 
SMILES for the core when the R-groups are replaced by a hydrogen.

I also want to create a SMILES for the core where the R-groups are replaced 
with a "-[*]". For example, given "c1ccccc1CCO" where the R-group is the "CCO", 
I want to create the core "c1ccccc1[*]".

My code works except when the attachment point of the cut contains chiral 
information. About 50% of the time, the chirality is inverted. The problem is 
that chirality is a very fragile property which depends (I think) on the index 
permutation order of the neighboring atoms to a given atom. It's tricky to 
maintain that information during structure editing.

For the rest of this email I will present a reproducible, explain my 
understanding, and sketch what seems a reasonable workaround. Let me know if I 
am (in)correct in that. I'll then propose what I think may be a useful editing 
function: ReplaceBond()

The following code snippet works like this. It starts with the input structure:

   C[C@@H]1C[C@H]2[C@@H]3CCC4=CCC=C[C@@]4([C@]3([C@H](C[C@@]2([C@]1(CO)O)C)O)F)C

I cut the bond from the "[C@@]4" to the final "C", then add a "[*] to each side 
of the cut. This produces the canonical core:

   
[*][C@@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F

To verify if this is correct, I'll make some syntax changes directly on the 
original SMILES string. I replace the final "C" character with "[*]". This 
should be isomorphic to the expected core. I parse and canonicalize it to get 
the expected core SMILES, which is:

   
[*][C@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F

Note that the first has a '@@' in the second atom while the second has a '@'.

========== Reproducible which shows that the chirality changes ====

from __future__ import print_function
from rdkit import Chem

# I want to cut the SMILES at the given atoms, to make a core and an R-group.
# I want the location of the attachment points to be marked [*] atoms.

#                                      1 1 1      1     1    1 1      1    11 2 
2 2 2 2
#         0 1     2 3     4     567  890 1 2      3     4    5 6      7    89 0 
1 2 3 4
smiles = 
"C[C@@H]1C[C@H]2[C@@H]3CCC4=CCC=C[C@@]4([C@]3([C@H](C[C@@]2([C@]1(CO)O)C)O)F)C"
#    will cut the bond between 12 and 24   
^^^----------------------------------------^
core_atom, rgroup_atom = 12, 24

input_mol = Chem.MolFromSmiles(smiles)
assert rgroup_atom == input_mol.GetNumAtoms()-1, "this needs to be the terminal 
'C'"

# Cut the bond and add two "[*]" atoms
emol = Chem.EditableMol(input_mol)
emol.RemoveBond(core_atom, rgroup_atom)

for atom in (core_atom, rgroup_atom):
    new_atom = emol.AddAtom(Chem.Atom(0))
    new_bond = emol.AddBond(atom, new_atom, Chem.BondType.SINGLE)
    print("added bond", new_bond, "from", atom, "to", new_atom)

cut_mol = emol.GetMol()
cut_smiles = Chem.MolToSmiles(cut_mol, isomericSmiles=True)

core_smiles = max(cut_smiles.split("."), key=len)
print("found core SMILES:\n", core_smiles)
# This prints:
#   
[*][C@@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F

# Compare it to what I expected.
# In this case I can simply replace the terminal methyl in the SMILES.
noncanonical_expected_core_smiles = smiles[:-1] + '[*]'

expected_mol = Chem.MolFromSmiles(noncanonical_expected_core_smiles)
expected_core_smiles = Chem.MolToSmiles(expected_mol, isomericSmiles=True)
print("expected core SMILES:\n", expected_core_smiles)
# This prints:
#   
[*][C@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F


if core_smiles != expected_core_smiles:
    print("!!!! Not the same !!!")
else:
    print("Identical")
    raise AssertionError("did not expect that")

======

Here is what I think is happening.

The problem occurs because the atom's chiral tag is connected to the 
permutation order of its neighbors. I changed the parity of the permutation 
order without updating the chiral tag.

I'll use the following bit of code to compute the parity of the neighbors for 
the input structure and compare it to the parity of the neighbors of the 'cut' 
structure.

======  Compute the parity for the configuration around the attachment atom on 
the core 
# This code snippet continues from the previous code

def parity(x):
    # Use a Shell sort to count the number of swaps.
    # (This takes O(N*N) time, which is is okay for 4 neighbors.)
    N = len(x)
    num_swaps = 0
    for i in range(N-1):
        for j in range(i+1, N):
            if x[i] > x[j]:
                x[i], x[j] = x[j], x[i]
                num_swaps += 1
    return (-1) ** num_swaps # either +1 or -1 symmetry

def get_bond_configuration(mol, atom):
    # Return the parity of the atom indices of the neighbors
    atom_obj = mol.GetAtomWithIdx(atom)
    neighbors = [neighbor_atom_obj.GetIdx() for neighbor_atom_obj in 
atom_obj.GetNeighbors()]
    assert len(neighbors) == 4, (atom, neighbors)
    return parity(neighbors)


input_configuration = get_bond_configuration(input_mol, 12)
print("input configuration:", input_configuration)

cut_configuration = get_bond_configuration(cut_mol, 12)
print("cut configuration:", cut_configuration)

assert input_configuration != cut_configuration, (
    "I expected the configurations to be  different")
======

This prints the output:

  input configuration: -1
  cut configuration: 1

which demonstrates that the parity has reversed.

I'll swap the chirality of the atom on the 'cut' molecule, generate the new 
core SMILES, and show that it matches the expected SMILES:

======== correct chirality and compare the SMILES with the expected value
# This code snippet continues from the previous code

if cut_chiral_tag == Chem.ChiralType.CHI_TETRAHEDRAL_CW:
    new_chiral_tag = Chem.ChiralType.CHI_TETRAHEDRAL_CCW
elif cut_chiral_tag == Chem.ChiralType.CHI_TETRAHEDRAL_CCW:
    new_chiral_tag = Chem.ChiralType.CHI_TETRAHEDRAL_CW
else:
    raise AssertionError("unknown chiral tag", cut_chiral_tag)

cut_atom_obj.SetChiralTag(new_chiral_tag)


updated_cut_smiles = Chem.MolToSmiles(cut_mol, isomericSmiles=True)
updated_core_smiles = max(updated_cut_smiles.split("."), key=len)
print("updated core SMILES:\n", updated_core_smiles)

if updated_core_smiles == expected_core_smiles:
    print("They match!")
else:
    raise AssertionError("They do not match")
======== 

The result is the following:

  updated core SMILES:
   
[*][C@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F
  They match!

which tells me that the fix appears to have worked.


==== Proposal ===

I propose a new API method which looks like :

   edit_mol.ReplaceBond(from_atom, old_atom, new_atom)

This will:
  1) remember the bond type T connecting from_atom to old_atom
  2) remove the bond from from_atom to old_atom
  3) create a new bond of type T between from_atom and new_atom
  4) update the chiral information so it's equivalent to having
      new_atom in the same configuration position as old_atom.


This assumes that my understanding of the problem is correct, that there is no 
existing/better way to do in RDKit, that it's useful enough that others may 
want it, and that someone else has the time to implement it. 

Cheers,

                                Andrew
                                da...@dalkescientific.com



------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to