I'm working on a project where I cut a molecule along certain single bonds, to find a core structure and one or more R-groups.
In yesterday's email, I mentioned a problem I have in creating a canonical SMILES for the core when the R-groups are replaced by a hydrogen. I also want to create a SMILES for the core where the R-groups are replaced with a "-[*]". For example, given "c1ccccc1CCO" where the R-group is the "CCO", I want to create the core "c1ccccc1[*]". My code works except when the attachment point of the cut contains chiral information. About 50% of the time, the chirality is inverted. The problem is that chirality is a very fragile property which depends (I think) on the index permutation order of the neighboring atoms to a given atom. It's tricky to maintain that information during structure editing. For the rest of this email I will present a reproducible, explain my understanding, and sketch what seems a reasonable workaround. Let me know if I am (in)correct in that. I'll then propose what I think may be a useful editing function: ReplaceBond() The following code snippet works like this. It starts with the input structure: C[C@@H]1C[C@H]2[C@@H]3CCC4=CCC=C[C@@]4([C@]3([C@H](C[C@@]2([C@]1(CO)O)C)O)F)C I cut the bond from the "[C@@]4" to the final "C", then add a "[*] to each side of the cut. This produces the canonical core: [*][C@@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F To verify if this is correct, I'll make some syntax changes directly on the original SMILES string. I replace the final "C" character with "[*]". This should be isomorphic to the expected core. I parse and canonicalize it to get the expected core SMILES, which is: [*][C@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F Note that the first has a '@@' in the second atom while the second has a '@'. ========== Reproducible which shows that the chirality changes ==== from __future__ import print_function from rdkit import Chem # I want to cut the SMILES at the given atoms, to make a core and an R-group. # I want the location of the attachment points to be marked [*] atoms. # 1 1 1 1 1 1 1 1 11 2 2 2 2 2 # 0 1 2 3 4 567 890 1 2 3 4 5 6 7 89 0 1 2 3 4 smiles = "C[C@@H]1C[C@H]2[C@@H]3CCC4=CCC=C[C@@]4([C@]3([C@H](C[C@@]2([C@]1(CO)O)C)O)F)C" # will cut the bond between 12 and 24 ^^^----------------------------------------^ core_atom, rgroup_atom = 12, 24 input_mol = Chem.MolFromSmiles(smiles) assert rgroup_atom == input_mol.GetNumAtoms()-1, "this needs to be the terminal 'C'" # Cut the bond and add two "[*]" atoms emol = Chem.EditableMol(input_mol) emol.RemoveBond(core_atom, rgroup_atom) for atom in (core_atom, rgroup_atom): new_atom = emol.AddAtom(Chem.Atom(0)) new_bond = emol.AddBond(atom, new_atom, Chem.BondType.SINGLE) print("added bond", new_bond, "from", atom, "to", new_atom) cut_mol = emol.GetMol() cut_smiles = Chem.MolToSmiles(cut_mol, isomericSmiles=True) core_smiles = max(cut_smiles.split("."), key=len) print("found core SMILES:\n", core_smiles) # This prints: # [*][C@@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F # Compare it to what I expected. # In this case I can simply replace the terminal methyl in the SMILES. noncanonical_expected_core_smiles = smiles[:-1] + '[*]' expected_mol = Chem.MolFromSmiles(noncanonical_expected_core_smiles) expected_core_smiles = Chem.MolToSmiles(expected_mol, isomericSmiles=True) print("expected core SMILES:\n", expected_core_smiles) # This prints: # [*][C@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F if core_smiles != expected_core_smiles: print("!!!! Not the same !!!") else: print("Identical") raise AssertionError("did not expect that") ====== Here is what I think is happening. The problem occurs because the atom's chiral tag is connected to the permutation order of its neighbors. I changed the parity of the permutation order without updating the chiral tag. I'll use the following bit of code to compute the parity of the neighbors for the input structure and compare it to the parity of the neighbors of the 'cut' structure. ====== Compute the parity for the configuration around the attachment atom on the core # This code snippet continues from the previous code def parity(x): # Use a Shell sort to count the number of swaps. # (This takes O(N*N) time, which is is okay for 4 neighbors.) N = len(x) num_swaps = 0 for i in range(N-1): for j in range(i+1, N): if x[i] > x[j]: x[i], x[j] = x[j], x[i] num_swaps += 1 return (-1) ** num_swaps # either +1 or -1 symmetry def get_bond_configuration(mol, atom): # Return the parity of the atom indices of the neighbors atom_obj = mol.GetAtomWithIdx(atom) neighbors = [neighbor_atom_obj.GetIdx() for neighbor_atom_obj in atom_obj.GetNeighbors()] assert len(neighbors) == 4, (atom, neighbors) return parity(neighbors) input_configuration = get_bond_configuration(input_mol, 12) print("input configuration:", input_configuration) cut_configuration = get_bond_configuration(cut_mol, 12) print("cut configuration:", cut_configuration) assert input_configuration != cut_configuration, ( "I expected the configurations to be different") ====== This prints the output: input configuration: -1 cut configuration: 1 which demonstrates that the parity has reversed. I'll swap the chirality of the atom on the 'cut' molecule, generate the new core SMILES, and show that it matches the expected SMILES: ======== correct chirality and compare the SMILES with the expected value # This code snippet continues from the previous code if cut_chiral_tag == Chem.ChiralType.CHI_TETRAHEDRAL_CW: new_chiral_tag = Chem.ChiralType.CHI_TETRAHEDRAL_CCW elif cut_chiral_tag == Chem.ChiralType.CHI_TETRAHEDRAL_CCW: new_chiral_tag = Chem.ChiralType.CHI_TETRAHEDRAL_CW else: raise AssertionError("unknown chiral tag", cut_chiral_tag) cut_atom_obj.SetChiralTag(new_chiral_tag) updated_cut_smiles = Chem.MolToSmiles(cut_mol, isomericSmiles=True) updated_core_smiles = max(updated_cut_smiles.split("."), key=len) print("updated core SMILES:\n", updated_core_smiles) if updated_core_smiles == expected_core_smiles: print("They match!") else: raise AssertionError("They do not match") ======== The result is the following: updated core SMILES: [*][C@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F They match! which tells me that the fix appears to have worked. ==== Proposal === I propose a new API method which looks like : edit_mol.ReplaceBond(from_atom, old_atom, new_atom) This will: 1) remember the bond type T connecting from_atom to old_atom 2) remove the bond from from_atom to old_atom 3) create a new bond of type T between from_atom and new_atom 4) update the chiral information so it's equivalent to having new_atom in the same configuration position as old_atom. This assumes that my understanding of the problem is correct, that there is no existing/better way to do in RDKit, that it's useful enough that others may want it, and that someone else has the time to implement it. Cheers, Andrew da...@dalkescientific.com ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140 _______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss