Hi all, sorry, I realise you're probably all too busy down in Chemistry with the meeting to give a quick answer, but I'm having a problem with GetSubstructMatch. Basically, I need the mapping between the reactants and the products, in the form of the atom indices in the product Smiles, listed in the order of the product Smarts atoms. Here's my test script (running rdkit-Release_2018_03_4 on Kubuntu 16.04):

from rdkit import Chem
from rdkit.Chem import AllChem

reactantSmarts = '[O:1]=[C:2][C:3]Cl.[S:4][C:5][C:6]'
productSmarts = '[O:1]=[C:2][C:3][S:4][C:5][C:6]'
reactionSmirks = reactantSmarts + '>>' + productSmarts

# Chloroacetaldehyde & cysteine sidechain:
ligandSmiles = 'O=CCCl'
targetSmiles = 'SCC'

print('\nreactantSmarts:', reactantSmarts)
print('productSmarts: ', productSmarts)
print('reactionSmirks:', reactionSmirks)

rxn = AllChem.ReactionFromSmarts(reactionSmirks)

print('\nligandSmiles: ',ligandSmiles)
print('targetSmiles: ',targetSmiles)

ligand = Chem.MolFromSmiles(ligandSmiles)
target = Chem.MolFromSmiles(targetSmiles)
ps = rxn.RunReactants((ligand, target))

print('productSmiles: ', Chem.MolToSmiles(ps[0][0]))

productPattern = Chem.MolFromSmarts(productSmarts)

print('\nSmiles from productSmarts:', Chem.MolToSmiles(productPattern))
print("\nAtom indices in productSmiles ordered as productSmarts’ atoms:",
      ps[0][0].GetSubstructMatch(productPattern))

and this is the output:

reactantSmarts: [O:1]=[C:2][C:3]Cl.[S:4][C:5][C:6]
productSmarts: [O:1]=[C:2][C:3][S:4][C:5][C:6]
reactionSmirks: [O:1]=[C:2][C:3]Cl.[S:4][C:5][C:6]>>[O:1]=[C:2][C:3][S:4][C:5][C:6]

ligandSmiles: O=CCCl
targetSmiles: SCC
productSmiles: CCSCC=O

Smiles from productSmarts: [O:1]=[CH:2][CH2:3][S:4][CH2:5][CH3:6]

Atom indices in productSmiles ordered as productSmarts’ atoms: (0, 1, 2, 3, 4, 5)

Notice how the order of atoms in productSmiles has been reversed from productSmarts (presumably some internal canonicalisation?). Nothing wrong in that per se, but this reversal is not reflected in the indices returned from GetSubstructMatch so my attempt to match up the atoms in the two Smiles strings crashes & burns.

Isn't the correct answer here (5, 4, 3, 2, 1, 0) or am I totally off-beam?

Cheers

-- Ian J. Tickle
Global Phasing Ltd., Cambridge, UK.