Hi all,

I would appreciate some pointers on how to find the maximum common substructure of two molecules when some atoms in the template structure may be *any* atom, while other atoms must be fixed.

Currently, I'm trying to use the rdFMCS module. For example:

from rdkit import Chem
from rdkit.Chem import rdFMCS

template = Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
# This should give a sulfone connected to an aromatic ring and
# some other (any) element. Notice that the ring may have
# any atoms (N,C,O), but for me it is important to have the SO2 group.

mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3ccccc3)CCN2)cc1')
# This molecule has the pattern.

# Now, if I try to find a substructure match, I use:
compare = [template, mol1]
res = rdFMCS.FindMCS(compare,
                     atomCompare=rdFMCS.AtomCompare.CompareElements,
                     bondCompare=rdFMCS.BondCompare.CompareAny,
                     ringMatchesRingOnly=False,
                     completeRingsOnly=False)
res.smartsString
# gives: '[#16](=[#8])=[#8]'

# Notice that the only match is the SO2, it does not match the ring.

However, if I try that with another structure that has a CF3 in place of the SO2, I get:

mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
compare = [template, mol2]
res = rdFMCS.FindMCS(compare,
                     atomCompare=rdFMCS.AtomCompare.CompareElements,
                     bondCompare=rdFMCS.BondCompare.CompareAny,
                     ringMatchesRingOnly=False,
                     completeRingsOnly=False)
res.smartsString
# Returns: '' (empty string)

# If I change to AtomCompare.CompareAny, now a CF3 will also match
# in the SO2-X:
mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
compare = [template, mol2]
res = rdFMCS.FindMCS(compare,
                     atomCompare=rdFMCS.AtomCompare.CompareAny,
                     bondCompare=rdFMCS.BondCompare.CompareAny,
                     ringMatchesRingOnly=False,
                     completeRingsOnly=False)
res.smartsString
# Returns: '[#16,#6](-[#0,#6])(=,-[#8,#9])(=,-[#8,#9])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:1'

But now the CF3 is counted in place of the SO2.
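
As a sanity check alongside the MCS runs above, a plain substructure search with the same template can confirm which molecules carry the full pattern (SO2 plus aromatic ring). This is only a minimal sketch: `HasSubstructMatch` is all-or-nothing, so it does not compute a common substructure and does not by itself solve the partial-flexibility question. The `mols` and `has_pattern` names are introduced here purely for illustration.

```python
from rdkit import Chem

# Template from the message: an aromatic six-membered ring carrying an
# SO2 group whose remaining substituent (*) may be any atom.
template = Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')

# The two molecules used in the message (illustrative dict, not part of the original code).
mols = {
    'mol1': Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3ccccc3)CCN2)cc1'),
    'mol2': Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1'),
}

# A SMARTS query enforces element identity only where the pattern spells it
# out ([S], [O]) and stays permissive where it uses [a] or *.
has_pattern = {name: mol.HasSubstructMatch(template) for name, mol in mols.items()}
print(has_pattern)
```

Here mol1 contains the whole template, while mol2 has no sulfur at all and therefore cannot match it.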

The result I'd like to get here would be just the ring, as in the case:

new_template = Chem.MolFromSmarts('CS(=O)(=O)c1cnccc1')
mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
compare = [new_template, mol2]
res = rdFMCS.FindMCS(compare,
                     atomCompare=rdFMCS.AtomCompare.CompareElements,
                     bondCompare=rdFMCS.BondCompare.CompareAny,
                     ringMatchesRingOnly=False,
                     completeRingsOnly=False)
res.smartsString
# Returns: '[#6]1:[#6]:[#7]:[#6]:[#6]:[#6]:1' (just the ring)

Notice that if I use CompareElements, there seems to be no way to match the ring with either N or C.

Does anyone have a suggestion on how I can specify flexibility (similar to AtomCompare.CompareAny) only for a portion of the molecule and still enforce specific atoms in another portion?

Thank you so much!

--
Gustavo Seabra.
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss