This works perfectly! I could understand most of what you did there ;-), but what does the ".HasQuery()" mean? The RDKit API is not very clear about it: "Returns whether or not the atom has an associated query". Is this described anywhere else?
Thank you so much! -- Gustavo Seabra. On Fri, Jul 23, 2021 at 4:53 AM Paolo Tosco <paolo.tosco.m...@gmail.com> wrote: > Hi Gustavo, > > you should be able to address this with a custom AtomCompare (and > BondCompare, if you want to use bond queries too) class, that now is also > supported from Python. > You can take a look at Code/GraphMol/FMCS/Wrap/testFMCS.py for > inspiration how to use it; here's something that seems to work for your > example: > > from rdkit import Chem > from rdkit.Chem import rdFMCS > > template = > Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1') > # This should give a sulfone connected to an aromatic ring and > # some other (any) element. Notice that the ring may have > # any atoms (N,C,O), but for me it is important to have the SO2 group. > > template > [image: image.png] > > mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3ccccc3)CCN2)cc1') > # This molecule has the pattern. > > mol1 > [image: image.png] > > compare = [template, mol1] > res = rdFMCS.FindMCS(compare, > atomCompare=rdFMCS.AtomCompare.CompareElements, > bondCompare=rdFMCS.BondCompare.CompareAny, > ringMatchesRingOnly=False, > completeRingsOnly=False) > res.smartsString > # gives: '[#16](=[#8])=[#8]' > > # Let's address the problem with a custom AtomCompare class: > > class CompareQueryAtoms(rdFMCS.MCSAtomCompare): > def __call__(self, p, mol1, atom1, mol2, atom2): > a1 = mol1.GetAtomWithIdx(atom1) > a2 = mol2.GetAtomWithIdx(atom2) > if ((not a1.HasQuery()) and (not a2.HasQuery()) and > a1.GetAtomicNum() != a2.GetAtomicNum()): > return False > if (p.MatchValences and a1.GetTotalValence() != > a2.GetTotalValence()): > return False > if (p.MatchChiralTag and not self.CheckAtomChirality(p, mol1, > atom1, mol2, atom2)): > return False > if (p.MatchFormalCharge and (not a1.HasQuery()) and (not > a2.HasQuery()) and not self.CheckAtomCharge(p, mol1, atom1, mol2, atom2)): > return False > if p.RingMatchesRingOnly: > return self.CheckAtomRingMatch(p, mol1, atom1, mol2, atom2) > if ((a1.HasQuery() or a2.HasQuery()) and (not a1.Match(a2))): > return False > return True > > params = rdFMCS.MCSParameters() > params.AtomCompareParameters.RingMatchesRingOnly = False > params.BondCompareParameters.RingMatchesRingOnly = False > params.AtomCompareParameters.CompleteRingsOnly = False > params.BondCompareParameters.CompleteRingsOnly = False > params.BondTyper = rdFMCS.BondCompare.CompareAny > params.AtomTyper = CompareQueryAtoms() > > compare = [template, mol1] > res = rdFMCS.FindMCS(compare, params) > res.smartsString > > '[#16](-[#0,#6])(=[#8])(=[#8])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:1' > > > # the queryMol returned by MCS will match the template, but the original > template query > > # has many more details, so we extract the MCS part of the original template > and use that > > # as query instead > > def trim_template(template, query): > template_mcs_core = Chem.ReplaceSidechains(template, query) > for a in template_mcs_core.GetAtoms(): > if (not a.GetAtomicNum()) and a.GetIsotope(): > a.SetAtomicNum(1) > a.SetIsotope(0) > return Chem.RemoveAllHs(template_mcs_core) > > > query_mol = trim_template(template, res.queryMol) > template.GetSubstructMatch(query_mol) > > (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) > > > # here there seems to a be a bug with the 2D depiction, but that's another > story > > template > > [image: image.png] > > mol1.GetSubstructMatches(query_mol) > > ((4, 1, 0, 2, 3, 5, 6, 7, 19, 20),) > > > mol1 > > [image: image.png] > > > mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1') > compare = [template, mol2] > > > mol2 > > [image: image.png] > > > res = rdFMCS.FindMCS(compare, params) > res.smartsString > > '[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:[#0,#7]:1' > > > query_mol = trim_template(template, res.queryMol) > > query_mol > > [image: image.png] > > > mol2.GetSubstructMatches(query_mol) > > ((1, 2, 3, 4, 20, 21), (10, 11, 12, 13, 18, 19)) > > > mol2 > > [image: image.png] > > > I hope the above helps, cheers > > p. > > > On Thu, Jul 22, 2021 at 7:45 PM Gustavo Seabra <gustavo.sea...@gmail.com> > wrote: > >> Hi all,, >> >> I would appreciate some pointers on how it would be possible to find the >> maximum common substructure of 2 molecules, where in the template structure >> some atoms may be *any*, but some other atoms must be fixed. >> >> Currently, I'm trying to use rdFMCS module. For example: >> >> from rdkit import Chem >> from rdkit.Chem import rdFMCS >> >> template = >> Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1') >> # This should give a sulfone connected to an aromatic ring and >> # some other (any) element. Notice that the ring may have >> # any atoms (N,C,O), but for me it is important to have the SO2 group. >> >> mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3ccccc3)CCN2)cc1') >> # This molecule has the pattern. >> >> # Now, if I try to find a substructure match, I use: >> compare = [template, mol1] >> res = rdFMCS.FindMCS(compare, >> atomCompare=rdFMCS.AtomCompare.CompareElements, >> bondCompare=rdFMCS.BondCompare.CompareAny, >> ringMatchesRingOnly=False, >> completeRingsOnly=False) >> res.smartsString >> # gives: '[#16](=[#8])=[#8]' >> >> # Notice that the only match is the SO2, it does not match the ring. >> However, if I try that with another structure that has a CF3 in place of >> the SO2, I get: >> mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1') >> compare = [template,mol2] >> res = rdFMCS.FindMCS(compare, >> atomCompare=rdFMCS.AtomCompare.CompareElements, >> bondCompare=rdFMCS.BondCompare.CompareAny, >> ringMatchesRingOnly=False, >> completeRingsOnly=False) >> res.smartsString >> # Returns: '' (empty string) >> >> # if I change to AtomCompare.CompareAny, now a CF3 will also match >> # in the SO2-X: >> mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1') >> compare = [template,mol2] >> res = rdFMCS.FindMCS(compare, >> atomCompare=rdFMCS.AtomCompare.CompareAny, >> bondCompare=rdFMCS.BondCompare.CompareAny, >> ringMatchesRingOnly=False, >> completeRingsOnly=False) >> res.smartsString >> # Returns: >> '[#16,#6](-[#0,#6])(=,-[#8,#9])(=,-[#8,#9])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:1' >> >> But now theCF3 is counted in place of the SO2. The result I'd like to get >> here would be just the ring, as in the case: >> new_template = Chem.MolFromSmarts('CS(=O)(=O)c1cnccc1') >> mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1') >> compare = [new_template,mol2] >> res = rdFMCS.FindMCS(compare, >> atomCompare=rdFMCS.AtomCompare.CompareElements, >> bondCompare=rdFMCS.BondCompare.CompareAny, >> ringMatchesRingOnly=False, >> completeRingsOnly=False) >> res.smartsString >> # Returns: '[#6]1:[#6]:[#7]:[#6]:[#6]:[#6]:1' (just the ring) >> >> Notice that if I use CompareElements, there seems to be no way to match >> the ring with either N or C. >> >> Does anyone have a suggestion on how I can specify flexibility (similar >> to AtomCompare.CompareAny) only for a portion of the molecule and still >> enforce specific atoms in another portion? >> >> Thank you so much! >> -- >> Gustavo Seabra. >> _______________________________________________ >> Rdkit-discuss mailing list >> Rdkit-discuss@lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss >> >
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss