Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS
On Fri, Jul 23, 2021 at 4:53 AM Paolo Tosco wrote: > > # here there seems to a be a bug with the 2D depiction, but that's another > story > > template > > [image: image.png] > > Just a quick thing: I don't know if this is supposed to be a bug or a feature, but I noticed that this seems to be caused by properties of the Mol created from SMARTS *not* being set when the mol is created, but only when they are requested the first time. Right after creating the mol object by Chem.MolFromSmiles the IsInRing doesn't seem to be set correctly (or at all), and the comparison operation distorts the molecule. But, if you force the computation of the properties, e.g. by printing them, for idx, atom in enumerate(template.GetAtoms()): print(f"{idx:>4d} {atom.GetAtomicNum():5d} {str(atom.IsInRing()):>7} {str(atom.GetIsAromatic()):>5}") After that, all seems to work as expected \o/. I don't know if it is by design that the properties are calculated only when needed? Gustavo. ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS
Thanks a lot! -- Gustavo Seabra. On Fri, Jul 23, 2021 at 12:18 PM Paolo Tosco wrote: > Hi Gustavo, > > Chem.Atom.HasQuery() and Chem.Bond.HasQuery() return True when the > underlying atom (or bond) is an instance of Chem.QueryAtom (or Chem. > QueryBond). > Query atoms and bonds can either be defined through SMARTS expressions... > > from rdkit import Chem > from rdkit.Chem import rdqueries > > a = Chem.Atom(6) > a.HasQuery() > False > > mol = Chem.MolFromSmarts("[+1;D3]") > qa_from_smarts = mol.GetAtomWithIdx(0) > qa_from_smarts.HasQuery() > True > > qa_from_smarts.DescribeQuery() > 'AtomAnd\n AtomFormalCharge 1 = val\n AtomExplicitDegree 3 = val\n' > > ...or be directly instantiated from Python and combined at your leisure; > through this approach you can actually define very specific queries that > may not be possible to describe with SMARTS. > Below I show how to construct the same query atom as from the above SMARTS > expression: > > qa = rdqueries.FormalChargeEqualsQueryAtom(1) > qa > > > qa.HasQuery() > True > > qa.DescribeQuery() > 'AtomFormalCharge 1 = val\n' > > qa2 = rdqueries.ExplicitDegreeEqualsQueryAtom(3) > qa2.DescribeQuery() > 'AtomExplicitDegree 3 = val\n' > > qa.ExpandQuery(qa2) > qa.DescribeQuery() > 'AtomAnd\n AtomFormalCharge 1 = val\n AtomExplicitDegree 3 = val\n' > > Cheers, > p. > > On Fri, Jul 23, 2021 at 5:47 PM Gustavo Seabra > wrote: > >> This works perfectly! >> >> I could understand most of what you did there ;-), but what does the >> ".HasQuery()" mean? The RDKit API is not very clear about it: "Returns >> whether or not the atom has an associated query". Is this described >> anywhere else? >> >> Thank you so much! >> -- >> Gustavo Seabra. >> >> >> On Fri, Jul 23, 2021 at 4:53 AM Paolo Tosco >> wrote: >> >>> Hi Gustavo, >>> >>> you should be able to address this with a custom AtomCompare (and >>> BondCompare, if you want to use bond queries too) class, that now is >>> also supported from Python. >>> You can take a look at Code/GraphMol/FMCS/Wrap/testFMCS.py for >>> inspiration how to use it; here's something that seems to work for your >>> example: >>> >>> from rdkit import Chem >>> from rdkit.Chem import rdFMCS >>> >>> template = >>> Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1') >>> # This should give a sulfone connected to an aromatic ring and >>> # some other (any) element. Notice that the ring may have >>> # any atoms (N,C,O), but for me it is important to have the SO2 group. >>> >>> template >>> [image: image.png] >>> >>> mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1') >>> # This molecule has the pattern. >>> >>> mol1 >>> [image: image.png] >>> >>> compare = [template, mol1] >>> res = rdFMCS.FindMCS(compare, >>> atomCompare=rdFMCS.AtomCompare.CompareElements, >>> bondCompare=rdFMCS.BondCompare.CompareAny, >>> ringMatchesRingOnly=False, >>> completeRingsOnly=False) >>> res.smartsString >>> # gives: '[#16](=[#8])=[#8]' >>> >>> # Let's address the problem with a custom AtomCompare class: >>> >>> class CompareQueryAtoms(rdFMCS.MCSAtomCompare): >>> def __call__(self, p, mol1, atom1, mol2, atom2): >>> a1 = mol1.GetAtomWithIdx(atom1) >>> a2 = mol2.GetAtomWithIdx(atom2) >>> if ((not a1.HasQuery()) and (not a2.HasQuery()) and >>> a1.GetAtomicNum() != a2.GetAtomicNum()): >>> return False >>> if (p.MatchValences and a1.GetTotalValence() != >>> a2.GetTotalValence()): >>> return False >>> if (p.MatchChiralTag and not self.CheckAtomChirality(p, mol1, >>> atom1, mol2, atom2)): >>> return False >>> if (p.MatchFormalCharge and (not a1.HasQuery()) and (not >>> a2.HasQuery()) and not self.CheckAtomCharge(p, mol1, atom1, mol2, atom2)): >>> return False >>> if p.RingMatchesRingOnly: >>> return self.CheckAtomRingMatch(p, mol1, atom1, mol2, atom2) >>> if ((a1.HasQuery() or a2.HasQuery()) and (not a1.Match(a2))): >>> return False >>> return True >>> >>> params = rdFMCS.MCSParameters() >>> params.AtomCompareParameters.RingMatchesRingOnly = False >>> params.BondCompareParameters.RingMatchesRingOnly = False >>> params.AtomCompareParameters.CompleteRingsOnly = False >>> params.BondCompareParameters.CompleteRingsOnly = False >>> params.BondTyper = rdFMCS.BondCompare.CompareAny >>> params.AtomTyper = CompareQueryAtoms() >>> >>> compare = [template, mol1] >>> res = rdFMCS.FindMCS(compare, params) >>> res.smartsString >>> >>> '[#16](-[#0,#6])(=[#8])(=[#8])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:1' >>> >>> >>> # the queryMol returned by MCS will match the template, but the original >>> template query >>> >>> # has many more details, so we extract the MCS part of the original >>> template and use that >>> >>> # as query instead >>> >>> def trim_template(template, query): >>> template_mcs_core = Chem.ReplaceSidechains(template,
Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS
Hi Gustavo, Chem.Atom.HasQuery() and Chem.Bond.HasQuery() return True when the underlying atom (or bond) is an instance of Chem.QueryAtom (or Chem. QueryBond). Query atoms and bonds can either be defined through SMARTS expressions... from rdkit import Chem from rdkit.Chem import rdqueries a = Chem.Atom(6) a.HasQuery() False mol = Chem.MolFromSmarts("[+1;D3]") qa_from_smarts = mol.GetAtomWithIdx(0) qa_from_smarts.HasQuery() True qa_from_smarts.DescribeQuery() 'AtomAnd\n AtomFormalCharge 1 = val\n AtomExplicitDegree 3 = val\n' ...or be directly instantiated from Python and combined at your leisure; through this approach you can actually define very specific queries that may not be possible to describe with SMARTS. Below I show how to construct the same query atom as from the above SMARTS expression: qa = rdqueries.FormalChargeEqualsQueryAtom(1) qa qa.HasQuery() True qa.DescribeQuery() 'AtomFormalCharge 1 = val\n' qa2 = rdqueries.ExplicitDegreeEqualsQueryAtom(3) qa2.DescribeQuery() 'AtomExplicitDegree 3 = val\n' qa.ExpandQuery(qa2) qa.DescribeQuery() 'AtomAnd\n AtomFormalCharge 1 = val\n AtomExplicitDegree 3 = val\n' Cheers, p. On Fri, Jul 23, 2021 at 5:47 PM Gustavo Seabra wrote: > This works perfectly! > > I could understand most of what you did there ;-), but what does the > ".HasQuery()" mean? The RDKit API is not very clear about it: "Returns > whether or not the atom has an associated query". Is this described > anywhere else? > > Thank you so much! > -- > Gustavo Seabra. > > > On Fri, Jul 23, 2021 at 4:53 AM Paolo Tosco > wrote: > >> Hi Gustavo, >> >> you should be able to address this with a custom AtomCompare (and >> BondCompare, if you want to use bond queries too) class, that now is >> also supported from Python. >> You can take a look at Code/GraphMol/FMCS/Wrap/testFMCS.py for >> inspiration how to use it; here's something that seems to work for your >> example: >> >> from rdkit import Chem >> from rdkit.Chem import rdFMCS >> >> template = >> Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1') >> # This should give a sulfone connected to an aromatic ring and >> # some other (any) element. Notice that the ring may have >> # any atoms (N,C,O), but for me it is important to have the SO2 group. >> >> template >> [image: image.png] >> >> mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1') >> # This molecule has the pattern. >> >> mol1 >> [image: image.png] >> >> compare = [template, mol1] >> res = rdFMCS.FindMCS(compare, >> atomCompare=rdFMCS.AtomCompare.CompareElements, >> bondCompare=rdFMCS.BondCompare.CompareAny, >> ringMatchesRingOnly=False, >> completeRingsOnly=False) >> res.smartsString >> # gives: '[#16](=[#8])=[#8]' >> >> # Let's address the problem with a custom AtomCompare class: >> >> class CompareQueryAtoms(rdFMCS.MCSAtomCompare): >> def __call__(self, p, mol1, atom1, mol2, atom2): >> a1 = mol1.GetAtomWithIdx(atom1) >> a2 = mol2.GetAtomWithIdx(atom2) >> if ((not a1.HasQuery()) and (not a2.HasQuery()) and >> a1.GetAtomicNum() != a2.GetAtomicNum()): >> return False >> if (p.MatchValences and a1.GetTotalValence() != >> a2.GetTotalValence()): >> return False >> if (p.MatchChiralTag and not self.CheckAtomChirality(p, mol1, >> atom1, mol2, atom2)): >> return False >> if (p.MatchFormalCharge and (not a1.HasQuery()) and (not >> a2.HasQuery()) and not self.CheckAtomCharge(p, mol1, atom1, mol2, atom2)): >> return False >> if p.RingMatchesRingOnly: >> return self.CheckAtomRingMatch(p, mol1, atom1, mol2, atom2) >> if ((a1.HasQuery() or a2.HasQuery()) and (not a1.Match(a2))): >> return False >> return True >> >> params = rdFMCS.MCSParameters() >> params.AtomCompareParameters.RingMatchesRingOnly = False >> params.BondCompareParameters.RingMatchesRingOnly = False >> params.AtomCompareParameters.CompleteRingsOnly = False >> params.BondCompareParameters.CompleteRingsOnly = False >> params.BondTyper = rdFMCS.BondCompare.CompareAny >> params.AtomTyper = CompareQueryAtoms() >> >> compare = [template, mol1] >> res = rdFMCS.FindMCS(compare, params) >> res.smartsString >> >> '[#16](-[#0,#6])(=[#8])(=[#8])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:1' >> >> >> # the queryMol returned by MCS will match the template, but the original >> template query >> >> # has many more details, so we extract the MCS part of the original template >> and use that >> >> # as query instead >> >> def trim_template(template, query): >> template_mcs_core = Chem.ReplaceSidechains(template, query) >> for a in template_mcs_core.GetAtoms(): >> if (not a.GetAtomicNum()) and a.GetIsotope(): >> a.SetAtomicNum(1) >> a.SetIsotope(0) >> return Chem.RemoveAllHs(template_mcs_core) >> >> >> query_mol = trim_template(template,
Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS
This works perfectly! I could understand most of what you did there ;-), but what does the ".HasQuery()" mean? The RDKit API is not very clear about it: "Returns whether or not the atom has an associated query". Is this described anywhere else? Thank you so much! -- Gustavo Seabra. On Fri, Jul 23, 2021 at 4:53 AM Paolo Tosco wrote: > Hi Gustavo, > > you should be able to address this with a custom AtomCompare (and > BondCompare, if you want to use bond queries too) class, that now is also > supported from Python. > You can take a look at Code/GraphMol/FMCS/Wrap/testFMCS.py for > inspiration how to use it; here's something that seems to work for your > example: > > from rdkit import Chem > from rdkit.Chem import rdFMCS > > template = > Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1') > # This should give a sulfone connected to an aromatic ring and > # some other (any) element. Notice that the ring may have > # any atoms (N,C,O), but for me it is important to have the SO2 group. > > template > [image: image.png] > > mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1') > # This molecule has the pattern. > > mol1 > [image: image.png] > > compare = [template, mol1] > res = rdFMCS.FindMCS(compare, > atomCompare=rdFMCS.AtomCompare.CompareElements, > bondCompare=rdFMCS.BondCompare.CompareAny, > ringMatchesRingOnly=False, > completeRingsOnly=False) > res.smartsString > # gives: '[#16](=[#8])=[#8]' > > # Let's address the problem with a custom AtomCompare class: > > class CompareQueryAtoms(rdFMCS.MCSAtomCompare): > def __call__(self, p, mol1, atom1, mol2, atom2): > a1 = mol1.GetAtomWithIdx(atom1) > a2 = mol2.GetAtomWithIdx(atom2) > if ((not a1.HasQuery()) and (not a2.HasQuery()) and > a1.GetAtomicNum() != a2.GetAtomicNum()): > return False > if (p.MatchValences and a1.GetTotalValence() != > a2.GetTotalValence()): > return False > if (p.MatchChiralTag and not self.CheckAtomChirality(p, mol1, > atom1, mol2, atom2)): > return False > if (p.MatchFormalCharge and (not a1.HasQuery()) and (not > a2.HasQuery()) and not self.CheckAtomCharge(p, mol1, atom1, mol2, atom2)): > return False > if p.RingMatchesRingOnly: > return self.CheckAtomRingMatch(p, mol1, atom1, mol2, atom2) > if ((a1.HasQuery() or a2.HasQuery()) and (not a1.Match(a2))): > return False > return True > > params = rdFMCS.MCSParameters() > params.AtomCompareParameters.RingMatchesRingOnly = False > params.BondCompareParameters.RingMatchesRingOnly = False > params.AtomCompareParameters.CompleteRingsOnly = False > params.BondCompareParameters.CompleteRingsOnly = False > params.BondTyper = rdFMCS.BondCompare.CompareAny > params.AtomTyper = CompareQueryAtoms() > > compare = [template, mol1] > res = rdFMCS.FindMCS(compare, params) > res.smartsString > > '[#16](-[#0,#6])(=[#8])(=[#8])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:1' > > > # the queryMol returned by MCS will match the template, but the original > template query > > # has many more details, so we extract the MCS part of the original template > and use that > > # as query instead > > def trim_template(template, query): > template_mcs_core = Chem.ReplaceSidechains(template, query) > for a in template_mcs_core.GetAtoms(): > if (not a.GetAtomicNum()) and a.GetIsotope(): > a.SetAtomicNum(1) > a.SetIsotope(0) > return Chem.RemoveAllHs(template_mcs_core) > > > query_mol = trim_template(template, res.queryMol) > template.GetSubstructMatch(query_mol) > > (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) > > > # here there seems to a be a bug with the 2D depiction, but that's another > story > > template > > [image: image.png] > > mol1.GetSubstructMatches(query_mol) > > ((4, 1, 0, 2, 3, 5, 6, 7, 19, 20),) > > > mol1 > > [image: image.png] > > > mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1') > compare = [template, mol2] > > > mol2 > > [image: image.png] > > > res = rdFMCS.FindMCS(compare, params) > res.smartsString > > '[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:[#0,#7]:1' > > > query_mol = trim_template(template, res.queryMol) > > query_mol > > [image: image.png] > > > mol2.GetSubstructMatches(query_mol) > > ((1, 2, 3, 4, 20, 21), (10, 11, 12, 13, 18, 19)) > > > mol2 > > [image: image.png] > > > I hope the above helps, cheers > > p. > > > On Thu, Jul 22, 2021 at 7:45 PM Gustavo Seabra > wrote: > >> Hi all,, >> >> I would appreciate some pointers on how it would be possible to find the >> maximum common substructure of 2 molecules, where in the template structure >> some atoms may be *any*, but some other atoms must be fixed. >> >> Currently, I'm trying to use rdFMCS module. For example: >> >> from rdkit import Chem >> from rdkit.Chem import rdFMCS >> >> template = >>
Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS
Hi Gustavo, you should be able to address this with a custom AtomCompare (and BondCompare, if you want to use bond queries too) class, that now is also supported from Python. You can take a look at Code/GraphMol/FMCS/Wrap/testFMCS.py for inspiration how to use it; here's something that seems to work for your example: from rdkit import Chem from rdkit.Chem import rdFMCS template = Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1') # This should give a sulfone connected to an aromatic ring and # some other (any) element. Notice that the ring may have # any atoms (N,C,O), but for me it is important to have the SO2 group. template [image: image.png] mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1') # This molecule has the pattern. mol1 [image: image.png] compare = [template, mol1] res = rdFMCS.FindMCS(compare, atomCompare=rdFMCS.AtomCompare.CompareElements, bondCompare=rdFMCS.BondCompare.CompareAny, ringMatchesRingOnly=False, completeRingsOnly=False) res.smartsString # gives: '[#16](=[#8])=[#8]' # Let's address the problem with a custom AtomCompare class: class CompareQueryAtoms(rdFMCS.MCSAtomCompare): def __call__(self, p, mol1, atom1, mol2, atom2): a1 = mol1.GetAtomWithIdx(atom1) a2 = mol2.GetAtomWithIdx(atom2) if ((not a1.HasQuery()) and (not a2.HasQuery()) and a1.GetAtomicNum() != a2.GetAtomicNum()): return False if (p.MatchValences and a1.GetTotalValence() != a2.GetTotalValence()): return False if (p.MatchChiralTag and not self.CheckAtomChirality(p, mol1, atom1, mol2, atom2)): return False if (p.MatchFormalCharge and (not a1.HasQuery()) and (not a2.HasQuery()) and not self.CheckAtomCharge(p, mol1, atom1, mol2, atom2)): return False if p.RingMatchesRingOnly: return self.CheckAtomRingMatch(p, mol1, atom1, mol2, atom2) if ((a1.HasQuery() or a2.HasQuery()) and (not a1.Match(a2))): return False return True params = rdFMCS.MCSParameters() params.AtomCompareParameters.RingMatchesRingOnly = False params.BondCompareParameters.RingMatchesRingOnly = False params.AtomCompareParameters.CompleteRingsOnly = False params.BondCompareParameters.CompleteRingsOnly = False params.BondTyper = rdFMCS.BondCompare.CompareAny params.AtomTyper = CompareQueryAtoms() compare = [template, mol1] res = rdFMCS.FindMCS(compare, params) res.smartsString '[#16](-[#0,#6])(=[#8])(=[#8])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:1' # the queryMol returned by MCS will match the template, but the original template query # has many more details, so we extract the MCS part of the original template and use that # as query instead def trim_template(template, query): template_mcs_core = Chem.ReplaceSidechains(template, query) for a in template_mcs_core.GetAtoms(): if (not a.GetAtomicNum()) and a.GetIsotope(): a.SetAtomicNum(1) a.SetIsotope(0) return Chem.RemoveAllHs(template_mcs_core) query_mol = trim_template(template, res.queryMol) template.GetSubstructMatch(query_mol) (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) # here there seems to a be a bug with the 2D depiction, but that's another story template [image: image.png] mol1.GetSubstructMatches(query_mol) ((4, 1, 0, 2, 3, 5, 6, 7, 19, 20),) mol1 [image: image.png] mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1') compare = [template, mol2] mol2 [image: image.png] res = rdFMCS.FindMCS(compare, params) res.smartsString '[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:[#0,#7]:1' query_mol = trim_template(template, res.queryMol) query_mol [image: image.png] mol2.GetSubstructMatches(query_mol) ((1, 2, 3, 4, 20, 21), (10, 11, 12, 13, 18, 19)) mol2 [image: image.png] I hope the above helps, cheers p. On Thu, Jul 22, 2021 at 7:45 PM Gustavo Seabra wrote: > Hi all,, > > I would appreciate some pointers on how it would be possible to find the > maximum common substructure of 2 molecules, where in the template structure > some atoms may be *any*, but some other atoms must be fixed. > > Currently, I'm trying to use rdFMCS module. For example: > > from rdkit import Chem > from rdkit.Chem import rdFMCS > > template = > Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1') > # This should give a sulfone connected to an aromatic ring and > # some other (any) element. Notice that the ring may have > # any atoms (N,C,O), but for me it is important to have the SO2 group. > > mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1') > # This molecule has the pattern. > > # Now, if I try to find a substructure match, I use: > compare = [template, mol1] > res = rdFMCS.FindMCS(compare, > atomCompare=rdFMCS.AtomCompare.CompareElements, > bondCompare=rdFMCS.BondCompare.CompareAny, >
Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS
On Jul 23, 2021, at 06:42, Andrew Dalke wrote: > > No, there's no way to do that. > > The best I can suggest is to go back to the original Python implementation > and change the code leading up to Alternatively, since your template is small, you can brute-force enumerate all possible matching SMARTS patterns, and test them from largest to smallest. I believe the following patterns are correct for your template. These are ordered by number of bonds, then number of atoms, then ASCII-betically. (Note: these many contain duplicates because Chem.MolToSmarts doesn't produce canonical SMARTS.) [n,c,o]1(-S(-*)(=O)=O):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]:1 [n,c,o](-S(-*)(=O)=O):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o] [n,c,o](-S(-*)(=O)=O)(:[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]):[n,c,o] [n,c,o](-S(-*)(=O)=O)(:[n,c,o]:[n,c,o]:[n,c,o]):[n,c,o]:[n,c,o] [n,c,o](-S(-*)(=O)=O)(:[n,c,o]:[n,c,o]):[n,c,o]:[n,c,o]:[n,c,o] [n,c,o](-S(-*)(=O)=O)(:[n,c,o]):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o] [n,c,o]1(-S(=O)=O):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]:1 [n,c,o](-S(=O)=O):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o] [n,c,o](-S(=O)=O)(:[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]):[n,c,o] [n,c,o](-S(=O)=O)(:[n,c,o]:[n,c,o]:[n,c,o]):[n,c,o]:[n,c,o] [n,c,o](-S(=O)=O)(:[n,c,o]:[n,c,o]):[n,c,o]:[n,c,o]:[n,c,o] [n,c,o](-S(=O)=O)(:[n,c,o]):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o] [n,c,o](-S(-*)(=O)=O):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o] [n,c,o](-S(-*)(=O)=O)(:[n,c,o]:[n,c,o]:[n,c,o]):[n,c,o] [n,c,o](-S(-*)(=O)=O)(:[n,c,o]:[n,c,o]):[n,c,o]:[n,c,o] [n,c,o](-S(-*)(=O)=O)(:[n,c,o]):[n,c,o]:[n,c,o]:[n,c,o] [n,c,o](-S(=O)=O):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o] [n,c,o](-S(=O)=O)(:[n,c,o]:[n,c,o]:[n,c,o]):[n,c,o] [n,c,o](-S(=O)=O)(:[n,c,o]:[n,c,o]):[n,c,o]:[n,c,o] [n,c,o](-S(=O)=O)(:[n,c,o]):[n,c,o]:[n,c,o]:[n,c,o] [n,c,o](-S(-*)(=O)=O):[n,c,o]:[n,c,o]:[n,c,o] [n,c,o](-S(-*)(=O)=O)(:[n,c,o]:[n,c,o]):[n,c,o] [n,c,o](-S(-*)(=O)=O)(:[n,c,o]):[n,c,o]:[n,c,o] [n,c,o](-S(=O)=O):[n,c,o]:[n,c,o]:[n,c,o] [n,c,o](-S(=O)=O)(:[n,c,o]:[n,c,o]):[n,c,o] [n,c,o](-S(=O)=O)(:[n,c,o]):[n,c,o]:[n,c,o] [n,c,o](-S(-*)(=O)=O):[n,c,o]:[n,c,o] [n,c,o](-S(-*)(=O)=O)(:[n,c,o]):[n,c,o] [n,c,o](-S(=O)=O):[n,c,o]:[n,c,o] [n,c,o](-S(=O)=O)(:[n,c,o]):[n,c,o] [n,c,o](-S(-*)(=O)=O):[n,c,o] [n,c,o]-S(-*)(=O)=O [n,c,o](-S(=O)=O):[n,c,o] [n,c,o]-S(=O)=O S(-*)(=O)=O S(=O)=O I generated it with the following: === from rdkit import Chem import itertools # Must have the atoms marked with an atom map (the atom map value is ignored). template = '[n,c,o]1(-[S:1](-*)(=[O:1])=[O:1]):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]:1' mol = Chem.MolFromSmarts(template) # Figure out which bonds to keep bond_atom_indices = [] for bond in mol.GetBonds(): if all(atom.HasProp("molAtomMapNumber") for atom in (bond.GetBeginAtom(), bond.GetEndAtom())): continue bond_atom_indices.append((bond.GetBeginAtomIdx(), bond.GetEndAtomIdx())) # Remove the atom maps for atom in mol.GetAtoms(): if atom.HasProp("molAtomMapNumber"): atom.ClearProp("molAtomMapNumber") seen = set() # Enumerate all possible bonds to delete (should be 2**n) for r in range(0, len(bond_atom_indices)+1): for delete_indices in itertools.combinations(bond_atom_indices, r): tmp_mol = Chem.RWMol(mol) # Remove the selected bonds for atom1_idx, atom2_idx in delete_indices: tmp_mol.RemoveBond(atom1_idx, atom2_idx) # Remove any singletons. Start from the end so the indices are stable. for atom in list(tmp_mol.GetAtoms())[::-1]: if not atom.GetBonds(): tmp_mol.RemoveAtom(atom.GetIdx()) # Get the corresponding SMARTS tmp_smarts = Chem.MolToSmarts(tmp_mol) # Ensure it's singly connected if "." in tmp_smarts: continue # Ensure it's unique; track the number of bonds and atoms for later sorting key = (tmp_mol.GetNumBonds(), tmp_mol.GetNumAtoms(), tmp_smarts) seen.add(key) for num_bonds, num_atoms, smarts in sorted(seen, reverse=True): print(smarts) === Andrew da...@dalkescientific.com ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS
On Jul 23, 2021, at 01:01, Gustavo Seabra wrote: > I actually want the sulfone to be found, if it is there. My problem is that I > also want flexibility to change the ring atoms and still find the ring as a > match, while considering a match on the sulfone only if it really is there. > (e.g., CF3 should *not* match.) Does it make sense? Ahh, I see. No, there's no way to do that. The best I can suggest is to go back to the original Python implementation and change the code leading up to https://hg.sr.ht/~dalke/fmcs/browse/fmcs.py?rev=tip#L1929 so the initial seed is the sulfone instead of an (atom, bond, atom). Then use that to the the MCS with the sulfone, and if that fails, use RDKit's existing method. I point to my repository only because that's in Python and I know it better. If your C++ skills are better than mine, you might change the corresponding implementation in RDKit. Cheers, Andrew da...@dalkescientific.com ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS
Hi, Thanks a lot for the reply! However, in this case, it looks like I would have to somehow label the isotope in every query molecule, right? For example: ``` template = Chem.MolFromSmarts('[c]1(-[2S](=[3O])(=[3O])(-C)):[c]:[c]:[c]:[c]:[c]:1') mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1') compare = [template,mol1] res = rdFMCS.FindMCS(compare, atomCompare=rdFMCS.AtomCompare.CompareIsotopes, bondCompare=rdFMCS.BondCompare.CompareAny, ringMatchesRingOnly=False, completeRingsOnly=False) res.smartsString ``` returns: '[0*]1:[0*]:[0*]:[0*]:[0*]:[0*]:1', that is, it only picks the ring but not the sulfone. I actually want the sulfone to be found, if it is there. My problem is that I also want flexibility to change the ring atoms and still find the ring as a match, while considering a match on the sulfone only if it really is there. (e.g., CF3 should *not* match.) Does it make sense? Thanks a lot! -- Gustavo Seabra. On Thu, Jul 22, 2021 at 4:52 PM Andrew Dalke wrote: > Hi Gustavo, > > > > template = > Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1') > > Unless things have changed since I last looked at the algorithm, you can't > meaningfully pass a SMARTS-based query molecule into the MCS program, > outside of a few simple cases. > > It generates a SMARTS pattern based on the properties of the molecule. You > asked it to CompareElements, but those [a] terms all have an atomic number > of 0. > > >>> template = > Chem.MolFromSmarts('[a#1]1(-[S](-*)(=[O])=[O]):[a#1]:[a#1]:[a#1]:[a#1]:[a#1]:1') > >>> [a.GetAtomicNum() for a in template.GetAtoms()] > [0, 16, 0, 8, 8, 0, 0, 0, 0, 0] > > That's why your CompareAny search returns the #0 terms, like: > > > '[#16,#6](-[#0,#6])(=,-[#8,#9])(=,-[#8,#9])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:1' > > > I would appreciate some pointers on how it would be possible to find the > maximum common substructure of 2 molecules, where in the template structure > some atoms may be *any*, but some other atoms must be fixed. > > Perhaps with isotope labelling? > > That is, label the "any" atoms as isotope 1, and label your > -[S](=[O])(=[O])- as -[2S](=[3O])(=[3O])- > > Then use rdFMCS.AtomCompare.CompareIsotopes . > > If there's anything you don't want to match at all, give each atom a > unique isotope value. > > Best regards, > > Andrew > da...@dalkescientific.com > > > ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS
Hi Gustavo, > template = > Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1') Unless things have changed since I last looked at the algorithm, you can't meaningfully pass a SMARTS-based query molecule into the MCS program, outside of a few simple cases. It generates a SMARTS pattern based on the properties of the molecule. You asked it to CompareElements, but those [a] terms all have an atomic number of 0. >>> template = Chem.MolFromSmarts('[a#1]1(-[S](-*)(=[O])=[O]):[a#1]:[a#1]:[a#1]:[a#1]:[a#1]:1') >>> [a.GetAtomicNum() for a in template.GetAtoms()] [0, 16, 0, 8, 8, 0, 0, 0, 0, 0] That's why your CompareAny search returns the #0 terms, like: '[#16,#6](-[#0,#6])(=,-[#8,#9])(=,-[#8,#9])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:1' > I would appreciate some pointers on how it would be possible to find the > maximum common substructure of 2 molecules, where in the template structure > some atoms may be *any*, but some other atoms must be fixed. Perhaps with isotope labelling? That is, label the "any" atoms as isotope 1, and label your -[S](=[O])(=[O])- as -[2S](=[3O])(=[3O])- Then use rdFMCS.AtomCompare.CompareIsotopes . If there's anything you don't want to match at all, give each atom a unique isotope value. Best regards, Andrew da...@dalkescientific.com ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] Maximum Common Substructure using SMARTS
Hi all,, I would appreciate some pointers on how it would be possible to find the maximum common substructure of 2 molecules, where in the template structure some atoms may be *any*, but some other atoms must be fixed. Currently, I'm trying to use rdFMCS module. For example: from rdkit import Chem from rdkit.Chem import rdFMCS template = Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1') # This should give a sulfone connected to an aromatic ring and # some other (any) element. Notice that the ring may have # any atoms (N,C,O), but for me it is important to have the SO2 group. mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1') # This molecule has the pattern. # Now, if I try to find a substructure match, I use: compare = [template, mol1] res = rdFMCS.FindMCS(compare, atomCompare=rdFMCS.AtomCompare.CompareElements, bondCompare=rdFMCS.BondCompare.CompareAny, ringMatchesRingOnly=False, completeRingsOnly=False) res.smartsString # gives: '[#16](=[#8])=[#8]' # Notice that the only match is the SO2, it does not match the ring. However, if I try that with another structure that has a CF3 in place of the SO2, I get: mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1') compare = [template,mol2] res = rdFMCS.FindMCS(compare, atomCompare=rdFMCS.AtomCompare.CompareElements, bondCompare=rdFMCS.BondCompare.CompareAny, ringMatchesRingOnly=False, completeRingsOnly=False) res.smartsString # Returns: '' (empty string) # if I change to AtomCompare.CompareAny, now a CF3 will also match # in the SO2-X: mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1') compare = [template,mol2] res = rdFMCS.FindMCS(compare, atomCompare=rdFMCS.AtomCompare.CompareAny, bondCompare=rdFMCS.BondCompare.CompareAny, ringMatchesRingOnly=False, completeRingsOnly=False) res.smartsString # Returns: '[#16,#6](-[#0,#6])(=,-[#8,#9])(=,-[#8,#9])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:1' But now theCF3 is counted in place of the SO2. The result I'd like to get here would be just the ring, as in the case: new_template = Chem.MolFromSmarts('CS(=O)(=O)c1cnccc1') mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1') compare = [new_template,mol2] res = rdFMCS.FindMCS(compare, atomCompare=rdFMCS.AtomCompare.CompareElements, bondCompare=rdFMCS.BondCompare.CompareAny, ringMatchesRingOnly=False, completeRingsOnly=False) res.smartsString # Returns: '[#6]1:[#6]:[#7]:[#6]:[#6]:[#6]:1' (just the ring) Notice that if I use CompareElements, there seems to be no way to match the ring with either N or C. Does anyone have a suggestion on how I can specify flexibility (similar to AtomCompare.CompareAny) only for a portion of the molecule and still enforce specific atoms in another portion? Thank you so much! -- Gustavo Seabra. ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss