Hi Gustavo,

Chem.Atom.HasQuery() and Chem.Bond.HasQuery() return True when the
underlying atom (or bond) is an instance of Chem.QueryAtom (or Chem.
QueryBond).
Query atoms and bonds can either be defined through SMARTS expressions...

from rdkit import Chem
from rdkit.Chem import rdqueries

a = Chem.Atom(6)
a.HasQuery()
False

mol = Chem.MolFromSmarts("[+1;D3]")
qa_from_smarts = mol.GetAtomWithIdx(0)
qa_from_smarts.HasQuery()
True

qa_from_smarts.DescribeQuery()
'AtomAnd\n  AtomFormalCharge 1 = val\n  AtomExplicitDegree 3 = val\n'

...or be directly instantiated from Python and combined at your leisure;
through this approach you can actually define very specific queries that
may not be possible to describe with SMARTS.
Below I show how to construct the same query atom as from the above SMARTS
expression:

qa = rdqueries.FormalChargeEqualsQueryAtom(1)
qa
<rdkit.Chem.rdchem.QueryAtom at 0x7fe938ac2c60>

qa.HasQuery()
True

qa.DescribeQuery()
'AtomFormalCharge 1 = val\n'

qa2 = rdqueries.ExplicitDegreeEqualsQueryAtom(3)
qa2.DescribeQuery()
'AtomExplicitDegree 3 = val\n'

qa.ExpandQuery(qa2)
qa.DescribeQuery()
'AtomAnd\n  AtomFormalCharge 1 = val\n  AtomExplicitDegree 3 = val\n'

Cheers,
p.

On Fri, Jul 23, 2021 at 5:47 PM Gustavo Seabra <gustavo.sea...@gmail.com>
wrote:

> This works perfectly!
>
> I could understand most of what you did there ;-), but what does the
> ".HasQuery()" mean? The RDKit API is not very clear about it: "Returns
> whether or not the atom has an associated query". Is this described
> anywhere else?
>
> Thank you so much!
> --
> Gustavo Seabra.
>
>
> On Fri, Jul 23, 2021 at 4:53 AM Paolo Tosco <paolo.tosco.m...@gmail.com>
> wrote:
>
>> Hi Gustavo,
>>
>> you should be able to address this with a custom AtomCompare (and
>> BondCompare, if you want to use bond queries too) class, that now is
>> also supported from Python.
>> You can take a look at Code/GraphMol/FMCS/Wrap/testFMCS.py for
>> inspiration how to use it; here's something that seems to work for your
>> example:
>>
>> from rdkit import Chem
>> from rdkit.Chem import rdFMCS
>>
>> template =
>> Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
>> # This should give a sulfone connected to an aromatic ring and
>> # some other (any) element. Notice that the ring may have
>> # any atoms (N,C,O), but for me it is important to have the SO2 group.
>>
>> template
>> [image: image.png]
>>
>> mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3ccccc3)CCN2)cc1')
>> # This molecule has the pattern.
>>
>> mol1
>> [image: image.png]
>>
>> compare = [template, mol1]
>> res = rdFMCS.FindMCS(compare,
>>                 atomCompare=rdFMCS.AtomCompare.CompareElements,
>>                 bondCompare=rdFMCS.BondCompare.CompareAny,
>>                 ringMatchesRingOnly=False,
>>                 completeRingsOnly=False)
>> res.smartsString
>> # gives: '[#16](=[#8])=[#8]'
>>
>> # Let's address the problem with a custom AtomCompare class:
>>
>> class CompareQueryAtoms(rdFMCS.MCSAtomCompare):
>>     def __call__(self, p, mol1, atom1, mol2, atom2):
>>         a1 = mol1.GetAtomWithIdx(atom1)
>>         a2 = mol2.GetAtomWithIdx(atom2)
>>         if ((not a1.HasQuery()) and (not a2.HasQuery()) and
>> a1.GetAtomicNum() != a2.GetAtomicNum()):
>>             return False
>>         if (p.MatchValences and a1.GetTotalValence() !=
>> a2.GetTotalValence()):
>>             return False
>>         if (p.MatchChiralTag and not self.CheckAtomChirality(p, mol1,
>> atom1, mol2, atom2)):
>>             return False
>>         if (p.MatchFormalCharge and (not a1.HasQuery()) and (not
>> a2.HasQuery()) and not self.CheckAtomCharge(p, mol1, atom1, mol2, atom2)):
>>             return False
>>         if p.RingMatchesRingOnly:
>>             return self.CheckAtomRingMatch(p, mol1, atom1, mol2, atom2)
>>         if ((a1.HasQuery() or a2.HasQuery()) and (not a1.Match(a2))):
>>             return False
>>         return True
>>
>> params = rdFMCS.MCSParameters()
>> params.AtomCompareParameters.RingMatchesRingOnly = False
>> params.BondCompareParameters.RingMatchesRingOnly = False
>> params.AtomCompareParameters.CompleteRingsOnly = False
>> params.BondCompareParameters.CompleteRingsOnly = False
>> params.BondTyper = rdFMCS.BondCompare.CompareAny
>> params.AtomTyper = CompareQueryAtoms()
>>
>> compare = [template, mol1]
>> res = rdFMCS.FindMCS(compare, params)
>> res.smartsString
>>
>> '[#16](-[#0,#6])(=[#8])(=[#8])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:1'
>>
>>
>> # the queryMol returned by MCS will match the template, but the original 
>> template query
>>
>> # has many more details, so we extract the MCS part of the original template 
>> and use that
>>
>> # as query instead
>>
>> def trim_template(template, query):
>>     template_mcs_core = Chem.ReplaceSidechains(template, query)
>>     for a in template_mcs_core.GetAtoms():
>>         if (not a.GetAtomicNum()) and a.GetIsotope():
>>             a.SetAtomicNum(1)
>>             a.SetIsotope(0)
>>     return Chem.RemoveAllHs(template_mcs_core)
>>
>>
>> query_mol = trim_template(template, res.queryMol)
>> template.GetSubstructMatch(query_mol)
>>
>> (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
>>
>>
>> # here there seems to a be a bug with the 2D depiction, but that's another 
>> story
>>
>> template
>>
>> [image: image.png]
>>
>> mol1.GetSubstructMatches(query_mol)
>>
>> ((4, 1, 0, 2, 3, 5, 6, 7, 19, 20),)
>>
>>
>> mol1
>>
>> [image: image.png]
>>
>>
>> mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
>> compare = [template, mol2]
>>
>>
>> mol2
>>
>> [image: image.png]
>>
>>
>> res = rdFMCS.FindMCS(compare, params)
>> res.smartsString
>>
>> '[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:[#0,#7]:1'
>>
>>
>> query_mol = trim_template(template, res.queryMol)
>>
>> query_mol
>>
>> [image: image.png]
>>
>>
>> mol2.GetSubstructMatches(query_mol)
>>
>> ((1, 2, 3, 4, 20, 21), (10, 11, 12, 13, 18, 19))
>>
>>
>> mol2
>>
>> [image: image.png]
>>
>>
>> I hope the above helps, cheers
>>
>> p.
>>
>>
>> On Thu, Jul 22, 2021 at 7:45 PM Gustavo Seabra <gustavo.sea...@gmail.com>
>> wrote:
>>
>>> Hi all,,
>>>
>>> I would appreciate some pointers on how it would be possible to find the
>>> maximum common substructure of 2 molecules, where in the template structure
>>> some atoms may be *any*, but some other atoms must be fixed.
>>>
>>> Currently, I'm trying to use rdFMCS module. For example:
>>>
>>> from rdkit import Chem
>>> from rdkit.Chem import rdFMCS
>>>
>>> template =
>>> Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
>>> # This should give a sulfone connected to an aromatic ring and
>>> # some other (any) element. Notice that the ring may have
>>> # any atoms (N,C,O), but for me it is important to have the SO2 group.
>>>
>>> mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3ccccc3)CCN2)cc1')
>>> # This molecule has the pattern.
>>>
>>> # Now, if I try to find a substructure match, I use:
>>> compare = [template, mol1]
>>> res = rdFMCS.FindMCS(compare,
>>>                 atomCompare=rdFMCS.AtomCompare.CompareElements,
>>>                 bondCompare=rdFMCS.BondCompare.CompareAny,
>>>                 ringMatchesRingOnly=False,
>>>                 completeRingsOnly=False)
>>> res.smartsString
>>> # gives: '[#16](=[#8])=[#8]'
>>>
>>> # Notice that the only match is the SO2, it does not match the ring.
>>> However, if I try that with another structure that has a CF3 in place of
>>> the SO2, I get:
>>> mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
>>> compare = [template,mol2]
>>> res = rdFMCS.FindMCS(compare,
>>>                 atomCompare=rdFMCS.AtomCompare.CompareElements,
>>>                 bondCompare=rdFMCS.BondCompare.CompareAny,
>>>                 ringMatchesRingOnly=False,
>>>                 completeRingsOnly=False)
>>> res.smartsString
>>> # Returns: '' (empty string)
>>>
>>> # if I change to AtomCompare.CompareAny, now a CF3 will also match
>>> # in the SO2-X:
>>> mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
>>> compare = [template,mol2]
>>> res = rdFMCS.FindMCS(compare,
>>>                 atomCompare=rdFMCS.AtomCompare.CompareAny,
>>>                 bondCompare=rdFMCS.BondCompare.CompareAny,
>>>                 ringMatchesRingOnly=False,
>>>                 completeRingsOnly=False)
>>> res.smartsString
>>> # Returns:
>>> '[#16,#6](-[#0,#6])(=,-[#8,#9])(=,-[#8,#9])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:1'
>>>
>>> But now theCF3 is counted in place of the SO2. The result I'd like to
>>> get here would be just the ring, as in the case:
>>> new_template = Chem.MolFromSmarts('CS(=O)(=O)c1cnccc1')
>>> mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
>>> compare = [new_template,mol2]
>>> res = rdFMCS.FindMCS(compare,
>>>                 atomCompare=rdFMCS.AtomCompare.CompareElements,
>>>                 bondCompare=rdFMCS.BondCompare.CompareAny,
>>>                 ringMatchesRingOnly=False,
>>>                 completeRingsOnly=False)
>>> res.smartsString
>>> # Returns: '[#6]1:[#6]:[#7]:[#6]:[#6]:[#6]:1' (just the ring)
>>>
>>> Notice that if I use CompareElements, there seems to be no way to match
>>> the ring with either N or C.
>>>
>>> Does anyone have a suggestion on how I can specify flexibility (similar
>>> to AtomCompare.CompareAny) only for a portion of the molecule and still
>>> enforce specific atoms in another portion?
>>>
>>> Thank you so much!
>>> --
>>> Gustavo Seabra.
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to