Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-26 Thread Gustavo Seabra
On Fri, Jul 23, 2021 at 4:53 AM Paolo Tosco 
wrote:

>
> # here there seems to a be a bug with the 2D depiction, but that's another 
> story
>
> template
>
> [image: image.png]
>
>
Just a quick thing: I don't know if this is supposed to be a bug or a
feature, but I noticed that this seems to be caused by properties of the
Mol created from SMARTS *not* being set when the mol is created, but only
when they are requested the first time. Right after creating the mol
object by Chem.MolFromSmiles the IsInRing doesn't seem to be set
correctly (or at all), and the comparison operation distorts the molecule.
But, if you force the computation of the properties, e.g. by printing them,

for idx, atom in enumerate(template.GetAtoms()):
print(f"{idx:>4d}  {atom.GetAtomicNum():5d}  {str(atom.IsInRing()):>7}
{str(atom.GetIsAromatic()):>5}")

After that, all seems to work as expected \o/. I don't know if it is by
design that the properties are calculated only when needed?
Gustavo.
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-23 Thread Gustavo Seabra
Thanks a lot!
--
Gustavo Seabra.


On Fri, Jul 23, 2021 at 12:18 PM Paolo Tosco 
wrote:

> Hi Gustavo,
>
> Chem.Atom.HasQuery() and Chem.Bond.HasQuery() return True when the
> underlying atom (or bond) is an instance of Chem.QueryAtom (or Chem.
> QueryBond).
> Query atoms and bonds can either be defined through SMARTS expressions...
>
> from rdkit import Chem
> from rdkit.Chem import rdqueries
>
> a = Chem.Atom(6)
> a.HasQuery()
> False
>
> mol = Chem.MolFromSmarts("[+1;D3]")
> qa_from_smarts = mol.GetAtomWithIdx(0)
> qa_from_smarts.HasQuery()
> True
>
> qa_from_smarts.DescribeQuery()
> 'AtomAnd\n  AtomFormalCharge 1 = val\n  AtomExplicitDegree 3 = val\n'
>
> ...or be directly instantiated from Python and combined at your leisure;
> through this approach you can actually define very specific queries that
> may not be possible to describe with SMARTS.
> Below I show how to construct the same query atom as from the above SMARTS
> expression:
>
> qa = rdqueries.FormalChargeEqualsQueryAtom(1)
> qa
> 
>
> qa.HasQuery()
> True
>
> qa.DescribeQuery()
> 'AtomFormalCharge 1 = val\n'
>
> qa2 = rdqueries.ExplicitDegreeEqualsQueryAtom(3)
> qa2.DescribeQuery()
> 'AtomExplicitDegree 3 = val\n'
>
> qa.ExpandQuery(qa2)
> qa.DescribeQuery()
> 'AtomAnd\n  AtomFormalCharge 1 = val\n  AtomExplicitDegree 3 = val\n'
>
> Cheers,
> p.
>
> On Fri, Jul 23, 2021 at 5:47 PM Gustavo Seabra 
> wrote:
>
>> This works perfectly!
>>
>> I could understand most of what you did there ;-), but what does the
>> ".HasQuery()" mean? The RDKit API is not very clear about it: "Returns
>> whether or not the atom has an associated query". Is this described
>> anywhere else?
>>
>> Thank you so much!
>> --
>> Gustavo Seabra.
>>
>>
>> On Fri, Jul 23, 2021 at 4:53 AM Paolo Tosco 
>> wrote:
>>
>>> Hi Gustavo,
>>>
>>> you should be able to address this with a custom AtomCompare (and
>>> BondCompare, if you want to use bond queries too) class, that now is
>>> also supported from Python.
>>> You can take a look at Code/GraphMol/FMCS/Wrap/testFMCS.py for
>>> inspiration how to use it; here's something that seems to work for your
>>> example:
>>>
>>> from rdkit import Chem
>>> from rdkit.Chem import rdFMCS
>>>
>>> template =
>>> Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
>>> # This should give a sulfone connected to an aromatic ring and
>>> # some other (any) element. Notice that the ring may have
>>> # any atoms (N,C,O), but for me it is important to have the SO2 group.
>>>
>>> template
>>> [image: image.png]
>>>
>>> mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1')
>>> # This molecule has the pattern.
>>>
>>> mol1
>>> [image: image.png]
>>>
>>> compare = [template, mol1]
>>> res = rdFMCS.FindMCS(compare,
>>> atomCompare=rdFMCS.AtomCompare.CompareElements,
>>> bondCompare=rdFMCS.BondCompare.CompareAny,
>>> ringMatchesRingOnly=False,
>>> completeRingsOnly=False)
>>> res.smartsString
>>> # gives: '[#16](=[#8])=[#8]'
>>>
>>> # Let's address the problem with a custom AtomCompare class:
>>>
>>> class CompareQueryAtoms(rdFMCS.MCSAtomCompare):
>>> def __call__(self, p, mol1, atom1, mol2, atom2):
>>> a1 = mol1.GetAtomWithIdx(atom1)
>>> a2 = mol2.GetAtomWithIdx(atom2)
>>> if ((not a1.HasQuery()) and (not a2.HasQuery()) and
>>> a1.GetAtomicNum() != a2.GetAtomicNum()):
>>> return False
>>> if (p.MatchValences and a1.GetTotalValence() !=
>>> a2.GetTotalValence()):
>>> return False
>>> if (p.MatchChiralTag and not self.CheckAtomChirality(p, mol1,
>>> atom1, mol2, atom2)):
>>> return False
>>> if (p.MatchFormalCharge and (not a1.HasQuery()) and (not
>>> a2.HasQuery()) and not self.CheckAtomCharge(p, mol1, atom1, mol2, atom2)):
>>> return False
>>> if p.RingMatchesRingOnly:
>>> return self.CheckAtomRingMatch(p, mol1, atom1, mol2, atom2)
>>> if ((a1.HasQuery() or a2.HasQuery()) and (not a1.Match(a2))):
>>> return False
>>> return True
>>>
>>> params = rdFMCS.MCSParameters()
>>> params.AtomCompareParameters.RingMatchesRingOnly = False
>>> params.BondCompareParameters.RingMatchesRingOnly = False
>>> params.AtomCompareParameters.CompleteRingsOnly = False
>>> params.BondCompareParameters.CompleteRingsOnly = False
>>> params.BondTyper = rdFMCS.BondCompare.CompareAny
>>> params.AtomTyper = CompareQueryAtoms()
>>>
>>> compare = [template, mol1]
>>> res = rdFMCS.FindMCS(compare, params)
>>> res.smartsString
>>>
>>> '[#16](-[#0,#6])(=[#8])(=[#8])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:1'
>>>
>>>
>>> # the queryMol returned by MCS will match the template, but the original 
>>> template query
>>>
>>> # has many more details, so we extract the MCS part of the original 
>>> template and use that
>>>
>>> # as query instead
>>>
>>> def trim_template(template, query):
>>> template_mcs_core = Chem.ReplaceSidechains(template, 

Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-23 Thread Paolo Tosco
Hi Gustavo,

Chem.Atom.HasQuery() and Chem.Bond.HasQuery() return True when the
underlying atom (or bond) is an instance of Chem.QueryAtom (or Chem.
QueryBond).
Query atoms and bonds can either be defined through SMARTS expressions...

from rdkit import Chem
from rdkit.Chem import rdqueries

a = Chem.Atom(6)
a.HasQuery()
False

mol = Chem.MolFromSmarts("[+1;D3]")
qa_from_smarts = mol.GetAtomWithIdx(0)
qa_from_smarts.HasQuery()
True

qa_from_smarts.DescribeQuery()
'AtomAnd\n  AtomFormalCharge 1 = val\n  AtomExplicitDegree 3 = val\n'

...or be directly instantiated from Python and combined at your leisure;
through this approach you can actually define very specific queries that
may not be possible to describe with SMARTS.
Below I show how to construct the same query atom as from the above SMARTS
expression:

qa = rdqueries.FormalChargeEqualsQueryAtom(1)
qa


qa.HasQuery()
True

qa.DescribeQuery()
'AtomFormalCharge 1 = val\n'

qa2 = rdqueries.ExplicitDegreeEqualsQueryAtom(3)
qa2.DescribeQuery()
'AtomExplicitDegree 3 = val\n'

qa.ExpandQuery(qa2)
qa.DescribeQuery()
'AtomAnd\n  AtomFormalCharge 1 = val\n  AtomExplicitDegree 3 = val\n'

Cheers,
p.

On Fri, Jul 23, 2021 at 5:47 PM Gustavo Seabra 
wrote:

> This works perfectly!
>
> I could understand most of what you did there ;-), but what does the
> ".HasQuery()" mean? The RDKit API is not very clear about it: "Returns
> whether or not the atom has an associated query". Is this described
> anywhere else?
>
> Thank you so much!
> --
> Gustavo Seabra.
>
>
> On Fri, Jul 23, 2021 at 4:53 AM Paolo Tosco 
> wrote:
>
>> Hi Gustavo,
>>
>> you should be able to address this with a custom AtomCompare (and
>> BondCompare, if you want to use bond queries too) class, that now is
>> also supported from Python.
>> You can take a look at Code/GraphMol/FMCS/Wrap/testFMCS.py for
>> inspiration how to use it; here's something that seems to work for your
>> example:
>>
>> from rdkit import Chem
>> from rdkit.Chem import rdFMCS
>>
>> template =
>> Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
>> # This should give a sulfone connected to an aromatic ring and
>> # some other (any) element. Notice that the ring may have
>> # any atoms (N,C,O), but for me it is important to have the SO2 group.
>>
>> template
>> [image: image.png]
>>
>> mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1')
>> # This molecule has the pattern.
>>
>> mol1
>> [image: image.png]
>>
>> compare = [template, mol1]
>> res = rdFMCS.FindMCS(compare,
>> atomCompare=rdFMCS.AtomCompare.CompareElements,
>> bondCompare=rdFMCS.BondCompare.CompareAny,
>> ringMatchesRingOnly=False,
>> completeRingsOnly=False)
>> res.smartsString
>> # gives: '[#16](=[#8])=[#8]'
>>
>> # Let's address the problem with a custom AtomCompare class:
>>
>> class CompareQueryAtoms(rdFMCS.MCSAtomCompare):
>> def __call__(self, p, mol1, atom1, mol2, atom2):
>> a1 = mol1.GetAtomWithIdx(atom1)
>> a2 = mol2.GetAtomWithIdx(atom2)
>> if ((not a1.HasQuery()) and (not a2.HasQuery()) and
>> a1.GetAtomicNum() != a2.GetAtomicNum()):
>> return False
>> if (p.MatchValences and a1.GetTotalValence() !=
>> a2.GetTotalValence()):
>> return False
>> if (p.MatchChiralTag and not self.CheckAtomChirality(p, mol1,
>> atom1, mol2, atom2)):
>> return False
>> if (p.MatchFormalCharge and (not a1.HasQuery()) and (not
>> a2.HasQuery()) and not self.CheckAtomCharge(p, mol1, atom1, mol2, atom2)):
>> return False
>> if p.RingMatchesRingOnly:
>> return self.CheckAtomRingMatch(p, mol1, atom1, mol2, atom2)
>> if ((a1.HasQuery() or a2.HasQuery()) and (not a1.Match(a2))):
>> return False
>> return True
>>
>> params = rdFMCS.MCSParameters()
>> params.AtomCompareParameters.RingMatchesRingOnly = False
>> params.BondCompareParameters.RingMatchesRingOnly = False
>> params.AtomCompareParameters.CompleteRingsOnly = False
>> params.BondCompareParameters.CompleteRingsOnly = False
>> params.BondTyper = rdFMCS.BondCompare.CompareAny
>> params.AtomTyper = CompareQueryAtoms()
>>
>> compare = [template, mol1]
>> res = rdFMCS.FindMCS(compare, params)
>> res.smartsString
>>
>> '[#16](-[#0,#6])(=[#8])(=[#8])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:1'
>>
>>
>> # the queryMol returned by MCS will match the template, but the original 
>> template query
>>
>> # has many more details, so we extract the MCS part of the original template 
>> and use that
>>
>> # as query instead
>>
>> def trim_template(template, query):
>> template_mcs_core = Chem.ReplaceSidechains(template, query)
>> for a in template_mcs_core.GetAtoms():
>> if (not a.GetAtomicNum()) and a.GetIsotope():
>> a.SetAtomicNum(1)
>> a.SetIsotope(0)
>> return Chem.RemoveAllHs(template_mcs_core)
>>
>>
>> query_mol = trim_template(template, 

Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-23 Thread Gustavo Seabra
This works perfectly!

I could understand most of what you did there ;-), but what does the
".HasQuery()" mean? The RDKit API is not very clear about it: "Returns
whether or not the atom has an associated query". Is this described
anywhere else?

Thank you so much!
--
Gustavo Seabra.


On Fri, Jul 23, 2021 at 4:53 AM Paolo Tosco 
wrote:

> Hi Gustavo,
>
> you should be able to address this with a custom AtomCompare (and
> BondCompare, if you want to use bond queries too) class, that now is also
> supported from Python.
> You can take a look at Code/GraphMol/FMCS/Wrap/testFMCS.py for
> inspiration how to use it; here's something that seems to work for your
> example:
>
> from rdkit import Chem
> from rdkit.Chem import rdFMCS
>
> template =
> Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
> # This should give a sulfone connected to an aromatic ring and
> # some other (any) element. Notice that the ring may have
> # any atoms (N,C,O), but for me it is important to have the SO2 group.
>
> template
> [image: image.png]
>
> mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1')
> # This molecule has the pattern.
>
> mol1
> [image: image.png]
>
> compare = [template, mol1]
> res = rdFMCS.FindMCS(compare,
> atomCompare=rdFMCS.AtomCompare.CompareElements,
> bondCompare=rdFMCS.BondCompare.CompareAny,
> ringMatchesRingOnly=False,
> completeRingsOnly=False)
> res.smartsString
> # gives: '[#16](=[#8])=[#8]'
>
> # Let's address the problem with a custom AtomCompare class:
>
> class CompareQueryAtoms(rdFMCS.MCSAtomCompare):
> def __call__(self, p, mol1, atom1, mol2, atom2):
> a1 = mol1.GetAtomWithIdx(atom1)
> a2 = mol2.GetAtomWithIdx(atom2)
> if ((not a1.HasQuery()) and (not a2.HasQuery()) and
> a1.GetAtomicNum() != a2.GetAtomicNum()):
> return False
> if (p.MatchValences and a1.GetTotalValence() !=
> a2.GetTotalValence()):
> return False
> if (p.MatchChiralTag and not self.CheckAtomChirality(p, mol1,
> atom1, mol2, atom2)):
> return False
> if (p.MatchFormalCharge and (not a1.HasQuery()) and (not
> a2.HasQuery()) and not self.CheckAtomCharge(p, mol1, atom1, mol2, atom2)):
> return False
> if p.RingMatchesRingOnly:
> return self.CheckAtomRingMatch(p, mol1, atom1, mol2, atom2)
> if ((a1.HasQuery() or a2.HasQuery()) and (not a1.Match(a2))):
> return False
> return True
>
> params = rdFMCS.MCSParameters()
> params.AtomCompareParameters.RingMatchesRingOnly = False
> params.BondCompareParameters.RingMatchesRingOnly = False
> params.AtomCompareParameters.CompleteRingsOnly = False
> params.BondCompareParameters.CompleteRingsOnly = False
> params.BondTyper = rdFMCS.BondCompare.CompareAny
> params.AtomTyper = CompareQueryAtoms()
>
> compare = [template, mol1]
> res = rdFMCS.FindMCS(compare, params)
> res.smartsString
>
> '[#16](-[#0,#6])(=[#8])(=[#8])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:1'
>
>
> # the queryMol returned by MCS will match the template, but the original 
> template query
>
> # has many more details, so we extract the MCS part of the original template 
> and use that
>
> # as query instead
>
> def trim_template(template, query):
> template_mcs_core = Chem.ReplaceSidechains(template, query)
> for a in template_mcs_core.GetAtoms():
> if (not a.GetAtomicNum()) and a.GetIsotope():
> a.SetAtomicNum(1)
> a.SetIsotope(0)
> return Chem.RemoveAllHs(template_mcs_core)
>
>
> query_mol = trim_template(template, res.queryMol)
> template.GetSubstructMatch(query_mol)
>
> (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
>
>
> # here there seems to a be a bug with the 2D depiction, but that's another 
> story
>
> template
>
> [image: image.png]
>
> mol1.GetSubstructMatches(query_mol)
>
> ((4, 1, 0, 2, 3, 5, 6, 7, 19, 20),)
>
>
> mol1
>
> [image: image.png]
>
>
> mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
> compare = [template, mol2]
>
>
> mol2
>
> [image: image.png]
>
>
> res = rdFMCS.FindMCS(compare, params)
> res.smartsString
>
> '[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:[#0,#7]:1'
>
>
> query_mol = trim_template(template, res.queryMol)
>
> query_mol
>
> [image: image.png]
>
>
> mol2.GetSubstructMatches(query_mol)
>
> ((1, 2, 3, 4, 20, 21), (10, 11, 12, 13, 18, 19))
>
>
> mol2
>
> [image: image.png]
>
>
> I hope the above helps, cheers
>
> p.
>
>
> On Thu, Jul 22, 2021 at 7:45 PM Gustavo Seabra 
> wrote:
>
>> Hi all,,
>>
>> I would appreciate some pointers on how it would be possible to find the
>> maximum common substructure of 2 molecules, where in the template structure
>> some atoms may be *any*, but some other atoms must be fixed.
>>
>> Currently, I'm trying to use rdFMCS module. For example:
>>
>> from rdkit import Chem
>> from rdkit.Chem import rdFMCS
>>
>> template =
>> 

Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-23 Thread Paolo Tosco
Hi Gustavo,

you should be able to address this with a custom AtomCompare (and
BondCompare, if you want to use bond queries too) class, that now is also
supported from Python.
You can take a look at Code/GraphMol/FMCS/Wrap/testFMCS.py for inspiration
how to use it; here's something that seems to work for your example:

from rdkit import Chem
from rdkit.Chem import rdFMCS

template =
Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
# This should give a sulfone connected to an aromatic ring and
# some other (any) element. Notice that the ring may have
# any atoms (N,C,O), but for me it is important to have the SO2 group.

template
[image: image.png]

mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1')
# This molecule has the pattern.

mol1
[image: image.png]

compare = [template, mol1]
res = rdFMCS.FindMCS(compare,
atomCompare=rdFMCS.AtomCompare.CompareElements,
bondCompare=rdFMCS.BondCompare.CompareAny,
ringMatchesRingOnly=False,
completeRingsOnly=False)
res.smartsString
# gives: '[#16](=[#8])=[#8]'

# Let's address the problem with a custom AtomCompare class:

class CompareQueryAtoms(rdFMCS.MCSAtomCompare):
def __call__(self, p, mol1, atom1, mol2, atom2):
a1 = mol1.GetAtomWithIdx(atom1)
a2 = mol2.GetAtomWithIdx(atom2)
if ((not a1.HasQuery()) and (not a2.HasQuery()) and
a1.GetAtomicNum() != a2.GetAtomicNum()):
return False
if (p.MatchValences and a1.GetTotalValence() !=
a2.GetTotalValence()):
return False
if (p.MatchChiralTag and not self.CheckAtomChirality(p, mol1,
atom1, mol2, atom2)):
return False
if (p.MatchFormalCharge and (not a1.HasQuery()) and (not
a2.HasQuery()) and not self.CheckAtomCharge(p, mol1, atom1, mol2, atom2)):
return False
if p.RingMatchesRingOnly:
return self.CheckAtomRingMatch(p, mol1, atom1, mol2, atom2)
if ((a1.HasQuery() or a2.HasQuery()) and (not a1.Match(a2))):
return False
return True

params = rdFMCS.MCSParameters()
params.AtomCompareParameters.RingMatchesRingOnly = False
params.BondCompareParameters.RingMatchesRingOnly = False
params.AtomCompareParameters.CompleteRingsOnly = False
params.BondCompareParameters.CompleteRingsOnly = False
params.BondTyper = rdFMCS.BondCompare.CompareAny
params.AtomTyper = CompareQueryAtoms()

compare = [template, mol1]
res = rdFMCS.FindMCS(compare, params)
res.smartsString

'[#16](-[#0,#6])(=[#8])(=[#8])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:1'


# the queryMol returned by MCS will match the template, but the
original template query

# has many more details, so we extract the MCS part of the original
template and use that

# as query instead

def trim_template(template, query):
template_mcs_core = Chem.ReplaceSidechains(template, query)
for a in template_mcs_core.GetAtoms():
if (not a.GetAtomicNum()) and a.GetIsotope():
a.SetAtomicNum(1)
a.SetIsotope(0)
return Chem.RemoveAllHs(template_mcs_core)


query_mol = trim_template(template, res.queryMol)
template.GetSubstructMatch(query_mol)

(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)


# here there seems to a be a bug with the 2D depiction, but that's another story

template

[image: image.png]

mol1.GetSubstructMatches(query_mol)

((4, 1, 0, 2, 3, 5, 6, 7, 19, 20),)


mol1

[image: image.png]


mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
compare = [template, mol2]


mol2

[image: image.png]


res = rdFMCS.FindMCS(compare, params)
res.smartsString

'[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:[#0,#7]:1'


query_mol = trim_template(template, res.queryMol)

query_mol

[image: image.png]


mol2.GetSubstructMatches(query_mol)

((1, 2, 3, 4, 20, 21), (10, 11, 12, 13, 18, 19))


mol2

[image: image.png]


I hope the above helps, cheers

p.


On Thu, Jul 22, 2021 at 7:45 PM Gustavo Seabra 
wrote:

> Hi all,,
>
> I would appreciate some pointers on how it would be possible to find the
> maximum common substructure of 2 molecules, where in the template structure
> some atoms may be *any*, but some other atoms must be fixed.
>
> Currently, I'm trying to use rdFMCS module. For example:
>
> from rdkit import Chem
> from rdkit.Chem import rdFMCS
>
> template =
> Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
> # This should give a sulfone connected to an aromatic ring and
> # some other (any) element. Notice that the ring may have
> # any atoms (N,C,O), but for me it is important to have the SO2 group.
>
> mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1')
> # This molecule has the pattern.
>
> # Now, if I try to find a substructure match, I use:
> compare = [template, mol1]
> res = rdFMCS.FindMCS(compare,
> atomCompare=rdFMCS.AtomCompare.CompareElements,
> bondCompare=rdFMCS.BondCompare.CompareAny,
> 

Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-23 Thread Andrew Dalke
On Jul 23, 2021, at 06:42, Andrew Dalke  wrote:
> 
> No, there's no way to do that.
> 
> The best I can suggest is to go back to the original Python implementation 
> and change the code leading up to

Alternatively, since your template is small, you can brute-force enumerate all 
possible matching SMARTS patterns, and test them from largest to smallest.

I believe the following patterns are correct for your template. These are 
ordered by number of bonds, then number of atoms, then ASCII-betically. (Note: 
these many contain duplicates because Chem.MolToSmarts doesn't produce 
canonical SMARTS.)

[n,c,o]1(-S(-*)(=O)=O):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]:1
[n,c,o](-S(-*)(=O)=O):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]
[n,c,o](-S(-*)(=O)=O)(:[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]):[n,c,o]
[n,c,o](-S(-*)(=O)=O)(:[n,c,o]:[n,c,o]:[n,c,o]):[n,c,o]:[n,c,o]
[n,c,o](-S(-*)(=O)=O)(:[n,c,o]:[n,c,o]):[n,c,o]:[n,c,o]:[n,c,o]
[n,c,o](-S(-*)(=O)=O)(:[n,c,o]):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]
[n,c,o]1(-S(=O)=O):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]:1
[n,c,o](-S(=O)=O):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]
[n,c,o](-S(=O)=O)(:[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]):[n,c,o]
[n,c,o](-S(=O)=O)(:[n,c,o]:[n,c,o]:[n,c,o]):[n,c,o]:[n,c,o]
[n,c,o](-S(=O)=O)(:[n,c,o]:[n,c,o]):[n,c,o]:[n,c,o]:[n,c,o]
[n,c,o](-S(=O)=O)(:[n,c,o]):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]
[n,c,o](-S(-*)(=O)=O):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]
[n,c,o](-S(-*)(=O)=O)(:[n,c,o]:[n,c,o]:[n,c,o]):[n,c,o]
[n,c,o](-S(-*)(=O)=O)(:[n,c,o]:[n,c,o]):[n,c,o]:[n,c,o]
[n,c,o](-S(-*)(=O)=O)(:[n,c,o]):[n,c,o]:[n,c,o]:[n,c,o]
[n,c,o](-S(=O)=O):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]
[n,c,o](-S(=O)=O)(:[n,c,o]:[n,c,o]:[n,c,o]):[n,c,o]
[n,c,o](-S(=O)=O)(:[n,c,o]:[n,c,o]):[n,c,o]:[n,c,o]
[n,c,o](-S(=O)=O)(:[n,c,o]):[n,c,o]:[n,c,o]:[n,c,o]
[n,c,o](-S(-*)(=O)=O):[n,c,o]:[n,c,o]:[n,c,o]
[n,c,o](-S(-*)(=O)=O)(:[n,c,o]:[n,c,o]):[n,c,o]
[n,c,o](-S(-*)(=O)=O)(:[n,c,o]):[n,c,o]:[n,c,o]
[n,c,o](-S(=O)=O):[n,c,o]:[n,c,o]:[n,c,o]
[n,c,o](-S(=O)=O)(:[n,c,o]:[n,c,o]):[n,c,o]
[n,c,o](-S(=O)=O)(:[n,c,o]):[n,c,o]:[n,c,o]
[n,c,o](-S(-*)(=O)=O):[n,c,o]:[n,c,o]
[n,c,o](-S(-*)(=O)=O)(:[n,c,o]):[n,c,o]
[n,c,o](-S(=O)=O):[n,c,o]:[n,c,o]
[n,c,o](-S(=O)=O)(:[n,c,o]):[n,c,o]
[n,c,o](-S(-*)(=O)=O):[n,c,o]
[n,c,o]-S(-*)(=O)=O
[n,c,o](-S(=O)=O):[n,c,o]
[n,c,o]-S(=O)=O
S(-*)(=O)=O
S(=O)=O

I generated it with the following:

===
from rdkit import Chem
import itertools

# Must have the atoms marked with an atom map (the atom map value is ignored).
template = 
'[n,c,o]1(-[S:1](-*)(=[O:1])=[O:1]):[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]:[n,c,o]:1'

mol = Chem.MolFromSmarts(template)

# Figure out which bonds to keep
bond_atom_indices = []
for bond in mol.GetBonds():
if all(atom.HasProp("molAtomMapNumber") for atom in (bond.GetBeginAtom(), 
bond.GetEndAtom())):
continue
bond_atom_indices.append((bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()))

# Remove the atom maps
for atom in mol.GetAtoms():
if atom.HasProp("molAtomMapNumber"):
atom.ClearProp("molAtomMapNumber")

seen = set()

# Enumerate all possible bonds to delete (should be 2**n)
for r in range(0, len(bond_atom_indices)+1):
for delete_indices in itertools.combinations(bond_atom_indices, r):
tmp_mol = Chem.RWMol(mol)

# Remove the selected bonds
for atom1_idx, atom2_idx in delete_indices:
tmp_mol.RemoveBond(atom1_idx, atom2_idx)

# Remove any singletons. Start from the end so the indices are stable.
for atom in list(tmp_mol.GetAtoms())[::-1]:
if not atom.GetBonds():
tmp_mol.RemoveAtom(atom.GetIdx())

# Get the corresponding SMARTS
tmp_smarts = Chem.MolToSmarts(tmp_mol)

# Ensure it's singly connected
if "." in tmp_smarts:
continue

# Ensure it's unique; track the number of bonds and atoms for later 
sorting
key = (tmp_mol.GetNumBonds(), tmp_mol.GetNumAtoms(), tmp_smarts)
seen.add(key)

for num_bonds, num_atoms, smarts in sorted(seen, reverse=True):
print(smarts)
===




Andrew
da...@dalkescientific.com




___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-22 Thread Andrew Dalke
On Jul 23, 2021, at 01:01, Gustavo Seabra  wrote:
> I actually want the sulfone to be found, if it is there. My problem is that I 
> also want flexibility to change the ring atoms and still find the ring as a 
> match, while considering a match on the sulfone only if it really is there. 
> (e.g., CF3 should *not* match.) Does it make sense?

Ahh, I see.

No, there's no way to do that.

The best I can suggest is to go back to the original Python implementation and 
change the code leading up to

   https://hg.sr.ht/~dalke/fmcs/browse/fmcs.py?rev=tip#L1929

so the initial seed is the sulfone instead of an (atom, bond, atom).

Then use that to the the MCS with the sulfone, and if that fails, use RDKit's 
existing method.

I point to my repository only because that's in Python and I know it better. If 
your C++ skills are better than mine, you might change the corresponding 
implementation in RDKit.

Cheers,

Andrew
da...@dalkescientific.com




___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-22 Thread Gustavo Seabra
Hi,

Thanks a lot for the reply! However, in this case, it looks like I would
have to somehow label the isotope in every query molecule, right? For
example:
```
template =
Chem.MolFromSmarts('[c]1(-[2S](=[3O])(=[3O])(-C)):[c]:[c]:[c]:[c]:[c]:1')
mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1')
compare = [template,mol1]
res = rdFMCS.FindMCS(compare,
atomCompare=rdFMCS.AtomCompare.CompareIsotopes,
bondCompare=rdFMCS.BondCompare.CompareAny,
ringMatchesRingOnly=False,
completeRingsOnly=False)
res.smartsString
```
returns: '[0*]1:[0*]:[0*]:[0*]:[0*]:[0*]:1', that is, it only picks the
ring but not the sulfone. I actually want the sulfone to be found, if it is
there. My problem is that I also want flexibility to change the ring atoms
and still find the ring as a match, while considering a match on the
sulfone only if it really is there. (e.g., CF3 should *not* match.) Does it
make sense?

Thanks a lot!
--
Gustavo Seabra.


On Thu, Jul 22, 2021 at 4:52 PM Andrew Dalke 
wrote:

> Hi Gustavo,
>
>
> > template =
> Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
>
> Unless things have changed since I last looked at the algorithm, you can't
> meaningfully pass a SMARTS-based query molecule into the MCS program,
> outside of a few simple cases.
>
> It generates a SMARTS pattern based on the properties of the molecule. You
> asked it to CompareElements, but those [a] terms all have an atomic number
> of 0.
>
>   >>> template =
> Chem.MolFromSmarts('[a#1]1(-[S](-*)(=[O])=[O]):[a#1]:[a#1]:[a#1]:[a#1]:[a#1]:1')
>   >>> [a.GetAtomicNum() for a in template.GetAtoms()]
>   [0, 16, 0, 8, 8, 0, 0, 0, 0, 0]
>
> That's why your CompareAny search returns the #0 terms, like:
>
>
> '[#16,#6](-[#0,#6])(=,-[#8,#9])(=,-[#8,#9])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:1'
>
> > I would appreciate some pointers on how it would be possible to find the
> maximum common substructure of 2 molecules, where in the template structure
> some atoms may be *any*, but some other atoms must be fixed.
>
> Perhaps with isotope labelling?
>
> That is, label the "any" atoms as isotope 1, and label your
> -[S](=[O])(=[O])- as -[2S](=[3O])(=[3O])-
>
> Then use rdFMCS.AtomCompare.CompareIsotopes .
>
> If there's anything you don't want to match at all, give each atom a
> unique isotope value.
>
> Best regards,
>
> Andrew
> da...@dalkescientific.com
>
>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-22 Thread Andrew Dalke
Hi Gustavo,


> template = 
> Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')

Unless things have changed since I last looked at the algorithm, you can't 
meaningfully pass a SMARTS-based query molecule into the MCS program, outside 
of a few simple cases.

It generates a SMARTS pattern based on the properties of the molecule. You 
asked it to CompareElements, but those [a] terms all have an atomic number of 0.

  >>> template = 
Chem.MolFromSmarts('[a#1]1(-[S](-*)(=[O])=[O]):[a#1]:[a#1]:[a#1]:[a#1]:[a#1]:1')
  >>> [a.GetAtomicNum() for a in template.GetAtoms()]
  [0, 16, 0, 8, 8, 0, 0, 0, 0, 0]

That's why your CompareAny search returns the #0 terms, like:

  
'[#16,#6](-[#0,#6])(=,-[#8,#9])(=,-[#8,#9])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:1'

> I would appreciate some pointers on how it would be possible to find the 
> maximum common substructure of 2 molecules, where in the template structure 
> some atoms may be *any*, but some other atoms must be fixed.

Perhaps with isotope labelling?

That is, label the "any" atoms as isotope 1, and label your -[S](=[O])(=[O])- 
as -[2S](=[3O])(=[3O])-

Then use rdFMCS.AtomCompare.CompareIsotopes .

If there's anything you don't want to match at all, give each atom a unique 
isotope value.

Best regards,

Andrew
da...@dalkescientific.com




___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-22 Thread Gustavo Seabra
Hi all,,

I would appreciate some pointers on how it would be possible to find the
maximum common substructure of 2 molecules, where in the template structure
some atoms may be *any*, but some other atoms must be fixed.

Currently, I'm trying to use rdFMCS module. For example:

from rdkit import Chem
from rdkit.Chem import rdFMCS

template =
Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
# This should give a sulfone connected to an aromatic ring and
# some other (any) element. Notice that the ring may have
# any atoms (N,C,O), but for me it is important to have the SO2 group.

mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1')
# This molecule has the pattern.

# Now, if I try to find a substructure match, I use:
compare = [template, mol1]
res = rdFMCS.FindMCS(compare,
atomCompare=rdFMCS.AtomCompare.CompareElements,
bondCompare=rdFMCS.BondCompare.CompareAny,
ringMatchesRingOnly=False,
completeRingsOnly=False)
res.smartsString
# gives: '[#16](=[#8])=[#8]'

# Notice that the only match is the SO2, it does not match the ring.
However, if I try that with another structure that has a CF3 in place of
the SO2, I get:
mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
compare = [template,mol2]
res = rdFMCS.FindMCS(compare,
atomCompare=rdFMCS.AtomCompare.CompareElements,
bondCompare=rdFMCS.BondCompare.CompareAny,
ringMatchesRingOnly=False,
completeRingsOnly=False)
res.smartsString
# Returns: '' (empty string)

# if I change to AtomCompare.CompareAny, now a CF3 will also match
# in the SO2-X:
mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
compare = [template,mol2]
res = rdFMCS.FindMCS(compare,
atomCompare=rdFMCS.AtomCompare.CompareAny,
bondCompare=rdFMCS.BondCompare.CompareAny,
ringMatchesRingOnly=False,
completeRingsOnly=False)
res.smartsString
# Returns:
'[#16,#6](-[#0,#6])(=,-[#8,#9])(=,-[#8,#9])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:1'

But now theCF3 is counted in place of the SO2. The result I'd like to get
here would be just the ring, as in the case:
new_template = Chem.MolFromSmarts('CS(=O)(=O)c1cnccc1')
mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
compare = [new_template,mol2]
res = rdFMCS.FindMCS(compare,
atomCompare=rdFMCS.AtomCompare.CompareElements,
bondCompare=rdFMCS.BondCompare.CompareAny,
ringMatchesRingOnly=False,
completeRingsOnly=False)
res.smartsString
# Returns: '[#6]1:[#6]:[#7]:[#6]:[#6]:[#6]:1' (just the ring)

Notice that if I use CompareElements, there seems to be no way to match the
ring with either N or C.

Does anyone have a suggestion on how I can specify flexibility (similar to
AtomCompare.CompareAny) only for a portion of the molecule and still
enforce specific atoms in another portion?

Thank you so much!
--
Gustavo Seabra.
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss