# Re: [Rdkit-discuss] "Markush SMARTS" ?

```
Hi Alexis,

I'm trying to make sure I understand the use case: do you want to search for
aromatic rings that have one F and one Cl, or for aromatic rings that have
exactly two substituents?
```
```
If you just wanted to determine whether or not there is a match, you could
use this recursive SMARTS for the first use case:
[c;$(c1(F)c(Cl)cccc1),$(c1(F)cc(Cl)ccc1),$(c1(F)ccc(Cl)cc1)]
and this one for the second:
[c;$(c1(F)c(Cl)[cH][cH][cH][cH]1),$(c1(F)[cH]c(Cl)[cH][cH][cH]1),$(c1(F)[cH][cH]c(Cl)[cH][cH]1)]

Here's a little demo of that:

In [1]: from rdkit import Chem

In [2]: smis = ('Fc1c(Cl)cccc1','Fc1cc(Cl)ccc1','Fc1ccc(Cl)cc1','Fc1c(Cl)cc(C)cc1','Fc1c(C)cccc1')

In [3]: ms = [Chem.MolFromSmiles(x) for x in smis]

In [4]: p1 = Chem.MolFromSmarts('[c;$(c1(F)c(Cl)cccc1),$(c1(F)cc(Cl)ccc1),$(c1(F)ccc(Cl)cc1)]')

In [6]: for smi,m in zip(smis,ms):
...:     print(smi,m.HasSubstructMatch(p1))
...:
Fc1c(Cl)cccc1 True
Fc1cc(Cl)ccc1 True
Fc1ccc(Cl)cc1 True
Fc1c(Cl)cc(C)cc1 True
Fc1c(C)cccc1 False

In [7]: p2 = Chem.MolFromSmarts('[c;$(c1(F)c(Cl)[cH][cH][cH][cH]1),$(c1(F)[cH]c(Cl)[cH][cH][cH]1),$(c1(F)[cH][cH]c(Cl)[cH][cH]1)]')

In [8]: for smi,m in zip(smis,ms):
...:     print(smi,m.HasSubstructMatch(p2))
...:
Fc1c(Cl)cccc1 True
Fc1cc(Cl)ccc1 True
Fc1ccc(Cl)cc1 True
Fc1c(Cl)cc(C)cc1 False
Fc1c(C)cccc1 False

Getting all the atoms involved in the match is a bit more complicated since
the recursive SMARTS above just match a single atom. The upcoming RDKit
release has a new data structure, the MolBundle, that would help here.
Here's an example of how that works:

In [9]: querySmis=('Fc1c(Cl)cccc1','Fc1cc(Cl)ccc1','Fc1ccc(Cl)cc1')

In [10]: queries = [Chem.MolFromSmiles(x) for x in querySmis]

In [11]: bndl = Chem.MolBundle()

In [12]: for query in queries: bndl.AddMol(query)

In [13]: for smi,m in zip(smis,ms):
...:     print(smi,m.HasSubstructMatch(bndl))
...:
Fc1c(Cl)cccc1 True
Fc1cc(Cl)ccc1 True
Fc1ccc(Cl)cc1 True
Fc1c(Cl)cc(C)cc1 True
Fc1c(C)cccc1 False

In [14]: for smi,m in zip(smis,ms):
...:     print(smi,m.GetSubstructMatch(bndl))
...:
Fc1c(Cl)cccc1 (0, 1, 2, 3, 4, 5, 6, 7)
Fc1cc(Cl)ccc1 (0, 1, 2, 3, 4, 5, 6, 7)
Fc1ccc(Cl)cc1 (0, 1, 2, 3, 4, 5, 6, 7)
Fc1c(Cl)cc(C)cc1 (0, 1, 2, 3, 4, 5, 7, 8)
Fc1c(C)cccc1 ()

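Or, for the second use case (by default AdjustQueryProperties() adds degree
queries to the ring atoms, so extra substituents on the ring no longer match):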

In [16]: queries2 = [Chem.AdjustQueryProperties(x) for x in queries]

In [17]: bndl2 = Chem.MolBundle()

In [18]: for query in queries2: bndl2.AddMol(query)

In [19]: for smi,m in zip(smis,ms):
...:     print(smi,m.GetSubstructMatch(bndl2))
...:
Fc1c(Cl)cccc1 (0, 1, 2, 3, 4, 5, 6, 7)
Fc1cc(Cl)ccc1 (0, 1, 2, 3, 4, 5, 6, 7)
Fc1ccc(Cl)cc1 (0, 1, 2, 3, 4, 5, 6, 7)
Fc1c(Cl)cc(C)cc1 ()
Fc1c(C)cccc1 ()

Note that due to an oversight on my part the functions required to do this
are not part of the beta versions of the release.

Does that help?
-greg

On Fri, Sep 29, 2017 at 11:48 AM, Alexis Parenty <
alexis.parenty.h...@gmail.com> wrote:

> Dear rdkiters,
>
>
> I am interested in capturing, in a single SMARTS notation, aromatic systems
> with several possible substitution positions (ortho, meta, para).
>
> Is there a way, using rdkit, to convert for example the three structures
> below into a SMARTS notation that would match all three structures when I do
> a substructure search?
>
>
> [image: Inline images 1]
>
>
> Is there such a thing as a “Markush SMARTS”?
>
>
>
> Many thanks and regards,
>
> Alexis
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
```
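
The two recursive SMARTS in the reply follow a regular pattern: one anchored ring carbon carrying the first substituent, with the second substituent placed one, two, or three ring positions away. As a minimal sketch of that idea, the pure-Python helper below builds such a pattern as a string. The function name `ring_isomer_smarts` and its parameters are hypothetical, introduced only for illustration; they are not part of RDKit, and the sketch assumes a six-membered aromatic carbon ring.

```python
def ring_isomer_smarts(sub_a: str, sub_b: str, exact: bool = False) -> str:
    """Build a recursive SMARTS for an aromatic six-ring carbon bearing sub_a,
    with sub_b in the ortho, meta, or para position.

    Hypothetical helper for illustration only (not an RDKit function).
    With exact=True the remaining ring positions are written as [cH], so
    rings with additional substituents are excluded.
    """
    other = "[cH]" if exact else "c"
    alternatives = []
    # Offsets 1, 2, 3 from the anchored atom are ortho, meta, para.
    # Offsets 4 and 5 are the same patterns traversed in the other direction.
    for offset in (1, 2, 3):
        ring = [other] * 5                 # the five ring atoms after the anchor
        ring[offset - 1] = f"c({sub_b})"   # place the second substituent
        ring[-1] += "1"                    # ring-closure digit on the last atom
        alternatives.append(f"$(c1({sub_a}){''.join(ring)})")
    return "[c;" + ",".join(alternatives) + "]"


print(ring_isomer_smarts("F", "Cl"))
print(ring_isomer_smarts("F", "Cl", exact=True))
```

The returned string can be passed to `Chem.MolFromSmarts()` in the same way as the hand-written patterns in the reply. Like those patterns, it matches only the single anchored atom, so it answers "is there a match?" but does not return all atoms of the ring.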