Re: [Rdkit-discuss] SMARTS for heteroaromatic rings?

You are suggesting some interesting ideas.  Matching atoms in 5- and
6-membered aromatic rings will probably be sufficient for now.

I was initially stumped trying to figure out an elegant way to deal with
aromatic N's, O's, and S's in various combinations.  The usage of "a" in
SMARTS is powerful in this regard.

Thanks again.

Regards,
Jim Metz

My approach to this would depend on what you're trying to accomplish in the end.

If you just want all the aromatic atoms you can just use "[a]". Unless you do
some extra work when you read in the molecules, any aromatic atom will be in a
ring. If you want to be really sure, you can do "[a;r]".
If you want all the aromatic bonds, it's "[a]:[a]".
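
As a rough sketch of how these three queries could be run (assuming RDKit is
installed; the QUERIES table and the count_matches helper are names invented
for this illustration, not part of RDKit):

```python
# SMARTS from the message above, keyed by what each one selects.
QUERIES = {
    "aromatic atoms": "[a]",
    "aromatic ring atoms": "[a;r]",
    "aromatic bonds": "[a]:[a]",
}


def count_matches(smiles):
    """Count the matches of each query in one molecule (hypothetical helper)."""
    from rdkit import Chem  # imported here so the table above loads without RDKit

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    counts = {}
    for name, smarts in QUERIES.items():
        query = Chem.MolFromSmarts(smarts)
        # GetSubstructMatches returns one tuple of atom indices per match
        counts[name] = len(mol.GetSubstructMatches(query))
    return counts
```

Calling count_matches("c1ccncc1") would run all three queries against
pyridine.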

If you want the rings themselves and you want to just use SMARTS, you have to
enumerate. Python makes getting the patterns pretty easy:

In [8]: patts = ["[a]:1"+":[a]"*i+":[a]:1" for i in range(3,23)] # ring sizes 5 to 24; 24 is the max aromatic ring size

In [9]: patts[:3]
Out[9]:
['[a]:1:[a]:[a]:[a]:[a]:1',
 '[a]:1:[a]:[a]:[a]:[a]:[a]:1',
 '[a]:1:[a]:[a]:[a]:[a]:[a]:[a]:1']

The rest is just some calls to MolFromSmarts() and then
mol.GetSubstructMatches() for the molecules you want to test.
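
A minimal sketch of those calls, assuming RDKit is installed. The helper
names ring_patterns and aromatic_ring_matches are made up for illustration,
and the default sizes of 5 through 24 simply follow the ring-size limit
mentioned in the comment above:

```python
def ring_patterns(min_size=5, max_size=24):
    """Build one ring SMARTS per size, e.g. '[a]:1:[a]:[a]:[a]:[a]:1' for 5."""
    # A ring of n atoms is the opening atom, n-2 middle atoms, and the closing atom.
    return ["[a]:1" + ":[a]" * (n - 2) + ":[a]:1"
            for n in range(min_size, max_size + 1)]


def aromatic_ring_matches(smiles, patterns):
    """Return {ring size: matches} for every pattern that hits the molecule."""
    from rdkit import Chem  # imported here so ring_patterns() works without RDKit

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    hits = {}
    for patt in patterns:
        query = Chem.MolFromSmarts(patt)
        # each match is a tuple of atom indices, one tuple per ring found
        matches = mol.GetSubstructMatches(query)
        if matches:
            hits[query.GetNumAtoms()] = matches
    return hits

# Example call (needs RDKit): aromatic_ring_matches("c1ccncc1", ring_patterns())
```

One thing to keep in mind: these patterns match any closed cycle of aromatic
bonds, so a fused system such as naphthalene should also match the 10-atom
pattern that runs around its perimeter, not only the two 6-membered rings.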

-greg

Greg Landrum:
<rdkit-discuss@lists.sourceforge.net> wrote:

Jason,

Thanks!  I just thought of that for a 6-membered ring.  A 5-membered
ring would be [a]1[a][a][a][a]1.

Hmmm... I was thinking of using "r" to specify a ring, but I don't think
that would be necessary.  Correct?

Regards,

Jim Metz

If you don't care what type of atom it is, just that it's aromatic, you should
use [a], so [a]1[a][a][a][a][a]1 would match any 6-membered aromatic ring.

Jason Biggs

Jason Biggs:
<rdkit-discuss@lists.sourceforge.net> wrote:

Hello,

I would like to write a SMARTS that will match all of the individual atoms
in all possible heteroaromatic rings.  Does anyone know of an elegant,
compact way to do this?

If one SMARTS will not work, I can concatenate SMARTS using
a vertical pipe, "|", as I proposed in an earlier message in this forum.

I am (perhaps) expecting SMARTS something like

[c]1[c][n][c][c]1
etc
[c]1[c][c][c][c][c]1
[c]1[c][n][c][c][c]1
etc.

Perhaps there is a very elegant way to specify the possible
patterns.  I can't think of a way to do it, other than exhaustive
enumeration.

Any ideas?

Regards,

Jim Metz

