Hi Jim

The key thing to remember about recursive SMARTS clauses is that
they only match one atom (the first); the rest of the string
describes the environment in which that atom is located. So the clause
$(n1(C)ccc(=O)nc1=O) matches just the nitrogen atom, which is
embedded in the rest of the ring system. We then negate that with the
! symbol.
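
A minimal RDKit sketch of that point, assuming RDKit is installed and
that it perceives 1-methyluracil as aromatic (as it does in this
thread). The variable names are only for illustration. It compares the
recursive clause with the same pattern written as a plain SMARTS:

```python
from rdkit import Chem

# The molecule from the original question (1-methyluracil).
mol = Chem.MolFromSmiles('CN1C=CC(=O)NC1=O')

# Recursive form: only the first atom (the nitrogen) is matched;
# everything after it just describes that atom's environment.
recursive_query = Chem.MolFromSmarts('[$(n1(C)ccc(=O)nc1=O)]')

# Plain form: every atom written in the pattern is part of the match.
plain_query = Chem.MolFromSmarts('n1(C)ccc(=O)nc1=O')

recursive_matches = mol.GetSubstructMatches(recursive_query)
plain_matches = mol.GetSubstructMatches(plain_query)

print('recursive:', recursive_matches)
print('plain:    ', plain_matches)
```

The recursive form should report a single atom (the substituted
nitrogen), while the plain form reports all nine atoms of the pattern.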

If we use just the recursive SMARTS expression '[$(a)]' (or the simple
SMARTS 'a'), it can match any of the six aromatic atoms in the
heterocycle. Adding the first exclusion '[$(a);!$(n1(C)ccc(=O)nc1=O)]'
means this atom can't match the nitrogen substituted by aliphatic
C, but it can still match any of the other five aromatic atoms.
Consequently there are five more exclusion clauses to add, each of
which starts with a different one of the aromatic atoms in your
undesired structure. As long as one of the atoms in the full SMARTS is
prevented from matching any of the atoms in the undesired structure in
this way, then the overall match is prevented.
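
The effect of adding the clauses one by one can be checked with a short
RDKit sketch. The six clauses are copied from the pattern quoted further
down this thread; the helper name ring_query and the list name
URACIL_CLAUSES are made up for this example.

```python
from rdkit import Chem

# One exclusion clause per ring atom of the undesired system,
# each written starting from a different ring atom.
URACIL_CLAUSES = [
    'n1(C)ccc(=O)nc1=O',
    'c1cc(=O)nc(=O)n1C',
    'c1c(=O)nc(=O)n(C)c1',
    'c(=O)1nc(=O)n(C)cc1',
    'n1c(=O)n(C)ccc1=O',
    'c(=O)1n(C)ccc(=O)n1',
]


def ring_query(clauses):
    """Build the 6-ring aromatic query with the given exclusions
    applied to its first atom (hypothetical helper)."""
    exclusions = ''.join(f';!$({clause})' for clause in clauses)
    return Chem.MolFromSmarts(f'[$(a){exclusions}]:1:a:a:a:a:a:1')


methyluracil = Chem.MolFromSmiles('CN1C=CC(=O)NC1=O')
benzene = Chem.MolFromSmiles('c1ccccc1')

one_clause = ring_query(URACIL_CLAUSES[:1])
all_clauses = ring_query(URACIL_CLAUSES)

# With one clause the first query atom can still sit on any of the
# other five ring atoms; with all six there is nowhere left for it.
print('one clause, methyluracil: ', methyluracil.HasSubstructMatch(one_clause))
print('six clauses, methyluracil:', methyluracil.HasSubstructMatch(all_clauses))
print('six clauses, benzene:     ', benzene.HasSubstructMatch(all_clauses))
```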

Adding an exclusion for pyridine is then easy. We're already excluding
six patterns, and (considering symmetry) we only need to add four more
to exclude all pyridines. Appending
';!$(n1ccccc1);!$(c1ncccc1);!$(c1cnccc1);!$(c1ccncc1)' inside the
square brackets should do the trick.
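
As a quick check of the pyridine clauses on their own, here is a small
RDKit sketch. To keep it short it starts from the bare '[$(a)]' ring
pattern rather than the six-clause version; the same string splice
applies to the longer pattern. The variable names are illustrative.

```python
from rdkit import Chem

base = '[$(a)]:1:a:a:a:a:a:1'
pyridine_clauses = ';!$(n1ccccc1);!$(c1ncccc1);!$(c1cnccc1);!$(c1ccncc1)'

# Splice the extra clauses in just before the first closing bracket,
# i.e. inside the atom expression for the first ring atom.
smarts = base.replace(']', pyridine_clauses + ']', 1)
query = Chem.MolFromSmarts(smarts)

results = {}
for name, smiles in [('benzene', 'c1ccccc1'),
                     ('pyridine', 'c1ccncc1'),
                     ('4-methylpyridine', 'Cc1ccncc1')]:
    mol = Chem.MolFromSmiles(smiles)
    results[name] = mol.HasSubstructMatch(query)
    print(name, results[name])
```

Benzene should still match, while pyridine and the substituted pyridine
should not.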

You're quite right though, this gets pretty cumbersome very quickly
and it may well be best to handle it in code with simple include /
exclude SMARTS patterns. You'll have to think about checking which
atoms have been matched - for example, do you want to match quinoline
because it contains a benzene ring, or exclude it because it contains
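a pyridine? If the former, you'll have to check that the atoms matched
by your two patterns are different. Note that in quinoline the two
rings share the two fusion atoms, so compare the whole sets of matched
atoms rather than rejecting on any overlap.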
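
One possible way to do this in code is sketched below. The function
filter_matches and its reject_whole_molecule flag are hypothetical names,
not part of RDKit; only MolFromSmiles, MolFromSmarts and
GetSubstructMatches are RDKit calls. The sketch drops an inclusion match
when all of its atoms lie inside some exclusion match, so an exclusion
pattern may carry substituents as well as ring atoms.

```python
from rdkit import Chem


def filter_matches(smiles, include_smarts, exclude_smarts_list,
                   reject_whole_molecule=False):
    """Return inclusion matches that are not covered by an exclusion match.

    If reject_whole_molecule is True, any exclusion hit discards every
    match in the molecule (the "exclude quinoline" policy).
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f'could not parse SMILES: {smiles}')
    include = Chem.MolFromSmarts(include_smarts)

    # Collect the atom sets hit by each exclusion pattern.
    excluded_sets = []
    for smarts in exclude_smarts_list:
        pattern = Chem.MolFromSmarts(smarts)
        for match in mol.GetSubstructMatches(pattern):
            excluded_sets.append(frozenset(match))

    if reject_whole_molecule and excluded_sets:
        return []

    kept = []
    for match in mol.GetSubstructMatches(include):
        atoms = frozenset(match)
        # Drop the match only if every atom belongs to one exclusion hit.
        if not any(atoms <= excluded for excluded in excluded_sets):
            kept.append(match)
    return kept


inclusion_pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
exclusion_pattern = '[n]1:[c]:[c]:[c]:[c]:[c]1'

for smiles in ['c1ccccc1', 'c1ccncc1', 'c1ccc2ncccc2c1']:
    print(smiles, filter_matches(smiles, inclusion_pattern,
                                 [exclusion_pattern]))
```

With the default setting, quinoline should keep its benzene ring and
lose only the pyridine ring; with reject_whole_molecule=True it is
dropped entirely.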

Hope this helps!

Chris Earnshaw

On 24 September 2017 at 15:01, James T. Metz <jamestm...@aol.com> wrote:
> Chris,
>
> Wow! Your recursive SMARTS expression works as needed!
>
> Hmmm... Help me understand this better ... it looks like you "walk around"
> the ring of the substructure we want to exclude and employ a slightly
> different recursive SMARTS beginning at that atom.  Is that correct?
>
> Also, since my situation is likely to get more complicated with respect to
> exclusions, suppose I still wanted to utilize the general aromatic
> expression for a 6-membered ring, i.e., [a]1:[a]:[a]:[a]:[a]:[a]1, and I
> wanted to exclude the structures we have been discussing, and I also wanted
> to exclude pyridine, i.e., [n]1:[c]:[c]:[c]:[c]:[c]1.
>
> Is there a SMARTS expression that would capture 2 exclusions?
>
> Perhaps this is getting too clumsy!  It might be better to have one or more
> inclusion SMARTS and one or more exclusion SMARTS, and write code
> to remove those groups of atoms that are coming from the exclusion SMARTS.
>
> Any ideas for PYTHON/RDkit code?  Something like
>
> test_smiles = 'c1ccccc1'
> inclusion_pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
> exclusion_pattern = '[n]1:[c]:[c]:[c]:[c]:[c]1'
> etc...
>
> Hmmm... any other ideas, suggestions, comments?
>
> Thanks again.
>
> Regards,
> Jim Metz
>
>
>
>
> -----Original Message-----
> From: Chris Earnshaw <cgearns...@gmail.com>
> To: James T. Metz <jamestm...@aol.com>
> Cc: Rdkit-discuss@lists.sourceforge.net
> <rdkit-discuss@lists.sourceforge.net>
> Sent: Sun, Sep 24, 2017 4:01 am
> Subject: Re: [Rdkit-discuss] need SMARTS query with a specific exclusion
>
> Hi Jim
>
> It can be done with recursive SMARTS, though the syntax is a bit
> painful. This may do what you want:
> [$(a);!$(n1(C)ccc(=O)nc1=O);!$(c1cc(=O)nc(=O)n1C);!$(c1c(=O)nc(=O)n(C)c1);!$(c(=O)1nc(=O)n(C)cc1);!$(n1c(=O)n(C)ccc1=O);!$(c(=O)1n(C)ccc(=O)n1)]:1:a:a:a:a:a:1
>
> It's basically the general 6-ring aromatic pattern a:1:a:a:a:a:a:1,
> with recursive SMARTS applied to the first atom to ensure that this
> can't match any of the 6 ring atoms in your undesired system.
>
> Regards,
> Chris Earnshaw
>
> On 24 September 2017 at 05:04, James T. Metz via Rdkit-discuss
> <rdkit-discuss@lists.sourceforge.net> wrote:
>> Hello,
>>
>> Suppose I have the following molecule
>>
>> m = 'CN1C=CC(=O)NC1=O'
>>
>> I would like to be able to use a SMARTS pattern
>>
>> pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
>>
>> to recognize the 6 atoms in a typical aromatic ring, but
>> I do not want to recognize the 6 atoms in the molecule,
>> m, as aromatic. In other words, I am trying to write
>> a specific exclusion.
>>
>> Is it possible to modify the SMARTS pattern to
>> exclude the above molecule? I have tried using
>> recursive SMARTS, but I can't get the syntax to
>> work.
>>
>> Any ideas? Thank you.
>>
>> Regards,
>> Jim Metz
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
