Hi Jim,

The key thing to remember about recursive SMARTS clauses is that each one matches only a single atom (the first), and the rest of the string describes the environment in which that atom must be located. So the clause $(n1(C)ccc(=O)nc1=O) matches just the nitrogen atom, which is embedded in the rest of the ring system. We then negate that with the ! symbol.
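
To make the syntax concrete, here is a small Python sketch that assembles such an atom expression as a plain string. The helper name exclusion_atom is made up for illustration and is not part of RDKit; it only builds text, so no toolkit is needed to run it.

```python
def exclusion_atom(base, environments):
    """Build one bracketed SMARTS atom: the base clause AND NOT each environment.

    Each environment is a SMARTS string whose FIRST atom is the atom being
    tested; the rest of the string only describes its surroundings.
    """
    clauses = ["$(%s)" % base] + ["!$(%s)" % env for env in environments]
    return "[" + ";".join(clauses) + "]"


# An aromatic atom that is not the N-methyl nitrogen of the undesired ring.
one_exclusion = exclusion_atom("a", ["n1(C)ccc(=O)nc1=O"])
print(one_exclusion)  # [$(a);!$(n1(C)ccc(=O)nc1=O)]
```

Further exclusions are just more entries in the list, joined with ';' (a low-precedence AND) inside the same square brackets.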

If we use just the recursive SMARTS expression '[$(a)]' (or the simple SMARTS 'a'), it can match any of the six aromatic atoms in the heterocycle. Adding the first exclusion, '[$(a);!$(n1(C)ccc(=O)nc1=O)]', means this atom can't match the nitrogen substituted by aliphatic C, but it can still match any of the other five aromatic atoms. Consequently there are five more exclusion clauses to add, each of which starts with a different one of the aromatic atoms in your undesired structure. As long as one of the atoms in the full SMARTS is prevented from matching any of the atoms in the undesired structure in this way, the overall match is prevented.

Adding an exclusion for pyridine is then easy. We're already excluding six patterns, and (considering symmetry) we only need to add four more to exclude all pyridines. Appending ';!$(n1ccccc1);!$(c1ncccc1);!$(c1cnccc1);!$(c1ccncc1)' inside the square brackets should do the trick.

You're quite right though, this gets pretty cumbersome very quickly, and it may well be best to handle it in code with simple include / exclude SMARTS patterns. You'll have to think about checking which atoms have been matched. For example, do you want to match quinoline because it contains a benzene ring, or exclude it because it contains a pyridine? If the former, you'll have to check that the atoms matched by your two patterns are different.

Hope this helps!
Chris Earnshaw

On 24 September 2017 at 15:01, James T. Metz <jamestm...@aol.com> wrote:
> Chris,
>
> Wow! Your recursive SMARTS expression works as needed!
>
> Hmmm... Help me understand this better ... it looks like you "walk around"
> the ring of the substructure we want to exclude and employ a slightly
> different recursive SMARTS beginning at that atom. Is that correct?
>
> Also, since my situation is likely to get more complicated with respect to
> exclusions, suppose I still wanted to utilize the general aromatic
> expression for a 6-membered ring, i.e.
> [a]1:[a]:[a]:[a]:[a]:[a]1, and I wanted to exclude the structures we have
> been discussing, and I also wanted to exclude pyridine, i.e.
> [n]1:[c]:[c]:[c]:[c]:[c]1.
>
> Is there a SMARTS expression that would capture 2 exclusions?
>
> Perhaps this is getting too clumsy! It might be better to have one or more
> inclusion SMARTS and one or more exclusion SMARTS, and write code
> to remove those groups of atoms that are coming from the exclusion SMARTS.
>
> Any ideas for Python/RDKit code? Something like
>
> test_smiles = 'c1ccccc1'
> inclusion_pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
> exclusion_pattern = '[n]1:[c]:[c]:[c]:[c]:[c]1'
> etc...
>
> Hmmm... any other ideas, suggestions, comments?
>
> Thanks again.
>
> Regards,
> Jim Metz
>
> -----Original Message-----
> From: Chris Earnshaw <cgearns...@gmail.com>
> To: James T. Metz <jamestm...@aol.com>
> Cc: Rdkit-discuss@lists.sourceforge.net <rdkit-discuss@lists.sourceforge.net>
> Sent: Sun, Sep 24, 2017 4:01 am
> Subject: Re: [Rdkit-discuss] need SMARTS query with a specific exclusion
>
> Hi Jim
>
> It can be done with recursive SMARTS, though the syntax is a bit
> painful. This may do what you want -
> [$(a);!$(n1(C)ccc(=O)nc1=O);!$(c1cc(=O)nc(=O)n1C);!$(c1c(=O)nc(=O)n(C)c1);!$(c(=O)1nc(=O)n(C)cc1);!$(n1c(=O)n(C)ccc1=O);!$(c(=O)1n(C)ccc(=O)n1)]:1:a:a:a:a:a:1
>
> It's basically the general 6-ring aromatic pattern a:1:a:a:a:a:a:1,
> with recursive SMARTS applied to the first atom to ensure that this
> can't match any of the 6 ring atoms in your undesired system.
>
> Regards,
> Chris Earnshaw
>
> On 24 September 2017 at 05:04, James T.
> Metz via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> wrote:
>> Hello,
>>
>> Suppose I have the following molecule
>>
>> m = 'CN1C=CC(=O)NC1=O'
>>
>> I would like to be able to use a SMARTS pattern
>>
>> pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
>>
>> to recognize the 6 atoms in a typical aromatic ring, but
>> I do not want to recognize the 6 atoms in the molecule,
>> m, as aromatic. In other words, I am trying to write
>> a specific exclusion.
>>
>> Is it possible to modify the SMARTS pattern to
>> exclude the above molecule? I have tried using
>> recursive SMARTS, but I can't get the syntax to
>> work.
>>
>> Any ideas? Thank you.
>>
>> Regards,
>> Jim Metz
>>
>> ------------------------------------------------------------------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
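
A minimal sketch of the include / exclude filtering discussed in this thread, written in plain Python so it runs without any toolkit. It assumes each match is a tuple of atom indices, which is the shape RDKit's GetSubstructMatches returns; the function name drop_excluded and the quinoline atom indices below are hand-written for illustration, not computed by RDKit.

```python
def drop_excluded(include_matches, exclude_matches):
    """Keep include matches whose atom set is not exactly an excluded atom set.

    Comparing frozensets ignores the order in which a pattern visited the
    ring atoms, so the same ring found from a different start atom still counts.
    """
    excluded = {frozenset(match) for match in exclude_matches}
    return [match for match in include_matches if frozenset(match) not in excluded]


# Assumed atom numbering for quinoline written as 'c1ccc2ncccc2c1':
# benzene ring = atoms 0,1,2,3,8,9; pyridine ring = atoms 3,4,5,6,7,8.
include_matches = [(0, 1, 2, 3, 8, 9), (3, 4, 5, 6, 7, 8)]  # any aromatic 6-ring
exclude_matches = [(4, 5, 6, 7, 8, 3)]                       # pyridine, starting at n

kept = drop_excluded(include_matches, exclude_matches)
print(kept)  # [(0, 1, 2, 3, 8, 9)]
```

This keeps the benzene ring of quinoline even though it shares two atoms with the pyridine ring. If quinoline should instead be rejected outright, the test would become "shares any atom with an excluded match" rather than "is the same set of atoms". With RDKit, the two lists would presumably come from mol.GetSubstructMatches(...) called with the inclusion and exclusion patterns.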