On Wed, Aug 26, 2015 at 2:32 AM, Simon Saubern <[email protected]>
wrote:

> Thanks for doing this Greg.
>
> Fixing those SMARTS queries always looked like it would be a real...pain.
>

:-)

> I've dropped your Github file into the KNIME workflow, and the RDKit
> version of the workflow (using nodes RDKit 2.5.0.201505221301) now hits 770
> structures in the WEHI-10k test set.
>

For what it's worth, I now get 888 matches across the WEHI-10k set when
running my Python test script. I am not 100% sure that the KNIME nodes are
doing (or can do) the mergeQueryHs step; that's something else for me to
follow up on.


> But that includes 19 false positives that weren't being caught by the SLN
> filters.
>
> One filter alone is responsible for 17 of those false positives:
>
> anil_di_alk_C(246)
> old: c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7](-[#6;X4])-[$([#1]),$([#6;X4])]
> new: c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7;!H0,$([#7]-[#6;X4])]-[#6;X4]
>
> An example of one of the false positive structures is the aniline
> sulfonamide WEHI-18518.
>
> I've checked with Johnathan, and the intention of that query is that "...
> that the nitrogen has a single bond to a carbon that has four atoms bonded
> to it (i.e. sp3), and that the other atom singly bonded to the nitrogen
> atom is anything so long as it is either H or an sp3 carbon".
>
> So sulfonamides should not match, and neither should the acetamides
> (sp2 C) that are also showing up as hits.
>

Thanks for pointing that out and providing the clarification about what is
expected!
I just committed a fix for this:
https://github.com/rdkit/rdkit/commit/e2487ffe79c393a6b0e472882bfb6eb66a3bcb8b
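
For anyone who wants to reproduce the problem, here is a minimal sketch using
the RDKit Python API. The test SMILES are illustrative molecules made up for
this example (they are not taken from the WEHI-10k set), and STRICT_QUERY is
only one possible reading of the intention quoted above, not necessarily the
pattern in the commit.

```python
from rdkit import Chem

# "new" anil_di_alk_C(246) query exactly as quoted in this thread
NEW_QUERY = "c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7;!H0,$([#7]-[#6;X4])]-[#6;X4]"

# ASSUMPTION: a hypothetical stricter query written for this sketch.
# The N must be either NH with one sp3 C, or have no H and two sp3 Cs.
STRICT_QUERY = (
    "c:1:c:c(:c:c:c:1-[#8]-[#6;X4])"
    "-[#7;$([#7;H1]-[#6;X4]),$([#7;H0](-[#6;X4])-[#6;X4])]"
)

# ASSUMPTION: illustrative molecules, not WEHI compounds
TEST_SMILES = {
    "dialkyl aniline": "COc1ccc(N(C)C)cc1",
    "monoalkyl aniline": "COc1ccc(NC)cc1",
    "N-methyl sulfonamide": "COc1ccc(N(C)S(=O)(=O)c2ccccc2)cc1",
    "N-methyl acetamide": "COc1ccc(N(C)C(C)=O)cc1",
}


def matches(smarts, smiles):
    """Return True if the SMARTS query matches the molecule."""
    patt = Chem.MolFromSmarts(smarts)
    mol = Chem.MolFromSmiles(smiles)
    return mol.HasSubstructMatch(patt)


if __name__ == "__main__":
    for name, smi in TEST_SMILES.items():
        print(name, matches(NEW_QUERY, smi), matches(STRICT_QUERY, smi))
```

With the quoted "new" query, the N only needs one sp3 carbon neighbor, so the
sulfonamide and acetamide examples are also expected to match; the stricter
form constrains both substituents on the nitrogen.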

As an aside: If you could provide a text file that has the matches found
for each pattern in the WEHI-10k test set when you use the SLN version of
the PAINS, I would be very happy to use that to further refine these
patterns and to incorporate those results into the tests.
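
To make the request concrete, here is a rough sketch of how such a file could
be compared against the SMARTS results. The file layout (one line per pattern:
the pattern name, a tab, then the matching compound IDs separated by commas or
spaces) and the function names are assumptions made up for this example; any
similar format would work just as well.

```python
def parse_hits(text):
    """Parse 'pattern<TAB>id1,id2,...' lines into {pattern: set(ids)}.

    ASSUMPTION: this layout is hypothetical, not an agreed file format.
    """
    hits = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, ids = line.partition("\t")
        hits[name] = set(ids.replace(",", " ").split())
    return hits


def compare_hits(reference, candidate):
    """Report, per pattern, IDs found only in one of the two hit sets.

    'extra' are candidate-only hits (possible false positives);
    'missing' are reference-only hits (possible false negatives).
    Patterns whose hit sets agree are left out of the report.
    """
    report = {}
    for name in sorted(set(reference) | set(candidate)):
        ref = reference.get(name, set())
        cand = candidate.get(name, set())
        extra, missing = cand - ref, ref - cand
        if extra or missing:
            report[name] = {"extra": sorted(extra), "missing": sorted(missing)}
    return report
```

Treating the SLN matches as the reference and the SMARTS matches as the
candidate, the "extra" entries would be the places to look for false positives
like the one above.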

-greg
------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
