On Wed, Aug 26, 2015 at 2:32 AM, Simon Saubern <[email protected]> wrote:
> Thanks for doing this Greg.
>
> Fixing those SMARTS queries always looked like it would be a real...pain.
> :-) I've dropped your Github file into the KNIME workflow, and the RDKit
> version of the workflow (using nodes RDKit 2.5.0.201505221301) now hits 770
> structures in the WEHI-10k test set.
>

For what it's worth, I now get 888 matches across the WEHI-10K set when
running my Python test script. I am not 100% sure that the KNIME nodes are
doing (or can do) the mergeQueryHs step; that's something else for me to
follow up on.

> But that includes 19 false positives that weren't being caught by the SLN
> filters.
>
> One filter alone is responsible for 17 of those false positives:
>
> anil_di_alk_C(246)
> old: c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7](-[#6;X4])-[$([#1]),$([#6;X4])]
> new: c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7;!H0,$([#7]-[#6;X4])]-[#6;X4]
>
> An example of one of the false positive structures is the aniline
> sulfonamide WEHI-18518.
>
> I've checked with Johnathan, and the intention of that query is that "...
> that the nitrogen has a single bond to a carbon that has four atoms bonded
> to it (i.e. sp3), and that the other atom singly bonded to the nitrogen
> atom is anything so long as it is either H or an sp3 carbon".
>
> So no to sulfonamides, and also some of the acetamide (sp2 C) showing up
> as hits.
>

Thanks for pointing that out and providing the clarification about what is
expected! I just committed a fix for this:
https://github.com/rdkit/rdkit/commit/e2487ffe79c393a6b0e472882bfb6eb66a3bcb8b

As an aside: If you could provide a text file that has the matches found
for each pattern in the WEHI-10k test set when you use the SLN version of
the PAINS, I would be very happy to use that to further refine these
patterns and to incorporate those results into the tests.

-greg
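P.S. To make the request for a per-pattern match file concrete, here is a rough sketch of how such a file could be compared against the SMARTS results. Everything about the format is an assumption on my part: one tab-separated "pattern name, structure ID" pair per line, and the helper names parse_matches and compare_matches are made up for illustration. The ID EXAMPLE-0001 is a placeholder; only WEHI-18518 and the pattern name come from this thread.

```python
from collections import defaultdict


def parse_matches(text):
    """Parse 'pattern<TAB>structure_id' lines into {pattern: set of IDs}.

    The tab-separated layout is an assumed format, not an existing one.
    """
    hits = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, structure_id = line.split("\t", 1)
        hits[pattern].add(structure_id)
    return dict(hits)


def compare_matches(reference, candidate):
    """Return {pattern: (extra, missing)} wherever the two hit sets differ.

    'extra' are IDs hit only by the candidate (possible false positives);
    'missing' are IDs hit only by the reference (possible false negatives).
    """
    diffs = {}
    for pattern in sorted(set(reference) | set(candidate)):
        ref_ids = reference.get(pattern, set())
        cand_ids = candidate.get(pattern, set())
        extra = cand_ids - ref_ids
        missing = ref_ids - cand_ids
        if extra or missing:
            diffs[pattern] = (sorted(extra), sorted(missing))
    return diffs


# Illustrative data only: EXAMPLE-0001 is a placeholder ID.
sln_text = "anil_di_alk_C(246)\tEXAMPLE-0001\n"
smarts_text = (
    "anil_di_alk_C(246)\tEXAMPLE-0001\n"
    "anil_di_alk_C(246)\tWEHI-18518\n"
)

diffs = compare_matches(parse_matches(sln_text), parse_matches(smarts_text))
for pattern, (extra, missing) in diffs.items():
    print(pattern, "extra:", extra, "missing:", missing)
```

With the SLN hits as the reference, anything listed under "extra" for a pattern is a candidate false positive of the SMARTS version, which is the kind of per-pattern check that could go into the tests.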
------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

