Thanks for that. Do you have a version that says which of the molecules hit which PAINS? That would really help with the refinement.
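For what it's worth, here is a rough sketch of how a per-molecule list like that could be compared against the RDKit results. The tab-separated "molecule ID, pattern name" layout, the function names, and the MOL-A identifier are assumptions made up for illustration; they are not an existing file format or RDKit API.

```python
from collections import defaultdict


def parse_matches(text):
    """Parse 'molecule_id<TAB>pattern_name' lines into {pattern: set(ids)}.

    The line layout is an assumed, hypothetical format.
    """
    matches = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        mol_id, pattern = line.split("\t")
        matches[pattern].add(mol_id)
    return dict(matches)


def compare_matches(reference, candidate):
    """For each pattern, list molecules matched by only one of the two sources.

    'extra'  : matched by candidate but not by reference (possible false positives)
    'missed' : matched by reference but not by candidate (possible false negatives)
    Patterns on which both sources agree are left out of the report.
    """
    report = {}
    for pattern in sorted(set(reference) | set(candidate)):
        ref = reference.get(pattern, set())
        cand = candidate.get(pattern, set())
        extra = sorted(cand - ref)
        missed = sorted(ref - cand)
        if extra or missed:
            report[pattern] = {"extra": extra, "missed": missed}
    return report


# Placeholder data: MOL-A is invented; WEHI-18518 is the false positive
# mentioned in the thread for anil_di_alk_C(246).
sln_text = "MOL-A\tanil_di_alk_C(246)\n"
rdkit_text = "MOL-A\tanil_di_alk_C(246)\nWEHI-18518\tanil_di_alk_C(246)\n"

report = compare_matches(parse_matches(sln_text), parse_matches(rdkit_text))
print(report)
```

With the SLN results as the reference, anything listed under "extra" is a candidate false positive for that pattern and anything under "missed" is a structure the SMARTS version no longer catches.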
-greg

On Wed, Aug 26, 2015 at 5:44 AM, Simon Saubern <[email protected]> wrote:

> Attached the original list from Jonathan of the 861 SLN hits.
>
> S.
>
>
> On 26/08/2015 13:08 , Greg Landrum wrote:
>
>
> On Wed, Aug 26, 2015 at 2:32 AM, Simon Saubern <[email protected]>
> wrote:
>
>> Thanks for doing this Greg.
>>
>> Fixing those SMARTS queries always looked like it would be a real...pain.
>>
>
> :-)
>
>> I've dropped your Github file into the KNIME workflow, and the RDKit
>> version of the workflow (using nodes RDKit 2.5.0.201505221301) now hits 770
>> structures in the WEHI-10k test set.
>>
>
> For what it's worth, I now get 888 matches across the WEHI-10K set when
> running my Python test script. I am not 100% sure that the KNIME nodes are
> doing (or can do) the mergeQueryHs step; that's something else for me to
> follow up on.
>
>
>> But that includes 19 false positives that weren't being caught by the SLN
>> filters.
>>
>> One filter alone is responsible for 17 of those false positives:
>>
>> anil_di_alk_C(246)
>> old: c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7](-[#6;X4])-[$([#1]),$([#6;X4])]
>> new: c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7;!H0,$([#7]-[#6;X4])]-[#6;X4]
>>
>> An example of one of the false positive structures is the aniline
>> sulfonamide WEHI-18518.
>>
>> I've checked with Johnathan, and the intention of that query is that "...
>> that the nitrogen has a single bond to a carbon that has four atoms bonded
>> to it (i.e. sp3), and that the other atom singly bonded to the nitrogen
>> atom is anything so long as it is either H or an sp3 carbon".
>>
>> So no to sulfonamides, and also some of the acetamide (sp2 C) showing up
>> as hits.
>>
>
> Thanks for pointing that out and providing the clarification about what is
> expected!
>
> I just committed a fix for this:
>
> https://github.com/rdkit/rdkit/commit/e2487ffe79c393a6b0e472882bfb6eb66a3bcb8b
>
> As an aside: If you could provide a text file that has the matches found
> for each pattern in the WEHI-10k test set when you use the SLN version of
> the PAINS, I would be very happy to use that to further refine these
> patterns and to incorporate those results into the tests.
>
> -greg
------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

