Hi Simon,

On Tue, Sep 27, 2011 at 9:21 AM, Simon Saubern <[email protected]> wrote:
>
> The recent updates to the way explicit hydrogens are handled in the RDKit
> nodes for KNIME http://goo.gl/DK0FS have dramatically improved the number
> of correct matches that we observe when using the PAINS filters workflow
> http://goo.gl/T9mT2 .
>
> Against the reference set from WEHI, we're now seeing 652 matches (up from
> 329), but we also now get 231 false positives where we were getting none
> before.
>
> Attached is a tab-sep file containing the mis-matches (regID, smiles,
> smarts, smartsID).
>
> The smarts strings come from Raj's blog: http://blog.rguha.net/?p=850.
>
> Let us know if you need additional info to diagnose what's going on.

Thanks for providing all the data; that really helps.

I think I've got at least part of it figured out and fixed. There was a problem with the way explicit Hs were being merged into the atoms they are connected to. This led to bits of query like "C([#1])[#1]" being converted to "[C&!H0]".

This has been fixed in the RDKit itself. I also updated the relevant pieces of the KNIME nodes; the changes should be in today's nightly build.

Please give the new version a try and let us know if there are still problems.

-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
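
To illustrate why converting "C([#1])[#1]" to "[C&!H0]" would produce false positives: the first query asks for a carbon with two hydrogen neighbours (when matched against a molecule with explicit Hs), while the second only asks for a hydrogen count other than zero. The sketch below is a toy model in plain Python, not RDKit code. It represents a carbon purely by its hydrogen count, and all function names are hypothetical. The "[C&!H0&!H1]" form is shown only as one count-based query with the same meaning; it is an assumption for illustration, not necessarily what the fixed RDKit emits.

```python
# Toy model: a carbon atom is represented only by its total hydrogen count.
# This is NOT RDKit code; it only mimics the semantics of the queries.


def matches_two_explicit_h(h_count: int) -> bool:
    """Meaning of C([#1])[#1]: two distinct hydrogen neighbours are required."""
    return h_count >= 2


def matches_not_h0(h_count: int) -> bool:
    """Meaning of [C&!H0]: any hydrogen count except zero."""
    return h_count != 0


def matches_not_h0_not_h1(h_count: int) -> bool:
    """Meaning of [C&!H0&!H1] (assumed equivalent form, for illustration)."""
    return h_count not in (0, 1)


def false_positive_h_counts(max_h: int = 4) -> list[int]:
    """Hydrogen counts accepted by [C&!H0] but rejected by C([#1])[#1]."""
    return [
        n
        for n in range(max_h + 1)
        if matches_not_h0(n) and not matches_two_explicit_h(n)
    ]


if __name__ == "__main__":
    # Lists the hydrogen counts where the over-merged query is too permissive.
    print(false_positive_h_counts())
```

In this model, the two queries disagree only for carbons carrying exactly one hydrogen, which is consistent with a jump in false positives after the merge.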

