Thanks for doing this, Greg.

Fixing those SMARTS queries always looked like it would be a real...pain.

I've dropped your GitHub file into the KNIME workflow, and the RDKit version of the workflow (using the RDKit nodes, version 2.5.0.201505221301) now hits 770 structures in the WEHI-10k test set. But that includes 19 false positives, i.e. structures that the SLN filters did not flag.

One filter alone is responsible for 17 of those false positives:

anil_di_alk_C(246)
old: c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7](-[#6;X4])-[$([#1]),$([#6;X4])]
new: c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7;!H0,$([#7]-[#6;X4])]-[#6;X4]

An example of one of the false positive structures is the aniline sulfonamide WEHI-18518.

I've checked with Johnathan, and the intention of that query is "... that the nitrogen has a single bond to a carbon that has four atoms bonded to it (i.e. sp3), and that the other atom singly bonded to the nitrogen atom is anything so long as it is either H or an sp3 carbon".

So sulfonamides should not match, and neither should the acetamides (sp2 C on the nitrogen) that are also showing up as hits.
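
A minimal RDKit sketch of what I think is going on, for anyone who wants to reproduce it outside KNIME. In the new query the recursive term $([#7]-[#6;X4]) can be satisfied by the same sp3 carbon that the trailing -[#6;X4] matches, so the third substituent on the nitrogen ends up unconstrained. The four SMILES below are hypothetical test molecules I made up for illustration (they are not from the WEHI set, and none of them is WEHI-18518), and CANDIDATE is only an untested suggestion of mine, not the fix from Greg's file.

```python
from rdkit import Chem

# The two queries quoted above for anil_di_alk_C(246).
OLD = "c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7](-[#6;X4])-[$([#1]),$([#6;X4])]"
NEW = "c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7;!H0,$([#7]-[#6;X4])]-[#6;X4]"

# Hypothetical candidate (an assumption, not checked against WEHI-10k):
# the recursive branch asks for two distinct sp3 carbons on the nitrogen.
CANDIDATE = ("c:1:c:c(:c:c:c:1-[#8]-[#6;X4])"
             "-[#7;!H0,$([#7](-[#6;X4])-[#6;X4])]-[#6;X4]")

# Made-up para-methoxy anilines used purely as illustrations.
EXAMPLES = {
    "N,N-dimethyl": "COc1ccc(N(C)C)cc1",
    "N-methyl": "COc1ccc(NC)cc1",
    "N-methyl sulfonamide": "COc1ccc(N(C)S(C)(=O)=O)cc1",
    "N-methyl acetamide": "COc1ccc(N(C)C(C)=O)cc1",
}


def is_hit(smarts, smiles):
    """Return True if the SMARTS query matches the molecule."""
    query = Chem.MolFromSmarts(smarts)
    # Explicit hydrogens are added because the old query refers to [#1],
    # which can only match a real hydrogen atom in the molecule graph.
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    return mol.HasSubstructMatch(query)


results = {
    name: {
        "old": is_hit(OLD, smi),
        "new": is_hit(NEW, smi),
        "candidate": is_hit(CANDIDATE, smi),
    }
    for name, smi in EXAMPLES.items()
}

for name, row in results.items():
    print(f"{name:22s} old={row['old']!s:5s} new={row['new']!s:5s} "
          f"candidate={row['candidate']}")
```

If this behaves the way I expect, the sulfonamide and the acetamide are hit only by the new query, while the two plain alkyl anilines are hit by all three. The candidate would still need a run over the whole test set before anyone trusts it.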

--

Cheers,

Simon

------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss