Thanks for that.
Do you have a version that says which of the molecules hit which PAINS?
That would really help with the refinement.
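
For what it's worth, a rough sketch of how such a per-pattern listing could be compared against another tool's output. The file layout (one "pattern<TAB>molecule id" pair per line), the function names, and the placeholder ID MOL-A are all assumptions made for illustration, not an existing format or API:

```python
from collections import defaultdict


def parse_hits(text):
    """Parse 'pattern<TAB>molecule_id' lines into {pattern: set of ids}.

    The tab-separated layout is a hypothetical format, not one used by
    the SLN tools, KNIME, or the RDKit.
    """
    hits = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, mol_id = line.split("\t")
        hits[pattern].add(mol_id)
    return dict(hits)


def diff_hits(reference, candidate):
    """Report, per pattern, molecules missed by or extra in `candidate`."""
    report = {}
    for pattern in sorted(set(reference) | set(candidate)):
        ref = reference.get(pattern, set())
        cand = candidate.get(pattern, set())
        missed, extra = ref - cand, cand - ref
        if missed or extra:
            report[pattern] = {"missed": sorted(missed), "extra": sorted(extra)}
    return report


# Toy data: MOL-A is a made-up placeholder ID; WEHI-18518 is the false
# positive mentioned in this thread.
sln_text = "anil_di_alk_C(246)\tMOL-A\n"
rdkit_text = "anil_di_alk_C(246)\tMOL-A\nanil_di_alk_C(246)\tWEHI-18518\n"

report = diff_hits(parse_hits(sln_text), parse_hits(rdkit_text))
print(report)
```

Patterns whose hit sets agree are left out of the report, so only the filters that still need work show up.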

-greg


On Wed, Aug 26, 2015 at 5:44 AM, Simon Saubern <[email protected]>
wrote:

> Attached the original list from Jonathan of the 861 SLN hits.
>
> S.
>
>
> On 26/08/2015 13:08 , Greg Landrum wrote:
>
>
> On Wed, Aug 26, 2015 at 2:32 AM, Simon Saubern <[email protected]>
> wrote:
>
>> Thanks for doing this Greg.
>>
>> Fixing those SMARTS queries always looked like it would be a real...pain.
>>
>
> :-)
>
>> I've dropped your GitHub file into the KNIME workflow, and the RDKit
>> version of the workflow (using nodes RDKit 2.5.0.201505221301) now hits 770
>> structures in the WEHI-10k test set.
>>
>
> For what it's worth, I now get 888 matches across the WEHI-10K set when
> running my Python test script. I am not 100% sure that the KNIME nodes are
> doing (or can do) the mergeQueryHs step; that's something else for me to
> follow up on.
>
>
>> But that includes 19 false positives that weren't being caught by the SLN
>> filters.
>>
>> One filter alone is responsible for 17 of those false positives:
>>
>> anil_di_alk_C(246)
>> old: c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7](-[#6;X4])-[$([#1]),$([#6;X4])]
>> new: c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7;!H0,$([#7]-[#6;X4])]-[#6;X4]
>>
>> An example of one of the false positive structures is the aniline
>> sulfonamide WEHI-18518.
>>
>> I've checked with Jonathan, and the intention of that query is that "...
>> that the nitrogen has a single bond to a carbon that has four atoms bonded
>> to it (i.e. sp3), and that the other atom singly bonded to the nitrogen
>> atom is anything so long as it is either H or an sp3 carbon".
>>
>> So no to sulfonamides, and also some of the acetamide (sp2 C) showing up
>> as hits.
>>
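
A small sketch of why the "new" query over-matches, assuming the RDKit Python API is available. In `[#7;!H0,$([#7]-[#6;X4])]` the recursive term is satisfied by the same sp3 carbon that the trailing `-[#6;X4]` already requires, so the second substituent on nitrogen is effectively unconstrained. The test SMILES are simple made-up analogues (not WEHI structures), and `STRICT_SMARTS` is only one illustrative way to encode the stated intention; it is not taken from the committed fix:

```python
from rdkit import Chem

# "new" query quoted in this thread
NEW_SMARTS = "c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7;!H0,$([#7]-[#6;X4])]-[#6;X4]"
# Illustrative alternative (an assumption): N carries an H, or has two
# distinct sp3-carbon neighbours.
STRICT_SMARTS = (
    "c:1:c:c(:c:c:c:1-[#8]-[#6;X4])"
    "-[#7;!H0,$([#7](-[#6;X4])-[#6;X4])]-[#6;X4]"
)

new_query = Chem.MolFromSmarts(NEW_SMARTS)
strict_query = Chem.MolFromSmarts(STRICT_SMARTS)

# Hypothetical para-methoxy aniline analogues chosen for illustration.
examples = {
    "dialkyl": "COc1ccc(N(C)C)cc1",
    "monoalkyl": "COc1ccc(NC)cc1",
    "sulfonamide": "COc1ccc(N(C)S(C)(=O)=O)cc1",
    "acetamide": "COc1ccc(N(C)C(C)=O)cc1",
}

results = {}
for name, smiles in examples.items():
    mol = Chem.MolFromSmiles(smiles)
    results[name] = (
        mol.HasSubstructMatch(new_query),
        mol.HasSubstructMatch(strict_query),
    )
    print(name, results[name])
```

Each printed tuple is (matches new query, matches stricter query); the N-methyl sulfonamide and acetamide are the cases where the two should disagree.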
>
> Thanks for pointing that out and providing the clarification about what is
> expected!
> I just committed a fix for this:
>
> https://github.com/rdkit/rdkit/commit/e2487ffe79c393a6b0e472882bfb6eb66a3bcb8b
>
> As an aside: If you could provide a text file that has the matches found
> for each pattern in the WEHI-10k test set when you use the SLN version of
> the PAINS, I would be very happy to use that to further refine these
> patterns and to incorporate those results into the tests.
>
> -greg
>
>
>
>
>
------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
