Hi Nik,

On Tue, Feb 10, 2009 at 4:02 PM,  <nikolaus.sti...@novartis.com> wrote:
>
> just a curiosity ...
>
> 765534 vs 76522
>
> is one a subset of the other? If not - would it make sense to test on both?

It's an interesting question.

The two sets of screened-out compounds overlap by about 99%: of the
765534 screened out by the RDK fingerprints, all but 7907 are also
screened out by the layered fps. Turned around: of the 765224 screened
out by the layered fps, all but 7597 are also removed by the RDK
fingerprints.
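
As a quick sanity check on those counts, here is a minimal Python
sketch. It uses only the four numbers quoted above; the variable names
are mine and purely illustrative. The overlap should come out the same
from either side, and it does.

```python
# Counts quoted in the message above.
rdk_removed = 765534       # screened out by the RDK fingerprints
layered_removed = 765224   # screened out by the layered fingerprints
rdk_only = 7907            # removed by RDK fps but not by layered fps
layered_only = 7597        # removed by layered fps but not by RDK fps

# The overlap must be identical whichever side it is computed from.
both_from_rdk = rdk_removed - rdk_only
both_from_layered = layered_removed - layered_only

# Fraction of each screened-out set that the other fingerprint also removes.
frac_rdk = both_from_rdk / rdk_removed
frac_layered = both_from_layered / layered_removed

print(both_from_rdk, both_from_layered)
print(f"{frac_rdk:.3f} {frac_layered:.3f}")
```

Both subtractions give 757627 compounds removed by both fingerprints,
which is where the "about 99%" figure comes from.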

So in this dataset, a second pass with the RDK fingerprints over the
compounds that pass the layered-fp screen would reduce the number of
subgraph isomorphism calls from 57466 to 49869 (13%). That saving
comes at the cost of some additional complexity and more storage (an
extra FP needs to be stored).
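
To make the two-pass idea concrete, here is a rough sketch in plain
Python. It is not RDKit code: plain integers stand in for the bit
vectors, the function names and the toy library are hypothetical, and
the only assumption is the usual substructure-screen rule that a
molecule survives when every bit set in the query fingerprint is also
set in the molecule's fingerprint.

```python
def passes_screen(query_fp: int, mol_fp: int) -> bool:
    """True if every bit set in the query is also set in the molecule."""
    return (query_fp & mol_fp) == query_fp


def count_candidates(query, library, use_second_pass):
    """Count molecules that would still need a subgraph isomorphism test.

    query is a (layered_fp, rdk_fp) pair; library is a list of such pairs.
    """
    survivors = 0
    for layered_fp, rdk_fp in library:
        # First pass: layered fingerprint screen.
        if not passes_screen(query[0], layered_fp):
            continue
        # Optional second pass: RDK fingerprint screen on the survivors.
        if use_second_pass and not passes_screen(query[1], rdk_fp):
            continue
        survivors += 1
    return survivors


# Toy data: each entry is (layered_fp, rdk_fp), so two fps are stored per molecule.
library = [
    (0b1110, 0b0111),
    (0b0110, 0b0101),
    (0b1100, 0b0011),
    (0b0111, 0b0110),
]
query = (0b0110, 0b0101)

one_pass = count_candidates(query, library, use_second_pass=False)
two_pass = count_candidates(query, library, use_second_pass=True)
print(one_pass, two_pass)
```

The second fingerprint in every library entry is the extra storage
mentioned above; the second pass only ever removes candidates, so it
cannot increase the number of isomorphism calls.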

I'm not sure whether it's worth it in the end, but it's certainly
worth thinking about if larger datasets show similar patterns.

> Just a thought. Apart from that I think the setup is reasonable for most
> applications we will have ...

Thanks.

-greg
