Hi Nik,

On Tue, Feb 10, 2009 at 4:02 PM, <nikolaus.sti...@novartis.com> wrote:
>
> just a curiosity ...
>
> 765534 vs 76522
>
> is one a subset of the other? If not - would it make sense to test on both?

It's an interesting question. The screened-out compounds are 99% similar: of the 765534 screened out by the RDK fingerprints, all but 7907 are also screened out by the layered fps. Turned around: of the 765224 screened out by the layered fps, all but 7597 are also removed by the RDK fingerprints.

So in this dataset, doing a second pass with the layered fps on the compounds that get past the RDK fingerprint screen would reduce the number of subgraph isomorphism calls from 57466 to 49869 (13%); the difference is the 7597 compounds that only the layered fps remove. Those savings would come with some additional complication and more required storage (an extra FP that needs to be stored). Not sure if it's worth it in the end or not... it's certainly worth thinking about if larger datasets show similar patterns.

> Just a thought. Apart from that I think the setup is reasonable for most
> applications we will have ...

Thanks.
-greg
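
As a sanity check on the counts quoted above, here is a minimal Python sketch of the overlap arithmetic. The variable names are illustrative only; the numbers are the ones given in the message, and nothing here calls the RDKit itself.

```python
# Counts quoted in the message above.
rdk_screened = 765534      # pairs screened out by the RDK fingerprints
layered_screened = 765224  # pairs screened out by the layered fps
rdk_only = 7907            # screened out by the RDK fps but not by the layered fps
layered_only = 7597        # screened out by the layered fps but not by the RDK fps

# The set removed by both screens can be counted from either side;
# the two results have to agree if the quoted numbers are consistent.
both_from_rdk = rdk_screened - rdk_only
both_from_layered = layered_screened - layered_only

# Fraction of each screened-out set that the other screen also removes.
overlap_rdk = both_from_rdk / rdk_screened
overlap_layered = both_from_layered / layered_screened

# Subgraph isomorphism calls before and after the quoted reduction.
calls_before = 57466
calls_after = 49869
calls_saved = calls_before - calls_after
saving = calls_saved / calls_before

print(f"screened out by both: {both_from_rdk}")
print(f"overlap: {overlap_rdk:.1%} (RDK), {overlap_layered:.1%} (layered)")
print(f"calls saved: {calls_saved} ({saving:.0%})")
```

The number of calls saved works out to the count of compounds that only the layered fps remove, which is why the two screens together do slightly better than either one alone.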