Hi Tim,

First, the best answer: the best solution for this is chemfp. It's a
remarkable piece of software.

If you aren't willing to license that, the RDKit's brute-force
fingerprint search capabilities aren't too bad for in-memory fingerprints.
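
To make "brute-force" concrete, here is a minimal, library-free sketch of
that kind of search: compute the Tanimoto similarity between the query and
every fingerprint held in memory, then keep the hits above a threshold. The
function names, the 0.7 threshold, and the use of Python ints as bit vectors
are assumptions made for illustration; this is not the RDKit's
implementation. With the RDKit itself you would normally build real
fingerprints and score them in one call with
DataStructs.BulkTanimotoSimilarity.

```python
# Illustrative sketch only: fingerprints are modelled as Python ints used
# as bitsets. All names and the toy data are hypothetical.

def popcount(bits):
    """Count the number of set bits in a fingerprint."""
    return bin(bits).count("1")


def tanimoto(a, b):
    """Tanimoto similarity: shared on-bits divided by total on-bits."""
    union = popcount(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: define similarity as 0
    return popcount(a & b) / union


def brute_force_search(query, fingerprints, threshold=0.7):
    """Compare the query against every fingerprint; return (index, sim)
    pairs at or above the threshold, most similar first."""
    hits = []
    for idx, fp in enumerate(fingerprints):
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((idx, sim))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Toy 8-bit "fingerprints"; real ones are typically 1024 or 2048 bits.
library = [0b10110010, 0b10110011, 0b01001100, 0b10100010]
query = 0b10110010

hits = brute_force_search(query, library, threshold=0.7)
for idx, sim in hits:
    print(idx, round(sim, 3))
```

The cost is linear in the number of fingerprints, which is why this only
stays reasonable when everything fits in memory and the per-pair
calculation is cheap.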

There's some information in this slide deck, from a presentation I did here
in Basel last year:
https://www.slideshare.net/GregLandrum1/big-chemical-data-no-problem
The parts about similarity search start at slide 44. The "FPB"-based search
on slide 46 uses a chemfp file format, but it provides an upper bound on
what you could expect for an in-memory search.

Nils mentioned the PostgreSQL cartridge too. That's not a terrible way to
do searches, and the individual similarity calculations are pretty quick,
but it's not as fast as using a built-for-purpose similarity search tool.

I still need to write up some experiments I did at the end of last year on
ways to make things faster... time flies.

-greg




On Thu, May 18, 2017 at 11:15 PM, Tim Dudgeon <tdudgeon...@gmail.com> wrote:

> I think I recall Greg mentioning that RDKit can be used for very fast
> similarity search (e.g. all vs. all comparisons or searches against
> multi-million sized datasets).
> If so, is this part of the standard distro, or something extra
> (chemfp?).
> And can it run inside the cartridge?
> And any benchmarks?
>
> Thanks
> Tim
>
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>