On May 19, 2017, at 08:33, Greg Landrum <greg.land...@gmail.com> wrote:
> The best solution to this is to use chemfp. It's a remarkable piece of
> software.

Thanks, Greg.

> If you aren't willing to license that, the RDKit's search brute-force
> fingerprint search capabilities aren't too bad for in-memory fingerprints.

To clarify, chemfp 1.1 is available at no cost from chemfp.com or PyPI, while later versions (chemfp 3.0 now supports Python 3) cost money. Both are distributed under the MIT license.

Within a month or so I'll be making a new release of the no-cost version. I'll update the fingerprint type names to reflect a change in the recent RDKit release.

On Thu, May 18, 2017 at 11:15 PM, Tim Dudgeon <tdudgeon...@gmail.com> wrote:
> I think I recall Greg mentioning that RDKit can be used for very fast
> similarity search (e.g. all vs. all comparisons or searches against
> multi-million sized datasets).

Greg's slides include a few timing numbers for single-query searches. To add some all vs. all numbers, chemfp takes about 40 minutes to cluster the 1.58 million fingerprints from ChEMBL 21 at a threshold of 0.8, using 4 threads and 2048-bit RDKit fingerprints. This functionality is available in the no-cost version. Depending on the fingerprint size and your CPU type, the newest version (3.0) is between 5% and 35% faster than chemfp 1.1.

Feel free to contact me if you have questions about chemfp.

Andrew
da...@dalkescientific.com

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
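
For readers unfamiliar with what an "all vs. all" search at a threshold of 0.8 computes, here is a minimal brute-force sketch in plain Python 3. It is an illustration only: it does not use chemfp's or RDKit's API and says nothing about how either is implemented. The function names (popcount, tanimoto, all_vs_all) and the toy 8-bit fingerprints are made up for the example, and fingerprints are modeled as Python ints used as bit sets.

```python
# Illustrative sketch only: NOT chemfp's or RDKit's API or algorithm.
# Fingerprints are modeled as non-negative Python ints used as bit sets.

def popcount(x):
    """Return the number of set bits in a non-negative int."""
    return bin(x).count("1")

def tanimoto(a, b):
    """Tanimoto similarity: popcount(a & b) / popcount(a | b).

    Defined here as 0.0 when both fingerprints have no bits set
    (a convention chosen for this sketch).
    """
    union = popcount(a | b)
    if union == 0:
        return 0.0
    return popcount(a & b) / union

def all_vs_all(fingerprints, threshold):
    """Return (i, j, score) for every pair i < j with score >= threshold."""
    hits = []
    n = len(fingerprints)
    for i in range(n):
        for j in range(i + 1, n):
            score = tanimoto(fingerprints[i], fingerprints[j])
            if score >= threshold:
                hits.append((i, j, score))
    return hits

# Toy 8-bit fingerprints (made-up data, not from ChEMBL).
fps = [0b11110000, 0b11111000, 0b00001111, 0b11110000]
hits = all_vs_all(fps, 0.8)
for i, j, score in hits:
    print(i, j, round(score, 3))
```

The double loop visits n*(n-1)/2 pairs, which for 1.58 million fingerprints is roughly 1.2 trillion comparisons. That is why a dedicated, optimized tool matters at this scale.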