On May 19, 2017, at 08:33, Greg Landrum <greg.land...@gmail.com> wrote:
> The best solution to this is to use chemfp. It's a remarkable piece of 
> software.

Thanks, Greg.

> If you aren't willing to license that, the RDKit's brute-force 
> fingerprint search capabilities aren't too bad for in-memory fingerprints.

To clarify, chemfp 1.1 is available at no cost from chemfp.com or PyPI, while 
later versions (chemfp 3.0 now supports Python 3) cost money. Both are 
distributed under the MIT license.

Within a month or so I'll be making a new release of the no-cost version. I'll 
update the fingerprint type names, to reflect a change in the recent RDKit 

On Thu, May 18, 2017 at 11:15 PM, Tim Dudgeon <tdudgeon...@gmail.com> wrote:
> I think I recall Greg mentioning that RDKit can be used for very fast
> similarity search (e.g. all vs. all comparisons or searches against
> multi-million sized datasets).

Greg's slides include a few timing numbers for single-query searches. To add 
some all vs. all numbers, chemfp takes about 40 minutes to cluster the 1.58 
million fingerprints from ChEMBL 21 at a threshold of 0.8, using 4 threads and 
2048-bit RDKit fingerprints.
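
As a rough illustration of what an all vs. all threshold search computes, here 
is a minimal pure-Python sketch. It is not chemfp's API or its algorithm; the 
function names (tanimoto, all_pairs_above) and the toy 5-bit fingerprints are 
made up for the example, and fingerprints are assumed to be stored as Python 
integers used as bitsets. It is the naive O(n^2) approach, far too slow for 
millions of fingerprints, and it only finds the similar pairs that a 
clustering step would then consume.

```python
from itertools import combinations


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as int bitsets."""
    union = bin(a | b).count("1")
    if union == 0:
        # Convention assumed here: two empty fingerprints score 0.0.
        return 0.0
    return bin(a & b).count("1") / union


def all_pairs_above(fps, threshold=0.8):
    """Return (i, j, score) for every pair i < j with score >= threshold."""
    hits = []
    for (i, a), (j, b) in combinations(enumerate(fps), 2):
        score = tanimoto(a, b)
        if score >= threshold:
            hits.append((i, j, score))
    return hits


# Toy 5-bit fingerprints (hypothetical data, not from ChEMBL).
fps = [0b11111, 0b11110, 0b00011, 0b11111]
hits = all_pairs_above(fps, threshold=0.8)
for i, j, score in hits:
    print(i, j, round(score, 2))
```

A production search avoids most of these comparisons and uses hardware 
popcount instructions, which is where the timings above come from.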

This functionality is available in the no-cost version. Depending on the 
fingerprint size and your CPU type, the newest version (3.0) is between 5% and 
35% faster than chemfp 1.1.

Feel free to contact me if you have questions about chemfp.

