Hi RDKit-ers,

I have released chemfp 4.2. The new "simarray" functionality computes the full comparison matrix as a NumPy array, e.g., for use in some clustering algorithms. It has built-in support for Tanimoto, Dice, cosine, and Hamming comparisons, plus an option to get the individual "a", "b", "c", and "d" components should you need a specialized metric. It processes 100 million comparisons per second on my laptop, which means that if you had 30 TB of free disk space you could generate the NxN comparisons for ChEMBL in about a day. (I'm curious if someone will do this!)
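
To illustrate what a single entry of such a comparison matrix involves, here is a minimal pure-Python sketch. It is not chemfp's implementation or API; the function names are hypothetical, fingerprints are modeled as plain integers, and the a/b/c/d definitions follow one common convention (a and b are the on-bit counts of each fingerprint, c is the number of on-bits in common, d is the number of bits off in both). Check the chemfp documentation for the convention it actually uses.

```python
import math


def popcount(x: int) -> int:
    """Number of on-bits in a fingerprint stored as a Python int."""
    return bin(x).count("1")


def abcd(fp1: int, fp2: int, num_bits: int):
    """Return (a, b, c, d) under the convention assumed in this sketch."""
    a = popcount(fp1)            # on-bits in the first fingerprint
    b = popcount(fp2)            # on-bits in the second fingerprint
    c = popcount(fp1 & fp2)      # on-bits shared by both
    d = num_bits - (a + b - c)   # bits that are off in both
    return a, b, c, d


def tanimoto(a: int, b: int, c: int) -> float:
    union = a + b - c
    # Two empty fingerprints give 0/0; this sketch returns 0.0 in that case.
    return c / union if union else 0.0


def dice(a: int, b: int, c: int) -> float:
    return 2 * c / (a + b) if (a + b) else 0.0


def cosine(a: int, b: int, c: int) -> float:
    return c / math.sqrt(a * b) if (a and b) else 0.0


def hamming(a: int, b: int, c: int) -> int:
    # Number of bit positions where the two fingerprints differ.
    return a + b - 2 * c


def tanimoto_matrix(fps, num_bits: int):
    """Full NxN Tanimoto matrix as a list of lists (a NumPy array in practice)."""
    matrix = []
    for fp1 in fps:
        row = []
        for fp2 in fps:
            a, b, c, _d = abcd(fp1, fp2, num_bits)
            row.append(tanimoto(a, b, c))
        matrix.append(row)
    return matrix


# Three made-up 8-bit fingerprints, for illustration only.
fingerprints = [0b10110010, 0b10010110, 0b00000001]
matrix = tanimoto_matrix(fingerprints, num_bits=8)
```

Any metric that can be written in terms of these four counts can be built the same way, which is why access to the raw components is enough to support a specialized metric.
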
I've also updated chemfp's RDKit-Fingerprint, RDKit-Morgan, RDKit-AtomPair, and RDKit-Torsion fingerprint types to use RDKit's fingerprint generator API instead of the older function-based API. This includes support for count emulation. Some of the parameter names have changed to follow RDKit's newer convention, and the RDKit-Morgan fingerprints now default to r=3 (to match the RDKit default) rather than r=2. Chemfp still supports the older function-based API, which is used if you specify the older version number explicitly.

For a full description of what's new in this release, see https://chemfp.com/docs/whats_new_in_42.html .

If you work with binary cheminformatics fingerprints in Python, chemfp may be the package you've been looking for. Chemfp is perhaps best known for its high-performance fingerprint similarity search. Its Taylor/Butina clustering, MaxMin diversity selection, and sphere exclusion (including directed sphere exclusion) are equally world-class. Or, if you simply need a 100K by 100K distance array to pass into scikit-learn, chemfp's simarray can generate that in less than a minute.

The chemfp homepage is https://chemfp.com/ .

To install a pre-compiled chemfp for Linux-based OSes:

    python -m pip install chemfp -i https://chemfp.com/packages/

The default installation limits or disables a few chemfp features, as described in the base license agreement at https://chemfp.com/BaseLicense.txt . To request a license key, which is free for academic use, see https://chemfp.com/license/ .

Best regards,

Andrew Dalke
da...@dalkescientific.com