Hi RDKit-ers,

I have released chemfp 4.2. The new "simarray" functionality computes the 
full comparison matrix as a NumPy array, e.g., for use in some clustering 
algorithms. It has built-in support for Tanimoto, Dice, cosine, and Hamming 
comparisons, plus an option to get the individual "a", "b", "c", and "d" 
components should you need a specialized metric. It processes 100M comparisons 
per second on my laptop, which means if you had 30 TB of free disk space you 
could generate the NxN comparisons for ChEMBL in about a day. (I'm curious if 
someone will do this!)
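
For readers who have not met the "a", "b", "c", and "d" components before, 
here is a minimal pure-Python sketch of how they relate to the four built-in 
comparisons. It does not use chemfp or its API. The function names are 
hypothetical, fingerprints are modelled as plain Python integers, and the 
assignment of letters to counts is an assumption, since the convention differs 
between authors.

```python
from math import sqrt


def popcount(x: int) -> int:
    """Number of on-bits in a fingerprint stored as a Python int."""
    return bin(x).count("1")


def abcd(fp1: int, fp2: int, num_bits: int):
    """Return the four counts for one pair of fingerprints.

    Assumed convention (others exist):
      a = on-bits in fp1, b = on-bits in fp2,
      c = on-bits in common, d = bits that are off in both.
    """
    a = popcount(fp1)
    b = popcount(fp2)
    c = popcount(fp1 & fp2)
    d = num_bits - (a + b - c)
    return a, b, c, d


def tanimoto(a, b, c):
    return c / (a + b - c) if (a + b - c) else 0.0


def dice(a, b, c):
    return 2 * c / (a + b) if (a + b) else 0.0


def cosine(a, b, c):
    return c / sqrt(a * b) if (a * b) else 0.0


def hamming(a, b, c):
    return a + b - 2 * c


def full_matrix(fps, num_bits, metric=tanimoto):
    """Naive NxN comparison matrix as a list of lists."""
    rows = []
    for fp1 in fps:
        row = []
        for fp2 in fps:
            a, b, c, _d = abcd(fp1, fp2, num_bits)
            row.append(metric(a, b, c))
        rows.append(row)
    return rows


# Three toy 4-bit fingerprints.
fps = [0b1011, 0b0011, 0b1100]
matrix = full_matrix(fps, num_bits=4)
for row in matrix:
    print(["%.3f" % value for value in row])
```

A specialized metric is then just another function of these counts, which is 
presumably why having the individual components available is useful.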

I've also updated chemfp's RDKit-Fingerprint, RDKit-Morgan, RDKit-AtomPair, and 
RDKit-Torsion fingerprint types to use RDKit's fingerprint generator API, 
instead of the older function-based API. This includes support for count 
simulation. Some of the parameter names have changed to follow RDKit's newer 
convention, and the RDKit-Morgan fingerprints now default to r=3 (to match the 
RDKit default) rather than r=2.

Chemfp still supports the older function-based API, which is used if you 
specify the older version number explicitly.
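
As a rough illustration of the generator-style API in RDKit itself (this is 
not chemfp code, and it says nothing about how chemfp names its fingerprint 
types or parameters), a Morgan generator might be used as sketched below. The 
molecule and the fpSize value are arbitrary choices for the example.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

# Phenol, chosen only as a small example molecule.
mol = Chem.MolFromSmiles("c1ccccc1O")

# Generator-style API: configure a generator once, then reuse it for
# many molecules. (The older function-based style was a single call such
# as AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048).)
morgan_gen = rdFingerprintGenerator.GetMorganGenerator(radius=3, fpSize=2048)
fp = morgan_gen.GetFingerprint(mol)

# Count simulation encodes how often a feature occurs by setting
# additional bits in the binary fingerprint.
count_gen = rdFingerprintGenerator.GetMorganGenerator(
    radius=3, fpSize=2048, countSimulation=True
)
count_fp = count_gen.GetFingerprint(mol)

print(fp.GetNumBits(), fp.GetNumOnBits(), count_fp.GetNumOnBits())
```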

For a full description of what's new in this release, see 
https://chemfp.com/docs/whats_new_in_42.html .

If you work with binary cheminformatics fingerprints in Python, chemfp may be 
the package you’ve been looking for. Chemfp is perhaps best known for its 
high-performance fingerprint similarity search. Its Taylor/Butina clustering, 
MaxMin diversity selection, and sphere exclusion (including directed sphere 
exclusion) are equally world-class. Or, if you simply need a 100K by 100K 
distance array to pass into scikit-learn, chemfp’s simarray can generate that 
in less than a minute.

The chemfp homepage is https://chemfp.com/ . To install a pre-compiled chemfp 
for Linux-based OSes:

  python -m pip install chemfp -i https://chemfp.com/packages/

The default installation limits or disables a few chemfp features as described 
in the base license agreement at https://chemfp.com/BaseLicense.txt . To 
request a license key, which is free for academic use, see 
https://chemfp.com/license/ .

Best regards,

                                Andrew Dalke
                                da...@dalkescientific.com



_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss