On Mon, Feb 23, 2015 at 4:24 PM, Patrick Walters <wpwalt...@gmail.com>
wrote:
> I agree that there are plenty of implementations of clustering, machine
> learning, etc. It would be better for the RDKit developers to focus on
> cheminformatics. That said, there are some opportunities for domain-specific
> performance enhancement. One of the slow steps in many clustering
> algorithms is the calculation of a distance matrix and identification of
> neighbors. If you're clustering fingerprints, I'd recommend looking at Andrew
> Dalke's ChemFP <http://chemfp.com/>. Andrew has applied a multitude of
> tricks that can make clustering blazingly fast. The ChemFP examples
> include an implementation of Taylor-Butina clustering. Even better, ChemFP
> works "out of the box" with the RDKit.
>
Actually, coupling chemfp to a hierarchical clustering algorithm like the
"Murtagh" code the RDKit includes would be pretty cool...
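
For anyone following along, here is a rough, dependency-free sketch of the
two steps Patrick mentions: building the neighbor lists (the slow O(N^2)
part) and then doing Taylor-Butina clustering on top of them. This is not
chemfp's or the RDKit's code. The function names, the toy fingerprints
(plain Python sets of on-bit indices), and the 0.5 threshold are all made up
for illustration, and tie-breaking between equally sized neighbor lists is
an arbitrary choice here.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def neighbor_lists(fps, threshold):
    """The expensive step: compare every pair and record neighbors
    whose similarity is at or above the threshold."""
    neighbors = [set() for _ in fps]
    for i, j in combinations(range(len(fps)), 2):
        if tanimoto(fps[i], fps[j]) >= threshold:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def taylor_butina(fps, threshold=0.7):
    """Return a list of (centroid_index, member_indices) tuples."""
    neighbors = neighbor_lists(fps, threshold)
    # Visit candidates from most to fewest neighbors; ties go to the lower index
    # (an assumption of this sketch, not a rule of the algorithm).
    order = sorted(range(len(fps)), key=lambda i: (-len(neighbors[i]), i))
    assigned = set()
    clusters = []
    for centroid in order:
        if centroid in assigned:
            continue
        # Only neighbors that are still unassigned join this cluster.
        members = sorted(neighbors[centroid] - assigned)
        clusters.append((centroid, members))
        assigned.add(centroid)
        assigned.update(members)
    return clusters


# Hypothetical toy data: six fingerprints as sets of on-bit indices.
fps = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {1, 2, 3, 4, 5},
    {10, 11, 12},
    {10, 11, 13},
    {20, 21},
]
clusters = taylor_butina(fps, threshold=0.5)
print(clusters)
```

In practice the pure-Python neighbor_lists() above is exactly the part you
would want to hand off to an optimized similarity search; the clustering
loop itself is cheap once the neighbor lists exist.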
-greg
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss