On Mon, Feb 23, 2015 at 4:24 PM, Patrick Walters <wpwalt...@gmail.com>
wrote:

> I agree that there are plenty of implementations of clustering, machine
> learning, etc. It would be better for the RDKit developers to focus on
> cheminformatics. This being said, there are some opportunities for
> domain-specific performance enhancement. One of the slow steps in many
> clustering algorithms is the calculation of a distance matrix and
> identification of neighbors. If you're clustering fingerprints, I'd
> recommend looking at Andrew Dalke's ChemFP <http://chemfp.com/>. Andrew
> has applied a multitude of tricks that can make clustering blazingly fast.
> The ChemFP examples include an implementation of Taylor-Butina clustering.
> Even better, ChemFP works "out of the box" with the RDKit.
>

Actually, coupling chemfp to a hierarchical clustering algorithm like the
"Murtagh" code the RDKit includes would be pretty cool...
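
To make the "slow step" concrete, here is a rough pure-Python sketch of
Taylor-Butina clustering. It is neither chemfp's nor the RDKit's
implementation: it assumes fingerprints are stored as plain Python ints used
as bit sets, the names tanimoto and butina_clusters are made up for this
illustration, and the 0.7 threshold is arbitrary. The nested loop that builds
the neighbor lists is the O(n^2) part that a fast similarity search would
replace.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as non-negative ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return bin(a & b).count("1") / union


def butina_clusters(fps, threshold=0.7):
    """Return a list of (centroid_index, member_indices) pairs."""
    n = len(fps)

    # Slow step: all-pairs comparison to find neighbors within the threshold.
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if tanimoto(fps[i], fps[j]) >= threshold:
                neighbors[i].add(j)
                neighbors[j].add(i)

    # Visit candidates from most to fewest neighbors (ties broken by index).
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))

    assigned = set()
    clusters = []
    for i in order:
        if i in assigned:
            continue
        # The centroid claims every neighbor not already taken by a cluster.
        members = sorted(neighbors[i] - assigned)
        assigned.add(i)
        assigned.update(members)
        clusters.append((i, members))
    return clusters
```

Calling butina_clusters(list_of_int_fingerprints, threshold=0.7) returns one
entry per cluster, with singletons showing up as a centroid and an empty
member list. The neighbor lists are the only thing the clustering loop needs,
so that is the piece to hand off to a faster search.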

-greg
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss