Hi all,

I am trying to tackle the most classical of cheminformatics problems -
clustering based on molecule similarity.
I have a few thousand molecules in a SMILES file, and I know how to
compute pairwise similarities with RDKit using my fingerprint of choice.
But how do I cluster the results using the toolkit? I have found
some R code for Butina clustering from Noel
(http://www.redbrick.dcu.ie/~noel/R_clustering.html), but since
this algorithm seems to be implemented in RDKit already, I would
rather use that.
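
For reference, here is a minimal pure-Python sketch of the Taylor-Butina scheme as I understand it. It is not RDKit's code. Fingerprints are modelled as Python sets of "on" bit indices, which is an assumption standing in for RDKit bit vectors, and the function names (tanimoto, lower_triangle_distances, butina) are hypothetical. As I read the API docs, RDKit's Butina.ClusterData(data, nPts, distThresh, isDistData=True) expects the same flat lower-triangle distance list that lower_triangle_distances builds. In RDKit that list would typically be filled from DataStructs.BulkTanimotoSimilarity.

```python
# Sketch of Taylor-Butina clustering in pure Python (no RDKit needed).
# Assumption: a fingerprint is a set of "on" bit indices.


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty prints match
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def lower_triangle_distances(fps):
    """Flattened lower triangle of the distance matrix, row by row:
    d(1,0), d(2,0), d(2,1), d(3,0), ... where distance = 1 - similarity."""
    dists = []
    for i in range(1, len(fps)):
        for j in range(i):
            dists.append(1.0 - tanimoto(fps[i], fps[j]))
    return dists


def butina(dists, n_pts, threshold):
    """Cluster n_pts items from a flattened lower-triangle distance list.

    Items within `threshold` of each other are neighbours. Items are visited
    in order of decreasing neighbour count (counted once, up front); each
    unassigned item becomes a centroid and claims its unassigned neighbours.
    Returns a tuple of clusters, each a tuple whose first entry is the centroid.
    """
    neighbours = [set() for _ in range(n_pts)]
    k = 0
    for i in range(1, n_pts):
        for j in range(i):
            if dists[k] <= threshold:
                neighbours[i].add(j)
                neighbours[j].add(i)
            k += 1
    # Most neighbours first; ties broken by lowest index (arbitrary choice).
    order = sorted(range(n_pts), key=lambda i: (-len(neighbours[i]), i))
    assigned = set()
    clusters = []
    for idx in order:
        if idx in assigned:
            continue
        members = [idx] + sorted(neighbours[idx] - assigned)
        assigned.update(members)
        clusters.append(tuple(members))
    return tuple(clusters)


if __name__ == "__main__":
    fps = [{1, 2, 3}, {1, 2, 3, 4}, {10, 11}, {10, 11, 12}, {20}]
    dists = lower_triangle_distances(fps)
    print(butina(dists, len(fps), 0.4))  # -> ((0, 1), (2, 3), (4,))
```

The threshold of 0.4 is only an illustrative value. With real fingerprints, the distance cutoff is what controls how coarse the clusters are.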

I can see that there is some clustering code in rdkit.ML.Cluster,
but I can hardly find any examples or documentation. For instance,
what should the "Data" argument of ClusterData(...) look like
(http://www.rdkit.org/docs/api/rdkit.ML.Cluster.Butina-module.html)?
Is there a recommended algorithm? Is it possible to generate exactly
n clusters (as with k-means)?
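
Butina clustering takes a distance threshold rather than a cluster count, so the number of clusters it returns is determined by the data. One way to get exactly n clusters is agglomerative hierarchical clustering that stops merging once n clusters remain. This is a different technique from Butina, swapped in here. Below is a minimal pure-Python sketch with average linkage. The function name agglomerate is hypothetical, and the input is assumed to be a full symmetric distance matrix (for example, 1 - Tanimoto).

```python
# Sketch: average-linkage agglomerative clustering stopped at n clusters.
# Assumption: `dist` is a full symmetric matrix (list of lists) of distances.


def agglomerate(dist, n_clusters):
    """Repeatedly merge the two closest clusters (average linkage) until
    exactly `n_clusters` remain. Returns a sorted list of index tuples."""
    n = len(dist)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and the number of items")
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        # Average linkage: mean pairwise distance between the two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(tuple(sorted(c)) for c in clusters)


if __name__ == "__main__":
    pos = [0, 1, 10, 11, 30]  # toy 1-D points standing in for molecules
    dist = [[abs(a - b) for b in pos] for a in pos]
    print(agglomerate(dist, 3))  # -> [(0, 1), (2, 3), (4,)]
```

This naive loop is roughly cubic in the number of items, so it only illustrates the idea. For a few thousand molecules, an optimised implementation such as scipy.cluster.hierarchy.linkage followed by fcluster with criterion="maxclust" would be more practical. Note that maxclust guarantees at most n clusters, not exactly n.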

Can someone offer a brief overview?
Perhaps something to cut and paste into a wiki page on the Google Code site?

Many thanks

-
Jean-Paul Ebejer
Early Stage Researcher

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
