Hi all,

I am trying to tackle the most classical of cheminformatics problems: clustering based on molecular similarity. I have a few thousand molecules in a SMILES file, and I know how to compute similarities in RDKit using my fingerprint of choice. But how do I cluster the results using the toolkit? (I have found some R code for the Butina algorithm from Noel, http://www.redbrick.dcu.ie/~noel/R_clustering.html, but since this algorithm already seems to be implemented in RDKit, I would rather use that.)
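To make the question concrete, here is a minimal plain-Python sketch of a Butina-style (sphere-exclusion) clustering as I understand it. It is not RDKit code and not necessarily what the toolkit does internally. The fingerprints are toy Python sets of "on" bits, the function names (tanimoto, lower_triangle_distances, butina_sketch) are made up for illustration, and the flat lower-triangle distance layout is my own assumption about a convenient input format.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def lower_triangle_distances(fps):
    """Flatten pairwise distances (1 - similarity) in the order
    (1,0), (2,0), (2,1), (3,0), ... This layout is an assumption."""
    dists = []
    for i in range(1, len(fps)):
        for j in range(i):
            dists.append(1.0 - tanimoto(fps[i], fps[j]))
    return dists


def butina_sketch(dists, n_pts, cutoff):
    """Butina-style clustering on a flat lower-triangle distance list.

    Returns a list of tuples of point indices, centroid first.
    """
    # Build neighbour sets: i and j are neighbours if distance <= cutoff.
    neighbours = [set() for _ in range(n_pts)]
    k = 0
    for i in range(1, n_pts):
        for j in range(i):
            if dists[k] <= cutoff:
                neighbours[i].add(j)
                neighbours[j].add(i)
            k += 1

    # Visit points from most to fewest neighbours (ties broken by index).
    order = sorted(range(n_pts), key=lambda i: (-len(neighbours[i]), i))

    assigned = set()
    clusters = []
    for centre in order:
        if centre in assigned:
            continue
        # The centroid claims all of its neighbours that are still free.
        members = [centre] + sorted(neighbours[centre] - assigned)
        assigned.update(members)
        clusters.append(tuple(members))
    return clusters


# Toy "fingerprints" (hypothetical data, not real molecules).
fps = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {1, 2, 3, 4, 5},
    {10, 11, 12},
    {10, 11, 13},
    {20, 21},
]
dists = lower_triangle_distances(fps)
clusters = butina_sketch(dists, len(fps), cutoff=0.3)
print(clusters)
```

Note that the number of clusters here falls out of the distance cutoff; it cannot be fixed in advance, which is part of why I am asking about k-means-like alternatives.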
I can see that there is some clustering code in rdkit.ML.Cluster, but I can hardly find any examples or documentation. For instance, what should the "Data" parameter of ClusterData(...) look like (http://www.rdkit.org/docs/api/rdkit.ML.Cluster.Butina-module.html)? Is there a recommended algorithm? Is it possible to generate exactly n clusters (as with k-means)? Can someone offer a brief overview, perhaps something that could be cut and pasted into a wiki page on the Google Code site?

Many thanks,
Jean-Paul Ebejer
Early Stage Researcher

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss