Grant Ingersoll <gsingers <at> apache.org> writes:

> I've tried various open source tools (Gephi, others), but haven't found one
> yet that can handle large volumes of points in an efficient way. FWIW, the
> Carrot2 workbench is BSD, perhaps it could be used with some work?
>
> That being said, I did recently add the ability to ClusterDumper to output
> in CSV or GraphML, as well as make it pluggable so one can output whatever
> format you wish.

Grant,
That was quick! Unfortunately, I don't have a lot of experience with these tools, but my current tool chain (solr -> mahout lucene.vector -> mahout canopy -> mahout kmeans -> mahout clusterdump) is reporting that exactly one cluster got written, and the CSV file created by clusterdump is zero length. I'm also seeing no variation in the results despite changing the k-means distance measure and the canopy t1 and t2 parameters.

I'm currently running Dirichlet on my 8k-document data set in an attempt to better understand the structure of my data. Any advice?
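For reference, the sequence of commands I'm running looks roughly like the sketch below. The paths, field names, and parameter values are placeholders rather than my real settings, and some flag names may differ slightly between Mahout versions, so treat this as illustrative only:

  # 1. pull term vectors out of the Solr/Lucene index
  mahout lucene.vector \
    --dir /path/to/solr/data/index \
    --output vectors/part-out.vec \
    --field body --idField id \
    --dictOut vectors/dict.out \
    --norm 2

  # 2. seed initial centroids with canopy (t1 must be > t2)
  mahout canopy \
    -i vectors/part-out.vec \
    -o canopy-out \
    -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
    -t1 0.5 -t2 0.3 -ow

  # 3. run k-means starting from the canopy centroids
  mahout kmeans \
    -i vectors/part-out.vec \
    -c canopy-out/clusters-0 \
    -o kmeans-out \
    -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
    -cd 0.01 -x 20 -cl -ow

  # 4. dump the clusters as CSV
  #    (-s points at the directory for the last k-means iteration;
  #     the CSV output format is the option you mentioned adding,
  #     so the flag name here is a guess on my part)
  mahout clusterdump \
    -s kmeans-out/clusters-10 \
    -d vectors/dict.out -dt text \
    -p kmeans-out/clusteredPoints \
    -of CSV \
    -o clusters.csv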
Thanks,
Mark