Hi Sai,

The clustering in SenseClusters is performed by Cluto, which includes quite a few clustering algorithms and even more similarity measures. I'd actually sort of discourage you from trying to invent your own similarity measure, since there are so many and it's not clear that you'd be able to do anything substantially different or better than what already exists. In terms of manipulating cluster size, I'm not sure about that. I think that's something you probably want to do indirectly through your choice of clustering algorithm, criterion function, and similarity measure, rather than saying "I want clusters of 100 items each", which seems like it might be cheating a little :)

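To make "similarity measure" concrete, here is a minimal Python sketch of two standard measures, cosine and Jaccard, applied to bag-of-words contexts. It is only an illustration under stated assumptions: the function names and the two example contexts are hypothetical, and none of this is Cluto or SenseClusters code.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    # Dot product over the terms the two contexts share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard(a: Counter, b: Counter) -> float:
    """Jaccard similarity on the sets of distinct terms (counts ignored)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Two made-up contexts of the ambiguous word "bank" (illustrative data only).
ctx1 = Counter("the bank raised interest rates".split())
ctx2 = Counter("the river bank flooded".split())

print(f"cosine  = {cosine(ctx1, ctx2):.4f}")
print(f"jaccard = {jaccard(ctx1, ctx2):.4f}")
```

The same pair of contexts scores differently under the two measures, which is one reason the choice of similarity measure (together with the algorithm and criterion function) shapes the resulting clusters indirectly.
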
Speaking of cheating, one of the things Cluto requires is that you specify the number of clusters ahead of time. For many problems that is cheating, since you don't always know that a priori. SenseClusters actually adds cluster stopping measures to Cluto, and predicts the number of clusters automatically for you, which we think is a substantial improvement. In general that's the idea of SenseClusters: to provide support for all the things that must occur before and after the actual clustering operation.

So, I'd suggest you check out Cluto a bit more. If it looks like it provides the functionality you need, I think SenseClusters can add quite a bit on top of that, which will help in many sorts of text clustering applications.

More about Cluto here:
http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview

And of course SenseClusters is here:
http://senseclusters.sourceforge.net

Cordially,
Ted

On Jan 13, 2008 11:17 AM, Sai Tang Huang <[EMAIL PROTECTED]> wrote:
> Hi there,
>
> My name is Sai Tang Huang and I'm a student at the University of Brighton
> (UK). I am doing a project that has to do with text corpus clustering
> (to eventually achieve corpus reduction in the context of SMT Language
> Models). I need to find a clustering package that:
>
> - includes several algorithms
> - allows parameters such as cluster size to be manipulated
> - allows me to plug in my own similarity measures
>
> Having read your slides and seen your video on "Language independent methods
> of clustering similar contexts" (long and interesting), I was wondering if
> SenseClusters would be able to do this. If not, are you aware of any other
> open source software that would accomplish this?
>
> Thanks a million for your time.
>
> Sai

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
_______________________________________________
senseclusters-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
