Hi Sai,

The clustering in SenseClusters is performed by Cluto, which includes
quite a few clustering algorithms and even more similarity measures.
I'd actually discourage you from trying to invent your own similarity
measure: there are already so many that it's not clear you'd be able
to do anything substantially different from, or better than, what
already exists. As for manipulating cluster size, I'm not sure that
is supported directly. I think that's something you probably want to
influence indirectly through your choice of clustering algorithm,
criterion function, and similarity measure, rather than saying "I want
clusters of 100 items each", which seems like it might be cheating a
little :)
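
To make the point about similarity measures concrete, here is a
minimal Python sketch. It is not Cluto code, and the vectors and
function names are made up for illustration. It only shows that two
standard measures (cosine similarity and Euclidean distance) can
disagree about which item is "closest", which is one way the choice of
measure ends up shaping the clusters you get.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (higher = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between a and b (lower = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_by_each_measure(query, candidates):
    """Return the candidate name each measure considers closest to query."""
    by_cosine = max(candidates,
                    key=lambda name: cosine_similarity(query, candidates[name]))
    by_euclid = min(candidates,
                    key=lambda name: euclidean_distance(query, candidates[name]))
    return by_cosine, by_euclid


# Hypothetical toy vectors, e.g. term counts for two very short contexts.
QUERY = (1.0, 0.0)
CANDIDATES = {
    "long_same_direction": (10.0, 0.0),  # same direction, much larger magnitude
    "short_nearby": (1.0, 0.5),          # close in space, slightly different direction
}

if __name__ == "__main__":
    print(nearest_by_each_measure(QUERY, CANDIDATES))
```

Cosine ignores vector length, so it favors the candidate pointing in
the same direction, while Euclidean distance favors the one that is
physically nearby.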

Speaking of cheating, one of the things Cluto requires is that you
specify the number of clusters ahead of time. For many problems that
is cheating, since you don't always know the number a priori.
SenseClusters adds cluster stopping measures on top of Cluto and
predicts the number of clusters automatically for you, which we think
is a substantial improvement. In general that's the idea of
SenseClusters: to provide support for all the things that must occur
before and after the actual clustering operation.
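
As a rough illustration of what "predicting the number of clusters"
can mean, here is a small self-contained Python sketch. It does not
reproduce the SenseClusters stopping measures or call Cluto; it is a
generic elbow-style heuristic, and the toy k-means, the function names,
and the data are all assumptions made for the example. The idea is to
cluster with k = 1, 2, 3, ... and pick the k after which the criterion
(here, within-cluster sum of squared distances) stops improving much.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans_sse(points, k, iters=50):
    """Run a tiny k-means and return the within-cluster sum of squares."""
    # Deterministic farthest-point initialisation (no randomness).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(dist2(p, c) for c in centers) for p in points)


def predict_k(points, max_k=6):
    """Pick k at the sharpest 'elbow' of the criterion curve."""
    sse = {k: kmeans_sse(points, k) for k in range(1, max_k + 1)}
    # Elbow = largest drop in improvement between consecutive values of k.
    best_k = max(
        range(2, max_k),
        key=lambda k: (sse[k - 1] - sse[k]) - (sse[k] - sse[k + 1]))
    return best_k, sse


# Hypothetical toy data: three well-separated groups of four points each.
POINTS = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]

if __name__ == "__main__":
    k, _ = predict_k(POINTS)
    print("predicted number of clusters:", k)
```

A heuristic this simple can only choose among 2 to max_k - 1 clusters
and is easily fooled by noisy data, which is part of why dedicated
stopping measures are worth having.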

So, I'd suggest you check out Cluto a bit more. If it looks like it
provides the functionality you need, I think SenseClusters can add
quite a bit on top of it that will help in many sorts of text
clustering applications.

More about Cluto here:
http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview

And of course SenseClusters is here:
http://senseclusters.sourceforge.net

Cordially,
Ted

On Jan 13, 2008 11:17 AM, Sai Tang Huang <[EMAIL PROTECTED]> wrote:
>
>
> Hi there,
>
> My name is Sai Tang Huang and I'm a student at the University of Brighton
> (UK). I am working on a project involving text corpus clustering
> (to eventually achieve corpus reduction in the context of SMT Language
> Models). I need to find a clustering package that:
>
> - includes several algorithms
> - allows parameters such as cluster size to be manipulated
> - allows me to plug in my own similarity measures
>
> Having read your slides and seen your video on "Language independent methods
> of clustering similar contexts" (long and interesting), I was wondering if
> SenseClusters would be able to do this. If not, are you aware of any other
> open source software that would accomplish this?
>
> Thanks a million for your time.
>
> Sai
>
>



-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse

_______________________________________________
senseclusters-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
