Re: Word and Phrase Clustering

Jeff Eastman Thu, 01 Dec 2011 20:30:00 -0800

Could you elaborate a bit on what you mean by "cluster a collection of words and phrases by syntactic similarity over a distributed environment"? If you can describe your collection in terms of a set of (sparse or dense) term vectors then you should be able to use Mahout clustering directly. The vectors do not need to be huge (as "document" might imply), indeed smaller dimensionality clusterings work better than large ones. The question would be how do you plan to encode these vectors? Another would be how large a collection you have?
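
To make the encoding question concrete, here is a minimal sketch of one possible scheme. It is purely an assumption on my part, not something Mahout prescribes: each word or phrase becomes a small sparse vector of hashed character trigram counts. The function names, the 1024-dimension size and the choice of trigrams are all hypothetical, and whether this captures the similarity you are after depends on what you mean by "syntactic".

```python
import math
import zlib
from collections import Counter


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams, padded with '#'."""
    padded = "#" * (n - 1) + text.lower() + "#" * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def encode(text, dims=1024, n=3):
    """Return a sparse vector {index: count} of hashed character n-grams.

    dims is a hypothetical, deliberately small dimensionality; distinct
    n-grams can collide on the same index.
    """
    vec = Counter()
    for gram in char_ngrams(text, n):
        # crc32 is stable across runs, unlike Python's built-in hash().
        vec[zlib.crc32(gram.encode("utf-8")) % dims] += 1
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    for phrase in ["clustering", "clusters", "term vector"]:
        print(phrase, cosine(encode("clustering"), encode(phrase)))
```

Each index/count pair corresponds to one non-zero entry of a sparse term vector, so a collection encoded this way is in the shape the clustering jobs expect once it is written out in Mahout's vector format.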


On 12/1/11 8:08 PM, Neil Chaudhuri wrote:

I have a need to cluster a collection of words and phrases by syntactic 
similarity over a distributed environment, and I came upon Mahout as a possible 
solution. After studying the documentation though, I am finding all of it 
tailored to working with entire documents rather than words and phrases. I 
simply want to know if you believe that Mahout is the right tool for this job. 
I suppose I could try to view each word and phrase as individual tiny 
documents, but that feels like I am forcing it.


Any insight is appreciated.

Thanks.
