Re: Word and Phrase Clustering

Ted Dunning Thu, 01 Dec 2011 22:15:57 -0800

It depends on what you mean by syntactic similarity.  If you mean how the
words are used, then you can build a document for each word by collecting
a reasonably sized sample of all the words that appear near it.  These
documents of neighboring words can then be clustered like any other
documents, and that should give you reasonable usage clusters.
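A minimal sketch of that idea in plain Python, not Mahout code.  It assumes
whitespace-tokenized sentences, a fixed neighbor window, and "sample" taken
as the first max_occurrences occurrences of each word.  For brevity it swaps
in a simple single-pass threshold (leader) clustering over cosine similarity
where you would more likely vectorize the context documents and run one of
Mahout's clustering algorithms, such as k-means.  All function names here
are hypothetical.

```python
from collections import Counter, defaultdict
import math


def context_documents(sentences, window=2, max_occurrences=None):
    """Map each word to a bag (Counter) of the words seen within `window`
    positions of it. If max_occurrences is set, only that many occurrences
    of each word contribute (a crude stand-in for sampling)."""
    contexts = defaultdict(Counter)
    seen = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            if max_occurrences is not None and seen[word] >= max_occurrences:
                continue
            seen[word] += 1
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    contexts[word][tokens[j]] += 1
    return contexts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(contexts, threshold=0.8):
    """Leader clustering: each word joins the first cluster whose seed
    word's context document is at least `threshold` cosine-similar to its
    own; otherwise it seeds a new cluster."""
    clusters = []  # list of (seed context, member words)
    for word in sorted(contexts):
        for seed, members in clusters:
            if cosine(contexts[word], seed) >= threshold:
                members.append(word)
                break
        else:
            clusters.append((contexts[word], [word]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat".split(),
        "the dog sat on the rug".split(),
        "a cat ran to the door".split(),
        "a dog ran to the gate".split(),
    ]
    for members in cluster(context_documents(corpus, window=1)):
        print(members)
```

On this toy corpus with window=1, "cat" and "dog" end up together, as do
"door", "gate", "mat" and "rug", because they occur next to the same
neighbors.  On real text you would want far more data, and probably
weighting (e.g. dropping very common neighbors) before clustering.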


If you mean similarity in the internal structure of the words themselves,
then you need to do something a bit different.

On Thu, Dec 1, 2011 at 7:08 PM, Neil Chaudhuri <[email protected]> wrote:

> I have a need to cluster a collection of words and phrases by syntactic
> similarity over a distributed environment, and I came upon Mahout as a
> possible solution. After studying the documentation though, I am finding
> all of it tailored to working with entire documents rather than words and
> phrases. I simply want to know if you believe that Mahout is the right tool
> for this job. I suppose I could try to view each word and phrase as
> individual tiny documents, but that feels like I am forcing it.
>
> Any insight is appreciated.
>
> Thanks.
>
