It depends on what you mean by syntactic similarity. If you mean how the words are used, then you can build a document for each word by collecting a reasonably sized sample of all the words that appear near it. These documents of neighboring words can then be clustered like ordinary documents and should give you reasonable usage clusters.
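
To make the first option concrete, here is a minimal sketch of how the per-word "documents" could be built before handing them to a document clusterer. It is plain Python rather than Mahout code, and the function name `build_context_documents`, the `window` parameter, and the toy corpus are all assumptions made for illustration. Whitespace tokenization is also an assumption; a real corpus would likely need a proper tokenizer and some sampling to keep frequent words manageable.

```python
from collections import defaultdict


def build_context_documents(sentences, window=2):
    """Map each word to the list of words seen within `window` positions of it.

    Hypothetical helper: each value is the bag of neighboring words that
    would serve as that word's pseudo-document.
    """
    contexts = defaultdict(list)
    for sentence in sentences:
        # Assumption: simple lowercase whitespace tokenization.
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Neighbors on the left and right, excluding the word itself.
            contexts[word].extend(tokens[lo:i] + tokens[i + 1:hi])
    return dict(contexts)


# Toy corpus, invented for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]
docs = build_context_documents(corpus, window=1)
```

Each entry in `docs` could then be written out as one small text document named after its word and run through the same vectorization and clustering steps used for ordinary documents. Words used in similar contexts, such as "cat" and "dog" above, end up with similar pseudo-documents.
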
If you mean by internal structure, then you need to do something a bit different.

On Thu, Dec 1, 2011 at 7:08 PM, Neil Chaudhuri <[email protected]> wrote:

> I have a need to cluster a collection of words and phrases by syntactic
> similarity over a distributed environment, and I came upon Mahout as a
> possible solution. After studying the documentation though, I am finding
> all of it tailored to working with entire documents rather than words and
> phrases. I simply want to know if you believe that Mahout is the right tool
> for this job. I suppose I could try to view each word and phrase as
> individual tiny documents, but that feels like I am forcing it.
>
> Any insight is appreciated.
>
> Thanks.
