I need to cluster a collection of words and phrases by syntactic similarity in a distributed environment, and I came across Mahout as a possible solution. After studying the documentation, though, I find that all of it is tailored to working with entire documents rather than individual words and phrases. I simply want to know whether you think Mahout is the right tool for this job. I suppose I could treat each word or phrase as a tiny document of its own, but that feels like I am forcing it.
Any insight is appreciated. Thanks.
