Hi, I am working on text mining over a huge dataset. I have a big set of strings (separated by newline characters), and I want to run an algorithm that gives me similarity distances between the strings. I then want to use those distances to group the strings by similarity. I am new to Mahout, but I believe that for the size of data I have, Mahout could be a good option. Can anyone guide me on how to proceed with this problem?
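The problem has two steps: (1) compute a distance between pairs of strings, and (2) group strings whose distance is small. Here is a minimal single-machine sketch of both steps, useful for prototyping on a small sample before scaling up. It is not Mahout code. The choices are assumptions for illustration only: Jaccard distance over character trigrams as the distance measure, and a fixed distance threshold with union-find (single-linkage style) grouping. The pairwise loop is O(n²), which is exactly why a distributed tool is needed for the full dataset.

```python
def char_ngrams(s: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of s (the whole string if shorter than n)."""
    if len(s) < n:
        return {s}
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard_distance(a: str, b: str, n: int = 3) -> float:
    """1 - |A & B| / |A | B| over character n-gram sets; 0.0 means identical n-gram sets."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return 1.0 - len(ga & gb) / len(ga | gb)


def group_strings(strings: list[str], threshold: float = 0.5, n: int = 3) -> list[list[str]]:
    """Put two strings in the same group if they are linked by a chain of
    pairs whose distance is <= threshold (hypothetical threshold; tune it)."""
    parent = list(range(len(strings)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once: O(n^2), only suitable for a small sample.
    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            if jaccard_distance(strings[i], strings[j], n) <= threshold:
                parent[find(i)] = find(j)

    # Collect strings by their root to form the groups.
    groups: dict[int, list[str]] = {}
    for i, s in enumerate(strings):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


if __name__ == "__main__":
    lines = ["apple pie", "banana split", "apple pies"]
    print(group_strings(lines))
```

For example, "apple pie" and "apple pies" share 7 of 8 distinct trigrams (distance 0.125), so they end up in one group, while "banana split" shares none with them and stays on its own.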
Thanks for your help!! Regards, Atul Aggarwal
