Hi Siddharth,

It is not clear from your question whether you want to cluster keywords
or documents.
If you are looking for document clustering, the approach is
straightforward: it uses term/keyword weights, and Mahout in Action
(among other sources) documents how to do it. In the vectorization step
you convert your input data into vectors that Mahout understands.
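
To give a rough idea of what term weighting produces, here is a minimal
sketch in plain Python. It is not Mahout's API; the function name
tfidf_vectors and the three sample documents are made up for
illustration, and it assumes every document is non-empty and already
tokenized.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into TF-IDF weighted vectors.

    Returns (vocabulary, vectors); each vector is a list of weights
    aligned with the sorted vocabulary. Assumes non-empty documents.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([
            (tf[term] / len(doc)) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, vectors

# Hypothetical sample input: three tiny, already-tokenized documents.
docs = [
    "mahout clusters documents".split(),
    "kmeans clusters vectors".split(),
    "mahout reads vectors".split(),
]
vocab, vectors = tfidf_vectors(docs)
```

Each document becomes one vector with one weight per vocabulary term.
Terms that occur in fewer documents get a higher weight, and terms
absent from a document get 0.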

If you are looking for keyword clustering, you first need to identify
features that help group keywords, for example whether a keyword is a
NOUN, VERB, or ADJECTIVE, or which synonyms are associated with it,
depending on your requirements. After feature selection, create one
vector per keyword holding the values of all the features you
identified. Finally, pass these vectors to the K-Means clustering
algorithm to get keyword clusters. Mahout in Action has more detailed
documentation on clustering documents.
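
The keyword steps (pick features, build one vector per keyword, run
K-Means) might look roughly like this in plain Python. This is only a
sketch and not Mahout code: the keywords, their hand-assigned
part-of-speech labels, and the helper names to_vector and kmeans are
all assumptions for illustration. It needs Python 3.8+ for math.dist.

```python
import math

# Hypothetical, hand-labelled feature: part of speech per keyword.
# In practice a POS tagger or thesaurus would supply these values.
POS = ["NOUN", "VERB", "ADJECTIVE"]
keywords = {
    "server": "NOUN",
    "install": "VERB",
    "database": "NOUN",
    "configure": "VERB",
    "fast": "ADJECTIVE",
    "reliable": "ADJECTIVE",
}

def to_vector(pos):
    """One-hot encode the part of speech as a feature vector."""
    return [1.0 if pos == p else 0.0 for p in POS]

def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means, seeded with the first k distinct points."""
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [
            min(range(len(centroids)),
                key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assignment

names = list(keywords)
vectors = [to_vector(keywords[name]) for name in names]
labels = kmeans(vectors, k=3)

# Group keyword names by their cluster label.
clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
```

With only a part-of-speech feature the clusters simply follow the
labels; adding further features (synonym groups, for instance) as extra
vector components is what makes the grouping more useful.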

Best,
Mahesh Balija,
CalsoftLabs.

On Sat, May 19, 2012 at 1:16 PM, siddharth0ece <[email protected]> wrote:

> Friends,
>
> I have a .txt file with so many keywords, it is in normal notepad text
> format. I wanted to use Mahout Kmeans to cluster similar type of keywords
> together. Can you please help on how to go about this, I have been doing lot
> of search but have no idea how to do it. Please help me urgently, how shall
> I convert this text file in mahout friendly format and go about this.
>
> I will be highly thankful for your help.
>
> Regards
> Siddharth
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-make-normal-Text-suitable-for-Kmeans-using-mahout-tp3984839.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>
