You don't need Mahout for this. A very easy way to do this is to gather all the words for each category into a document. Thus:
CatA:selling buying sales payment
CatB:gathering collecting
CatC:information data info

Then put these into a text retrieval engine so that you have one document per category. When you get a new document to categorize, just use the document as a query and you will get a list of possible categories back.

Make sure you set the default query mode to OR for this. With AND, every word of the new document would have to appear in a category document, so nothing would match. See http://wiki.apache.org/solr/SolrQuerySyntax for more on the syntax.

On Fri, Oct 11, 2013 at 5:04 AM, Kasi Subrahmanyam <[email protected]> wrote:

> Hi,
>
> I have a problem that I would like to implement in Mahout clustering.
>
> I have input text documents with data like below.
>
> Document1: This is the first document of selling information.
> Document2: This is the second document of gathering information.
>
> I also have another lookup file with data like below
> selling:CatA
> gathering:CatB.
> information:CatC
>
> Now I would like to cluster the documents with output being generated as
> Document1:CatA,CatC
> Document2:CatB,CatC
>
> Please let me know how to achieve this.
>
> Thanks,
> Subbu
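
To make the one-document-per-category idea from the reply concrete, here is a minimal sketch in plain Python. It is not Solr or Mahout code: a set intersection stands in for the retrieval engine's OR query, and the names `LOOKUP`, `build_category_docs`, and `categorize` are made up for illustration. The lookup data is the three-line sample from the question; a real setup would index the category documents in an engine such as Solr and get ranked results from it instead.

```python
import re
from collections import defaultdict

# Sample lookup file from the question: one "word:Category" pair per line.
# (The stray "." after CatB is in the original sample and is stripped below.)
LOOKUP = """\
selling:CatA
gathering:CatB.
information:CatC
"""


def build_category_docs(lookup_text):
    """Gather all words of each category into one 'document' (a set of terms)."""
    docs = defaultdict(set)
    for line in lookup_text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        word, category = line.split(":", 1)
        docs[category.strip().rstrip(".")].add(word.strip().lower())
    return dict(docs)


def categorize(text, category_docs):
    """Treat the text as an OR query: a category matches if it shares any term.

    This is a stand-in for a retrieval engine; the score is just the number
    of shared terms, not a real relevance score.
    """
    query_terms = set(re.findall(r"[a-z0-9]+", text.lower()))
    scores = {
        category: len(terms & query_terms)
        for category, terms in category_docs.items()
    }
    hits = [category for category, score in scores.items() if score > 0]
    # Best-scoring categories first; ties are broken alphabetically.
    return sorted(hits, key=lambda category: (-scores[category], category))


if __name__ == "__main__":
    category_docs = build_category_docs(LOOKUP)
    documents = {
        "Document1": "This is the first document of selling information.",
        "Document2": "This is the second document of gathering information.",
    }
    for name, text in documents.items():
        print(name + ":" + ",".join(categorize(text, category_docs)))
```

Requiring only one shared term per category is what the OR query mode gives you in a real engine. Adding more words per category (buying, sales, payment and so on) only means extending the lookup data.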
