You don't need Mahout for this.

A very easy way to do this is to gather all the words for each category
into a document.  Thus:

CatA:selling buying sales payment
CatB:gathering collecting
CatC:information data info
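
If you start from a word:category lookup file like the one in your message,
these category documents can be built by inverting it. A minimal sketch in
Python; the function name build_category_docs and the tolerance for a
trailing period (as in "gathering:CatB.") are assumptions made for
illustration, not part of any library:

```python
from collections import defaultdict


def build_category_docs(lookup_lines):
    """Group the words of a word:category lookup into one word list per category."""
    docs = defaultdict(list)
    for line in lookup_lines:
        line = line.strip()
        if ":" not in line:
            continue  # skip blank or malformed lines
        word, category = line.split(":", 1)
        # Assumption: a trailing period after the category name is a typo.
        docs[category.strip(" .")].append(word.strip().lower())
    return dict(docs)


# Lookup lines taken from the original question.
lookup = ["selling:CatA", "gathering:CatB.", "information:CatC"]

for category, words in sorted(build_category_docs(lookup).items()):
    print(f"{category}:{' '.join(words)}")
```

Adding synonyms to the lookup (buying, sales, and so on) simply makes each
category document longer.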

Then put these into a text retrieval engine so that you have one document
per category.

When you get a new document to categorize, just use the document as a query
and you will get a list of possible categories back.  Make sure the default
query operator is OR (q.op=OR in Solr), so that a category matches when any
of its words appears in the document.

See http://wiki.apache.org/solr/SolrQuerySyntax for more on the syntax.
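
To illustrate the idea without a search engine, here is a rough sketch in
plain Python that mimics an OR query by counting the words shared between a
document and each category document. The names CATEGORY_DOCS, tokenize and
categorize are made up for this example, and the overlap count is only a
stand-in for the relevance score a real engine would compute:

```python
import re

# One "document" per category, as in the example above.
CATEGORY_DOCS = {
    "CatA": "selling buying sales payment",
    "CatB": "gathering collecting",
    "CatC": "information data info",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def categorize(document, category_docs=CATEGORY_DOCS):
    """Return categories sharing at least one word with the document (OR semantics)."""
    query_terms = tokenize(document)
    scores = {}
    for category, words in category_docs.items():
        overlap = query_terms & tokenize(words)
        if overlap:  # any single shared word is enough to match
            scores[category] = len(overlap)
    # Highest overlap first; ties broken by category name.
    return sorted(scores, key=lambda c: (-scores[c], c))


print(categorize("This is the first document of selling information."))
# ['CatA', 'CatC']
print(categorize("This is the second document of gathering information."))
# ['CatB', 'CatC']
```

With a real engine you would index the three category documents once and
send each incoming document as the query text instead.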



On Fri, Oct 11, 2013 at 5:04 AM, Kasi Subrahmanyam
<[email protected]> wrote:

> Hi,
>
> I have a problem that I would like to implement in Mahout clustering.
>
> I have input text documents with data like below.
>
> Document1: This is the first document of selling information.
> Document2: This is the second document of gathering information.
>
> I also have another lookup file with data like below.
> selling:CatA
> gathering:CatB.
> information:CatC
>
> Now I would like to cluster the documents, with the output generated as
> Document1:CatA,CatC
> Document2:CatB,CatC
>
> Please let me know how to achieve this.
>
> Thanks,
> Subbu
>
