What a nice idea :-) I really like that approach.

2013/10/11 Ted Dunning <[email protected]>

> You don't need Mahout for this.
>
> A very easy way to do this is to gather all the words for each category
> into a document. Thus:
>
> CatA:selling buying sales payment
> CatB:gathering collecting
> CatC:information data info
>
> Then put these into a text retrieval engine so that you have one document
> per category.
>
> When you get a new document to categorize, just use the document as a query
> and you will get a list of possible categories back. Make sure you set the
> default query mode to OR for this.
>
> See http://wiki.apache.org/solr/SolrQuerySyntax for more on the syntax.
>
>
>
> On Fri, Oct 11, 2013 at 5:04 AM, Kasi Subrahmanyam
> <[email protected]> wrote:
>
> > Hi,
> >
> > I have a problem that I would like to implement in Mahout clustering.
> >
> > I have input text documents with data like below.
> >
> > Document1: This is the first document of selling information.
> > Document2: This is the second document of gathering information.
> >
> > I also have another lookup file with data like below
> > selling:CatA
> > gathering:CatB.
> > information:CatC
> >
> > Now I would like to cluster the documents with the output being generated as
> > Document1:CatA,CatC
> > Document2:CatB,CatC
> >
> > Please let me know how to achieve this.
> >
> > Thanks,
> > Subbu
> >
>
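
For anyone who wants to try the idea before setting up a retrieval engine, here is a rough sketch in plain Python. It is not Solr and not what Ted's setup would literally run: it just mimics "one document per category, query with OR" using a tiny in-memory inverted index. The function names (tokenize, build_index, categorize) are made up for this illustration, and ranking by number of matching terms is an assumption standing in for a real engine's relevance score.

```python
import re
from collections import defaultdict

# One "document" per category, taken from the example in the thread.
CATEGORY_DOCS = {
    "CatA": "selling buying sales payment",
    "CatB": "gathering collecting",
    "CatC": "information data info",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(category_docs):
    """Build an inverted index mapping each term to the categories containing it."""
    index = defaultdict(set)
    for category, words in category_docs.items():
        for term in tokenize(words):
            index[term].add(category)
    return index


def categorize(text, index):
    """Use the whole text as an OR query: a category matches if ANY term hits.

    Categories are ordered by number of distinct matching terms (a crude,
    assumed stand-in for a search engine's score), then by name.
    """
    hits = defaultdict(int)
    for term in set(tokenize(text)):
        for category in index.get(term, ()):
            hits[category] += 1
    return sorted(hits, key=lambda c: (-hits[c], c))


INDEX = build_index(CATEGORY_DOCS)

doc1 = "This is the first document of selling information."
doc2 = "This is the second document of gathering information."

print("Document1:" + ",".join(categorize(doc1, INDEX)))  # Document1:CatA,CatC
print("Document2:" + ",".join(categorize(doc2, INDEX)))  # Document2:CatB,CatC
```

With a real engine you would index the three category documents once and send each incoming document as the query text, with the default operator set to OR so that a single matching word is enough to return a category.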
