On Sep 24, 2010, at 1:12 PM, Neil Ghosh wrote:

> Are there any other examples, documents, or references on how to use Mahout for
> text classification?
>
> I went through and ran the following:
>
> 1. Wikipedia Bayes Example <https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html> -
>    Classify Wikipedia data.
> 2. Twenty Newsgroups <https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html> -
>    Classify the classic Twenty Newsgroups data.
>
> However, these two are not very definitive, and there isn't much explanation
> for the examples. Please share if there is more documentation.
What kinds of problems are you looking to solve? In general, we don't have much that is specific to text, other than various utilities for converting text into Mahout's vector format using different weighting schemes. Both of those examples simply convert the text into vectors and then either train or test on them.

I would agree, though, that a good tutorial is needed. The following article is a bit out of date in terms of the actual commands, but I believe the concepts are still accurate:

http://www.ibm.com/developerworks/java/library/j-mahout/

See https://cwiki.apache.org/confluence/display/MAHOUT/Mahout+Wiki#MahoutWiki-ImplementationBackground (and the creating vectors section). Also see the Algorithms section.

--------------------------
Grant Ingersoll
http://lucenerevolution.org
Apache Lucene/Solr Conference, Boston Oct 7-8
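
To make the "convert text into vectors based on a weighting scheme" step concrete, here is a minimal sketch in plain Python. It does not use Mahout or reproduce its utilities; the function names, the toy documents, and the particular TF-IDF variant (tf * log(N / df)) are all assumptions chosen for illustration, and Mahout's own weighting may differ in detail.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(docs):
    """Map each distinct term to a fixed vector index (sorted for determinism)."""
    terms = sorted({t for doc in docs for t in tokenize(doc)})
    return {term: i for i, term in enumerate(terms)}


def document_frequencies(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def tfidf_vector(text, vocab, df, n_docs):
    """Return a sparse vector {index: weight} using tf * log(N / df).

    This is one common TF-IDF variant, assumed here for illustration only.
    Terms outside the vocabulary are ignored.
    """
    counts = Counter(t for t in tokenize(text) if t in vocab)
    return {vocab[t]: tf * math.log(n_docs / df[t]) for t, tf in counts.items()}


# Hypothetical toy corpus standing in for real training documents.
docs = [
    "the goalie stopped the puck",
    "the rocket reached orbit",
    "the puck hit the post",
]
vocab = build_vocabulary(docs)
df = document_frequencies(docs)
vectors = [tfidf_vector(d, vocab, df, len(docs)) for d in docs]
```

Under this weighting, a term that occurs in every document (here "the") gets weight zero, while a term unique to one document gets the largest weight. The resulting sparse vectors are the kind of input a classifier would then train or test on.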
