Re: Text Classification using Mahout

Grant Ingersoll Mon, 27 Sep 2010 05:42:45 -0700

On Sep 24, 2010, at 1:12 PM, Neil Ghosh wrote:

> Are there any other examples, documents, or references on how to use
> Mahout for text classification?
>
> I went through and ran the following
> 
> 
>   1. Wikipedia Bayes Example
>      <https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html> -
>      Classify Wikipedia data.
>
>   2. Twenty Newsgroups
>      <https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html> -
>      Classify the classic Twenty Newsgroups data.
> 
> However, these two are not very definitive and there isn't much explanation
> for the examples. Please share if there is more documentation.



What kinds of problems are you looking to solve?  In general, we don't have
much that is specific to text, other than various utilities for converting
text into Mahout's vector format using different weighting schemes.  Both of
those examples simply convert the text into vectors and then either train or
test on them.  I would agree, though, that a good tutorial is needed.  The
following article is a bit out of date in terms of the actual commands, but I
believe the concepts are still accurate:
http://www.ibm.com/developerworks/java/library/j-mahout/

See
https://cwiki.apache.org/confluence/display/MAHOUT/Mahout+Wiki#MahoutWiki-ImplementationBackground
(and the Creating Vectors section).  Also see the Algorithms section.
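
As a rough illustration of the "convert text into vectors, then train or
test" flow described above, here is a minimal sketch in plain Python.  It is
not Mahout code and does not use Mahout's vector format or its Bayes
classifier: the TF-IDF weighting is just one possible weighting scheme, the
nearest-centroid classifier is a simple stand-in for the training step, and
all function names and the tiny TRAIN data set are made up for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase, split on whitespace, keep purely alphabetic tokens.
    return [t for t in text.lower().split() if t.isalpha()]


def fit_idf(docs):
    # Document frequency: number of documents each term appears in.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    # Smoothed IDF, so a term found in every document keeps a nonzero weight.
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(doc, idf):
    # Sparse TF-IDF vector as {term: weight}; terms unseen in training are dropped.
    tf = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled_docs):
    # "Training" here just sums the TF-IDF vectors of each label (a centroid).
    idf = fit_idf([text for text, _ in labeled_docs])
    centroids = defaultdict(Counter)
    for text, label in labeled_docs:
        for term, weight in vectorize(text, idf).items():
            centroids[label][term] += weight
    return idf, dict(centroids)


def classify(text, idf, centroids):
    # Pick the label whose centroid is most similar to the document vector.
    vec = vectorize(text, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy training data, for illustration only.
TRAIN = [
    ("the team won the hockey game", "sports"),
    ("the pitcher threw a fast ball", "sports"),
    ("the new graphics card renders fast", "tech"),
    ("install the driver for the card", "tech"),
]

idf, centroids = train(TRAIN)
print(classify("hockey game tonight", idf, centroids))
print(classify("graphics driver update", idf, centroids))
```

The point of the sketch is only the shape of the pipeline: fit the weighting
on the training text, turn each document into a sparse weighted vector, and
hand those vectors to whatever classifier is being trained or tested.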


--------------------------
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
