Re: Text Classification using Mahout

Grant Ingersoll Wed, 29 Sep 2010 07:32:52 -0700

On Sep 28, 2010, at 2:54 PM, Ted Dunning wrote:

> Neil,
> 
> That example should be updated to the current trunk version of the software.  
> That isn't likely to happen right away, so you should
> adapt the procedures.
> 
> On Tue, Sep 28, 2010 at 10:49 AM, Neil Ghosh <[email protected]> wrote:
> Hi Grant,
> 
> I am trying to run the classification example in
> 
> http://www.ibm.com/developerworks/java/library/j-mahout/
> 
> and am on step 3, 'ant install'.
> 
> We don't use ant any more.


I used Ant to build/run the examples.  The examples came w/ Mahout already 
built, so no need for Maven for the examples.

> 
> You should use 'mvn install' here instead.  Make sure you have checked out 
> the trunk version of the software.
>  
> 
> However, it is trying to download the 2GB file. I might run out of space in
> my Linux partition, and the download may be interrupted on my connection.
> 
> Yes.  These could happen.  If this is a problem, you might want to invest a 
> tiny amount of money to rent an EC2 machine for a few hours.  This literally 
> will be less than a dollar, even if you have to go through the process 
> several times.

Yes, it is going to get the Wikipedia data set.  It expands to about 10GB, if I 
recall.

> 
> Yes
> Is there any way I can test the example on a smaller set of Wikipedia data
> or download the data offline?
> 
> Sure.  Try the 20newsgroups examples.

Yep, the principles here are the same.  For the Wikipedia example, all I did was 
classify into Democrats and Republicans, but the underlying process really is 
no different.

> 
> Also, you can download the wikipedia test data any way  you like.  

--------------------------
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
