Difficulties adding a custom job (analyzer) to Hadoop

Mohammed Omer Thu, 07 Aug 2014 13:39:07 -0700

All,

I'm having a tough time adding a custom analyzer to Hadoop and making use
of it through Mahout.


I've pruned the Mahout in Action examples down to a single example, a
customized version of the Mahout 0.9 MailArchivesClusteringAnalyzer:
https://github.com/momer/MiA/blob/mahout-0.9/src/main/java/mia/clustering/ch09/MoAnalyzer.java

After updating the pom.xml to use Mahout 0.9, running `mvn package`, and
moving `mia-0.7-job.jar` to $HADOOP_HOME/lib, I run into two issues:

First, I'm unsure how to remove the duplicate SLF4J dependencies from the
job jar.

Second, Hadoop is unable to find the Mahout classes when I use my custom
job jar.

Relevant stack traces are available at
https://gist.github.com/momer/52e1e7d2dd7612b26909

I'm admittedly pretty new to Hadoop/Mahout, and would really appreciate any
pointers in the right direction. All I really need is to get the Porter
stemming step out of the analyzer!

Thank you all very much for maintaining the mailing list and keeping it alive,

Mo
