Also, to help a little with your first direct question about which jars exactly: this is the Maven command that prints the exact and full transitive dependency tree for mahout-core:

cd core
dmitriy@BigDellRig:~/projects/github/mahout-commits/core$ ls
bin  pom.xml  src  target  temp  testdata
dmitriy@BigDellRig:~/projects/github/mahout-commits/core$ mvn dependency:tree
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Mahout Core 0.6-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-dependency-plugin:2.1:tree (default-cli) @ mahout-core ---
[INFO] org.apache.mahout:mahout-core:jar:0.6-SNAPSHOT
[INFO] +- org.apache.mahout:mahout-math:jar:0.6-SNAPSHOT:compile
[INFO] |  +- org.uncommons.maths:uncommons-maths:jar:1.2.2:compile
[INFO] |  |  \- jfree:jcommon:jar:1.0.12:compile
[INFO] |  +- com.google.guava:guava:jar:r09:compile
[INFO] |  \- org.apache.mahout:mahout-collections:jar:1.0:compile
[INFO] +- org.apache.mahout:mahout-math:test-jar:tests:0.6-SNAPSHOT:test
[INFO] +- org.apache.hadoop:hadoop-core:jar:0.20.204.0:compile
[INFO] |  +- commons-cli:commons-cli:jar:1.2:compile
[INFO] |  +- commons-httpclient:commons-httpclient:jar:3.0.1:compile
[INFO] |  |  \- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] |  +- commons-codec:commons-codec:jar:1.4:compile
[INFO] |  \- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] |     +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |     +- commons-digester:commons-digester:jar:1.8:compile
[INFO] |     |  \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] |     \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] +- org.codehaus.jackson:jackson-core-asl:jar:1.8.2:compile
[INFO] +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.2:compile
[INFO] +- org.slf4j:slf4j-api:jar:1.6.1:compile
[INFO] +- org.slf4j:slf4j-jcl:jar:1.6.1:test
[INFO] +- commons-lang:commons-lang:jar:2.6:compile
[INFO] +- org.uncommons.watchmaker:watchmaker-framework:jar:0.6.2:compile
[INFO] +- com.thoughtworks.xstream:xstream:jar:1.3.1:compile
[INFO] |  \- xpp3:xpp3_min:jar:1.1.4c:compile
[INFO] +- org.apache.lucene:lucene-core:jar:3.4.0:compile
[INFO] +- org.apache.lucene:lucene-analyzers:jar:3.4.0:compile
[INFO] +- org.apache.mahout.commons:commons-cli:jar:2.0-mahout:compile
[INFO] +- org.apache.commons:commons-math:jar:2.2:compile
[INFO] +- junit:junit:jar:4.8.2:test
[INFO] \- org.easymock:easymock:jar:3.0:test
[INFO]    +- cglib:cglib-nodep:jar:2.2:test
[INFO]    \- org.objenesis:objenesis:jar:1.2:test
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.567s
[INFO] Finished at: Thu Dec 29 12:34:45 PST 2011
[INFO] Final Memory: 13M/217M
[INFO] ------------------------------------------------------------------------

On Thu, Dec 29, 2011 at 12:28 PM, Dmitriy Lyubimov <[email protected]> wrote:
> bottom line, try to narrow your case to one of the 4, and then
> probably it would be more clear where to dig to get your info.
>
>
> ---------- Forwarded message ----------
> From: Dmitriy Lyubimov <[email protected]>
> Date: Thu, Dec 29, 2011 at 12:25 PM
> Subject: Re: STEPS(how) to write programs using mahout..
> To: [email protected]
>
>
> 1) Are you sure you can't use Mahout command line?
>
> if no, try command line, otherwise proceed to #2.
>
> 2) Are you resolved to run it embedded client side?
>
> if no, go back to command line use.
> if yes, your best bet is to build a maven project. Unfortunately i
> cannot help you with maven references within framework of this list. I
> think you need some maven resource to read thru how to build that.
>
>
> 3) Are you also running MR backend-side with mahout dependencies as well?
> If yes, you need something called mahout-core-0.6-SNAPSHOT-job.jar (if
> you build Mahout from source, it will land in core/target folder).
> That's something called "hadoop job" jar which you can redistribute to
> MR backend tasks.
> If that's what you want to do, try to ask on Hadoop
> forums how to do it in your mapreduce-enabled applications, I am not
> really 100% sure myself. Standard hadoop command takes those with
> --jar option.
>
> 4) Sometimes it is also needed to do something of inverse nature: to
> include some of _your_ libraries running in backend with Mahout tasks.
> (example being: custom lucene text analyzer for text inputs). I think
> it may be also achievable with mahout command line option by using the
> same standard --jar option for your own hadoop job jar, but I am not
> 100% sure. I did something like that long ago but i can't remember how
> it was done now.
>
> Thanks.
> -Dmitriy
>
> On Thu, Dec 29, 2011 at 1:02 AM, rahul raghavendhra
> <[email protected]> wrote:
>> It sound better.. can u please elaborate so that new uses like me can
>> learn.. thanks Dmitry.. Please help.. thanks in advance
>>
>> ./rahul
>>
>>
>> On Thu, Dec 29, 2011 at 2:07 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>>> > (I actually don't do that, I do it slightly
>>> >other way, by publishing all dependency jars of my project on hdfs and
>>> >then use DistributedCache to add them to my MR classpath, so i don't
>>> >know for sure about using mahout hadoop job jar outside the command
>>> line).
>>> >But command line is still probably the best way to try something,
>>> >embedding takes more time.
>>>
>>>
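
P.S. If you want the plain jar file names out of the `mvn dependency:tree` output rather than reading the tree by eye, something along these lines might do as a rough sketch. The class name DependencyTreeJars and the method compileJars are made up for illustration, and it assumes the usual artifactId-version[-classifier].jar naming and that you only care about compile-scope entries. It is not anything Mahout or Maven ships.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mahout or Maven.
public class DependencyTreeJars {

    /** Returns jar file names for the compile-scope entries of `mvn dependency:tree` output. */
    public static List<String> compileJars(List<String> treeLines) {
        List<String> jars = new ArrayList<>();
        for (String line : treeLines) {
            // Drop the "[INFO] " prefix, then the tree-drawing characters (| + \ - and spaces).
            String coords = line.replaceFirst("^\\[INFO\\]\\s*", "")
                                .replaceFirst("^[|+\\\\\\- ]+", "")
                                .trim();
            String[] p = coords.split(":");
            // Dependency lines look like group:artifact:type:version:scope
            // or group:artifact:type:classifier:version:scope.
            if (p.length != 5 && p.length != 6) {
                continue; // root artifact, banners, timing lines, etc.
            }
            String type = p[2];
            String scope = p[p.length - 1];
            if (!scope.equals("compile") || !(type.equals("jar") || type.equals("test-jar"))) {
                continue;
            }
            String artifact = p[1];
            String version = p[p.length - 2];
            // Assumption: standard Maven file naming, with the classifier appended if present.
            String classifier = (p.length == 6) ? "-" + p[3] : "";
            jars.add(artifact + "-" + version + classifier + ".jar");
        }
        return jars;
    }

    public static void main(String[] args) {
        // A few lines copied from the mahout-core tree as sample input.
        List<String> sample = Arrays.asList(
            "[INFO] org.apache.mahout:mahout-core:jar:0.6-SNAPSHOT",
            "[INFO] +- org.apache.mahout:mahout-math:jar:0.6-SNAPSHOT:compile",
            "[INFO] |  \\- org.apache.mahout:mahout-collections:jar:1.0:compile",
            "[INFO] +- org.apache.mahout:mahout-math:test-jar:tests:0.6-SNAPSHOT:test",
            "[INFO] +- junit:junit:jar:4.8.2:test");
        for (String jar : compileJars(sample)) {
            System.out.println(jar);
        }
    }
}
```

Test-scope entries (junit, easymock and so on) are skipped on purpose, since you would not normally put them on a runtime classpath.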
