Also, to help a little with your first direct question about which
jars exactly: this is the Maven command that prints the exact and full
transitive dependency tree for mahout-core:

cd core
dmitriy@BigDellRig:~/projects/github/mahout-commits/core$ ls
bin  pom.xml  src  target  temp  testdata
dmitriy@BigDellRig:~/projects/github/mahout-commits/core$ mvn dependency:tree
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Mahout Core 0.6-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-dependency-plugin:2.1:tree (default-cli) @ mahout-core ---
[INFO] org.apache.mahout:mahout-core:jar:0.6-SNAPSHOT
[INFO] +- org.apache.mahout:mahout-math:jar:0.6-SNAPSHOT:compile
[INFO] |  +- org.uncommons.maths:uncommons-maths:jar:1.2.2:compile
[INFO] |  |  \- jfree:jcommon:jar:1.0.12:compile
[INFO] |  +- com.google.guava:guava:jar:r09:compile
[INFO] |  \- org.apache.mahout:mahout-collections:jar:1.0:compile
[INFO] +- org.apache.mahout:mahout-math:test-jar:tests:0.6-SNAPSHOT:test
[INFO] +- org.apache.hadoop:hadoop-core:jar:0.20.204.0:compile
[INFO] |  +- commons-cli:commons-cli:jar:1.2:compile
[INFO] |  +- commons-httpclient:commons-httpclient:jar:3.0.1:compile
[INFO] |  |  \- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] |  +- commons-codec:commons-codec:jar:1.4:compile
[INFO] |  \- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] |     +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |     +- commons-digester:commons-digester:jar:1.8:compile
[INFO] |     |  \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] |     \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] +- org.codehaus.jackson:jackson-core-asl:jar:1.8.2:compile
[INFO] +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.2:compile
[INFO] +- org.slf4j:slf4j-api:jar:1.6.1:compile
[INFO] +- org.slf4j:slf4j-jcl:jar:1.6.1:test
[INFO] +- commons-lang:commons-lang:jar:2.6:compile
[INFO] +- org.uncommons.watchmaker:watchmaker-framework:jar:0.6.2:compile
[INFO] +- com.thoughtworks.xstream:xstream:jar:1.3.1:compile
[INFO] |  \- xpp3:xpp3_min:jar:1.1.4c:compile
[INFO] +- org.apache.lucene:lucene-core:jar:3.4.0:compile
[INFO] +- org.apache.lucene:lucene-analyzers:jar:3.4.0:compile
[INFO] +- org.apache.mahout.commons:commons-cli:jar:2.0-mahout:compile
[INFO] +- org.apache.commons:commons-math:jar:2.2:compile
[INFO] +- junit:junit:jar:4.8.2:test
[INFO] \- org.easymock:easymock:jar:3.0:test
[INFO]    +- cglib:cglib-nodep:jar:2.2:test
[INFO]    \- org.objenesis:objenesis:jar:1.2:test
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.567s
[INFO] Finished at: Thu Dec 29 12:34:45 PST 2011
[INFO] Final Memory: 13M/217M
[INFO] ------------------------------------------------------------------------
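
If you want plain jar file names out of that tree, here is a rough sketch, not something that ships with Mahout or Maven. It assumes the default text format shown above (groupId:artifactId:type:version:scope) and the usual artifactId-version.jar naming; list_jars is a made-up helper name, and entries with a classifier (like the mahout-math test-jar) are simply skipped.

```shell
#!/bin/sh
# Hypothetical helper: read `mvn dependency:tree` output on stdin and
# print the jar file names for one scope (default: compile).
# Assumption: coordinates look like groupId:artifactId:jar:version:scope.
list_jars() {
  scope="${1:-compile}"
  awk -v scope="$scope" '
    $1 == "[INFO]" {
      # The coordinate is the last whitespace-separated field on the line.
      n = split($NF, c, ":")
      # Keep plain jars of the requested scope; the root project line
      # (4 parts) and classifier entries (6 parts) fall through.
      if (n == 5 && c[3] == "jar" && c[5] == scope)
        print c[2] "-" c[4] ".jar"
    }' | sort -u
}

# Usage (run from the core directory):
#   mvn dependency:tree | list_jars          # compile-scope jars
#   mvn dependency:tree | list_jars test     # test-scope jars
```

On the tree above this would list names such as mahout-math-0.6-SNAPSHOT.jar and hadoop-core-0.20.204.0.jar. If you need the files themselves rather than the names, the same plugin's dependency:copy-dependencies goal copies them into a directory.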


On Thu, Dec 29, 2011 at 12:28 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Bottom line: try to narrow your case down to one of the 4, and then
> it will probably be clearer where to dig for the information you need.
>
>
> ---------- Forwarded message ----------
> From: Dmitriy Lyubimov <[email protected]>
> Date: Thu, Dec 29, 2011 at 12:25 PM
> Subject: Re: STEPS(how) to write programs using mahout..
> To: [email protected]
>
>
> 1) Are you sure you can't use the Mahout command line?
>
> If not, try the command line; otherwise proceed to #2.
>
> 2) Have you decided to run it embedded on the client side?
>
> If no, go back to using the command line.
> If yes, your best bet is to build a Maven project. Unfortunately, I
> cannot help you with Maven references within the framework of this
> list. I think you need to read through some Maven resource on how to
> build that.
>
>
> 3) Are you also running MR jobs on the backend with Mahout dependencies?
> If yes, you need something called mahout-core-0.6-SNAPSHOT-job.jar (if
> you build Mahout from source, it will land in the core/target folder).
> That is a so-called "hadoop job" jar, which you can redistribute to
> MR backend tasks. If that's what you want to do, try asking on the
> Hadoop forums how to do it in your MapReduce-enabled applications; I am
> not really 100% sure myself. The standard hadoop command takes such a
> jar as the argument of its jar subcommand (hadoop jar <jar file>).
>
> 4) Sometimes you also need to do the inverse: include some of _your_
> libraries so that they run in the backend with Mahout tasks (for
> example, a custom Lucene text analyzer for text inputs). I think this
> may also be achievable from the Mahout command line by passing your own
> hadoop job jar in the same standard way, but I am not 100% sure. I did
> something like that long ago, but I can't remember now how it was done.
>
> Thanks.
> -Dmitriy
>
> On Thu, Dec 29, 2011 at 1:02 AM, rahul raghavendhra
> <[email protected]> wrote:
>> That sounds better. Can you please elaborate so that new users like me
>> can learn? Thanks, Dmitriy. Please help. Thanks in advance.
>>
>> ./rahul
>>
>>
>> On Thu, Dec 29, 2011 at 2:07 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>>> > (I actually don't do that; I do it a slightly different way, by
>>> > publishing all dependency jars of my project on HDFS and then using
>>> > DistributedCache to add them to my MR classpath, so I don't know for
>>> > sure about using the Mahout hadoop job jar outside the command line.)
>>> > But the command line is still probably the best way to try something;
>>> > embedding takes more time.
>>>
>>>
