Ok, that makes sense then.

I don't have the script in front of me, but one difference I do remember is that it checks for the presence of src/conf and adds it to its classpath. In the binary distribution there is no src/conf, just conf.
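
To illustrate the conf lookup described above, here is a minimal sketch in Python of a check that handles both layouts. The real launcher is a shell script; the function name `find_conf_dir` and the fallback order are assumptions for illustration, not what bin/mahout actually does.

```python
import os

def find_conf_dir(mahout_home):
    """Return the first existing config directory under mahout_home, or None.

    Assumed layouts: src/conf in a source checkout, conf in the binary
    distribution. Both names come from the message above; the fallback
    order is hypothetical.
    """
    for candidate in ("src/conf", "conf"):
        path = os.path.join(mahout_home, candidate)
        if os.path.isdir(path):
            return path
    return None
```

With a fallback like this, a binary install that only ships conf would still get a config directory on the classpath.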

Thanks

On 6/9/11 9:08 AM, Sean Owen wrote:
Now that sounds like a problem then. I can guess what it is too. The script
looks for job files in different places than the binary distribution places
them, I believe. Let me open a JIRA for this; I bet you can find an answer
quite quickly though if you debug the script a little and see where it looks
for the job file vs where it really is. I imagine the script can just search
more widely.
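
As a rough idea of what "search more widely" could look like, here is a sketch in Python rather than shell. The `*job.jar` pattern and the list of subdirectories are guesses at where a source build and a binary distribution might put job files, and `find_job_files` is a hypothetical helper, not part of the launcher.

```python
import glob
import os

def find_job_files(mahout_home, subdirs=(".", "examples/target", "core/target")):
    """Return sorted paths of *job.jar files found under the given subdirectories.

    The subdirectory list and the file pattern are assumptions; adjust them
    to wherever your distribution actually places its job files.
    """
    found = []
    for sub in subdirs:
        pattern = os.path.join(mahout_home, sub, "*job.jar")
        found.extend(glob.glob(pattern))
    return sorted(os.path.normpath(p) for p in found)
```

Printing the result for both a source checkout and an unpacked binary distribution should show quickly whether the two layouts differ.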

On Thu, Jun 9, 2011 at 3:38 PM, Mark <[email protected]> wrote:

Thanks for the explanation. I understand that Hadoop needs all required
jars bundled together to work across nodes, since the workers will need
those dependencies. I also understand that binary distributions are built
from source, but I'm still confused about why running seq2sparse from the
source distribution works while the binary distribution doesn't.

For example I tried seq2sparse on the binary distribution using the
bin/mahout launcher and I receive:


11/06/08 21:17:00 INFO mapred.JobClient: Task Id : attempt_201106061352_0066_r_000001_1, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.lucene.analysis.TokenStream

The same command on the source distribution works as expected.


On 6/9/11 12:32 AM, Sean Owen wrote:

These aren't specific to Mahout.

To run a Hadoop job, you have to give it all dependencies together. This is
the error you're getting. To help with this, the distro has 'job' files with
all dependencies packaged together for you. Your next error is just another
symptom of this. No Hadoop job can run without its dependencies available on
workers.
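
One way to confirm whether a given job file really bundles a dependency is to look inside it. The sketch below, in Python, checks for a class either at the top level of the jar or inside a nested lib/*.jar; which of the two layouts a particular job file uses is an assumption to verify, and `jar_contains_class` is a hypothetical helper name.

```python
import io
import zipfile

def jar_contains_class(jar_path, class_name):
    """Return True if class_name is in the jar directly or in a nested lib/*.jar."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        names = jar.namelist()
        # Case 1: dependencies were unpacked into the job file itself.
        if entry in names:
            return True
        # Case 2: dependencies are shipped as jars under lib/ (assumed layout).
        for name in names:
            if name.startswith("lib/") and name.endswith(".jar"):
                with zipfile.ZipFile(io.BytesIO(jar.read(name))) as nested:
                    if entry in nested.namelist():
                        return True
    return False
```

For the failure quoted in this thread, the class to look for would be org.apache.lucene.analysis.TokenStream. If the check returns False for the jar that was actually submitted, the workers have no way to load that class.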

Here as in most projects, the bin and src files are built from the same
source. The difference is that bin contains the compiled artifacts and not
the source code, and is "ready to run". In bin I believe the configuration
is built into the compiled jar, yes.

On Thu, Jun 9, 2011 at 2:18 AM, Mark <[email protected]> wrote:

I explained in an earlier post that I was having problems running some
examples on a cluster when using the binary distribution. My cluster was
complaining about missing classes, i.e. the Lucene analyzer and Google
Preconditions. However, when I tried the same thing on a src distribution
(after running mvn package) I didn't receive those errors.

How do the bin and src distributions differ?

I also noticed that I was able to directly modify the driver.classes.props
file using the src distribution, and those changes were available
immediately. When I tried the same on the binary distribution, my changes
never appeared. Is this to be expected?

Thanks for any clarifications.

