Re: Confused on binary vs source distributions

Sean Owen Thu, 09 Jun 2011 09:09:50 -0700

Now that sounds like a problem, then. I can guess what it is, too. I believe
the script looks for job files in different places than where the binary
distribution puts them. Let me open a JIRA for this; I bet you can find an
answer quite quickly, though, if you debug the script a little and compare
where it looks for the job file with where the file really is. I imagine the
script can just search more widely.
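A minimal Python sketch of the "search more widely" idea: try several candidate directories under the distribution root and return the first job jar that matches. The directory list, the `mahout-examples-*-job.jar` file-name pattern and the `find_job_jar` helper are all hypothetical, chosen for illustration; they are not taken from the actual `bin/mahout` script.

```python
import glob
import os
from typing import Iterable, Optional

# Assumed search locations, relative to the distribution root (hypothetical;
# check the real launcher script for the paths it actually uses).
CANDIDATE_DIRS = (
    ".",                # e.g. a binary distribution with jars at the top level
    "examples/target",  # e.g. a source tree after `mvn package`
)


def find_job_jar(home: str,
                 pattern: str = "mahout-examples-*-job.jar",
                 candidates: Iterable[str] = CANDIDATE_DIRS) -> Optional[str]:
    """Return the first job jar matching `pattern` under any candidate dir."""
    for rel in candidates:
        # sorted() makes the choice deterministic if several versions match
        matches = sorted(glob.glob(os.path.join(home, rel, pattern)))
        if matches:
            return os.path.normpath(matches[0])
    return None  # nothing found in any candidate dir; the caller should report it
```

Checking candidates in a fixed order means a binary layout and a source layout can both be supported by one script, with the earlier entry winning when both exist.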


On Thu, Jun 9, 2011 at 3:38 PM, Mark <[email protected]> wrote:

> Thanks for the explanation. I understand that Hadoop needs all required
> jars bundled together for it to work across nodes, since the nodes will
> obviously need those dependencies. I also understand that binary
> distributions are built from source, but I'm still confused about why
> running seq2sparse using the source distribution works while the bin
> distribution doesn't.
>
> For example, I tried seq2sparse on the binary distribution using the
> bin/mahout launcher and received:
>
>
> 11/06/08 21:17:00 INFO mapred.JobClient: Task Id :
> attempt_201106061352_0066_r_000001_1, Status : FAILED
> Error: java.lang.ClassNotFoundException:
> org.apache.lucene.analysis.TokenStream
>
> Running the same command on the source distribution works as expected.
>
>
> On 6/9/11 12:32 AM, Sean Owen wrote:
>
>> These aren't specific to Mahout.
>>
>> To run a Hadoop job, you have to give it all dependencies together. This is
>> the error you're getting. To help with this, the distro has 'job' files
>> with all dependencies packaged together for you. Your next error is just
>> another symptom of this. No Hadoop job can run without its dependencies
>> available on workers.
>>
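To check whether a job file really carries a dependency (for example the `org.apache.lucene.analysis.TokenStream` class from the `ClassNotFoundException` above), you can list its entries. This Python sketch assumes the common Hadoop job-jar layout, where classes sit at the top level or under `classes/` and dependency jars sit under `lib/`; the helper names are hypothetical.

```python
import io
import zipfile
from typing import BinaryIO, Iterable, List, Set, Union

JarSource = Union[str, BinaryIO]  # a file path or an open binary file object


def class_entry(class_name: str) -> str:
    """Map a binary class name to its zip entry name."""
    return class_name.replace(".", "/") + ".class"


def available_entries(jar: JarSource) -> Set[str]:
    """Collect entry names visible from a job jar (assumed layout)."""
    found: Set[str] = set()
    with zipfile.ZipFile(jar) as zf:
        for name in zf.namelist():
            # Top-level entries, with any classes/ prefix stripped
            found.add(name[len("classes/"):] if name.startswith("classes/") else name)
            # Dependency jars bundled under lib/ are opened in memory
            if name.startswith("lib/") and name.endswith(".jar"):
                with zipfile.ZipFile(io.BytesIO(zf.read(name))) as nested:
                    found.update(nested.namelist())
    return found


def missing_classes(jar: JarSource, class_names: Iterable[str]) -> List[str]:
    """Return the class names that the job jar does not appear to contain."""
    available = available_entries(jar)
    return [c for c in class_names if class_entry(c) not in available]
```

For instance, `missing_classes("path/to/your-job.jar", ["org.apache.lucene.analysis.TokenStream", "com.google.common.base.Preconditions"])` (hypothetical path) lists whichever of the two classes reported missing in this thread the jar lacks.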
>> Here, as in most projects, the bin and src files are built from the same
>> source. The difference is that bin contains the compiled artifacts and not
>> the source code, and is "ready to run". In bin, I believe the configuration
>> is built into the compiled jar, yes.
>>
>> On Thu, Jun 9, 2011 at 2:18 AM, Mark <[email protected]> wrote:
>>
>>> I explained in an earlier post that I was having problems running some
>>> examples on a cluster when using the binary distribution. My cluster was
>>> complaining about missing classes, i.e. the Lucene analyzer and Google
>>> Preconditions. However, when I tried the same thing on a src distribution
>>> (after running mvn package), I didn't receive those errors.
>>>
>>> How do the bin and src distributions differ?
>>>
>>> I also noticed that I was able to modify the driver.classes.props file
>>> directly in the src distribution, and those changes took effect
>>> immediately. When I tried the same on the binary distribution, my changes
>>> never took effect. Is this to be expected?
>>>
>>> Thanks for any clarifications.
>>>
>>>
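On the driver.classes.props question: a JVM class loader roughly resolves a resource by walking the classpath in order and using the first entry that contains it. The Python sketch below mimics that rule to show why editing a loose copy of the file would have no effect when a jar earlier on the classpath (or the only entry on it) already carries its own copy. Whether and where `bin/mahout` puts a configuration directory on the classpath is an assumption here, not something the thread confirms; `find_resource` is a hypothetical helper.

```python
import os
import zipfile
from typing import List, Optional


def find_resource(classpath: List[str], resource: str) -> Optional[str]:
    """Return where `resource` would be found: the first classpath entry
    (directory or jar) that contains it, or None."""
    for entry in classpath:
        if os.path.isdir(entry):
            candidate = os.path.join(entry, resource)
            if os.path.isfile(candidate):
                return candidate
        elif zipfile.is_zipfile(entry):
            with zipfile.ZipFile(entry) as zf:
                if resource in zf.namelist():
                    # Display form only, loosely modeled on jar: URLs
                    return f"{entry}!/{resource}"
    return None
```

With the jar listed before the config directory, or with no config directory on the classpath at all, the edited copy is never read, which would be consistent with the behavior described for the binary distribution.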
