Re: Crunch on EMR

Som Satpathy Tue, 01 Oct 2013 15:23:30 -0700

Thanks a lot Josh! That helped.

Regards,
Som



On Tue, Oct 1, 2013 at 1:10 PM, Josh Wills <[email protected]> wrote:

> Hey Som,
>
> You should be able to use any of the non-hadoop2 jars for Crunch on EMR,
> like the regular 0.7.0:
>
> http://mvnrepository.com/artifact/org.apache.crunch/crunch-core/0.7.0
>
> Those are compiled against the MR1 APIs that EMR's Hadoop 1.0.3 provides.
> Your CDH 4.3.0 build targets the newer API, which is why you're getting
> the TaskInputOutputContext exception (the API changed from MR1 to MR2,
> which CDH4.3.0 and hadoop2 use).
>
> Josh
>
>
> On Tue, Oct 1, 2013 at 12:00 PM, Som Satpathy <[email protected]> wrote:
>
>> Hi All,
>>
>> I have been trying to run Crunch jobs on Amazon EMR and hit a problem
>> during job execution:
>>
>> "Found class org.apache.hadoop.mapreduce.TaskInputOutputContext, but
>> interface was expected"
>>
>> This happens because the Hadoop APIs the job was compiled against are
>> incompatible with the Hadoop code that runs on the cluster.
>>
>> My Crunch fat jar is based on Crunch 0.7 (CDH 4.3.0), while EMR runs
>> Hadoop 1.0.3 (where TaskInputOutputContext is implemented as an
>> abstract class).
>>
>> Has anyone been able to successfully execute their Crunch jobs on EMR?
>>
>> If yes, what are the best practices for making custom Crunch fat jars
>> work on EMR?
>>
>>
>> Look forward to hearing your thoughts.
>>
>> Thanks,
>>
>> Som
>>
>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
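
For anyone hitting the same error, here is a minimal sketch of a check that reports whether TaskInputOutputContext is a class or an interface on the classpath it is run with. It assumes the cluster's Hadoop jars are on that classpath. The class name TaskContextCheck and the helper kind() are hypothetical names used only for illustration; they are not part of Crunch or Hadoop.

```java
public class TaskContextCheck {

    // Hypothetical helper: reports whether a loaded type is an interface or a class.
    static String kind(Class<?> type) {
        return type.isInterface() ? "interface" : "class";
    }

    public static void main(String[] args) {
        String name = "org.apache.hadoop.mapreduce.TaskInputOutputContext";
        try {
            // Load without initializing, so only the type itself is inspected.
            Class<?> type = Class.forName(name, false,
                    TaskContextCheck.class.getClassLoader());
            System.out.println(name + " is a " + kind(type) + " on this classpath");
        } catch (ClassNotFoundException e) {
            // Assumption not met: no Hadoop jars were found on the classpath.
            System.out.println(name + " was not found; is Hadoop on the classpath?");
        }
    }
}
```

If this reports "class" on the cluster while the fat jar was built against an API where the type is an interface, that is the mismatch described in the thread.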
