Re: Reduce-side joins in Avro M/R

Andrew Kenworthy Tue, 13 Dec 2011 08:46:45 -0800

I'm currently using a union schema to map two different types of data (read 
from two different input paths) to a common record in my reducer. This works 
fine, but, if I have understood the mechanism correctly, it means that 
Avro has to check each and every record against my union schema. With a 
"normal" reduce-side join, I could use MultipleInputs to specify a mapper for 
each input, letting them run independently (since each mapper knows its 
input type), presumably with less overhead.
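
To make the pattern concrete, here is a minimal sketch of a tagged reduce-side join. It is plain Python with no Hadoop or Avro involved, so it shows only the shape of the algorithm, not the Avro API. The sample data, the field names, and every function name are hypothetical. The source tag attached by each mapper plays the role that the union branch plays in the Avro job.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the two input paths.
USERS = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
ORDERS = [
    {"user_id": 1, "item": "book"},
    {"user_id": 1, "item": "pen"},
    {"user_id": 3, "item": "lamp"},
]


def map_users(record):
    # Mapper for the first input: it knows its input type, so it tags directly.
    return record["user_id"], ("user", record)


def map_orders(record):
    # Mapper for the second input, tagged with a different source label.
    return record["user_id"], ("order", record)


def shuffle(pairs):
    # Stand-in for the framework's shuffle: group tagged values by join key.
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_join(key, tagged_values):
    # Reducer: split the values by tag, then emit one row per matching pair.
    users = [value for tag, value in tagged_values if tag == "user"]
    orders = [value for tag, value in tagged_values if tag == "order"]
    return [
        {"user_id": key, "name": user["name"], "item": order["item"]}
        for user in users
        for order in orders
    ]


def run_join(users, orders):
    # Run both mappers, shuffle, and reduce each key in sorted order.
    pairs = [map_users(r) for r in users] + [map_orders(r) for r in orders]
    joined = []
    for key, values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_join(key, values))
    return joined
```

Calling `run_join(USERS, ORDERS)` performs an inner join, so keys that appear in only one input produce no output rows.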



Is it possible with Avro to avoid the overhead of checking each input row 
against the union schema?

Thanks,

Andrew



>________________________________
> From: Scott Carey <[email protected]>
>To: "[email protected]" <[email protected]>; Andrew Kenworthy 
><[email protected]> 
>Sent: Wednesday, December 7, 2011 7:40 PM
>Subject: Re: Reduce-side joins in Avro M/R
> 
>
>This should be conceptually the same as a normal map-reduce join of the same 
>type.  Avro handles the serialization, but not the map-reduce algorithm or 
>strategy.   
>
>On 12/6/11 8:43 AM, "Andrew Kenworthy" <[email protected]> wrote:
>
>
>>Hi,
>>
>>
>>I'd like to use reduce-side joins in an Avro M/R job and am not sure how to 
>>do it. Are there any best-practice tips or outlines of what one would have to 
>>implement to make this possible?
>>
>>
>>Thanks,
>>
>>
>>Andrew Kenworthy
>
>
