I'm currently using a union schema to map two different types of data (read from two different input paths) to a common record in my reducer. This works fine but, if I have understood the mechanism correctly, it means that Avro has to check each and every record against my union schema. With a "normal" reduce-side join, I could use MultipleInputs to specify a mapper for each input, letting the mappers run independently (since each one knows its input type), presumably with less overhead.
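To make the setup concrete, here is a minimal plain-Java sketch of the pattern described above: records from two inputs are keyed by the join key, shuffled together as one "union" value type, and separated again in the reducer by a per-record type check. It deliberately uses no Avro or Hadoop classes, so it only models the idea and says nothing about what Avro actually does internally. UnionJoinSketch, Customer, Order, shuffle, reduce, and join are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a reduce-side join over a two-branch "union" value.
// No Avro or Hadoop classes are used; every name here is hypothetical.
final class UnionJoinSketch {

    // Stand-ins for the two record types read from the two input paths.
    static final class Customer {
        final String id;
        final String name;
        Customer(String id, String name) { this.id = id; this.name = name; }
    }

    static final class Order {
        final String customerId;
        final String item;
        Order(String customerId, String item) { this.customerId = customerId; this.item = item; }
    }

    // "Map" and "shuffle": group every record under its join key.
    // The value type Object plays the role of the union of Customer and Order.
    static Map<String, List<Object>> shuffle(List<Customer> customers, List<Order> orders) {
        Map<String, List<Object>> grouped = new TreeMap<>();
        for (Customer c : customers) {
            grouped.computeIfAbsent(c.id, k -> new ArrayList<>()).add(c);
        }
        for (Order o : orders) {
            grouped.computeIfAbsent(o.customerId, k -> new ArrayList<>()).add(o);
        }
        return grouped;
    }

    // "Reduce" for one key: split the values by branch, then emit the inner join.
    static List<String> reduce(List<Object> values) {
        List<Customer> customers = new ArrayList<>();
        List<Order> orders = new ArrayList<>();
        for (Object v : values) {
            // Models the per-record branch check that the question asks about.
            if (v instanceof Customer) {
                customers.add((Customer) v);
            } else if (v instanceof Order) {
                orders.add((Order) v);
            } else {
                throw new IllegalArgumentException("value is not a branch of the union: " + v);
            }
        }
        List<String> joined = new ArrayList<>();
        for (Customer c : customers) {
            for (Order o : orders) {
                joined.add(c.name + " bought " + o.item);
            }
        }
        return joined;
    }

    // Runs the whole pipeline in memory, visiting keys in sorted order.
    static List<String> join(List<Customer> customers, List<Order> orders) {
        List<String> out = new ArrayList<>();
        for (List<Object> values : shuffle(customers, orders).values()) {
            out.addAll(reduce(values));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(new Customer("c1", "Ann"), new Customer("c2", "Bob"));
        List<Order> orders = List.of(
                new Order("c1", "book"), new Order("c1", "pen"), new Order("c3", "cup"));
        for (String line : join(customers, orders)) {
            System.out.println(line);
        }
    }
}
```

With the sample data in main, only key c1 has both a customer and orders, so the inner join emits two lines; c2 (no orders) and c3 (no customer) produce nothing.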
Is it possible with Avro to avoid the overhead of checking each input row against the union schema?

Thanks,
Andrew

>________________________________
>From: Scott Carey <[email protected]>
>To: "[email protected]" <[email protected]>; Andrew Kenworthy <[email protected]>
>Sent: Wednesday, December 7, 2011 7:40 PM
>Subject: Re: Reduce-side joins in Avro M/R
>
>This should be conceptually the same as a normal map-reduce join of the same type. Avro handles the serialization, but not the map-reduce algorithm or strategy.
>
>On 12/6/11 8:43 AM, "Andrew Kenworthy" <[email protected]> wrote:
>
>>Hi,
>>
>>I'd like to use reduce-side joins in an avro M/R job, and am not sure how to do it: are there any best-practice tips or outlines of what one would have to implement in order to make this possible?
>>
>>Thanks,
>>
>>Andrew Kenworthy
