Hi. Have you ever seen the test case TestWordCount.java in the Avro source
code?
Here is the mapper (I've added the imports so both snippets compile):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.mapred.Reporter;

public static class MapImpl extends AvroMapper<Utf8, Pair<Utf8, Long>> {
  @Override
  public void map(Utf8 text, AvroCollector<Pair<Utf8, Long>> collector,
                  Reporter reporter) throws IOException {
    StringTokenizer tokens = new StringTokenizer(text.toString());
    while (tokens.hasMoreTokens())
      collector.collect(new Pair<Utf8, Long>(new Utf8(tokens.nextToken()), 1L));
  }
}
Reducer:
public static class ReduceImpl
    extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
  @Override
  public void reduce(Utf8 word, Iterable<Long> counts,
                     AvroCollector<Pair<Utf8, Long>> collector,
                     Reporter reporter) throws IOException {
    long sum = 0;
    for (long count : counts)
      sum += count;
    collector.collect(new Pair<Utf8, Long>(word, sum));
  }
}
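For reference, wiring these into a job looks roughly like this. It's a
sketch from memory, not the test's exact code; the AvroJob methods are the
real org.apache.avro.mapred API, but the driver class and paths are made up:

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.Pair;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public static void main(String[] args) throws Exception {
  JobConf job = new JobConf();
  job.setJobName("wordcount");

  // Input is a file of strings; output is Pair<string, long> records.
  AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
  AvroJob.setOutputSchema(job,
      Pair.getPairSchema(Schema.create(Schema.Type.STRING),
                         Schema.create(Schema.Type.LONG)));

  AvroJob.setMapperClass(job, MapImpl.class);
  AvroJob.setCombinerClass(job, ReduceImpl.class);
  AvroJob.setReducerClass(job, ReduceImpl.class);

  FileInputFormat.setInputPaths(job, new Path("in"));   // hypothetical path
  FileOutputFormat.setOutputPath(job, new Path("out")); // hypothetical path
  JobClient.runJob(job);
}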
Pair<K,V> is how the key/value pair travels through an Avro job: AvroMapper
and AvroReducer don't take separate key and value type parameters. Instead,
the map output is a single Pair record, and Avro uses its first ("key")
field for partitioning and sorting during the shuffle. So you didn't miss
anything -- the keys you want just go into the Pair.
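For your random shuffle, that means making the random number the Pair's key.
A minimal sketch under the same API; the class names ShuffleMapper and
ShuffleReducer are made up, and I'm assuming your records are strings (Utf8):

import java.io.IOException;
import java.util.Random;

import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.mapred.Reporter;

// Tags each record with a random long key, so the shuffle spreads the
// records roughly uniformly across however many reducers you configure.
public static class ShuffleMapper extends AvroMapper<Utf8, Pair<Long, Utf8>> {
  private final Random random = new Random();

  @Override
  public void map(Utf8 datum, AvroCollector<Pair<Long, Utf8>> collector,
                  Reporter reporter) throws IOException {
    collector.collect(new Pair<Long, Utf8>(random.nextLong(), datum));
  }
}

// Drops the random keys and writes the values straight back out; each
// reducer then produces one output file.
public static class ShuffleReducer extends AvroReducer<Long, Utf8, Utf8> {
  @Override
  public void reduce(Long key, Iterable<Utf8> values,
                     AvroCollector<Utf8> collector,
                     Reporter reporter) throws IOException {
    for (Utf8 value : values)
      collector.collect(value);
  }
}

Since the final output here is plain values rather than pairs, the map
output schema has to be set separately from the job output schema, e.g.:

AvroJob.setMapOutputSchema(job,
    Pair.getPairSchema(Schema.create(Schema.Type.LONG),
                       Schema.create(Schema.Type.STRING)));
AvroJob.setOutputSchema(job, Schema.create(Schema.Type.STRING));
job.setNumReduceTasks(numberOfOutputFiles);  // hypothetical variable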
2011/4/19 Markus Weimer <[email protected]>
> Hi,
>
> another question about writing hadoop jobs using avro. I want to implement
> a basic shuffle and file aggregation: Mappers emit their input with random
> keys, reducers just write to disk. The number of reducers determines how
> many files I get in the result. The mapred documentation, on jobs where
> both input and output are Avro, says:
>
> > Subclass AvroMapper and specify this as your job's mapper with [...]
>
> However, AvroMapper only seems to support input and output values, not
> keys. Did I miss the obvious here?
>
> Thanks,
>
> Markus
>
> PS: Ideally, I'd implement the shuffle without ever deserializing the data,
> which should be possible. But that is the next step.
>