Hi. Have you ever seen the test case TestWordCount.java in the Avro source
code?
Here is the mapper (I've added the imports so both snippets compile):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.mapred.Reporter;

public static class MapImpl extends AvroMapper<Utf8, Pair<Utf8, Long>> {
  @Override
  public void map(Utf8 text, AvroCollector<Pair<Utf8, Long>> collector,
                  Reporter reporter) throws IOException {
    StringTokenizer tokens = new StringTokenizer(text.toString());
    while (tokens.hasMoreTokens())
      collector.collect(new Pair<Utf8, Long>(new Utf8(tokens.nextToken()), 1L));
  }
}
Reducer:
public static class ReduceImpl
    extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
  @Override
  public void reduce(Utf8 word, Iterable<Long> counts,
                     AvroCollector<Pair<Utf8, Long>> collector,
                     Reporter reporter) throws IOException {
    long sum = 0;
    for (long count : counts)
      sum += count;
    collector.collect(new Pair<Utf8, Long>(word, sum));
  }
}
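For reference, wiring these into a job looks roughly like this. It's a
sketch from memory, not the test's exact code; the AvroJob methods are the
real org.apache.avro.mapred API, but the driver class and paths are made up:

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.Pair;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public static void main(String[] args) throws Exception {
  JobConf job = new JobConf();
  job.setJobName("wordcount");

  // Input is a file of strings; output is Pair<string, long> records.
  AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
  AvroJob.setOutputSchema(job,
      Pair.getPairSchema(Schema.create(Schema.Type.STRING),
                         Schema.create(Schema.Type.LONG)));

  AvroJob.setMapperClass(job, MapImpl.class);
  AvroJob.setCombinerClass(job, ReduceImpl.class);
  AvroJob.setReducerClass(job, ReduceImpl.class);

  FileInputFormat.setInputPaths(job, new Path("in"));   // hypothetical path
  FileOutputFormat.setOutputPath(job, new Path("out")); // hypothetical path
  JobClient.runJob(job);
}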
Pair<K,V> is how the key/value pair travels through an Avro job: AvroMapper
and AvroReducer don't take separate key and value type parameters. Instead,
the map output is a single Pair record, and Avro uses its first ("key")
field for partitioning and sorting during the shuffle. So you didn't miss
anything -- the keys you want just go into the Pair.
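For your random shuffle, that means making the random number the Pair's key.
A minimal sketch under the same API; the class names ShuffleMapper and
ShuffleReducer are made up, and I'm assuming your records are strings (Utf8):

import java.io.IOException;
import java.util.Random;

import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.mapred.Reporter;

// Tags each record with a random long key, so the shuffle spreads the
// records roughly uniformly across however many reducers you configure.
public static class ShuffleMapper extends AvroMapper<Utf8, Pair<Long, Utf8>> {
  private final Random random = new Random();

  @Override
  public void map(Utf8 datum, AvroCollector<Pair<Long, Utf8>> collector,
                  Reporter reporter) throws IOException {
    collector.collect(new Pair<Long, Utf8>(random.nextLong(), datum));
  }
}

// Drops the random keys and writes the values straight back out; each
// reducer then produces one output file.
public static class ShuffleReducer extends AvroReducer<Long, Utf8, Utf8> {
  @Override
  public void reduce(Long key, Iterable<Utf8> values,
                     AvroCollector<Utf8> collector,
                     Reporter reporter) throws IOException {
    for (Utf8 value : values)
      collector.collect(value);
  }
}

Since the final output here is plain values rather than pairs, the map
output schema has to be set separately from the job output schema, e.g.:

AvroJob.setMapOutputSchema(job,
    Pair.getPairSchema(Schema.create(Schema.Type.LONG),
                       Schema.create(Schema.Type.STRING)));
AvroJob.setOutputSchema(job, Schema.create(Schema.Type.STRING));
job.setNumReduceTasks(numberOfOutputFiles);  // hypothetical variable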
2011/4/19 Markus Weimer <[email protected]>
> Hi,
>
> another question about writing hadoop jobs using avro. I want to implement
> a basic shuffle and file aggregation: Mappers emit their input with random
> keys, reducers just write to disk. The number of reducers determines how
> many files I get in the result. The mapred documentation, on jobs where
> both input and output are Avro, says:
>
> > Subclass AvroMapper and specify this as your job's mapper with [...]
>
> However, AvroMapper only seems to support input and output values, not
> keys. Did I miss the obvious here?
>
> Thanks,
>
> Markus
>
> PS: Ideally, I'd implement the shuffle without ever deserializing the data,
> which should be possible. But that is the next step.
>