Thanks Ted, Nathan. Great advice.
So I've been looking at the InputFormat, RecordReader, and InputSplit
interfaces and their associated classes, trying to get my head around them.
For the situation I'm in, where I have two types of file, the names are
distinct, and the names actually have time
Map-reduce excels at gluing together files like this.
The map phase selects the key and makes sure that you have some way of
telling what the source of the record is.
The reduce phase takes all of the records with the same key and glues them
together. It can do your processing, but it is also
It's possible to do the whole thing in one round of map/reduce.
The only requirement is to be able to differentiate between the two
types of input file, for example by giving them different file name
extensions.
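
To make the one-round idea concrete, here is a rough sketch in plain Java
that only simulates the map, shuffle, and reduce steps in memory. It does not
use any Hadoop classes. The ".users" and ".orders" extensions, the
comma-separated record layout, and all the class and method names are made up
for illustration, so treat it as a sketch of the technique rather than how
anyone in this thread actually implemented it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a one-round reduce-side join; no Hadoop classes are used.
class JoinSketch {

    // A mapper output value: the record payload plus a tag naming its source.
    static final class Tagged {
        final String source;
        final String payload;

        Tagged(String source, String payload) {
            this.source = source;
            this.payload = payload;
        }
    }

    // Hypothetical convention: the file name extension identifies the input type.
    static String sourceOf(String fileName) {
        if (fileName.endsWith(".users")) return "users";
        if (fileName.endsWith(".orders")) return "orders";
        throw new IllegalArgumentException("unknown input type: " + fileName);
    }

    // "Map": select the join key (first comma-separated field) and tag the rest.
    static Map.Entry<String, Tagged> map(String fileName, String line) {
        int comma = line.indexOf(',');
        String key = line.substring(0, comma);
        String payload = line.substring(comma + 1);
        return Map.entry(key, new Tagged(sourceOf(fileName), payload));
    }

    // "Reduce": given all records for one key, glue each user to each order.
    static List<String> reduce(String key, List<Tagged> values) {
        List<String> users = new ArrayList<>();
        List<String> orders = new ArrayList<>();
        for (Tagged t : values) {
            (t.source.equals("users") ? users : orders).add(t.payload);
        }
        List<String> out = new ArrayList<>();
        for (String u : users) {
            for (String o : orders) {
                out.add(key + "," + u + "," + o);
            }
        }
        return out;
    }

    // Stand-in for the framework: run map, group by key (the "shuffle"), run reduce.
    static List<String> join(Map<String, List<String>> files) {
        Map<String, List<Tagged>> groups = new TreeMap<>();
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            for (String line : file.getValue()) {
                Map.Entry<String, Tagged> kv = map(file.getKey(), line);
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        List<String> out = new ArrayList<>();
        groups.forEach((key, values) -> out.addAll(reduce(key, values)));
        return out;
    }
}
```

Calling JoinSketch.join with a map from file name to lines returns one joined
line per matching pair. Keys that show up in only one of the two file types
produce nothing, so this behaves like an inner join. In a real job the
grouping by key is done by the framework, not by your code.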
One of my coworkers wrote a smart InputFormat class that creates a
different