Re: MapReduce with related data from disparate files

2008-03-25 Thread Colin Freas
Thanks Ted, Nathan. Great advice. So I've been looking at the InputFormat, RecordReader, and InputSplit interfaces and associated classes, and trying to get my head around them. For the situation I'm in, where I have two types of file, the names are distinct, and the names actually have time

Re: MapReduce with related data from disparate files

2008-03-24 Thread Ted Dunning
Map-reduce excels at gluing together files like this. The map phase selects the key and makes sure that you have some way of telling what the source of the record is. The reduce phase takes all of the records with the same key and glues them together. It can do your processing, but it is also
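
A minimal sketch of the pattern described above, written as plain Python rather than against the Hadoop API so it runs on its own. The source tags "A" and "B", the tab-separated `key<TAB>value` record layout, and all function names are assumptions made for illustration; they do not come from the thread.

```python
from collections import defaultdict

def map_record(source, line):
    """Map step: pick the join key and tag the value with its source.

    Assumes each line looks like 'key<TAB>value' (an illustrative layout).
    """
    key, value = line.split("\t", 1)
    return key, (source, value)

def shuffle(pairs):
    """Stand-in for the framework's shuffle: group tagged values by key."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups

def reduce_join(key, tagged_values):
    """Reduce step: glue together records from both sources sharing a key."""
    left = [v for s, v in tagged_values if s == "A"]
    right = [v for s, v in tagged_values if s == "B"]
    # Inner join: keys present in only one source produce no output.
    return [(key, a, b) for a in left for b in right]

def run_join(a_lines, b_lines):
    """Run map, shuffle, and reduce over two in-memory 'files'."""
    pairs = [map_record("A", line) for line in a_lines]
    pairs += [map_record("B", line) for line in b_lines]
    joined = []
    for key, values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_join(key, values))
    return joined
```

Calling `run_join(["k1\tx", "k2\ty"], ["k1\tp", "k3\tq"])` joins only on `k1`, since `k2` and `k3` each appear in a single source. In a real job the reduce step could also do further processing on the glued records instead of just emitting them.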

RE: MapReduce with related data from disparate files

2008-03-24 Thread Nathan Wang
It's possible to do the whole thing in one round of map/reduce. The only requirement is to be able to differentiate between the two types of input file, possibly using different file name extensions. One of my coworkers wrote a smart InputFormat class that creates a different
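
To illustrate just the "differentiate by file name extension" idea, here is a rough Python sketch. It is not the InputFormat class mentioned above, whose details are not given here. The extensions `.typea` and `.typeb`, the tags, and the function names are all hypothetical.

```python
import os

# Hypothetical mapping from file name extension to a source tag.
TAG_BY_EXTENSION = {".typea": "A", ".typeb": "B"}

def tag_for(path):
    """Return the source tag for an input file, based on its extension."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return TAG_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"unrecognized input file type: {path!r}")

def tagged_records(path, lines):
    """Yield (tag, record) pairs so later stages know each record's source."""
    tag = tag_for(path)
    for line in lines:
        yield tag, line.rstrip("\n")
```

With both kinds of file fed through `tagged_records`, a single map/reduce round can tell the two inputs apart without a separate preprocessing pass.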