Thanks, Devin. We don't want just one file; it's more complicated than that. If the input folder contains data spanning X hours, we want X files; if it spans Y hours, we want Y files.
Obviously, X or Y is unknown at compile time.

2014-03-01 20:48 GMT+08:00 Devin Suiter RDX <[email protected]>:

> If you only want one file, then you need to set the number of reducers to
> 1.
>
> If the size of the data makes the original MR job impractical to use a
> single reducer, you run a second job on the output of the first, with the
> default mapper and reducer, which are the Identity- ones, and set that
> numReducers = 1.
>
> Or use hdfs getmerge function to collate the results to one file.
> On Mar 1, 2014 4:59 AM, "Fengyun RAO" <[email protected]> wrote:
>
>> Thanks, but how to set reducer number to X? X is dependent on input
>> (run-time), which is unknown on job configuration (compile time).
>>
>>
>> 2014-03-01 17:44 GMT+08:00 AnilKumar B <[email protected]>:
>>
>>> Hi,
>>>
>>> Write the custom partitioner on <timestamp> and as you mentioned set
>>> #reducers to X.
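
To make the question concrete, here is a rough sketch of the "custom partitioner on timestamp, #reducers = X" idea in plain Java with no Hadoop dependency. It assumes the timestamps are epoch milliseconds and that the driver can afford a pre-pass over the input to find the distinct hours before submitting the job; both are assumptions, not something established in this thread. The names HourBuckets, hourOf, distinctHours and partitionFor are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Hadoop.
class HourBuckets {
    static final long MILLIS_PER_HOUR = 3_600_000L;

    // Hour bucket (hours since the epoch, UTC) for an epoch-millisecond timestamp.
    static long hourOf(long epochMillis) {
        return Math.floorDiv(epochMillis, MILLIS_PER_HOUR);
    }

    // Pre-pass run by the driver: the sorted distinct hours present in the input.
    // The size of this list is the X that is only known at run time.
    static List<Long> distinctHours(long[] timestamps) {
        TreeSet<Long> hours = new TreeSet<>();
        for (long t : timestamps) {
            hours.add(hourOf(t));
        }
        return new ArrayList<>(hours);
    }

    // Partition index for a record: the position of its hour in the sorted list,
    // so each hour maps to exactly one partition in the range [0, X).
    static int partitionFor(long epochMillis, List<Long> hours) {
        int idx = Collections.binarySearch(hours, hourOf(epochMillis));
        if (idx < 0) {
            throw new IllegalArgumentException("hour not seen in pre-pass: " + hourOf(epochMillis));
        }
        return idx;
    }
}
```

In an actual job, the size of the hour list would presumably be passed to Job.setNumReduceTasks in the driver, which runs before submission and so can depend on the input, and the partitionFor logic would sit inside a Partitioner subclass's getPartition method. The hour list itself would have to reach the tasks somehow, for example through the job Configuration; that part is not shown here.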
