Re: Map-Reduce: How to make MR output one file an hour?

Fengyun RAO Sat, 01 Mar 2014 19:38:16 -0800

Thanks Devin. We don't just want one file. It's complicated.

If the input folder contains X hours of data, we want X files;
if it contains Y hours, we want Y files.

Obviously, X or Y is unknown at compile time.
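
To make the requirement concrete, here is a small sketch of what "X" means: the number of distinct hours covered by the record timestamps. It assumes each record carries a Unix timestamp in seconds and that hours are cut on UTC boundaries; the function names and the sample values are made up for illustration.

```python
def hour_bucket(epoch_seconds):
    """Truncate a Unix timestamp (seconds) to the start of its hour."""
    return epoch_seconds - (epoch_seconds % 3600)


def distinct_hours(timestamps):
    """Return the sorted list of hour buckets present in the input."""
    return sorted({hour_bucket(t) for t in timestamps})


# Hypothetical sample: four records falling into three different hours.
sample = [1393700000, 1393700500, 1393703700, 1393711000]

hours = distinct_hours(sample)
num_files_wanted = len(hours)  # this is "X": one output file per hour
print(hours, num_files_wanted)
```

With this sample, two of the four records share an hour, so X is 3 rather than 4. The value depends entirely on the data that happens to be in the folder.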

2014-03-01 20:48 GMT+08:00 Devin Suiter RDX <[email protected]>:

> If you only want one file, then you need to set the number of reducers to
> 1.
>
> If the size of the data makes it impractical for the original MR job to
> use a single reducer, you can run a second job on the output of the first,
> with the default mapper and reducer, which are the identity ones, and set
> numReducers = 1 for that job.
>
> Or use the hdfs getmerge command to collate the results into one file.
> On Mar 1, 2014 4:59 AM, "Fengyun RAO" <[email protected]> wrote:
>
>> Thanks, but how do we set the number of reducers to X? X depends on the
>> input (run time), so it is unknown at job configuration (compile time).
>>
>>
>> 2014-03-01 17:44 GMT+08:00 AnilKumar B <[email protected]>:
>>
>>> Hi,
>>>
>>> Write a custom partitioner on <timestamp> and, as you mentioned, set
>>> #reducers to X.
>>>
>>>
>>>
>>
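
The partitioner idea quoted above can be sketched in a language-neutral way. This is only an illustration of the logic, not a Hadoop implementation: it assumes the set of hours in the input can be collected before the job is submitted, and every name in it is hypothetical. In Hadoop the lookup would live in the getPartition method of a Partitioner subclass, and the reducer count would be passed to Job.setNumReduceTasks in the driver, which runs at run time and so can compute X first.

```python
def hour_bucket(epoch_seconds):
    """Truncate a Unix timestamp (seconds) to the start of its hour."""
    return epoch_seconds - (epoch_seconds % 3600)


def build_hour_index(timestamps):
    """Map each distinct hour bucket to a partition number 0..X-1."""
    hours = sorted({hour_bucket(t) for t in timestamps})
    return {hour: index for index, hour in enumerate(hours)}


def get_partition(epoch_seconds, hour_index):
    """Pick the partition (reducer) for a record from its timestamp."""
    return hour_index[hour_bucket(epoch_seconds)]


# Hypothetical sample: four records falling into three different hours.
sample = [1393700000, 1393700500, 1393703700, 1393711000]

hour_index = build_hour_index(sample)
num_reducers = len(hour_index)  # X, known only after looking at the input
partitions = [get_partition(t, hour_index) for t in sample]
print(num_reducers, partitions)
```

An explicit hour-to-partition table is used here instead of hashing the hour modulo the reducer count, because a hash can send two different hours to the same reducer and leave another reducer empty, which would break the one-file-per-hour goal.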
