You could combine them into one file using a reduce stage (with a
single reducer), or by using hadoop fs -getmerge on the output
directory.
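For the second option, a sketch of the merge step (the directory name "output" and the local filename "stats.txt" are placeholders, not from this thread):

```shell
# Collect the per-mapper part-* files from the job's output
# directory into a single file on the local filesystem.
hadoop fs -getmerge output stats.txt
```

The first option (a single reducer) keeps the merged file in HDFS instead, at the cost of funnelling all map output through one reduce task.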
Cheers,
Tom
On Thu, May 21, 2009 at 3:14 PM, John Clarke wrote:
Hi,
I want one output file, not multiple, but I think your reply has steered me in
the right direction!
Thanks
John
2009/5/20 Tom White
Hi John,
You could do this with a map-only job (using NLineInputFormat, and
setting the number of reducers to 0), and write the output key as
docnameN,stat1,stat2,...,stat12 and a null value. This assumes
that you calculate all 12 statistics in one map. Each output file
would have a single line.
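A minimal sketch of the map-only job described above, assuming the newer org.apache.hadoop.mapreduce API. The class names, paths, and the mapper body are placeholders; the thread doesn't show how the 12 statistics are actually computed, so the mapper here just emits a dummy stats line per document path:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DocStats {

  // Placeholder mapper: receives one document filepath per call and
  // would compute the 12 statistics for that document here.
  public static class StatsMapper
      extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String docPath = line.toString().trim();
      // Key is "docname,stat1,...,stat12"; value is null, as suggested above.
      String stats = docPath + ",stat1,stat2,...,stat12"; // dummy stats
      ctx.write(new Text(stats), NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "doc-stats");
    job.setJarByClass(DocStats.class);

    // One input line (one document path) per map task.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 1);
    NLineInputFormat.addInputPath(job, new Path(args[0]));

    // Map-only: no reduce phase, so each mapper writes its own part file.
    job.setNumReduceTasks(0);

    job.setMapperClass(StatsMapper.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Since there is no reduce, each map task writes one part file; merging them into a single file is the follow-up question answered later in the thread.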
Hi,
I'm having some trouble implementing what I want to achieve... essentially I
have a large input list of documents that I want to get statistics on. For
each document I have 12 different stats to work out.
So my input file is a text file with one document filepath on each line. The
documents a