Re: multiple results for each input line

2009-05-21 Thread Tom White
You could combine them into one file using a reduce stage (with a single reducer), or by using hadoop fs -getmerge on the output directory.
Cheers,
Tom
On Thu, May 21, 2009 at 3:14 PM, John Clarke wrote:
> Hi,
>
> I want one output file not multiple but I think your reply has steered me in
> the
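To illustrate the second option, here is a minimal local-filesystem sketch of the idea behind merging an output directory: concatenate the per-task part files, in file-name order, into one file. It does not use the Hadoop API. The class name MergeParts, the method mergeParts, and the part-* naming filter are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MergeParts {

    // Concatenates every file matching part-* in outputDir into the single
    // file "merged", in file-name order. Hypothetical helper, not Hadoop code.
    static void mergeParts(Path outputDir, Path merged) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Directory listings are not guaranteed to be ordered, so sort explicitly.
        Collections.sort(parts);

        try (OutputStream out = Files.newOutputStream(merged)) {
            for (Path p : parts) {
                Files.copy(p, out); // append the bytes of this part file
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: MergeParts <outputDir> <mergedFile>");
            return;
        }
        mergeParts(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

On a real cluster the part files live in HDFS, so the getmerge command Tom mentions (or a single reducer) is the practical route. The sketch only shows what the merged result looks like.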

Re: multiple results for each input line

2009-05-21 Thread John Clarke
Hi,
I want one output file not multiple but I think your reply has steered me in the right direction!
Thanks
John
2009/5/20 Tom White
> Hi John,
>
> You could do this with a map-only job (using NLineInputFormat, and
> setting the number of reducers to 0), and write the output key as
> docnameN,

Re: multiple results for each input line

2009-05-20 Thread Tom White
Hi John,
You could do this with a map-only job (using NLineInputFormat, and setting the number of reducers to 0), and write the output key as docnameN,stat1,stat2,stat3,stat12 and a null value. This assumes that you calculate all 12 statistics in one map. Each output file would have a single l
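As a rough sketch of the output key described above, the following plain-Java helper joins a document name and its statistics into one comma-separated record. It is not Hadoop code. The class name StatsRecord, the method formatRecord, and the use of double values for the statistics are assumptions for illustration; in an actual map-only job the resulting string would be what the mapper emits as its key, paired with a null value.

```java
import java.util.StringJoiner;

public class StatsRecord {

    // Builds "docName,stat1,stat2,..." from a document name and its statistics.
    // Hypothetical helper: the thread does not say how the stats are typed.
    static String formatRecord(String docName, double[] stats) {
        StringJoiner joiner = new StringJoiner(",");
        joiner.add(docName);
        for (double s : stats) {
            joiner.add(Double.toString(s));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Two made-up statistics for a made-up document name.
        System.out.println(formatRecord("doc1", new double[] {1.0, 2.5}));
    }
}
```

Computing all the statistics for a document in one call keeps each document's results on a single line, which matches the assumption that one map calculates all 12.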

multiple results for each input line

2009-05-20 Thread John Clarke
Hi, I'm having some trouble implementing what I want to achieve... essentially I have a large input list of documents that I want to get statistics on. For each document I have 12 different stats to work out. So my input file is a text file with one document filepath on each line. The documents a