Hi Charles,

Which load function are you using? Is it the default (PigStorage)? In the Hadoop counters for the job in the JobTracker UI, do you see the expected number of input records being read?

-Thejas

On 2/28/11 10:57 AM, "Charles Gonçalves" wrote:

I'm not using any filtering in the script. I just want to see the total traffic per day across all the logs.
If I combine 1000 log files into one and run the script on that file, I get the correct answer for those logs.
But when I run it with all the *43458* log files, I get incorrect output.
The correct output would be a histogram for each day of 2010-10, but the result contains only data from 2010-10-21.
And if I process all the logs with an awk script, I get the correct answer.

On Mon, Feb 28, 2011 at 3:29 PM, Daniel Dai wrote:

> Not sure if I get your question. In 0.8, Pig combines small files into one
> map, so it is possible you get fewer output files.

This is not the problem. But thanks anyway!

> If that is your concern, you can try to disable split combination using
> "-Dpig.splitCombination=false"
>
> Daniel
>
> Charles Gonçalves wrote:
>
>> I tried to process a large number of small files with Pig and ran into a
>> strange problem.
>>
>> 2011-02-27 00:00:58,746 [Thread-15] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : *43458*
>> 2011-02-27 00:00:58,755 [Thread-15] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : *43458*
>> 2011-02-27 00:01:14,173 [Thread-15] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : *329*
>>
>> When the script finishes processing, the result covers only a subset of
>> the input files.
>> These are logs from a whole month, but the results are only from day 21.
>>
>> Maybe I'm missing something.
>> Any ideas?

--
*Charles Ferreira Gonçalves*
http://homepages.dcc.ufmg.br/~charles/
UFMG - ICEx - Dcc
Cel.: 55 31 87741485
Tel.: 55 31 34741485
Lab.: 55 31 34095840
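
One way to get numbers to compare with the job's input-record counter is a small local count of records per day. The sketch below is only an illustration and was not posted in the thread: it is written in Python, it assumes every log line contains its date in YYYY-MM-DD form (as in 2010-10-21), and the names `records_per_day`, `read_lines`, and `count_directory` are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: each log line carries its date as YYYY-MM-DD (e.g. 2010-10-21).
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def records_per_day(lines):
    """Count log lines per day; lines with no recognizable date go under None."""
    counts = Counter()
    for line in lines:
        match = DATE_RE.search(line)
        counts[match.group(0) if match else None] += 1
    return counts


def read_lines(paths):
    """Yield every line of every file in paths, one file at a time."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            yield from handle


def count_directory(root):
    """Count records per day for all regular files under the directory root."""
    files = (p for p in Path(root).rglob("*") if p.is_file())
    return records_per_day(read_lines(files))
```

The sum of the returned counts is the total number of lines read, which could be set against the number of input records the job reports; a month of logs should also produce one key per day rather than a single day.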
