I tried to process a large number of small files with Pig and ran into a strange problem.

2011-02-27 00:00:58,746 [Thread-15] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : *43458*
2011-02-27 00:00:58,755 [Thread-15] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : *43458*
2011-02-27 00:01:14,173 [Thread-15] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : *329*

When the script finishes, the result covers only a subset of the input files. The input is logs from a whole month, but the results come only from day 21.

Maybe I'm missing something. Any ideas?

--
*Charles Ferreira Gonçalves*
http://homepages.dcc.ufmg.br/~charles/
UFMG - ICEx - Dcc
Cel.: 55 31 87741485
Tel.: 55 31 34741485
Lab.: 55 31 34095840
