I'm not sure I understand your question. In 0.8, Pig combines small input
files into a single map, so it is possible that you get fewer output files.
If that is your concern, you can try disabling split combination with
"-Dpig.splitCombination=false".
Daniel
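P.S. A rough example of the invocation, assuming your script is called
myscript.pig (a placeholder name, substitute your own file). As far as I
know, the -D option should come before the script name:

```shell
# myscript.pig is a hypothetical script name used only for illustration.
# Turns off the split combination introduced in Pig 0.8 for this run.
pig -Dpig.splitCombination=false myscript.pig
```

With combination off, the "Total input paths (combined)" log line should no
longer show a reduced count, so expect roughly one map per input file.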
Charles Gonçalves wrote:
I tried to process a large number of small files with Pig and ran into a
strange problem.
2011-02-27 00:00:58,746 [Thread-15] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
to process : *43458*
2011-02-27 00:00:58,755 [Thread-15] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : *43458*
2011-02-27 00:01:14,173 [Thread-15] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths (combined) to process : *329*
When the script finishes, the result covers only a subset of the input
files.
These are logs from a whole month, but the results are only from day 21.
Maybe I'm missing something.
Any ideas?