Hi all,
I have a partitioned table in Hive where each partition has 630
gzip-compressed files, each about 100 KB on average. If I query these files
using Hive, it generates exactly 630 mappers, i.e. one mapper per file.
As an experiment, I tried reading the same files with Pig. Pig combined the
files and spawned only 2 mappers, and the operation was much faster than in
Hive.
Why do Pig and Hive differ in how they execute this? In Hive, can we
similarly combine small files to spawn fewer mappers?
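
For reference, a minimal sketch of the kind of session settings usually
suggested for combining small input files in Hive. It assumes a Hive build
that ships CombineHiveInputFormat and uses the older mapred.* property
names; the split sizes are illustrative values, not tuned ones, and whether
they apply to this table and Hive version is untested here:

    -- Assumption: CombineHiveInputFormat is available in this Hive version.
    -- It packs several small files into one split instead of one split per file.
    set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

    -- Illustrative upper bound on a combined split, in bytes (256 MB).
    set mapred.max.split.size=268435456;

    -- Illustrative lower bounds used when grouping files per node and per rack.
    set mapred.min.split.size.per.node=134217728;
    set mapred.min.split.size.per.rack=134217728;

Gzip files are not splittable, so each file would still be read whole; the
combining only changes how many whole files one mapper receives.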