Hive vs Pig against number of files spawned

Sreenath Mon, 31 Mar 2014 23:57:07 -0700

Hi all,
I have a partitioned table in Hive where each partition has 630 gzip-compressed
files, each averaging about 100 KB. If I query these files with Hive, it
generates exactly 630 mappers, i.e. one mapper per file.
As an experiment, I tried reading the same files with Pig. Pig combined the
files and spawned only 2 mappers, and the operation was much faster than in
Hive.
Why do Pig and Hive differ in how they execute this? In Hive, can we similarly
combine small files so that fewer mappers are spawned?
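
To make the difference concrete, here is a rough model of the two behaviours in Python. It is only a sketch of the general idea of packing whole files into combined splits, not the actual Hive or Pig code. The function names are made up for illustration, and the 32 MB cap on a combined split is an assumed value, not a setting taken from my cluster.

```python
def mappers_one_per_file(file_sizes):
    """No combining: every (non-splittable) file gets its own mapper."""
    return len(file_sizes)


def mappers_combined(file_sizes, max_split_bytes):
    """Greedily pack whole files into combined splits.

    A new split is started whenever adding the next file would push the
    current split past max_split_bytes. Files are never cut in the middle.
    """
    splits = 0
    current = 0
    for size in file_sizes:
        if current > 0 and current + size > max_split_bytes:
            splits += 1
            current = 0
        current += size
    if current > 0:
        splits += 1
    return splits


# 630 files of roughly 100 KB each, as in one partition of the table.
files = [100 * 1024] * 630

per_file = mappers_one_per_file(files)
combined = mappers_combined(files, 32 * 1024 * 1024)  # assumed 32 MB cap

print(per_file, combined)
```

Under these assumptions the whole partition is only about 63 MB, so a handful of combined splits covers it, while the one-mapper-per-file approach pays the task start-up cost 630 times.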
