On Thu, Oct 13, 2011 at 07:56, Ayon Sinha <[email protected]> wrote:
> Hi Kiranprasad,
> What is your usecase? Are you sure you have picked the right tool for the
> job? Pig/Hadoop is meant for massive datasets which mean millions and
> billions of rows. Which in your case would lead to millions & billions of
> files which Hadoop doesn't like anyway.

I have also found that MultiStorage runs a reducer for each partition, i.e., for each separate output file. This is OK for a small number of partitions (locations in Kiran's case), but it will break down for larger numbers. I ended up letting Pig group the records and writing a script that splits the Pig output into one file per group.

-- Thomas
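
For illustration only, a splitting script of the kind described above might look roughly like the sketch below. It is not the script actually used. It assumes the Pig output is plain tab-separated text with the group key in the first field (for example after a FLATTEN or an ORDER BY on the key); the names split_by_key, safe_name and out_dir are made up for this example.

```python
import itertools
import os
import re
import sys


def safe_name(key):
    """Turn a group key into a string that is safe to use as a file name."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", key) or "_empty"


def split_by_key(lines, out_dir, sep="\t"):
    """Append each line to out_dir/<key>.txt, where key is the first field.

    Assumption: one record per line, key in the first sep-separated field.
    Returns a dict mapping each key to the number of lines written for it.
    """
    os.makedirs(out_dir, exist_ok=True)
    counts = {}

    def first_field(line):
        return line.rstrip("\n").split(sep, 1)[0]

    # Grouped output keeps equal keys adjacent, so only one file is open at a time.
    for key, group in itertools.groupby(lines, key=first_field):
        path = os.path.join(out_dir, safe_name(key) + ".txt")
        # Append mode, because a key can reappear in a later part file.
        with open(path, "a") as out:
            for line in group:
                out.write(line if line.endswith("\n") else line + "\n")
                counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    # Reads records from standard input and writes into the given directory.
    split_by_key(sys.stdin, sys.argv[1])
```

Saved under a hypothetical name such as split_by_key.py, it could be fed with something like `hadoop fs -cat output/part-* | python split_by_key.py out_dir`. Because files are opened in append mode, out_dir should be empty before each run, and two keys that differ only in characters replaced by safe_name would end up in the same file.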
