After using MultiStorage(), the files have been generated based on the
group. Now, how can I add headers to all of the generated files?
-----Original Message-----
From: kiranprasad
Sent: Thursday, October 13, 2011 3:40 PM
To: [email protected]
Subject: Re: How to store each record in a seperate file
Thank you all, it is working.
-----Original Message-----
From: Thomas Kappler
Sent: Thursday, October 13, 2011 12:33 PM
To: [email protected]
Subject: Re: How to store each record in a seperate file
On Thu, Oct 13, 2011 at 07:56, Ayon Sinha <[email protected]> wrote:
Hi Kiranprasad,
What is your use case? Are you sure you have picked the right tool for the
job? Pig/Hadoop is meant for massive datasets, which means millions and
billions of rows. In your case that would lead to millions or billions of
files, which Hadoop doesn't like anyway.
I have also found that MultiStorage runs a reducer for each partition,
i.e., each separate file. This will be OK for a small number of
partitions (locations in Kiran's case), but will break down for larger
numbers.
I ended up letting Pig group the records and writing a script that
splits the Pig output into one file per group.
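
For illustration only, here is a minimal sketch of what such a splitting
script might look like in Python. It is not the script referred to above.
It assumes the Pig output is plain text with one record per line, that
fields are tab-delimited, and that the group key is the first field; the
function name split_by_group and the header text are hypothetical. It also
writes an optional header line at the top of each file, which covers the
header question raised earlier in the thread.

```python
import os
from contextlib import ExitStack


def split_by_group(records, out_dir, key_index=0, delimiter="\t", header=None):
    """Write each record to out_dir/<group key>.txt, one file per group.

    records   -- iterable of text lines (e.g. an open Pig part file)
    key_index -- position of the group field in each line (assumed layout)
    header    -- optional header line written once at the top of each file
    Returns the sorted list of group keys that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}
    with ExitStack() as stack:
        for line in records:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            key = line.split(delimiter)[key_index]
            if key not in handles:
                # First record of this group: open its file, emit the header.
                # Assumes the key is safe to use as a file name.
                path = os.path.join(out_dir, key + ".txt")
                handles[key] = stack.enter_context(open(path, "w"))
                if header is not None:
                    handles[key].write(header + "\n")
            handles[key].write(line + "\n")
    return sorted(handles)
```

It could be called on a part file copied out of HDFS, for example
split_by_group(open("part-r-00000"), "by_location", header="location\tname").
The sketch keeps one open file handle per group, so like MultiStorage it is
only suitable when the number of groups is small.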
-- Thomas