Hi Ayon,
I have just started working with Pig and am trying it out on different use cases.
One of my use cases: there are 10 million records, and after grouping them
by a field (say, location), I want all the records for a particular location
in a separate file.
I am presently working in local mode.
Kiran.G
-----Original Message-----
From: Ayon Sinha
Sent: Thursday, October 13, 2011 11:26 AM
To: [email protected]
Subject: Re: How to store each record in a seperate file
Hi Kiranprasad,
What is your use case? Are you sure you have picked the right tool for the
job? Pig/Hadoop is meant for massive datasets, which means millions and
billions of rows. In your case, that would lead to millions and billions of
files, which Hadoop doesn't like anyway.
If your dataset is really small, do you really need Hadoop, or would Perl,
Python, shell, or any other programming language on a single machine suffice?
Just asking to make sure you are not headed down the wrong path.
OTOH, if you are doing this as an academic exercise, all is justified.
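For a dataset that fits on one machine, the single-machine route mentioned above might look roughly like the Python sketch below. It is only an illustration, not a Pig or Hadoop solution: the function name split_by_key, the <key>.txt naming scheme, and the assumption that the raw rows are plain comma-separated lines (e.g. 1,2,3) are all made up for this example. It also assumes the number of distinct keys is small enough to keep one open file per key, and that key values are safe to use as file names.

```python
import os
import tempfile
from contextlib import ExitStack


def split_by_key(records, out_dir, key_index=0, delimiter=","):
    """Write each record to <out_dir>/<key>.txt, where key is one field.

    Hypothetical helper: returns a dict mapping each key to the number
    of records written for it.
    """
    os.makedirs(out_dir, exist_ok=True)
    counts = {}
    handles = {}
    # ExitStack closes every output file opened below, even on error.
    with ExitStack() as stack:
        for record in records:
            record = record.rstrip("\n")
            if not record:
                continue  # skip blank lines
            key = record.split(delimiter)[key_index]
            if key not in handles:
                # Assumption: the key is a safe file name as-is.
                path = os.path.join(out_dir, key + ".txt")
                handles[key] = stack.enter_context(open(path, "w"))
                counts[key] = 0
            handles[key].write(record + "\n")
            counts[key] += 1
    return counts


# Sample rows from the original question, grouped by the first field.
sample = ["1,2,3", "4,2,1", "8,3,4", "4,3,3", "7,2,5", "8,4,3"]
demo_dir = tempfile.mkdtemp()
demo_counts = split_by_key(sample, demo_dir)
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the sample list, so the input is streamed rather than loaded into memory at once.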
-Ayon
See My Photos on Flickr
Also check out my Blog for answers to commonly asked questions.
________________________________
From: kiranprasad <[email protected]>
To: [email protected]; Ayon Sinha <[email protected]>
Sent: Wednesday, October 12, 2011 10:19 PM
Subject: Re: How to store each record in a seperate file
Thank you for the quick response. But how can I perform the below in local mode?
-----Original Message-----
From: Jonathan Coveney
Sent: Thursday, October 13, 2011 10:28 AM
To: [email protected] ; Ayon Sinha
Subject: Re: How to store each record in a seperate file
To Ayon's point, MultipleOutputFormat can get the job done, but keep in mind
that Hadoop deals better with larger files than smaller ones. Every file is
allocated in blocks (64MB, 128MB, 256MB), so lots of small blocks are bad.
2011/10/12 Ayon Sinha <[email protected]>
Setting aside the bigger question of why you would want to store each record
in a separate file:
I'm not sure how to do this in Pig, but it is definitely possible in Hadoop
(and also streaming) via MultipleOutputFormat, where the name of the output
file can be based on the base_dir, key, and value. You can create your own
filename based on those arguments.
http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/lib/MultipleOutputFormat.html
You can definitely implement your own StoreFunc UDF.
-Ayon
________________________________
From: kiranprasad <[email protected]>
To: [email protected]
Sent: Wednesday, October 12, 2011 9:35 PM
Subject: How to store each record in a seperate file
Hi
After grouping a data set, how do I save each group in a separate file?
ex:
A = LOAD 'E:/data.txt' USING PigStorage(',');
B = GROUP A BY $0;
cat data.txt;
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)
(7,2,5)
(8,4,3)
After grouping
(1,{(1,2,3)})
(4,{(4,2,1),(4,3,3)})
(7,{(7,2,5)})
(8,{(8,3,4),(8,4,3)})
How do I save each record in a separate file?
Regards
Kiran.G