Re: How to store each record in a separate file

kiranprasad Wed, 12 Oct 2011 23:04:29 -0700

Hi Ayon

I have just started working with Pig and am trying out different use cases.

One of my use cases: there are 10 million records, and after grouping them by a field (say, location), I want all the records for each location in a separate file.

I am currently working in local mode.


Kiran.G

-----Original Message-----
From: Ayon Sinha
Sent: Thursday, October 13, 2011 11:26 AM
To: [email protected]
Subject: Re: How to store each record in a separate file

Hi Kiranprasad,

What is your use case? Are you sure you have picked the right tool for the job? Pig/Hadoop is meant for massive datasets, meaning millions or billions of rows, which in your case would lead to millions or billions of files, and Hadoop doesn't handle that well anyway.

If your dataset is really small, do you really need Hadoop? Perl, Python, shell, or any other programming language on a single machine would suffice.
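Ayon's single-machine suggestion can be illustrated with a short Python sketch (not Pig, and not code from this thread). It assumes the input is plain comma-separated text with one record per line (e.g. `1,2,3`), that the key is column 0 (for the location use case, pass the location column's index instead), and that key values are safe to use as file names. The function name `split_by_field` is hypothetical. It reads the input once and keeps one open handle per distinct key, so memory use does not grow with the number of records.

```python
import csv
import os


def split_by_field(input_path, output_dir, field_index=0, delimiter=","):
    """Write every record of input_path to output_dir/<key>.txt, where <key>
    is the value of column field_index. Returns {key: records_written}."""
    os.makedirs(output_dir, exist_ok=True)
    handles = {}  # key -> (open file, csv writer)
    counts = {}
    try:
        with open(input_path, newline="") as src:
            for row in csv.reader(src, delimiter=delimiter):
                if not row:
                    continue  # skip blank lines
                key = row[field_index]
                entry = handles.get(key)
                if entry is None:
                    # Assumption: the key is usable as a file name as-is.
                    fh = open(os.path.join(output_dir, key + ".txt"), "w", newline="")
                    entry = (fh, csv.writer(fh, delimiter=delimiter, lineterminator="\n"))
                    handles[key] = entry
                entry[1].writerow(row)
                counts[key] = counts.get(key, 0) + 1
    finally:
        # Close every per-key file, even if reading failed part-way.
        for fh, _ in handles.values():
            fh.close()
    return counts
```

For example, `split_by_field("E:/data.txt", "out")` on the six sample rows from the original question would create `1.txt`, `4.txt`, `7.txt`, and `8.txt`. If the number of distinct keys could exceed the operating system's open-file limit, sorting the input by the key first and writing one file at a time is the safer design.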

Just asking to make sure you are not headed down the wrong path.
OTOH, if you are doing this as an academic exercise, all is justified.

-Ayon
See My Photos on Flickr
Also check out my Blog for answers to commonly asked questions.



________________________________
From: kiranprasad <[email protected]>
To: [email protected]; Ayon Sinha <[email protected]>
Sent: Wednesday, October 12, 2011 10:19 PM
Subject: Re: How to store each record in a separate file

Thank you for the quick response. But how can I do the below in local mode?

-----Original Message-----
From: Jonathan Coveney
Sent: Thursday, October 13, 2011 10:28 AM
To: [email protected] ; Ayon Sinha
Subject: Re: How to store each record in a separate file

To Ayon's point, MultipleOutputFormat can get the job done, but keep in mind
that Hadoop deals better with larger files than smaller ones. Every file is
allocated in blocks (64MB, 128MB, 256MB), so lots of small blocks are bad.

2011/10/12 Ayon Sinha <[email protected]>

Besides the bigger question of why you would want to store each record in a
separate file: I'm not sure how to do this in Pig, but it is definitely
possible in Hadoop (and also with streaming) via MultipleOutputFormat, where
the name of the output file can be based on the base_dir, key, and value. You
can create your own filename based on those arguments.
http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/lib/MultipleOutputFormat.html
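The naming idea behind MultipleOutputFormat can be sketched in a few lines of Python. In the Hadoop `mapred` API, subclasses override the protected method `generateFileNameForKeyValue(K key, V value, String name)`, whose default simply returns the leaf name (such as `part-00000`). The sketch below only mimics that naming logic; it does not call Hadoop, and `file_name_for_key_value` and `route` are hypothetical stand-ins that assume one output directory per key.

```python
import posixpath
from collections import defaultdict


def file_name_for_key_value(key, value, name):
    """Hypothetical override: place each key's records in a directory named
    after the key, keeping the default leaf name (e.g. part-00000)."""
    return posixpath.join(str(key), name)


def route(pairs, leaf_name="part-00000"):
    """Collect (key, value) output pairs by the file each would be written to."""
    files = defaultdict(list)
    for key, value in pairs:
        files[file_name_for_key_value(key, value, leaf_name)].append(value)
    return dict(files)


pairs = [(1, "1,2,3"), (4, "4,2,1"), (8, "8,3,4"), (4, "4,3,3"), (7, "7,2,5"), (8, "8,4,3")]
for path, values in sorted(route(pairs).items()):
    print(path, values)
```

Every distinct key gets its own output file, which is exactly why the number of keys determines how many (possibly tiny) files end up on HDFS.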

You can definitely implement your own StoreFunc UDF.
-Ayon



________________________________
From: kiranprasad <[email protected]>
To: [email protected]
Sent: Wednesday, October 12, 2011 9:35 PM
Subject: How to store each record in a separate file

Hi

After grouping a data set, how do I save each group in a separate file?

Example:
A = LOAD 'E:/data.txt' USING PigStorage(',');
B = GROUP A BY $0;

cat data.txt;

(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)
(7,2,5)
(8,4,3)

After grouping:

(1,{(1,2,3)})
(4,{(4,2,1),(4,3,3)})
(7,{(7,2,5)})
(8,{(8,3,4),(8,4,3)})
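To make the shape of `B` concrete, here is a plain-Python sketch (not Pig) that rebuilds the grouped output from the six sample rows: each group is a single record holding the key and a bag of every tuple with that key. Pig does not guarantee the order of groups or of tuples inside a bag, so the sketch sorts keys only for a stable display. The helper names `group_by_first` and `pig_format` are hypothetical.

```python
def group_by_first(records):
    """Mimic GROUP ... BY $0: map each first-field value to the list
    (the "bag") of all records sharing it, then sort keys for display."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[0], []).append(rec)
    return sorted(groups.items())


def pig_format(key, bag):
    """Render one group in DUMP style: (key,{(t1),(t2),...})."""
    tuples = ",".join("(" + ",".join(str(v) for v in t) + ")" for t in bag)
    return "(%s,{%s})" % (key, tuples)


data = [(1, 2, 3), (4, 2, 1), (8, 3, 4), (4, 3, 3), (7, 2, 5), (8, 4, 3)]
for key, bag in group_by_first(data):
    print(pig_format(key, bag))
```

Each printed line corresponds to one grouped record above; storing "each record in a separate file" therefore means one file per distinct value of `$0`.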

How do I save each of these records in a separate file?


Regards
Kiran.G
