Re: How to store each record in a separate file

Jonathan Coveney Wed, 12 Oct 2011 21:58:56 -0700

To Ayon's point, MultipleOutputFormat can get the job done, but keep in mind
that Hadoop deals better with larger files than smaller ones. Files are
stored in blocks (typically 64MB, 128MB, or 256MB), so lots of small files
means lots of small blocks, which is bad.
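
To put rough numbers on that, here is a minimal Java sketch of the arithmetic.
It assumes that each file and each block counts as one object the NameNode
has to track, and the record count (1,000,000) and record size (1KB) are
made-up figures for illustration only. The class and method names are
hypothetical, not part of Hadoop or Pig.

```java
public class SmallFileCost {
    /** Blocks needed for one file: ceiling of fileBytes / blockBytes. */
    static long blocksFor(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    /** Files plus blocks, i.e. the objects assumed to be tracked per data set. */
    static long trackedObjects(long fileCount, long bytesPerFile, long blockBytes) {
        return fileCount * (1 + blocksFor(bytesPerFile, blockBytes));
    }

    public static void main(String[] args) {
        long blockBytes = 64L * 1024 * 1024; // 64MB block size
        long records = 1_000_000L;           // assumed record count
        long recordBytes = 1024L;            // assumed record size (1KB)

        // One file per record: every file holds a single tiny block.
        long onePerRecord = trackedObjects(records, recordBytes, blockBytes);
        // Same data in one file: a handful of full-size blocks.
        long singleFile = trackedObjects(1, records * recordBytes, blockBytes);

        System.out.println("one file per record: " + onePerRecord + " objects");
        System.out.println("one combined file:   " + singleFile + " objects");
    }
}
```

Under these assumptions, one file per record comes to 2,000,000 tracked
objects, against 17 for the same data kept in a single file.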


2011/10/12 Ayon Sinha <[email protected]>

> Leaving aside the bigger question of why you would want to store each
> record in a separate file:
> I'm not sure how to do this in Pig, but it is definitely possible in Hadoop
> (and also streaming) via MultipleOutputFormat, where the name of the output
> file can be based on the base_dir, the key, and the value. You can create
> your own filename based on those arguments.
> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/lib/MultipleOutputFormat.html
>
> You can definitely implement your own StoreFunc UDF.
> -Ayon
>
>
>
> ________________________________
> From: kiranprasad <[email protected]>
> To: [email protected]
> Sent: Wednesday, October 12, 2011 9:35 PM
> Subject: How to store each record in a separate file
>
> Hi
>
> After grouping a data set, how do I save each group in a separate file?
>
> ex:
> A = LOAD 'E:/data.txt' USING PigStorage(',');
> B = GROUP A BY $0;
>
> cat data.txt;
>
> (1,2,3)
> (4,2,1)
> (8,3,4)
> (4,3,3)
> (7,2,5)
> (8,4,3)
>
> After grouping
>
> (1,{(1,2,3)})
> (4,{(4,2,1),(4,3,3)})
> (7,{(7,2,5)})
> (8,{(8,3,4),(8,4,3)})
>
> How do I save each record in a separate file?
>
>
> Regards
> Kiran.G
>
