Hi Kevin,
Have you tried something like:
thin = foreach (group data by source) {
    generate group as source, flatten($1);
};
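Another way to get exactly the `0 (1,2,3,4)` shape is a sketch that assumes your Pig release ships the `BagToTuple` builtin. Check the built-in function list for your version, since older releases may not include it. The alias and file names are reused from your script.

```pig
-- Sketch only: assumes a Pig release whose builtins include BagToTuple.
data    = LOAD 'test2.csv' USING PigStorage(',')
              AS (source:bytearray, target:bytearray);
grouped = GROUP data BY source;
-- data.target projects the target field of each tuple in the bag: {(1),(2),(3),(4)}
-- BagToTuple un-nests that bag into a single tuple:               (1,2,3,4)
thin    = FOREACH grouped GENERATE group AS source,
              BagToTuple(data.target) AS targets;
STORE thin INTO 'thin.dmp';
```

Bags are unordered, so the element order inside the resulting tuple is not guaranteed. If order matters, sort inside a nested FOREACH first, e.g. `sorted = ORDER data BY target;` followed by `GENERATE group AS source, BagToTuple(sorted.target);`.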
David
On Sat, Aug 20, 2011 at 11:47 AM, Kevin Burton wrote:
> I'm optimizing a fairly large Pig job.
>
> One of the intermediate steps is a GROUP whose output we reuse in later steps.
>
> The data right now looks like:
>
> 0 {(1),(2),(3),(4)}
>
> That is, the second column is a bag of single-element tuples.
>
> Wouldn't it be more efficient to store this as:
>
> 0 (1,2,3,4)
>
> ??
>
> I can't figure out how to do this…
>
> -- test2.csv
> 0,1
> 0,2
> 0,3
> 0,4
>
>
> data = LOAD 'test2.csv' USING PigStorage(',')
>            AS (source:bytearray, target:bytearray);
>
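> grouped = GROUP data BY source;
> -- $0 is the group key; $1.($1) projects the second field (target) of
> -- every tuple in the grouped bag, giving rows like 0 {(1),(2),(3),(4)}
> thin = FOREACH grouped GENERATE $0, $1.($1);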
>
> STORE thin INTO 'thin.dmp';
>
>
> --
> Founder/CEO Spinn3r.com
> Location: San Francisco, CA
> Skype: burtonator
> Skype-in: (415) 871-0687
--
David Riccitelli
********************************************************************************
InsideOut10 s.r.l.
P.IVA: IT-11381771002
Fax: +39 0110708239
---
LinkedIn: http://it.linkedin.com/in/riccitelli
Twitter: ziodave
---
Layar Partner Network: <http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
********************************************************************************