That's worse :-P

0 0 1
0 0 2
0 0 3
0 0 4
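
The `0 0 1` rows appear because `$1` inside David's nested block is the whole `data` bag. Its tuples carry both fields `(source, target)`, so `flatten($1)` emits one `(group, source, target)` row per input line. One possible way to get `0 (1,2,3,4)` instead is sketched here. It assumes Pig 0.11 or later, where `BagToTuple` is a builtin; that release is newer than this thread. The input file and field names come from the original message, and the `targets` alias is illustrative only.

```pig
-- Sketch only: assumes Pig 0.11+, where BagToTuple is a builtin eval function.
data = LOAD 'test2.csv' USING PigStorage(',') AS (source:bytearray, target:bytearray);

grouped = GROUP data BY source;

-- data.target projects the grouped bag down to one-field tuples: {(1),(2),(3),(4)}.
-- BagToTuple un-nests that bag into a single tuple: (1,2,3,4).
-- Unlike FLATTEN, it produces one output record per group, not one per element.
thin = FOREACH grouped GENERATE group AS source, BagToTuple(data.target) AS targets;

STORE thin INTO 'thin.dmp';
```

Bags are unordered, so the order of values inside the resulting tuple is not guaranteed unless the bag is sorted first, for example with an `ORDER` inside a nested `FOREACH`. Whether the tuple form is actually cheaper than the bag for the downstream steps would need to be measured on the real job.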

On Sat, Aug 20, 2011 at 2:01 AM, David Riccitelli <[email protected]> wrote:

> Hi Kevin,
>
> Have you tried something like:
>
>     thin = foreach (group data by source) {
>         generate group as source, flatten($1);
>     };
>
> David
>
> On Sat, Aug 20, 2011 at 11:47 AM, Kevin Burton <[email protected]> wrote:
>
> > I'm optimizing a somewhat large Pig job.
> >
> > One of the intermediate steps is a GROUP whose output we use in later steps.
> >
> > The data right now looks like:
> >
> > 0 {(1),(2),(3),(4)}
> >
> > i.e. the second column is a bag of tuples, each with one element.
> >
> > Wouldn't it be more efficient to store this as:
> >
> > 0 (1,2,3,4)
> >
> > ??
> >
> > I can't figure out how to do this…
> >
> > -- test2.csv
> > 0,1
> > 0,2
> > 0,3
> > 0,4
> >
> >
> > data = LOAD 'test2.csv' USING PigStorage(',') AS (source:bytearray, target:bytearray);
> >
> > grouped = GROUP data BY source;
> > -- $0 is the group key; $1 is the bag of grouped rows, and .($1) projects each row's target field
> > thin = FOREACH grouped GENERATE $0, $1.($1);
> >
> > STORE thin INTO 'thin.dmp';
> >
> >
> > --
> >
> > Founder/CEO Spinn3r.com
> >
> > Location: *San Francisco, CA*
> > Skype: *burtonator*
> >
> > Skype-in: *(415) 871-0687*
> >
>
>
>
> --
> David Riccitelli
>
>
> ********************************************************************************
> InsideOut10 s.r.l.
> P.IVA: IT-11381771002
> Fax: +39 0110708239
> ---
> LinkedIn: http://it.linkedin.com/in/riccitelli
> Twitter: ziodave
> ---
> Layar Partner Network <http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
>
> ********************************************************************************
>

