I'll write one, but right now it's only a small optimization for my code, so
I imagine I'll do it later.
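
For reference, here is a rough sketch of the logic such a BagToTuple UDF would
need, written as plain Python rather than against the Pig UDF API. The function
name bag_to_tuple and the list-of-tuples stand-in for a bag are assumptions made
for illustration only. Pig bags are unordered, so a real UDF could not promise
the element order shown here.

```python
def bag_to_tuple(bag):
    """Flatten a bag (modelled here as a list of tuples) into one tuple."""
    flat = []
    for tup in bag:
        # Each inner tuple may hold one or more fields; keep them all.
        flat.extend(tup)
    return tuple(flat)


# The grouped row from the original question: 0 {(1),(2),(3),(4)}
row = (0, [(1,), (2,), (3,), (4,)])
thin = (row[0], bag_to_tuple(row[1]))
print(thin)  # (0, (1, 2, 3, 4))
```

An empty bag yields an empty tuple, which seems like the least surprising
behaviour for groups with no values.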

Kevin

On Mon, Aug 22, 2011 at 11:03 AM, Daniel Dai <[email protected]> wrote:

> TOTUPLE will not solve the problem. We need a new UDF BagToTuple.
>
> Daniel
>
> On Sat, Aug 20, 2011 at 8:19 PM, Xiaomeng Wan <[email protected]> wrote:
> > try TOTUPLE(data.target)
> >
> > Shawn
> >
> > On Sat, Aug 20, 2011 at 3:51 AM, Kevin Burton <[email protected]> wrote:
> >> That's worse :-P
> >>
> >> 0 0 1
> >> 0 0 2
> >> 0 0 3
> >> 0 0 4
> >>
> >> On Sat, Aug 20, 2011 at 2:01 AM, David Riccitelli <[email protected]> wrote:
> >>
> >>> Hi Kevin,
> >>>
> >>> Have you tried something like:
> >>>  thin = foreach (group data by source) { generate group as source,
> >>> flatten($1);  };
> >>>
> >>> David
> >>>
> >>> On Sat, Aug 20, 2011 at 11:47 AM, Kevin Burton <[email protected]> wrote:
> >>>
> >>> > I'm optimizing a somewhat large pig job.
> >>> >
> >>> > One of the intermediate steps is a group which we use moving forward.
> >>> >
> >>> > The data right now looks like:
> >>> >
> >>> > 0 {(1),(2),(3),(4)}
> >>> >
> >>> > where the second column is a bag of tuples, each with one element.
> >>> >
> >>> > Wouldn't it be more efficient to store this as:
> >>> >
> >>> > 0 (1,2,3,4)
> >>> >
> >>> > ??
> >>> >
> >>> > I can't figure out how to do this…
> >>> >
> >>> > -- test2.csv
> >>> > 0,1
> >>> > 0,2
> >>> > 0,3
> >>> > 0,4
> >>> >
> >>> >
> >>> > data = LOAD 'test2.csv' USING PigStorage(',') AS (source:bytearray,
> >>> > target:bytearray);
> >>> >
> >>> > grouped = GROUP data by source;
> >>> > thin = FOREACH grouped GENERATE $0, $1.($1);
> >>> >
> >>> > STORE thin           INTO 'thin.dmp';
> >>> >
> >>> >
> >>> > --
> >>> >
> >>> > Founder/CEO Spinn3r.com
> >>> >
> >>> > Location: *San Francisco, CA*
> >>> > Skype: *burtonator*
> >>> >
> >>> > Skype-in: *(415) 871-0687*
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> David Riccitelli
> >>>
> >>
> >>
> >>
> >>
> >
>



