I'll write one, but right now it's only a small optimization to my code, so I imagine I'll do it later.
Kevin

On Mon, Aug 22, 2011 at 11:03 AM, Daniel Dai wrote:
> TOTUPLE will not solve the problem. We need a new UDF BagToTuple.
>
> Daniel
>
> On Sat, Aug 20, 2011 at 8:19 PM, Xiaomeng Wan wrote:
> > try TOTUPLE(data.target)
> >
> > Shawn
> >
> > On Sat, Aug 20, 2011 at 3:51 AM, Kevin Burton wrote:
> >> That's worse :-P
> >>
> >> 0 0 1
> >> 0 0 2
> >> 0 0 3
> >> 0 0 4
> >>
> >> On Sat, Aug 20, 2011 at 2:01 AM, David Riccitelli wrote:
> >>> Hi Kevin,
> >>>
> >>> Have you tried something like:
> >>> thin = foreach (group data by source) { generate group as source, flatten($1); };
> >>>
> >>> David
> >>>
> >>> On Sat, Aug 20, 2011 at 11:47 AM, Kevin Burton wrote:
> >>>
> >>> > I'm optimizing a somewhat large pig job.
> >>> >
> >>> > One of the intermediate steps is a group which we use moving forward.
> >>> >
> >>> > The data right now looks like:
> >>> >
> >>> > 0 {(1),(2),(3),(4)}
> >>> >
> >>> > which has a second column of a bag of tuples each with one element.
> >>> >
> >>> > Wouldn't it be more efficient to store this as:
> >>> >
> >>> > 0 (1,2,3,4)
> >>> >
> >>> > ??
> >>> >
> >>> > I can't figure out how to do this…
> >>> >
> >>> > --test2.csv
> >>> > 0,1
> >>> > 0,2
> >>> > 0,3
> >>> > 0,4
> >>> >
> >>> > data = LOAD 'test2.csv' USING PigStorage(',') AS (source:bytearray, target:bytearray);
> >>> >
> >>> > grouped = GROUP data by source;
> >>> > thin = FOREACH grouped GENERATE $0, $1.($1);
> >>> >
> >>> > STORE thin INTO 'thin.dmp';
> >>> >
> >>> > --
> >>> >
> >>> > Founder/CEO Spinn3r.com
> >>> >
> >>> > Location: *San Francisco, CA*
> >>> > Skype: *burtonator*
> >>> >
> >>> > Skype-in: *(415) 871-0687*
> >>>
> >>> --
> >>> David Riccitelli
> >>>
> >>> InsideOut10 s.r.l.
> >>> P.IVA: IT-11381771002
> >>> Fax: +39 0110708239
> >>> ---
> >>> LinkedIn: http://it.linkedin.com/in/riccitelli
> >>> Twitter: ziodave
> >>> ---
> >>> Layar Partner Network <http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
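For reference, the BagToTuple UDF Daniel mentions could be sketched as a Python function for Pig's Jython engine. This is a minimal sketch under stated assumptions, not a Pig builtin: the function name `bag_to_tuple`, the output schema string, and the script file name and namespace in the comments are all hypothetical. The block also models the thread's script in plain Python to show why `flatten($1)` printed `0 0 1`: flattening the whole grouped bag emits one row per original tuple, and each of those tuples still carries both `source` and `target`.

```python
# Minimal sketch of a hypothetical BagToTuple UDF for Pig's Jython engine.
# Assumption: when Pig REGISTERs this file USING jython, it provides the
# outputSchema decorator. The fallback below only lets the file run under
# plain Python for testing; it attaches no schema.
#
# Hypothetical Pig Latin usage (file name and namespace are assumptions):
#   REGISTER 'bagtotuple.py' USING jython AS myudfs;
#   thin = FOREACH grouped GENERATE group, myudfs.bag_to_tuple(data.target);
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorate(func):
            return func
        return decorate


# Assumed schema string; adjust it to your Pig version and field types.
@outputSchema("targets:tuple()")
def bag_to_tuple(bag):
    """Collapse a bag of one-field tuples, {(1),(2),(3),(4)}, into (1,2,3,4).

    Pig passes a bag to a Jython UDF as a list of tuples; a null bag stays null.
    """
    if bag is None:
        return None
    return tuple(t[0] for t in bag)


# --- Plain-Python model of the thread's Pig script (illustration only) ---

def group_by_source(rows):
    """Model of GROUP data BY source: (group, bag of whole data tuples)."""
    groups = {}
    for source, target in rows:
        groups.setdefault(source, []).append((source, target))
    return list(groups.items())


def flatten_whole_bag(grouped):
    """Model of GENERATE group, flatten($1): one row per tuple in the bag.

    Each row keeps both source and target, hence the '0 0 1' output.
    """
    return [(g,) + t for g, bag in grouped for t in bag]


def collapse_targets(grouped):
    """Model of GENERATE group, bag_to_tuple(data.target)."""
    return [(g, bag_to_tuple([(t,) for _, t in bag])) for g, bag in grouped]


rows = [("0", "1"), ("0", "2"), ("0", "3"), ("0", "4")]
grouped = group_by_source(rows)
print(flatten_whole_bag(grouped))  # [('0', '0', '1'), ('0', '0', '2'), ('0', '0', '3'), ('0', '0', '4')]
print(collapse_targets(grouped))   # [('0', ('1', '2', '3', '4'))]
```

Projecting `data.target` before calling the UDF keeps the function trivial: it only ever sees one-field tuples, so it can take `t[0]` without knowing the rest of the relation's schema.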
