That's worse :-P

0 0 1
0 0 2
0 0 3
0 0 4
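For anyone following along, here is a rough plain-Python model (not Pig itself) of why the output above grew a column. It assumes the bag produced by GROUP holds the full (source, target) input tuples, so flattening the whole bag repeats the key on every row. The function names are made up for illustration, and `targets_as_tuple` only shows the layout I'm after, not a Pig operator.

```python
# Plain-Python model of the Pig relations in this thread (not Pig itself).
from itertools import groupby

# Rows of test2.csv as (source, target) pairs.
data = [("0", "1"), ("0", "2"), ("0", "3"), ("0", "4")]


def group_by_source(rows):
    """Model of GROUP data BY source: (group key, bag of full input tuples)."""
    ordered = sorted(rows, key=lambda r: r[0])
    return [(key, list(bag)) for key, bag in groupby(ordered, key=lambda r: r[0])]


def flatten_whole_bag(grouped):
    """Model of GENERATE group, FLATTEN($1): one output row per tuple in the bag."""
    return [(key,) + row for key, bag in grouped for row in bag]


def targets_as_tuple(grouped):
    """The layout asked for: one row per group, all targets in a single tuple."""
    return [(key, tuple(target for _, target in bag)) for key, bag in grouped]


grouped = group_by_source(data)
flattened = flatten_whole_bag(grouped)   # key, source, target on every row
thin = targets_as_tuple(grouped)         # key plus one tuple of targets

for row in flattened:
    print(*row)
print(thin)
```

The flattened rows carry the key twice because each bag element still contains its own source field; the desired form keeps one row per group instead.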
On Sat, Aug 20, 2011 at 2:01 AM, David Riccitelli <[email protected]> wrote:

> Hi Kevin,
>
> Have you tried something like:
> thin = foreach (group data by source) { generate group as source, flatten($1); };
>
> David
>
> On Sat, Aug 20, 2011 at 11:47 AM, Kevin Burton <[email protected]> wrote:
>
> > I'm optimizing a somewhat large pig job.
> >
> > One of the intermediate steps is a group which we use moving forward.
> >
> > The data right now looks like:
> >
> > 0 {(1),(2),(3),(4)}
> >
> > which has a second column of a bag of tuples each with one element.
> >
> > Wouldn't it be more efficient to store this as:
> >
> > 0 (1,2,3,4)
> >
> > ??
> >
> > I can't figure out how to do this…
> >
> > -- test2.csv
> > 0,1
> > 0,2
> > 0,3
> > 0,4
> >
> >
> > data = LOAD 'test2.csv' USING PigStorage(',') AS (source:bytearray, target:bytearray);
> >
> > grouped = GROUP data by source;
> > thin = FOREACH grouped GENERATE $0, $1.($1);
> >
> > STORE thin INTO 'thin.dmp';
> >
> >
> > --
> >
> > Founder/CEO Spinn3r.com
> >
> > Location: *San Francisco, CA*
> > Skype: *burtonator*
> >
> > Skype-in: *(415) 871-0687*
> >
>
>
> --
> David Riccitelli
>
> ********************************************************************************
> InsideOut10 s.r.l.
> P.IVA: IT-11381771002
> Fax: +39 0110708239
> ---
> LinkedIn: http://it.linkedin.com/in/riccitelli
> Twitter: ziodave
> ---
> Layar Partner Network <http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
> ********************************************************************************
>

--

Founder/CEO Spinn3r.com

Location: *San Francisco, CA*
Skype: *burtonator*

Skype-in: *(415) 871-0687*
