Try TOTUPLE(data.target).

Shawn
On Sat, Aug 20, 2011 at 3:51 AM, Kevin Burton <[email protected]> wrote:
> That's worse :-P
>
> 0 0 1
> 0 0 2
> 0 0 3
> 0 0 4
>
> On Sat, Aug 20, 2011 at 2:01 AM, David Riccitelli <[email protected]> wrote:
>
>> Hi Kevin,
>>
>> Have you tried something like:
>> thin = foreach (group data by source) { generate group as source,
>> flatten($1); };
>>
>> David
>>
>> On Sat, Aug 20, 2011 at 11:47 AM, Kevin Burton <[email protected]> wrote:
>>
>> > I'm optimizing a somewhat large pig job.
>> >
>> > One of the intermediate steps is a group which we use moving forward.
>> >
>> > The data right now looks like:
>> >
>> > 0 {(1),(2),(3),(4)}
>> >
>> > which has a second column of a bag of tuples each with one element.
>> >
>> > Wouldn't it be more efficient to store this as:
>> >
>> > 0 (1,2,3,4)
>> >
>> > ??
>> >
>> > I can't figure out how to do this…
>> >
>> > --test2.csv
>> > 0,1
>> > 0,2
>> > 0,3
>> > 0,4
>> >
>> > data = LOAD 'test2.csv' USING PigStorage(',') AS (source:bytearray,
>> > target:bytearray);
>> >
>> > grouped = GROUP data by source;
>> > thin = FOREACH grouped GENERATE $0, $1.($1);
>> >
>> > STORE thin INTO 'thin.dmp';
>> >
>> > --
>> >
>> > Founder/CEO Spinn3r.com
>> >
>> > Location: *San Francisco, CA*
>> > Skype: *burtonator*
>> >
>> > Skype-in: *(415) 871-0687*
>>
>> --
>> David Riccitelli
>>
>> ********************************************************************************
>> InsideOut10 s.r.l.
>> P.IVA: IT-11381771002
>> Fax: +39 0110708239
>> ---
>> LinkedIn: http://it.linkedin.com/in/riccitelli
>> Twitter: ziodave
>> ---
>> Layar Partner Network <http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
>>
>> ********************************************************************************
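To make the data shapes in this thread concrete, here is a small plain-Python model (not Pig itself) of the test2.csv rows after GROUP. It is only a sketch under stated assumptions: Pig bags are modelled as Python lists, Pig tuples as Python tuples, bytearray fields as strings, and every function and variable name is hypothetical. It shows why projecting `$1.($1)` still yields a bag of one-element tuples, why `flatten($1)` on the grouped bag produces rows like `0 0 1` (the bag holds whole `(source, target)` tuples), and what the desired `0 (1,2,3,4)` shape looks like. It says nothing about which Pig function produces that shape or whether it is cheaper to store.

```python
# Hypothetical plain-Python model of the Pig relations discussed above.
# Assumption: Pig bags -> Python lists, Pig tuples -> Python tuples.

rows = [("0", "1"), ("0", "2"), ("0", "3"), ("0", "4")]  # contents of test2.csv


def group_by_source(rows):
    """Model of GROUP data BY source: (group key, bag of whole input tuples)."""
    groups = {}  # dicts keep insertion order in Python 3.7+
    for source, target in rows:
        groups.setdefault(source, []).append((source, target))
    return list(groups.items())


grouped = group_by_source(rows)

# Model of $1.($1): keep only the target field; still a bag of 1-tuples.
projected = [(key, [(target,) for _, target in bag]) for key, bag in grouped]

# Model of flatten($1): un-nests the bag of whole tuples, so each output row
# is (group, source, target), i.e. the "0 0 1" lines from the thread.
flattened = [(key,) + tup for key, bag in grouped for tup in bag]

# The desired shape: one row per group, targets packed into a single tuple.
packed = [(key, tuple(target for (target,) in bag)) for key, bag in projected]

print(projected)
print(flattened)
print(packed)
```

The model keeps the group key separate from the packed values on purpose, mirroring the `0 (1,2,3,4)` layout Kevin asked about.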
