try TOTUPLE(data.target)
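
If TOTUPLE ends up wrapping the whole bag in a one-field tuple rather than flattening it, another option may be the BagToTuple built-in. A minimal sketch, assuming Pig 0.11 or later (BagToTuple is not mentioned elsewhere in this thread, and the alias name "targets" is made up for illustration):

```pig
-- Sketch only: assumes Pig 0.11+ where the built-in BagToTuple is available.
data = LOAD 'test2.csv' USING PigStorage(',') AS (source:bytearray, target:bytearray);

-- One row per source; the second field is a bag of (source, target) tuples.
grouped = GROUP data BY source;

-- data.target projects the bag of single-element tuples;
-- BagToTuple turns that bag into a single tuple.
-- "targets" is a hypothetical alias chosen for this example.
thin = FOREACH grouped GENERATE group AS source, BagToTuple(data.target) AS targets;

STORE thin INTO 'thin.dmp';
```

Bags are unordered, so the order of elements inside the resulting tuple is not guaranteed.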

Shawn

On Sat, Aug 20, 2011 at 3:51 AM, Kevin Burton <[email protected]> wrote:
> That's worse :-P
>
> 0 0 1
> 0 0 2
> 0 0 3
> 0 0 4
>
> On Sat, Aug 20, 2011 at 2:01 AM, David Riccitelli <[email protected]> wrote:
>
>> Hi Kevin,
>>
>> Have you tried something like:
>>  thin = foreach (group data by source) { generate group as source,
>> flatten($1);  };
>>
>> David
>>
>> On Sat, Aug 20, 2011 at 11:47 AM, Kevin Burton <[email protected]> wrote:
>>
>> > I'm optimizing a somewhat large pig job.
>> >
>> > One of the intermediate steps is a group which we use moving forward.
>> >
>> > The data right now looks like:
>> >
>> > 0 {(1),(2),(3),(4)}
>> >
>> > where the second column is a bag of tuples, each with one element.
>> >
>> > Wouldn't it be more efficient to store this as:
>> >
>> > 0 (1,2,3,4)
>> >
>> > ??
>> >
>> > I can't figure out how to do this…
>> >
>> > --test2.csv
>> > 0,1
>> > 0,2
>> > 0,3
>> > 0,4
>> >
>> >
>> > data = LOAD 'test2.csv' USING PigStorage(',') AS (source:bytearray,
>> > target:bytearray);
>> >
>> > grouped = GROUP data by source;
>> > thin = FOREACH grouped GENERATE $0, $1.($1);
>> >
>> > STORE thin INTO 'thin.dmp';
>> >
>> >
>> > --
>> >
>> > Founder/CEO Spinn3r.com
>> >
>> > Location: *San Francisco, CA*
>> > Skype: *burtonator*
>> >
>> > Skype-in: *(415) 871-0687*
>> >
>>
>>
>>
>> --
>> David Riccitelli
>>
>>
>> ********************************************************************************
>> InsideOut10 s.r.l.
>> P.IVA: IT-11381771002
>> Fax: +39 0110708239
>> ---
>> LinkedIn: http://it.linkedin.com/in/riccitelli
>> Twitter: ziodave
>> ---
>> Layar Partner Network <http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
>>
>> ********************************************************************************
>>
