Thanks Cheolsoo!

Uri

On Tue, Jan 22, 2013 at 6:01 PM, Cheolsoo Park <[email protected]> wrote:

> Hi Uri,
>
> Try this:
>
> data = load 'test.txt' using PigStorage(' ') as (cid:chararray, iid:chararray, num1:int, num2:int);
> grouped = group data by cid;
> results = foreach grouped generate FLATTEN(data), SUM(data.num2) as sum;
> appended = foreach results generate cid, iid, num1, num2, (sum > 0 ? num1 : 0) as num3;
> dump appended;
>
> This will give you:
>
> (a,e,11,0,0)
> (b,f,2,2,2)
> (c,g,3,3,3)
> (c,h,44,44,44)
> (c,i,75,0,75)
> (d,j,89,0,0)
> (d,k,120,0,0)
> (d,l,3000,0,0)
>
> Thanks,
> Cheolsoo
>
>
> On Tue, Jan 22, 2013 at 5:17 PM, Uri Laserson <[email protected]> wrote:
>
> > I have data that looks like this:
> >
> > a e 11 0
> > b f 2 2
> > c g 3 3
> > c h 44 44
> > c i 75 0
> > d j 89 0
> > d k 120 0
> > d l 3000 0
> >
> > and I load it like so:
> >
> > data = load 'test.txt' using PigStorage(' ') as (cid:chararray, iid:chararray, num1:int, num2:int);
> >
> > I want to group by the first column, cid. For each group, if any of the
> > num2 values (last column) are positive, I want to output every tuple in
> > that group with an extra field equal to num1. If all the num2 values for
> > that group are zero, then I want to output every tuple in that group with
> > an extra field equal to 0.
> >
> > I figured something like this would work:
> >
> > data = load 'test.txt' using PigStorage(' ') as (cid:chararray, iid:chararray, num1:int, num2:int);
> > grouped = group data by cid;
> > results = foreach grouped {
> >     result1 = SUM(data.num2);
> >     extended = foreach data generate *, result1 > 0 ? num1 : 0;
> >     generate FLATTEN(extended);
> > };
> >
> > but it does not. I get this error:
> >
> > 2013-01-22 17:15:07,647 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > ERROR 1200: <line 98, column 48> mismatched input '>' expecting SEMI_COLON
> >
> > What is the proper way to do this? From the MapReduce perspective, I group
> > by the key, and in the reducer, I compute a value for each group, and then
> > emit every single value for that group along with some extra data.
> >
> > Thanks!
> > Uri
> >
> > --
> > Uri Laserson, PhD
> > Data Scientist, Cloudera
> > Twitter/GitHub: @laserson
> > +1 617 910 0447
> > [email protected]
>

--
Uri Laserson, PhD
Data Scientist, Cloudera
Twitter/GitHub: @laserson
+1 617 910 0447
[email protected]
