results = foreach (group raw all) generate MyUdf(raw);

The input to the UDF will be a tuple with a single field. This field will be a bag of tuples, and each of those tuples is one of your raw rows.
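To make the comparison step concrete, here is a rough sketch in plain Python of what the body of such a UDF might do once it has the bag. It is not Pig code: `compare_all` and `compare_rows` are made-up names, the bag is modelled as an ordinary list of tuples, and the scoring rule (counting matching fields) is only a placeholder for whatever comparison you actually need.

```python
from itertools import combinations


def compare_rows(a, b):
    """Placeholder comparison (an assumption): number of positions where the two rows agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


def compare_all(bag):
    """Compare every row in the bag with every row that comes after it.

    `bag` stands in for the bag of tuples the UDF receives; here it is
    just an iterable of Python tuples.
    Returns a list of (row_a, row_b, score) tuples, one per unordered pair.
    """
    rows = list(bag)  # materialises the whole bag in memory
    return [(a, b, compare_rows(a, b)) for a, b in combinations(rows, 2)]


# Small made-up sample standing in for the raw rows.
rows = [("R1", "x", 1), ("R2", "x", 2), ("R3", "y", 1)]
results = compare_all(rows)
for row_a, row_b, score in results:
    print(row_a[0], row_b[0], score)
```

This visits each unordered pair once (R1 with R2, R1 with R3, R2 with R3), so the work grows quadratically with the number of rows. If your comparison is not symmetric, `itertools.permutations(rows, 2)` would give both orderings instead.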
Note that this forces everything into memory, because grouping with "all" collects every row into a single bag, so it isn't scalable.

On Thu, Jan 19, 2012 at 12:54 AM, Michael Lok <[email protected]> wrote:
> Hi folks,
>
> I've got one resultset which I need to run a comparison with all the
> rows within the same resultset. For example:
>
> R1
> R2
> R3
> R4
> R5
>
> Take R1, I'll need to compare R1 with all rows from R2-R5. The
> comparison will be written in a UDF. Here's what I have so far:
>
> ============================================
> RAW = load 'raw_data.txt' using PigStorage(',');
>
> RAW_2 = foreach RAW generate *;
>
> PROCESSED = foreach RAW {
>     /* perform comparo here */
> };
> ============================================
>
> I'm stuck at the filtering inside the nested block. How should I go
> about the comparing the rows there?
>
> Any help is greatly appreciated.
>
>
> Thanks!
