results = foreach (group raw all) generate MyUdf(raw);

The input to the UDF will be a tuple with a single field. That field is a
bag of tuples, and each of those tuples is one of your raw rows.
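To make the shape of that input concrete, here is a rough sketch of the pairwise loop such a UDF might run over the bag. It is plain Python rather than a finished Pig UDF: the names compare and compare_all_pairs are made up for illustration, and the comparison itself is only a placeholder for whatever your real logic is. It assumes the bag arrives as a list of row tuples, which I believe is how Pig's Jython UDF support presents bags; a real UDF would also need an output schema, omitted here.

```python
def compare(a, b):
    # Hypothetical placeholder comparison: count the fields that match
    # position by position. Swap in the real comparison logic here.
    return sum(1 for x, y in zip(a, b) if x == y)


def compare_all_pairs(bag):
    # `bag` stands in for the single field handed to the UDF:
    # a collection of tuples, one per raw row.
    rows = list(bag)
    results = []
    for i in range(len(rows)):
        # Start at i + 1 so each unordered pair is compared exactly once
        # (R1 vs R2..R5, then R2 vs R3..R5, and so on).
        for j in range(i + 1, len(rows)):
            results.append((rows[i], rows[j], compare(rows[i], rows[j])))
    return results
```

With five rows this yields ten comparisons, so the output grows quadratically with the size of the bag.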

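Note that grouping with "all" sends every row to a single reducer and hands
the whole relation to the UDF as one bag, so this forces everything into
memory and isn't scalable...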



On Thu, Jan 19, 2012 at 12:54 AM, Michael Lok <[email protected]> wrote:

> Hi folks,
>
> I've got one resultset which I need to run a comparison with all the
> rows within the same resultset.  For example:
>
> R1
> R2
> R3
> R4
> R5
>
> Take R1, I'll need to compare R1 with all rows from R2-R5.  The
> comparison will be written in a UDF.  Here's what I have so far:
>
> ============================================
> RAW = load 'raw_data.txt' using PigStorage(',');
>
> RAW_2 = foreach RAW generate *;
>
> PROCESSED = foreach RAW {
>    /* perform comparo here */
> };
> ============================================
>
> I'm stuck at the filtering inside the nested block.  How should I go
> about comparing the rows there?
>
> Any help is greatly appreciated.
>
>
> Thanks!
>
