Hi Dmitriy,

Am I correct to say that all rows in "results" are inside a bag when
passed into the UDF?
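
To check my understanding of the input shape described below (a tuple with
a single field, that field being a bag of row tuples), here is a rough
sketch in plain Python. It does not use the Pig API; my_udf and compare are
hypothetical names, and the comparison itself is only a placeholder that
counts matching fields. It also assumes the comparison is symmetric, so
each pair of rows is visited once.

```python
from itertools import combinations


def compare(a, b):
    """Placeholder comparison (an assumption): number of positions
    where the two rows hold equal values."""
    return sum(1 for x, y in zip(a, b) if x == y)


def my_udf(input_tuple):
    """Mimics the described input: a tuple with one field,
    and that field is a bag (here a list) of row tuples."""
    bag = input_tuple[0]
    rows = list(bag)  # the whole bag is held in memory at once
    # Visit every unordered pair of rows exactly once.
    return [(a, b, compare(a, b)) for a, b in combinations(rows, 2)]


raw = [("R1", "x"), ("R2", "x"), ("R3", "y")]
result = my_udf((raw,))
for a, b, score in result:
    print(a, b, score)
```

With n rows this produces n*(n-1)/2 pairs, all built from one in-memory
bag, which I take to be the scalability concern mentioned in the reply.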

On Thu, Jan 19, 2012 at 7:23 PM, Dmitriy Ryaboy <[email protected]> wrote:
> results = foreach (group raw all) generate MyUdf(raw);
>
> input to the udf will be a tuple with a single field. This field will be a
> bag of tuples. Each of those tuples is one of your raw rows.
>
> Note that this forces everything into memory and isn't scalable...
>
>
>
> On Thu, Jan 19, 2012 at 12:54 AM, Michael Lok <[email protected]> wrote:
>
>> Hi folks,
>>
>> I've got one resultset which I need to run a comparison with all the
>> rows within the same resultset.  For example:
>>
>> R1
>> R2
>> R3
>> R4
>> R5
>>
>> Take R1, I'll need to compare R1 with all rows from R2-R5.  The
>> comparison will be written in a UDF.  Here's what I have so far:
>>
>> ============================================
>> RAW = load 'raw_data.txt' using PigStorage(',');
>>
>> RAW_2 = foreach RAW generate *;
>>
>> PROCESSED = foreach RAW {
>>    /* perform comparo here */
>> };
>> ============================================
>>
>> I'm stuck at the filtering inside the nested block.  How should I go
>> about comparing the rows there?
>>
>> Any help is greatly appreciated.
>>
>>
>> Thanks!
>>
