foo = group my_relation by (a, b, c, d) parallel $p;
bar = foreach foo {
    first_e = limit my_relation.e 1;
    generate flatten(group) as (a, b, c, d), flatten(first_e) as e;
}
That should do the trick.
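One caveat: LIMIT without a preceding ORDER gives no guarantee about which tuple comes back, so "first" here really means "any one" row in each group. If it matters which e you keep, sort inside the nested block before limiting. A rough sketch, assuming that sorting by e itself is an acceptable definition of "first" (that is my assumption, not something from your description; the aliases sorted and first_row are made up for illustration):

```pig
-- Reuses foo from the grouping statement above.
-- Assumption: the smallest e in each (a, b, c, d) group is the one to keep.
bar = foreach foo {
    sorted = order my_relation by e;   -- sort the group's bag by e
    first_row = limit sorted 1;        -- keep only the top tuple
    generate flatten(group) as (a, b, c, d), flatten(first_row.e) as e;
}
```

If the order you care about is the original line order in the file, you would need a field that records it, since Pig does not preserve input order through a GROUP.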
On Fri, Jul 1, 2011 at 3:11 AM, Tony Burton <[email protected]> wrote:
> Hello
>
> My dataset has five fields. I want to select DISTINCT lines based on the
> first four fields and then append the fifth field from the first common line
> (based on the first four fields). Is this possible using Pig? The Pig Latin
> Reference Manual 2 page says "You cannot use DISTINCT on a subset
> of fields. To do this, use FOREACH...GENERATE to select the fields, and then
> use DISTINCT (see Example: Nested Block
> <http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#nestedblock>)", but
> I'm not sure how to adapt this example to my problem. My attempts so far, not
> using the nested FOREACH syntax, have resulted in either (1) duplicate
> lines repeated with their own previous fifth column or (2) duplicate lines
> repeated with the same fifth column; i.e., in each case, there's no
> DISTINCT-ness about the data based on fields 1-4.
>
> Is there a way to do this, or should I create a UDF that operates on lines
> GROUPed by columns 1-4?
>
> Thanks,
>
> Tony