Hi Alan, Thanks for your reply.
I am trying to understand how Pig processes these relations. As I mentioned, my UDF returns the result in the following format:

{(id1,x,y,z), (id2, a, b, c), (id3,x,a)} /* User 1 info */
{(id10,x,y,z), (id9, a, b, c), (id1,x,a)} /* User 2 info */
{(id8,x,y,z), (id4, a, b, c), (id2,x,a)} /* User 3 info */
{(id6,x,y,z), (id6, a, b, c), (id9,x,a)} /* User 4 info */

B = foreach A {
    /* Each element in A is a bag, so the statements below are applied
       to each bag in A in turn. Is this correct? */
    B1 = order A by $0; -- order on the id
    /* What does this A refer to? Does it refer to each bag of relation A? */
    /* rest of the code */
}

I get the following error:

expression is not a project expression:

Thanks for your help.

> Subject: Re: Bag of tuples
> From: ga...@hortonworks.com
> Date: Wed, 6 Nov 2013 09:36:04 -0800
> To: user@pig.apache.org
>
> Do you mean you want to find the top 5 per input record? Also, what is your
> ordering criteria? Just sort by id? Something like this should order all
> tuples in each bag by id and then produce the top 5. My syntax may be a
> little off as I'm working offline and don't have the manual in front of me,
> but this should be the general idea.
>
> A = load 'yourinput' as (b:bag);
> B = foreach A {
>     B1 = order A by $0; -- order on the id
>     B2 = limit B1 5;
>     generate flatten(B2);
> }
>
> Alan.
>
> On Nov 5, 2013, at 9:52 AM, Sameer Tilak wrote:
>
> > Hi Pig experts,
> > Sorry to post so many questions, I have one more question on doing some
> > analytics on bag of tuples.
> >
> > My input has the following format:
> >
> > {(id1,x,y,z), (id2, a, b, c), (id3,x,a)} /* User 1 info */
> > {(id10,x,y,z), (id9, a, b, c), (id1,x,a)} /* User 2 info */
> > {(id8,x,y,z), (id4, a, b, c), (id2,x,a)} /* User 3 info */
> > {(id6,x,y,z), (id6, a, b, c), (id9,x,a)} /* User 4 info */
> >
> > I can change my UDF to give more simple output. However, I want to find out
> > if something like this can be done easily:
> > I would like to find out top 5 ids (field 1 in a tuple) among all the
> > users. Note that each user has a bag and the first field of each tuple in
> > that bag is id.
> >
> > How difficult will it be to filter based on fields of tuples and do
> > analytics across the entire user base.
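
For reference, here is a minimal sketch of the nested foreach discussed in this thread. Inside the braces, a nested operator such as order works on a bag-valued field of the current record, not on the outer relation alias, so the sketch refers to the field b instead of A; referring to A there is a likely cause of the "not a project expression" error. The field name b and the path 'yourinput' come from the quoted reply. The empty inner schema bag{} is an assumption made for illustration, and the script has not been run against any particular Pig release.

```pig
-- Assumption: each input record holds a single bag field named b,
-- declared here with no inner schema.
A = load 'yourinput' as (b:bag{});

B = foreach A {
    -- b is the bag belonging to the current record of A
    B1 = order b by $0;   -- sort that bag's tuples on the id (first field)
    B2 = limit B1 5;      -- keep at most the first five tuples of the sorted bag
    generate flatten(B2); -- emit each remaining tuple as its own output record
};
```

This yields up to five tuples per input record, which is the per-record reading of the question. A ranking across all users would need the bags flattened first and then grouped or ordered at the top level.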