Hi Alan,
Thanks for your reply.

I am trying to understand how Pig processes these relations. As I mentioned, my 
UDF returns its result in the following format:

 {(id1,x,y,z), (id2, a, b, c), (id3,x,a)}  /* User 1 info */
 {(id10,x,y,z), (id9, a, b, c), (id1,x,a)} /* User 2 info */
 {(id8,x,y,z), (id4, a, b, c), (id2,x,a)} /* User 3 info */
 {(id6,x,y,z), (id6, a, b, c), (id9,x,a)} /* User 4 info */

B = foreach A {
      /* Each element of A is a bag, so the statements below are applied to
         each bag in A. Is this correct? */
      B1 = order A by $0; -- order on the id
      /* What does this A refer to? Does it refer to each bag of relation A?
         I get the following error:
         "expression is not a project expression" */
      /* rest of the code */
}

Thanks for your help.
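
A note on the error above: inside a nested foreach, the inner statements operate on a bag field of the current record, not on the outer relation. A minimal sketch, assuming the UDF output is held in a relation A with a single bag field named b. The field name b, the path 'yourinput', and the empty bag schema bag{} are placeholders chosen for illustration, not something confirmed in this thread:

```pig
-- Assumption: each record of A holds one bag, exposed as field b.
A = load 'yourinput' as (b:bag{});

B = foreach A {
      -- b is the bag inside the current record, so ordering happens per bag.
      sorted = order b by $0;    -- $0 is the id, the first field of each tuple
      top5   = limit sorted 5;   -- keep at most 5 tuples from this bag
      generate flatten(top5);    -- emit the kept tuples as separate records
}
```

In the version that fails, the A in "order A by $0" names the whole relation rather than a field of the current record, which is a likely source of the "not a project expression" message.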


> Subject: Re: Bag of tuples
> From: ga...@hortonworks.com
> Date: Wed, 6 Nov 2013 09:36:04 -0800
> To: user@pig.apache.org
> 
> Do you mean you want to find the top 5 per input record?  Also, what is your 
> ordering criteria?  Just sort by id?  Something like this should order all 
> tuples in each bag by id and then produce the top 5.  My syntax may be a 
> little off as I'm working offline and don't have the manual in front of me, 
> but this should be the general idea.
> 
> A = load 'yourinput' as (b:bag);
> B = foreach A {
>       B1 = order A by $0; -- order on the id
>       B2 = limit B1 5;
>       generate flatten(B2);
> }
> 
> Alan.
> 
> On Nov 5, 2013, at 9:52 AM, Sameer Tilak wrote:
> 
> > Hi Pig experts,
> > Sorry to post so many questions, I have one more question on doing some 
> > analytics on bag of tuples.
> > 
> > My input has the following format:
> > 
> > {(id1,x,y,z), (id2, a, b, c), (id3,x,a)}  /* User 1 info */
> > {(id10,x,y,z), (id9, a, b, c), (id1,x,a)} /* User 2 info */
> > {(id8,x,y,z), (id4, a, b, c), (id2,x,a)} /* User 3 info */
> > {(id6,x,y,z), (id6, a, b, c), (id9,x,a)} /* User 4 info */
> > 
> > I can change my UDF to give more simple output. However, I want to find out 
> > if something like this can be done easily:
> > I would like to find out top 5 ids (field 1 in a tuple) among all the 
> > users. Note that each user has a bag and the first field of each tuple in 
> > that bag is id. 
> > 
> > How difficult will it be to filter based on fields of tuples and do 
> > analytics across the entire user base.
> >                                       
> 
> 
                                          
