Hi Pig experts,

Sorry to post so many questions; I have one more, this time about running analytics on a bag of tuples.
My input has the following format:

    {(id1,x,y,z), (id2, a, b, c), (id3,x,a)}     /* User 1 info */
    {(id10,x,y,z), (id9, a, b, c), (id1,x,a)}    /* User 2 info */
    {(id8,x,y,z), (id4, a, b, c), (id2,x,a)}     /* User 3 info */
    {(id6,x,y,z), (id6, a, b, c), (id9,x,a)}     /* User 4 info */

I can change my UDF to produce simpler output, but first I want to find out whether something like this can be done easily: I would like to find the top 5 ids (field 1 of each tuple) across all users. Note that each user has one bag, and the first field of every tuple in that bag is an id. How difficult is it to filter on fields inside these tuples and run analytics across the entire user base?
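To make the intended computation concrete, here is a small Python model of the dataflow, assuming "top 5 ids" means the 5 ids that occur most often across all users' bags. It mirrors the usual Pig Latin steps: FLATTEN the bag into one record per tuple, FILTER on a tuple field, GROUP by the id, COUNT each group, then ORDER by the count descending and LIMIT to 5. This is only a sketch of the logic, not Pig code; the names `USERS`, `top_ids`, and `keep` are hypothetical, and the data is the sample above typed in by hand.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Record = Tuple[str, ...]

# Sample input from the question, typed in by hand: one bag (list of tuples) per user.
USERS: List[List[Record]] = [
    [("id1", "x", "y", "z"), ("id2", "a", "b", "c"), ("id3", "x", "a")],   # user 1
    [("id10", "x", "y", "z"), ("id9", "a", "b", "c"), ("id1", "x", "a")],  # user 2
    [("id8", "x", "y", "z"), ("id4", "a", "b", "c"), ("id2", "x", "a")],   # user 3
    [("id6", "x", "y", "z"), ("id6", "a", "b", "c"), ("id9", "x", "a")],   # user 4
]


def top_ids(
    users: Sequence[Sequence[Record]],
    n: int = 5,
    keep: Callable[[Record], bool] = lambda t: True,
) -> List[Tuple[str, int]]:
    """Return the n most frequent ids (field 0 of each tuple) across all bags."""
    # Like FLATTEN: one record per tuple, across every user's bag.
    flat = (t for bag in users for t in bag)
    # Like FILTER: keep only tuples that satisfy a predicate on any field.
    ids = (t[0] for t in flat if keep(t))
    # Like GROUP ... BY id followed by COUNT on each group.
    counts = Counter(ids)
    # Like ORDER ... BY count DESC plus LIMIT n; ties broken by id so output is stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


print(top_ids(USERS))
print(top_ids(USERS, n=3, keep=lambda t: t[1] == "x"))
```

Note that `id6` appears twice in user 4's bag and is counted twice here; if each id should count at most once per user, the bag would need to be deduplicated before flattening (in Pig, roughly a DISTINCT inside a nested FOREACH).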