Sameer, did you check out the TOMAP function in the documentation? The example is close to yours. I think a nested FOREACH in combination with TOMAP would get you there, though I haven't tried it myself.

SB
______________________
Steve Bernstein
VP/Analytics
408.499.0961 MOBILE
deem.com

-----Original Message-----
From: Sameer Tilak [mailto:[email protected]]
Sent: Monday, February 03, 2014 2:00 PM
To: [email protected]
Subject: Ideas for data processing

Hi everyone,

We have a data set in the following format:

user1 item1 value
user2 item1 value
user3 item1 value
...
user1 item2 value
user20 item2 value
user35 item2 value
...
user2 item3 value
user25 item3 value
...

We have around 20 items and millions of users, and not all users have entries for all the items. We would like to transform this into:

user1 item1 value, item2 value, item3 value, ...
user2 item4 value, item18 value, item19 value, ...

I can think of a couple of ways of doing this in Pig Latin. For example, one way would be to create a map (where the key is the item name and the value is the associated value), fill out that map as the data is read, and then write it out to a file. I am not sure how efficient that would be. I would love to get suggestions for doing this in Pig Latin.
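
For reference, the transformation described above could be sketched in Pig Latin roughly as follows. This is only a minimal sketch, not something taken from the thread: the input and output paths, the tab delimiter, and the field names and types are all assumptions. It collects each user's (item, value) pairs into a bag with GROUP rather than building a map; a map could probably be layered on top with TOMAP or a small UDF, but that is not attempted here.

```pig
-- Hypothetical input path and schema: one (user, item, value) record per line,
-- tab-separated. Adjust the delimiter and types to match the real data.
raw = LOAD 'input/user_item_values' USING PigStorage('\t')
      AS (user:chararray, item:chararray, value:chararray);

-- Bring all records that belong to the same user together.
by_user = GROUP raw BY user;

-- For each user, keep only the item and value fields of the grouped records.
-- The result is one row per user with a bag of (item, value) tuples.
per_user = FOREACH by_user GENERATE group AS user, raw.(item, value) AS items;

-- Hypothetical output path.
STORE per_user INTO 'output/per_user' USING PigStorage('\t');
```

Each output row would then hold the user followed by a bag such as {(item1,value),(item2,value)}, with only the items that user actually has. Since GROUP runs as a single reduce-side step, this may be a reasonable starting point for millions of users and around 20 items, but it has not been benchmarked.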
