Hey all, can anyone let me know how I can accomplish the problem below in Pig?
I have 2 data sources:

TABLE A with a list of User IDs:
User1
User2
User3
User4
User5
User6
User7
User8
User9

TABLE B with (Host name, Capacity):
Hostb 2
Hostc 4
Hostd 3

I basically need to spread the users in Table A across the hosts in Table B according to how much capacity each host has. So the end result should be a file:

User1 Hostb
User2 Hostb
User3 Hostc
User4 Hostc
User5 Hostc
User6 Hostc
User7 Hostd
User8 Hostd
User9 Hostd

The order does not matter as long as each host gets as many users as its capacity allows. Also, SUM(TableB.Capacity) will always equal COUNT(TableA.UserID), so there won't be any extra or missing values to plug in.

Thanks,
JM
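
To make the expected result concrete, here is a minimal sketch of the logic in plain Python. It is not Pig and not a proposed solution, only an illustration of the two steps involved: expand each host into one slot per unit of capacity, then pair users with slots by position. The function name assign_users is made up for this example.

```python
from itertools import chain, repeat


def assign_users(users, hosts):
    """Pair each user with a host, giving each host `capacity` users."""
    # Expand every (host, capacity) row into one slot per unit of capacity,
    # e.g. ("Hostb", 2) becomes "Hostb", "Hostb".
    slots = list(chain.from_iterable(repeat(host, cap) for host, cap in hosts))
    # The post states that total capacity always equals the user count.
    if len(slots) != len(users):
        raise ValueError("total capacity must equal the number of users")
    # Pair users and slots by position.
    return list(zip(users, slots))


users = [f"User{i}" for i in range(1, 10)]
hosts = [("Hostb", 2), ("Hostc", 4), ("Hostd", 3)]
pairs = assign_users(users, hosts)

for user, host in pairs:
    print(user, host)
```

With the sample tables above, Hostb ends up with 2 users, Hostc with 4, and Hostd with 3. Which user lands on which host depends only on input order, which is fine since the order does not matter.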
