The mapping will have to be done in a UDF. The UDF would return a bag of tuples, one tuple per level, each holding the bag for that level.
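As a rough illustration of such a UDF, here is a minimal Python sketch. It rests on several assumptions: the lookup table from the question is hard-coded in a hypothetical `LOOKUP` dict (a real job would probably load it from a file), the `outputSchema` decorator is the one Pig's Jython support is expected to supply, and the schema string is a guess at how the nested result could be declared. Only the name `mapudf` comes from the query.

```python
# Hypothetical in-memory copy of the lookup table from the question:
# level-1 value -> values for levels 2 through 5.
LOOKUP = {
    5: (15, 30, 8, 2),
    10: (125, 135, 13, 3),
    15: (4, 90, 10, 1),
}

try:
    outputSchema  # assumed to be supplied by Pig when the script is registered
except NameError:
    def outputSchema(schema):
        # Stand-in so the file also runs under plain Python.
        def decorator(func):
            return func
        return decorator


@outputSchema("levels:bag{t:tuple(level_bag:bag{v:tuple(val:long)})}")
def mapudf(bagcol):
    """Return a bag of tuples; each tuple holds the bag for one level."""
    # A Pig bag is assumed to arrive as a list of tuples, e.g. [(10,), (15,)].
    values = [t[0] for t in bagcol]
    # First output tuple: the level-1 bag as given.
    result = [([(v,) for v in values],)]
    # One more output tuple per remaining level (2 through 5).
    for level in range(4):
        result.append(([(LOOKUP[v][level],) for v in values],))
    return result
```

A value missing from the table raises a KeyError in this sketch. To call it from Pig, the script would first be registered, for example with register 'mapudf.py' using jython as myfuncs; (file name and namespace are hypothetical), and then referenced as myfuncs.mapudf(bagcol).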
The Pig query would look like this:

    mapped_tuples = foreach input generate FLATTEN(mapudf(bagcol));

In Pig 0.8 (to be released in a few days), you can also write your UDFs in Python:
http://wiki.apache.org/pig/UDFsUsingScriptingLanguages

Thanks,
Thejas

On 11/29/10 2:51 PM, "Matt Tanquary" <[email protected]> wrote:

> I have this problem which I solved easily with M/R but I'm trying to solve
> through PIG instead:
>
> Given the following bags, perform a lookup in a special table to retrieve 4
> additional variations of the data:
> {(10), (15)}
> {(5)}
> {(5), (10), (15)}
>
> Lookup table:
> 5 15 30 8 2
> 10 125 135 13 3
> 15 4 90 10 1
>
> Note the lookup table has 5 columns, 1 for each level. The bags are given as
> level 1 data, so you will find that value in the first column of the lookup.
> Now, for the fun part: Need to create new bags for each level based on the
> given level 1 data. For instance:
>
> {(10), (15)} IN would yield the additional bags:
> {(125), (4)}
> {(135), (90)}
> {(13), (10)}
> {(3), (1)}
>
> additionally:
> {(5)} IN would yield:
> {(15)}
> {(30)}
> {(8)}
> {(2)}
>
> So, this is the final big picture:
> Records IN:
> {(10), (15)}
> {(5)}
>
> Records OUT:
> {(10), (15)}
> {(125), (4)}
> {(135), (90)}
> {(13), (10)}
> {(3), (1)}
> {(5)}
> {(15)}
> {(30)}
> {(8)}
> {(2)}
>
> The cases where there is only one item in a bag is simple, but when more
> than one are introduced I am unable to determine an efficient way to tackle
> this. As a side note, I will probably only need to process up to 3 items in
> a bag in this manner.
>
> I hope this makes sense. Any assistance is much appreciated.
> Regards,
> -M@
>
