I'm trying to replace a couple of fields in a relation with values looked up in another relation. Here's an example; let's say I have a relation mapping each integer to its square:
-----map.txt-----
1 1
2 4
3 9
4 16
5 25

Then I have some data; let's call the columns a and b:

-----data.txt-----
1 2
3 4
5 2

I want to replace each number in the data with its square. My basic approach is to join 'a' with the key, then generate the value; then join 'b' with the key, and generate that value. Here's my Pig script:

m = load 'map.txt' as (k,v);
data = load 'data.txt' as (a,b);
x = join m by k, data by a;
y = foreach x generate v as aa, b;
z = join m by k, y by b;
w = foreach z generate aa, v as bb;
dump w;

This outputs:

(4,4)
(4,4)
(16,16)

The problem is that y's version of v gets replaced with w's version. I expect it to output:

(1, 4)
(9, 16)
(25, 4)

What's weird is I'm pretty sure this used to work in Pig 0.7. If there's a better way to do this (using maps?), please let me know. I'm using Pig 0.8 with Cloudera CDH3b4.

Thanks.
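
P.S. To make the result I'm after concrete, here is a minimal sketch of the same two-column lookup in plain Python (not Pig). It only illustrates the semantics I expect from the two joins; the names `square_map`, `data`, and `replace_with_lookup` are made up for this illustration, and the values are copied from map.txt and data.txt above.

```python
# In-memory copies of map.txt and data.txt from the question (illustrative only).
square_map = {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
data = [(1, 2), (3, 4), (5, 2)]


def replace_with_lookup(rows, lookup):
    """Replace both fields of each row with the value found in `lookup`.

    Rows whose a or b key is missing from `lookup` are dropped,
    which mirrors what an inner join on each column would do.
    """
    return [
        (lookup[a], lookup[b])
        for a, b in rows
        if a in lookup and b in lookup
    ]


w = replace_with_lookup(data, square_map)
print(w)  # [(1, 4), (9, 16), (25, 4)]
```

Each column is looked up independently here, so the value substituted for a can never be overwritten by the value substituted for b. That is the behaviour I expected from the Pig script.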
