I'm trying to replace a couple of fields in a relation with values
looked up in another relation.  Here's an example; let's say I have a
relation mapping each integer to its square:

-----map.txt-----
1    1
2    4
3    9
4    16
5    25

Then I have some data, let's call the columns a and b:

-----data.txt-----
1    2
3    4
5    2

I want to replace each number in the data with its square.  My basic
approach is to join 'a' with the key and generate the value, then
join 'b' with the key and generate that value.  Here's my Pig script:

m = load 'map.txt' as (k,v);
data = load 'data.txt' as (a,b);
x = join m by k, data by a;
y = foreach x generate v as aa, b;
z = join m by k, y by b;
w = foreach z generate aa, v as bb;
dump w;

This outputs:

(4,4)
(4,4)
(16,16)

The problem is that y's version of v (aa) gets replaced with w's
version (bb), so both columns come out as the square of b.  I
expect it to output:

(1,4)
(9,16)
(25,4)
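
To be clear about the semantics I'm after, here is the same double
lookup as a small Python sketch.  This is not Pig, and the names
squares, data, and replace_with_lookup are made up for illustration;
it only shows the result I expect from the two joins.

```python
# Hypothetical in-memory copies of map.txt and data.txt from above.
squares = {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
data = [(1, 2), (3, 4), (5, 2)]


def replace_with_lookup(rows, lookup):
    """Replace both fields of each (a, b) row with their looked-up values."""
    return [(lookup[a], lookup[b]) for a, b in rows]


result = replace_with_lookup(data, squares)
for row in result:
    print(row)
```

Each column is looked up independently, so 'a' and 'b' should never
end up sharing a value unless they were equal to begin with.  (Row
order doesn't matter to me; a Pig join may return rows in a different
order.)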

What's weird is I'm pretty sure this used to work in Pig 0.7.  If
there's a better way to do this (using maps?), please let me know.
I'm using Pig 0.8 with Cloudera CDH3b4.

Thanks.
