Hi,
I'm trying to do a left outer join of two files, on eight keys, but it
always seems that the keys don't match. I'm able to reproduce this with the
code and files below.
I expect to see lines like these:
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;2213;;...;10;31;;;0;3
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;2213;;...;0;-1;;;1;0
But all I get are values from j2, with no matches from "failed".
Is this a bug, or just a too-long day? I'm using CDH 3b3, pig version Apache
Pig version 0.7.0+16
Thanks,
Marko
failed = LOAD 'hdfs:///user/hadoop/out/failed.csv' USING PigStorage(';') AS
(y:int, m:int, d:int, n:int, i:chararray, t:chararray,
l2:int,l3:int,r1:int,r2:int,r3:int,r4:int,dclr:int, bclr:int);
j2 = LOAD 'hdfs:///user/hadoop/out/j2.csv' USING PigStorage(';') AS
(y:int, m:int, d:int, n:int, i:chararray, t:chararray,
total:int, dc:int, bc:int,l2:int, l3:int,
total_l: int, dcl:int, bcl:int);
j3 = JOIN j2 BY (y,m,d,n,i,t,l2,l3) LEFT OUTER,
failed BY (y,m,d,n,i,t,l2,l3);
j2:
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;2213;;5;1;3
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;2963;;37;0;0
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;6861;;1;0;0
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;241;;3;0;0
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;301;;6;2;2
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;2652;;5;0;0
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;2212;;1;0;0
failed:
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;301;;10;31;;;0;2
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;2213;;10;31;;;0;3
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;301;;0;-1;;;1;0
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;301;;9;21;;;1;0
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;2213;;0;-1;;;1;0