Offhand, I think it's faulty DUMP behavior after the join, combined with
datatype misinterpretation. You could use STORE instead, and that might work.
However, I would first try a FOREACH ... GENERATE statement after C with an
explicit cast on the third column, and then filter:

D = foreach C generate $0 as fvar1, $1 as fvar2, (chararray)$2 as fvar3;
E = filter D by fvar3 is null;
Dump E; -- verify the rows where fvar3 is null
E = filter D by fvar3 is not null;
Dump E; -- verify the rows where fvar3 is not null
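
If the null filter still misbehaves, another route to the same result (tuples
in A with no join partner in B) is COGROUP plus IsEmpty, which avoids testing
for null altogether. This is only a sketch: the chararray types, the default
tab-delimited loader, and the output path 'unmatched_out' are assumptions on
my part, not something taken from your files.

A = LOAD 'left_rel.txt' AS (var1:chararray, var2:chararray);
B = LOAD 'right_rel.txt' AS (var1:chararray, var3:chararray);
G = COGROUP A BY var1, B BY var1;
-- keep only the groups in which no tuple from B shares the key
U = FILTER G BY IsEmpty(B);
-- unpack the remaining A tuples back into a flat relation
R = FOREACH U GENERATE FLATTEN(A);
-- 'unmatched_out' is a hypothetical output directory
STORE R INTO 'unmatched_out';

Declaring the types in the LOAD schema should also take the datatype guessing
out of the picture.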

Cheers,
/R

On 6/7/10 12:57 PM, "Alexander Schätzle" <[email protected]> wrote:

Hi all,

my script looks like this:

A = LOAD 'left_rel.txt' AS (var1, var2);
B = LOAD 'right_rel.txt' AS (var1, var3);
C = JOIN A BY var1 LEFT OUTER, B BY var1;
D = FILTER C BY $2 is null;
DUMP D;

But when I dump D, I get the error "Unable to store alias D".
I suppose something is going wrong with the filter for null values ("is not
null" doesn't work either).
What I want to do is filter for the tuples in A that do not find a join
partner in B.
Input files are attached.

Does anybody know what's going on and how to fix this?
By the way, I'm using Cloudera Distribution for Hadoop 3 Beta with pig 0.5.0.

Thx in advance,
Alex


