In the case of COGROUP, if nothing matched the group key, you get an empty bag, not null.
So checking for COUNT(alias) == 0 is what you need.

Regards,
Mridul

On Wednesday 21 April 2010 03:37 PM, Alexander Schätzle wrote:
Hello,

I want to use IS NULL in a FILTER, but the behavior seems to be a bug:

I make a LeftJoin with a result of 7 tuples with fields 's' and 'nick'. 4 tuples have a value for 'nick'; the other 3 don't have a value for 'nick'. Afterwards I want to filter so that only the 3 tuples without a nick are left:

Filter1 = FILTER LeftJoin1 BY nick is null;

But as a result I get all 7 tuples, and none of them has a nick any more! So what's going on there!? If I use IS NOT NULL instead, I get all 7 tuples unchanged!

This is the complete script; the input data can be found in the attachment:

indata = LOAD 'foaf' USING PigStorage() AS (s,p,o);
f1 = FILTER indata BY p == 'ex:type' AND o == 'ex:Person';
BGP1 = FOREACH f1 GENERATE s AS s;
f1 = FILTER indata BY p == 'ex:nick';
BGP2 = FOREACH f1 GENERATE s AS s, o AS nick;
lj1 = JOIN BGP1 BY s LEFT OUTER, BGP2 BY s;
LEFTJOIN1 = FOREACH lj1 GENERATE $0 AS s, $2 AS nick;
FILTER1 = FILTER LEFTJOIN1 BY nick is null;
STORE FILTER1 INTO 'outfile' USING PigStorage();

Can anyone help me figure out what's going wrong?

Thx,
Alex
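
The empty-bag check suggested in the reply could look roughly like the sketch below. It is only an illustration, not a tested fix for the script above: it assumes the same 'foaf' input and the BGP1/BGP2 relations from the quoted script, and the aliases persons, nicks, cg, no_nick, result and the output path 'outfile_no_nick' are made-up names.

```pig
-- Rebuild the two relations from the quoted script (hypothetical alias names).
indata  = LOAD 'foaf' USING PigStorage() AS (s, p, o);
persons = FILTER indata BY p == 'ex:type' AND o == 'ex:Person';
BGP1    = FOREACH persons GENERATE s AS s;
nicks   = FILTER indata BY p == 'ex:nick';
BGP2    = FOREACH nicks GENERATE s AS s, o AS nick;

-- COGROUP produces one tuple per key: (group, {BGP1 tuples}, {BGP2 tuples}).
-- A key with no match on one side gets an empty bag there, not null.
cg = COGROUP BGP1 BY s, BGP2 BY s;

-- Keep keys that occur in BGP1 but whose BGP2 bag is empty (no nick).
no_nick = FILTER cg BY COUNT(BGP1) > 0 AND COUNT(BGP2) == 0;
result  = FOREACH no_nick GENERATE group AS s;

STORE result INTO 'outfile_no_nick' USING PigStorage();
```

Note that this yields one row per distinct 's', whereas the outer join yields one row per BGP1 tuple. The built-in IsEmpty(BGP2) should be usable in place of COUNT(BGP2) == 0 if you prefer to test the bag directly.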
