I am using Pig 0.7 with stock Apache Hadoop 0.20.2. The script works for me in both local and mapreduce mode.

$ pig -d WARN test.pig
...
(c,x,,)

$ cat left_rel.txt
a x
a y
b x
b y
c x

$ cat right_rel.txt
a 5
a 10
b 5
b 10

$ cat test.pig
A = LOAD 'left_rel.txt' AS (var1, var2);
B = LOAD 'right_rel.txt' AS (var1, var3);
C = JOIN A BY var1 LEFT OUTER, B BY var1;
D = FILTER C BY $2 is null;
DUMP D;

- Sandip

On Mon, Jun 7, 2010 at 11:18 PM, Dmitriy Ryaboy <[email protected]> wrote:

> I can reproduce this in 0.6, and it appears to have nothing to do with your
> data or with the DUMP operator -- a simple "explain" on D causes the same
> problem. Looks like there is something wrong with how the query plan gets
> compiled:
>
> Caused by: java.lang.NullPointerException
>        at org.apache.pig.impl.plan.OperatorPlan.add(OperatorPlan.java:152)
>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.generateStorePlan(QueryParser.java:128)
>        at org.apache.pig.PigServer.store(PigServer.java:552)
>        ... 7 more
>
>
> Haven't tried on 0.7
>
> -D
>
>
> On Mon, Jun 7, 2010 at 5:10 AM, Alexander Schätzle <
> [email protected]> wrote:
>
>> I replaced the FILTER statement with a SPLIT:
>>
>> SPLIT C into D if var3 is null, E if var3 is not null;
>>
>> Now, this works!
>> Obviously there is a problem with null-values in the FILTER statement.
>> Does anybody know what the problem is?
>>
>> Cheers,
>> Alex
>>
>>
>>
>> ________________________________
>> From: Rekha Joshi <[email protected]>
>> To: "[email protected]" <[email protected]>
>> Sent: Monday, June 7, 2010, 10:22:19
>> Subject: Re: Unable to store alias
>>
>> Offhand I think it's faulty DUMP behavior after the join combined with
>> datatype misinterpretation; you can use STORE and that might work.
>> However, I would try using a foreach generate stmt after C and then filter:
>>
>> D = foreach C generate $0 as fvar1, $1 as fvar2, (chararray)$2 as fvar3;
>> E = filter D by fvar3 is null;
>> Dump E; -- verify results for null
>> E = filter D by fvar3 is not null;
>> Dump E; -- verify results for not null
>>
>> Cheers,
>> /R
>>
>> On 6/7/10 12:57 PM, "Alexander Schätzle" <[email protected]>
>> wrote:
>>
>> Hi all,
>>
>> my script looks like this:
>>
>> A = LOAD 'left_rel.txt' AS (var1, var2);
>> B = LOAD 'right_rel.txt' AS (var1, var3);
>> C = JOIN A BY var1 LEFT OUTER, B BY var1;
>> D = FILTER C BY $2 is null;
>> DUMP D;
>>
>> But when I dump D I get the error "Unable to store alias D".
>> I suppose there is something going wrong with the FILTER for null-values
>> (is not null also doesn't work).
>> What I want to do is to filter for the tuples in A which do not find a join
>> partner in B.
>> Input files are attached.
>>
>> Does anybody know what's going on and how to fix this?
>> By the way, I'm using Cloudera Distribution for Hadoop 3 Beta with Pig
>> 0.5.0.
>>
>> Thx in advance,
>> Alex
>>
>>
>

--
http://www.pedalogue.com
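
For reference, the result the script in this thread is meant to produce (the tuples of A with no join partner in B) can be checked outside Pig. The following is only a minimal Python sketch of the intended semantics, not of how Pig compiles or runs the plan; the function names `left_outer_join` and `unmatched_rows` are made up for illustration, and the rows are copied from the left_rel.txt and right_rel.txt listings above.

```python
# Sample rows copied from left_rel.txt and right_rel.txt in the thread.
left_rel = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "x")]
right_rel = [("a", "5"), ("a", "10"), ("b", "5"), ("b", "10")]


def left_outer_join(left, right):
    """Join on the first field; unmatched left rows are padded with None."""
    joined = []
    for l_key, l_val in left:
        matches = [r for r in right if r[0] == l_key]
        if matches:
            joined.extend((l_key, l_val, r_key, r_val) for r_key, r_val in matches)
        else:
            # No partner in the right relation: both right-hand fields are null.
            joined.append((l_key, l_val, None, None))
    return joined


def unmatched_rows(joined):
    """Keep rows whose third field ($2 in the Pig script) is null."""
    return [row for row in joined if row[2] is None]


c = left_outer_join(left_rel, right_rel)   # plays the role of alias C
d = unmatched_rows(c)                      # plays the role of alias D
print(d)  # [('c', 'x', None, None)]
```

The single surviving row corresponds to the `(c,x,,)` tuple that Pig 0.7 prints for DUMP D.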
