Someone from the Pig team can answer better if there are any impl issues here with AVG. But assuming there are none, and if you can treat nulls as zeros, you could add additional checks to the statements to allow them to proceed.
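Something along these lines might work - an untested sketch, reusing the B / A / v names from your script. It assumes AVG returns null cleanly for this data (rather than the NPE happening inside IntAvg itself, in which case the FILTER approach further down in the thread is the safer route) and that the bincond / "is null" syntax behaves this way on your Pig version:

    -- untested sketch: fall back to 0 when the bag, or the average over it, is null
    C = FOREACH B GENERATE group,
        (A is null ? 0
                   : ((AVG(A.v) is null) ? 0 : (int)AVG(A.v))) as statsavg;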
Something to check for:
a) If A == null, generate 0.
b) If A.v == null, generate 0. (This is a strong possibility too.)

Regards,
Mridul

On Tuesday 09 February 2010 04:08 PM, Alex Parvulescu wrote:
hello Mridul, and thanks for the quick answer!

A itself is not null, just some of the group by values. I can't drop the nulls because I also need a count in the group by, even if it's only null values.

I just wondered if there's anything to be done about the NPE to make it clearer, that's all. I guess you can see this as an eventual feature / improvement of some sort, no problem :)

alex

On Tue, Feb 9, 2010 at 11:35 AM, Mridul Muralidharan <[email protected]> wrote:

On second thought, probably A itself is NULL - in which case you will need a null check on A, and not on A.v (which, I think, is handled iirc).

Regards,
Mridul

On Tuesday 09 February 2010 04:02 PM, Mridul Muralidharan wrote:

Without knowing the rest of the script, you could do something like:

    C = FOREACH B {
        X = FILTER A BY v IS NOT NULL;
        GENERATE group, (int)AVG(X) as statsavg;
    };

I am assuming it is because there are nulls in your bag field.

Regards,
Mridul

On Tuesday 09 February 2010 03:52 PM, Alex Parvulescu wrote:

Hello,

I ran into an NPE today, which seems to be my fault, but I'm wondering if there is anything that could be done to make the error clearer.

What I did is:

    C = FOREACH B GENERATE group, (int)AVG(A.v) as statsavg;

The problem here is that AVG ran into some null values and returned null. Consequently the cast failed with an NPE. This is the stack trace:

    2010-02-09 11:14:36,444 [Thread-85] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0006
    java.lang.NullPointerException
        at org.apache.pig.builtin.IntAvg.getValue(IntAvg.java:282)
        at org.apache.pig.builtin.IntAvg.getValue(IntAvg.java:39)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:208)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:281)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:182)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:352)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:277)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:423)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:391)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:371)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:239)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:215)

Now, because I'm not well aware of how this works, I did not realize that the cast throws the NPE and not the computation of the average function on the null values provided by the data set. I initially thought this was a bug in Pig.

I know the NPE is all on me, but is there anything you can do to improve the error message?

thanks,
alex
