There is already a JIRA (with patch) opened for this - http://issues.apache.org/jira/browse/PIG-1233
-...@nkur

On 2/11/10 2:01 PM, "Alex Parvulescu" <[email protected]> wrote:

Hello,

thanks Dmitriy! Wow, how could I have missed that one? Seems easy enough:

AVG( val == null ? 0 : val)

I'll give it a go asap :)

Here is the Jira issue, I hope I got everything in there:
https://issues.apache.org/jira/browse/PIG-1236

thanks,
Alex

On Tue, Feb 9, 2010 at 6:04 PM, Dmitriy Ryaboy <[email protected]> wrote:
> This is a legit bug, I think, in the new accumulator interface
> implementation. Nice find, Alex. Can you open a jira?
>
> btw, I saw on your blog you had some issues with how Pig was ignoring
> nulls when calculating average values (this is documented and
> expected behavior, btw), and wound up writing your own. You don't
> really need to:
>
> averages = foreach A generate AVG( val == null ? 0 : val);
>
>
> On Tue, Feb 9, 2010 at 2:57 AM, Mridul Muralidharan
> <[email protected]> wrote:
> >
> > Someone from the Pig team can answer better if there are any
> > implementation issues here with average.
> > But assuming there are none, if you can treat nulls as zeros, you could
> > add additional checks to the statements to allow it to proceed.
> >
> > Something to check for:
> > a) If A == null, generate 0.
> > b) If A.v == null, generate 0. (This is a strong possibility too.)
> >
> >
> > Regards,
> > Mridul
> >
> > On Tuesday 09 February 2010 04:08 PM, Alex Parvulescu wrote:
> >>
> >> hello Mridul,
> >>
> >> and thanks for the quick answer!
> >>
> >> A itself is not null, just some group-by values. I can't drop the nulls
> >> because I also need a count in the group by, even if it's only null
> >> values.
> >>
> >> I just wondered if there's anything to be done about the NPE to make it
> >> clearer, that's all.
> >>
> >> I guess you can see this as an eventual feature / improvement of some
> >> sort, no problems :)
> >>
> >> alex
> >>
> >> On Tue, Feb 9, 2010 at 11:35 AM, Mridul Muralidharan
> >> <[email protected] <mailto:[email protected]>> wrote:
> >>
> >> On second thought, probably A itself is NULL - in which case you
> >> will need a null check on A, and not on A.v (which, I think, is
> >> handled, IIRC).
> >>
> >> Regards,
> >> Mridul
> >>
> >> On Tuesday 09 February 2010 04:02 PM, Mridul Muralidharan wrote:
> >>
> >> Without knowing the rest of the script, you could do something like:
> >>
> >> C = FOREACH B {
> >>     X = FILTER A BY v IS NOT NULL;
> >>     GENERATE group, (int)AVG(X) as statsavg;
> >> };
> >>
> >> I am assuming it is because there are nulls in your bag field.
> >>
> >> Regards,
> >> Mridul
> >>
> >> On Tuesday 09 February 2010 03:52 PM, Alex Parvulescu wrote:
> >>
> >> Hello,
> >>
> >> I ran into an NPE today, which seems to be my fault, but I'm
> >> wondering if there is anything that could be done to make the error
> >> clearer.
> >>
> >> What I did is:
> >> 'C = FOREACH B GENERATE group, (int)AVG(A.v) as statsavg;'
> >> The problem here is that AVG ran into some null values and returned null,
> >> and consequently the cast failed with an NPE.
> >>
> >> This is the stacktrace:
> >>
> >> 2010-02-09 11:14:36,444 [Thread-85] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0006
> >> java.lang.NullPointerException
> >>     at org.apache.pig.builtin.IntAvg.getValue(IntAvg.java:282)
> >>     at org.apache.pig.builtin.IntAvg.getValue(IntAvg.java:39)
> >>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:208)
> >>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:281)
> >>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:182)
> >>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:352)
> >>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:277)
> >>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:423)
> >>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:391)
> >>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:371)
> >>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:239)
> >>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
> >>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> >>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:215)
> >>
> >> Now, because I'm not well aware of how this works, I did not
> >> realize that the cast throws the NPE and not the computation of the average
> >> function on null values provided by the data set.
> >> I initially thought this was a bug in Pig.
> >>
> >> I know the NPE is all on me, but is there anything you can
> >> do to improve the error message?
> >>
> >> thanks,
> >> alex
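For reference, here is a minimal Pig Latin sketch of how the workarounds discussed in this thread could be combined: Mridul's nested FILTER keeps nulls out of the average, and an explicit null guard in the spirit of Dmitriy's zero-substitution protects the (int) cast, written with Pig's IS NULL operator (the documented way to test for null in Pig Latin). It assumes B is a GROUP of A and that A has the numeric field v from Alex's script; the alias cnt is illustrative. This is only a user-side workaround for the null/cast problem, not the fix tracked in PIG-1233/PIG-1236.

C = FOREACH B {
    -- keep only non-null values for the average
    X = FILTER A BY v IS NOT NULL;
    -- if every v in the group was null, AVG(X.v) is null, so substitute 0.0
    -- before casting instead of casting a null to int
    GENERATE group,
             COUNT(A) AS cnt,  -- per-group count Alex mentioned; COUNT's null handling varies by Pig version, so check whether it skips tuples whose first field is null
             (int)((AVG(X.v) IS NULL) ? 0.0 : AVG(X.v)) AS statsavg;
};

Alternatively, following Dmitriy's suggestion of treating nulls as zeros, the substitution can happen before the GROUP (the field name k is a placeholder for the actual grouping key):

A_zeroed = FOREACH A GENERATE k, (v IS NULL ? 0 : v) AS v;
B        = GROUP A_zeroed BY k;
C        = FOREACH B GENERATE group, (int)AVG(A_zeroed.v) AS statsavg;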
