I think this is a legitimate bug in the new accumulator interface
implementation. Nice find, Alex. Can you open a JIRA ticket?

By the way, I saw on your blog that you had some trouble with Pig
ignoring nulls when calculating averages (this is documented and
expected behavior), and wound up writing your own function. You don't
really need to; you can replace the nulls with zeros before averaging:

zeroed = foreach A generate (val is null ? 0 : val) as val;
averages = foreach (group zeroed all) generate AVG(zeroed.val);
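
To make the difference concrete, here is a small Python sketch of the two behaviors. It is only a stand-in for illustration, not Pig's actual implementation; the function names are made up, and the all-null case returning None is based on what Alex describes in the thread (AVG over nulls coming back as null).

```python
from typing import Iterable, Optional


def avg_ignoring_nulls(values: Iterable[Optional[float]]) -> Optional[float]:
    """Average that skips None entries (hypothetical helper mirroring the
    documented null-ignoring behavior). Returns None if nothing is left."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


def avg_nulls_as_zero(values: Iterable[Optional[float]]) -> Optional[float]:
    """Average that counts None entries as 0 (hypothetical helper mirroring
    the null-to-zero replacement). Returns None only for an empty input."""
    filled = [0 if v is None else v for v in values]
    if not filled:
        return None
    return sum(filled) / len(filled)


sample = [2, None, 4]
print(avg_ignoring_nulls(sample))  # nulls skipped: divides by 2
print(avg_nulls_as_zero(sample))   # nulls count as 0: divides by 3
```

Note that with the zero replacement a group made up entirely of nulls averages to 0 rather than null, which may or may not be what you want.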


On Tue, Feb 9, 2010 at 2:57 AM, Mridul Muralidharan
<[email protected]> wrote:
>
> Someone from the Pig team can better answer whether there are any
> implementation issues with average here.
> But assuming there are none, if you can treat nulls as zeros, you could
> add additional checks to the statements to allow it to proceed.
>
> Something to check for :
> a) If A == null, generate 0.
> b) If A.v == null, generate 0. (This is a strong possibility too).
>
>
> Regards,
> Mridul
>
> On Tuesday 09 February 2010 04:08 PM, Alex Parvulescu wrote:
>>
>> hello Mridul,
>>
>> and thanks for the quick answer!
>>
>> A itself is not null, just some group by values. I can't drop the nulls
>> because I also need a count in the group by, even if it's only null
>> values.
>>
>> I just wondered if there's anything to be done about the NPE to make it
>> clearer, that's all.
>>
>> I guess you can see this as an eventual feature / improvement of some
>> sort, no problems :)
>>
>> alex
>>
>> On Tue, Feb 9, 2010 at 11:35 AM, Mridul Muralidharan
>> <[email protected]> wrote:
>>
>>
>>    On second thought, probably A itself is NULL - in which case you
>>    will need a null check on A, and not on A.v (which, I think, is
>>    handled iirc).
>>
>>
>>    Regards,
>>    Mridul
>>
>>
>>    On Tuesday 09 February 2010 04:02 PM, Mridul Muralidharan wrote:
>>
>>
>>        Without knowing rest of the script, you could do something like :
>>
>>        C = FOREACH B {
>>            X = FILTER A BY v IS NOT NULL;
>>            GENERATE group, (int)AVG(X.v) as statsavg;
>>        };
>>
>>        I am assuming it is because there are nulls in your bag field.
>>
>>        Regards,
>>        Mridul
>>
>>
>>        On Tuesday 09 February 2010 03:52 PM, Alex Parvulescu wrote:
>>
>>            Hello,
>>
>>            I ran into an NPE today, which seems to be my fault, but I'm
>>            wondering if there is anything that could be done to make the
>>            error clearer.
>>
>>            What I did is:
>>            'C = FOREACH B GENERATE group, (int)AVG(A.v) as statsavg;'
>>            The problem here is that AVG ran into some null values and
>>            returned null, and consequently the cast failed with an NPE.
>>
>>            This is the stack trace:
>>            2010-02-09 11:14:36,444 [Thread-85] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0006
>>            java.lang.NullPointerException
>>                  at org.apache.pig.builtin.IntAvg.getValue(IntAvg.java:282)
>>                  at org.apache.pig.builtin.IntAvg.getValue(IntAvg.java:39)
>>                  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:208)
>>                  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:281)
>>                  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:182)
>>                  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:352)
>>                  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:277)
>>                  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:423)
>>                  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:391)
>>                  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:371)
>>                  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:239)
>>                  at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
>>                  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>>                  at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:215)
>>
>>            Because I'm not very familiar with how this works, I did not
>>            realize that the cast throws the NPE, and not the computation
>>            of the average over the null values in the data set.
>>            I initially thought this was a bug in Pig.
>>
>>            I know the NPE is all on me, but is there anything you can
>>            do to improve the error message?
>>
>>            thanks,
>>            alex
>>
>>
>>
>>
>
>
