hello Mridul, and thanks for the quick answer!

A itself is not null, just some of the group-by values. I can't drop the
nulls because I also need a count in the group by, even if a group holds
only null values. I was just wondering if anything can be done about the
NPE to make it clearer, that's all. I guess you can see this as an
eventual feature / improvement of some sort, no problem :)

alex

On Tue, Feb 9, 2010 at 11:35 AM, Mridul Muralidharan <[email protected]> wrote:

> On second thought, probably A itself is NULL - in which case you will need
> a null check on A, and not on A.v (which, I think, is handled iirc).
>
> Regards,
> Mridul
>
> On Tuesday 09 February 2010 04:02 PM, Mridul Muralidharan wrote:
>
>> Without knowing the rest of the script, you could do something like:
>>
>> C = FOREACH B {
>>     X = FILTER A BY v IS NOT NULL;
>>     GENERATE group, (int)AVG(X) as statsavg;
>> };
>>
>> I am assuming it is because there are nulls in your bag field.
>>
>> Regards,
>> Mridul
>>
>> On Tuesday 09 February 2010 03:52 PM, Alex Parvulescu wrote:
>>
>>> Hello,
>>>
>>> I ran into an NPE today, which seems to be my fault, but I'm wondering
>>> if there is anything that could be done to make the error clearer.
>>>
>>> What I did is:
>>> 'C = FOREACH B GENERATE group, (int)AVG(A.v) as statsavg;'
>>> The problem here is that AVG ran into some null values and returned
>>> null, and consequently the cast failed with an NPE.
>>>
>>> This is the stack trace:
>>> 2010-02-09 11:14:36,444 [Thread-85] WARN
>>> org.apache.hadoop.mapred.LocalJobRunner - job_local_0006
>>> java.lang.NullPointerException
>>>     at org.apache.pig.builtin.IntAvg.getValue(IntAvg.java:282)
>>>     at org.apache.pig.builtin.IntAvg.getValue(IntAvg.java:39)
>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:208)
>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:281)
>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:182)
>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:352)
>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:277)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:423)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:391)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:371)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:239)
>>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:215)
>>>
>>> Now, because I'm not well aware of how this works, I did not realize
>>> that the cast throws the NPE and not the computation of the average
>>> function on the null values provided by the data set. I initially
>>> thought this was a bug in Pig.
>>>
>>> I know the NPE is all on me, but is there anything you can do to
>>> improve the error message?
>>>
>>> thanks,
>>> alex
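
For completeness, a minimal sketch of how Mridul's nested FILTER could be
combined with the per-group count Alex needs. The relation names A/B and
field v follow the thread; counting the unfiltered bag with COUNT(A), and
how the cast behaves when a group is all-null, are assumptions to verify
against your Pig version:

    C = FOREACH B {
        -- drop nulls only for the average, not for the count
        X = FILTER A BY v IS NOT NULL;
        GENERATE group,
                 COUNT(A) AS total,       -- count over the unfiltered bag,
                                          -- so null-only groups keep a row
                 (int)AVG(X) AS statsavg; -- may still come back null (and
                                          -- possibly trip the cast NPE) when
                                          -- a group has only null values
    };

If an all-null group still triggers the NPE on the cast, guarding the
expression (e.g. with IsEmpty(X) in a bincond, where the Pig release in
use supports a null branch) would be the next thing to try.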
