Someone from the Pig team can answer better if there are any implementation issues here with AVG. But assuming there are none, if you can treat nulls as zeros, you could add extra checks to the statement so that it can proceed.

Things to check for:
a) If A == null, generate 0.
b) If A.v == null, generate 0. (This is a strong possibility too.)
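To illustrate checks (a) and (b) outside Pig, here is a minimal plain-Java sketch. It is not Pig's IntAvg code. `NullSafeAvg`, `avg` and `avgOrZero` are hypothetical names. The sketch assumes that AVG skips null values and yields null when no non-null value is left, and that a null result should become 0 rather than being converted to an int.

```java
import java.util.Arrays;
import java.util.List;

public class NullSafeAvg {
    // Hypothetical AVG-like helper (assumption: nulls are skipped).
    // Returns null for a null bag or when every value is null.
    static Double avg(List<Integer> values) {
        if (values == null) return null;           // check (a): the bag itself is null
        long sum = 0;
        long count = 0;
        for (Integer v : values) {
            if (v == null) continue;               // skip null values
            sum += v;
            count++;
        }
        return count == 0 ? null : (double) sum / count;
    }

    // Check (b): if the average is null, generate 0 instead of
    // unboxing it, since unboxing a null Double throws an NPE.
    static int avgOrZero(List<Integer> values) {
        Double a = avg(values);
        return a == null ? 0 : a.intValue();       // intValue() truncates toward zero
    }

    public static void main(String[] args) {
        System.out.println(avgOrZero(Arrays.asList(1, null, 4)));     // 2 (2.5 truncated)
        System.out.println(avgOrZero(Arrays.asList((Integer) null))); // 0
        System.out.println(avgOrZero(null));                          // 0
    }
}
```

Whether 0 is an acceptable stand-in for "no average" depends on the data. It hides the difference between an average of 0 and a group with no values.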


Regards,
Mridul

On Tuesday 09 February 2010 04:08 PM, Alex Parvulescu wrote:
hello Mridul,

and thanks for the quick answer!

A itself is not null, only some of the values within a group. I can't drop
the nulls because I also need a count per group, even when a group contains
only null values.

I just wondered if there's anything to be done about the NPE to make it
clearer, that's all.

I guess you can see this as an eventual feature / improvement of some
sort, no problems :)

alex

On Tue, Feb 9, 2010 at 11:35 AM, Mridul Muralidharan wrote:


    On second thought, A itself is probably null, in which case you
    will need a null check on A rather than on A.v (which, IIRC, is
    already handled).


    Regards,
    Mridul


    On Tuesday 09 February 2010 04:02 PM, Mridul Muralidharan wrote:


        Without knowing the rest of the script, you could do something like:

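        C = FOREACH B {
            -- keep only the tuples whose v is non-null
            X = FILTER A BY v IS NOT NULL;
            -- AVG needs the single field X.v, not the whole bag X
            GENERATE group, (int)AVG(X.v) AS statsavg;
        };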

        I am assuming this is because there are nulls in your bag field.

        Regards,
        Mridul


        On Tuesday 09 February 2010 03:52 PM, Alex Parvulescu wrote:

            Hello,

            I ran into an NPE today, which seems to be my fault, but I'm
            wondering if there is anything that could be done to make the
            error clearer.

            What I did is:
            'C = FOREACH B GENERATE group, (int)AVG(A.v) as statsavg;'
            The problem here is that AVG ran into some null values and
            returned null, and consequently the cast failed with an NPE.
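Assume the failure is what is described above: a null average converted to a primitive int. The plain-Java sketch below shows why that surfaces as a bare NullPointerException at the conversion step. It is not Pig's POCast code, and `UnboxNpe`, `toInt` and `throwsNpe` are hypothetical names.

```java
public class UnboxNpe {
    // Converting a boxed Double to a primitive int first unboxes it;
    // if the reference is null, the unboxing throws NullPointerException.
    static int toInt(Double avg) {
        return (int) (double) avg;
    }

    // Reports whether converting the given value throws an NPE.
    static boolean throwsNpe(Double avg) {
        try {
            toInt(avg);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(toInt(2.5));       // 2 (narrowing truncates)
        System.out.println(throwsNpe(null));  // true
    }
}
```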

            This is the stack trace:
            2010-02-09 11:14:36,444 [Thread-85] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0006
            java.lang.NullPointerException
                at org.apache.pig.builtin.IntAvg.getValue(IntAvg.java:282)
                at org.apache.pig.builtin.IntAvg.getValue(IntAvg.java:39)
                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:208)
                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:281)
                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:182)
                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:352)
                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:277)
                at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:423)
                at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:391)
                at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:371)
                at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:239)
                at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
                at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
                at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:215)

            Now, because I'm not very familiar with how this works, I did
            not realize that it is the cast that throws the NPE, not the
            computation of the average over the null values in the data set.
            I initially thought this was a bug in Pig.

            I know the NPE is all on me, but is there anything you can
            do to improve the error message?

            thanks,
            alex
