First, when troubleshooting (and in general), I prefer to break steps out
onto multiple lines instead of packing everything into one expression. Pig
scripts are rarely so large that the extra lines hurt, and one step per line
helps a lot when debugging, but this is of course personal style.
I created a file, thing.txt, with the following contents:
1,1
1,2
1,3
1,4
,
,
1,
2,
,3
4,
6,6
4,1
2,3
8,
9
9
So there are some rows where both fields are empty, some rows with only the
first or only the second field, and so on. Here is the script I ran. Caveat:
I'm running Pig trunk.
register /home/jcoveney/pig/build/ivy/lib/Pig/antlr-runtime-3.2.jar;
register /home/jcoveney/pig/contrib/piggybank/java/piggybank.jar;
A = LOAD 'thing.txt' USING PigStorage(',') AS (rank1,rank2);
B = FILTER A BY rank1 is not null OR rank2 is not null;
C = FOREACH B GENERATE
    ( rank1 is null ? rank2 : rank1 ) as rank1,
    ( rank2 is null ? rank1 : rank2 ) as rank2;
D = FOREACH C GENERATE org.apache.pig.piggybank.evaluation.math.MAX(rank1,rank2);
This worked fine.
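
To make the null handling concrete, here is a small Python sketch that mimics
what relations A through D do to thing.txt. It is an emulation, not Pig: the
function names load and run are made up for illustration, and it assumes that
PigStorage(',') turns an empty or missing field into null and that the values
end up being compared as doubles.

```python
# Emulation of the Pig steps above in plain Python (illustration only, not Pig).
# Assumptions: an empty or missing field loads as null (None here), and the
# ranks are compared as doubles (float here).

LINES = ["1,1", "1,2", "1,3", "1,4", ",", ",", "1,", "2,", ",3", "4,",
         "6,6", "4,1", "2,3", "8,", "9", "9"]


def load(line):
    """A: split on ',' and pad to two fields; empty fields become None."""
    parts = line.split(",")
    parts += [""] * (2 - len(parts))
    return tuple(float(p) if p else None for p in parts[:2])


def run(lines):
    a = [load(line) for line in lines]
    # B: drop rows where both fields are null
    b = [(r1, r2) for r1, r2 in a if r1 is not None or r2 is not None]
    # C: fill a missing field with the other one
    c = [(r2 if r1 is None else r1, r1 if r2 is None else r2) for r1, r2 in b]
    # D: by this point MAX never receives a null argument
    return [max(r1, r2) for r1, r2 in c]


result = run(LINES)
print(result)
```

The two "," rows are dropped by the filter, and every remaining row reaches
the MAX step with both fields populated, so the null case never comes up.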
2011/6/16 Lakshminarayana Motamarri <[email protected]>
>
> Hi all,
>
> Thanks Jonathan and Daniel for prompt responses..
>
> Based on your suggestions, I tried the following...
>
> Code:
>
> REGISTER
> /home/training/Desktop/1pig/pig-0.7.0/contrib/piggybank/piggybank.jar
>
> // all 3 combinations of A, are followed by four combinations of B:
> A = LOAD 'FF23_Filtered1.txt' AS (appID: float, rankW2: float,
> rankW3: float);
> A = LOAD 'FF23_Filtered1.txt' AS (appID: int, rankW2: int, rankW3:
> int);
> A = LOAD 'FF23_Filtered1.txt' AS (appID, rankW2, rankW3);
>
> B = FOREACH A GENERATE appID,
> org.apache.pig.piggybank.evaluation.math.MAX((double)rankW2,
> (double)rankW3);
> store B into 'FF23_FJM.txt'; //received null pointer exception.
>
> B = FOREACH A GENERATE appID,
> org.apache.pig.piggybank.evaluation.math.MAX(((double)rankW2 is null ?
> (double)rankW3 : (double)rankW2), ((double)rankW3 is null ? (double)rankW2 :
> (double)rankW3));
> store B into 'FF23_FJM.txt'; //received null pointer exception.
>
> B = FOREACH A GENERATE appID,
> org.apache.pig.piggybank.evaluation.math.MAX(((double)rankW2 is null ?
> (double)rankW3 : (double)rankW2) AS (double)rankW2, ((double)rankW3 is null
> ? (double)rankW2 : (double)rankW3) AS (double)rankW3));
> //received invalid alias error
>
> B = FOREACH A GENERATE appID,
> org.apache.pig.piggybank.evaluation.math.MAX((rankW2 is null ? rankW3 :
> rankW2) AS (double)rankW2, (rankW3 is null ? rankW2 : rankW3) AS
> (double)rankW3));
> //invalid alias
>
> -> As mentioned above, in all 12 combinations of the trials, I got the
> corresponding exceptions, as noted next to each B... Please advise if I
> missed something...
>
> The details of both exceptions are:
> 1) org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught
> error from UDF: org.apache.pig.piggybank.evaluation.math.DoubleMax [Caught
> exception processing input row [null]]
>
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:263)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:269)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.io.IOException: Caught exception processing input row
> [null]
> at
> org.apache.pig.piggybank.evaluation.math.DoubleMax.exec(DoubleMax.java:70)
> at
> org.apache.pig.piggybank.evaluation.math.DoubleMax.exec(DoubleMax.java:57)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
> ... 10 more
> Caused by: java.lang.NullPointerException
> ... 13 more
>
> 2)---
> ERROR 1000: Error during parsing. Invalid alias: org in {appID:
> float,rankW2: float,rankW3: float}
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error
> during parsing. Invalid alias: org in {appID: float,rankW2: float,rankW3:
> float}
> at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1037)
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:981)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:383)
> at
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:717)
> at
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:273)
> at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
> at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
> at org.apache.pig.Main.main(Main.java:363)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid
> alias: org in {appID: float,rankW2: float,rankW3: float}
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:6731)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:6575)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:4682)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:4579)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:4525)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:4434)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:4360)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:4326)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:4252)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:4175)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:4119)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:3528)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:2938)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1314)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:893)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:682)
> at
> org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
> at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1031)
> ... 8 more
>
> ---
> Thanks & Regards,
> Narayan.
>
>
> On Thu, Jun 16, 2011 at 11:30 AM, Daniel Dai <[email protected]> wrote:
>
>> Jonathan is right. math.MAX does not handle null input. Check for null
>> before feeding into MAX is necessary.
>>
>> Daniel
>
>