[ https://issues.apache.org/jira/browse/HIVE-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333879#comment-14333879 ]
Naveen Gangam commented on HIVE-9755:
-------------------------------------

When a mapper returns an empty result set, the ngram UDAF has nothing to merge during the reduce phase, in merge(). The relevant code:
{code}
int n = Integer.parseInt(partialNGrams.get(partialNGrams.size()-1).toString());
if(myagg.n > 0 && myagg.n != n) {
  throw new HiveException(getClass().getSimpleName() + ": mismatch in value for 'n'"
      + ", which usually is caused by a non-constant expression. Found '"+n+"' and '"
      + myagg.n + "'.");
}
{code}
In the code snippet above, the variables "n" and "myagg.n" refer to the same value (the n in nGrams). This value is appended to the end of the partial nGrams list generated by each mapper. However, it is only initialized during the map phase, in the iterate() method call. So if iterate() is never called, which is what happens when the mapper's result set is empty, the value is never set to the "n" from the query and keeps the Java int default of 0.

The merge() method currently checks only for null partial objects:
{code}
public void merge(AggregationBuffer agg, Object partial) throws HiveException {
  if(partial == null) {
    return;
  }
{code}
Given the design, there is always at least one element in this partial buffer (the "n" value), so it may never be null. merge() should therefore be a no-op when the value of "n" is zero. I will upload a patch shortly.

> Hive built-in "ngram" UDAF fails when a mapper has no matches.
> --------------------------------------------------------------
>
>                 Key: HIVE-9755
>                 URL: https://issues.apache.org/jira/browse/HIVE-9755
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.14.0
>            Reporter: Naveen Gangam
>            Assignee: Naveen Gangam
>            Priority: Critical
>
> hive> describe ngramtest;
> OK
> col1 int
> col3 string
> Time taken: 0.192 seconds, Fetched: 2 row(s)
> SELECT explode(ngrams(sentences(lower(t.col3)), 3, 10)) as x FROM (SELECT col3 FROM ngramtest WHERE col1=0) t;
> When any result has a value equal to null, the query returns the error below.
> 2015-01-08 09:15:00,262 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":["0","0","0","0"]},"alias":0}
>     at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:258)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: GenericUDAFnGramEvaluator: mismatch in value for 'n', which usually is caused by a non-constant expression. Found '0' and '1'.
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFnGrams$GenericUDAFnGramEvaluator.merge(GenericUDAFnGrams.java:242)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:142)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:658)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:911)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:753)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:819)
>     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
>     at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:249)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
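
For illustration only, the no-op guard proposed in the comment could be sketched as below. This is not the actual patch: NGramMergeSketch, Buffer, and the List<String> partial are hypothetical stand-ins for the real GenericUDAFnGramEvaluator types, and only the trailing "n" element of the partial is modeled. The all-zero partial is the one shown in the reducer log.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the merge() guard; not Hive code.
class NGramMergeSketch {

    /** Stand-in for the aggregation buffer; only 'n' is modeled. */
    static class Buffer {
        int n = 0;              // stays 0 until a real value of n is seen
        int mergedPartials = 0; // number of partials actually merged
    }

    /**
     * Merges a partial whose last element carries 'n'.
     * A partial with n == 0 comes from a mapper where iterate() never ran,
     * so it is skipped instead of tripping the mismatch check.
     */
    static void merge(Buffer agg, List<String> partial) {
        if (partial == null || partial.isEmpty()) {
            return;
        }
        int n = Integer.parseInt(partial.get(partial.size() - 1));
        if (n == 0) {
            return; // assumed guard: nothing to merge from an empty mapper
        }
        if (agg.n > 0 && agg.n != n) {
            throw new IllegalStateException(
                "mismatch in value for 'n'. Found '" + n + "' and '" + agg.n + "'.");
        }
        agg.n = n;
        agg.mergedPartials++;
    }

    public static void main(String[] args) {
        Buffer agg = new Buffer();
        // "placeholder" stands for the real n-gram payload, which is not modeled here.
        merge(agg, Arrays.asList("placeholder", "3"));
        // Partial emitted by a mapper with no matching rows, as in the log.
        merge(agg, Arrays.asList("0", "0", "0", "0"));
        System.out.println("n=" + agg.n + ", merged=" + agg.mergedPartials);
    }
}
```

With the guard in place the empty partial is ignored whether it arrives before or after a non-empty one, while two non-empty partials with different "n" values still raise the mismatch error.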