[jira] [Commented] (HIVE-9755) Hive built-in "ngram" UDAF fails when a mapper has no matches.

Naveen Gangam (JIRA) Mon, 23 Feb 2015 14:08:49 -0800

    [ https://issues.apache.org/jira/browse/HIVE-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333879#comment-14333879 ]


Naveen Gangam commented on HIVE-9755:
-------------------------------------

When a mapper returns an empty result set, the ngram UDAF has nothing to merge 
from that mapper during the reduce phase, yet merge() still runs the following 
check:
{code}
int n = Integer.parseInt(partialNGrams.get(partialNGrams.size()-1).toString());
if(myagg.n > 0 && myagg.n != n) {
  throw new HiveException(getClass().getSimpleName() + ": mismatch in value for 'n'"
      + ", which usually is caused by a non-constant expression. Found '"+n+"' and '"
      + myagg.n + "'.");
}
{code}
In the code snippet above, the variables "n" and "myagg.n" refer to the same 
value (the n in nGrams): "n" is read from the incoming partial, and "myagg.n" 
is the value held by the aggregation buffer. Each mapper appends this value to 
the end of the partial nGrams list it generates. However, the value is only 
initialized during the map phase, in the iterate() method. If iterate() is 
never called because the mapper's result set is empty, the value is never set 
to the "n" from the query and keeps the Java int default of 0. That 0 then 
fails the check above against the non-zero "n" already held by the buffer.
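
To make the sequence above concrete, here is a minimal sketch of how a mapper that never sees a row ends up emitting a partial whose trailing "n" is 0. It is a simplified model, not the Hive source: the class NGramBufferSketch, its fields, and the helper trailingN() are hypothetical names introduced for illustration, and the real partial carries more elements than shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the UDAF's aggregation buffer.
class NGramBufferSketch {
    int n; // never assigned in a constructor, so Java defaults it to 0
    final List<String> ngrams = new ArrayList<>();

    // Map side: called once per input row; the first call records n.
    void iterate(String ngram, int nFromQuery) {
        if (n == 0) {
            n = nFromQuery;
        }
        ngrams.add(ngram);
    }

    // Map side: serialize the buffer, appending n as the last element.
    List<String> terminatePartial() {
        List<String> partial = new ArrayList<>(ngrams);
        partial.add(Integer.toString(n));
        return partial;
    }

    // Reads n back from the last element, as the quoted merge() snippet does.
    static int trailingN(List<String> partial) {
        return Integer.parseInt(partial.get(partial.size() - 1));
    }

    public static void main(String[] args) {
        NGramBufferSketch emptyMapper = new NGramBufferSketch();
        // iterate() is never called, so the partial holds only the default n.
        System.out.println(emptyMapper.terminatePartial());
    }
}
```

In this model the partial from an empty mapper is non-null and non-empty, which is why a null check alone does not catch it.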

The merge() method currently checks only for null partial objects:
{code}
    public void merge(AggregationBuffer agg, Object partial) throws HiveException {
      if(partial == null) {
        return;
      }
{code}

Given the design, there is always at least one element in this partial buffer 
(the "n" value), so it may never be null. merge() should instead be a no-op 
when the value of "n" is zero.
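
A minimal sketch of the proposed guard, assuming the partial is a list of strings whose last element is "n". This is not the patch itself: GuardedMergeSketch, its fields, and the use of IllegalStateException in place of HiveException are stand-ins chosen so the example compiles without Hive on the classpath.

```java
import java.util.List;

// Hypothetical stand-in for the reduce-side aggregation buffer and merge().
class GuardedMergeSketch {
    int n;              // stays 0 until a usable partial has been merged
    int mergedPartials; // counts partials that were actually merged

    void merge(List<String> partial) {
        if (partial == null || partial.isEmpty()) {
            return;
        }
        int partialN = Integer.parseInt(partial.get(partial.size() - 1));
        if (partialN == 0) {
            // The mapper never called iterate(); there is nothing to merge.
            return;
        }
        if (n > 0 && n != partialN) {
            throw new IllegalStateException("mismatch in value for 'n'. Found '"
                + partialN + "' and '" + n + "'.");
        }
        n = partialN;
        mergedPartials++;
    }
}
```

With this guard, a partial such as ["0"] from an empty mapper is skipped, while two partials carrying different non-zero "n" values still raise the mismatch error.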

I will upload a patch shortly.
 

> Hive built-in "ngram" UDAF fails when a mapper has no matches.
> --------------------------------------------------------------
>
>                 Key: HIVE-9755
>                 URL: https://issues.apache.org/jira/browse/HIVE-9755
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.14.0
>            Reporter: Naveen Gangam
>            Assignee: Naveen Gangam
>            Priority: Critical
>
> hive> describe ngramtest;
> OK
> col1                  int                                         
> col3                  string                                      
> Time taken: 0.192 seconds, Fetched: 2 row(s)
> SELECT explode(ngrams(sentences(lower(t.col3)), 3, 10)) as x FROM (SELECT col3  FROM ngramtest WHERE col1=0) t;
> When any result has a null value, the query returns this error:
> 2015-01-08 09:15:00,262 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":["0","0","0","0"]},"alias":0}
> at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:258)
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: GenericUDAFnGramEvaluator: mismatch in value for 'n', which usually is caused by a non-constant expression. Found '0' and '1'.
> at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFnGrams$GenericUDAFnGramEvaluator.merge(GenericUDAFnGrams.java:242)
> at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:142)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:658)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:911)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:753)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:819)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
> at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:249)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
