[ 
https://issues.apache.org/jira/browse/HIVE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041312#comment-17041312
 ] 

Ramesh Kumar Thangarajan edited comment on HIVE-22903 at 2/20/20 9:39 PM:
--------------------------------------------------------------------------

[~ShubhamChaurasia] I was thinking something like
{code:java}
for (VectorPTFEvaluatorBase evaluator : evaluators) {
  if (!(evaluator instanceof VectorPTFEvaluatorRowNumber && verifyEvaluatorArgumentsAreConstant)) {
    evaluator.resetEvaluator();
  }
}
{code}
We would need to pass the arguments of each evaluator in order to compute 
verifyEvaluatorArgumentsAreConstant.

Looking more into this, the problem doesn't look specific to constants either. 
We reset the evaluators for every batch, so the problem should exist when 
grouping by columns too. We would notice the issue if we actually grouped by a 
column in which one value is repeated more than 1024 times (so that the group 
spans more than one VRB). Thinking more about this, it looks like we are not 
calling resetEvaluators() at the right place in the code. I think we are not 
differentiating between partition groups and row batch groups. We should reset 
only at partition group boundaries, not at row batch boundaries.
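
To make the difference concrete, here is a minimal, self-contained sketch. It is 
not the Hive code: RowNumberEvaluator and rowNumbers() are hypothetical 
stand-ins, rows are reduced to their partition key, and batchSize plays the 
role of the VRB size. It only illustrates resetting per row batch versus 
resetting per partition group.

```java
import java.util.ArrayList;
import java.util.List;

final class RowNumberResetSketch {

  // Hypothetical stand-in for a row_number evaluator (not Hive's class).
  static final class RowNumberEvaluator {
    private int rowNumber = 0;

    void resetEvaluator() {
      rowNumber = 0;
    }

    int next() {
      return ++rowNumber;
    }
  }

  // Assigns a row number to each row, where a row is just its partition key.
  // Rows are assumed to arrive already grouped by key.
  // resetPerBatch = true mimics resetting at every row batch boundary;
  // resetPerBatch = false resets only when the partition key changes.
  static List<Integer> rowNumbers(int[] partitionKeys, int batchSize, boolean resetPerBatch) {
    RowNumberEvaluator evaluator = new RowNumberEvaluator();
    List<Integer> out = new ArrayList<>();
    for (int i = 0; i < partitionKeys.length; i++) {
      boolean batchStart = i % batchSize == 0;
      boolean partitionStart = i == 0 || partitionKeys[i] != partitionKeys[i - 1];
      if (partitionStart || (resetPerBatch && batchStart)) {
        evaluator.resetEvaluator();
      }
      out.add(evaluator.next());
    }
    return out;
  }

  public static void main(String[] args) {
    int[] keys = new int[2500]; // all zeros: one partition spanning three batches
    List<Integer> perBatch = rowNumbers(keys, 1024, true);
    List<Integer> perPartition = rowNumbers(keys, 1024, false);
    System.out.println(perBatch.get(1024));     // prints 1 (numbering restarted)
    System.out.println(perPartition.get(1024)); // prints 1025
  }
}
```

With a single partition of 2500 rows and a batch size of 1024, the per-batch 
variant restarts numbering at rows 1025 and 2049, which matches the symptom 
in the issue; the per-partition variant keeps counting up to 2500.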

 



> Vectorized row_number() resets the row number after one batch in case of 
> constant expression in partition clause
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-22903
>                 URL: https://issues.apache.org/jira/browse/HIVE-22903
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF, Vectorization
>    Affects Versions: 4.0.0
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22903.01.patch, HIVE-22903.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The vectorized row_number() implementation resets the row number after each 
> batch when a constant expression is passed in the partition clause.
> Repro Query
> {code}
> select row_number() over(partition by 1) r1, t from over10k_n8;
> Or
> select row_number() over() r1, t from over10k_n8;
> {code}
> where table over10k_n8 contains more than 1024 records.
> This happens because VectorPTFOperator currently resets the evaluators 
> whenever only a partition clause is present (no ORDER BY).
> {code:java}
>     // If we are only processing a PARTITION BY, reset our evaluators.
>     if (!isPartitionOrderBy) {
>       groupBatches.resetEvaluators();
>     }
> {code}
> To resolve this, we should also check whether the entire partition clause is 
> a constant expression; if it is, we should not call 
> {{groupBatches.resetEvaluators()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
