[ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095796#comment-16095796
 ] 

Ke Jia edited comment on HIVE-17139 at 7/27/17 3:41 AM:
--------------------------------------------------------

With this patch, I tested "select case when a=1 then trim(b) end from 
test_orc_5000" on my development machine. The table test_orc_5000(a int, b 
string) is stored as ORC and holds almost 50 million records. The execution 
engine is Spark. I ran the experiment three times; the averages are shown in 
the table below. The Spark execution time drops from 35.76s to 32.57s, the 
time spent in VectorSelectOperator drops from 3.12s to 0.89s, and the number 
of THEN expression evaluations drops from 49999735 to 5000712.

||  ||Without optimization||With optimization||Improvement||
|HoS execution time|35.76s|32.57s|8.9%|
|VectorSelectOperator time|3.12s|0.89s|71.5%|
|THEN evaluation count|49999735|5000712|90.0%|
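
The Improvement column can be rechecked as the relative reduction, (before - after) / before. The snippet below is only a small sketch of that arithmetic; the class and method names are made up for illustration and are not part of the patch.

```java
import java.util.Locale;

public class ImprovementSketch {
    /** Relative reduction from 'before' to 'after', in percent. */
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Averages reported in the comment above.
        double[][] rows = {
            {35.76, 32.57},        // Spark execution time (s)
            {3.12, 0.89},          // VectorSelectOperator time (s)
            {49999735, 5000712},   // THEN expression evaluations
        };
        for (double[] r : rows) {
            System.out.println(
                String.format(Locale.ROOT, "%.1f%%", reductionPercent(r[0], r[1])));
        }
        // prints 8.9%, 71.5%, 90.0% (one per line)
    }
}
```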


                                                
                        

                        





> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17139
>                 URL: https://issues.apache.org/jira/browse/HIVE-17139
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ke Jia
>            Assignee: Ke Jia
>         Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, 
> HIVE-17139.3.patch
>
>
> The execution of CASE WHEN and IF statements in Hive vectorization is not 
> optimal: the current implementation evaluates all the conditional and else 
> expressions for every row. The optimized approach is to update the selected 
> array of the batch parameter after the conditional expression is executed, 
> so that the else expression is evaluated only for the selected rows instead 
> of all rows.
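
To make the idea in the description concrete, here is a minimal, self-contained sketch of narrowing a selection array before evaluating a branch expression. It does not use Hive's real classes: `Batch`, `caseWhen`, and `thenEvaluations` are hypothetical names chosen for illustration, and the branch shown is the THEN branch of the query from the benchmark (case when a=1 then trim(b) end), assuming the same narrowing applies to it.

```java
import java.util.Arrays;

public class ConditionalSelectSketch {
    /** Hypothetical stand-in for a vectorized row batch (not Hive's class). */
    static final class Batch {
        final int[] a;      // column a (int)
        final String[] b;   // column b (string)
        int[] selected;     // indices of the rows still to be processed
        int size;           // number of valid entries in 'selected'

        Batch(int[] a, String[] b) {
            this.a = a;
            this.b = b;
            this.size = a.length;
            this.selected = new int[size];
            for (int i = 0; i < size; i++) {
                selected[i] = i;
            }
        }
    }

    /** Counts how many times the THEN expression was evaluated. */
    static int thenEvaluations = 0;

    /** Evaluates CASE WHEN a = 1 THEN trim(b) END over the batch. */
    static String[] caseWhen(Batch batch) {
        String[] out = new String[batch.a.length]; // null stands for SQL NULL

        // Step 1: evaluate the condition and collect the rows that satisfy it.
        int[] narrowed = new int[batch.size];
        int n = 0;
        for (int j = 0; j < batch.size; j++) {
            int row = batch.selected[j];
            if (batch.a[row] == 1) {
                narrowed[n++] = row;
            }
        }

        // Step 2: swap in the narrowed selection, remembering the original.
        int[] savedSelected = batch.selected;
        int savedSize = batch.size;
        batch.selected = narrowed;
        batch.size = n;

        // Step 3: evaluate the branch expression only for the selected rows.
        for (int j = 0; j < batch.size; j++) {
            int row = batch.selected[j];
            out[row] = batch.b[row].trim();
            thenEvaluations++;
        }

        // Step 4: restore the original selection for downstream operators.
        batch.selected = savedSelected;
        batch.size = savedSize;
        return out;
    }

    public static void main(String[] args) {
        Batch batch = new Batch(new int[] {1, 2, 1, 3},
                                new String[] {" x ", " y ", "z ", " w"});
        System.out.println(Arrays.toString(caseWhen(batch))); // [x, null, z, null]
        System.out.println(thenEvaluations);                  // 2
    }
}
```

In this toy batch only two of the four rows satisfy the condition, so trim is called twice rather than four times. The count row in the benchmark table shows the same kind of reduction.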



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
