[jira] [Comment Edited] (HIVE-10179) Optimization for SIMD instructions in Hive

liyunzhang (JIRA) Mon, 13 Nov 2017 23:29:40 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-10179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16251015#comment-16251015
 ]


liyunzhang edited comment on HIVE-10179 at 11/14/17 7:28 AM:
-------------------------------------------------------------

[~teddy.choi]: I want to ask a question about 
[DoubleColAddRepeatingDoubleColumnBench|https://github.com/apache/hive/blob/master/itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizedArithmeticBench.java#L53].
Why do we need to test {{DoubleColAddRepeatingDoubleColumnBench}}? As I understand it, 
this test computes col1+col2 where all elements of col2 are the same. Is there any 
difference between {{DoubleColAddRepeatingDoubleColumnBench}} and 
{{DoubleColAddDoubleColumnBench}} in terms of SIMD instructions?
I added the following code to VectorizedArithmeticBench.java:
{code}
  public static class DoubleColAddDoubleColumnBench extends AbstractExpression {
    @Override
    public void setup() {
      rowBatch = buildRowBatch(new DoubleColumnVector(), 2, getDoubleColumnVector(),
          getDoubleColumnVector());
      expression = new DoubleColAddDoubleColumn(0, 1, 2);
    }
  }
{code}

After testing {{DoubleColAddDoubleColumnBench}} and 
{{DoubleColAddRepeatingDoubleColumnBench}}, I found:
|| Benchmark || AVX1 || AVX2 || Perf improvement ||
| DoubleColAddDoubleColumnBench | 150709 | 159073 | 5% |
| DoubleColAddRepeatingDoubleColumnBench | 111093 | 95520 | 14% |
 
It is interesting that there is a large improvement on 
{{DoubleColAddRepeatingDoubleColumnBench}} but no obvious improvement on 
{{DoubleColAddDoubleColumnBench}}.
I guess the goal of adding {{DoubleColAddRepeatingDoubleColumnBench}} is to test 
whether SIMD instructions bring any benefit when a constant value is added to 
a vector. Is my understanding right?
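
For illustration, the two cases being compared can be sketched in plain Java as below. This is a simplified stand-in, not Hive's {{DoubleColAddDoubleColumn}} code: the class and method names ({{VectorAddSketch}}, {{addColumns}}, {{addRepeating}}) are hypothetical, and it only assumes that a repeating column can be treated as a single scalar operand.

```java
// Hypothetical sketch; names are illustrative and not taken from Hive.
final class VectorAddSketch {

    // General case: both inputs vary per row, so every iteration
    // loads one element from each input array.
    static void addColumns(double[] left, double[] right, double[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = left[i] + right[i];
        }
    }

    // Repeating case: every row of the right column holds the same value,
    // so it is passed as a scalar and only one array is read in the loop.
    static void addRepeating(double[] left, double rightValue, double[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = left[i] + rightValue;
        }
    }

    public static void main(String[] args) {
        double[] left = {1.0, 2.0, 3.0, 4.0};
        double[] right = {10.0, 20.0, 30.0, 40.0};
        double[] out = new double[left.length];

        addColumns(left, right, out, left.length);
        System.out.println(java.util.Arrays.toString(out));

        addRepeating(left, 0.5, out, left.length);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

Both loops are simple counted loops over arrays; the second reads one array fewer per iteration, which is one possible reason the two benchmarks could respond differently to wider SIMD instructions.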


was (Author: kellyzly):
[~teddy.choi]: I want to ask a question about 
[DoubleColAddRepeatingDoubleColumnBench|https://github.com/apache/hive/blob/master/itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizedArithmeticBench.java#L53].
Why do we need to test {{DoubleColAddRepeatingDoubleColumnBench}}? As I understand it, 
this test computes col1+col2 where all elements of col2 are the same. Is there any 
difference between {{DoubleColAddRepeatingDoubleColumnBench}} and 
{{DoubleColAddDoubleColumnBench}} in terms of SIMD instructions?
I added the following code to VectorizedArithmeticBench.java:
{code}
  public static class DoubleColAddDoubleColumnBench extends AbstractExpression {
    @Override
    public void setup() {
      rowBatch = buildRowBatch(new DoubleColumnVector(), 2, getDoubleColumnVector(),
          getDoubleColumnVector());
      expression = new DoubleColAddDoubleColumn(0, 1, 2);
    }
  }
{code}

After testing {{DoubleColAddDoubleColumnBench}} and 
{{DoubleColAddRepeatingDoubleColumnBench}}, I found:
|| Benchmark || AVX1 || AVX2 || Perf improvement ||
| DoubleColAddDoubleColumnBench | 159588 | 158131 | 0.9% |
| DoubleColAddRepeatingDoubleColumnBench | 111093 | 95520 | 14% |
 
It is interesting that there is a large improvement on 
{{DoubleColAddRepeatingDoubleColumnBench}} but no obvious improvement on 
{{DoubleColAddDoubleColumnBench}}.
I guess the goal of adding {{DoubleColAddRepeatingDoubleColumnBench}} is to test 
whether SIMD instructions bring any benefit when a constant value is added to 
a vector. Is my understanding right?

> Optimization for SIMD instructions in Hive
> ------------------------------------------
>
>                 Key: HIVE-10179
>                 URL: https://issues.apache.org/jira/browse/HIVE-10179
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>              Labels: optimization
>
> [SIMD|http://en.wikipedia.org/wiki/SIMD] instructions can be found in most 
> current CPUs, such as Intel's SSE2, SSE3, SSE4.x, AVX and AVX2, and Hive 
> would perform better if we could vectorize its mathematical 
> manipulation part. This umbrella JIRA may contain, but is not limited to, 
> subtasks like:
> # Code schema adaptation: the current JVM is quite strict about the code schema 
> that can be transformed into SIMD instructions during execution. 
> # A new implementation of the mathematical manipulation part of Hive, designed 
> to be optimized for SIMD instructions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
