[ https://issues.apache.org/jira/browse/HIVE-10179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16251015#comment-16251015 ]
liyunzhang edited comment on HIVE-10179 at 11/14/17 7:28 AM:
-------------------------------------------------------------

[~teddy.choi]: I want to ask a question about [DoubleColAddRepeatingDoubleColumnBench|https://github.com/apache/hive/blob/master/itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizedArithmeticBench.java#L53]. Why do we need to test {{DoubleColAddRepeatingDoubleColumnBench}}? As I understand it, this test computes col1 + col2 where all elements of col2 are the same. Is there any difference in SIMD instructions between {{DoubleColAddRepeatingDoubleColumnBench}} and {{DoubleColAddDoubleColumnBench}}?

I added the following code to VectorizedArithmeticBench.java:

{code}
public static class DoubleColAddDoubleColumnBench extends AbstractExpression {
  @Override
  public void setup() {
    // Both inputs are regular (non-repeating) double column vectors.
    rowBatch = buildRowBatch(new DoubleColumnVector(), 2, getDoubleColumnVector(),
        getDoubleColumnVector());
    // Column 2 receives the sum of columns 0 and 1.
    expression = new DoubleColAddDoubleColumn(0, 1, 2);
  }
}
{code}

After running {{DoubleColAddDoubleColumnBench}} and {{DoubleColAddRepeatingDoubleColumnBench}}, I got the following results:

|| ||AVX1||AVX2||perf improvement||
|DoubleColAddDoubleColumnBench|150709|159073|5%|
|DoubleColAddRepeatingDoubleColumnBench|111093|95520|14%|

Interestingly, {{DoubleColAddRepeatingDoubleColumnBench}} shows a large improvement, while {{DoubleColAddDoubleColumnBench}} shows no obvious improvement. I guess the goal of adding {{DoubleColAddRepeatingDoubleColumnBench}} is to test whether SIMD instructions bring any benefit when a constant value is added to every element of a vector. Is my understanding right?

was (Author: kellyzly):
[~teddy.choi]: I want to ask a question about [DoubleColAddRepeatingDoubleColumnBench|https://github.com/apache/hive/blob/master/itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizedArithmeticBench.java#L53]. Why do we need to test {{DoubleColAddRepeatingDoubleColumnBench}}? As I understand it, this test computes col1 + col2 where all elements of col2 are the same.
Is there any difference in SIMD instructions between {{DoubleColAddRepeatingDoubleColumnBench}} and {{DoubleColAddDoubleColumnBench}}?

I added the following code to VectorizedArithmeticBench.java:

{code}
public static class DoubleColAddDoubleColumnBench extends AbstractExpression {
  @Override
  public void setup() {
    // Both inputs are regular (non-repeating) double column vectors.
    rowBatch = buildRowBatch(new DoubleColumnVector(), 2, getDoubleColumnVector(),
        getDoubleColumnVector());
    // Column 2 receives the sum of columns 0 and 1.
    expression = new DoubleColAddDoubleColumn(0, 1, 2);
  }
}
{code}

After running {{DoubleColAddDoubleColumnBench}} and {{DoubleColAddRepeatingDoubleColumnBench}}, I got the following results:

|| ||AVX1||AVX2||perf improvement||
|DoubleColAddDoubleColumnBench|159588|158131|0.9%|
|DoubleColAddRepeatingDoubleColumnBench|111093|95520|14%|

Interestingly, {{DoubleColAddRepeatingDoubleColumnBench}} shows a large improvement, while {{DoubleColAddDoubleColumnBench}} shows no obvious improvement. I guess the goal of adding {{DoubleColAddRepeatingDoubleColumnBench}} is to test whether SIMD instructions bring any benefit when a constant value is added to every element of a vector. Is my understanding right?

> Optimization for SIMD instructions in Hive
> ------------------------------------------
>
>                 Key: HIVE-10179
>                 URL: https://issues.apache.org/jira/browse/HIVE-10179
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>              Labels: optimization
>
> [SIMD|http://en.wikipedia.org/wiki/SIMD] instructions can be found in most
> current CPUs, such as Intel's SSE2, SSE3, SSE4.x, AVX and AVX2, and it
> would help Hive perform better if we could vectorize the mathematical
> manipulation part of Hive. This umbrella JIRA may contain, but is not limited to,
> subtasks like:
> # Code schema adaptation: the current JVM is quite strict about which code
> patterns can be transformed into SIMD instructions during execution.
> # New implementation of the mathematical manipulation part of Hive, designed
> to be optimized for SIMD instructions.
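The comment's question about repeating versus non-repeating inputs comes down to the shape of the inner loop. Below is a minimal Java sketch, not Hive's actual generated {{DoubleColAddDoubleColumn}} code; the class {{ColAddSketch}} and its methods are hypothetical. It assumes, as Hive's {{ColumnVector.isRepeating}} flag allows, that a repeating column holds its single value at index 0, so column + repeating column can be evaluated as a column + scalar loop that reads one input array instead of two. Whether the JIT actually emits AVX/AVX2 instructions for either loop depends on the JVM and CPU.

```java
import java.util.Arrays;

// Hypothetical sketch of the two loop shapes compared by the benchmarks.
// Not Hive's generated code; names are illustrative only.
class ColAddSketch {

  // Column + column: each iteration loads from two input arrays.
  static void addColCol(double[] col1, double[] col2, double[] out, int n) {
    for (int i = 0; i < n; i++) {
      out[i] = col1[i] + col2[i];
    }
  }

  // Column + scalar: each iteration loads from one input array;
  // the scalar is read once before the loop.
  static void addColScalar(double[] col1, double scalar, double[] out, int n) {
    for (int i = 0; i < n; i++) {
      out[i] = col1[i] + scalar;
    }
  }

  // Assumption: a repeating column stores its only meaningful value at index 0,
  // so the repeating check is done once per batch, outside the inner loop.
  static void add(double[] col1, double[] col2, boolean col2IsRepeating,
                  double[] out, int n) {
    if (col2IsRepeating) {
      addColScalar(col1, col2[0], out, n);
    } else {
      addColCol(col1, col2, out, n);
    }
  }

  public static void main(String[] args) {
    double[] col1 = {1.0, 2.0, 3.0, 4.0};
    double[] col2 = {10.0, 20.0, 30.0, 40.0};
    double[] repeating = {5.0}; // only index 0 is meaningful
    double[] out = new double[4];

    add(col1, col2, false, out, 4);
    System.out.println(Arrays.toString(out)); // prints [11.0, 22.0, 33.0, 44.0]

    add(col1, repeating, true, out, 4);
    System.out.println(Arrays.toString(out)); // prints [6.0, 7.0, 8.0, 9.0]
  }
}
```

Checking the repeating flag once per batch keeps both inner loops as simple, branch-free counted loops, which is generally the form a JIT auto-vectorizer can turn into SIMD code.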
-- This message was sent by Atlassian JIRA (v6.4.14#64029)