[jira] [Commented] (HIVE-20873) Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
[ https://issues.apache.org/jira/browse/HIVE-20873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700254#comment-16700254 ]

Teddy Choi commented on HIVE-20873:
---

Pushed to master. Thanks, [~bslim] and [~gopalv]!

> Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
>
> Key: HIVE-20873
> URL: https://issues.apache.org/jira/browse/HIVE-20873
> Project: Hive
> Issue Type: Improvement
> Reporter: Teddy Choi
> Assignee: Teddy Choi
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-20873.1.patch, HIVE-20873.2.patch, HIVE-20873.3.patch
>
> VectorHashKeyWrapperTwoLong is implemented with few bit shift operators and
> XOR operators for short computation time, but more hash collision. Group by
> operations become very slow on large data sets. It needs Murmur hash or a
> better hash function for less hash collision.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20873) Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
[ https://issues.apache.org/jira/browse/HIVE-20873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698475#comment-16698475 ]

Gopal V commented on HIVE-20873:

[~teddy.choi]: this is good to go into Apache - has been tested and found to be good.
[jira] [Commented] (HIVE-20873) Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
[ https://issues.apache.org/jira/browse/HIVE-20873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680922#comment-16680922 ]

Hive QA commented on HIVE-20873:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12947335/HIVE-20873.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15531 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamptz_2] (batchId=85)
org.apache.hive.jdbc.TestJdbcDriver2.testSelectExecAsync2 (batchId=259)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/14825/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14825/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14825/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12947335 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20873) Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
[ https://issues.apache.org/jira/browse/HIVE-20873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680894#comment-16680894 ]

Hive QA commented on HIVE-20873:

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 55s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 20s{color} | {color:blue} storage-api in master has 48 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 31s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 51s{color} | {color:blue} ql in master has 2315 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 42s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-14825/dev-support/hive-personality.sh |
| git revision | master / 5aac805 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| modules | C: storage-api common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-14825/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-20873) Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
[ https://issues.apache.org/jira/browse/HIVE-20873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678881#comment-16678881 ]

Hive QA commented on HIVE-20873:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12947198/HIVE-20873.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15528 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[schemeAuthority2] (batchId=192)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/14797/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14797/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14797/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12947198 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20873) Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
[ https://issues.apache.org/jira/browse/HIVE-20873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680050#comment-16680050 ]

slim bouguerra commented on HIVE-20873:
---

It is still unclear to me why we are using Murmur. There are a dozen other hash algorithms, including xxHash, that are much faster and have good quality:
https://cyan4973.github.io/xxHash/

Anyway, I will try to benchmark this; I have created a sub-task. FYI, xxHash is widely used by many MPP-style engines.
[jira] [Commented] (HIVE-20873) Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
[ https://issues.apache.org/jira/browse/HIVE-20873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678844#comment-16678844 ]

slim bouguerra commented on HIVE-20873:
---

[~teddy.choi] Thanks. I am not trying by any means to waste your time, but it would be nice if you could share what improvement you see and how you are measuring it, and maybe also investigate whether this will be a regression for other queries. This will help me and others learn from your experiments.
[jira] [Commented] (HIVE-20873) Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
[ https://issues.apache.org/jira/browse/HIVE-20873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678852#comment-16678852 ]

Gopal V commented on HIVE-20873:

[~bslim]: Teddy & I have a UDF for the hash function, which we use to calculate skews. I've merged Teddy's changes into it:
https://github.com/t3rmin4t0r/long-hash-udf

{code}
select long2hash(i_item_sk, 1) & 255, count(1)
from item
group by long2hash(i_item_sk, 1) & 255
order by count(1) desc;

0	65536
2	65536
3	65536
1	65535
5	37857
{code}

So there's a bit-skew in the old hash function: instead of generating 256 unique bit-patterns, it skews the low bits by the 2nd arg to long2hash.

{code}
select long2murmur(i_item_sk, 1) & 255, count(1)
from item
group by long2murmur(i_item_sk, 1) & 255
order by count(1) desc;

170	1274
37	1264
220	1254
110	1253
152	1241
5	1235
56	1232
179	1231
231	1228
168	1228
149	1228
84	1222
...
156	1082
Time taken: 1.727 seconds, Fetched: 256 row(s)
{code}
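The bucket-skew measurement in the comment above can be approximated without a cluster. The following is a minimal sketch in plain Java, not Hive's code: `naiveTwoLongHash` and `mixedTwoLongHash` are hypothetical stand-ins for a shift/XOR hash and a Murmur-style hash, and the key pattern (multiples of 256 paired with a constant 1) is an assumption chosen to make the skew obvious. Only `fmix64` is taken from a known source, the 64-bit finalizer of MurmurHash3.

```java
/** Illustrative only: neither two-long hash below is Hive's actual implementation. */
final class HashSkewSketch {

    /** Hypothetical shift/XOR hash: folds each long to 32 bits and XORs the results. */
    static int naiveTwoLongHash(long a, long b) {
        return (int) (a ^ (a >>> 32)) ^ (int) (b ^ (b >>> 32));
    }

    /** The 64-bit finalizer (fmix64) from MurmurHash3. */
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    /** Hypothetical Murmur-style combination of two longs, folded to 32 bits. */
    static int mixedTwoLongHash(long a, long b) {
        long h = fmix64(a ^ fmix64(b));
        return (int) (h ^ (h >>> 32));
    }

    /** Counts how many of the 256 low-bit buckets receive at least one key. */
    static int occupiedBuckets(boolean useMix, int numKeys) {
        boolean[] seen = new boolean[256];
        int occupied = 0;
        for (long i = 0; i < numKeys; i++) {
            long a = i << 8; // assumed key pattern: multiples of 256
            long b = 1L;     // constant second key, as in the queries above
            int h = useMix ? mixedTwoLongHash(a, b) : naiveTwoLongHash(a, b);
            int bucket = h & 255; // same mask as the "& 255" in the queries
            if (!seen[bucket]) {
                seen[bucket] = true;
                occupied++;
            }
        }
        return occupied;
    }

    public static void main(String[] args) {
        System.out.println("naive buckets used: " + occupiedBuckets(false, 100_000));
        System.out.println("mixed buckets used: " + occupiedBuckets(true, 100_000));
    }
}
```

With this key pattern the shift/XOR variant never changes the low 8 bits, so every key lands in one bucket, while the fmix64-based variant should spread the keys over nearly all 256 buckets. A hash table that indexes by the low bits would see long collision chains in the first case, which is consistent with the slow group-by behaviour described in the issue.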
[jira] [Commented] (HIVE-20873) Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
[ https://issues.apache.org/jira/browse/HIVE-20873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678877#comment-16678877 ]

Gopal V commented on HIVE-20873:

LGTM - +1 tests pending.

TestHashCodeUtil.java needs ASF license.
[jira] [Commented] (HIVE-20873) Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
[ https://issues.apache.org/jira/browse/HIVE-20873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678817#comment-16678817 ]

Hive QA commented on HIVE-20873:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 33s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 48s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 30s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 37s{color} | {color:blue} ql in master has 2315 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s{color} | {color:red} common: The patch generated 4 new + 6 unchanged - 0 fixed = 10 total (was 6) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 13s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 28s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-14797/dev-support/hive-personality.sh |
| git revision | master / 6d713b6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-14797/yetus/diff-checkstyle-common.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-14797/yetus/patch-asflicense-problems.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-14797/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-20873) Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
[ https://issues.apache.org/jira/browse/HIVE-20873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677804#comment-16677804 ]

Teddy Choi commented on HIVE-20873:
---

In my case, TPC-H query 21 and TPC-DS query 16 seem to be related to it. TPC-H query 21 uses a map join, and TPC-DS query 16 uses a group by. Both of them use VectorHashKeyWrapperBatch, which uses VectorHashKeyWrapperSingleLong, which uses HashCodeUtil.calculateLongHashCode.

There are other hash algorithms, but Murmur3 is already used in Hadoop and Hive; see org.apache.hive.common.util.Murmur3 and org.apache.hadoop.util.hash.MurmurHash. So I think it would be safe to use Murmur3 instead of benchmarking other hash algorithms.
[jira] [Commented] (HIVE-20873) Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
[ https://issues.apache.org/jira/browse/HIVE-20873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677129#comment-16677129 ]

slim bouguerra commented on HIVE-20873:
---

[~teddy.choi] I am wondering whether you got a chance to run any benchmarks to see if this actually helps. Also, did you consider other hashing algorithms that are less expensive than this one? Thanks.
[jira] [Commented] (HIVE-20873) Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
[ https://issues.apache.org/jira/browse/HIVE-20873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677097#comment-16677097 ]

ASF GitHub Bot commented on HIVE-20873:
---

GitHub user pudidic opened a pull request:

    https://github.com/apache/hive/pull/485

    HIVE-20873: Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce…

    … hash collision (Teddy Choi)

    Change-Id: Ie3ae307acb331c48bc5e1cb9c417cd5d1d792f50

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pudidic/hive HIVE-20873

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/485.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #485

commit b658957051c9f75861cd75383f5239a76dfb9f0e
Author: Teddy Choi
Date: 2018-11-06T18:02:26Z

    HIVE-20873: Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision (Teddy Choi)

    Change-Id: Ie3ae307acb331c48bc5e1cb9c417cd5d1d792f50