[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290910#comment-14290910 ]

Lefty Leverenz commented on HIVE-7616:

Doc note: *hive.hashtable.key.count.adjustment* is documented and *hive.hashtable.initialCapacity* has a description with the correct parameter reference in the wiki. Removing TODOC14 label.

* [Configuration Properties -- hive.hashtable.key.count.adjustment | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.hashtable.key.count.adjustment]
* [Configuration Properties -- hive.hashtable.initialCapacity | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.hashtable.initialCapacity]

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
> Issue Type: Improvement
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Fix For: 0.14.0
>
> Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.05.patch, HIVE-7616.06.patch, HIVE-7616.07.patch, HIVE-7616.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290853#comment-14290853 ]

Lefty Leverenz commented on HIVE-7616:

Doc error: The description of *hive.hashtable.initialCapacity* refers to a parameter that existed in patch 2 ("hive.hashtable.stats.key.estimate.adjustment") but was renamed *hive.hashtable.key.count.adjustment* in patch 3.

{quote}
+HIVEHASHTABLEKEYCOUNTADJUSTMENT("hive.hashtable.key.count.adjustment", 1.0f,
+"Adjustment to mapjoin hashtable size derived from table and column statistics; the estimate" +
+" of the number of keys is divided by this value. If the value is 0, statistics are not used" +
+"and hive.hashtable.initialCapacity is used instead."),
+HIVEHASHTABLETHRESHOLD("hive.hashtable.initialCapacity", 10, "Initial capacity of " +
+"mapjoin hashtable if statistics are absent, or if hive.hashtable.stats.key.estimate.adjustment is set to 0"),
{quote}

Opened HIVE-9457 to fix this.
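The parameter descriptions quoted in the message above say that the key-count estimate is divided by hive.hashtable.key.count.adjustment, and that hive.hashtable.initialCapacity is used when statistics are absent or the adjustment is 0. A minimal sketch of that rule follows. The class and method names are hypothetical, a negative estimate is used here to mean "no statistics", and treating a negative adjustment like 0 is an assumption; this is not Hive's actual code.

```java
// Hypothetical helper illustrating the sizing rule described in the quoted
// parameter descriptions. Not taken from the Hive source tree.
final class MapJoinSizingSketch {

    /**
     * @param estimatedKeyCount key-count estimate from statistics; negative means "absent" (assumption)
     * @param keyCountAdjustment value of hive.hashtable.key.count.adjustment
     * @param configuredInitialCapacity value of hive.hashtable.initialCapacity
     */
    public static int initialCapacity(long estimatedKeyCount,
                                      float keyCountAdjustment,
                                      int configuredInitialCapacity) {
        // Statistics absent, or adjustment set to 0: fall back to the configured capacity.
        // Treating a negative adjustment the same way is an assumption of this sketch.
        if (estimatedKeyCount < 0 || keyCountAdjustment <= 0.0f) {
            return configuredInitialCapacity;
        }
        // "the estimate of the number of keys is divided by this value"
        long adjusted = (long) Math.ceil(estimatedKeyCount / (double) keyCountAdjustment);
        // Clamp to a valid, non-empty int capacity.
        return (int) Math.min(Math.max(adjusted, 1L), Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(initialCapacity(1000L, 2.0f, 10));
        System.out.println(initialCapacity(1000L, 0.0f, 10));
    }
}
```

Under this reading, an adjustment above 1 shrinks the pre-sized table and an adjustment between 0 and 1 enlarges it.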
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093738#comment-14093738 ]

Lefty Leverenz commented on HIVE-7616:

This adds configuration parameter *hive.hashtable.key.count.adjustment* and gives a description for *hive.hashtable.initialCapacity* (introduced in 0.7.0 by HIVE-1642).

So document *hive.hashtable.key.count.adjustment* and *hive.hashtable.initialCapacity* in the wiki before 0.14.0 is released.
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093354#comment-14093354 ]

Sergey Shelukhin commented on HIVE-7616:

ping?
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091695#comment-14091695 ]

Hive QA commented on HIVE-7616:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660754/HIVE-7616.06.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5888 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/243/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/243/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-243/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660754
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091284#comment-14091284 ]

Gunther Hagleitner commented on HIVE-7616:

Removing the TODO in the commit is fine by me. I had one additional question on the reviewboard about how to detect bucketed joins.

For testing: Can you add the "expected key count" to explain extended? That way you can verify that it works correctly through the unit tests.
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091287#comment-14091287 ]

Gunther Hagleitner commented on HIVE-7616:

Other than these 2 things I am +1.
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091127#comment-14091127 ]

Hive QA commented on HIVE-7616:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660507/HIVE-7616.04.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5886 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/231/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/231/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-231/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660507
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090061#comment-14090061 ]

Sergey Shelukhin commented on HIVE-7616:

Will remove the TODO#: comments on the next iteration or at commit.
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088869#comment-14088869 ]

Hive QA commented on HIVE-7616:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660246/HIVE-7616.02.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5883 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/200/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/200/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-200/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660246
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088519#comment-14088519 ]

Gunther Hagleitner commented on HIVE-7616:

I like it. Some comments on rb.
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088308#comment-14088308 ]

Prasanth J commented on HIVE-7616:

The statistics part looks good to me except for a null check mentioned in RB.
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088289#comment-14088289 ]

Sergey Shelukhin commented on HIVE-7616:

https://reviews.apache.org/r/24427/
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088143#comment-14088143 ]

Gunther Hagleitner commented on HIVE-7616:

Can you create a rb entry for that please? Is there one already?
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088041#comment-14088041 ]

Sergey Shelukhin commented on HIVE-7616:

There's a config setting for that
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088045#comment-14088045 ]

Sergey Shelukhin commented on HIVE-7616:

But yeah that would be useful
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087996#comment-14087996 ]

Mostafa Mokhtar commented on HIVE-7616:

This will work for most of the TPC-DS queries, since joins with the dimension tables are always on key columns and there is a PK/FK relationship between the dimension tables and the fact tables. Hence, for most cases, the number of rows for the broadcast table will be equal to the number of keys (one-to-many joins).

In MapJoins where tables don't naturally have a PK/FK relation (many-to-many joins), the number of rows can be significantly higher than the number of keys.

Can you add the following perflogging to track such potential issues:
1) Number of keys in the hash table after load vs. number of keys at init
2) Number of times expandAndRehash was called and the total amount of time spent there

Using these metrics we can track the performance and behavior of the hash table.
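The two metrics requested in the message above can be illustrated with a toy open-addressing table that counts its own resizes. This is only a sketch under stated assumptions: the class `InstrumentedKeyTable`, its fields, and the 0.75 load factor are hypothetical, and only the method name `expandAndRehash` comes from the comment. It is not Hive's hash table and does not use Hive's PerfLogger.

```java
// Toy linear-probing set of long keys that records the metrics suggested above:
// keys expected at init vs. keys after load, and expandAndRehash count and time.
final class InstrumentedKeyTable {
    private static final double LOAD_FACTOR = 0.75; // assumption for this sketch

    private long[] keys;
    private boolean[] occupied;
    private int keyCount;

    private final long expectedKeysAtInit;
    private int expandAndRehashCalls;
    private long expandAndRehashNanos;

    InstrumentedKeyTable(long expectedKeys) {
        this.expectedKeysAtInit = expectedKeys;
        int capacity = 16;
        // Pre-size: smallest power of two that holds expectedKeys under the load factor.
        while (capacity * LOAD_FACTOR < expectedKeys && capacity < (1 << 30)) {
            capacity <<= 1;
        }
        keys = new long[capacity];
        occupied = new boolean[capacity];
    }

    public void put(long key) {
        if (insert(keys, occupied, key)) {
            keyCount++;
            if (keyCount > keys.length * LOAD_FACTOR) {
                expandAndRehash();
            }
        }
    }

    // Returns true if the key was newly added, false if it was already present.
    private static boolean insert(long[] ks, boolean[] occ, long key) {
        int mask = ks.length - 1;
        int slot = Long.hashCode(key) & mask;
        while (occ[slot]) {
            if (ks[slot] == key) {
                return false;
            }
            slot = (slot + 1) & mask;
        }
        occ[slot] = true;
        ks[slot] = key;
        return true;
    }

    private void expandAndRehash() {
        long start = System.nanoTime();
        long[] newKeys = new long[keys.length * 2];
        boolean[] newOccupied = new boolean[newKeys.length];
        for (int i = 0; i < keys.length; i++) {
            if (occupied[i]) {
                insert(newKeys, newOccupied, keys[i]);
            }
        }
        keys = newKeys;
        occupied = newOccupied;
        expandAndRehashCalls++;
        expandAndRehashNanos += System.nanoTime() - start;
    }

    public int keyCount() { return keyCount; }
    public long expectedKeysAtInit() { return expectedKeysAtInit; }
    public int expandAndRehashCalls() { return expandAndRehashCalls; }
    public long expandAndRehashNanos() { return expandAndRehashNanos; }

    public static void main(String[] args) {
        InstrumentedKeyTable table = new InstrumentedKeyTable(0);
        for (long k = 0; k < 1000; k++) {
            table.put(k);
        }
        System.out.println("keys at init=" + table.expectedKeysAtInit()
            + " keys after load=" + table.keyCount()
            + " expandAndRehash calls=" + table.expandAndRehashCalls()
            + " nanos=" + table.expandAndRehashNanos());
    }
}
```

Comparing `keyCount()` against `expectedKeysAtInit()` after load shows how far off the estimate was, and a nonzero `expandAndRehashCalls()` indicates the table was undersized.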
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087779#comment-14087779 ]

Hive QA commented on HIVE-7616:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12659956/HIVE-7616.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5877 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_mixed_case
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/188/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/188/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-188/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12659956
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087106#comment-14087106 ]

Sergey Shelukhin commented on HIVE-7616:

Also, for joins where many rows per key come from the right side, this patch will not work as well (the hashtable will be too big), given the above about the number of keys vs. the number of rows.
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087104#comment-14087104 ]

Sergey Shelukhin commented on HIVE-7616:

Another consideration is that we currently store the full hash to avoid recalculating the missing bits when we expand. If stats can be trusted, we can say we will never expand (if we allocate enough above the number of rows) and avoid that.
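To make the trade-off in the message above concrete, here is a small sketch of a table that keeps each key's full 32-bit hash next to the key, so that expanding only re-derives slot indexes from the stored hashes and never calls `hashCode()` again. Everything here (the class `StoredHashTableSketch`, its layout, the 3/4 load threshold) is a hypothetical illustration, not the layout of Hive's mapjoin hashtable. If the table were guaranteed never to expand, the `hashes` array would be unnecessary for resizing, which is the saving the comment points at.

```java
// Toy linear-probing set that stores the full hash per slot so that growth
// does not need to recompute key hashes. Illustration only.
final class StoredHashTableSketch {
    private Object[] keys;
    private int[] hashes;          // full hash of the key in the same slot
    private int size;
    private int expansions;
    private int hashComputations;  // how many times key.hashCode() was called

    StoredHashTableSketch(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        keys = new Object[capacity];
        hashes = new int[capacity];
    }

    public void put(Object key) {
        int hash = key.hashCode();
        hashComputations++;
        if (place(keys, hashes, key, hash)) {
            size++;
            // Expand once the table is more than 3/4 full (assumption for this sketch).
            if (size * 4 > keys.length * 3) {
                expand();
            }
        }
    }

    // Returns true if the key was newly added, false if it was already present.
    private static boolean place(Object[] ks, int[] hs, Object key, int hash) {
        int mask = ks.length - 1;
        int slot = hash & mask;    // only the low bits pick the slot
        while (ks[slot] != null) {
            if (hs[slot] == hash && ks[slot].equals(key)) {
                return false;
            }
            slot = (slot + 1) & mask;
        }
        ks[slot] = key;
        hs[slot] = hash;
        return true;
    }

    private void expand() {
        Object[] newKeys = new Object[keys.length * 2];
        int[] newHashes = new int[newKeys.length];
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] != null) {
                // The stored full hash supplies the extra bit needed for the larger mask.
                place(newKeys, newHashes, keys[i], hashes[i]);
            }
        }
        keys = newKeys;
        hashes = newHashes;
        expansions++;
    }

    public int size() { return size; }
    public int expansions() { return expansions; }
    public int hashComputations() { return hashComputations; }

    public static void main(String[] args) {
        StoredHashTableSketch table = new StoredHashTableSketch(16);
        for (int i = 0; i < 100; i++) {
            table.put(Integer.valueOf(i));
        }
        System.out.println("size=" + table.size()
            + " expansions=" + table.expansions()
            + " hashComputations=" + table.hashComputations());
    }
}
```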
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086802#comment-14086802 ]

Sergey Shelukhin commented on HIVE-7616:

It would be nice to have a key count estimate in stats too, but I guess the key could be different for each query as far as Hive is concerned. Maybe we can add some notion of keys for typical dimension tables and have stats for that.
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086799#comment-14086799 ]

Sergey Shelukhin commented on HIVE-7616:

[~hagleitn] [~mmokhtar] can you guys take a look?