[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2015-01-24 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290910#comment-14290910
 ] 

Lefty Leverenz commented on HIVE-7616:
--

Doc note:  *hive.hashtable.key.count.adjustment* is documented and 
*hive.hashtable.initialCapacity* has a description with the correct parameter 
reference in the wiki.  Removing TODOC14 label.

* [Configuration Properties -- hive.hashtable.key.count.adjustment | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.hashtable.key.count.adjustment]
* [Configuration Properties -- hive.hashtable.initialCapacity | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.hashtable.initialCapacity]

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.14.0
>
> Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, 
> HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.05.patch, 
> HIVE-7616.06.patch, HIVE-7616.07.patch, HIVE-7616.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2015-01-24 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290853#comment-14290853
 ] 

Lefty Leverenz commented on HIVE-7616:
--

Doc error:  The description of *hive.hashtable.initialCapacity* refers to a 
parameter that existed in patch 2 
("hive.hashtable.stats.key.estimate.adjustment") but was renamed 
*hive.hashtable.key.count.adjustment* in patch 3.

{quote}
+HIVEHASHTABLEKEYCOUNTADJUSTMENT("hive.hashtable.key.count.adjustment", 1.0f,
+"Adjustment to mapjoin hashtable size derived from table and column statistics; the estimate" +
+" of the number of keys is divided by this value. If the value is 0, statistics are not used" +
+"and hive.hashtable.initialCapacity is used instead."),
+HIVEHASHTABLETHRESHOLD("hive.hashtable.initialCapacity", 10, "Initial capacity of " +
+"mapjoin hashtable if statistics are absent, or if hive.hashtable.stats.key.estimate.adjustment is set to 0"),
{quote}

Opened HIVE-9457 to fix this.
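
For readers following the two parameter descriptions quoted above, the rule they describe can be sketched roughly as below. This is only an illustration of the documented behavior, not the code from the patch: the class name {{MapJoinSizing}}, the method {{chooseCapacity}}, and the convention of passing a negative estimate when statistics are absent are all assumptions made for this example.

```java
// Hypothetical sketch of the sizing rule described in the parameter text;
// it is not the HIVE-7616 implementation.
final class MapJoinSizing {

    /**
     * @param estimatedKeys   key-count estimate derived from statistics;
     *                        a negative value means "no statistics" (assumed convention)
     * @param adjustment      value of hive.hashtable.key.count.adjustment
     * @param initialCapacity value of hive.hashtable.initialCapacity
     * @return the capacity to pre-size the mapjoin hashtable with
     */
    static long chooseCapacity(long estimatedKeys, float adjustment, int initialCapacity) {
        // Statistics absent, or adjustment set to 0: fall back to the fixed capacity.
        if (estimatedKeys < 0 || adjustment == 0.0f) {
            return initialCapacity;
        }
        // Otherwise the key estimate is divided by the adjustment value.
        long adjusted = (long) Math.ceil(estimatedKeys / (double) adjustment);
        return Math.max(1L, adjusted);
    }

    public static void main(String[] args) {
        System.out.println(chooseCapacity(1000, 1.0f, 100)); // estimate used as-is
        System.out.println(chooseCapacity(1000, 0.0f, 100)); // statistics ignored
    }
}
```

Under this reading, an adjustment below 1 enlarges the table relative to the estimate and an adjustment above 1 shrinks it.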

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC14
> Fix For: 0.14.0
>
> Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, 
> HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.05.patch, 
> HIVE-7616.06.patch, HIVE-7616.07.patch, HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-11 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093738#comment-14093738
 ] 

Lefty Leverenz commented on HIVE-7616:
--

This adds configuration parameter *hive.hashtable.key.count.adjustment* and 
gives a description for *hive.hashtable.initialCapacity* (introduced in 0.7.0 
by HIVE-1642).

So document *hive.hashtable.key.count.adjustment* and 
*hive.hashtable.initialCapacity* in the wiki before 0.14.0 is released.

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC14
> Fix For: 0.14.0
>
> Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, 
> HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.05.patch, 
> HIVE-7616.06.patch, HIVE-7616.07.patch, HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093354#comment-14093354
 ] 

Sergey Shelukhin commented on HIVE-7616:


ping?

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, 
> HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.05.patch, 
> HIVE-7616.06.patch, HIVE-7616.07.patch, HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091695#comment-14091695
 ] 

Hive QA commented on HIVE-7616:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660754/HIVE-7616.06.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5888 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/243/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/243/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-243/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660754

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, 
> HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.05.patch, 
> HIVE-7616.06.patch, HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-08 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091284#comment-14091284
 ] 

Gunther Hagleitner commented on HIVE-7616:
--

Removing the TODO in the commit is fine by me. I have one additional question 
on Review Board about how to detect bucketed joins.

For testing: can you add the "expected key count" to explain extended? That way 
you can verify that it works correctly through the unit tests.

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, 
> HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-08 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091287#comment-14091287
 ] 

Gunther Hagleitner commented on HIVE-7616:
--

Other than these two things, I am +1.

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, 
> HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091127#comment-14091127
 ] 

Hive QA commented on HIVE-7616:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660507/HIVE-7616.04.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5886 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/231/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/231/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-231/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660507

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, 
> HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090061#comment-14090061
 ] 

Sergey Shelukhin commented on HIVE-7616:


Will remove the TODO#: comments in the next iteration or on commit.

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, 
> HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088869#comment-14088869
 ] 

Hive QA commented on HIVE-7616:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660246/HIVE-7616.02.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5883 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/200/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/200/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-200/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660246

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-06 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088519#comment-14088519
 ] 

Gunther Hagleitner commented on HIVE-7616:
--

I like it. Some comments on rb.

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-06 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088308#comment-14088308
 ] 

Prasanth J commented on HIVE-7616:
--

The statistics part looks good to me except for a null check mentioned in RB.

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.01.patch, HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088289#comment-14088289
 ] 

Sergey Shelukhin commented on HIVE-7616:


https://reviews.apache.org/r/24427/

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.01.patch, HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-06 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088143#comment-14088143
 ] 

Gunther Hagleitner commented on HIVE-7616:
--

Can you create an RB entry for that, please? Is there one already?

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088041#comment-14088041
 ] 

Sergey Shelukhin commented on HIVE-7616:


There's a config setting for that

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088045#comment-14088045
 ] 

Sergey Shelukhin commented on HIVE-7616:


But yeah that would be useful

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-06 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087996#comment-14087996
 ] 

Mostafa Mokhtar commented on HIVE-7616:
---

This will work for most of the TPC-DS queries, since joins with the dimension 
tables are always on key columns and there is a PK/FK relationship between the 
dimension tables and the fact tables. Hence, in most cases the number of rows 
in the broadcast table will equal the number of keys (one-to-many joins).

In MapJoins where the tables don't naturally have a PK/FK relationship 
(many-to-many joins), the number of rows can be significantly higher than the 
number of keys.

Can you add the following perf logging to track such potential issues:
1) Number of keys in the hash table after load vs. number of keys at init
2) Number of times expandAndRehash was called and the total amount of time 
spent there

Using these metrics we can track the performance and behavior of the hash table.
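
As a rough illustration of the two metrics requested above, a small counter object along the following lines could be updated by the hash table and logged once loading finishes. Everything here is hypothetical: the class {{HashTableLoadMetrics}} and its methods are invented for this sketch and are not part of Hive's perf logging; only the name expandAndRehash comes from the comment.

```java
// Hypothetical sketch of the two requested metrics; not Hive's PerfLogger API.
final class HashTableLoadMetrics {
    private final long keysAtInit;   // capacity estimate the table was created with
    private long keysAfterLoad;      // distinct keys actually present after load
    private int expandCalls;         // number of expandAndRehash invocations
    private long expandNanos;        // total time spent in expandAndRehash

    HashTableLoadMetrics(long keysAtInit) {
        this.keysAtInit = keysAtInit;
    }

    // Call once per expandAndRehash, passing the measured duration.
    void recordExpand(long elapsedNanos) {
        expandCalls++;
        expandNanos += elapsedNanos;
    }

    // Call when the small table has been fully loaded.
    void recordLoadFinished(long actualKeys) {
        keysAfterLoad = actualKeys;
    }

    // One log line covering both metrics.
    String summary() {
        return "keysAtInit=" + keysAtInit
            + " keysAfterLoad=" + keysAfterLoad
            + " expandAndRehashCalls=" + expandCalls
            + " expandAndRehashMs=" + (expandNanos / 1_000_000L);
    }

    public static void main(String[] args) {
        HashTableLoadMetrics m = new HashTableLoadMetrics(1000);
        long start = System.nanoTime();
        // ... an expandAndRehash call would happen here ...
        m.recordExpand(System.nanoTime() - start);
        m.recordLoadFinished(4000);
        System.out.println(m.summary());
    }
}
```

A large gap between keysAtInit and keysAfterLoad, or a non-zero expand count, would point at a poor estimate for that join.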


> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087779#comment-14087779
 ] 

Hive QA commented on HIVE-7616:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12659956/HIVE-7616.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5877 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_mixed_case
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/188/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/188/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-188/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12659956

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087106#comment-14087106
 ] 

Sergey Shelukhin commented on HIVE-7616:


Also, for joins where many rows come from the right side per key, this patch 
will not work as well (the hashtable will be too big), given the point above 
about the number of keys vs. the number of rows.

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087104#comment-14087104
 ] 

Sergey Shelukhin commented on HIVE-7616:


Another consideration is that we currently store the full hash to avoid 
recalculating the missing bits when we expand. If stats can be trusted, we can 
assume we will never expand (if we allocate enough above the number of rows) 
and avoid that.
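
To make the trade-off above concrete, here is a toy open-addressing table that keeps the full 32-bit hash next to each key so that expansion can re-slot entries without rehashing them. It is a sketch only and does not reflect the layout of Hive's mapjoin hashtable; the class {{StoredHashTable}}, its 0.75 load factor, and the {{hashComputations}} counter are assumptions made for the illustration.

```java
// Toy illustration: storing the full hash per slot lets expand() reuse it.
// Not the Hive implementation; names and layout are hypothetical.
final class StoredHashTable {
    private long[] keys;
    private int[] hashes;      // full hash kept per slot (the extra memory cost)
    private boolean[] used;
    private int size;
    int hashComputations;      // how many times hash() ran, for illustration
    int expansions;            // how many times the table had to grow

    StoredHashTable(int requestedCapacity) {
        int cap = 1;
        while (cap < requestedCapacity) {
            cap <<= 1;         // round up to a power of two
        }
        keys = new long[cap];
        hashes = new int[cap];
        used = new boolean[cap];
    }

    private int hash(long key) {
        hashComputations++;
        int h = Long.hashCode(key);
        return h ^ (h >>> 16);
    }

    void put(long key) {
        if ((size + 1) * 4 > keys.length * 3) {   // keep load factor <= 0.75
            expand();
        }
        insert(key, hash(key));
    }

    private void insert(long key, int h) {
        int mask = keys.length - 1;
        int slot = h & mask;                      // only the low bits pick the slot
        while (used[slot]) {
            if (keys[slot] == key) {
                return;                           // key already present
            }
            slot = (slot + 1) & mask;             // linear probing
        }
        used[slot] = true;
        keys[slot] = key;
        hashes[slot] = h;
        size++;
    }

    private void expand() {
        expansions++;
        long[] oldKeys = keys;
        int[] oldHashes = hashes;
        boolean[] oldUsed = used;
        int newCap = oldKeys.length * 2;
        keys = new long[newCap];
        hashes = new int[newCap];
        used = new boolean[newCap];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) {
                insert(oldKeys[i], oldHashes[i]); // stored hash reused, no hash() call
            }
        }
    }

    int size() {
        return size;
    }
}
```

If the table is pre-sized so that it never expands, the {{hashes}} array is never read back during a resize, which is the saving the comment is pointing at.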

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086802#comment-14086802
 ] 

Sergey Shelukhin commented on HIVE-7616:


It would be nice to have a key count estimate in stats too, but I guess the key 
could be different for each query as far as Hive is concerned. Maybe we can add 
some notion of keys for typical dimension tables and have stats for that.

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.patch
>
>






[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086799#comment-14086799
 ] 

Sergey Shelukhin commented on HIVE-7616:


[~hagleitn] [~mmokhtar] can you guys take a look?

> pre-size mapjoin hashtable based on statistics
> --
>
> Key: HIVE-7616
> URL: https://issues.apache.org/jira/browse/HIVE-7616
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-7616.patch
>
>



