[jira] [Commented] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront

2015-05-26 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560459#comment-14560459
 ] 

Lefty Leverenz commented on HIVE-10793:
---

Doc note:  This changes the default value of 
*hive.mapjoin.optimized.hashtable.wbsize* so the wiki needs to be updated (with 
version information).

* [Configuration Properties -- hive.mapjoin.optimized.hashtable.wbsize | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.mapjoin.optimized.hashtable.wbsize]

The patch also makes minor changes to the definitions of 
*hive.mapjoin.hybridgrace.minwbsize* and 
*hive.mapjoin.hybridgrace.minnumpartitions* which do not need any doc changes.

> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> 
>
> Key: HIVE-10793
> URL: https://issues.apache.org/jira/browse/HIVE-10793
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Mostafa Mokhtar
> Fix For: 1.3.0
>
> Attachments: HIVE-10793.1.patch, HIVE-10793.2.patch
>
>
> HybridHashTableContainer allocates memory based on an estimate, which means 
> that if the actual size is less than the estimate, some of the allocated memory goes unused.
> The number of partitions is calculated from the estimated data size:
> {code}
> numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize,
>     minNumParts, minWbSize, nwayConf);
> {code}
> writeBufferSize is then set based on the number of partitions:
> {code}
> writeBufferSize = (int)(estimatedTableSize / numPartitions);
> {code}
> Each hash partition allocates one WriteBuffer, with no further allocation 
> if the estimated data size is correct.
> The suggested solution is to reduce writeBufferSize by a factor such that only X% 
> of the memory is preallocated.
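
The sizing arithmetic in the description, and the direction of the suggested fix, can be sketched roughly as follows. This is an illustration only, not the code from the attached patches: the class name WriteBufferSizing, both method names, and the preallocFraction parameter are hypothetical. Only estimatedTableSize, numPartitions, and writeBufferSize come from the description.

```java
// Hypothetical illustration; not the code from HIVE-10793.
final class WriteBufferSizing {

  /** Sizing as described in the issue: one buffer per partition covers the whole estimate. */
  static int fullEstimateSize(long estimatedTableSize, int numPartitions) {
    return (int) (estimatedTableSize / numPartitions);
  }

  /**
   * Suggested direction: scale the per-partition buffer so that only a fraction
   * of the estimate is preallocated. preallocFraction is an assumed parameter in (0, 1].
   */
  static int reducedSize(long estimatedTableSize, int numPartitions, double preallocFraction) {
    if (numPartitions <= 0 || preallocFraction <= 0.0 || preallocFraction > 1.0) {
      throw new IllegalArgumentException("invalid sizing arguments");
    }
    return (int) (estimatedTableSize / numPartitions * preallocFraction);
  }

  public static void main(String[] args) {
    long estimate = 1L << 30;  // 1 GiB estimated table size (arbitrary example value)
    int partitions = 16;       // arbitrary example value
    System.out.println(fullEstimateSize(estimate, partitions));   // prints 67108864
    System.out.println(reducedSize(estimate, partitions, 0.25));  // prints 16777216
  }
}
```

With the smaller initial buffer, a partition whose data exceeds it would have to allocate further buffers as rows arrive, so memory follows the actual data size rather than the estimate.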



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront

2015-05-26 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559704#comment-14559704
 ] 

Mostafa Mokhtar commented on HIVE-10793:


[~sushanth] [~sershe]
Can this go to 1.2.1 as well?



> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> 
>
> Key: HIVE-10793
> URL: https://issues.apache.org/jira/browse/HIVE-10793
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10793.1.patch, HIVE-10793.2.patch
>
>


[jira] [Commented] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront

2015-05-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556886#comment-14556886
 ] 

Hive QA commented on HIVE-10793:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12734898/HIVE-10793.2.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 8967 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4006/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4006/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4006/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12734898 - PreCommit-HIVE-TRUNK-Build

> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> 
>
> Key: HIVE-10793
> URL: https://issues.apache.org/jira/browse/HIVE-10793
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10793.1.patch, HIVE-10793.2.patch
>
>


[jira] [Commented] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront

2015-05-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556668#comment-14556668
 ] 

Sergey Shelukhin commented on HIVE-10793:
-

+1

> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> 
>
> Key: HIVE-10793
> URL: https://issues.apache.org/jira/browse/HIVE-10793
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10793.1.patch, HIVE-10793.2.patch
>
>


[jira] [Commented] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront

2015-05-22 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556641#comment-14556641
 ] 

Mostafa Mokhtar commented on HIVE-10793:


[~sershe]
Updated patch.

> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> 
>
> Key: HIVE-10793
> URL: https://issues.apache.org/jira/browse/HIVE-10793
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10793.1.patch, HIVE-10793.2.patch
>
>


[jira] [Commented] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront

2015-05-22 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556520#comment-14556520
 ] 

Mostafa Mokhtar commented on HIVE-10793:


[~sershe] [~vikram.dixit]
Sure, I can do that.
The main reason I made this change is that ConvertJoinMapJoin can pack more 
than HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD in a single vertex, which means 
we can potentially hit OOM during hash table load.

For this to be fully fixed, ConvertJoinMapJoin needs to be fixed and each 
HashTableLoader needs to know its memory limit. 

> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> 
>
> Key: HIVE-10793
> URL: https://issues.apache.org/jira/browse/HIVE-10793
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10793.1.patch
>
>


[jira] [Commented] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront

2015-05-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556476#comment-14556476
 ] 

Sergey Shelukhin commented on HIVE-10793:
-

I understand that; can you set an upper cap on the size of one buffer to be the 
config setting, though?
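
One possible reading of this request, as a sketch: the estimate may lower the per-partition buffer size but never raise it above the configured value. The class and method names are hypothetical, configuredWbSize is assumed to stand for the configured write buffer size (presumably hive.mapjoin.optimized.hashtable.wbsize), and the 8 MiB figure is an arbitrary example rather than a stated default.

```java
// Hypothetical illustration; not the code from HIVE-10793.
final class CappedWriteBuffer {

  /**
   * Per-partition buffer size derived from the estimate, capped at the
   * configured write buffer size (assumed to come from configuration).
   */
  static int cappedSize(long estimatedTableSize, int numPartitions, int configuredWbSize) {
    if (numPartitions <= 0 || configuredWbSize <= 0) {
      throw new IllegalArgumentException("invalid sizing arguments");
    }
    long perPartition = estimatedTableSize / numPartitions;
    return (int) Math.min(perPartition, (long) configuredWbSize);
  }

  public static void main(String[] args) {
    int cap = 8 << 20;  // 8 MiB, arbitrary example cap
    // Large estimate: 256 MiB per partition is cut down to the cap.
    System.out.println(cappedSize(1L << 30, 4, cap));    // prints 8388608
    // Small estimate: 1 MiB per partition stays below the cap.
    System.out.println(cappedSize(16L << 20, 16, cap));  // prints 1048576
  }
}
```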

> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> 
>
> Key: HIVE-10793
> URL: https://issues.apache.org/jira/browse/HIVE-10793
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10793.1.patch
>
>


[jira] [Commented] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront

2015-05-22 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556475#comment-14556475
 ] 

Mostafa Mokhtar commented on HIVE-10793:


[~sershe]
HybridHybrid can create an arbitrary number of partitions based on data size 
and available memory. If we use WriteBufferSize as is, we can potentially hit 
OOM in the constructor, which is why it has to be reduced.
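
To make the concern concrete, here is a sketch of the arithmetic, assuming each partition allocates one buffer of the configured size in the constructor. The class and method names, the partition counts, the 8 MiB buffer size, and the 1 GiB threshold are all illustrative assumptions, not values from the patch.

```java
// Hypothetical illustration; not the code from HIVE-10793.
final class UpfrontAllocation {

  /** Bytes allocated upfront if every partition gets one buffer of wbSize bytes. */
  static long upfrontBytes(int numPartitions, int wbSize) {
    return (long) numPartitions * wbSize;
  }

  /** True if the upfront allocation alone would exceed the memory threshold. */
  static boolean exceeds(int numPartitions, int wbSize, long memoryThreshold) {
    return upfrontBytes(numPartitions, wbSize) > memoryThreshold;
  }

  public static void main(String[] args) {
    int wbSize = 8 << 20;       // 8 MiB per buffer, arbitrary example value
    long threshold = 1L << 30;  // 1 GiB memory threshold, arbitrary example value
    System.out.println(upfrontBytes(256, wbSize));        // prints 2147483648
    System.out.println(exceeds(256, wbSize, threshold));  // prints true
    System.out.println(exceeds(16, wbSize, threshold));   // prints false
  }
}
```

Upfront memory grows linearly with the partition count, so a fixed buffer size that is safe for a few partitions can exceed the threshold before any row is loaded.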

> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> 
>
> Key: HIVE-10793
> URL: https://issues.apache.org/jira/browse/HIVE-10793
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10793.1.patch
>
>


[jira] [Commented] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront

2015-05-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556379#comment-14556379
 ] 

Sergey Shelukhin commented on HIVE-10793:
-

Why does it even bother allocating more than the configured writeBufferSize?
The whole point of write buffers is to not allocate unneeded memory. The perf 
impact of having multiple buffers is minimal.

> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> 
>
> Key: HIVE-10793
> URL: https://issues.apache.org/jira/browse/HIVE-10793
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10793.1.patch
>
>