[jira] [Commented] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
[ https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560459#comment-14560459 ]

Lefty Leverenz commented on HIVE-10793:
----------------------------------------

Doc note: This changes the default value of *hive.mapjoin.optimized.hashtable.wbsize*, so the wiki needs to be updated (with version information).

* [Configuration Properties -- hive.mapjoin.optimized.hashtable.wbsize | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.mapjoin.optimized.hashtable.wbsize]

The patch also makes minor changes to the definitions of *hive.mapjoin.hybridgrace.minwbsize* and *hive.mapjoin.hybridgrace.minnumpartitions*, which do not need any doc changes.

> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-10793
>                 URL: https://issues.apache.org/jira/browse/HIVE-10793
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Mostafa Mokhtar
>             Fix For: 1.3.0
>
>         Attachments: HIVE-10793.1.patch, HIVE-10793.2.patch
>
>
> HybridHashTableContainer allocates memory based on an estimate, which means
> that if the actual data size is less than the estimate, the allocated memory
> won't be used.
> The number of partitions is calculated from the estimated data size:
> {code}
> numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize, minNumParts, minWbSize, nwayConf);
> {code}
> Then writeBufferSize is set based on the number of partitions:
> {code}
> writeBufferSize = (int) (estimatedTableSize / numPartitions);
> {code}
> Each hash partition allocates one WriteBuffer, with no further allocation
> if the estimated data size is correct.
> The suggested solution is to reduce writeBufferSize by a factor such that
> only X% of the memory is preallocated.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
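The suggested fix (reduce writeBufferSize by a factor so that only a fraction of the estimate-derived memory is preallocated) can be sketched roughly as follows. This is a minimal illustration, not Hive's actual code: the method and parameter names here are assumptions, and `minWbSize`/`maxWbSize` stand in for the clamping bounds the real container would use.

```java
public class WriteBufferSizing {
    /**
     * Derive the per-partition write-buffer size from the estimated table
     * size, scaled by a preallocation fraction and clamped to sane bounds,
     * so only a fraction of the estimate-derived memory is allocated upfront.
     */
    static int computeWriteBufferSize(long estimatedTableSize, int numPartitions,
                                      double preallocFraction, int minWbSize, int maxWbSize) {
        long perPartition = estimatedTableSize / numPartitions;
        long scaled = (long) (perPartition * preallocFraction);
        // Clamp so a wild estimate can neither starve nor blow up a partition.
        return (int) Math.max(minWbSize, Math.min(maxWbSize, scaled));
    }

    public static void main(String[] args) {
        // 1 GB estimate over 16 partitions, preallocating 50%: 32 MB buffers.
        System.out.println(computeWriteBufferSize(1L << 30, 16, 0.5, 1 << 20, 1 << 26));
    }
}
```

With this shape, a correct estimate costs only one extra buffer allocation per partition later, while an inflated estimate no longer pins the full estimated size in memory upfront.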
[ https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559704#comment-14559704 ]

Mostafa Mokhtar commented on HIVE-10793:
----------------------------------------

[~sushanth] [~sershe] Can this go to 1.2.1 as well?
[ https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556886#comment-14556886 ]

Hive QA commented on HIVE-10793:
--------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12734898/HIVE-10793.2.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 8967 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4006/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4006/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4006/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12734898 - PreCommit-HIVE-TRUNK-Build
[ https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556668#comment-14556668 ]

Sergey Shelukhin commented on HIVE-10793:
-----------------------------------------

+1
[ https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556641#comment-14556641 ]

Mostafa Mokhtar commented on HIVE-10793:
----------------------------------------

[~sershe] Updated patch.
[ https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556520#comment-14556520 ]

Mostafa Mokhtar commented on HIVE-10793:
----------------------------------------

[~sershe] [~vikram.dixit] Sure, I can do that.

The main reason I made this change is that ConvertJoinMapJoin can pack more than HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD into a single vertex, which means we can potentially hit OOM during hash table load. For this to be fully fixed, ConvertJoinMapJoin needs to be fixed and each HashTableLoader needs to know its memory limit.
[ https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556476#comment-14556476 ]

Sergey Shelukhin commented on HIVE-10793:
-----------------------------------------

I understand that; can you set an upper cap on one buffer's size to be the config setting, though?
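The capping idea discussed here amounts to clamping the estimate-derived per-partition buffer size at the configured wbsize. A minimal sketch, assuming illustrative names rather than Hive's actual code:

```java
public class BufferCap {
    /**
     * Clamp the estimate-derived per-partition buffer size at the configured
     * wbsize, so an inflated size estimate can never cause a larger upfront
     * allocation per partition than the user configured.
     */
    static int cappedWriteBufferSize(long estimatedTableSize, int numPartitions,
                                     int configuredWbSize) {
        return (int) Math.min(configuredWbSize, estimatedTableSize / numPartitions);
    }

    public static void main(String[] args) {
        // A 1 GB estimate over 4 partitions would ask for 256 MB; cap at 8 MB.
        System.out.println(cappedWriteBufferSize(1L << 30, 4, 8 << 20));
    }
}
```

Under this scheme the configured wbsize becomes a hard ceiling rather than a fixed allocation, which is the distinction the thread is debating.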
[ https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556475#comment-14556475 ]

Mostafa Mokhtar commented on HIVE-10793:
----------------------------------------

[~sershe] HybridHybrid can create an arbitrary number of partitions based on data size and available memory. If we use WriteBufferSize as is, we can potentially hit OOM in the constructor, which is why WriteBufferSize can't be used as is.
[ https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556379#comment-14556379 ]

Sergey Shelukhin commented on HIVE-10793:
-----------------------------------------

Why does it even bother allocating more than the configured writeBufferSize? The whole point of write buffers is not to allocate unneeded memory. The perf impact of having multiple buffers is minimal.