[jira] [Commented] (HIVE-23509) MapJoin AssertionError: Capacity must be power of 2

2020-07-10 Thread Shashank Pedamallu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155534#comment-17155534
 ] 

Shashank Pedamallu commented on HIVE-23509:
---

Thank you very much for getting this through!

> MapJoin AssertionError: Capacity must be power of 2
> ---
>
> Key: HIVE-23509
> URL: https://issues.apache.org/jira/browse/HIVE-23509
> Project: Hive
> Issue Type: Bug
> Environment: Hive-2.3.6
> Reporter: Shashank Pedamallu
> Assignee: Shashank Pedamallu
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> An AssertionError was observed in a Hive query when the rowCount for the join 
> is of the form (2^x)+(2^(x+1)).
> The stack trace follows:
> {noformat}
> [2020-05-11 05:43:12,135] {base_task_runner.py:95} INFO - Subtask: ERROR : Vertex failed, vertexName=Map 4, vertexId=vertex_1588729523139_51702_1_06, diagnostics=[Task failed, taskId=task_1588729523139_51702_1_06_001286, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1588729523139_51702_1_06_001286_0:java.lang.RuntimeException: java.lang.AssertionError: Capacity must be a power of two
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at java.security.AccessController.doPrivileged(Native Method)
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at javax.security.auth.Subject.doAs(Subject.java:422)
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at java.lang.Thread.run(Thread.java:748)
> [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: Caused by: java.lang.AssertionError: Capacity must be a power of two
> [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.validateCapacity(BytesBytesMultiHashMap.java:552)
> [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashImpl(BytesBytesMultiHashMap.java:731)
> [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashToTarget(BytesBytesMultiHashMap.java:545)
> [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$HashPartition.getHashMapFromDisk(HybridHashTableContainer.java:183)
> [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at org.apache.hadoop.hive.ql.exec.MapJoinOperator.reloadHashTable(MapJoinOperator.java:641)
> [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at org.apache.hadoop.hive.ql.exec.MapJoinOperator.continueProcess(MapJoinOperator.java:603)
>  
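
To illustrate why a row count of the form (2^x)+(2^(x+1)) trips a "Capacity must be a power of two" check, here is a minimal Java sketch. It is not the Hive code and not the patch attached to this issue: the class PowerOfTwoCapacity and its two methods are hypothetical names chosen for illustration. It only shows that 3 * 2^x is never a power of two, and one common way to round such a value up before using it as a hash table capacity.

```java
public final class PowerOfTwoCapacity {

    /** Returns true when capacity is positive and has exactly one bit set. */
    public static boolean isPowerOfTwo(long capacity) {
        return capacity > 0 && (capacity & (capacity - 1)) == 0;
    }

    /**
     * Returns the smallest power of two that is >= capacity.
     * Hypothetical helper; valid for 1 <= capacity <= 2^62.
     */
    public static long roundUpToPowerOfTwo(long capacity) {
        if (capacity < 1 || capacity > (1L << 62)) {
            throw new IllegalArgumentException("capacity out of range: " + capacity);
        }
        long highest = Long.highestOneBit(capacity);
        return highest == capacity ? capacity : highest << 1;
    }

    public static void main(String[] args) {
        int x = 10;
        // (2^x) + (2^(x+1)) = 3 * 2^x, which has two bits set.
        long rowCount = (1L << x) + (1L << (x + 1));
        System.out.println(rowCount + " is a power of two: " + isPowerOfTwo(rowCount));
        System.out.println("rounded up: " + roundUpToPowerOfTwo(rowCount));
    }
}
```

With x = 10 the row count is 3072, which fails the check, and rounding up gives 4096. Whether rounding up (rather than down, or rejecting the value) is the right behaviour for the reloaded hash map is a design decision for the actual patch.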

[jira] [Commented] (HIVE-23509) MapJoin AssertionError: Capacity must be power of 2

2020-07-01 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149128#comment-17149128
 ] 

Zoltan Haindrich commented on HIVE-23509:
-

Okay, but I think that if someone still uses it, or wants to backport it to some 
older version, it might make sense to have it.
+1


[jira] [Commented] (HIVE-23509) MapJoin AssertionError: Capacity must be power of 2

2020-05-19 Thread Gopal Vijayaraghavan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111367#comment-17111367
 ] 

Gopal Vijayaraghavan commented on HIVE-23509:
-

HybridHash was disabled in Hive 3.x because of this and similar issues 
(MapJoinOperator.reloadHashTable is related to spilling joins locally).
