[jira] [Updated] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Slawski updated HIVE-10538:
    Attachment: HIVE-10538.2.patch

I've attached the second revision of the patch, which updates the failed Spark qtests.

Fix NPE in FileSinkOperator from hashcode mismatch

    Key: HIVE-10538
    URL: https://issues.apache.org/jira/browse/HIVE-10538
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Affects Versions: 1.0.0, 1.2.0
    Reporter: Peter Slawski
    Assignee: Peter Slawski
    Priority: Critical
    Fix For: 1.2.0, 1.3.0
    Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, HIVE-10538.1.patch, HIVE-10538.2.patch

A NullPointerException occurs in FileSinkOperator when using bucketed tables and distribute by with multiFileSpray enabled. The following query reproduces the issue:

{code}
set hive.enforce.bucketing = true;
set hive.exec.reducers.max = 20;

create table bucket_a(key int, value_a string) clustered by (key) into 256 buckets;
create table bucket_b(key int, value_b string) clustered by (key) into 256 buckets;
create table bucket_ab(key int, value_a string, value_b string) clustered by (key) into 256 buckets;

-- Insert data into bucket_a and bucket_b

insert overwrite table bucket_ab
select a.key, a.value_a, b.value_b
from bucket_a a join bucket_b b on (a.key = b.key)
distribute by key;
{code}

The following stack trace is logged:

{code}
2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer (ExecReducer.java:reduce(255)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}}
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
	... 8 more
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
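For readers unfamiliar with how a hashcode mismatch can surface as an NPE, here is a minimal standalone sketch. It is not Hive's code: the class, the two hash functions, and the slot map are all hypothetical, and the only assumption carried over from the report is that rows are routed to a reducer with one hash while the sink looks up its writer with another. If the two disagree, the lookup for a bucket the reducer does not own returns null, and unboxing that null throws.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only; none of these names come from Hive.
class HashMismatchSketch {
    static final int TOTAL_FILES = 256;  // buckets in the target table
    static final int NUM_REDUCERS = 20;  // hive.exec.reducers.max in the report

    // Hypothetical hash used when routing a row to a reducer.
    static int routingHash(int key) { return key; }

    // Hypothetical, slightly different hash used when picking the output file.
    static int sinkHash(int key) { return 31 + key; }

    static int bucketOf(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    // Maps each bucket file owned by this reducer to a local writer slot.
    static Map<Integer, Integer> writerSlots(int reducerId) {
        Map<Integer, Integer> slots = new HashMap<>();
        int next = 0;
        for (int bucket = 0; bucket < TOTAL_FILES; bucket++) {
            if (bucket % NUM_REDUCERS == reducerId) {
                slots.put(bucket, next++);
            }
        }
        return slots;
    }

    // Returns a primitive int, so a missing map entry (null) fails on unboxing.
    static int findWriterOffset(Map<Integer, Integer> slots, int hash) {
        return slots.get(bucketOf(hash, TOTAL_FILES));
    }

    public static void main(String[] args) {
        int key = 113;
        int reducerId = bucketOf(routingHash(key), TOTAL_FILES) % NUM_REDUCERS;
        Map<Integer, Integer> slots = writerSlots(reducerId);

        // Same hash on both sides: the bucket is one this reducer owns.
        System.out.println("consistent hash -> slot "
                + findWriterOffset(slots, routingHash(key)));

        // Different hash at the sink: the bucket is not in the map.
        try {
            findWriterOffset(slots, sinkHash(key));
        } catch (NullPointerException e) {
            System.out.println("mismatched hash -> NullPointerException");
        }
    }
}
```

In this model the failure disappears as soon as both sides derive the bucket from the same hash; whether the attached patches take that exact approach is not stated in this message.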
[jira] [Updated] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-10538:
    Attachment: HIVE-10538.1.patch

    Assignee: Peter Slawski
    Priority: Critical
    Fix For: 1.2.0, 1.3.0
    Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, HIVE-10538.1.patch
[jira] [Updated] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-10538:
    Attachment: HIVE-10538.1.patch

    Assignee: Peter Slawski
    Priority: Critical
    Fix For: 1.2.0, 1.3.0
    Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch
[jira] [Updated] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-10538:
    Priority: Critical (was: Major)

    Priority: Critical
    Fix For: 1.2.0, 1.3.0
    Attachments: HIVE-10538.1.patch
[jira] [Updated] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated HIVE-10538:
    Assignee: Peter Slawski

    Assignee: Peter Slawski
    Priority: Critical
    Fix For: 1.2.0, 1.3.0
    Attachments: HIVE-10538.1.patch
[jira] [Updated] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-10538:
    Fix Version/s: 1.2.0

    Fix For: 1.2.0, 1.3.0
    Attachments: HIVE-10538.1.patch
[jira] [Updated] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Slawski updated HIVE-10538:
    Attachment: HIVE-10538.1.patch

    Attachments: HIVE-10538.1.patch