[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101680#comment-14101680 ]

Rui Li commented on HIVE-7624:
------------------------------

[~brocknoland] Got it, thanks!

> Reduce operator initialization failed when running multiple MR query on spark
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-7624
>                 URL: https://issues.apache.org/jira/browse/HIVE-7624
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>             Fix For: spark-branch
>
>         Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch,
> HIVE-7624.4-spark.patch, HIVE-7624.5-spark.patch, HIVE-7624.6-spark.patch,
> HIVE-7624.7-spark.patch, HIVE-7624.patch
>
>
> The following error occurs when I try to run a query with multiple reduce
> works (M->R->R):
> {quote}
> 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
> java.lang.RuntimeException: Reduce operator initialization failed
>         at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
>         at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
>         at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
>         at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>         at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0]
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
>         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
> …
> {quote}
> I suspect we're applying the reduce function in wrong order.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
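To make the reporter's suspicion concrete, here is a toy model (not Hive code) of how initializing chained stages in the wrong order could produce a message like the one in the trace. The `Stage` class, the stage names, and the field layout of each stage are all assumptions invented for illustration; only the field names `reducesinkkey0` and `_col0` and the wording of the error come from the report.

```python
def get_field_ref(field_names, wanted):
    """Look up a field by name; on a miss, fail with a message shaped like the one in the trace."""
    if wanted in field_names:
        return field_names.index(wanted)
    listing = ", ".join(f"{i}:{name}" for i, name in enumerate(field_names))
    raise RuntimeError(f"cannot find field {wanted} from [{listing}]")


class Stage:
    """Hypothetical stage: reads one named field from its input and emits a fixed schema."""

    def __init__(self, name, reads, emits):
        self.name = name
        self.reads = reads    # field this stage looks up at init time (None = nothing)
        self.emits = emits    # field names handed to the next stage

    def initialize(self, input_fields):
        if self.reads is not None:
            get_field_ref(input_fields, self.reads)
        return self.emits


def initialize_chain(stages, source_fields):
    """Initialize stages left to right, feeding each one the previous stage's output schema."""
    fields = source_fields
    for stage in stages:
        fields = stage.initialize(fields)
    return fields


# Assumed layout for an M->R->R plan: both reducers read a shuffle key,
# and only the last reducer emits the final column.
MAP = Stage("map", None, ["reducesinkkey0"])
REDUCE_1 = Stage("reduce_1", "reducesinkkey0", ["reducesinkkey0"])
REDUCE_2 = Stage("reduce_2", "reducesinkkey0", ["_col0"])

if __name__ == "__main__":
    print(initialize_chain([MAP, REDUCE_1, REDUCE_2], ["line"]))
    try:
        initialize_chain([MAP, REDUCE_2, REDUCE_1], ["line"])  # reducers swapped
    except RuntimeError as err:
        print(err)
```

In this model the swapped chain fails because `reduce_1` is handed the final reducer's output schema instead of a shuffle key. Whether the real bug had this shape is not established by the thread.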
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100769#comment-14100769 ]

Brock Noland commented on HIVE-7624:
------------------------------------

Hi Rui,

Hive generally follows one commit = one jira so I moved your patch over to HIVE-7766 and committed it. Thank you!!
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100358#comment-14100358 ]

Hive QA commented on HIVE-7624:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662431/HIVE-7624.7-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5915 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/54/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/54/console
Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-54/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662431
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093796#comment-14093796 ]

Rui Li commented on HIVE-7624:
------------------------------

Thanks for the review :-)
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093774#comment-14093774 ]

Szehon Ho commented on HIVE-7624:
---------------------------------

udaf_context_ngrams is a strange failure; I have never seen it before, but I doubt it's related to this patch. Will commit this shortly.
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093770#comment-14093770 ]

Hive QA commented on HIVE-7624:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12661110/HIVE-7624.6-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5844 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_context_ngrams
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/32/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/32/console
Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-32/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12661110
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093638#comment-14093638 ]

Szehon Ho commented on HIVE-7624:
---------------------------------

Oh I thought the last version on RB was already rebased. +1 pending test for latest version.
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093618#comment-14093618 ]

Rui Li commented on HIVE-7624:
------------------------------

Hi [~brocknoland], [~szehon] I'll rebase the patch.
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093614#comment-14093614 ]

Szehon Ho commented on HIVE-7624:
---------------------------------

+1 for latest version
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092900#comment-14092900 ]

Brock Noland commented on HIVE-7624:
------------------------------------

Nice work!!

bq. The patch does not appear to apply with p0, p1, or p2

Looks like the patch needs to be rebased.
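On the "p0, p1, or p2" wording quoted above: with the standard `patch` tool, `-pN` strips N leading path components from the file names in the diff before looking them up. The sketch below only models that path resolution in plain Python. The logic of `smart-apply-patch.sh` itself is not shown in this thread, so `find_strip_level` and `strip_components` are hypothetical helpers, and the file path is the one that appears in the QA log for this issue.

```python
def strip_components(path, level):
    """Drop `level` leading path components, the way `patch -pN` treats diff file names."""
    parts = path.split("/")
    if level >= len(parts):
        return None  # nothing would be left of the path
    return "/".join(parts[level:])


def find_strip_level(patch_paths, existing_files, levels=(0, 1, 2)):
    """Return the first strip level at which every patched path exists, else None."""
    for level in levels:
        stripped = [strip_components(p, level) for p in patch_paths]
        if all(s is not None and s in existing_files for s in stripped):
            return level
    return None


if __name__ == "__main__":
    tree = {"ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java"}
    git_style = ["a/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java"]
    print(find_strip_level(git_style, tree))
```

A patch whose paths resolve can still fail to apply when its hunks no longer match the current file contents, which is probably the situation a rebase is meant to fix; this sketch does not model hunk matching.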
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092606#comment-14092606 ]

Hive QA commented on HIVE-7624:
---

{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660957/HIVE-7624.5-spark.patch

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/28/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/28/console
Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-28/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-SPARK-Build-28/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-spark-source ]]
+ [[ ! -d apache-svn-spark-source/.svn ]]
+ [[ ! -d apache-svn-spark-source ]]
+ cd apache-svn-spark-source
+ svn revert -R .
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java'
++ svn status --no-ignore
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target cli/target odbc/target ql/dependency-reduced-pom.xml ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1617233.

At revision 1617233.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12660957
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091198#comment-14091198 ]

Szehon Ho commented on HIVE-7624:
-

Hi Li Rui,
I think the patch looks reasonable. Just had a comment and a question on the RB. Thanks

> Reduce operator initialization failed when running multiple MR query on spark
> -
>
>                 Key: HIVE-7624
>                 URL: https://issues.apache.org/jira/browse/HIVE-7624
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch,
> HIVE-7624.4-spark.patch, HIVE-7624.patch
>
>
> The following error occurs when I try to run a query with multiple reduce
> works (M->R->R):
> {quote}
> 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
> java.lang.RuntimeException: Reduce operator initialization failed
>         at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
>         at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
>         at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
>         at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>         at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0]
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
>         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
> …
> {quote}
> I suspect we're applying the reduce function in wrong order.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090689#comment-14090689 ]

Hive QA commented on HIVE-7624:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660590/HIVE-7624.4-spark.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5828 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/24/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/24/console
Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-24/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12660590
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090525#comment-14090525 ]

Hive QA commented on HIVE-7624:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660582/HIVE-7624.3-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5843 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/23/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/23/console
Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-23/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12660582
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090509#comment-14090509 ]

Rui Li commented on HIVE-7624:
--

Some change may bypass HIVE-7597. Remove it.
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090452#comment-14090452 ]

Rui Li commented on HIVE-7624:
--

I finally found that this happens because we don't set the output collector for the RS (ReduceSink) in ExecReducer. That is natural for MR, where ExecReducer shouldn't contain an RS, but we have to do it for Spark. The added code just looks for an RS and sets the collector for it, so there shouldn't be any regression.
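The fix described in this comment (look for an RS in the reduce-side tree and hand it a collector) can be pictured roughly as below. This is only a minimal sketch under stated assumptions, not the code from the patch: `Op`, `ReduceSinkOp`, `Collector` and `wireReduceSinks` are hypothetical stand-ins invented for illustration and are not Hive classes or methods.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ReduceSinkCollectorSketch {

    /** Hypothetical output collector; a stand-in, not Hive's interface. */
    interface Collector {
        void collect(Object key, Object value);
    }

    /** Hypothetical operator node; a stand-in for an operator in the reduce-side tree. */
    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();

        Op(String name) {
            this.name = name;
        }

        Op child(Op c) {
            children.add(c);
            return this;
        }
    }

    /** Hypothetical stand-in for a ReduceSink (RS) operator, which needs somewhere to emit rows. */
    static class ReduceSinkOp extends Op {
        Collector out;

        ReduceSinkOp(String name) {
            super(name);
        }
    }

    /** Walks the tree under root, gives every RS the collector, and returns how many were wired. */
    static int wireReduceSinks(Op root, Collector collector) {
        int wired = 0;
        Deque<Op> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Op op = stack.pop();
            if (op instanceof ReduceSinkOp) {
                ((ReduceSinkOp) op).out = collector;
                wired++;
            }
            for (Op c : op.children) {
                stack.push(c);
            }
        }
        return wired;
    }

    public static void main(String[] args) {
        ReduceSinkOp rs = new ReduceSinkOp("RS");
        Op root = new Op("GBY").child(new Op("SEL").child(rs));
        List<Object[]> sink = new ArrayList<>();
        int n = wireReduceSinks(root, (k, v) -> sink.add(new Object[] {k, v}));
        rs.out.collect("key", 1);
        System.out.println(n + " reduce sink(s) wired, " + sink.size() + " row(s) collected");
    }
}
```

Because the walk only touches RS nodes, a reduce-side tree without an RS (the plain MR case) is left unchanged, which matches the "no regression" argument in the comment.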
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090261#comment-14090261 ]

Rui Li commented on HIVE-7624:
--

Hi [~csun], I updated the patch based on the latest code. But the group by + order by query still returns nothing for me (with HIVE-7492 & HIVE-7652 in place). I'm not sure whether this is another issue or a side effect of cloning the job conf. Any ideas?
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090152#comment-14090152 ]

Rui Li commented on HIVE-7624:
--

[~xuefuz] currently the second R does end with a ReduceSink.
[~csun] sorry, it's been a while since I last synced with upstream. I'll rebase my branch and run the test again.
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090010#comment-14090010 ]

Chao commented on HIVE-7624:

Sorry, that should be "order by" in my last comment.
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090009#comment-14090009 ]

Chao commented on HIVE-7624:

With this patch and HIVE-7492 plus HIVE-7652, group by + reduce by produces the correct result on my machine. I'm not sure why. Also, I'm not sure whether there are any further issues.
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089922#comment-14089922 ]

Xuefu Zhang commented on HIVE-7624:
---

{quote}
However, no result is returned. I checked the log and found the second reduce work got nothing to process. Not sure what is missing here...
{quote}
I think the problem is caused by the FileSinkOperator in the reduce-side operator tree. In the MR world, that tree probably always ends with a FileSinkOperator (which writes to disk). If we have MRR, then the first R should not write to disk. Instead, it should have a ReduceSinkOperator, which outputs the result to the SparkCollector. The result RDD is based on the SparkCollector, so it can be picked up by the second R. I think we need to modify the operator tree a bit for this to work correctly. Please follow Tez's way to do this.
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089555#comment-14089555 ]

Chao commented on HIVE-7624:
----------------------------

Hi [~ruili], I think this patch overlaps a little bit with HIVE-7597, on {{GenMapRedUtils}}. I can't apply the patch due to the conflict.
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089459#comment-14089459 ]

Brock Noland commented on HIVE-7624:
------------------------------------

While debugging, I have used the code below:

{noformat}
System.err.println("JoinOperator " + alias + " row = " + SerDeUtils.getJSONString(row, inputObjInspectors[tag]));
{noformat}

I wonder whether we should commit something like that to each operator for debugging, since it makes it much easier to see how the rows are filtered, modified...
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089455#comment-14089455 ]

Chao commented on HIVE-7624:
----------------------------

Great! Thanks [~ruili]. I'll try this patch.
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089181#comment-14089181 ]

Rui Li commented on HIVE-7624:
------------------------------

This patch solves the reducesinkkey0 problem. Map work and reduce work finish successfully. However, no result is returned. I checked the log and found the second reduce work got nothing to process. Not sure what is missing here...

I quickly looked at the Tez code and found that it sets an output collector for each reduce sink (OperatorUtils.setChildrenCollector). I don't know if this is related, though.
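The symptom described above (the second reduce work receiving nothing) can be pictured with a toy model. This is only a sketch under loud assumptions: the class and method names are hypothetical, plain Java lists stand in for disk output and for an output collector, and none of it is Hive, Tez, or Spark code. It merely shows that a first stage whose output goes to a sink the next stage never reads leaves the next stage with no input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical toy model; NOT Hive/Tez/Spark code.
class CollectorSketch {
    // Toy reduce stage: sums its input and hands the result to the given sink.
    static void reduceStage(List<Integer> input, Consumer<Integer> sink) {
        int sum = 0;
        for (int v : input) {
            sum += v;
        }
        sink.accept(sum);
    }

    // Runs two chained reduce stages (R -> R).
    // wired == false: the first stage writes to a "file" the second stage never reads.
    // wired == true:  the first stage writes to a collector that feeds the second stage.
    static List<Integer> runTwoReducers(List<Integer> input, boolean wired) {
        List<Integer> file = new ArrayList<>();      // stands in for on-disk output
        List<Integer> collected = new ArrayList<>(); // stands in for a collector
        if (wired) {
            reduceStage(input, v -> collected.add(v));
        } else {
            reduceStage(input, v -> file.add(v));
        }
        List<Integer> finalOutput = new ArrayList<>();
        if (!collected.isEmpty()) {
            // The second stage only runs on what the collector received.
            reduceStage(collected, v -> finalOutput.add(v));
        }
        return finalOutput;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3);
        System.out.println(runTwoReducers(rows, false)); // second stage gets nothing
        System.out.println(runTwoReducers(rows, true));  // second stage sees the sum
    }
}
```

Whether the missing collector is the actual cause in the patch is left open in the comment above; the sketch only illustrates the kind of wiring problem being suspected.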
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089001#comment-14089001 ]

Rui Li commented on HIVE-7624:
------------------------------

Thanks very much [~csun]. After some debugging, I found this issue is caused by GenMapRedUtils.setKeyAndValueDescForTaskTree, which is called after we compile the task. In that method we always set the keyDesc of the leaf reduce work according to the root map work. I suppose this is both incorrect and redundant, because when a reduce work is created we already call GenSparkUtils.setupReduceSink to set the keyDesc. I removed this code and the exception is gone.

However, I hit another problem: no result is returned for the multi-MR query. (I cloned the jobConf and set a new plan path for the clone.)
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088993#comment-14088993 ]

Chao commented on HIVE-7624:
----------------------------

Hi [~ruili], I spent some time looking at this bug today. What I found is that, even with cloned JobConfs, {{Utilities.setBaseWork}} still creates the same {{planPath}} for different reduce plans. Therefore, only one reduce plan will be left. I think we might need to find some way to allow multiple reduce plan files to co-exist. Hope this helps.
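The collision described above can be sketched in isolation. The following is a minimal illustration, assuming an in-memory map as a stand-in for the file system; the class name, the `work-N` directory scheme, and the `reduce.xml` file name are hypothetical choices for this sketch and are not taken from {{Utilities.setBaseWork}} or from the eventual patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the map stands in for a file system (path -> plan contents).
class PlanPathSketch {
    // Writes each reduce plan to "<dir>/reduce.xml".
    // uniquePerWork == false: every plan uses the same directory, so later plans
    //                         overwrite earlier ones.
    // uniquePerWork == true:  each plan gets its own (illustrative) sub-directory.
    static Map<String, String> writePlans(String baseDir, boolean uniquePerWork,
                                          String... reducePlans) {
        Map<String, String> files = new LinkedHashMap<>();
        for (int i = 0; i < reducePlans.length; i++) {
            String dir = uniquePerWork ? baseDir + "/work-" + i : baseDir;
            files.put(dir + "/reduce.xml", reducePlans[i]);
        }
        return files;
    }

    public static void main(String[] args) {
        // Same plan path for both reduce works: only the last plan survives.
        System.out.println(writePlans("/tmp/plan", false, "R1", "R2"));
        // Distinct plan paths: both plans co-exist.
        System.out.println(writePlans("/tmp/plan", true, "R1", "R2"));
    }
}
```

Giving each work its own path is only one possible way to let multiple reduce plan files co-exist; the comment above does not say which approach should be taken.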
Re: [jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
Another thing to watch is HiveConf's thread safety. I see it uses many static variables, but I'm not sure if this is the cause.

On Tue, Aug 5, 2014 at 10:39 PM, Chao (JIRA) wrote:

> [ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087281#comment-14087281 ]
>
> Chao commented on HIVE-7624:
> ----------------------------
>
> Hi [~brocknoland] and [~lirui], I tried cloning new jobConf today, and it
> somehow still gave me the same error. Not sure why.
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087281#comment-14087281 ]

Chao commented on HIVE-7624:
----------------------------

Hi [~brocknoland] and [~lirui], I tried cloning new jobConf today, and it somehow still gave me the same error. Not sure why.
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087272#comment-14087272 ]

Rui Li commented on HIVE-7624:
------------------------------

Thanks [~brocknoland] let me try this.
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087264#comment-14087264 ]

Brock Noland commented on HIVE-7624:
------------------------------------

[~lirui] In our sync-up you mentioned overwriting values in JobConf for Reduce work. While digging around, I found that we need to clone the jobConf for each MapWork or ReduceWork so they don't overwrite each other. We should do this in the SparkPlanGenerator.generate methods:

{noformat}
JobConf newJobConf = new JobConf(jobConf);
{noformat}
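The overwrite problem that cloning is meant to avoid can be shown with a self-contained sketch. Assumptions are stated loudly: `java.util.Properties` is used purely as a stand-in for JobConf, and the class name, key name, and paths are hypothetical. The point is only that two works sharing one mutable configuration object see each other's writes, while per-work copies do not.

```java
import java.util.Properties;

// Hypothetical sketch; java.util.Properties stands in for a job configuration
// and is NOT Hadoop's JobConf.
class ConfCloneSketch {
    // Illustrative key name, used only in this sketch.
    static final String PLAN_KEY = "plan.path";

    // Returns an independent copy so later writes do not touch the original.
    static Properties cloneConf(Properties base) {
        Properties copy = new Properties();
        copy.putAll(base);
        return copy;
    }

    // Both works share one instance: the second write overwrites the first.
    static String[] configureShared(Properties base) {
        Properties mapConf = base;
        Properties reduceConf = base;
        mapConf.setProperty(PLAN_KEY, "/tmp/plan/map");
        reduceConf.setProperty(PLAN_KEY, "/tmp/plan/reduce");
        return new String[] {
            mapConf.getProperty(PLAN_KEY), reduceConf.getProperty(PLAN_KEY)
        };
    }

    // Each work gets its own copy: the values stay separate.
    static String[] configureCloned(Properties base) {
        Properties mapConf = cloneConf(base);
        Properties reduceConf = cloneConf(base);
        mapConf.setProperty(PLAN_KEY, "/tmp/plan/map");
        reduceConf.setProperty(PLAN_KEY, "/tmp/plan/reduce");
        return new String[] {
            mapConf.getProperty(PLAN_KEY), reduceConf.getProperty(PLAN_KEY)
        };
    }

    public static void main(String[] args) {
        // prints "/tmp/plan/reduce, /tmp/plan/reduce"
        System.out.println(String.join(", ", configureShared(new Properties())));
        // prints "/tmp/plan/map, /tmp/plan/reduce"
        System.out.println(String.join(", ", configureCloned(new Properties())));
    }
}
```

As later comments in this thread note, cloning alone did not resolve the reported error, so this sketch illustrates the suggestion rather than a complete fix.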