[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-18 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101680#comment-14101680
 ] 

Rui Li commented on HIVE-7624:
--

[~brocknoland] Got it, thanks!

> Reduce operator initialization failed when running multiple MR query on spark
> -
>
> Key: HIVE-7624
> URL: https://issues.apache.org/jira/browse/HIVE-7624
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Rui Li
> Assignee: Rui Li
> Fix For: spark-branch
>
> Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
> HIVE-7624.4-spark.patch, HIVE-7624.5-spark.patch, HIVE-7624.6-spark.patch, 
> HIVE-7624.7-spark.patch, HIVE-7624.patch
>
>
> The following error occurs when I try to run a query with multiple reduce 
> works (M->R->R):
> {quote}
> 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
> java.lang.RuntimeException: Reduce operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
>     at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
>     at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
>     at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>     at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     at org.apache.spark.scheduler.Task.run(Task.scala:54)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0]
>     at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
>     at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
>     …
> {quote}
> I suspect we're applying the reduce function in the wrong order.
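
To make the suspicion in the description concrete, here is a minimal sketch of how initializing chained stages in the wrong order can surface as a "cannot find field" error. It is not Hive or Spark code: the `Stage` class, the `findField`, `initChain` and `describe` helpers, and the schemas are all hypothetical, and only the field names are borrowed from the error message. The assumption is simply that each stage looks up its expected input fields in whatever the previous stage emitted.

```java
import java.util.Arrays;
import java.util.List;

public class ReduceOrderSketch {

    /** Hypothetical stage: the fields it looks up at init time and the fields it emits. */
    static final class Stage {
        final List<String> expectedInput;
        final List<String> output;

        Stage(List<String> expectedInput, List<String> output) {
            this.expectedInput = expectedInput;
            this.output = output;
        }
    }

    // Invented schemas for illustration; they do not come from a real query plan.
    static final List<String> SOURCE = Arrays.asList("_col0");
    static final Stage FIRST =
        new Stage(Arrays.asList("_col0"), Arrays.asList("reducesinkkey0"));
    static final Stage SECOND =
        new Stage(Arrays.asList("reducesinkkey0"), Arrays.asList("_col1"));

    /** Looks a field up by name and fails if the incoming row does not carry it. */
    static int findField(List<String> fields, String wanted) {
        int index = fields.indexOf(wanted);
        if (index < 0) {
            throw new RuntimeException("cannot find field " + wanted + " from " + fields);
        }
        return index;
    }

    /** Initializes each stage against the previous stage's output; returns the final schema. */
    static List<String> initChain(List<String> source, List<Stage> stages) {
        List<String> current = source;
        for (Stage stage : stages) {
            for (String field : stage.expectedInput) {
                findField(current, field);
            }
            current = stage.output;
        }
        return current;
    }

    /** Runs initChain and reports either the final schema or the failure message. */
    static String describe(List<String> source, List<Stage> stages) {
        try {
            return "ok: " + initChain(source, stages);
        } catch (RuntimeException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Stages in the intended order: every lookup succeeds.
        System.out.println(describe(SOURCE, Arrays.asList(FIRST, SECOND)));
        // Stages swapped: the second stage is initialized against the source schema.
        System.out.println(describe(SOURCE, Arrays.asList(SECOND, FIRST)));
    }
}
```

In the swapped case the second stage asks for `reducesinkkey0` while the incoming schema only has `_col0`, which is the same shape of failure as the "Caused by" line above. Whether this is the actual root cause in HIVE-7624 is not established by the sketch.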



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-18 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100769#comment-14100769
 ] 

Brock Noland commented on HIVE-7624:


Hi Rui,

Hive generally follows one commit = one jira so I moved your patch over to 
HIVE-7766 and committed it. Thank you!!



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100358#comment-14100358
 ] 

Hive QA commented on HIVE-7624:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662431/HIVE-7624.7-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5915 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/54/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/54/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-54/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662431



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-11 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093796#comment-14093796
 ] 

Rui Li commented on HIVE-7624:
--

Thanks for the review :-)



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-11 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093774#comment-14093774
 ] 

Szehon Ho commented on HIVE-7624:
-

udaf_context_ngrams is a strange failure that I have never seen before, but I 
doubt it's related to this patch.

Will commit this shortly.



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093770#comment-14093770
 ] 

Hive QA commented on HIVE-7624:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12661110/HIVE-7624.6-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5844 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_context_ngrams
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/32/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/32/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-32/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12661110



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-11 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093638#comment-14093638
 ] 

Szehon Ho commented on HIVE-7624:
-

Oh I thought the last version on RB was already rebased.  +1 pending test for 
latest version.



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-11 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093618#comment-14093618
 ] 

Rui Li commented on HIVE-7624:
--

Hi [~brocknoland], [~szehon] I'll rebase the patch.



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-11 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093614#comment-14093614
 ] 

Szehon Ho commented on HIVE-7624:
-

+1 for latest version



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-11 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092900#comment-14092900
 ] 

Brock Noland commented on HIVE-7624:


Nice work!!

bq. The patch does not appear to apply with p0, p1, or p2

Looks like the patch needs to be rebased.



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092606#comment-14092606
 ] 

Hive QA commented on HIVE-7624:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660957/HIVE-7624.5-spark.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/28/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/28/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-28/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-SPARK-Build-28/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-spark-source ]]
+ [[ ! -d apache-svn-spark-source/.svn ]]
+ [[ ! -d apache-svn-spark-source ]]
+ cd apache-svn-spark-source
+ svn revert -R .
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java'
++ svn status --no-ignore
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
itests/target itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-unit/target itests/custom-serde/target itests/util/target 
hcatalog/target hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target 
hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hwi/target 
common/target common/src/gen contrib/target service/target serde/target 
beeline/target cli/target odbc/target ql/dependency-reduced-pom.xml ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1617233.

At revision 1617233.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660957

> Reduce operator initialization failed when running multiple MR query on spark
> -
>
> Key: HIVE-7624
> URL: https://issues.apache.org/jira/browse/HIVE-7624
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
> HIVE-7624.4-spark.patch, HIVE-7624.5-spark.patch, HIVE-7624.patch
>
>

[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-08 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091198#comment-14091198
 ] 

Szehon Ho commented on HIVE-7624:
-

Hi Li Rui, I think the patch looks reasonable. I just left a comment and a 
question on the RB. Thanks.

> Reduce operator initialization failed when running multiple MR query on spark
> -
>
> Key: HIVE-7624
> URL: https://issues.apache.org/jira/browse/HIVE-7624
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
> HIVE-7624.4-spark.patch, HIVE-7624.patch
>
>


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090689#comment-14090689
 ] 

Hive QA commented on HIVE-7624:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660590/HIVE-7624.4-spark.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5828 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/24/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/24/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-24/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660590

> Reduce operator initialization failed when running multiple MR query on spark
> -
>
> Key: HIVE-7624
> URL: https://issues.apache.org/jira/browse/HIVE-7624
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
> HIVE-7624.4-spark.patch, HIVE-7624.patch
>
>


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090525#comment-14090525
 ] 

Hive QA commented on HIVE-7624:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660582/HIVE-7624.3-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5843 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/23/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/23/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-23/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660582

> Reduce operator initialization failed when running multiple MR query on spark
> -
>
> Key: HIVE-7624
> URL: https://issues.apache.org/jira/browse/HIVE-7624
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
> HIVE-7624.4-spark.patch, HIVE-7624.patch
>
>


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-08 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090509#comment-14090509
 ] 

Rui Li commented on HIVE-7624:
--

One of the changes may bypass HIVE-7597, so I removed it.

> Reduce operator initialization failed when running multiple MR query on spark
> -
>
> Key: HIVE-7624
> URL: https://issues.apache.org/jira/browse/HIVE-7624
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
> HIVE-7624.4-spark.patch, HIVE-7624.patch
>
>


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-08 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090452#comment-14090452
 ] 

Rui Li commented on HIVE-7624:
--

I finally found the cause: we don't set an output collector for the ReduceSink 
(RS) operator in ExecReducer. That is natural for MR, where ExecReducer 
shouldn't contain an RS, but we have to do it for Spark. The added code just 
looks for the RS and sets a collector for it, so there shouldn't be any 
regression.
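
To illustrate the idea only (this is not the code from the patch), here is a 
minimal Java sketch of walking an operator tree and setting a collector on 
every reduce sink it finds. The names Op, Collector and wireReduceSinks are 
hypothetical stand-ins, not Hive's actual classes or methods.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for an operator tree; these are NOT Hive's real classes. */
public class CollectorWiringSketch {

    /** Receives the rows emitted by a terminal operator. */
    interface Collector {
        void collect(String row);
    }

    /** Minimal operator node: a name, child operators, and an optional collector. */
    static class Op {
        final String name;
        final boolean isReduceSink;
        final List<Op> children = new ArrayList<>();
        Collector collector; // stays null unless something wires it

        Op(String name, boolean isReduceSink) {
            this.name = name;
            this.isReduceSink = isReduceSink;
        }

        Op addChild(Op child) {
            children.add(child);
            return this;
        }
    }

    /** Depth-first walk: sets the collector on every reduce sink, returns how many were wired. */
    static int wireReduceSinks(Op root, Collector out) {
        int wired = 0;
        if (root.isReduceSink) {
            root.collector = out;
            wired++;
        }
        for (Op child : root.children) {
            wired += wireReduceSinks(child, out);
        }
        return wired;
    }

    public static void main(String[] args) {
        // Reducer-side tree GBY -> SEL -> RS, as could happen in an M->R->R plan.
        Op rs = new Op("RS", true);
        Op root = new Op("GBY", false).addChild(new Op("SEL", false).addChild(rs));

        List<String> rows = new ArrayList<>();
        int wired = wireReduceSinks(root, rows::add);

        rs.collector.collect("row-1"); // the reduce sink can now emit rows
        System.out.println(wired + " reduce sink(s) wired, rows=" + rows);
    }
}
```

A tree without any reduce sink is left untouched by the walk, which is the 
reason given above for not expecting a regression on the plain MR path.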

> Reduce operator initialization failed when running multiple MR query on spark
> -
>
> Key: HIVE-7624
> URL: https://issues.apache.org/jira/browse/HIVE-7624
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
> HIVE-7624.patch
>
>


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-07 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090261#comment-14090261
 ] 

Rui Li commented on HIVE-7624:
--

Hi [~csun], I updated the patch based on the latest code.
But the group by + order by query still returns nothing for me (with HIVE-7492 
and HIVE-7652 in place).

I'm not sure whether this is a separate issue or a side effect of cloning the 
job conf. Any ideas?

> Reduce operator initialization failed when running multiple MR query on spark
> -
>
> Key: HIVE-7624
> URL: https://issues.apache.org/jira/browse/HIVE-7624
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-7624.2-spark.patch, HIVE-7624.patch
>
>


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-07 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090152#comment-14090152
 ] 

Rui Li commented on HIVE-7624:
--

[~xuefuz] Currently the second R does end with a ReduceSink.

[~csun] Sorry, it's been a while since I last synced with upstream. I'll 
rebase my branch and run the test again.

> Reduce operator initialization failed when running multiple MR query on spark
> -
>
> Key: HIVE-7624
> URL: https://issues.apache.org/jira/browse/HIVE-7624
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-7624.patch
>
>


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-07 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090010#comment-14090010
 ] 

Chao commented on HIVE-7624:


Sorry, that should be "order by" in my last comment.

> Reduce operator initialization failed when running multiple MR query on spark
> -
>
> Key: HIVE-7624
> URL: https://issues.apache.org/jira/browse/HIVE-7624
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-7624.patch
>
>


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-07 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090009#comment-14090009
 ] 

Chao commented on HIVE-7624:


With this patch and HIVE-7492 plus HIVE-7652, group by + reduce by produces 
the correct result on my machine.
I'm not sure why. I'm also not sure whether there are any further issues.

> Reduce operator initialization failed when running multiple MR query on spark
> -
>
> Key: HIVE-7624
> URL: https://issues.apache.org/jira/browse/HIVE-7624
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-7624.patch
>
>
> The following error occurs when I try to run a query with multiple reduce 
> works (M->R->R):
> {quote}
> 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
> java.lang.RuntimeException: Reduce operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
>     at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
>     at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
>     at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>     at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     at org.apache.spark.scheduler.Task.run(Task.scala:54)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0]
>     at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
>     at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
> …
> {quote}
> I suspect we're applying the reduce function in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089922#comment-14089922
 ] 

Xuefu Zhang commented on HIVE-7624:
---

{quote}
However, no result is returned. I checked the log and found the second reduce 
work got nothing to process. Not sure what is missing here...
{quote}

I think the problem is caused by the FileSinkOperator in the reduce-side 
operator tree. In the MR world, that tree probably always ends with a 
FileSinkOperator (which writes to disk). If we have MRR, then the first R 
should not write to disk. Instead, it should end with a ReduceSinkOperator, 
which outputs the result to the SparkCollector. The result RDD is based on the 
SparkCollector, so it can be picked up by the second R.

I think we need to modify the operator tree a bit for this to work correctly. 
Please follow Tez's way to do this.
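
To make the point above concrete, here is a toy Java model of why the second 
reduce work sees no input when the first reduce stage ends in a file sink, and 
does see input when it ends in a reduce sink that feeds a collector. This is 
not Hive code: the class name, the method name and the two lists are made up 
for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these names are hypothetical and do not correspond to Hive classes.
class MrrSinkDemo {

    // Runs the "first reduce" over its rows and returns what the collector
    // received, i.e. what a second reduce stage would get as input.
    static List<String> firstReduceOutput(List<String> rows, boolean endsWithReduceSink) {
        List<String> disk = new ArrayList<>();       // stands in for files written by a file sink
        List<String> collector = new ArrayList<>();  // stands in for the in-memory collector
        for (String row : rows) {
            if (endsWithReduceSink) {
                collector.add(row);  // forwarded to the next stage
            } else {
                disk.add(row);       // written out; the next stage never sees it
            }
        }
        return collector;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("k1\t3", "k2\t5");
        System.out.println("file sink   -> " + firstReduceOutput(rows, false));
        System.out.println("reduce sink -> " + firstReduceOutput(rows, true));
    }
}
```

With the file sink the returned list is empty, which matches the symptom 
quoted above (the second reduce work had nothing to process).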



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-07 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089555#comment-14089555
 ] 

Chao commented on HIVE-7624:


Hi [~ruili], I think this patch overlaps a little bit with HIVE-7597, on 
{{GenMapRedUtils}}. I can't apply the patch due to the conflict.



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-07 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089459#comment-14089459
 ] 

Brock Noland commented on HIVE-7624:


While debugging, I used the code below:

{noformat}
System.err.println("JoinOperator " + alias + " row = " + 
SerDeUtils.getJSONString(row, inputObjInspectors[tag]));
{noformat}

I wonder whether we should commit something like that to each operator for 
debugging, since it makes it much easier to see how the rows are filtered, 
modified, and so on.



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-07 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089455#comment-14089455
 ] 

Chao commented on HIVE-7624:


Great! Thanks [~ruili]. I'll try this patch.



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-07 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089181#comment-14089181
 ] 

Rui Li commented on HIVE-7624:
--

This patch solves the reducesinkkey0 problem. Map work and reduce work finish 
successfully.
However, no result is returned. I checked the log and found the second reduce 
work got nothing to process. Not sure what is missing here...

I took a quick look at the Tez code and found that it sets an output collector 
for each reduce sink (OperatorUtils.setChildrenCollector). I don't know whether 
this is related, though.



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-07 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089001#comment-14089001
 ] 

Rui Li commented on HIVE-7624:
--

Thanks very much [~csun]. After some debugging, I found that this issue is 
caused by GenMapRedUtils.setKeyAndValueDescForTaskTree, which is called after 
we have compiled the task. In that method we always set the keyDesc of the leaf 
reduce work according to the root map work. I suppose this is both incorrect 
and redundant, because when a reduce work is created we already call 
GenSparkUtils.setupReduceSink to set the keyDesc. I removed this code and the 
exception is gone.

However, I hit another problem: no result is returned for the multi-MR query. 
(I cloned the jobConf and set a new plan path for the clone.)
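
The error text in the issue ("cannot find field reducesinkkey0 from 
[0:_col0]") has the shape of a by-name lookup against a struct whose field 
names differ from the one requested. The following toy Java sketch reproduces 
only that shape. It assumes nothing about Hive's internals beyond the message 
quoted in the issue, and findField is a made-up helper, not a Hive API.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: a by-name field lookup, loosely mimicking the failing call in the trace.
class KeyDescDemo {

    // Returns the index of fieldName in fields, or throws with a message
    // formatted like the one in the report.
    static int findField(String fieldName, List<String> fields) {
        int idx = fields.indexOf(fieldName);
        if (idx < 0) {
            StringBuilder sb = new StringBuilder("[");
            for (int i = 0; i < fields.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(i).append(':').append(fields.get(i));
            }
            sb.append(']');
            throw new RuntimeException("cannot find field " + fieldName + " from " + sb);
        }
        return idx;
    }

    public static void main(String[] args) {
        // Key description that matches what the reducer looks up: the lookup succeeds.
        System.out.println(findField("reducesinkkey0", Arrays.asList("reducesinkkey0")));
        // Key description taken from a different work: the lookup fails.
        try {
            findField("reducesinkkey0", Arrays.asList("_col0"));
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```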



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-07 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088993#comment-14088993
 ] 

Chao commented on HIVE-7624:


Hi [~ruili], I spent some time looking at this bug today. What I found is 
that, even with cloned JobConfs, {{Utilities.setBaseWork}} still creates the 
same {{planPath}} for different reduce plans, so only one reduce plan is left. 
I think we might need to find some way to allow multiple reduce plan files to 
coexist.
Hope this helps.
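
A toy Java sketch of the collision described above, and of one possible way 
around it. A map from path to plan stands in for plan files on disk. The path 
layout and the work names are hypothetical and are not taken from Hive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: a map from path to plan stands in for plan files on disk.
class PlanPathDemo {

    // Writes one plan per reduce work and returns the resulting "files".
    // The path layout is hypothetical and chosen only for illustration.
    static Map<String, String> writePlans(String[] reduceWorks, boolean pathPerWork) {
        Map<String, String> files = new LinkedHashMap<>();
        for (String work : reduceWorks) {
            String path = pathPerWork
                    ? "plans/" + work + "/reduce.xml"  // one file per work
                    : "plans/reduce.xml";              // same file for every work
            files.put(path, "plan of " + work);        // a later write replaces an earlier one
        }
        return files;
    }

    public static void main(String[] args) {
        String[] works = {"reducer2", "reducer3"};
        System.out.println(writePlans(works, false));
        System.out.println(writePlans(works, true));
    }
}
```

With a shared path only the last plan survives. Making the path depend on the 
work is one way to let several reduce plans coexist; whether that is the right 
fix for Hive is a separate question.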



Re: [jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-06 Thread Xuefu Zhang
Another thing to watch is HiveConf's thread safety. I see it uses many
static variables, but I'm not sure if this is the cause.


On Tue, Aug 5, 2014 at 10:39 PM, Chao (JIRA) wrote:

>
> [
> https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087281#comment-14087281
> ]
>
> Chao commented on HIVE-7624:
> 
>
> Hi [~brocknoland] and [~lirui], I tried cloning new jobConf today, and it
> somehow still gave me the same error. Not sure why.


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-05 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087281#comment-14087281
 ] 

Chao commented on HIVE-7624:


Hi [~brocknoland] and [~lirui], I tried cloning a new jobConf today, and it 
somehow still gave me the same error. Not sure why.



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-05 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087272#comment-14087272
 ] 

Rui Li commented on HIVE-7624:
--

Thanks [~brocknoland], let me try this.



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087264#comment-14087264
 ] 

Brock Noland commented on HIVE-7624:


[~lirui] In our sync-up you mentioned overwriting values in JobConf for Reduce 
work. While digging around, I found that we need to clone the jobConf for each 
MapWork or ReduceWork so that they don't overwrite each other. We should do 
this in the SparkPlanGenerator.generate methods:
{noformat}
JobConf newJobConf = new JobConf(jobConf);
{noformat}
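
A toy Java sketch of why a per-work copy matters. To keep it runnable without 
Hadoop on the classpath, a HashMap stands in for JobConf and its copy 
constructor stands in for new JobConf(jobConf). The key name and the work names 
are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a HashMap stands in for JobConf so the sketch runs without Hadoop.
class ConfCloneDemo {

    // Hypothetical key, used only for this illustration.
    static final String KEY = "demo.plan.path";

    // Every work writes into the same configuration object.
    static List<String> sharedConf(List<String> works) {
        Map<String, String> conf = new HashMap<>();
        List<Map<String, String>> perWork = new ArrayList<>();
        for (String work : works) {
            conf.put(KEY, "plans/" + work);
            perWork.add(conf);  // same object each time
        }
        return values(perWork);
    }

    // Every work gets its own copy, analogous to new JobConf(jobConf).
    static List<String> clonedConf(List<String> works) {
        Map<String, String> conf = new HashMap<>();
        List<Map<String, String>> perWork = new ArrayList<>();
        for (String work : works) {
            Map<String, String> copy = new HashMap<>(conf);
            copy.put(KEY, "plans/" + work);
            perWork.add(copy);
        }
        return values(perWork);
    }

    // Reads back what each work would see once all works have been set up.
    private static List<String> values(List<Map<String, String>> confs) {
        List<String> out = new ArrayList<>();
        for (Map<String, String> c : confs) {
            out.add(c.get(KEY));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> works = Arrays.asList("map1", "reducer2", "reducer3");
        System.out.println(sharedConf(works));
        System.out.println(clonedConf(works));
    }
}
```

With the shared object every work ends up seeing the value written last. With 
one copy per work each keeps its own value.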


