[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-13 Thread Chao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245672#comment-14245672
 ] 

Chao commented on HIVE-8911:


Yes, I think this needs doc. Thanks [~leftylev] for reminding me!

> Enable mapjoin hints [Spark Branch]
> ---
>
> Key: HIVE-8911
> URL: https://issues.apache.org/jira/browse/HIVE-8911
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Affects Versions: spark-branch
> Reporter: Szehon Ho
> Assignee: Chao
> Labels: TODOC-SPARK
> Fix For: spark-branch
>
> Attachments: HIVE-8911.1-spark.patch, HIVE-8911.2-spark.patch, 
> HIVE-8911.3-spark.patch, HIVE-8911.4-spark.patch, HIVE-8911.5-spark.patch, 
> HIVE-8911.6-spark.patch
>
>
> Currently the big table selection in a mapjoin is based on stats.
> We should also enable the big-table selection based on hints.  See class 
> MapJoinProcessor.  This is a logical-optimizer class, so we should be able to 
> re-use this without too many changes to hook up with SparkMapJoinResolver.
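
To illustrate the description above, here is a minimal sketch in plain Java of how hint-based selection could take precedence over stats-based selection. It is not Hive code: the class name, method name, and parameters are all hypothetical. It only assumes that a MAPJOIN hint names the small tables to be held in memory, so the big table is the one the hint leaves out.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Hive: picks the position of the big table.
class BigTableSelectionSketch {
    /**
     * @param aliases     join inputs in join order (must be non-empty)
     * @param sizes       estimated size per alias, e.g. from table stats
     * @param hintedSmall aliases named in a mapjoin hint (the in-memory side)
     */
    static int pickBigTable(List<String> aliases,
                            Map<String, Long> sizes,
                            Set<String> hintedSmall) {
        if (aliases.isEmpty()) {
            throw new IllegalArgumentException("no join inputs");
        }
        if (!hintedSmall.isEmpty()) {
            // Hint-based: the first alias that is NOT hinted is the big table.
            for (int i = 0; i < aliases.size(); i++) {
                if (!hintedSmall.contains(aliases.get(i))) {
                    return i;
                }
            }
            throw new IllegalArgumentException("hint names every table");
        }
        // Stats-based fallback: the largest estimated input is the big table.
        int best = 0;
        for (int i = 1; i < aliases.size(); i++) {
            long candidate = sizes.getOrDefault(aliases.get(i), 0L);
            long current = sizes.getOrDefault(aliases.get(best), 0L);
            if (candidate > current) {
                best = i;
            }
        }
        return best;
    }
}
```

With inputs a (small) and b (large), the stats path picks b, while a hint naming b as the in-memory table forces a to be the big table.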



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-13 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245668#comment-14245668
 ] 

Lefty Leverenz commented on HIVE-8911:
--

Does this need documentation?  If so, please add the TODOC-SPARK label.


[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-13 Thread Chao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245401#comment-14245401
 ] 

Chao commented on HIVE-8911:


bucketmapjoin10 failed because of an IndexOutOfBoundsException:

{noformat}
java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:207)
    at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
    at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
    at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
    at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
    at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:87)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
    ... 15 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.first(MapJoinEagerRowContainer.java:70)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.write(MapJoinEagerRowContainer.java:149)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.persist(MapJoinTableContainerSerDe.java:170)
    at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:142)
    at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:81)
    ... 20 more
{noformat}

This run may not have used the patch from HIVE-8982: it was committed at 
07:33am, while the run ended at 07:49am.
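
For reference, the innermost frames of the trace (ArrayList.get with "Index: 1, Size: 1") mean that an index equal to the list size was read. The sketch below reproduces only that condition in plain Java. It is not the Hive container code, the class and method names are hypothetical, and it makes no claim about what HIVE-8982 actually changed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a row container; not the Hive implementation.
class IndexEqualsSizeSketch {
    // Returns true if reading index 1 of a one-element list throws,
    // which is the same condition reported at the bottom of the trace.
    static boolean throwsWhenIndexEqualsSize() {
        List<Object> rows = new ArrayList<>();
        rows.add("only-row");          // size is now 1, valid indexes: 0
        try {
            rows.get(1);               // index == size, so out of bounds
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("threw: " + throwsWhenIndexEqualsSize());
    }
}
```

The exact exception message text differs between Java versions, so the sketch checks only the exception type.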


[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-13 Thread Xuefu Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245386#comment-14245386
 ] 

Xuefu Zhang commented on HIVE-8911:
---

ppd_join4 and smb_mapjoin_25 are not related, but bucketmapjoin10 might be.


[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-13 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245259#comment-14245259
 ] 

Hive QA commented on HIVE-8911:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12686982/HIVE-8911.6-spark.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 7233 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_25
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/534/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/534/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-534/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12686982 - PreCommit-HIVE-SPARK-Build


[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-12 Thread Chao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244486#comment-14244486
 ] 

Chao commented on HIVE-8911:


This is quite strange - it is just the same patch as v3 plus a few golden files.



[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-12 Thread Xuefu Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244136#comment-14244136
 ] 

Xuefu Zhang commented on HIVE-8911:
---

The above test ran for 4h40m. I aborted it manually. Not sure why it was 
hanging.


[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-12 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244130#comment-14244130
 ] 

Hive QA commented on HIVE-8911:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12686769/HIVE-8911.4-spark.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7218 tests executed
*Failed tests:*
{noformat}
TestCliDriver-ppd_join4.q-describe_xpath.q-union27.q-and-12-more - did not 
produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/523/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/523/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-523/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12686769 - PreCommit-HIVE-SPARK-Build


[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-11 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243467#comment-14243467
 ] 

Hive QA commented on HIVE-8911:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12686697/HIVE-8911.3-spark.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 7233 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_outer_join5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_25
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/520/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/520/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-520/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12686697 - PreCommit-HIVE-SPARK-Build


[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-11 Thread Chao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243310#comment-14243310
 ] 

Chao commented on HIVE-8911:


Yes, moving all join optimizations into one place sounds good to me. But the 
task seems non-trivial, so let's do that as part of future work.


[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-11 Thread Szehon Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243285#comment-14243285
 ] 

Szehon Ho commented on HIVE-8911:
-

This looks ok for now to me.  We might consider cleaning this up as part of SMB 
hint JIRA, such that we have Spark versions of MapJoinProcessor, 
BucketMapJoinOptimizer, and SortedMergeBucketMapjoinProc and we run them later 
in our own SparkCompiler.  Then here, we will just disable MapJoinProcessor.


[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-11 Thread Chao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243269#comment-14243269
 ] 

Chao commented on HIVE-8911:


BTW, I saw {{mapjoin_filter_on_outerjoin.q.out}} and {{mapjoin_test_outer.q.out}} in our 
spark folder, but the corresponding qfiles were not found.


[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-11 Thread Chao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243042#comment-14243042
 ] 

Chao commented on HIVE-8911:


[~szehon] Thanks for the review. Do you mean creating a class like 
{{SparkMapJoinProcessor}} that overrides {{generateMapJoinOperator}}?


[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-11 Thread Szehon Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243034#comment-14243034
 ] 

Szehon Ho commented on HIVE-8911:
-

Hi Chao, the patch looks great.  My only preference would be to do it another way 
that doesn't affect the main class. For example, can we make a subclass and 
override some methods?
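
As a rough illustration of the subclass-and-override idea, here is a minimal Java sketch. None of these classes are Hive classes: the names, the method signatures, and the returned strings are hypothetical and only show the shape of the approach, where the shared flow stays in the base class and the engine-specific step lives in a subclass.

```java
// Hypothetical base class standing in for a shared optimizer step.
class BaseJoinProcessorSketch {
    // Shared flow stays here and is not modified for the new engine.
    final String process(String joinName) {
        return joinName + " -> " + generateOperator();
    }

    // Hook that an engine-specific subclass may override.
    protected String generateOperator() {
        return "generic operator";
    }
}

// Hypothetical Spark-specific variant: only the hook changes.
class SparkJoinProcessorSketch extends BaseJoinProcessorSketch {
    @Override
    protected String generateOperator() {
        return "spark-specific operator";
    }
}
```

Callers that hold a BaseJoinProcessorSketch reference keep working unchanged, and the Spark path is selected simply by constructing the subclass.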


[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-09 Thread Chao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240622#comment-14240622
 ] 

Chao commented on HIVE-8911:


OK, I found it. The test failures for bucketmapjoin8 and bucketmapjoin11 are 
caused by the issue in HIVE-8982:

{noformat}
java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:207)
    at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
    at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
    at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
    at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
    at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:87)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
    ... 15 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.first(MapJoinEagerRowContainer.java:70)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.write(MapJoinEagerRowContainer.java:149)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.persist(MapJoinTableContainerSerDe.java:170)
    at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:142)
    at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:81)
    ... 20 more
{noformat}

I need to get this fixed soon...


[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-09 Thread Chao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240616#comment-14240616
 ] 

Chao commented on HIVE-8911:


Hmm, don't know why bucketmapjoin8 and bucketmapjoin11 failed. These two passed 
on my machine.


[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240604#comment-14240604
 ] 

Hive QA commented on HIVE-8911:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12686178/HIVE-8911.2-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7207 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/507/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/507/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-507/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12686178 - PreCommit-HIVE-SPARK-Build


[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2014-12-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240117#comment-14240117
 ] 

Hive QA commented on HIVE-8911:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12686070/HIVE-8911.1-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7207 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_filter_on_outerjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_test_outer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_decimal_aggregate
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/502/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/502/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-502/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12686070 - PreCommit-HIVE-SPARK-Build

