[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245672#comment-14245672 ]

Chao commented on HIVE-8911:

Yes, I think this needs documentation. Thanks [~leftylev] for reminding me!

> Enable mapjoin hints [Spark Branch]
> ---
>
> Key: HIVE-8911
> URL: https://issues.apache.org/jira/browse/HIVE-8911
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Affects Versions: spark-branch
> Reporter: Szehon Ho
> Assignee: Chao
> Labels: TODOC-SPARK
> Fix For: spark-branch
>
> Attachments: HIVE-8911.1-spark.patch, HIVE-8911.2-spark.patch,
> HIVE-8911.3-spark.patch, HIVE-8911.4-spark.patch, HIVE-8911.5-spark.patch,
> HIVE-8911.6-spark.patch
>
>
> Currently the big table selection in a mapjoin is based on stats.
> We should also enable the big-table selection based on hints. See class
> MapJoinProcessor. This is a logical-optimizer class, so we should be able to
> re-use this without too many changes to hook up with SparkMapJoinResolver.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
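The issue description contrasts two ways of picking the big (streamed) table of a mapjoin: by stats, or by a hint. A minimal Java sketch of that contrast follows. It is not Hive code: `BigTableChooser` and its methods are hypothetical names invented for illustration, and it assumes the usual reading of a mapjoin hint, where the hinted aliases are the small tables held in memory and the one alias left over is the big table.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Hive.
final class BigTableChooser {

    /** Stats-based choice: the alias with the largest estimated size is the big table. */
    static String chooseByStats(Map<String, Long> sizeByAlias) {
        String big = null;
        long max = -1L;
        for (Map.Entry<String, Long> e : sizeByAlias.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                big = e.getKey();
            }
        }
        return big;
    }

    /**
     * Hint-aware choice: aliases named in the hint are treated as small tables,
     * so the single alias that is not hinted becomes the big table. When there
     * is no hint, or the hint does not leave exactly one alias, fall back to stats.
     */
    static String choose(Map<String, Long> sizeByAlias, Set<String> hintedSmallAliases) {
        if (hintedSmallAliases != null && !hintedSmallAliases.isEmpty()) {
            String candidate = null;
            int remaining = 0;
            for (String alias : sizeByAlias.keySet()) {
                if (!hintedSmallAliases.contains(alias)) {
                    candidate = alias;
                    remaining++;
                }
            }
            if (remaining == 1) {
                return candidate;
            }
        }
        return chooseByStats(sizeByAlias);
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("a", 10L);
        sizes.put("b", 1000L);
        // No hint: stats decide.
        System.out.println(choose(sizes, null));
        // Hint names "b" as the small table, overriding what stats would pick.
        System.out.println(choose(sizes, new HashSet<>(Arrays.asList("b"))));
    }
}
```

The point of the sketch is only that a hint can override the stats-based decision; how the real optimizer validates hints is not covered here.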
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245668#comment-14245668 ]

Lefty Leverenz commented on HIVE-8911:
--

Does this need documentation? If so, please add the TODOC-SPARK label.
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245401#comment-14245401 ]

Chao commented on HIVE-8911:

bucketmapjoin10 failed because of an IndexOutOfBoundsException:

{noformat}
java.lang.RuntimeException: Hive Runtime Error while closing operators
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:207)
	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
	at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
	at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
	at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
	at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:56)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
	at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:87)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
	... 15 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
	at java.util.ArrayList.rangeCheck(ArrayList.java:635)
	at java.util.ArrayList.get(ArrayList.java:411)
	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.first(MapJoinEagerRowContainer.java:70)
	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.write(MapJoinEagerRowContainer.java:149)
	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.persist(MapJoinTableContainerSerDe.java:170)
	at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:142)
	at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:81)
	... 20 more
{noformat}

This run may not have used the patch from HIVE-8982: the patch was committed at 07:33am, while the run ended at 07:49am, so the run most likely started before the commit.
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245386#comment-14245386 ]

Xuefu Zhang commented on HIVE-8911:
---

ppd_join4 and smb_mapjoin_25 are not related, but bucketmapjoin10 might be.
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245259#comment-14245259 ]

Hive QA commented on HIVE-8911:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12686982/HIVE-8911.6-spark.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 7233 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_25
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/534/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/534/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-534/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12686982 - PreCommit-HIVE-SPARK-Build
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244486#comment-14244486 ]

Chao commented on HIVE-8911:

This is quite strange - it is just the same patch as v3 plus a few golden files.
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244136#comment-14244136 ]

Xuefu Zhang commented on HIVE-8911:
---

The above test ran for 4h40m. I aborted it manually. Not sure why it was hanging.
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244130#comment-14244130 ]

Hive QA commented on HIVE-8911:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12686769/HIVE-8911.4-spark.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7218 tests executed
*Failed tests:*
{noformat}
TestCliDriver-ppd_join4.q-describe_xpath.q-union27.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/523/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/523/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-523/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12686769 - PreCommit-HIVE-SPARK-Build
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243467#comment-14243467 ]

Hive QA commented on HIVE-8911:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12686697/HIVE-8911.3-spark.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 7233 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_outer_join5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_25
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/520/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/520/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-520/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12686697 - PreCommit-HIVE-SPARK-Build
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243310#comment-14243310 ]

Chao commented on HIVE-8911:

Yes, moving all join optimizations into one place sounds good to me. But the task seems non-trivial. Let's do that as part of future work.
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243285#comment-14243285 ]

Szehon Ho commented on HIVE-8911:
-

This looks OK to me for now. We might consider cleaning this up as part of the SMB hint JIRA, so that we have Spark versions of MapJoinProcessor, BucketMapJoinOptimizer, and SortedMergeBucketMapjoinProc and run them later in our own SparkCompiler. Then here, we would just disable MapJoinProcessor.
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243269#comment-14243269 ]

Chao commented on HIVE-8911:

BTW, I saw {{mapjoin_filter_onejoin.q.out}} and {{mapjoin_tester.q.out}} in our spark folder, but the corresponding qfiles were not found.
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243042#comment-14243042 ]

Chao commented on HIVE-8911:

[~szehon] Thanks for the review. Do you mean creating a class like {{SparkMapJoinProcess}} that overrides {{generateMapJoinOperator}}?
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243034#comment-14243034 ]

Szehon Ho commented on HIVE-8911:
-

Hi Chao, the patch looks great. My only preference would be to do it another way that doesn't affect the main class. For example, can we make a subclass and override some methods?
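The subclass-and-override suggestion can be sketched in plain Java as below. This is a rough illustration under stated assumptions, not the patch: `BaseMapJoinProcessor`, `SparkFlavoredMapJoinProcessor`, `transform` and `generateMapJoin` are hypothetical stand-ins, and the real Hive classes and method signatures are not reproduced here.

```java
// Hypothetical stand-ins; not the real Hive classes or signatures.
class BaseMapJoinProcessor {

    /** Shared flow stays in the base class and is left untouched. */
    final String transform(String joinAlias) {
        return "plan(" + generateMapJoin(joinAlias) + ")";
    }

    /** Hook that an engine-specific subclass may override. */
    protected String generateMapJoin(String joinAlias) {
        return "MapJoin[" + joinAlias + "]";
    }
}

// Engine-specific behaviour lives in the subclass, so the base class needs no Spark branches.
class SparkFlavoredMapJoinProcessor extends BaseMapJoinProcessor {

    @Override
    protected String generateMapJoin(String joinAlias) {
        return "SparkMapJoin[" + joinAlias + "]";
    }
}

final class OverrideDemo {
    public static void main(String[] args) {
        // Same entry point, different hook implementation.
        System.out.println(new BaseMapJoinProcessor().transform("a"));
        System.out.println(new SparkFlavoredMapJoinProcessor().transform("a"));
    }
}
```

The design choice being illustrated is that callers of the base class see no change, while the Spark path swaps in its own behaviour at one overridable point.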
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240622#comment-14240622 ]

Chao commented on HIVE-8911:

OK, I found it. The test failures for bucketmapjoin8 and bucketmapjoin11 are caused by the issue in HIVE-8982:

{noformat}
java.lang.RuntimeException: Hive Runtime Error while closing operators
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:207)
	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
	at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
	at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
	at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
	at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:56)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
	at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:87)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
	... 15 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
	at java.util.ArrayList.rangeCheck(ArrayList.java:635)
	at java.util.ArrayList.get(ArrayList.java:411)
	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.first(MapJoinEagerRowContainer.java:70)
	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.write(MapJoinEagerRowContainer.java:149)
	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.persist(MapJoinTableContainerSerDe.java:170)
	at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:142)
	at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:81)
	... 20 more
{noformat}

I need to get this fixed soon...
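The innermost frames of the trace say only that an ArrayList holding one element was read at index 1. A minimal Java sketch of that failure mode is given below. It does not reproduce the Hive bug or the fix in HIVE-8982; `RowAccessDemo` and its methods are hypothetical names, and the bounds-checked variant is shown purely for contrast.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the failure mode; unrelated to Hive's actual code.
final class RowAccessDemo {

    /** Unchecked read: throws IndexOutOfBoundsException when index >= size. */
    static Object uncheckedGet(List<Object> row, int index) {
        return row.get(index);
    }

    /** Checked read: returns null instead of throwing when the index is out of range. */
    static Object checkedGet(List<Object> row, int index) {
        return (index >= 0 && index < row.size()) ? row.get(index) : null;
    }

    public static void main(String[] args) {
        List<Object> row = new ArrayList<>();
        row.add("only-value"); // size is 1, so the only valid index is 0

        System.out.println(checkedGet(row, 1));
        try {
            uncheckedGet(row, 1);
        } catch (IndexOutOfBoundsException e) {
            // The exact message text differs between Java versions.
            System.out.println("caught: " + e.getClass().getSimpleName());
        }
    }
}
```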
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240616#comment-14240616 ]

Chao commented on HIVE-8911:

Hmm, I don't know why bucketmapjoin8 and bucketmapjoin11 failed. These two passed on my machine.
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240604#comment-14240604 ]

Hive QA commented on HIVE-8911:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12686178/HIVE-8911.2-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7207 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/507/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/507/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-507/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12686178 - PreCommit-HIVE-SPARK-Build
[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240117#comment-14240117 ]

Hive QA commented on HIVE-8911:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12686070/HIVE-8911.1-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7207 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_filter_on_outerjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_test_outer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_decimal_aggregate
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/502/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/502/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-502/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12686070 - PreCommit-HIVE-SPARK-Build