[jira] [Commented] (HIVE-10084) Improve common join performance [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959215#comment-15959215 ]

Xuefu Zhang commented on HIVE-10084:

Hi [~stakiar],

The conclusion came from a benchmark between Spark and Tez in Hive. However, a lot of things have changed, so I'm not sure if this still holds true. You can construct a query that invokes common join to see how it performs, and profile it if necessary.

I think the difference might come from Spark shuffle. We have recently changed the usage of spark shuffle, so it's unclear to me if there is anything to do before you actually benchmark it.

> Improve common join performance [Spark Branch]
> ----------------------------------------------
>
>                 Key: HIVE-10084
>                 URL: https://issues.apache.org/jira/browse/HIVE-10084
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>
> Benchmark shows that Hive on Spark shows some numbers which indicate that
> common join performance can be improved. This task is to investigate and fix
> the issue.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
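One possible way to follow the suggestion above (constructing a query that invokes common join) is sketched below. It is only an illustration, not the benchmark referred to in the comment. It assumes that setting hive.auto.convert.join to false keeps Hive from converting the join into a map join, so the query stays a common (shuffle) join, and that hive.execution.engine selects the engine. The helper name and the table and column names passed to it are hypothetical.

```python
def common_join_benchmark_script(left, right, key, engine="spark"):
    """Build a HiveQL script that runs a join with map-join conversion disabled.

    `left`, `right` and `key` are hypothetical table/column names supplied
    by the caller; nothing here comes from the original benchmark.
    """
    if engine not in {"mr", "tez", "spark"}:
        raise ValueError(f"unsupported execution engine: {engine}")

    join = (
        f"SELECT COUNT(*) FROM {left} a "
        f"JOIN {right} b ON a.{key} = b.{key}"
    )
    statements = [
        # Pick the engine to measure (spark here, tez or mr for comparison).
        f"SET hive.execution.engine={engine}",
        # Assumption: with auto conversion off, the join is planned as a
        # common (shuffle) join rather than a map join.
        "SET hive.auto.convert.join=false",
        # Print the plan first so the join type can be checked by eye.
        f"EXPLAIN {join}",
        # Then run the query itself so it can be timed or profiled.
        join,
    ]
    return ";\n".join(statements) + ";\n"
```

The returned text can be saved to a file and run with `hive -f` or `beeline -f`, once per engine, to compare timings on the same data.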
[jira] [Commented] (HIVE-10084) Improve common join performance [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959173#comment-15959173 ]

Sahil Takiar commented on HIVE-10084:

Hey [~xuefuz], a few questions on this:

* Do you still have the benchmark that shows the perf problems with common-join? Do you remember the query that showed this perf problem?
* Was this in comparison to Hive-on-MR's common-join, or Spark-SQL's common-join?
* Was there a specific issue with common-join that you had in mind that could be causing this problem?
[jira] [Commented] (HIVE-10084) Improve common join performance [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503301#comment-14503301 ]

Hive QA commented on HIVE-10084:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12726596/HIVE-10084.1-spark.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8718 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucket6.q-scriptfile1_win.q-quotedid_smb.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-bucketizedhiveinputformat.q-empty_dir_in_table.q - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-infer_bucket_sort_map_operators.q-load_hdfs_file_with_space_in_the_name.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-import_exported_table.q-truncate_column_buckets.q-bucket_num_reducers2.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-infer_bucket_sort_num_buckets.q-parallel_orderby.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-join1.q-infer_bucket_sort_bucketed_table.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-infer_bucket_sort_merge.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-input16_cc.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-bucket_num_reducers.q-scriptfile1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx_cbo_2.q-bucketmapjoin6.q-bucket4.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-reduce_deduplicate.q-infer_bucket_sort_dyn_part.q-udf_using.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-uber_reduce.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-stats_counter_partitioned.q-external_table_with_space_in_location_path.q-disable_merge_for_bucketing.q-and-1-more - did not produce a TEST-*.xml file
TestPigHBaseStorageHandler - did not produce a TEST-*.xml file
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/831/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/831/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-831/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12726596 - PreCommit-HIVE-SPARK-Build

> Improve common join performance [Spark Branch]
> ----------------------------------------------
>
>                 Key: HIVE-10084
>                 URL: https://issues.apache.org/jira/browse/HIVE-10084
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>         Attachments: HIVE-10084.1-spark.patch, HIVE-10084.1-spark.patch
>
> Benchmark shows that Hive on Spark shows some numbers which indicate that
> common join performance can be improved. This task is to investigate and fix
> the issue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-10084) Improve common join performance [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503124#comment-14503124 ]

Rui Li commented on HIVE-10084:

OOO and travelling abroad from 4/14 to 4/22. Please expect slow email response. Sorry for the inconvenience.
[jira] [Commented] (HIVE-10084) Improve common join performance [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385598#comment-14385598 ]

Hive QA commented on HIVE-10084:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708014/HIVE-10084.1-spark.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 8707 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/812/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/812/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-812/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708014 - PreCommit-HIVE-SPARK-Build

> Improve common join performance [Spark Branch]
> ----------------------------------------------
>
>                 Key: HIVE-10084
>                 URL: https://issues.apache.org/jira/browse/HIVE-10084
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>         Attachments: HIVE-10084.1-spark.patch
>
> Benchmark shows that Hive on Spark shows some numbers which indicate that
> common join performance can be improved. This task is to investigate and fix
> the issue.