[jira] [Commented] (HIVE-10084) Improve common join performance [Spark Branch]

2017-04-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959215#comment-15959215
 ] 

Xuefu Zhang commented on HIVE-10084:


Hi [~stakiar], the conclusion came from a benchmark comparing Spark and Tez in 
Hive. However, a lot has changed since then, so I'm not sure whether it still 
holds true.

You can construct a query that invokes common join to see how it performs, and 
profile it if necessary. I think the difference might come from Spark shuffle. 
We have recently changed how Spark shuffle is used, so it's unclear to me whether 
there is anything left to do until you actually benchmark it.
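
A minimal sketch of how such a test query could be put together, assuming the goal is to keep Hive from converting the join into a map join. The helper name, the table names (fact_table, dim_table) and the join key are hypothetical placeholders, not the original benchmark; hive.execution.engine and hive.auto.convert.join are existing Hive settings.

```python
# Hypothetical helper: builds a HiveQL script that forces a common
# (reduce-side / shuffle) join. Table and column names are placeholders
# and are NOT taken from the original benchmark.

def common_join_script(engine, left="fact_table", right="dim_table", key="id"):
    """Return a HiveQL script that runs a shuffle join on the given engine."""
    if engine not in ("mr", "tez", "spark"):
        raise ValueError(f"unsupported engine: {engine}")
    query = f"SELECT COUNT(*) FROM {left} a JOIN {right} b ON a.{key} = b.{key}"
    statements = [
        f"SET hive.execution.engine={engine}",
        # Disable automatic map-join conversion so the planner keeps the
        # common join even when one side is small.
        "SET hive.auto.convert.join=false",
        # EXPLAIN first, to confirm the plan contains a reduce-side join.
        f"EXPLAIN {query}",
        query,
    ]
    return ";\n".join(statements) + ";\n"
```

Generating the same script for "spark" and "tez" and running both against identical data would give a like-for-like comparison; whether the gap still exists can only be told from the measured run times.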

> Improve common join performance [Spark Branch]
> --
>
> Key: HIVE-10084
> URL: https://issues.apache.org/jira/browse/HIVE-10084
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
>
> A benchmark of Hive on Spark produced numbers indicating that common join 
> performance can be improved. This task is to investigate and fix the issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-10084) Improve common join performance [Spark Branch]

2017-04-06 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959173#comment-15959173
 ] 

Sahil Takiar commented on HIVE-10084:
-

Hey [~xuefuz], a few questions on this:

* Do you still have the benchmark that showed the perf problems with common-join? 
Do you remember the query that showed this perf problem?
* Was this in comparison to Hive-on-MR's common-join, or Spark SQL's common-join?
* Was there a specific issue with common-join that you had in mind that could 
be causing this problem?



[jira] [Commented] (HIVE-10084) Improve common join performance [Spark Branch]

2015-04-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503301#comment-14503301
 ] 

Hive QA commented on HIVE-10084:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12726596/HIVE-10084.1-spark.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8718 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucket6.q-scriptfile1_win.q-quotedid_smb.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-bucketizedhiveinputformat.q-empty_dir_in_table.q - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-infer_bucket_sort_map_operators.q-load_hdfs_file_with_space_in_the_name.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-import_exported_table.q-truncate_column_buckets.q-bucket_num_reducers2.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-infer_bucket_sort_num_buckets.q-parallel_orderby.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-join1.q-infer_bucket_sort_bucketed_table.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-infer_bucket_sort_merge.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-input16_cc.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-bucket_num_reducers.q-scriptfile1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx_cbo_2.q-bucketmapjoin6.q-bucket4.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-reduce_deduplicate.q-infer_bucket_sort_dyn_part.q-udf_using.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-uber_reduce.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-stats_counter_partitioned.q-external_table_with_space_in_location_path.q-disable_merge_for_bucketing.q-and-1-more - did not produce a TEST-*.xml file
TestPigHBaseStorageHandler - did not produce a TEST-*.xml file
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/831/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/831/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-831/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12726596 - PreCommit-HIVE-SPARK-Build

> Improve common join performance [Spark Branch]
> --
>
> Key: HIVE-10084
> URL: https://issues.apache.org/jira/browse/HIVE-10084
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
> Attachments: HIVE-10084.1-spark.patch, HIVE-10084.1-spark.patch
>
> A benchmark of Hive on Spark produced numbers indicating that common join 
> performance can be improved. This task is to investigate and fix the issue.





[jira] [Commented] (HIVE-10084) Improve common join performance [Spark Branch]

2015-04-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503124#comment-14503124
 ] 

Rui Li commented on HIVE-10084:
---

OOO and travelling abroad from 4/14 to 4/22. Please expect slow email response. 
Sorry for the inconvenience.




[jira] [Commented] (HIVE-10084) Improve common join performance [Spark Branch]

2015-03-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385598#comment-14385598
 ] 

Hive QA commented on HIVE-10084:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708014/HIVE-10084.1-spark.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 8707 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/812/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/812/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-812/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708014 - PreCommit-HIVE-SPARK-Build

> Improve common join performance [Spark Branch]
> --
>
> Key: HIVE-10084
> URL: https://issues.apache.org/jira/browse/HIVE-10084
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
> Attachments: HIVE-10084.1-spark.patch
>
> A benchmark of Hive on Spark produced numbers indicating that common join 
> performance can be improved. This task is to investigate and fix the issue.


