[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

2014-09-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135415#comment-14135415
 ] 

Xuefu Zhang commented on HIVE-8054:
---

Thank you for the catch, [~leftylev].

> Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark 
> Branch]
> --
>
> Key: HIVE-8054
> URL: https://issues.apache.org/jira/browse/HIVE-8054
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Na Yang
>  Labels: Spark-M1, TODOC-SPARK
> Fix For: spark-branch
>
> Attachments: HIVE-8054-spark.patch, HIVE-8054.2-spark.patch, 
> HIVE-8054.3-spark.patch
>
>
> Option hive.optimize.union.remove, introduced in HIVE-3276, removes union 
> operators from the operator graph in certain cases as an optimization to 
> reduce the number of MR jobs. While this makes sense in MR, the optimization 
> is actually harmful to an execution engine such as Spark, which natively 
> supports union without requiring additional jobs. This is because removing 
> the union operator creates disjoint operator graphs, each graph generating a 
> job, and thus this optimization requires more jobs to run the query. Not to 
> mention the additional complexity of handling linked FS descriptors.
> I propose that we disable this optimization when the execution engine is 
> Spark.
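
The rule proposed in the quoted description can be sketched as follows. This is only an illustration, not the attached patch: the class `UnionRemoveGate`, the method `shouldRemoveUnion`, and the use of a plain `Map` in place of Hive's configuration object are all hypothetical, and the fallback values used when a key is missing are assumptions made for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in; a real change would work on Hive's own configuration classes. */
public class UnionRemoveGate {
    static final String ENGINE_KEY = "hive.execution.engine";
    static final String UNION_REMOVE_KEY = "hive.optimize.union.remove";

    /**
     * Returns true only if union removal was requested and the engine is not Spark.
     * The fallbacks ("false" and "mr") are assumptions for this sketch.
     */
    public static boolean shouldRemoveUnion(Map<String, String> conf) {
        boolean requested =
            Boolean.parseBoolean(conf.getOrDefault(UNION_REMOVE_KEY, "false"));
        boolean onSpark =
            "spark".equalsIgnoreCase(conf.getOrDefault(ENGINE_KEY, "mr"));
        return requested && !onSpark;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(UNION_REMOVE_KEY, "true");

        // On MR the requested optimization stays in effect.
        conf.put(ENGINE_KEY, "mr");
        System.out.println("mr:    " + shouldRemoveUnion(conf));

        // On Spark the optimization is skipped even though it was requested.
        conf.put(ENGINE_KEY, "spark");
        System.out.println("spark: " + shouldRemoveUnion(conf));
    }
}
```

With the option set to true, the sketch keeps the optimization for `mr` and skips it for `spark`, which is the behavior the description asks for.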



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

2014-09-15 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135009#comment-14135009
 ] 

Lefty Leverenz commented on HIVE-8054:
--

This should be documented in the wikidoc "Configuration Properties" when the 
Spark branch gets merged into trunk.  In the meantime, it could be documented 
in "Hive on Spark: Getting Started" as a note in the "Configuring Hive" section.

* [Hive on Spark: Getting Started -- Configuring Hive | 
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started#HiveonSpark:GettingStarted-ConfiguringHive]
* [Configuration Properties -- hive.optimize.union.remove | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.optimize.union.remove]

> Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark 
> Branch]
> --
>
> Key: HIVE-8054
> URL: https://issues.apache.org/jira/browse/HIVE-8054
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Na Yang
>  Labels: Spark-M1, TODOC-SPARK
> Fix For: spark-branch
>
> Attachments: HIVE-8054-spark.patch, HIVE-8054.2-spark.patch, 
> HIVE-8054.3-spark.patch


[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

2014-09-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134722#comment-14134722
 ] 

Hive QA commented on HIVE-8054:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668898/HIVE-8054.3-spark.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6343 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/130/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/130/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-130/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668898

> Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark 
> Branch]
> --
>
> Key: HIVE-8054
> URL: https://issues.apache.org/jira/browse/HIVE-8054
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Na Yang
>  Labels: Spark-M1
> Attachments: HIVE-8054-spark.patch, HIVE-8054.2-spark.patch, 
> HIVE-8054.3-spark.patch


[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

2014-09-15 Thread Na Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134668#comment-14134668
 ] 

Na Yang commented on HIVE-8054:
---

Hi [~xuefuz], yes, both the load_dyn_part13 and optimize_nullscan failures are 
related to my patch. I regenerated the .q.out files for both of them. Thanks.

> Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark 
> Branch]
> --
>
> Key: HIVE-8054
> URL: https://issues.apache.org/jira/browse/HIVE-8054
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Na Yang
>  Labels: Spark-M1
> Attachments: HIVE-8054-spark.patch, HIVE-8054.2-spark.patch


[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

2014-09-15 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134426#comment-14134426
 ] 

Xuefu Zhang commented on HIVE-8054:
---

Hi [~nyang], is the load_dyn_part13 failure related to your patch? It seems to 
have a different test output. Thanks.

> Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark 
> Branch]
> --
>
> Key: HIVE-8054
> URL: https://issues.apache.org/jira/browse/HIVE-8054
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Na Yang
>  Labels: Spark-M1
> Attachments: HIVE-8054-spark.patch, HIVE-8054.2-spark.patch


[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

2014-09-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134394#comment-14134394
 ] 

Hive QA commented on HIVE-8054:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668813/HIVE-8054.2-spark.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6343 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/129/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/129/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-129/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668813

> Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark 
> Branch]
> --
>
> Key: HIVE-8054
> URL: https://issues.apache.org/jira/browse/HIVE-8054
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Na Yang
>  Labels: Spark-M1
> Attachments: HIVE-8054-spark.patch, HIVE-8054.2-spark.patch


[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

2014-09-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133568#comment-14133568
 ] 

Xuefu Zhang commented on HIVE-8054:
---

[~nyang], it looks like some test output needs to be updated. Thanks.

> Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark 
> Branch]
> --
>
> Key: HIVE-8054
> URL: https://issues.apache.org/jira/browse/HIVE-8054
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Na Yang
>  Labels: Spark-M1
> Attachments: HIVE-8054-spark.patch


[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

2014-09-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133246#comment-14133246
 ] 

Hive QA commented on HIVE-8054:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668643/HIVE-8054-spark.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6343 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_20
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/127/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/127/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-127/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668643

> Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark 
> Branch]
> --
>
> Key: HIVE-8054
> URL: https://issues.apache.org/jira/browse/HIVE-8054
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Na Yang
>  Labels: Spark-M1
> Attachments: HIVE-8054-spark.patch


[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

2014-09-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133227#comment-14133227
 ] 

Xuefu Zhang commented on HIVE-8054:
---

Hi [~nyang], thank you very much for working on this. The patch looks good, and 
I just submitted it so that the tests can run.

+1 pending test results.

> Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark 
> Branch]
> --
>
> Key: HIVE-8054
> URL: https://issues.apache.org/jira/browse/HIVE-8054
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Na Yang
>  Labels: Spark-M1
> Attachments: HIVE-8054-spark.patch


[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

2014-09-13 Thread Na Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133111#comment-14133111
 ] 

Na Yang commented on HIVE-8054:
---

Review board link: https://reviews.apache.org/r/25619

> Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark 
> Branch]
> --
>
> Key: HIVE-8054
> URL: https://issues.apache.org/jira/browse/HIVE-8054
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Na Yang
>  Labels: Spark-M1
> Attachments: HIVE-8054-spark.patch