[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135415#comment-14135415 ]

Xuefu Zhang commented on HIVE-8054:
---

Thank you for the catch, [~leftylev].

> Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]
> --
>
>                 Key: HIVE-8054
>                 URL: https://issues.apache.org/jira/browse/HIVE-8054
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Na Yang
>              Labels: Spark-M1, TODOC-SPARK
>             Fix For: spark-branch
>
>         Attachments: HIVE-8054-spark.patch, HIVE-8054.2-spark.patch,
>                      HIVE-8054.3-spark.patch
>
>
> Option hive.optimize.union.remove, introduced in HIVE-3276, removes union
> operators from the operator graph in certain cases as an optimization to
> reduce the number of MR jobs. While this makes sense in MR, the optimization
> is actually harmful to an execution engine such as Spark, which natively
> supports union without requiring additional jobs. This is because removing
> the union operator creates disjoint operator graphs, each graph generating a
> job, and thus this optimization requires more jobs to run the query, not to
> mention the additional complexity of handling linked FS descriptors.
> I propose that we disable this optimization when the execution engine is
> Spark.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
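Editor's illustration (not part of the original thread): the argument in the issue description, that removing a union operator splits one operator graph into several disjoint graphs, each of which becomes a job, can be sketched with a toy model. Everything below is an assumption made for illustration: the operator names (TS1, SEL1, UNION, FS), the helper functions, and the simplification that the union's children are sinks that get copied into each branch. It is not Hive's planner code.

```python
from collections import defaultdict


def count_components(nodes, edges):
    """Count weakly connected components of a toy operator graph."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen = set()
    components = 0
    for n in nodes:
        if n in seen:
            continue
        components += 1
        stack = [n]
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            stack.extend(adj[cur] - seen)
    return components


def remove_union(nodes, edges, union):
    """Toy union removal: give every parent of the union its own copy of
    each child (assumes the union's children are sinks such as an FS)."""
    parents = [a for a, b in edges if b == union]
    children = [b for a, b in edges if a == union]
    new_nodes = [n for n in nodes if n != union and n not in children]
    new_edges = [(a, b) for a, b in edges if union not in (a, b)]
    for i, parent in enumerate(parents, start=1):
        for child in children:
            copy = f"{child}_{i}"  # hypothetical name for the per-branch copy
            new_nodes.append(copy)
            new_edges.append((parent, copy))
    return new_nodes, new_edges


# Hypothetical plan: two table scans feeding one union, then one file sink.
nodes = ["TS1", "SEL1", "TS2", "SEL2", "UNION", "FS"]
edges = [("TS1", "SEL1"), ("SEL1", "UNION"),
         ("TS2", "SEL2"), ("SEL2", "UNION"),
         ("UNION", "FS")]

before = count_components(nodes, edges)
after = count_components(*remove_union(nodes, edges, "UNION"))
print(before, after)
```

In this toy plan the graph is a single connected piece while the union is present and falls into one piece per branch once it is removed, which is the effect the description says costs extra jobs on an engine that can execute the union directly.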
[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135009#comment-14135009 ]

Lefty Leverenz commented on HIVE-8054:
--

This should be documented in wikidoc "Configuration Properties" when the Spark branch gets merged into trunk. In the meantime, it could be documented in "Hive on Spark: Getting Started" as a note in the "Configuring Hive" section.

* [Hive on Spark: Getting Started -- Configuring Hive | https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started#HiveonSpark:GettingStarted-ConfiguringHive]
* [Configuration Properties -- hive.optimize.union.remove | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.optimize.union.remove]
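Editor's illustration (not part of the original thread): the behavior this issue asks to have documented can be summarized as "the union-remove optimization has no effect when the engine is Spark". A minimal sketch of that rule follows. The function name and the dictionary-based configuration are hypothetical, and the fallback values ("mr" and "false") are assumed defaults; only the two property names come from the issue. This is not the code in the attached patches.

```python
def effective_union_remove(conf):
    """Return True only if union removal is requested and the engine is not Spark.

    `conf` is a plain dict of property name -> string value (an assumption
    for this sketch, not Hive's configuration API).
    """
    engine = conf.get("hive.execution.engine", "mr")  # assumed default
    requested = str(conf.get("hive.optimize.union.remove", "false")).lower() == "true"
    return requested and engine != "spark"


# The same request yields different effective settings per engine.
print(effective_union_remove({"hive.execution.engine": "mr",
                              "hive.optimize.union.remove": "true"}))
print(effective_union_remove({"hive.execution.engine": "spark",
                              "hive.optimize.union.remove": "true"}))
```

A note along these lines in the "Configuring Hive" section would tell users that setting hive.optimize.union.remove is not expected to change plans under hive.execution.engine=spark.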
[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134722#comment-14134722 ]

Hive QA commented on HIVE-8054:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668898/HIVE-8054.3-spark.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6343 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/130/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/130/console
Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-130/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668898
[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134668#comment-14134668 ]

Na Yang commented on HIVE-8054:
---

Hi [~xuefuz], yes, both load_dyn_part13 and optimize_nullscan are related to my patch. I regenerated the .q.out files for both of them. Thanks.
[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134426#comment-14134426 ]

Xuefu Zhang commented on HIVE-8054:
---

Hi [~nyang], is the load_dyn_part13 failure related to your patch? It seems to have a different test output. Thanks.
[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134394#comment-14134394 ]

Hive QA commented on HIVE-8054:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668813/HIVE-8054.2-spark.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6343 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/129/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/129/console
Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-129/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668813
[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133568#comment-14133568 ]

Xuefu Zhang commented on HIVE-8054:
---

[~nyang], it looks like some test output needs to be updated. Thanks.
[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133246#comment-14133246 ]

Hive QA commented on HIVE-8054:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668643/HIVE-8054-spark.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6343 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_20
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/127/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/127/console
Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-127/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668643
[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133227#comment-14133227 ]

Xuefu Zhang commented on HIVE-8054:
---

Hi [~nyang], thank you very much for working on this. The patch looks good, and I just submitted it to let the test run.

+1 pending on test result.
[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133111#comment-14133111 ]

Na Yang commented on HIVE-8054:
---

Review board link: https://reviews.apache.org/r/25619