[jira] [Updated] (HIVE-11363) Prewarm Hive on Spark containers [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-11363: -- Labels: TODOC-SPARK (was: ) Prewarm Hive on Spark containers [Spark Branch] --- Key: HIVE-11363 URL: https://issues.apache.org/jira/browse/HIVE-11363 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.1.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: TODOC-SPARK Fix For: spark-branch Attachments: HIVE-11363.1-spark.patch, HIVE-11363.2-spark.patch, HIVE-11363.3-spark.patch, HIVE-11363.4-spark.patch, HIVE-11363.5-spark.patch When a Hive job is launched by Oozie, a Hive session is created and the job script is executed. The session is closed when the Hive job completes. Thus, the Hive session is not shared among Hive jobs, either within an Oozie workflow or across workflows. Since the parallelism of a Hive job executed on Spark depends on the available executors, such Hive jobs suffer the executor ramp-up overhead. The idea here is to wait a bit so that enough executors can come up before a job is executed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11363) Prewarm Hive on Spark containers [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650174#comment-14650174 ] Lefty Leverenz commented on HIVE-11363: --- Doc note: This changes the descriptions of two configuration parameters (*hive.prewarm.enabled* and *hive.prewarm.numcontainers*) in HiveConf.java, removing the words "for Tez" -- the parameters are currently documented in the Tez section of Configuration Properties. Question: Should they be kept in the Tez section and also added to the Spark section? (Alternatively, they could go in the general section with cross-references in the Tez and Spark sections.) * [Configuration Properties -- Tez -- hive.prewarm.enabled | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.prewarm.enabled] * [Configuration Properties -- Tez -- hive.prewarm.numcontainers | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.prewarm.numcontainers] * [Configuration Properties -- Spark | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Spark] They should also be documented in Hive on Spark: Getting Started. * [Hive on Spark: Getting Started -- Configuring Hive | https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started#HiveonSpark:GettingStarted-ConfiguringHive] However, when the Spark branch was merged to master (7/30/2015) the commit for this issue seems to have used an earlier patch, not patch 5 -- it creates two new parameters (*hive.spark.prewarm.containers* and *hive.spark.prewarm.num.containers*). That needs to be sorted out. See HIVE-10166 and commit 537114b964c71b7a5cd00c9938eadc6d0cf76536.
[jira] [Updated] (HIVE-11433) NPE for a multiple inner join query
[ https://issues.apache.org/jira/browse/HIVE-11433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-11433: --- Attachment: HIVE-11433.patch The attached patch, which adds a null check in the offending code, seems to make the explain query pass. Without the patch, the explain query triggers the NPE. Though the fix is harmless, I'm not sure whether this is the only problem or whether this fix is the best one. Your feedback is greatly welcome. NPE for a multiple inner join query --- Key: HIVE-11433 URL: https://issues.apache.org/jira/browse/HIVE-11433 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.0, 1.1.0, 2.0.0 Reporter: Xuefu Zhang Attachments: HIVE-11433.patch NullPointerException is thrown for a query that has multiple (more than 3) inner joins. Stacktrace for 1.1.0 {code} NullPointerException null java.lang.NullPointerException at org.apache.hadoop.hive.ql.parse.ParseUtils.getIndex(ParseUtils.java:149) at org.apache.hadoop.hive.ql.parse.ParseUtils.checkJoinFilterRefersOneAlias(ParseUtils.java:166) at org.apache.hadoop.hive.ql.parse.ParseUtils.checkJoinFilterRefersOneAlias(ParseUtils.java:185) at org.apache.hadoop.hive.ql.parse.ParseUtils.checkJoinFilterRefersOneAlias(ParseUtils.java:185) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.mergeJoins(SemanticAnalyzer.java:8257) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.mergeJoinTree(SemanticAnalyzer.java:8422) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9805) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9714) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10150) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10161) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10078) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1104) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:101) at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:172) at org.apache.hive.service.cli.operation.Operation.run(Operation.java:257) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:386) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:373) at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:271) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} However, the problem can also be reproduced on the latest master branch. 
Further investigation shows that the following code (in ParseUtils.java) is problematic: {code} static int getIndex(String[] list, String elem) { for (int i = 0; i < list.length; i++) { if (list[i].toLowerCase().equals(elem)) { return i; } } return -1; } {code} The code assumes that every element in the list is non-null, which isn't true because of the following code in SemanticAnalyzer.java (method genJoinTree()): {code} if ((right.getToken().getType() == HiveParser.TOK_TABREF) || (right.getToken().getType() == HiveParser.TOK_SUBQUERY) || (right.getToken().getType() == HiveParser.TOK_PTBLFUNCTION)) { String tableName =
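The null check the patch describes can be sketched as follows. This is a minimal illustrative rewrite of the lookup, not the committed patch; the class wrapper is added here only to make the snippet self-contained.

```java
public class ParseUtilsSketch {
    // Null-safe variant of the getIndex lookup: skip null aliases instead of
    // dereferencing them, and return -1 when the element is not found.
    static int getIndex(String[] list, String elem) {
        for (int i = 0; i < list.length; i++) {
            if (list[i] != null && list[i].toLowerCase().equals(elem)) {
                return i;
            }
        }
        return -1;
    }
}
```

With this guard, a join tree whose alias array contains null entries (as produced by genJoinTree() for non-TABREF tokens) no longer throws an NPE during checkJoinFilterRefersOneAlias.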
[jira] [Commented] (HIVE-10166) Merge Spark branch to master 7/30/2015
[ https://issues.apache.org/jira/browse/HIVE-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650178#comment-14650178 ] Lefty Leverenz commented on HIVE-10166: --- Doc note: This creates four configuration parameters, two for dynamic partition pruning from HIVE-9152 and two for prewarming containers from HIVE-11363. See those issues for doc requirements. Problem: HIVE-11363's patches first introduced two configuration parameters, but then patch 5 (the final one) got rid of them and reused two existing parameters. This merge patch doesn't have the reused parameters. See commit 537114b964c71b7a5cd00c9938eadc6d0cf76536 and the doc note on HIVE-11363. * [doc note on HIVE-11363 | https://issues.apache.org/jira/browse/HIVE-11363?focusedCommentId=14650174&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14650174] Merge Spark branch to master 7/30/2015 -- Key: HIVE-10166 URL: https://issues.apache.org/jira/browse/HIVE-10166 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.1.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 2.0.0 Attachments: HIVE-10166.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11363) Prewarm Hive on Spark containers [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650177#comment-14650177 ] Lefty Leverenz commented on HIVE-11363: --- I see the commit to the Spark branch also created the new parameters instead of reusing the old ones, which explains why the merge to master did the same. See commit 537114b964c71b7a5cd00c9938eadc6d0cf76536. Was there a decision not to reuse the old parameters?
[jira] [Commented] (HIVE-10166) Merge Spark branch to master 7/30/2015
[ https://issues.apache.org/jira/browse/HIVE-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650185#comment-14650185 ] Xuefu Zhang commented on HIVE-10166: Thank you very much, [~leftylev]. You are wonderful at keeping up with the docs and finding problems. I will take a look at the configuration and fix it as necessary.
[jira] [Commented] (HIVE-11434) Followup for HIVE-10166: reuse existing configurations for prewarming Spark executors
[ https://issues.apache.org/jira/browse/HIVE-11434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650430#comment-14650430 ] Hive QA commented on HIVE-11434: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12748309/HIVE-11434.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4786/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4786/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4786/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: java.io.IOException: Could not create /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4786/succeeded/TestHiveServer2 {noformat} This message is automatically generated. ATTACHMENT ID: 12748309 - PreCommit-HIVE-TRUNK-Build Followup for HIVE-10166: reuse existing configurations for prewarming Spark executors - Key: HIVE-11434 URL: https://issues.apache.org/jira/browse/HIVE-11434 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 2.0.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-11434.patch It appears that a patch other than the latest one from HIVE-11363 was committed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11413) Error in detecting availability of HiveSemanticAnalyzerHooks
[ https://issues.apache.org/jira/browse/HIVE-11413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650324#comment-14650324 ] Xuefu Zhang commented on HIVE-11413: +1 Error in detecting availability of HiveSemanticAnalyzerHooks Key: HIVE-11413 URL: https://issues.apache.org/jira/browse/HIVE-11413 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 2.0.0 Reporter: Raajay Viswanathan Assignee: Raajay Viswanathan Priority: Trivial Labels: newbie Fix For: 2.0.0 Attachments: HIVE-11413.patch In the {{compile(String, Boolean)}} function in {{Driver.java}}, the list of available {{HiveSemanticAnalyzerHook}}s (_saHooks_) is obtained using the {{getHooks}} method. This method always returns a {{List}} of hooks. However, while checking for availability of hooks, the current version of the code compares _saHooks_ with NULL. This is incorrect, as the segment of code designed to call the pre- and post-analyze functions gets executed even when the list is empty. The comparison should be changed to {{saHooks.size() > 0}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
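The corrected availability check described in the issue can be sketched as below. The helper name and list element type are illustrative, not Hive's actual API; the point is the size-based test replacing the null comparison.

```java
import java.util.List;

public class HookGuardSketch {
    // getHooks() always returns a List (possibly empty), so hook availability
    // must be tested with size() > 0, not by comparing the list to null.
    static boolean hooksAvailable(List<?> saHooks) {
        return saHooks != null && saHooks.size() > 0;
    }
}
```

With this guard, the pre/post analyze hook invocation path is skipped when no hooks are configured, instead of running over an empty list.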
[jira] [Updated] (HIVE-11434) Followup for HIVE-10166: reuse existing configurations for prewarming Spark executors
[ https://issues.apache.org/jira/browse/HIVE-11434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-11434: --- Description: It appears that a patch other than the latest one from HIVE-11363 was committed. (was: It appears that the patch other than the latest from HIVE- was committed.)
[jira] [Updated] (HIVE-11434) Followup for HIVE-10166: reuse existing configurations for prewarming Spark executors
[ https://issues.apache.org/jira/browse/HIVE-11434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-11434: --- Attachment: HIVE-11434.patch The attached patch fixes the issue. In addition, it adjusts the maximum warm-up wait time from 60s to 30s.
[jira] [Commented] (HIVE-11430) Followup HIVE-10166: investigate and fix the two test failures
[ https://issues.apache.org/jira/browse/HIVE-11430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650378#comment-14650378 ] Hive QA commented on HIVE-11430: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12748308/HIVE-11430.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4785/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4785/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4785/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: java.io.IOException: Could not create /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4785/succeeded/TestFolderPermissions {noformat} This message is automatically generated. ATTACHMENT ID: 12748308 - PreCommit-HIVE-TRUNK-Build Followup HIVE-10166: investigate and fix the two test failures -- Key: HIVE-11430 URL: https://issues.apache.org/jira/browse/HIVE-11430 Project: Hive Issue Type: Bug Components: Test Affects Versions: 2.0.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-11430.patch {code} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_convert_enum_to_string org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_rdd_cache {code} As shown in https://issues.apache.org/jira/browse/HIVE-10166?focusedCommentId=14649066&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14649066. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11430) Followup HIVE-10166: investigate and fix the two test failures
[ https://issues.apache.org/jira/browse/HIVE-11430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-11430: --- Attachment: HIVE-11430.patch
[jira] [Commented] (HIVE-10166) Merge Spark branch to master 7/30/2015
[ https://issues.apache.org/jira/browse/HIVE-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650319#comment-14650319 ] Xuefu Zhang commented on HIVE-10166: I created HIVE-11434 to fix the configuration problem.
[jira] [Commented] (HIVE-10594) Remote Spark client doesn't use Kerberos keytab to authenticate [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650577#comment-14650577 ] Xuefu Zhang commented on HIVE-10594: Merged to master and cherry-picked to branch-1. Remote Spark client doesn't use Kerberos keytab to authenticate [Spark Branch] -- Key: HIVE-10594 URL: https://issues.apache.org/jira/browse/HIVE-10594 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.1.0 Reporter: Chao Sun Assignee: Xuefu Zhang Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-10594.1-spark.patch Reporting a problem found by one of the HoS users: Currently, if a user is running Beeline on a different host than HS2, and he/she didn't do kinit on the HS2 host, then he/she may get the following error: {code} 2015-04-29 15:49:34,614 INFO org.apache.hive.spark.client.SparkClientImpl: 15/04/29 15:49:34 WARN UserGroupInformation: PriviledgedActionException as:hive (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2015-04-29 15:49:34,652 INFO org.apache.hive.spark.client.SparkClientImpl: Exception in thread main java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: secure-hos-1.ent.cloudera.com/10.20.77.79; destination host is: secure-hos-1.ent.cloudera.com:8032; 2015-04-29 15:49:34,653 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) 2015-04-29 15:49:34,653 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.ipc.Client.call(Client.java:1472) 2015-04-29 15:49:34,654 INFO org.apache.hive.spark.client.SparkClientImpl: at 
org.apache.hadoop.ipc.Client.call(Client.java:1399) 2015-04-29 15:49:34,654 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) 2015-04-29 15:49:34,654 INFO org.apache.hive.spark.client.SparkClientImpl: at com.sun.proxy.$Proxy11.getClusterMetrics(Unknown Source) 2015-04-29 15:49:34,655 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:202) 2015-04-29 15:49:34,655 INFO org.apache.hive.spark.client.SparkClientImpl: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2015-04-29 15:49:34,655 INFO org.apache.hive.spark.client.SparkClientImpl: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 2015-04-29 15:49:34,656 INFO org.apache.hive.spark.client.SparkClientImpl: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2015-04-29 15:49:34,656 INFO org.apache.hive.spark.client.SparkClientImpl: at java.lang.reflect.Method.invoke(Method.java:606) 2015-04-29 15:49:34,656 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at com.sun.proxy.$Proxy12.getClusterMetrics(Unknown Source) 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:461) 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:91) 2015-04-29 
15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:91) 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.spark.Logging$class.logInfo(Logging.scala:59) 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:49) 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:90)
[jira] [Updated] (HIVE-8342) Potential null dereference in ColumnTruncateMapper#jobClose()
[ https://issues.apache.org/jira/browse/HIVE-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8342: - Description: {code} Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, null, reporter); {code} Utilities.mvFileToFinalPath() calls createEmptyBuckets() where conf is dereferenced: {code} boolean isCompressed = conf.getCompressed(); TableDesc tableInfo = conf.getTableInfo(); {code} was: {code} Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, null, reporter); {code} Utilities.mvFileToFinalPath() calls createEmptyBuckets() where conf is dereferenced: {code} boolean isCompressed = conf.getCompressed(); TableDesc tableInfo = conf.getTableInfo(); {code} Potential null dereference in ColumnTruncateMapper#jobClose() - Key: HIVE-8342 URL: https://issues.apache.org/jira/browse/HIVE-8342 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: skrho Priority: Minor Attachments: HIVE-8342_001.patch, HIVE-8342_002.patch {code} Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, null, reporter); {code} Utilities.mvFileToFinalPath() calls createEmptyBuckets() where conf is dereferenced: {code} boolean isCompressed = conf.getCompressed(); TableDesc tableInfo = conf.getTableInfo(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
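The potential null dereference above can be guarded in the obvious way. This is a hypothetical sketch, not the attached patch: the `Conf` stand-in models a FileSinkDesc-like descriptor, and all names here are illustrative.

```java
public class JobCloseGuardSketch {
    // Stand-in for the descriptor dereferenced inside createEmptyBuckets();
    // in Hive this would be a FileSinkDesc-like object. Hypothetical name.
    static class Conf {
        boolean getCompressed() { return false; }
    }

    // Guard the dereference: when the caller (e.g. jobClose()) passes null,
    // skip the compressed check instead of throwing a NullPointerException.
    static boolean safeIsCompressed(Conf conf) {
        return conf != null && conf.getCompressed();
    }
}
```

The same pattern applies to the `conf.getTableInfo()` call: test `conf` for null once before the block that dereferences it.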
[jira] [Updated] (HIVE-10989) HoS can't control number of map tasks for runtime skew join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10989: --- Fix Version/s: 2.0.0 1.3.0 HoS can't control number of map tasks for runtime skew join [Spark Branch] -- Key: HIVE-10989 URL: https://issues.apache.org/jira/browse/HIVE-10989 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-10989.1-spark.patch Flags {{hive.skewjoin.mapjoin.map.tasks}} and {{hive.skewjoin.mapjoin.min.split}} are used to control the number of map tasks for the map join of runtime skew join. They work well for MR but have no effect for Spark. This makes runtime skew join less useful, i.e., we just end up with slow mappers instead of reducers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650579#comment-14650579 ] Xuefu Zhang commented on HIVE-10999: Merged to master and cherry-picked to branch-1. Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-10999.1-spark.patch, HIVE-10999.2-spark.patch, HIVE-10999.3-spark.patch, HIVE-10999.3-spark.patch Spark 1.4.0 has been released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10844) Combine equivalent Works for HoS[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10844: --- Fix Version/s: 2.0.0 1.3.0 Combine equivalent Works for HoS [Spark Branch] -- Key: HIVE-10844 URL: https://issues.apache.org/jira/browse/HIVE-10844 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-10844.1-spark.patch, HIVE-10844.2-spark.patch, HIVE-10844.3-spark.patch Some Hive queries (like [TPCDS Q39|https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query39.sql]) may share the same subquery, which is translated into separate but equivalent Works in SparkWork; combining these equivalent Works into a single one would help them benefit from the subsequent dynamic RDD caching optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10999: --- Fix Version/s: 2.0.0 1.3.0
[jira] [Commented] (HIVE-10844) Combine equivalent Works for HoS[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650580#comment-14650580 ] Xuefu Zhang commented on HIVE-10844: Merged to master and cherry-picked to branch-1.
[jira] [Updated] (HIVE-11108) HashTableSinkOperator doesn't support vectorization [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-11108: --- Fix Version/s: 2.0.0 1.3.0 HashTableSinkOperator doesn't support vectorization [Spark Branch] -- Key: HIVE-11108 URL: https://issues.apache.org/jira/browse/HIVE-11108 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-11108.1-spark.patch, HIVE-11108.2-spark.patch This prevents any BaseWork containing HTS from being vectorized. It's basically specific to Spark, because Tez doesn't use HTS and MR runs HTS in local tasks. We should verify whether it makes sense to make HTS support vectorization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11138) Query fails when there isn't a comparator for an operator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-11138: --- Fix Version/s: 2.0.0 1.3.0 Query fails when there isn't a comparator for an operator [Spark Branch] Key: HIVE-11138 URL: https://issues.apache.org/jira/browse/HIVE-11138 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-11138.1-spark.patch In such a case, OperatorComparatorFactory should default to false instead of throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11108) HashTableSinkOperator doesn't support vectorization [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650591#comment-14650591 ] Xuefu Zhang commented on HIVE-11108: Merged to master and cherry-picked to branch-1. HashTableSinkOperator doesn't support vectorization [Spark Branch] -- Key: HIVE-11108 URL: https://issues.apache.org/jira/browse/HIVE-11108 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-11108.1-spark.patch, HIVE-11108.2-spark.patch This prevents any BaseWork containing HTS from being vectorized. It's basically specific to Spark, because Tez doesn't use HTS and MR runs HTS in local tasks. We should verify whether it makes sense to make HTS support vectorization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11138) Query fails when there isn't a comparator for an operator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650592#comment-14650592 ] Xuefu Zhang commented on HIVE-11138: Merged to master and cherry-picked to branch-1. Query fails when there isn't a comparator for an operator [Spark Branch] Key: HIVE-11138 URL: https://issues.apache.org/jira/browse/HIVE-11138 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-11138.1-spark.patch In such a case, OperatorComparatorFactory should default to false instead of throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
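The shape of that fix can be sketched as follows. This is a hedged illustration only, not Hive's actual OperatorComparatorFactory API: when no comparator is registered for an operator type, the factory hands back a comparator that always answers false, so the optimizer simply skips the optimization for that operator instead of failing the whole query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

public class ComparatorFactory {
    private static final Map<String, BiPredicate<Object, Object>> REGISTRY = new HashMap<>();
    // Safe default: treat unknown operators as never equivalent.
    private static final BiPredicate<Object, Object> NEVER_EQUAL = (a, b) -> false;

    static {
        // One registered comparator as an example; real code would register
        // one per supported operator type.
        REGISTRY.put("TableScan", (a, b) -> a.equals(b));
    }

    static BiPredicate<Object, Object> comparatorFor(String opType) {
        // Defaulting to NEVER_EQUAL instead of throwing means a query with an
        // unsupported operator still runs; it just isn't optimized.
        return REGISTRY.getOrDefault(opType, NEVER_EQUAL);
    }

    public static void main(String[] args) {
        System.out.println(comparatorFor("TableScan").test("t1", "t1"));      // true
        System.out.println(comparatorFor("SomeNewOperator").test("x", "x"));  // false, no exception
    }
}
```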
[jira] [Updated] (HIVE-10855) Make HIVE-10568 work with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10855: --- Fix Version/s: 2.0.0 1.3.0 Make HIVE-10568 work with Spark [Spark Branch] -- Key: HIVE-10855 URL: https://issues.apache.org/jira/browse/HIVE-10855 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-10855.1-spark.patch, HIVE-10855.2-spark.patch, HIVE-10855.3-spark.patch HIVE-10568 only works with Tez. It's good to make it also work for Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11109) Replication factor is not properly set in SparkHashTableSinkOperator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-11109: --- Fix Version/s: 2.0.0 1.3.0 Replication factor is not properly set in SparkHashTableSinkOperator [Spark Branch] --- Key: HIVE-11109 URL: https://issues.apache.org/jira/browse/HIVE-11109 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Trivial Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-11109.1-spark.patch The replication factor only gets set in some abnormal cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11082) Support multi edge between nodes in SparkPlan[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650586#comment-14650586 ] Xuefu Zhang commented on HIVE-11082: Merged to master and cherry-picked to branch-1. Support multi edge between nodes in SparkPlan[Spark Branch] --- Key: HIVE-11082 URL: https://issues.apache.org/jira/browse/HIVE-11082 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-11082.1-spark.patch, HIVE-11082.2-spark.patch, HIVE-11082.3-spark.patch For the dynamic RDD caching optimization, we found that SparkPlan::connect throws an exception when we try to combine two works with the same child; supporting multiple edges between nodes in SparkPlan would enable dynamic RDD caching in more use cases, such as self-join and self-union. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11053) Add more tests for HIVE-10844[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650584#comment-14650584 ] Xuefu Zhang commented on HIVE-11053: Merged to master and cherry-picked to branch-1. Add more tests for HIVE-10844[Spark Branch] --- Key: HIVE-11053 URL: https://issues.apache.org/jira/browse/HIVE-11053 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chengxiang Li Assignee: GaoLun Priority: Minor Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-11053.1-spark.patch, HIVE-11053.2-spark.patch, HIVE-11053.3-spark.patch, HIVE-11053.4-spark.patch, HIVE-11053.5-spark.patch, HIVE-11053.5-spark.patch Add some test cases for self-union, self-join, CWE, and repeated sub-queries to verify the combining of equivalent works in HIVE-10844. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11182) Enable optimized hash tables for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-11182: --- Fix Version/s: 2.0.0 1.3.0 Enable optimized hash tables for spark [Spark Branch] - Key: HIVE-11182 URL: https://issues.apache.org/jira/browse/HIVE-11182 Project: Hive Issue Type: Improvement Components: Spark Reporter: Rui Li Assignee: Rui Li Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-11182.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11109) Replication factor is not properly set in SparkHashTableSinkOperator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650583#comment-14650583 ] Xuefu Zhang commented on HIVE-11109: Merged to master and cherry-picked to branch-1. Replication factor is not properly set in SparkHashTableSinkOperator [Spark Branch] --- Key: HIVE-11109 URL: https://issues.apache.org/jira/browse/HIVE-11109 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Trivial Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-11109.1-spark.patch The replication factor only gets set in some abnormal cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11053) Add more tests for HIVE-10844[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-11053: --- Fix Version/s: 2.0.0 1.3.0 Add more tests for HIVE-10844[Spark Branch] --- Key: HIVE-11053 URL: https://issues.apache.org/jira/browse/HIVE-11053 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chengxiang Li Assignee: GaoLun Priority: Minor Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-11053.1-spark.patch, HIVE-11053.2-spark.patch, HIVE-11053.3-spark.patch, HIVE-11053.4-spark.patch, HIVE-11053.5-spark.patch, HIVE-11053.5-spark.patch Add some test cases for self-union, self-join, CWE, and repeated sub-queries to verify the combining of equivalent works in HIVE-10844. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11099) Add support for running negative q-tests [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650582#comment-14650582 ] Xuefu Zhang commented on HIVE-11099: Merged to master and cherry-picked to branch-1. Add support for running negative q-tests [Spark Branch] --- Key: HIVE-11099 URL: https://issues.apache.org/jira/browse/HIVE-11099 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-11099-1-spark.patch, HIVE-11099-spark.patch, HIVE-11099.1-spark.patch, HIVE-11099.2-spark.patch Add support for TestSparkNegativeCliDriver and TestMiniSparkOnYarnNegativeCliDriver to negative q-tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11099) Add support for running negative q-tests [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-11099: --- Fix Version/s: 2.0.0 1.3.0 Add support for running negative q-tests [Spark Branch] --- Key: HIVE-11099 URL: https://issues.apache.org/jira/browse/HIVE-11099 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-11099-1-spark.patch, HIVE-11099-spark.patch, HIVE-11099.1-spark.patch, HIVE-11099.2-spark.patch Add support for TestSparkNegativeCliDriver and TestMiniSparkOnYarnNegativeCliDriver to negative q-tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11314) Print Execution completed successfully as part of spark job info [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-11314: --- Fix Version/s: 2.0.0 1.3.0 Print Execution completed successfully as part of spark job info [Spark Branch] - Key: HIVE-11314 URL: https://issues.apache.org/jira/browse/HIVE-11314 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.1.0 Reporter: Xuefu Zhang Assignee: Ferdinand Xu Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-11314-spark.patch Like Hive on MR, Hive on Spark should print Execution completed successfully as part of the spark job info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9152) Dynamic Partition Pruning [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9152: -- Fix Version/s: 2.0.0 1.3.0 Dynamic Partition Pruning [Spark Branch] Key: HIVE-9152 URL: https://issues.apache.org/jira/browse/HIVE-9152 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Sun Labels: TODOC-SPARK Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-9152.1-spark.patch, HIVE-9152.10-spark.patch, HIVE-9152.11-spark.patch, HIVE-9152.12-spark.patch, HIVE-9152.2-spark.patch, HIVE-9152.3-spark.patch, HIVE-9152.4-spark.patch, HIVE-9152.5-spark.patch, HIVE-9152.6-spark.patch, HIVE-9152.8-spark.patch, HIVE-9152.9-spark.patch Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11363) Prewarm Hive on Spark containers [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-11363: --- Fix Version/s: 2.0.0 1.3.0 Prewarm Hive on Spark containers [Spark Branch] --- Key: HIVE-11363 URL: https://issues.apache.org/jira/browse/HIVE-11363 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.1.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: TODOC-SPARK Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-11363.1-spark.patch, HIVE-11363.2-spark.patch, HIVE-11363.3-spark.patch, HIVE-11363.4-spark.patch, HIVE-11363.5-spark.patch When a Hive job is launched by Oozie, a Hive session is created and the job script is executed. The session is closed when the Hive job completes. Thus, a Hive session is not shared among Hive jobs, either within an Oozie workflow or across workflows. Since the parallelism of a Hive job executed on Spark depends on the available executors, such Hive jobs suffer the executor ramp-up overhead. The idea here is to wait a bit so that enough executors can come up before a job is executed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9152) Dynamic Partition Pruning [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650589#comment-14650589 ] Xuefu Zhang commented on HIVE-9152: --- Merged to master and cherry-picked to branch-1. Dynamic Partition Pruning [Spark Branch] Key: HIVE-9152 URL: https://issues.apache.org/jira/browse/HIVE-9152 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Sun Labels: TODOC-SPARK Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-9152.1-spark.patch, HIVE-9152.10-spark.patch, HIVE-9152.11-spark.patch, HIVE-9152.12-spark.patch, HIVE-9152.2-spark.patch, HIVE-9152.3-spark.patch, HIVE-9152.4-spark.patch, HIVE-9152.5-spark.patch, HIVE-9152.6-spark.patch, HIVE-9152.8-spark.patch, HIVE-9152.9-spark.patch Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11314) Print Execution completed successfully as part of spark job info [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650590#comment-14650590 ] Xuefu Zhang commented on HIVE-11314: Merged to master and cherry-picked to branch-1. Print Execution completed successfully as part of spark job info [Spark Branch] - Key: HIVE-11314 URL: https://issues.apache.org/jira/browse/HIVE-11314 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.1.0 Reporter: Xuefu Zhang Assignee: Ferdinand Xu Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-11314-spark.patch Like Hive on MR, Hive on Spark should print Execution completed successfully as part of the spark job info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11363) Prewarm Hive on Spark containers [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650587#comment-14650587 ] Xuefu Zhang commented on HIVE-11363: Merged to master and cherry-picked to branch-1. Prewarm Hive on Spark containers [Spark Branch] --- Key: HIVE-11363 URL: https://issues.apache.org/jira/browse/HIVE-11363 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.1.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: TODOC-SPARK Fix For: spark-branch, 1.3.0, 2.0.0 Attachments: HIVE-11363.1-spark.patch, HIVE-11363.2-spark.patch, HIVE-11363.3-spark.patch, HIVE-11363.4-spark.patch, HIVE-11363.5-spark.patch When a Hive job is launched by Oozie, a Hive session is created and the job script is executed. The session is closed when the Hive job completes. Thus, a Hive session is not shared among Hive jobs, either within an Oozie workflow or across workflows. Since the parallelism of a Hive job executed on Spark depends on the available executors, such Hive jobs suffer the executor ramp-up overhead. The idea here is to wait a bit so that enough executors can come up before a job is executed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
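The prewarm idea in HIVE-11363 amounts to a bounded wait for executor ramp-up before the first job runs. A minimal sketch, assuming a hypothetical `Cluster` handle that reports the current executor count (this is not Hive's or Spark's actual API): poll until either a target count or a timeout is reached, then proceed with whatever came up.

```java
public class Prewarm {
    interface Cluster { int executorCount(); }   // hypothetical handle

    static int waitForExecutors(Cluster cluster, int target, long timeoutMs,
                                long pollMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        int count = cluster.executorCount();
        while (count < target && System.currentTimeMillis() < deadline) {
            Thread.sleep(pollMs);                 // give executors time to register
            count = cluster.executorCount();
        }
        return count;                             // never blocks past the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake cluster that "ramps up" one executor per poll.
        int[] n = {0};
        Cluster c = () -> ++n[0];
        System.out.println(waitForExecutors(c, 5, 10_000, 1) >= 5); // true
    }
}
```

The timeout matters: if the cluster cannot supply the target number of executors, the job should still start rather than hang, just with less parallelism.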
[jira] [Commented] (HIVE-11432) Hive macro give same result for different arguments
[ https://issues.apache.org/jira/browse/HIVE-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650593#comment-14650593 ] Hive QA commented on HIVE-11432: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12748337/HIVE-11432.01.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9305 tests executed *Failed tests:* {noformat} TestCliDriver-udf_md5.q-join18.q-union11.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_convert_enum_to_string org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_rdd_cache org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4789/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4789/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4789/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12748337 - PreCommit-HIVE-TRUNK-Build Hive macro give same result for different arguments --- Key: HIVE-11432 URL: https://issues.apache.org/jira/browse/HIVE-11432 Project: Hive Issue Type: Bug Reporter: Jay Pandya Assignee: Pengcheng Xiong Attachments: HIVE-11432.01.patch If you use a Hive macro more than once while processing the same row, Hive returns the same result for all invocations even if the arguments are different.
Example:
{code}
CREATE TABLE macro_testing(a int, b int, c int);

select * from macro_testing;
1 2 3
4 5 6
7 8 9
10 11 12

create temporary macro math_square(x int) x*x;

select math_square(a), b, math_square(c) from macro_testing;
9 2 9
36 5 36
81 8 81
144 11 144
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
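One way such a bug can arise is sketched below. This is a speculative illustration only, not a claim about Hive's actual macro implementation: if a macro holds its argument binding in shared mutable state, and both invocations on a row are bound before either is evaluated, every invocation sees the last argument. That would match the output above, where both result columns equal math_square(c).

```java
public class MacroBug {
    static class BuggyMacro {
        private int boundArg;                     // shared across invocations: the bug
        void bind(int x) { boundArg = x; }
        int eval() { return boundArg * boundArg; }
    }

    public static void main(String[] args) {
        BuggyMacro square = new BuggyMacro();
        int a = 1, c = 3;                         // first row of macro_testing: a=1, c=3
        // Both invocations get bound before either result is read:
        square.bind(a);
        square.bind(c);                           // overwrites a's binding
        int first = square.eval(), second = square.eval();
        System.out.println(first + " " + second); // 9 9, not the expected 1 9
    }
}
```

The fix is to give each invocation its own argument binding (its own copy of the macro's expression tree) instead of sharing one.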
[jira] [Commented] (HIVE-11416) CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY
[ https://issues.apache.org/jira/browse/HIVE-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650574#comment-14650574 ] Hive QA commented on HIVE-11416: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12748336/HIVE-11416.02.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4788/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4788/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4788/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: java.io.IOException: Could not create /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4788/succeeded/TestPasswordWithCredentialProvider {noformat} This message is automatically generated. ATTACHMENT ID: 12748336 - PreCommit-HIVE-TRUNK-Build CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY -- Key: HIVE-11416 URL: https://issues.apache.org/jira/browse/HIVE-11416 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11416.01.patch, HIVE-11416.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11416) CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY
[ https://issues.apache.org/jira/browse/HIVE-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11416: --- Attachment: HIVE-11416.03.patch CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY -- Key: HIVE-11416 URL: https://issues.apache.org/jira/browse/HIVE-11416 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11416.01.patch, HIVE-11416.02.patch, HIVE-11416.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11430) Followup HIVE-10166: investigate and fix the two test failures
[ https://issues.apache.org/jira/browse/HIVE-11430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650306#comment-14650306 ] Xuefu Zhang commented on HIVE-11430: 1. The diff on convert_enum_to_string.q seems to be due to a Thrift version change. Since there were no other failures, it might be okay. However, we have to keep an eye on it in case any incompatibility arises. 2. dynamic_rdd_cache.q was merged from the branch, and its test output needs to be updated due to recent master changes. Followup HIVE-10166: investigate and fix the two test failures -- Key: HIVE-11430 URL: https://issues.apache.org/jira/browse/HIVE-11430 Project: Hive Issue Type: Bug Components: Test Affects Versions: 2.0.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang {code} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_convert_enum_to_string org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_rdd_cache {code} As shown in https://issues.apache.org/jira/browse/HIVE-10166?focusedCommentId=14649066page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14649066. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11087) DbTxnManager exceptions should include txnid
[ https://issues.apache.org/jira/browse/HIVE-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650532#comment-14650532 ] Hive QA commented on HIVE-11087: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12748325/HIVE-11087.2.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4787/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4787/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4787/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: java.io.IOException: Could not create /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4787/succeeded/TestOperationLoggingAPIWithMr {noformat} This message is automatically generated. ATTACHMENT ID: 12748325 - PreCommit-HIVE-TRUNK-Build DbTxnManager exceptions should include txnid Key: HIVE-11087 URL: https://issues.apache.org/jira/browse/HIVE-11087 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11087.2.patch, HIVE-11087.patch Must include the txnid in the exception so that the user-visible error can be correlated with log file info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11416) CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY
[ https://issues.apache.org/jira/browse/HIVE-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11416: --- Attachment: HIVE-11416.02.patch address test case failure CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY -- Key: HIVE-11416 URL: https://issues.apache.org/jira/browse/HIVE-11416 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11416.01.patch, HIVE-11416.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11432) Hive macro give same result for different arguments
[ https://issues.apache.org/jira/browse/HIVE-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong reassigned HIVE-11432: -- Assignee: Pengcheng Xiong Hive macro give same result for different arguments --- Key: HIVE-11432 URL: https://issues.apache.org/jira/browse/HIVE-11432 Project: Hive Issue Type: Bug Reporter: Jay Pandya Assignee: Pengcheng Xiong If you use a Hive macro more than once while processing the same row, Hive returns the same result for all invocations even if the arguments are different.
Example:
{code}
CREATE TABLE macro_testing(a int, b int, c int);

select * from macro_testing;
1 2 3
4 5 6
7 8 9
10 11 12

create temporary macro math_square(x int) x*x;

select math_square(a), b, math_square(c) from macro_testing;
9 2 9
36 5 36
81 8 81
144 11 144
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11432) Hive macro give same result for different arguments
[ https://issues.apache.org/jira/browse/HIVE-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11432: --- Attachment: HIVE-11432.01.patch [~mendax], could you please try the patch and see if the problem is resolved? Thanks. Hive macro give same result for different arguments --- Key: HIVE-11432 URL: https://issues.apache.org/jira/browse/HIVE-11432 Project: Hive Issue Type: Bug Reporter: Jay Pandya Assignee: Pengcheng Xiong Attachments: HIVE-11432.01.patch If you use a Hive macro more than once while processing the same row, Hive returns the same result for all invocations even if the arguments are different.
Example:
{code}
CREATE TABLE macro_testing(a int, b int, c int);

select * from macro_testing;
1 2 3
4 5 6
7 8 9
10 11 12

create temporary macro math_square(x int) x*x;

select math_square(a), b, math_square(c) from macro_testing;
9 2 9
36 5 36
81 8 81
144 11 144
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11087) DbTxnManager exceptions should include txnid
[ https://issues.apache.org/jira/browse/HIVE-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11087: -- Attachment: HIVE-11087.2.patch DbTxnManager exceptions should include txnid Key: HIVE-11087 URL: https://issues.apache.org/jira/browse/HIVE-11087 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11087.2.patch, HIVE-11087.patch Must include the txnid in the exception so that the user-visible error can be correlated with log file info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11397) Parse Hive OR clauses as they are written into the AST
[ https://issues.apache.org/jira/browse/HIVE-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650249#comment-14650249 ] Jesus Camacho Rodriguez commented on HIVE-11397: I checked the code and it shouldn't be that difficult to do... Then we need to explore how to actually exploit it (taking into account the Calcite translation, etc). We could check in this one, and create a bigger patch exploring the multiple inputs idea in HIVE-11398. Parse Hive OR clauses as they are written into the AST -- Key: HIVE-11397 URL: https://issues.apache.org/jira/browse/HIVE-11397 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 1.3.0, 2.0.0 Reporter: Gopal V Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11397.1.patch, HIVE-11397.patch When parsing A OR B OR C, Hive converts it into (C OR B) OR A instead of turning it into A OR (B OR C):
{code}
GenericUDFOPOr or = new GenericUDFOPOr();
List<ExprNodeDesc> expressions = new ArrayList<ExprNodeDesc>(2);
expressions.add(previous);
expressions.add(current);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
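For contrast, a right fold over the disjuncts produces the A OR (B OR C) shape the report asks for. The `Expr`/`Leaf`/`Or` types below are illustrative stand-ins for Hive's ExprNodeDesc tree, not its real API; the point is only the fold direction.

```java
import java.util.List;

public class OrChain {
    interface Expr {}
    record Leaf(String name) implements Expr {
        public String toString() { return name; }
    }
    record Or(Expr left, Expr right) implements Expr {
        public String toString() { return "(" + left + " OR " + right + ")"; }
    }

    // Fold from the right so the tree nests to the right, preserving the
    // order the clauses were written in.
    static Expr rightAssoc(List<Expr> disjuncts) {
        Expr result = disjuncts.get(disjuncts.size() - 1);
        for (int i = disjuncts.size() - 2; i >= 0; i--) {
            result = new Or(disjuncts.get(i), result);
        }
        return result;
    }

    public static void main(String[] args) {
        Expr tree = rightAssoc(List.of(new Leaf("A"), new Leaf("B"), new Leaf("C")));
        System.out.println(tree); // (A OR (B OR C))
    }
}
```

The quoted loop in the report instead folds to the left as it walks the parse, which is what yields the reordered, left-leaning tree.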