[jira] [Updated] (HIVE-9120) Hive Query log does not work when hive.exec.parallel is true
[ https://issues.apache.org/jira/browse/HIVE-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Chen updated HIVE-9120: Attachment: HIVE-9120.patch My pleasure. Also thanks to [~chengxiang li] for finding this problem! Patch attached. Changes are: * Move OperationLog from hive-service to hive-exec, in order to avoid a cyclic Maven dependency between these 2 modules * Reset it in Driver and TaskRunner when running in parallel * Related changes caused by the move Hive Query log does not work when hive.exec.parallel is true Key: HIVE-9120 URL: https://issues.apache.org/jira/browse/HIVE-9120 Project: Hive Issue Type: Bug Components: HiveServer2, Logging Reporter: Dong Chen Assignee: Dong Chen Attachments: HIVE-9120.patch When hive.exec.parallel is true, the query log is not saved and Beeline cannot retrieve it. When parallel, Driver.launchTask() may run the task in a new thread if other conditions are also met. TaskRunner.start() is invoked instead of TaskRunner.runSequential(). This causes the thread-local OperationLog variable to be null, so query logs are not logged. The OperationLog object should be set in the new thread in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
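To illustrate the thread-local hand-off described above: when Driver.launchTask() runs a task through TaskRunner.start() instead of TaskRunner.runSequential(), the new thread starts with no OperationLog, so the value has to be captured on the submitting thread and re-installed in the worker. A minimal, self-contained sketch of that pattern (illustrative only, not the attached patch; a plain ThreadLocal<String> stands in for OperationLog's per-thread registry):
{code}
import java.util.concurrent.atomic.AtomicReference;

public class ThreadLocalHandoff {
  // Stand-in for OperationLog's per-thread state.
  private static final ThreadLocal<String> CURRENT_LOG = new ThreadLocal<>();

  public static void main(String[] args) throws InterruptedException {
    CURRENT_LOG.set("operation-log-of-driver-thread");

    // Capture the value on the submitting ("Driver") thread...
    final String parentLog = CURRENT_LOG.get();
    final AtomicReference<String> seenInWorker = new AtomicReference<>();

    Thread taskRunner = new Thread(() -> {
      // ...and re-install it in the worker thread. Without this line the
      // worker sees null, which is the reported behaviour when
      // hive.exec.parallel=true.
      CURRENT_LOG.set(parentLog);
      seenInWorker.set(CURRENT_LOG.get());
    });
    taskRunner.start();
    taskRunner.join();

    System.out.println("worker saw: " + seenInWorker.get());
  }
}
{code}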
[jira] [Commented] (HIVE-9141) HiveOnTez: mix of union all, distinct, group by generates error
[ https://issues.apache.org/jira/browse/HIVE-9141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249615#comment-14249615 ] Vikram Dixit K commented on HIVE-9141: -- [~pxiong] I was able to run the query you mentioned above only getting a diff in the result. Can you try again with the latest changes and see it it works? HiveOnTez: mix of union all, distinct, group by generates error --- Key: HIVE-9141 URL: https://issues.apache.org/jira/browse/HIVE-9141 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Here is the way to produce it: in Hive q test setting (with src table) set hive.execution.engine=tez; SELECT key, value FROM ( SELECT key, value FROM src UNION ALL SELECT key, key as value FROM ( SELECT distinct key FROM ( SELECT key, value FROM (SELECT key, value FROM src UNION ALL SELECT key, value FROM src )t1 group by key, value )t2 )t3 )t4 group by key, value; will generate 2014-12-16 23:19:13,593 ERROR ql.Driver (SessionState.java:printError(834)) - FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:361) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69) at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:419) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1107) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1155) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1034) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:206) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:158) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:369) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:304) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:834) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:136) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_uniontez2(TestMiniTezCliDriver.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249620#comment-14249620 ] Hive QA commented on HIVE-9094: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687682/HIVE-9094.1-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7236 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/560/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/560/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-560/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687682 - PreCommit-HIVE-SPARK-Build TimeoutException when trying get executor count from RSC [Spark Branch] --- Key: HIVE-9094 URL: https://issues.apache.org/jira/browse/HIVE-9094 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-9094.1-spark.patch In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because: {code} 2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at 
org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837) at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234) at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
[jira] [Commented] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249631#comment-14249631 ] Hive QA commented on HIVE-9127: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687603/HIVE-9127.3.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6713 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2103/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2103/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2103/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687603 - PreCommit-HIVE-TRUNK-Build Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. 
Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at
[jira] [Commented] (HIVE-8809) Activate maven profile hadoop-2 by default
[ https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249653#comment-14249653 ] Lefty Leverenz commented on HIVE-8809: -- Doc note: This will require some documentation changes. The 'mvn clean install' command occurs 15 times in the wiki, and 'mvn' occurs 43 times. The string '-Phadoop-1' occurs 24 times and '-Phadoop-2' occurs 12 times. They should all be reviewed for possible revisions, with version notes. These docs contain 'mvn' with '-Phadoop-1' or '-Phadoop-2' (or '-Podbc,hadoop-1'): * Getting Started * Hive Developer FAQ * Hive ODBC * How To Contribute * How To Release Activate maven profile hadoop-2 by default -- Key: HIVE-8809 URL: https://issues.apache.org/jira/browse/HIVE-8809 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Minor Attachments: HIVE-8809.01.patch, HIVE-8809.1.patch, dep_itests_with_hadoop_2.txt, dep_itests_without_hadoop_2.txt, dep_with_hadoop_2.txt, dep_without_hadoop_2.txt For every maven command profile needs to be specified explicitly. It will be better to activate hadoop-2 profile by default as HIVE QA uses hadoop-2 profile. With this change both the following commands will be equivalent {code} mvn clean install -DskipTests mvn clean install -DskipTests -Phadoop-2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9059) Remove wrappers for SparkJobInfo and SparkStageInfo
[ https://issues.apache.org/jira/browse/HIVE-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249654#comment-14249654 ] Hive QA commented on HIVE-9059: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687692/HIVE-9059.2-spark.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 7236 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin_negative2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part1 org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Delimited {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/561/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/561/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-561/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687692 - PreCommit-HIVE-SPARK-Build Remove wrappers for SparkJobInfo and SparkStageInfo --- Key: HIVE-9059 URL: https://issues.apache.org/jira/browse/HIVE-9059 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-9059.1-spark.patch, HIVE-9059.1-spark.patch, HIVE-9059.2-spark.patch SPARK-4567 is resolved. We can remove the wrappers we added to solve the serailization issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9142) Create table stored as ORC with CTAS fails
karthik palanisamy created HIVE-9142: Summary: Create table stored as ORC with CTAS fails Key: HIVE-9142 URL: https://issues.apache.org/jira/browse/HIVE-9142 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.14.0 Environment: Apache Hive 0.14 Reporter: karthik palanisamy Priority: Blocker hive> create table orc_orc stored as orc as select * from tweets; Diagnostic Messages for this Task: Error: org/apache/hadoop/hive/ql/io/orc/OrcProto$RowIndex FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask 14/12/17 14:44:54 [main]: ERROR ql.Driver: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9137) Turn off Hive's PredicateTransitivePropagate optimizer when cbo is on
[ https://issues.apache.org/jira/browse/HIVE-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249670#comment-14249670 ] Hive QA commented on HIVE-9137: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687631/HIVE-9137.patch {color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 6713 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_nulls org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join40 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nullsafe org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_filter_on_outerjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonblock_op_deduplicate org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union27 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_join_nullsafe org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2104/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2104/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2104/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 24 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687631 - PreCommit-HIVE-TRUNK-Build Turn off Hive's PredicateTransitivePropagate optimizer when cbo is on - Key: HIVE-9137 URL: https://issues.apache.org/jira/browse/HIVE-9137 Project: Hive Issue Type: Task Components: CBO, Logical Optimizer Affects Versions: 0.15.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-9137.patch Because Calcite contains rule called {{JoinPushTransitivePredicatesRule}} which does exactly this. So, if cbo is on, this optimization would have already taken place and we won't gain anything by running this again. 
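A minimal sketch of the guard this implies (illustrative only, not the attached patch): the logical optimizer would add PredicateTransitivePropagate to its pass list only when CBO did not run, since otherwise Calcite's JoinPushTransitivePredicatesRule has already inferred the transitive predicates.
{code}
import java.util.ArrayList;
import java.util.List;

public class TransformSelection {
  // Returns the ordered names of logical optimizer passes to run.
  static List<String> chooseTransforms(boolean cboSucceeded) {
    List<String> transforms = new ArrayList<>();
    if (!cboSucceeded) {
      // Only needed when Calcite's JoinPushTransitivePredicatesRule did not run.
      transforms.add("PredicateTransitivePropagate");
    }
    transforms.add("PredicatePushDown");
    return transforms;
  }

  public static void main(String[] args) {
    System.out.println(chooseTransforms(true));  // [PredicatePushDown]
    System.out.println(chooseTransforms(false)); // [PredicateTransitivePropagate, PredicatePushDown]
  }
}
{code}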
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8988) Support advanced aggregation in Hive to Calcite path
[ https://issues.apache.org/jira/browse/HIVE-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8988: -- Attachment: HIVE-8988.04.patch Patch after HIVE-9129 has been applied to the trunk. Support advanced aggregation in Hive to Calcite path - Key: HIVE-8988 URL: https://issues.apache.org/jira/browse/HIVE-8988 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Labels: grouping, logical, optiq Fix For: 0.15.0 Attachments: HIVE-8988.01.patch, HIVE-8988.02.patch, HIVE-8988.03.patch, HIVE-8988.04.patch, HIVE-8988.patch CLEAR LIBRARY CACHE To close the gap between Hive and Calcite, we need to support the translation of GroupingSets into Calcite; currently this is not implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8972) Implement more fine-grained remote client-level events [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249704#comment-14249704 ] Hive QA commented on HIVE-8972: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687696/HIVE-8972.4-spark.patch {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 7236 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join10 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_10 org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/562/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/562/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-562/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687696 - PreCommit-HIVE-SPARK-Build Implement more fine-grained remote client-level events [Spark Branch] - Key: HIVE-8972 URL: https://issues.apache.org/jira/browse/HIVE-8972 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8972.1-spark.patch, HIVE-8972.2-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.4-spark.patch Follow up task of HIVE-8956. Fine-grained events are useful for better job monitor and failure handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9138) Add some explain to PTF operator
[ https://issues.apache.org/jira/browse/HIVE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249727#comment-14249727 ] Hive QA commented on HIVE-9138: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687635/HIVE-9138.1.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6713 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2105/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2105/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2105/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687635 - PreCommit-HIVE-TRUNK-Build Add some explain to PTF operator Key: HIVE-9138 URL: https://issues.apache.org/jira/browse/HIVE-9138 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-9138.1.patch.txt PTFOperator does not explain anything in explain statement, making it hard to understand the internal works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8972) Implement more fine-grained remote client-level events [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-8972: - Attachment: HIVE-8972.5-spark.patch Try again. The failures {{union_remove_10}} and {{join10}} are all due to timeout getting cluster infos, which seems unrelated to the patch. {noformat} 2014-12-17 02:24:30,458 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837) at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234) at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_10(TestSparkCliDriver.java:210) .. {noformat} Implement more fine-grained remote client-level events [Spark Branch] - Key: HIVE-8972 URL: https://issues.apache.org/jira/browse/HIVE-8972 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8972.1-spark.patch, HIVE-8972.2-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.4-spark.patch, HIVE-8972.5-spark.patch Follow up task of HIVE-8956. Fine-grained events are useful for better job monitor and failure handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 28941: HIVE-8988
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28941/ --- (Updated Dec. 17, 2014, 12:11 p.m.) Review request for hive, John Pullokkaran and Julian Hyde. Changes --- Latest patch after CBO enabled and dependencies on Calcite have been solved. Bugs: HIVE-8988 https://issues.apache.org/jira/browse/HIVE-8988 Repository: hive-git Description --- HIVE-8988 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveGroupingID.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java c02a65e2041e4742a56cf4a935da0a7c04d18fdb ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 29be69182667dbe2070bd092bf75b4bb97101554 ql/src/test/queries/clientpositive/groupby_cube1.q c12720b27059075050fc92d9f31420c081303699 ql/src/test/results/clientpositive/groupby_cube1.q.out 7b5d70ae8ffce47a4b351ed9dfedcd15ab1e139c Diff: https://reviews.apache.org/r/28941/diff/ Testing --- Thanks, Jesús Camacho Rodríguez
[jira] [Commented] (HIVE-9140) Add ReduceExpressionRules from Calcite into Hive
[ https://issues.apache.org/jira/browse/HIVE-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249793#comment-14249793 ] Hive QA commented on HIVE-9140: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687667/HIVE-9140.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6713 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2106/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2106/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2106/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687667 - PreCommit-HIVE-TRUNK-Build Add ReduceExpressionRules from Calcite into Hive Key: HIVE-9140 URL: https://issues.apache.org/jira/browse/HIVE-9140 Project: Hive Issue Type: Improvement Components: CBO, Logical Optimizer Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-9140.patch These rules provide a form of constant folding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9113) Explain on query failed with NPE
[ https://issues.apache.org/jira/browse/HIVE-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249865#comment-14249865 ] Hive QA commented on HIVE-9113: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687671/HIVE-9113.1.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2107/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2107/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2107/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687671 - PreCommit-HIVE-TRUNK-Build Explain on query failed with NPE Key: HIVE-9113 URL: https://issues.apache.org/jira/browse/HIVE-9113 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Chao Assignee: Navis Attachments: HIVE-9113.1.patch.txt Run explain on the following query: {noformat} select p.p_partkey, li.l_suppkey from (select distinct l_partkey as p_partkey from lineitem) p join lineitem li on p.p_partkey = li.l_partkey where li.l_linenumber = 1 and li.l_orderkey in (select l_orderkey where l_linenumber = li.l_linenumber) ; {noformat} gave me NPE: {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.parse.QBSubQuery.validateAndRewriteAST(QBSubQuery.java:516) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:2605) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8866) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9745) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9638) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10125) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:720) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:639) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:578) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {noformat} Is this query invalid? If so, it should at least give some explanation rather than a plain NPE message that leaves the user clueless. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7024) Escape control characters for explain result
[ https://issues.apache.org/jira/browse/HIVE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249870#comment-14249870 ] Hive QA commented on HIVE-7024: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687684/HIVE-7024.5.patch.txt Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2108/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2108/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2108/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-2108/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/scheduler/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target accumulo-handler/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target ql/src/test/results/clientnegative/subquery_missing_from.q.out ql/src/test/queries/clientnegative/subquery_missing_from.q + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1646247. At revision 1646247. 
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12687684 - PreCommit-HIVE-TRUNK-Build Escape control characters for explain result Key: HIVE-7024 URL: https://issues.apache.org/jira/browse/HIVE-7024 Project: Hive Issue Type: Bug Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7024.1.patch.txt, HIVE-7024.2.patch.txt, HIVE-7024.3.patch.txt, HIVE-7024.4.patch.txt, HIVE-7024.5.patch.txt Comments for columns are now delimited by 0x00, which is binary and makes git refuse to produce a proper diff file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
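For illustration, one straightforward way to escape non-printable control characters (such as the 0x00 column-comment delimiter mentioned in the description) so that explain output stays plain text; this is a sketch of the general technique, not the attached patch:
{code}
public class ControlCharEscaper {
  // Replaces control characters (except tab, newline and carriage return)
  // with visible \\uXXXX escapes so the output is no longer binary.
  static String escapeControlChars(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
        sb.append(String.format("\\u%04X", (int) c));
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String comment = "first column comment\u0000second column comment";
    System.out.println(escapeControlChars(comment));
    // prints: first column comment\u0000second column comment
  }
}
{code}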
[jira] [Created] (HIVE-9143) select user(), current_user()
Hari Sekhon created HIVE-9143: - Summary: select user(), current_user() Key: HIVE-9143 URL: https://issues.apache.org/jira/browse/HIVE-9143 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Hari Sekhon Priority: Minor Feature request to add support for determining in SQL session which user I am currently connected as - an old MySQL ability: {code}mysql> select user(), current_user(); +----------------+----------------+ | user() | current_user() | +----------------+----------------+ | root@localhost | root@localhost | +----------------+----------------+ 1 row in set (0.00 sec) {code} which doesn't seem to have a counterpart in Hive at this time: {code}0: jdbc:hive2://host:100> select user(); Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 Invalid function 'user' (state=42000,code=4) 0: jdbc:hive2://host:100> select current_user(); Error: Error while compiling statement: FAILED: SemanticException [Error 10011]: Line 1:7 Invalid function 'current_user' (state=42000,code=10011){code} Regards, Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9143) select user(), current_user()
[ https://issues.apache.org/jira/browse/HIVE-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sekhon updated HIVE-9143: -- Description: Feature request to add support for determining in HQL session which user I am currently connected as - an old MySQL ability: {code}mysql> select user(), current_user(); +----------------+----------------+ | user() | current_user() | +----------------+----------------+ | root@localhost | root@localhost | +----------------+----------------+ 1 row in set (0.00 sec) {code} which doesn't seem to have a counterpart in Hive at this time: {code}0: jdbc:hive2://host:100> select user(); Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 Invalid function 'user' (state=42000,code=4) 0: jdbc:hive2://host:100> select current_user(); Error: Error while compiling statement: FAILED: SemanticException [Error 10011]: Line 1:7 Invalid function 'current_user' (state=42000,code=10011){code} Regards, Hari Sekhon http://www.linkedin.com/in/harisekhon was: Feature request to add support for determining in SQL session which user I am currently connected as - an old MySQL ability: {code}mysql> select user(), current_user(); +----------------+----------------+ | user() | current_user() | +----------------+----------------+ | root@localhost | root@localhost | +----------------+----------------+ 1 row in set (0.00 sec) {code} which doesn't seem to have a counterpart in Hive at this time: {code}0: jdbc:hive2://host:100> select user(); Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 Invalid function 'user' (state=42000,code=4) 0: jdbc:hive2://host:100> select current_user(); Error: Error while compiling statement: FAILED: SemanticException [Error 10011]: Line 1:7 Invalid function 'current_user' (state=42000,code=10011){code} Regards, Hari Sekhon http://www.linkedin.com/in/harisekhon select user(), current_user() - Key: HIVE-9143 URL: https://issues.apache.org/jira/browse/HIVE-9143 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Hari Sekhon Priority: Minor Feature request to add support for determining in HQL session which user I am currently connected as - an old MySQL ability: {code}mysql> select user(), current_user(); +----------------+----------------+ | user() | current_user() | +----------------+----------------+ | root@localhost | root@localhost | +----------------+----------------+ 1 row in set (0.00 sec) {code} which doesn't seem to have a counterpart in Hive at this time: {code}0: jdbc:hive2://host:100> select user(); Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 Invalid function 'user' (state=42000,code=4) 0: jdbc:hive2://host:100> select current_user(); Error: Error while compiling statement: FAILED: SemanticException [Error 10011]: Line 1:7 Invalid function 'current_user' (state=42000,code=10011){code} Regards, Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9144) Beeline + Kerberos shouldn't prompt for unused username + password
Hari Sekhon created HIVE-9144: - Summary: Beeline + Kerberos shouldn't prompt for unused username + password Key: HIVE-9144 URL: https://issues.apache.org/jira/browse/HIVE-9144 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.13.0 Environment: Hive 0.13 on MapR 4.0.1 Reporter: Hari Sekhon Priority: Minor When using beeline to connect to a kerberized HiveServer2 it still prompts for a username and password that aren't used. It should be changed to not prompt when using Kerberos: {code}/opt/mapr/hive/hive-0.13/bin/beeline Beeline version 0.13.0-mapr-1409 by Apache Hive beeline> !connect jdbc:hive2://host:1/default;principal=hive/host@REALM scan complete in 6ms Connecting to jdbc:hive2://host:1/default;principal=hive/host@REALM Enter username for jdbc:hive2://lonsl1101975.uk.net.intra:1/default;principal=hive/host@REALM: wronguser Enter password for jdbc:hive2://host:1/default;principal=hive/host@REALM: <enter> Connected to: Apache Hive (version 0.13.0-mapr-1409) Driver: Hive JDBC (version 0.13.0-mapr-1409) Transaction isolation: TRANSACTION_REPEATABLE_READ {code} Hive conf includes (as concisely shown by set): {code}hive.server2.authentication = KERBEROS hive.server2.enable.doAs = true hive.server2.enable.impersonation = true {code} I can't see how to demonstrate in HQL session that I am not connected as wronguser (which obviously doesn't exist either locally or as a Kerberos principal or account in my LDAP directory), so I've raised another ticket for that HIVE-9143, but it should be clear given I specified a non-existent user and a completely blank password just hitting enter that it's not using those credentials. Same happens with <enter>, <enter> for both username and password. Regards, Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
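A minimal sketch of the requested behaviour (illustrative only, not Beeline's actual code): suppress the username/password prompts when the HiveServer2 JDBC URL already carries a Kerberos principal, since those credentials are ignored for Kerberos connections.
{code}
public class PromptDecision {
  // Kerberized HiveServer2 URLs carry "principal=..." in the connection string.
  static boolean shouldPromptForCredentials(String jdbcUrl) {
    return !jdbcUrl.toLowerCase().contains("principal=");
  }

  public static void main(String[] args) {
    String kerberosUrl = "jdbc:hive2://example-host:10000/default;principal=hive/example-host@REALM";
    String plainUrl = "jdbc:hive2://example-host:10000/default";
    System.out.println(shouldPromptForCredentials(kerberosUrl)); // false
    System.out.println(shouldPromptForCredentials(plainUrl));    // true
  }
}
{code}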
[jira] [Commented] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249916#comment-14249916 ] Xuefu Zhang commented on HIVE-9127: --- +1. Please modify the query if the patch is going to apply to trunk. Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:301) 2014-12-16 14:36:22,203 INFO
[jira] [Updated] (HIVE-9144) Beeline + Kerberos shouldn't prompt for unused username + password
[ https://issues.apache.org/jira/browse/HIVE-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sekhon updated HIVE-9144: -- Description: When using beeline to connect to a kerberized HiveServer2 it still prompts for a username and password that aren't used. It should be changed to not prompt when using Kerberos: {code}/opt/mapr/hive/hive-0.13/bin/beeline Beeline version 0.13.0-mapr-1409 by Apache Hive beeline !connect jdbc:hive2://host:1/default;principal=hive/host@REALM scan complete in 6ms Connecting to jdbc:hive2://host:1/default;principal=hive/host@REALM Enter username for jdbc:hive2://host:1/default;principal=hive/host@REALM: wronguser Enter password for jdbc:hive2://host:1/default;principal=hive/host@REALM: enter Connected to: Apache Hive (version 0.13.0-mapr-1409) Driver: Hive JDBC (version 0.13.0-mapr-1409) Transaction isolation: TRANSACTION_REPEATABLE_READ {code} Hive conf includes (as concisely shown by set): {code}hive.server2.authentication = KERBEROS hive.server2.enable.doAs = true hive.server2.enable.impersonation = true {code} I can't see how to demonstrate in HQL session that I am not connected as wronguser (which obviously doesn't exist either locally or as a Kerberos principal or account in my LDAP directory), so I've raised another ticket for that HIVE-9143, but it should be clear given I specifed a non-existent user and a completely blank password just hitting enter that it's not using those credentials. Same happens with enter, enter for both username and password. Regards, Hari Sekhon http://www.linkedin.com/in/harisekhon was: When using beeline to connect to a kerberized HiveServer2 it still prompts for a username and password that aren't used. It should be changed to not prompt when using Kerberos: {code}/opt/mapr/hive/hive-0.13/bin/beeline Beeline version 0.13.0-mapr-1409 by Apache Hive beeline !connect jdbc:hive2://host:1/default;principal=hive/host@REALM scan complete in 6ms Connecting to jdbc:hive2://host:1/default;principal=hive/host@REALM Enter username for jdbc:hive2://lonsl1101975.uk.net.intra:1/default;principal=hive/host@REALM: wronguser Enter password for jdbc:hive2://host:1/default;principal=hive/host@REALM: enter Connected to: Apache Hive (version 0.13.0-mapr-1409) Driver: Hive JDBC (version 0.13.0-mapr-1409) Transaction isolation: TRANSACTION_REPEATABLE_READ {code} Hive conf includes (as concisely shown by set): {code}hive.server2.authentication = KERBEROS hive.server2.enable.doAs = true hive.server2.enable.impersonation = true {code} I can't see how to demonstrate in HQL session that I am not connected as wronguser (which obviously doesn't exist either locally or as a Kerberos principal or account in my LDAP directory), so I've raised another ticket for that HIVE-9143, but it should be clear given I specifed a non-existent user and a completely blank password just hitting enter that it's not using those credentials. Same happens with enter, enter for both username and password. Regards, Hari Sekhon http://www.linkedin.com/in/harisekhon Beeline + Kerberos shouldn't prompt for unused username + password -- Key: HIVE-9144 URL: https://issues.apache.org/jira/browse/HIVE-9144 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.13.0 Environment: Hive 0.13 on MapR 4.0.1 Reporter: Hari Sekhon Priority: Minor When using beeline to connect to a kerberized HiveServer2 it still prompts for a username and password that aren't used. 
It should be changed to not prompt when using Kerberos: {code}/opt/mapr/hive/hive-0.13/bin/beeline Beeline version 0.13.0-mapr-1409 by Apache Hive beeline !connect jdbc:hive2://host:1/default;principal=hive/host@REALM scan complete in 6ms Connecting to jdbc:hive2://host:1/default;principal=hive/host@REALM Enter username for jdbc:hive2://host:1/default;principal=hive/host@REALM: wronguser Enter password for jdbc:hive2://host:1/default;principal=hive/host@REALM: enter Connected to: Apache Hive (version 0.13.0-mapr-1409) Driver: Hive JDBC (version 0.13.0-mapr-1409) Transaction isolation: TRANSACTION_REPEATABLE_READ {code} Hive conf includes (as concisely shown by set): {code}hive.server2.authentication = KERBEROS hive.server2.enable.doAs = true hive.server2.enable.impersonation = true {code} I can't see how to demonstrate in HQL session that I am not connected as wronguser (which obviously doesn't exist either locally or as a Kerberos principal or account in my LDAP directory), so I've raised another ticket for that HIVE-9143, but it should be clear given I specifed a non-existent user and a completely blank password just hitting enter that it's not using those credentials. Same happens
Re: Review Request 29145: HIVE-9094 TimeoutException when trying get executor count from RSC [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/#review65323 --- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/29145/#comment108440 If the same timeout is used for multiple rpc calls, then the description here might need to be updated. - Xuefu Zhang On Dec. 17, 2014, 6:28 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/ --- (Updated Dec. 17, 2014, 6:28 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9094 https://issues.apache.org/jira/browse/HIVE-9094 Repository: hive-git Description --- RemoteHiveSparkClient::getExecutorCount timeout after 5s as Spark cluster has not launched yet 1. set the timeout value configurable. 2. set default timeout value 60s. 3. enable timeout for get spark job info and get spark stage info. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 22f052a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 5d6a02c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java e1946d5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 6217de4 Diff: https://reviews.apache.org/r/29145/diff/ Testing --- Thanks, chengxiang li
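As a rough sketch of the change under review (not the actual patch): read the timeout from configuration rather than hard-coding 5 seconds, and apply it to the Future returned by the remote Spark client. The property name below is made up for illustration; the real HiveConf variable added by HIVE-9094 may be named differently.
{code}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.conf.Configuration;

public class ConfigurableRpcTimeout {
  // Wait for an RSC result using a configurable timeout with a 60s default.
  static <T> T getWithTimeout(Future<T> future, Configuration conf)
      throws ExecutionException, InterruptedException, TimeoutException {
    // Hypothetical property name, for illustration only.
    long timeoutSec = conf.getLong("hive.spark.client.future.timeout.seconds", 60L);
    return future.get(timeoutSec, TimeUnit.SECONDS);
  }
}
{code}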
[jira] [Commented] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249932#comment-14249932 ] Xuefu Zhang commented on HIVE-9094: --- Minor comments on RB. [~vanzin], could you also take a look? TimeoutException when trying get executor count from RSC [Spark Branch] --- Key: HIVE-9094 URL: https://issues.apache.org/jira/browse/HIVE-9094 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-9094.1-spark.patch In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because: {code} 2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837) at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234) at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at 
junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at
Re: Review Request 29147: HIVE-9059 Remove wrappers for SparkJobInfo and SparkStageInfo
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29147/#review65324 --- Ship it! Ship It! - Xuefu Zhang On Dec. 17, 2014, 7:29 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29147/ --- (Updated Dec. 17, 2014, 7:29 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9059 https://issues.apache.org/jira/browse/HIVE-9059 Repository: hive-git Description --- SPARK-4567 is resolved. We can remove the wrappers we added to solve the serialization issues. Diffs - pom.xml b3a22b5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 6217de4 spark-client/src/main/java/org/apache/hive/spark/client/rpc/KryoMessageCodec.java 437d61d spark-client/src/main/java/org/apache/hive/spark/client/status/HiveSparkJobInfo.java 8ea6969 spark-client/src/main/java/org/apache/hive/spark/client/status/HiveSparkStageInfo.java dfbb01e Diff: https://reviews.apache.org/r/29147/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-9059) Remove wrappers for SparkJobInfo and SparkStageInfo
[ https://issues.apache.org/jira/browse/HIVE-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249942#comment-14249942 ] Xuefu Zhang commented on HIVE-9059: --- +1 Remove wrappers for SparkJobInfo and SparkStageInfo --- Key: HIVE-9059 URL: https://issues.apache.org/jira/browse/HIVE-9059 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-9059.1-spark.patch, HIVE-9059.1-spark.patch, HIVE-9059.2-spark.patch SPARK-4567 is resolved. We can remove the wrappers we added to solve the serialization issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8972) Implement more fine-grained remote client-level events [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249959#comment-14249959 ] Hive QA commented on HIVE-8972: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687722/HIVE-8972.5-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7236 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/563/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/563/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-563/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687722 - PreCommit-HIVE-SPARK-Build Implement more fine-grained remote client-level events [Spark Branch] - Key: HIVE-8972 URL: https://issues.apache.org/jira/browse/HIVE-8972 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8972.1-spark.patch, HIVE-8972.2-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.4-spark.patch, HIVE-8972.5-spark.patch Follow up task of HIVE-8956. Fine-grained events are useful for better job monitor and failure handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9076) incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check
[ https://issues.apache.org/jira/browse/HIVE-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249976#comment-14249976 ] Hive QA commented on HIVE-9076: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687687/HIVE-9076.4.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2109/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2109/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2109/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687687 - PreCommit-HIVE-TRUNK-Build incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check --- Key: HIVE-9076 URL: https://issues.apache.org/jira/browse/HIVE-9076 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-9076.1.patch.txt, HIVE-9076.2.patch.txt, HIVE-9076.3.patch.txt, HIVE-9076.4.patch.txt In some file composition, AbstractFileMergeOperator removes incompatible files. For example, {noformat} 00_0 (v12) 00_0_copy_1 (v12) 00_1 (v11) 00_1_copy_1 (v11) 00_1_copy_2 (v11) 00_2 (v12) {noformat} 00_1 (v11) will be removed because 00 is assigned to new merged file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
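To make the failure mode concrete, here is an illustration (not Hive's merge code) of the task-id collision described above: duplicate-file cleanup keeps one output per task-id prefix, so once the merged output claims task id 00, the side-lined incompatible file 00_1 maps to the same key and is dropped unless it is flagged to skip the task id check.
{code}
import java.util.HashMap;
import java.util.Map;

public class TaskIdCollision {
  // Task id is the prefix before the first '_' in the file name.
  static String taskIdOf(String fileName) {
    int idx = fileName.indexOf('_');
    return idx < 0 ? fileName : fileName.substring(0, idx);
  }

  public static void main(String[] args) {
    Map<String, String> keptPerTaskId = new HashMap<>();
    for (String f : new String[] {"00_0" /* merged output, v12 */, "00_1" /* incompatible, v11 */}) {
      // one surviving file per task id -> the incompatible file is silently lost
      keptPerTaskId.putIfAbsent(taskIdOf(f), f);
    }
    System.out.println(keptPerTaskId); // {00=00_0} -- 00_1 disappears
  }
}
{code}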
[jira] [Created] (HIVE-9145) authorization_admin_almighty1.q fails with result diff [Spark Branch]
Xuefu Zhang created HIVE-9145: - Summary: authorization_admin_almighty1.q fails with result diff [Spark Branch] Key: HIVE-9145 URL: https://issues.apache.org/jira/browse/HIVE-9145 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang HIVE-7979 enabled this test. However, the test result seems to have a timestamp that depends on the date when the test runs, which makes the test fail. The same test on trunk gives -1 for the timestamp value and thus passes all the time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9130) vector_partition_diff_num_cols result is not updated after CBO upgrade
[ https://issues.apache.org/jira/browse/HIVE-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250041#comment-14250041 ] Sergey Shelukhin commented on HIVE-9130: Thanks! vector_partition_diff_num_cols result is not updated after CBO upgrade --- Key: HIVE-9130 URL: https://issues.apache.org/jira/browse/HIVE-9130 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Trivial Fix For: 0.15.0 Attachments: HIVE-9130.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-8843: -- Attachment: HIVE-8843.3-spark.patch Attached v3 again to re-run the tests. Release RDD cache when Hive query is done [Spark Branch] Key: HIVE-8843 URL: https://issues.apache.org/jira/browse/HIVE-8843 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang Attachments: HIVE-8843.1-spark.patch, HIVE-8843.2-spark.patch, HIVE-8843.3-spark.patch, HIVE-8843.3-spark.patch In some multi-insert cases, RDD.cache() is called to improve performance. RDD is SparkContext specific, but the caching is useful only for the query. Thus, once the query is executed, we need to release the cache used by calling RDD.uncache(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
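In Spark's Java API the call that actually releases a cached RDD is unpersist(); a minimal sketch of the idea (not the attached patch) is to remember every RDD the query cached for multi-insert reuse and unpersist them once the query completes.
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;

public class QueryRddCacheTracker {
  private final List<JavaRDD<?>> cached = new ArrayList<>();

  // Cache an RDD for reuse within this query and remember it for cleanup.
  <T> JavaRDD<T> cache(JavaRDD<T> rdd) {
    rdd.cache();
    cached.add(rdd);
    return rdd;
  }

  // Called when the query finishes: release every cached RDD.
  void queryDone() {
    for (JavaRDD<?> rdd : cached) {
      rdd.unpersist(false); // non-blocking release of the cached blocks
    }
    cached.clear();
  }
}
{code}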
[jira] [Assigned] (HIVE-9135) Cache Map and Reduce works in RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HIVE-9135: - Assignee: Jimmy Xiang Cache Map and Reduce works in RSC [Spark Branch] Key: HIVE-9135 URL: https://issues.apache.org/jira/browse/HIVE-9135 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Jimmy Xiang HIVE-9127 works around the fact that we don't cache Map/Reduce works in Spark. However, other input formats such as HiveInputFormat will not benefit from that fix. We should investigate how to allow caching on the RSC while not on tasks (see HIVE-7431). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
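One possible shape for that investigation, sketched with made-up names: a per-process cache of deserialized plan objects keyed by plan path, consulted only in the remote driver (RSC) during split generation and bypassed inside executor tasks, where stale cached plans caused the HIVE-7431 failures.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class RscPlanCache<W> {
  private final Map<String, W> cache = new ConcurrentHashMap<>();
  private final boolean insideTask; // true when running in an executor task

  public RscPlanCache(boolean insideTask) {
    this.insideTask = insideTask;
  }

  public W get(String planPath, Function<String, W> loader) {
    if (insideTask) {
      return loader.apply(planPath);              // never cache on tasks
    }
    return cache.computeIfAbsent(planPath, loader); // cache only on the RSC
  }
}
{code}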
[jira] [Updated] (HIVE-9059) Remove wrappers for SparkJobInfo and SparkStageInfo [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9059: -- Summary: Remove wrappers for SparkJobInfo and SparkStageInfo [Spark Branch] (was: Remove wrappers for SparkJobInfo and SparkStageInfo) Remove wrappers for SparkJobInfo and SparkStageInfo [Spark Branch] -- Key: HIVE-9059 URL: https://issues.apache.org/jira/browse/HIVE-9059 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-9059.1-spark.patch, HIVE-9059.1-spark.patch, HIVE-9059.2-spark.patch SPARK-4567 is resolved. We can remove the wrappers we added to solve the serialization issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9059) Remove wrappers for SparkJobInfo and SparkStageInfo [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9059: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks, Chengxiang. Remove wrappers for SparkJobInfo and SparkStageInfo [Spark Branch] -- Key: HIVE-9059 URL: https://issues.apache.org/jira/browse/HIVE-9059 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-9059.1-spark.patch, HIVE-9059.1-spark.patch, HIVE-9059.2-spark.patch SPARK-4567 is resolved. We can remove the wrappers we added to solve the serialization issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8848) data loading from text files or text file processing doesn't handle nulls correctly
[ https://issues.apache.org/jira/browse/HIVE-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250085#comment-14250085 ] Sergey Shelukhin commented on HIVE-8848: +1 data loading from text files or text file processing doesn't handle nulls correctly --- Key: HIVE-8848 URL: https://issues.apache.org/jira/browse/HIVE-8848 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Navis Attachments: HIVE-8848.01.patch, HIVE-8848.2.patch.txt, HIVE-8848.3.patch.txt, HIVE-8848.4.patch.txt, HIVE-8848.patch I am not sure how nulls are supposed to be stored in text tables, but after loading some data with null or NULL strings, or \x00 characters, we get a bunch of annoying logging from LazyPrimitive that data is not in INT format and was converted to null, with data being null (string saying null, I assume from the code). Either the load should load them as nulls, or there should be some defined way to load nulls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
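For context, Hive's text SerDes treat the marker \N (configurable per table via the null-format SerDe property) as NULL, while the literal strings null and NULL are ordinary data, which is why LazyPrimitive logs a conversion-to-null warning for each such value in an INT column. A small standalone illustration of that distinction (not Hive's code) follows.
{code}
public class TextNullMarker {
  // Parse an INT column from a text file field, given the table's null marker.
  static Integer parseIntColumn(String field, String nullMarker) {
    if (field.equals(nullMarker)) {
      return null;                       // real NULL, no warning expected
    }
    try {
      return Integer.valueOf(field);
    } catch (NumberFormatException e) {
      // what the reporter sees: "data is not in INT format", converted to null
      return null;
    }
  }

  public static void main(String[] args) {
    String marker = "\\N";               // Hive's default null marker for text files
    System.out.println(parseIntColumn("\\N", marker));   // null, silently
    System.out.println(parseIntColumn("NULL", marker));  // null, with a warning in Hive
  }
}
{code}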
[jira] [Commented] (HIVE-8406) Research on skewed join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250088#comment-14250088 ] Xuefu Zhang commented on HIVE-8406: --- [~leftylev], this is just a research task. It doesn't seem to need any doc. Research on skewed join [Spark Branch] -- Key: HIVE-8406 URL: https://issues.apache.org/jira/browse/HIVE-8406 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: Skew join background.pdf Research on how to handle skewed join for Hive on Spark. Here is the original Hive design doc for skewed join, https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7977) Avoid creating serde for partitions if possible in FetchTask
[ https://issues.apache.org/jira/browse/HIVE-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250092#comment-14250092 ] Ashutosh Chauhan commented on HIVE-7977: Don't think so. If the problem persists, maybe just create a new RB instead of updating the previous one. Avoid creating serde for partitions if possible in FetchTask Key: HIVE-7977 URL: https://issues.apache.org/jira/browse/HIVE-7977 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7977.1.patch.txt, HIVE-7977.2.patch.txt, HIVE-7977.3.patch.txt, HIVE-7977.4.patch.txt, HIVE-7977.5.patch.txt, HIVE-7977.6.patch.txt Currently, FetchTask creates a SerDe instance thrice for each partition, which can be avoided if it's the same as the table SerDe. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
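A minimal sketch of the optimization described above, with hypothetical names: build a partition-specific deserializer only when the partition differs from the table in SerDe class or properties, and otherwise hand back the single table-level instance.
{code}
import java.util.Objects;
import java.util.Properties;
import java.util.function.Supplier;

public class SerDeReuse {
  // Reuse the table SerDe when the partition matches it; otherwise build a new one.
  static <D> D serDeForPartition(String tableSerDeClass, Properties tableProps, D tableSerDe,
                                 String partSerDeClass, Properties partProps,
                                 Supplier<D> partSerDeFactory) {
    boolean sameAsTable = Objects.equals(tableSerDeClass, partSerDeClass)
        && Objects.equals(tableProps, partProps);
    return sameAsTable ? tableSerDe : partSerDeFactory.get();
  }
}
{code}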
[jira] [Commented] (HIVE-9120) Hive Query log does not work when hive.exec.parallel is true
[ https://issues.apache.org/jira/browse/HIVE-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250098#comment-14250098 ] Hive QA commented on HIVE-9120: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687697/HIVE-9120.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2110/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2110/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2110/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687697 - PreCommit-HIVE-TRUNK-Build Hive Query log does not work when hive.exec.parallel is true Key: HIVE-9120 URL: https://issues.apache.org/jira/browse/HIVE-9120 Project: Hive Issue Type: Bug Components: HiveServer2, Logging Reporter: Dong Chen Assignee: Dong Chen Attachments: HIVE-9120.patch When hive.exec.parallel is true, the query log is not saved and Beeline can not retrieve it. When parallel, Driver.launchTask() may run the task in a new thread if other conditions are also on. TaskRunner.start() is invoked instead of TaskRunner.runSequential(). This cause the threadlocal variable OperationLog to be null and query logs are not logged. The OperationLog object should be set in the new thread in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order
Kamil Gorlo created HIVE-9146: - Summary: Query with left joins produces wrong result when join condition is written in different order Key: HIVE-9146 URL: https://issues.apache.org/jira/browse/HIVE-9146 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Kamil Gorlo I have two queries which should be equal (I only swap two join conditions) but they are not. They are simplest queries I could produce to reproduce bug. I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | And when I run this query (query no. 1): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; {quote} I get result (which is correct): | log.id | log.dest_id | com1.msgs | com2.msgs | | 1 | 2| 1 | 1 | | 1 | 3| 1 | NULL | | 1 | 5| NULL | NULL | | 3 | 1| NULL | 1 | But when I run second query (query no. 2): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id; {quote} I got different (and bad, in my opinion) result: |log.id | log.dest_id | com1.msgs | com2.msgs| |1|2|1|1| |1|3|1|1| |1|5|NULL|NULL| |3|1|NULL|NULL| Query no. 1 and query no. 2 are different in only one place, it is second join condition: bf. com2.dest_id=log.id and com2.id=log.dest_id vs bf. com2.id=log.dest_id and com2.dest_id=log.id which in my opinion are equal. Explains for both queries are of course slightly different (columns are swapped) and they are here: https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good) https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order
[ https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamil Gorlo updated HIVE-9146: -- Description: I have two queries which should be equal (I only swap two join conditions) but they are not. They are simplest queries I could produce to reproduce bug. I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | And when I run this query (query no. 1): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; {quote} I get result (which is correct): | log.id | log.dest_id | com1.msgs | com2.msgs | | 1 | 2| 1 | 1 | | 1 | 3| 1 | NULL | | 1 | 5| NULL | NULL | | 3 | 1| NULL | 1 | But when I run second query (query no. 2): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id; {quote} I get different (and bad, in my opinion) result: |log.id | log.dest_id | com1.msgs | com2.msgs| |1|2|1|1| |1|3|1|1| |1|5|NULL|NULL| |3|1|NULL|NULL| Query no. 1 and query no. 2 are different in only one place, it is second join condition: bf. com2.dest_id=log.id and com2.id=log.dest_id vs bf. com2.id=log.dest_id and com2.dest_id=log.id which in my opinion are equal. Explains for both queries are of course slightly different (columns are swapped) and they are here: https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good) https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad) was: I have two queries which should be equal (I only swap two join conditions) but they are not. They are simplest queries I could produce to reproduce bug. I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | And when I run this query (query no. 
1): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; {quote} I get result (which is correct): | log.id | log.dest_id | com1.msgs | com2.msgs | | 1 | 2| 1 | 1 | | 1 | 3| 1 | NULL | | 1 | 5| NULL | NULL | | 3 | 1|
[jira] [Commented] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250104#comment-14250104 ] Brock Noland commented on HIVE-9127: Thank you Xuefu! bq. Please modify the query if the patch is going to apply to trunk. I don't follow? The latest patch applies to trunk and was tested on trunk. Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 
14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at
[jira] [Updated] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9127: --- Affects Version/s: (was: spark-branch) 0.14.0 Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 
2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:301) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl
[jira] [Assigned] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao reassigned HIVE-9136: -- Assignee: Chao Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
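A bare-bones version of the counters being proposed; Hive's PerfLogger could equally host them. The names below are illustrative, and the lambda stands in for a call to Driver.compile().
{code}
import java.util.concurrent.TimeUnit;

public class CompileTimer {
  interface Compiler { void compile(String query); }

  // Measure wall-clock time spent compiling a single query.
  static long timeCompileMillis(Compiler compiler, String query) {
    long start = System.nanoTime();
    compiler.compile(query);
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  public static void main(String[] args) {
    long ms = timeCompileMillis(q -> { /* invoke Driver.compile(q) here */ }, "select 1");
    System.out.println("compile took " + ms + " ms");
  }
}
{code}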
[jira] [Commented] (HIVE-9120) Hive Query log does not work when hive.exec.parallel is true
[ https://issues.apache.org/jira/browse/HIVE-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250128#comment-14250128 ] Brock Noland commented on HIVE-9120: +1, thank you [~dongc]! Hive Query log does not work when hive.exec.parallel is true Key: HIVE-9120 URL: https://issues.apache.org/jira/browse/HIVE-9120 Project: Hive Issue Type: Bug Components: HiveServer2, Logging Reporter: Dong Chen Assignee: Dong Chen Attachments: HIVE-9120.patch When hive.exec.parallel is true, the query log is not saved and Beeline can not retrieve it. When parallel, Driver.launchTask() may run the task in a new thread if other conditions are also on. TaskRunner.start() is invoked instead of TaskRunner.runSequential(). This cause the threadlocal variable OperationLog to be null and query logs are not logged. The OperationLog object should be set in the new thread in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order
[ https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250136#comment-14250136 ] Ashutosh Chauhan commented on HIVE-9146: you might be hitting into HIVE-8298 can you test your queries on Hive 0.14 and post your findings here. Query with left joins produces wrong result when join condition is written in different order - Key: HIVE-9146 URL: https://issues.apache.org/jira/browse/HIVE-9146 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Kamil Gorlo I have two queries which should be equal (I only swap two join conditions) but they are not. They are simplest queries I could produce to reproduce bug. I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | And when I run this query (query no. 1): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; {quote} I get result (which is correct): | log.id | log.dest_id | com1.msgs | com2.msgs | | 1 | 2| 1 | 1 | | 1 | 3| 1 | NULL | | 1 | 5| NULL | NULL | | 3 | 1| NULL | 1 | But when I run second query (query no. 2): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id; {quote} I get different (and bad, in my opinion) result: |log.id | log.dest_id | com1.msgs | com2.msgs| |1|2|1|1| |1|3|1|1| |1|5|NULL|NULL| |3|1|NULL|NULL| Query no. 1 and query no. 2 are different in only one place, it is second join condition: bf. com2.dest_id=log.id and com2.id=log.dest_id vs bf. com2.id=log.dest_id and com2.dest_id=log.id which in my opinion are equal. Explains for both queries are of course slightly different (columns are swapped) and they are here: https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good) https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9074) add ability to force direct sql usage for perf reasons
[ https://issues.apache.org/jira/browse/HIVE-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9074: --- Attachment: HIVE-9074.patch add ability to force direct sql usage for perf reasons -- Key: HIVE-9074 URL: https://issues.apache.org/jira/browse/HIVE-9074 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Attachments: HIVE-9074.patch Some people run direct SQL and hit failures (e.g. due to Oracle 1000-in-expressions stupidity, illegal cast optimization in Derby and Oracle, or some other Hive and DB bugs). Currently, it falls back to ORM for such cases, however that can have huge impact on perf, and some people would rather have it fail so they can see the problem. In addition to off and on+fallback modes, on or fail mode needs to be added. The default will remain the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
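A sketch of the three modes described above (mode names are illustrative, not the actual config values): off never attempts direct SQL, on attempts it and quietly falls back to ORM on failure (today's behaviour), and on-or-fail rethrows so the user can see and fix the underlying problem instead of silently losing performance.
{code}
public class DirectSqlMode {
  enum Mode { OFF, ON_WITH_FALLBACK, ON_OR_FAIL }

  interface Fetch<T> { T run() throws Exception; }

  static <T> T fetch(Mode mode, Fetch<T> directSql, Fetch<T> orm) throws Exception {
    if (mode == Mode.OFF) {
      return orm.run();
    }
    try {
      return directSql.run();
    } catch (Exception e) {
      if (mode == Mode.ON_OR_FAIL) {
        throw e;            // surface the direct-SQL problem to the caller
      }
      return orm.run();     // current default: fall back quietly
    }
  }
}
{code}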
[jira] [Updated] (HIVE-9074) add ability to force direct sql usage for perf reasons
[ https://issues.apache.org/jira/browse/HIVE-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9074: --- Status: Patch Available (was: Open) add ability to force direct sql usage for perf reasons -- Key: HIVE-9074 URL: https://issues.apache.org/jira/browse/HIVE-9074 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Attachments: HIVE-9074.patch Some people run direct SQL and hit failures (e.g. due to Oracle 1000-in-expressions stupidity, illegal cast optimization in Derby and Oracle, or some other Hive and DB bugs). Currently, it falls back to ORM for such cases, however that can have huge impact on perf, and some people would rather have it fail so they can see the problem. In addition to off and on+fallback modes, on or fail mode needs to be added. The default will remain the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9074) add ability to force direct sql usage for perf reasons
[ https://issues.apache.org/jira/browse/HIVE-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250161#comment-14250161 ] Sergey Shelukhin commented on HIVE-9074: [~ashutoshc] can you review? Thanks add ability to force direct sql usage for perf reasons -- Key: HIVE-9074 URL: https://issues.apache.org/jira/browse/HIVE-9074 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Attachments: HIVE-9074.patch Some people run direct SQL and hit failures (e.g. due to Oracle 1000-in-expressions stupidity, illegal cast optimization in Derby and Oracle, or some other Hive and DB bugs). Currently, it falls back to ORM for such cases, however that can have huge impact on perf, and some people would rather have it fail so they can see the problem. In addition to off and on+fallback modes, on or fail mode needs to be added. The default will remain the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9140) Add ReduceExpressionRules from Calcite into Hive
[ https://issues.apache.org/jira/browse/HIVE-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250183#comment-14250183 ] Ashutosh Chauhan commented on HIVE-9140: [~jpullokkaran] Can you take a look at this one? I initially thought it's better to have it in applyPreCBOTransformations(), but then I realized that our join ordering algorithm leaves {where true} predicates while optimizing the tree. Since it will be good to remove such predicates, I have added these rules along with the join ordering rules. Let me know what you think. Add ReduceExpressionRules from Calcite into Hive Key: HIVE-9140 URL: https://issues.apache.org/jira/browse/HIVE-9140 Project: Hive Issue Type: Improvement Components: CBO, Logical Optimizer Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-9140.patch These rules provide a form of constant folding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-9140) Add ReduceExpressionRules from Calcite into Hive
[ https://issues.apache.org/jira/browse/HIVE-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250183#comment-14250183 ] Ashutosh Chauhan edited comment on HIVE-9140 at 12/17/14 5:39 PM: -- [~jpullokkaran] Can you take a look at this one? I initially thought its better to have it in applyPreCBOTransformations() but than I realized that our join ordering algorithm leaves {{where true}} predicates while optimizing tree. Since, it will be good to remove such predicates, I have added these rules alongwith join ordering rules. Let me know what do you think. was (Author: ashutoshc): [~jpullokkaran] Can you take a look at this one? I initially thought its better to have it in applyPreCBOTransformations() but than I realized that our join ordering algorithm leaves {where true} predicates while optimizing tree. Since, it will be good to remove such predicates, I have added these rules alongwith join ordering rules. Let me know what do you think. Add ReduceExpressionRules from Calcite into Hive Key: HIVE-9140 URL: https://issues.apache.org/jira/browse/HIVE-9140 Project: Hive Issue Type: Improvement Components: CBO, Logical Optimizer Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-9140.patch These rules provide a form of constant folding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
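For readers unfamiliar with the rules being added, a sketch of how reduce-expression rules are typically run over a plan with Calcite's heuristic planner; the exact class and field names below follow recent Calcite releases and may differ in the Calcite version Hive depends on, so treat this as illustrative only.
{code}
import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.ReduceExpressionsRule;

public class ConstantFoldingSketch {
  // Fold constant expressions (e.g. drop "where true" predicates left over by
  // join ordering) by running Calcite's reduce-expression rules over the plan.
  static RelNode foldConstants(RelNode plan) {
    HepProgramBuilder b = new HepProgramBuilder();
    b.addRuleInstance(ReduceExpressionsRule.FILTER_INSTANCE);
    b.addRuleInstance(ReduceExpressionsRule.PROJECT_INSTANCE);
    b.addRuleInstance(ReduceExpressionsRule.JOIN_INSTANCE);
    HepPlanner planner = new HepPlanner(b.build());
    planner.setRoot(plan);
    return planner.findBestExp();
  }
}
{code}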
Re: Review Request 28933: HIVE-8131:Support timestamp in Avro
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28933/#review65334 --- Ship it! Ship It! - Ryan Blue On Dec. 15, 2014, 7:40 p.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28933/ --- (Updated Dec. 15, 2014, 7:40 p.m.) Review request for hive. Repository: hive-git Description --- The patch includes: 1.add timestamp support for AvroSerde 2.add related test cases Diffs - data/files/avro_timestamp.txt PRE-CREATION ql/src/test/queries/clientpositive/avro_timestamp.q PRE-CREATION ql/src/test/results/clientpositive/avro_timestamp.q.out PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java 07c5ecf serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 7639a2b serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java c8eac89 serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaToTypeInfo.java c84b1a0 serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java 8cb2dc3 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java cd5a0fa Diff: https://reviews.apache.org/r/28933/diff/ Testing --- Test passed for added cases Thanks, cheng xu
[jira] [Commented] (HIVE-8988) Support advanced aggregation in Hive to Calcite path
[ https://issues.apache.org/jira/browse/HIVE-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250226#comment-14250226 ] Hive QA commented on HIVE-8988: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687709/HIVE-8988.04.patch {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 6713 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_id2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_rollup1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort_grouping_operators org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2111/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2111/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2111/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687709 - PreCommit-HIVE-TRUNK-Build Support advanced aggregation in Hive to Calcite path - Key: HIVE-8988 URL: https://issues.apache.org/jira/browse/HIVE-8988 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Labels: grouping, logical, optiq Fix For: 0.15.0 Attachments: HIVE-8988.01.patch, HIVE-8988.02.patch, HIVE-8988.03.patch, HIVE-8988.04.patch, HIVE-8988.patch CLEAR LIBRARY CACHE To close the gap between Hive and Calcite, we need to support the translation of GroupingSets into Calcite; currently this is not implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250251#comment-14250251 ] Hive QA commented on HIVE-8843: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687754/HIVE-8843.3-spark.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7236 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/564/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/564/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-564/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687754 - PreCommit-HIVE-SPARK-Build Release RDD cache when Hive query is done [Spark Branch] Key: HIVE-8843 URL: https://issues.apache.org/jira/browse/HIVE-8843 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang Attachments: HIVE-8843.1-spark.patch, HIVE-8843.2-spark.patch, HIVE-8843.3-spark.patch, HIVE-8843.3-spark.patch In some multi-inser cases, RDD.cache() is called to improve performance. RDD is SparkContext specific, but the caching is useful only for the query. Thus, once the query is executed, we need to release the cache used by calling RDD.uncache(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
What's the status of AccessServer?
Hi folks, I'm looking for ways to expose Apache Phoenix [0] to a wider audience. One potential way to do that is to follow in the Hive footsteps with a HS2 protocol-compatible service. I've done some prototyping along these lines and see that it's quite feasible. Along the way I came across this proposal for refactoring HS2 into the AccessServer [1]. What's the state of the AccessServer project? Is anyone working on it? Is there a relationship between this effort and Calcite's Avatica [2]? The system proposed in the AccessServer doc seems to fit nicely in line with Calcite's objectives. Thanks, Nick [0]: http://phoenix.apache.org [1]: https://cwiki.apache.org/confluence/display/Hive/AccessServer+Design+Proposal [2]: http://mail-archives.apache.org/mod_mbox/calcite-dev/201412.mbox/%3CCAMCtme%2BpVsVYP%2B-J1jDPk-fNCtAHj3f0eXif_hUG_Xy81Ufxsw%40mail.gmail.com%3E
Re: Review Request 29145: HIVE-9094 TimeoutException when trying get executor count from RSC [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/#review65348 --- Ship it! +1 to Xuefu's comments. The config name also looks very generic, since it's only applied to a couple of jobs submitted to the client. But I don't have a good suggestion here. - Marcelo Vanzin On Dec. 17, 2014, 6:28 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/ --- (Updated Dec. 17, 2014, 6:28 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9094 https://issues.apache.org/jira/browse/HIVE-9094 Repository: hive-git Description --- RemoteHiveSparkClient::getExecutorCount timeout after 5s as Spark cluster has not launched yet 1. set the timeout value configurable. 2. set default timeout value 60s. 3. enable timeout for get spark job info and get spark stage info. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 22f052a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 5d6a02c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java e1946d5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 6217de4 Diff: https://reviews.apache.org/r/29145/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250325#comment-14250325 ] Jimmy Xiang commented on HIVE-9127: --- In looking into HIVE-9135, I was wondering if it is better to fix the root cause of HIVE-7431 instead disabling the cache for Spark. If so, probably we don't need this work around? Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at 
org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -
[jira] [Updated] (HIVE-9053) select constant in union all followed by group by gives wrong result
[ https://issues.apache.org/jira/browse/HIVE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9053: -- Attachment: HIVE-9053.04.patch-013 select constant in union all followed by group by gives wrong result Key: HIVE-9053 URL: https://issues.apache.org/jira/browse/HIVE-9053 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 0.15.0 Attachments: HIVE-9053.01.patch, HIVE-9053.02.patch, HIVE-9053.03.patch, HIVE-9053.04.patch, HIVE-9053.04.patch-013 Here is the way to reproduce with q test: select key from (select '1' as key from src union all select key from src)tab group by key; will give OK NULL 1 This is not correct as src contains many other keys. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9053) select constant in union all followed by group by gives wrong result
[ https://issues.apache.org/jira/browse/HIVE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9053: -- Affects Version/s: 0.13.0 0.14.0 select constant in union all followed by group by gives wrong result Key: HIVE-9053 URL: https://issues.apache.org/jira/browse/HIVE-9053 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 0.15.0 Attachments: HIVE-9053.01.patch, HIVE-9053.02.patch, HIVE-9053.03.patch, HIVE-9053.04.patch, HIVE-9053.04.patch-013 Here is the way to reproduce with q test: select key from (select '1' as key from src union all select key from src)tab group by key; will give OK NULL 1 This is not correct as src contains many other keys. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9074) add ability to force direct sql usage for perf reasons
[ https://issues.apache.org/jira/browse/HIVE-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250345#comment-14250345 ] Ashutosh Chauhan commented on HIVE-9074: Couple of comments: * You can use a validator in HiveConf via the ConfVars(String varname, Object defaultVal, Validator validator, String description) constructor to force allowed values for a particular config. This will allow you to get rid of the {{isConfigEnabled}} variable and thus simplify the logic a bit there. * This throws an exception as soon as the datastore is found to be incompatible. If the direct SQL query is indeed fired but then fails while executing against the datastore, we still catch that exception and then fall back to ORM. This patch is not intended to capture that code path, is it? add ability to force direct sql usage for perf reasons -- Key: HIVE-9074 URL: https://issues.apache.org/jira/browse/HIVE-9074 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Attachments: HIVE-9074.patch Some people run direct SQL and hit failures (e.g. due to Oracle 1000-in-expressions stupidity, illegal cast optimization in Derby and Oracle, or some other Hive and DB bugs). Currently, it falls back to ORM for such cases, however that can have huge impact on perf, and some people would rather have it fail so they can see the problem. In addition to off and on+fallback modes, on or fail mode needs to be added. The default will remain the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
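For reference, a minimal sketch of the validator approach suggested above, assuming {{Validator.StringSet}} from hive-common keeps its usual contract of returning null for a valid value and an error message otherwise; the mode names here are illustrative, not the ones in the attached patch:
{code}
import org.apache.hadoop.hive.conf.Validator;
import org.apache.hadoop.hive.conf.Validator.StringSet;

public class DirectSqlModeCheck {
  public static void main(String[] args) {
    // A StringSet validator rejects values outside the allowed set, so a
    // separate isConfigEnabled flag is no longer needed.
    Validator modeValidator = new StringSet("off", "fallback", "fail");
    System.out.println(modeValidator.validate("fallback")); // null means valid
    System.out.println(modeValidator.validate("bogus"));    // an error message means invalid
  }
}
{code}
The same validator instance would be passed to the ConfVars constructor quoted in the comment, so HiveConf itself enforces the allowed values.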
[jira] [Commented] (HIVE-9053) select constant in union all followed by group by gives wrong result
[ https://issues.apache.org/jira/browse/HIVE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250347#comment-14250347 ] Pengcheng Xiong commented on HIVE-9053: --- [~prasanth_j], could you please review and help me commit patch-013 to hive 0.13 branch? Thanks! select constant in union all followed by group by gives wrong result Key: HIVE-9053 URL: https://issues.apache.org/jira/browse/HIVE-9053 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 0.15.0 Attachments: HIVE-9053.01.patch, HIVE-9053.02.patch, HIVE-9053.03.patch, HIVE-9053.04.patch, HIVE-9053.04.patch-013 Here is the way to reproduce with q test: select key from (select '1' as key from src union all select key from src)tab group by key; will give OK NULL 1 This is not correct as src contains many other keys. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8972) Implement more fine-grained remote client-level events [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250353#comment-14250353 ] Marcelo Vanzin commented on HIVE-8972: -- The patch looks ok to me. I thought about creating a separate API for these kinds of RPCs - these wouldn't be queued in the backend but executed right away. My only concern is that this could be abused (e.g. a caller using these calls to run a Spark job before the queued ones), but perhaps that's an app-level concern and the client shouldn't care if someone uses it that way. The Netty framework we're using now could also make some things easier, like adding listeners to JobHandle and reporting job state changes to the client side when they happen (instead of the current poll-like approach). We could also add client-level listeners so that interesting events are reported (e.g. spark context up and things like that). If there's interest in these things we could create a new task and I'll try to find some time to work on it. Implement more fine-grained remote client-level events [Spark Branch] - Key: HIVE-8972 URL: https://issues.apache.org/jira/browse/HIVE-8972 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8972.1-spark.patch, HIVE-8972.2-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.4-spark.patch, HIVE-8972.5-spark.patch Follow up task of HIVE-8956. Fine-grained events are useful for better job monitor and failure handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
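A purely hypothetical sketch of the push-based idea mentioned above; none of these names exist in the current spark-client API, they only illustrate the listener shape that would replace polling:
{code}
// Illustrative only: a listener the client could register on a job handle so the
// backend pushes state changes instead of the client polling for them.
public interface JobStateListener {
  enum State { QUEUED, STARTED, SUCCEEDED, FAILED, CANCELLED }

  void onStateChanged(String jobId, State newState);
}
{code}
The RPC layer would invoke such a listener from its dispatch thread whenever the remote end reports a state transition.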
[jira] [Commented] (HIVE-9074) add ability to force direct sql usage for perf reasons
[ https://issues.apache.org/jira/browse/HIVE-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250365#comment-14250365 ] Hive QA commented on HIVE-9074: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687772/HIVE-9074.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6713 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2112/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2112/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2112/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687772 - PreCommit-HIVE-TRUNK-Build add ability to force direct sql usage for perf reasons -- Key: HIVE-9074 URL: https://issues.apache.org/jira/browse/HIVE-9074 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Attachments: HIVE-9074.patch Some people run direct SQL and hit failures (e.g. due to Oracle 1000-in-expressions stupidity, illegal cast optimization in Derby and Oracle, or some other Hive and DB bugs). Currently, it falls back to ORM for such cases, however that can have huge impact on perf, and some people would rather have it fail so they can see the problem. In addition to off and on+fallback modes, on or fail mode needs to be added. The default will remain the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9147) Add unit test for HIVE-7323
Peter Slawski created HIVE-9147: --- Summary: Add unit test for HIVE-7323 Key: HIVE-9147 URL: https://issues.apache.org/jira/browse/HIVE-9147 Project: Hive Issue Type: Test Components: Statistics Affects Versions: 0.13.1, 0.14.0 Reporter: Peter Slawski Priority: Minor This unit test verifies that DateStatisticImpl doesn't store mutable objects from callers for minimum and maximum values. This ensures callers cannot modify the internal minimum and maximum values outside of DateStatisticImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
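The shape of such a defensive-copy test, sketched with a stand-in class (the real test targets DateStatisticImpl's minimum/maximum handling; the class and method names below are illustrative):
{code}
import static org.junit.Assert.assertEquals;

import java.util.Date;
import org.junit.Test;

public class DefensiveCopyTest {
  // Stand-in for the statistics class: it must copy the value it is handed
  // rather than keep the caller's mutable reference.
  static class MinMaxTracker {
    private Date minimum;

    void update(Date d) {
      if (minimum == null || d.before(minimum)) {
        minimum = new Date(d.getTime()); // defensive copy
      }
    }

    Date getMinimum() {
      return minimum;
    }
  }

  @Test
  public void callerCannotMutateInternalMinimum() {
    MinMaxTracker tracker = new MinMaxTracker();
    Date d = new Date(1000L);
    tracker.update(d);
    d.setTime(0L); // caller mutates its own object afterwards
    assertEquals(1000L, tracker.getMinimum().getTime());
  }
}
{code}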
[jira] [Commented] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250373#comment-14250373 ] Brock Noland commented on HIVE-9127: bq. In looking into HIVE-9135, I was wondering if it is better to fix the root cause of HIVE-7431 instead disabling the cache for Spark. I think that would be awesome. I think we disabled it early on when we were just trying to get HOS working. bq. If so, probably we don't need this work around? I think this work around results in better code generally. In CombineHiveInputFormat we were looking up the partition information on each loop iteration but with this fix we do it once before the loop, which is generally better. Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at 
scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl
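Schematically, the hoisting Brock describes above looks like the following; buildLookupTable() is just a placeholder for the per-path partition-info resolution, not code from the patch:
{code}
import java.util.Collections;
import java.util.Map;

public class HoistedLookupSketch {
  // Placeholder for the expensive partition-info resolution.
  static Map<String, Integer> buildLookupTable() {
    return Collections.singletonMap("weight", 2);
  }

  // Before the fix: the lookup ran on every loop iteration.
  static long slowSum(int[] items) {
    long sum = 0;
    for (int item : items) {
      Map<String, Integer> table = buildLookupTable();
      sum += table.get("weight") * item;
    }
    return sum;
  }

  // After the fix: resolve once before the loop, then reuse the result.
  static long fastSum(int[] items) {
    Map<String, Integer> table = buildLookupTable();
    long sum = 0;
    for (int item : items) {
      sum += table.get("weight") * item;
    }
    return sum;
  }
}
{code}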
[jira] [Updated] (HIVE-9147) Add unit test for HIVE-7323
[ https://issues.apache.org/jira/browse/HIVE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Slawski updated HIVE-9147: Attachment: HIVE-9147.1.patch Add unit test for HIVE-7323 --- Key: HIVE-9147 URL: https://issues.apache.org/jira/browse/HIVE-9147 Project: Hive Issue Type: Test Components: Statistics Affects Versions: 0.14.0, 0.13.1 Reporter: Peter Slawski Priority: Minor Attachments: HIVE-9147.1.patch This unit test verifies that DateStatisticImpl doesn't store mutable objects from callers for minimum and maximum values. This ensures callers cannot modify the internal minimum and maximum values outside of DateStatisticImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9147) Add unit test for HIVE-7323
[ https://issues.apache.org/jira/browse/HIVE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Slawski updated HIVE-9147: Fix Version/s: 0.15.0 Status: Patch Available (was: Open) Attached patch for unit test. Add unit test for HIVE-7323 --- Key: HIVE-9147 URL: https://issues.apache.org/jira/browse/HIVE-9147 Project: Hive Issue Type: Test Components: Statistics Affects Versions: 0.13.1, 0.14.0 Reporter: Peter Slawski Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9147.1.patch This unit test verifies that DateStatisticImpl doesn't store mutable objects from callers for minimum and maximum values. This ensures callers cannot modify the internal minimum and maximum values outside of DateStatisticImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250383#comment-14250383 ] Jimmy Xiang commented on HIVE-9127: --- bq. I think this work around results in better code generally. Agreed. Thanks. Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:301) 2014-12-16 14:36:22,203
[jira] [Commented] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250389#comment-14250389 ] Xuefu Zhang commented on HIVE-9127: --- {quote} Please modify the query if the patch is going to apply to trunk. {quote} My bad. I meant to say modify the JIRA, but now I see again and it seems alright except for a Spark component, which probably doesn't matter. Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at 
org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl
[jira] [Created] (HIVE-9148) Fix default value for HWI_WAR_FILE
Peter Slawski created HIVE-9148: --- Summary: Fix default value for HWI_WAR_FILE Key: HIVE-9148 URL: https://issues.apache.org/jira/browse/HIVE-9148 Project: Hive Issue Type: Bug Components: Web UI Affects Versions: 0.13.1, 0.14.0 Reporter: Peter Slawski Priority: Minor The path to the hwi war file should be relative to hive home. However, HWI_WAR_FILE is set in hwi.sh to be an absolute path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9148) Fix default value for HWI_WAR_FILE
[ https://issues.apache.org/jira/browse/HIVE-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Slawski updated HIVE-9148: Attachment: HIVE-9148.1.patch Fix default value for HWI_WAR_FILE -- Key: HIVE-9148 URL: https://issues.apache.org/jira/browse/HIVE-9148 Project: Hive Issue Type: Bug Components: Web UI Affects Versions: 0.14.0, 0.13.1 Reporter: Peter Slawski Priority: Minor Attachments: HIVE-9148.1.patch The path to the hwi war file should be relative to hive home. However, HWI_WAR_FILE is set in hwi.sh to be an absolute path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9148) Fix default value for HWI_WAR_FILE
[ https://issues.apache.org/jira/browse/HIVE-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Slawski updated HIVE-9148: Fix Version/s: 0.15.0 Status: Patch Available (was: Open) Fix default value for HWI_WAR_FILE -- Key: HIVE-9148 URL: https://issues.apache.org/jira/browse/HIVE-9148 Project: Hive Issue Type: Bug Components: Web UI Affects Versions: 0.13.1, 0.14.0 Reporter: Peter Slawski Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9148.1.patch The path to the hwi war file should be relative to hive home. However, HWI_WAR_FILE is set in hwi.sh to be an absolute path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Native ORC
I also feel in the long run it would be good to have ORC as a separate project (ASF) independent of hive. Having it in hive will only make it harder to modularize (SARG, vectorized readers, etc.). - Prasanth On Mon, Dec 15, 2014 at 5:27 PM, Thejas Nair the...@hortonworks.com wrote: IMO, in the long run, having ORC as a separate project makes a lot of sense, as it is used in many places outside of hive. On Mon, Dec 15, 2014 at 2:44 PM, Owen O'Malley omal...@apache.org wrote: All, We are working on a native (aka C++) ORC reader and writer. For now we are working on it over at my old github - https://github.com/hortonworks/orc . First of all, I wanted to let everyone know it is happening and invite others to give feedback or help. You can see the API for the reader in the src/orc directory. It leads to an interesting question. I'd like to contribute it back to Hive to keep the two implementations (Java and C++) together, but that would mean adding an optional C++ module to Hive, which is currently all Java. The other option is to take the native ORC reader to the Apache Incubator as a new project and eventually pull the Java one along with it. I'm very interested in the Hive development community's opinion. Thanks, Owen
[jira] [Updated] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9127: --- Component/s: (was: Spark) Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO 
[stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:301) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at
[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250496#comment-14250496 ] Jimmy Xiang commented on HIVE-8843: --- These failures are not related to the patch. Release RDD cache when Hive query is done [Spark Branch] Key: HIVE-8843 URL: https://issues.apache.org/jira/browse/HIVE-8843 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang Attachments: HIVE-8843.1-spark.patch, HIVE-8843.2-spark.patch, HIVE-8843.3-spark.patch, HIVE-8843.3-spark.patch In some multi-insert cases, RDD.cache() is called to improve performance. RDD is SparkContext-specific, but the caching is useful only for the query. Thus, once the query is executed, we need to release the cache used by calling RDD.uncache(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
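The pattern being released here, sketched with the Java RDD API (the input path and the surrounding method are placeholders, not code from the patch; note the Java API spells the release call unpersist()):
{code}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CacheReleaseSketch {
  static long runQuery(JavaSparkContext sc) {
    JavaRDD<String> rdd = sc.textFile("/tmp/input"); // placeholder input
    rdd.cache();                                     // reused while the query runs
    long total = rdd.count() + rdd.distinct().count();
    rdd.unpersist();                                 // release the cache once the query is done
    return total;
  }
}
{code}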
[jira] [Commented] (HIVE-8972) Implement more fine-grained remote client-level events [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250522#comment-14250522 ] Xuefu Zhang commented on HIVE-8972: --- +1 to the latest patch. [~vanzin], I think it makes sense to have a separate API for short-lived tasks as well as push-based notification for job monitoring. Please feel free to create new tasks for those. Thanks. Implement more fine-grained remote client-level events [Spark Branch] - Key: HIVE-8972 URL: https://issues.apache.org/jira/browse/HIVE-8972 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8972.1-spark.patch, HIVE-8972.2-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.4-spark.patch, HIVE-8972.5-spark.patch Follow up task of HIVE-8956. Fine-grained events are useful for better job monitor and failure handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8972) Implement more fine-grained remote client-level events [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8972: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks to Rui, Marcelo, and Chengxiang. Implement more fine-grained remote client-level events [Spark Branch] - Key: HIVE-8972 URL: https://issues.apache.org/jira/browse/HIVE-8972 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Fix For: spark-branch Attachments: HIVE-8972.1-spark.patch, HIVE-8972.2-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.3-spark.patch, HIVE-8972.4-spark.patch, HIVE-8972.5-spark.patch Follow up task of HIVE-8956. Fine-grained events are useful for better job monitor and failure handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8843: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks, Jimmy. Release RDD cache when Hive query is done [Spark Branch] Key: HIVE-8843 URL: https://issues.apache.org/jira/browse/HIVE-8843 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-8843.1-spark.patch, HIVE-8843.2-spark.patch, HIVE-8843.3-spark.patch, HIVE-8843.3-spark.patch In some multi-insert cases, RDD.cache() is called to improve performance. RDD is SparkContext-specific, but the caching is useful only for the query. Thus, once the query is executed, we need to release the cache used by calling RDD.uncache(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9147) Add unit test for HIVE-7323
[ https://issues.apache.org/jira/browse/HIVE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250538#comment-14250538 ] Hive QA commented on HIVE-9147: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687801/HIVE-9147.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2113/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2113/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2113/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687801 - PreCommit-HIVE-TRUNK-Build Add unit test for HIVE-7323 --- Key: HIVE-9147 URL: https://issues.apache.org/jira/browse/HIVE-9147 Project: Hive Issue Type: Test Components: Statistics Affects Versions: 0.14.0, 0.13.1 Reporter: Peter Slawski Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9147.1.patch This unit test verifies that DateStatisticImpl doesn't store mutable objects from callers for minimum and maximum values. This ensures callers cannot modify the internal minimum and maximum values outside of DateStatisticImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-9136: --- Attachment: HIVE-9136.1.patch Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-9136: --- Status: Patch Available (was: Open) Patch v1. I added several spark-specific log events to {{PerfLogger}}. The correspondence against Tez is:
|| In Tez || In Spark ||
| TEZ_SUBMIT_TO_RUNNING | SPARK_SUBMIT_TO_RUNNING |
| TEZ_BUILD_DAG | SPARK_BUILD_PLAN + SPARK_BUILD_RDD_GRAPH |
| TEZ_SUBMIT_DAG | SPARK_SUBMIT_JOB |
| TEZ_RUN_DAG | SPARK_RUN_JOB |
| TEZ_CREATE_VERTEX | SPARK_CREATE_TRAN |
| TEZ_RUN_VERTEX | SPARK_RUN_STAGE |
| TEZ_INIITIALIZE_PROCESSOR | ? |
| TEZ_RUN_PROCESSOR | ? |
| TEZ_INITIALIZE_OPERATORS | SPARK_INITIALIZE_OPERATORS |
For TEZ_INITIALIZE_PROCESSOR and TEZ_RUN_PROCESSOR, I didn't find a correspondence in our Spark branch. Any idea? Maybe log the {{SparkBaseFunctionResultList}}? In addition, I added SPARK_FLUSH_HASHTABLE, to track perf on the Spark hash table sink, and SPARK_GENERATE_OPERATOR_TREE, to track perf on, as the name suggests, generating the operator tree. I'm also open to any kind of suggestions. Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
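A minimal sketch of how one of the new events would be used; PerfLogBegin/PerfLogEnd are the existing PerfLogger API, while the "SPARK_BUILD_PLAN" key is one of the constants the patch proposes, so treat that name as an assumption:
{code}
import org.apache.hadoop.hive.ql.log.PerfLogger;

public class SparkPerfLogSketch {
  private static final String CLASS_NAME = SparkPerfLogSketch.class.getName();

  void buildPlanWithTiming() {
    PerfLogger perfLogger = PerfLogger.getPerfLogger();
    // "SPARK_BUILD_PLAN" is assumed to be one of the keys added by the patch.
    perfLogger.PerfLogBegin(CLASS_NAME, "SPARK_BUILD_PLAN");
    try {
      // ... generate the SparkWork / RDD plan here ...
    } finally {
      perfLogger.PerfLogEnd(CLASS_NAME, "SPARK_BUILD_PLAN");
    }
  }
}
{code}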
[jira] [Commented] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250604#comment-14250604 ] Chao commented on HIVE-9136: Sorry, there's a typo above: it should be TEZ_INITIALIZE_PROCESSOR, not TEZ_INIITIALIZE_PROCESSOR. Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9110) Performance of SELECT COUNT(*) FROM STORE SALES WHERE ss_item_sk IS NOT NULL [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao reassigned HIVE-9110: -- Assignee: Chao Performance of SELECT COUNT(*) FROM STORE SALES WHERE ss_item_sk IS NOT NULL [Spark Branch] --- Key: HIVE-9110 URL: https://issues.apache.org/jira/browse/HIVE-9110 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao The query {noformat} SELECT COUNT(*) FROM STORE SALES WHERE ss_item_sk IS NOT NULL {noformat} could benefit from performance enhancements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9149) Add unit test to test implicit conversion during dynamic partitioning/distribute by
Jason Dere created HIVE-9149: Summary: Add unit test to test implicit conversion during dynamic partitioning/distribute by Key: HIVE-9149 URL: https://issues.apache.org/jira/browse/HIVE-9149 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere This particular case was failing in Hive 0.13 because the string key column was not being converted to INT when written to the ORC file, resulting in a type cast error when reading data from the table. HIVE-8151 seems to have fixed this issue, but I would like to add a unit test to make sure we don't regress. {noformat} create table implicit_cast_during_insert (c1 int, c2 string) partitioned by (p1 string) stored as orc; set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table implicit_cast_during_insert partition (p1) select key, value, key key1 from (select * from src where key = 0) q distribute by key1 sort by key1; select * from implicit_cast_during_insert; drop table implicit_cast_during_insert; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9149) Add unit test to test implicit conversion during dynamic partitioning/distribute by
[ https://issues.apache.org/jira/browse/HIVE-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-9149: - Status: Patch Available (was: Open) Add unit test to test implicit conversion during dynamic partitioning/distribute by --- Key: HIVE-9149 URL: https://issues.apache.org/jira/browse/HIVE-9149 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-9149.1.patch This particular case was failing in Hive 0.13 because the string key column was not being converted to INT when written to the ORC file, resulting in a type cast error when reading data from the table. HIVE-8151 seems to have fixed this issue, but I would like to add a unit test to make sure we don't regress. {noformat} create table implicit_cast_during_insert (c1 int, c2 string) partitioned by (p1 string) stored as orc; set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table implicit_cast_during_insert partition (p1) select key, value, key key1 from (select * from src where key = 0) q distribute by key1 sort by key1; select * from implicit_cast_during_insert; drop table implicit_cast_during_insert; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9149) Add unit test to test implicit conversion during dynamic partitioning/distribute by
[ https://issues.apache.org/jira/browse/HIVE-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-9149: - Attachment: HIVE-9149.1.patch Add unit test to test implicit conversion during dynamic partitioning/distribute by --- Key: HIVE-9149 URL: https://issues.apache.org/jira/browse/HIVE-9149 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-9149.1.patch This particular case was failing in Hive 0.13 because the string key column was not being converted to INT when written to the ORC file, resulting in a type cast error when reading data from the table. HIVE-8151 seems to have fixed this issue, but I would like to add a unit test to make sure we don't regress. {noformat} create table implicit_cast_during_insert (c1 int, c2 string) partitioned by (p1 string) stored as orc; set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table implicit_cast_during_insert partition (p1) select key, value, key key1 from (select * from src where key = 0) q distribute by key1 sort by key1; select * from implicit_cast_during_insert; drop table implicit_cast_during_insert; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9149) Add unit test to test implicit conversion during dynamic partitioning/distribute by
[ https://issues.apache.org/jira/browse/HIVE-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250648#comment-14250648 ] Prasanth Jayachandran commented on HIVE-9149: - [~jdere] Can you add explain to your test? So that we see the UDF cast. Add unit test to test implicit conversion during dynamic partitioning/distribute by --- Key: HIVE-9149 URL: https://issues.apache.org/jira/browse/HIVE-9149 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-9149.1.patch This particular case was failing in Hive 0.13 because the string key column was not being converted to INT when written to the ORC file, resulting in a type cast error when reading data from the table. HIVE-8151 seems to have fixed this issue, but I would like to add a unit test to make sure we don't regress. {noformat} create table implicit_cast_during_insert (c1 int, c2 string) partitioned by (p1 string) stored as orc; set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table implicit_cast_during_insert partition (p1) select key, value, key key1 from (select * from src where key = 0) q distribute by key1 sort by key1; select * from implicit_cast_during_insert; drop table implicit_cast_during_insert; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-8639: Attachment: HIVE-8639.2-spark.patch Address review comments, update some golden files, and fix another issue. The issue is that if SMBJoin and MapJoin operators are in the same tree, they trigger some code in SparkReduceSinkMapJoinProc and GenSparkWork that corrupts the graph. In particular, those processors had assumed that you only visit a MapJoin op once from a non-RS path (big-table), but this becomes false if the big-table is a child of SMBJoin, as that itself has multiple non-RS parents. The additional fix is to make sure we walk down from SMBJoinOp only once, along the big-table path. Thus we skip further walking on a small-table path, as no further processing is necessary there anyway. RB is not working for me at the moment, will upload there once it is. Convert SMBJoin to MapJoin [Spark Branch] - Key: HIVE-8639 URL: https://issues.apache.org/jira/browse/HIVE-8639 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-8639.1-spark.patch, HIVE-8639.2-spark.patch HIVE-8202 supports auto-conversion of SMB Join. However, if the tables are partitioned, there could be a slow down as each mapper would need to get a very small chunk of a partition which has a single key. Thus, in some scenarios it's beneficial to convert SMB join to map join. The task is to research and support the conversion from SMB join to map join for Spark execution engine. See the equivalent of MapReduce in SortMergeJoinResolver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9149) Add unit test to test implicit conversion during dynamic partitioning/distribute by
[ https://issues.apache.org/jira/browse/HIVE-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-9149: - Attachment: HIVE-9149.2.patch Good point - in fact UDFToInteger was not being called because a constant 0 was being used in place of key due to optimization from (where key = 0). I've changed the query slightly and added the explain. Add unit test to test implicit conversion during dynamic partitioning/distribute by --- Key: HIVE-9149 URL: https://issues.apache.org/jira/browse/HIVE-9149 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-9149.1.patch, HIVE-9149.2.patch This particular case was failing in Hive 0.13 because the string key column was not being converted to INT when written to the ORC file, resulting in a type cast error when reading data from the table. HIVE-8151 seems to have fixed this issue, but I would like to add a unit test to make sure we don't regress. {noformat} create table implicit_cast_during_insert (c1 int, c2 string) partitioned by (p1 string) stored as orc; set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table implicit_cast_during_insert partition (p1) select key, value, key key1 from (select * from src where key = 0) q distribute by key1 sort by key1; select * from implicit_cast_during_insert; drop table implicit_cast_during_insert; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9148) Fix default value for HWI_WAR_FILE
[ https://issues.apache.org/jira/browse/HIVE-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250676#comment-14250676 ] Hive QA commented on HIVE-9148: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687811/HIVE-9148.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6713 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2114/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2114/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2114/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687811 - PreCommit-HIVE-TRUNK-Build Fix default value for HWI_WAR_FILE -- Key: HIVE-9148 URL: https://issues.apache.org/jira/browse/HIVE-9148 Project: Hive Issue Type: Bug Components: Web UI Affects Versions: 0.14.0, 0.13.1 Reporter: Peter Slawski Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9148.1.patch The path to the hwi war file should be relative to hive home. However, HWI_WAR_FILE is set in hwi.sh to be an absolute path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250678#comment-14250678 ] Hive QA commented on HIVE-9136: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687827/HIVE-9136.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2115/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2115/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2115/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-2115/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'bin/ext/hwi.sh' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/scheduler/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target accumulo-handler/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1646347. At revision 1646347. 
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12687827 - PreCommit-HIVE-TRUNK-Build Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
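As a rough illustration of what "performance counters around the compiler" could look like, here is a minimal timing sketch. The Runnable below is only a placeholder for the compilation phase being measured (e.g. semantic analysis plus plan generation); it is not an actual Hive API call.

{code}
// Minimal timing sketch; the Runnable stands in for the query-compilation
// step under test and is not a real Hive entry point.
public class CompileTimingSketch {
  static long timeMillis(Runnable work) {
    long start = System.nanoTime();
    work.run();
    return (System.nanoTime() - start) / 1_000_000;
  }

  public static void main(String[] args) {
    long elapsed = timeMillis(() -> {
      // placeholder for the compilation work being profiled
    });
    System.out.println("compile took " + elapsed + " ms");
  }
}
{code}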
[jira] [Updated] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-8639: Attachment: HIVE-8639.3-spark.patch Fix some import statements. Convert SMBJoin to MapJoin [Spark Branch] - Key: HIVE-8639 URL: https://issues.apache.org/jira/browse/HIVE-8639 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-8639.1-spark.patch, HIVE-8639.2-spark.patch, HIVE-8639.3-spark.patch HIVE-8202 supports auto-conversion of SMB Join. However, if the tables are partitioned, there could be a slowdown, as each mapper would need to read a very small chunk of a partition that has a single key. Thus, in some scenarios it is beneficial to convert an SMB join to a map join. The task is to research and support the conversion from SMB join to map join for the Spark execution engine. See the MapReduce equivalent in SortMergeJoinResolver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9150) Unrelated types are compared in GenTezWork#getFollowingWorkIndex()
Ted Yu created HIVE-9150: Summary: Unrelated types are compared in GenTezWork#getFollowingWorkIndex() Key: HIVE-9150 URL: https://issues.apache.org/jira/browse/HIVE-9150 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Here is the code: {code} if (tezWork.getEdgeProperty(unionWork, baseWork).equals(TezEdgeProperty.EdgeType.CONTAINS)) { {code} getEdgeProperty() returns a TezEdgeProperty, which is then compared via equals() against a TezEdgeProperty$EdgeType enum constant; since the two types are unrelated, the check can never succeed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
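A self-contained sketch of the mismatch follows. EdgeProperty and EdgeType below are simplified stand-ins, not the real Hive classes, and the getEdgeType() accessor is assumed rather than confirmed as the eventual fix.

{code}
// Stand-in model of the pattern flagged in GenTezWork#getFollowingWorkIndex().
public class EdgeTypeMismatchSketch {
  enum EdgeType { SIMPLE_EDGE, CONTAINS }

  static class EdgeProperty {
    private final EdgeType edgeType;
    EdgeProperty(EdgeType edgeType) { this.edgeType = edgeType; }
    EdgeType getEdgeType() { return edgeType; }
  }

  public static void main(String[] args) {
    EdgeProperty prop = new EdgeProperty(EdgeType.CONTAINS);
    // Reported pattern: comparing the property object to an enum constant.
    // equals() across unrelated types is always false, so this branch is dead.
    System.out.println(prop.equals(EdgeType.CONTAINS));          // false
    // Presumably intended comparison: unwrap the enum first.
    System.out.println(prop.getEdgeType() == EdgeType.CONTAINS); // true
  }
}
{code}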
[jira] [Updated] (HIVE-9053) select constant in union all followed by group by gives wrong result
[ https://issues.apache.org/jira/browse/HIVE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9053: -- Attachment: (was: HIVE-9053.04.patch-013) select constant in union all followed by group by gives wrong result Key: HIVE-9053 URL: https://issues.apache.org/jira/browse/HIVE-9053 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 0.15.0 Attachments: HIVE-9053.01.patch, HIVE-9053.02.patch, HIVE-9053.03.patch, HIVE-9053.04.patch Here is the way to reproduce with a q test: select key from (select '1' as key from src union all select key from src)tab group by key; will give: OK NULL 1 This is not correct, as src contains many other keys. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9151) Checking s against null in TezJobMonitor#getNameWithProgress() should be done earlier
Ted Yu created HIVE-9151: Summary: Checking s against null in TezJobMonitor#getNameWithProgress() should be done earlier Key: HIVE-9151 URL: https://issues.apache.org/jira/browse/HIVE-9151 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} int spaceRemaining = COLUMN_1_WIDTH - s.length() - 1; String trimmedVName = s; // if the vertex name is longer than column 1 width, trim it down // Tez Merge File Work will become Tez Merge File.. if (s != null && s.length() > COLUMN_1_WIDTH) { {code} s is dereferenced (s.length()) before the null check, rendering the check ineffective. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
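A hedged sketch of the reordering the report implies follows: guard against null before any dereference. The COLUMN_1_WIDTH value and the trimming logic below are simplified stand-ins, not the actual TezJobMonitor implementation.

{code}
// Simplified stand-in for the pattern in TezJobMonitor#getNameWithProgress().
public class NullCheckOrderSketch {
  private static final int COLUMN_1_WIDTH = 16; // hypothetical width

  static String nameWithProgress(String s) {
    // Guard first: return early before touching s.
    if (s == null) {
      return "";
    }
    String trimmedVName = s;
    // If the vertex name is longer than column 1 width, trim it down,
    // e.g. "Tez Merge File Work" becomes "Tez Merge File..".
    if (s.length() > COLUMN_1_WIDTH) {
      trimmedVName = s.substring(0, COLUMN_1_WIDTH - 2) + "..";
    }
    return trimmedVName;
  }

  public static void main(String[] args) {
    System.out.println(nameWithProgress("Tez Merge File Work"));
    System.out.println(nameWithProgress(null));
  }
}
{code}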
[jira] [Issue Comment Deleted] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9136: --- Comment: was deleted (was: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687827/HIVE-9136.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2115/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2115/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2115/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-2115/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'bin/ext/hwi.sh' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/scheduler/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target accumulo-handler/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1646347. At revision 1646347. 
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12687827 - PreCommit-HIVE-TRUNK-Build) Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250696#comment-14250696 ] Brock Noland commented on HIVE-9136: Looks like the patch is named for trunk.. Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250703#comment-14250703 ] Chao commented on HIVE-9136: Yes... my mistake. Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1-spark.patch, HIVE-9136.1.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-9136: --- Attachment: HIVE-9136.1-spark.patch Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1-spark.patch, HIVE-9136.1.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9152) Dynamic Partition Pruning [Spark Branch]
Brock Noland created HIVE-9152: -- Summary: Dynamic Partition Pruning [Spark Branch] Key: HIVE-9152 URL: https://issues.apache.org/jira/browse/HIVE-9152 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
Brock Noland created HIVE-9153: -- Summary: Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] Key: HIVE-9153 URL: https://issues.apache.org/jira/browse/HIVE-9153 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in Spark, it might make sense for us to use {{HiveInputFormat}} as well. We should evaluate this on a query which has many input splits such as {{select count(*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
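A hedged sketch of how the comparison run could be set up: the hive.input.format property is the knob the ticket refers to, with CombineHiveInputFormat as the default. This assumes the standard HiveConf API (HiveConf.ConfVars.HIVEINPUTFORMAT mapping to that key); it is a sketch, not the evaluation harness itself.

{code}
// Sketch only: flips hive.input.format to HiveInputFormat for a comparison run.
import org.apache.hadoop.hive.conf.HiveConf;

public class InputFormatComparisonSketch {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // Default: combine many small files into fewer, larger splits.
    System.out.println("current: " + conf.getVar(HiveConf.ConfVars.HIVEINPUTFORMAT));
    // Candidate to evaluate: one split per file, as Tez configures it.
    conf.setVar(HiveConf.ConfVars.HIVEINPUTFORMAT,
        "org.apache.hadoop.hive.ql.io.HiveInputFormat");
    System.out.println("evaluating with: "
        + conf.getVar(HiveConf.ConfVars.HIVEINPUTFORMAT));
  }
}
{code}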
[jira] [Assigned] (HIVE-9152) Dynamic Partition Pruning [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao reassigned HIVE-9152: -- Assignee: Chao Dynamic Partition Pruning [Spark Branch] Key: HIVE-9152 URL: https://issues.apache.org/jira/browse/HIVE-9152 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)