[jira] [Updated] (HIVE-9000) LAST_VALUE Window function returns wrong results
[ https://issues.apache.org/jira/browse/HIVE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-9000: Description: The LAST_VALUE windowing function has been returning bad results, as far as I can tell, from day one. And it seems the tests are also asserting that LAST_VALUE gives the wrong result. Here's the test output: https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587 The query is:
{code}
select t, s, i, last_value(i) over (partition by t order by s)
from over10k
where (s = 'oscar allen' or s = 'oscar carson') and t = 10
{code}
The result is:
{code}
t   s             i      last_value(i)
---
10  oscar allen   65662  65662
10  oscar carson  65549  65549
{code}
{{LAST_VALUE( i )}} should have returned 65549 in both records; instead it simply ends up returning i. Another way to confirm LAST_VALUE is wrong is to verify its result against LEAD(i,1) over (partition by t order by s): LAST_VALUE, being the last value, should always be at least as far along (in terms of the specified 'order by s') as the lead by 1. While this doesn't directly apply to the above query, if the result set had more rows, you would clearly see records where lead is higher than last_value, which is semantically incorrect. was: The LAST_VALUE windowing function has been returning bad results, as far as I can tell, from day one. And it seems the tests are also asserting that LAST_VALUE gives the wrong result. Here's the test output: https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587 The query is:
{code}
select t, s, i, last_value(i) over (partition by t order by s)
{code}
The result is:
{code}
t   s             i      last_value(i)
---
10  oscar allen   65662  65662
10  oscar carson  65549  65549
{code}
{{LAST_VALUE( i )}} should have returned 65549 in both records; instead it simply ends up returning i.
Another way to confirm LAST_VALUE is wrong is to verify its result against LEAD(i,1) over (partition by t order by s): LAST_VALUE, being the last value, should always be at least as far along (in terms of the specified 'order by s') as the lead by 1. While this doesn't directly apply to the above query, if the result set had more rows, you would clearly see records where lead is higher than last_value, which is semantically incorrect. LAST_VALUE Window function returns wrong results Key: HIVE-9000 URL: https://issues.apache.org/jira/browse/HIVE-9000 Project: Hive Issue Type: Bug Components: PTF-Windowing Affects Versions: 0.13.1 Reporter: Mark Grover Priority: Critical Fix For: 0.14.1 The LAST_VALUE windowing function has been returning bad results, as far as I can tell, from day one. And it seems the tests are also asserting that LAST_VALUE gives the wrong result. Here's the test output: https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587 The query is:
{code}
select t, s, i, last_value(i) over (partition by t order by s)
from over10k
where (s = 'oscar allen' or s = 'oscar carson') and t = 10
{code}
The result is:
{code}
t   s             i      last_value(i)
---
10  oscar allen   65662  65662
10  oscar carson  65549  65549
{code}
{{LAST_VALUE( i )}} should have returned 65549 in both records; instead it simply ends up returning i. Another way to confirm LAST_VALUE is wrong is to verify its result against LEAD(i,1) over (partition by t order by s): LAST_VALUE, being the last value, should always be at least as far along (in terms of the specified 'order by s') as the lead by 1. While this doesn't directly apply to the above query, if the result set had more rows, you would clearly see records where lead is higher than last_value, which is semantically incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
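(Editorial note, not part of the original report: the observed output matches what a window frame ending at the current row would produce, while the expected 65549 corresponds to a frame spanning the whole partition, e.g. ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. A minimal Python simulation of the two frame choices — illustrative only, not Hive code:)

```python
# Simulate last_value() under two window-frame choices. Rows are (t, s, i),
# already partitioned by t and sorted by s, matching the query in the report.
rows = [(10, "oscar allen", 65662), (10, "oscar carson", 65549)]

def last_value_default_frame(rows, idx):
    # Frame UNBOUNDED PRECEDING .. CURRENT ROW: the "last" value in the
    # frame is the current row itself, i.e. last_value(i) == i.
    frame = rows[: idx + 1]
    return frame[-1][2]

def last_value_full_partition(rows, idx):
    # Frame UNBOUNDED PRECEDING .. UNBOUNDED FOLLOWING: the last value
    # is the final row of the whole partition.
    return rows[-1][2]

print([last_value_default_frame(rows, i) for i in range(len(rows))])   # [65662, 65549]
print([last_value_full_partition(rows, i) for i in range(len(rows))])  # [65549, 65549]
```

The first list reproduces the reported output; the second is the result the reporter expected.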
[jira] [Commented] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247911#comment-14247911 ] Chengxiang Li commented on HIVE-9094: - Here is the TimeoutException in spark client side:
{noformat}
2014-12-15 12:14:09,062 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException
org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException
    at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120)
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
    at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
    at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134)
    at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297)
    at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837)
    at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234)
    at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16(TestSparkCliDriver.java:166)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at junit.framework.TestCase.runTest(TestCase.java:176)
    at junit.framework.TestCase.runBare(TestCase.java:141)
    at junit.framework.TestResult$1.protect(TestResult.java:122)
    at junit.framework.TestResult.runProtected(TestResult.java:142)
    at junit.framework.TestResult.run(TestResult.java:125)
    at junit.framework.TestCase.run(TestCase.java:129)
    at junit.framework.TestSuite.runTest(TestSuite.java:255)
    at junit.framework.TestSuite.run(TestSuite.java:250)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Caused by: java.util.concurrent.TimeoutException
    at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
    at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74)
    at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35)
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.getExecutorCount(RemoteHiveSparkClient.java:92)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getMemoryAndCores(SparkSessionImpl.java:77)
    at
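(Editorial sketch, not HIVE-9094's actual fix: the failure mode above is a bounded `get()` on a remote result. A generic way to keep a timed-out remote lookup from failing the whole compilation is to fall back to a configured default. `remote_executor_count` and `DEFAULT_EXECUTORS` below are hypothetical stand-ins.)

```python
# Generic pattern: bound a slow remote call with a timeout and fall back to
# a default value instead of propagating a TimeoutException.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
import time

def remote_executor_count():
    # Stand-in for the remote Spark client round trip.
    time.sleep(0.2)
    return 4

DEFAULT_EXECUTORS = 1  # hypothetical fallback value

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(remote_executor_count)
    try:
        count = future.result(timeout=0.05)  # deliberately too short: times out
    except FutureTimeout:
        count = DEFAULT_EXECUTORS

print(count)  # 1
```

Whether to fall back or simply raise the timeout budget is a policy choice; the sketch only shows the mechanism.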
[jira] [Updated] (HIVE-7797) upgrade hive schema from 0.9.0 to 0.13.1 failed
[ https://issues.apache.org/jira/browse/HIVE-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-7797: Summary: upgrade hive schema from 0.9.0 to 0.13.1 failed (was: upgrade sql 014-HIVE-3764.postgres.sql failed) upgrade hive schema from 0.9.0 to 0.13.1 failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.1 Reporter: Nemon Lou The sql is : INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); And the result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
How to debug hive unit test in eclipse ?
The wiki page looks already outdated: https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-DebuggingHiveCode Is there a newer wiki page on how to debug Hive unit tests in Eclipse? -- Best Regards Jeff Zhang
[jira] [Updated] (HIVE-7797) upgrade hive schema from 0.9.0 to 0.13.1 failed
[ https://issues.apache.org/jira/browse/HIVE-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-7797: Description: Using hive schema tool with the following command to upgrade hive schema failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint was: The sql is : INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); And the result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). upgrade hive schema from 0.9.0 to 0.13.1 failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.1 Reporter: Nemon Lou Using hive schema tool with the following command to upgrade hive schema failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7797) upgrade hive schema from 0.9.0 to 0.13.1 failed
[ https://issues.apache.org/jira/browse/HIVE-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-7797: Affects Version/s: 0.14.0 upgrade hive schema from 0.9.0 to 0.13.1 failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 0.13.1 Reporter: Nemon Lou Using hive schema tool with the following command to upgrade hive schema failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint Log shows that the upgrade sql file 014-HIVE-3764.postgres.sql failed. The sql in it is : INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); And the result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7797) upgrade hive schema from 0.9.0 to 0.13.1 failed
[ https://issues.apache.org/jira/browse/HIVE-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-7797: Description: Using hive schema tool with the following command to upgrade hive schema failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint Log shows that the upgrade sql file 014-HIVE-3764.postgres.sql failed. The sql in it is : INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); And the result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). was: Using hive schema tool with the following command to upgrade hive schema failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint upgrade hive schema from 0.9.0 to 0.13.1 failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 0.13.1 Reporter: Nemon Lou Using hive schema tool with the following command to upgrade hive schema failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint Log shows that the upgrade sql file 014-HIVE-3764.postgres.sql failed. The sql in it is : INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); And the result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7977) Avoid creating serde for partitions if possible in FetchTask
[ https://issues.apache.org/jira/browse/HIVE-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247946#comment-14247946 ] Hive QA commented on HIVE-7977: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687358/HIVE-7977.6.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6705 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2088/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2088/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2088/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687358 - PreCommit-HIVE-TRUNK-Build Avoid creating serde for partitions if possible in FetchTask Key: HIVE-7977 URL: https://issues.apache.org/jira/browse/HIVE-7977 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7977.1.patch.txt, HIVE-7977.2.patch.txt, HIVE-7977.3.patch.txt, HIVE-7977.4.patch.txt, HIVE-7977.5.patch.txt, HIVE-7977.6.patch.txt Currently, FetchTask creates SerDe instance thrice for each partition, which can be avoided if it's same with table SerDe. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7797) upgrade hive schema from 0.9.0 to 0.13.1 failed
[ https://issues.apache.org/jira/browse/HIVE-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-7797: Attachment: HIVE-7797.1.patch Using a blank space instead of '', so Postgres won't convert the empty string into null. upgrade hive schema from 0.9.0 to 0.13.1 failed Key: HIVE-7797 URL: https://issues.apache.org/jira/browse/HIVE-7797 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 0.13.1 Reporter: Nemon Lou Attachments: HIVE-7797.1.patch Using the Hive schema tool with the following command to upgrade the Hive schema failed: schematool -dbType postgres -upgradeSchemaFrom 0.9.0 ERROR: null value in column SCHEMA_VERSION violates not-null constraint Log shows that the upgrade SQL file 014-HIVE-3764.postgres.sql failed. The SQL in it is: INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value'); And the result is: ERROR: null value in column SCHEMA_VERSION violates not-null constraint DETAIL: Failing row contains (1, null, Initial value). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
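(Editorial illustration of the constraint class of error in HIVE-7797; SQLite is used here only because it is self-contained — empty-string handling differs across databases, and the reported failure indicates the deployment ended up inserting NULL into SCHEMA_VERSION.)

```python
# A NOT NULL column rejects an explicit NULL, as in the reported failure;
# a non-empty placeholder such as ' ' (the approach taken in the attached
# HIVE-7797.1.patch) satisfies the constraint on any database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE VERSION (VER_ID INT, SCHEMA_VERSION TEXT NOT NULL, VERSION_COMMENT TEXT)"
)

try:
    conn.execute("INSERT INTO VERSION VALUES (1, NULL, 'Initial value')")
    failed = False
except sqlite3.IntegrityError:
    failed = True          # "NOT NULL constraint failed"
print(failed)              # True

conn.execute("INSERT INTO VERSION VALUES (1, ' ', 'Initial value')")
print(conn.execute("SELECT COUNT(*) FROM VERSION").fetchone()[0])  # 1
```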
[jira] [Created] (HIVE-9120) Hive Query log does not work when hive.exec.parallel is true
Dong Chen created HIVE-9120: --- Summary: Hive Query log does not work when hive.exec.parallel is true Key: HIVE-9120 URL: https://issues.apache.org/jira/browse/HIVE-9120 Project: Hive Issue Type: Bug Components: HiveServer2, Logging Reporter: Dong Chen When hive.exec.parallel is true, the query log is not saved and Beeline cannot retrieve it. When running in parallel, Driver.launchTask() may run the task in a new thread if the other conditions also hold: TaskRunner.start() is invoked instead of TaskRunner.runSequential(). This causes the thread-local variable OperationLog to be null in the new thread, so query logs are not logged. The OperationLog object should be set in the new thread in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
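(The thread-local pitfall behind HIVE-9120 can be sketched in Python — illustrative analogue only, not Hive code: a value registered in the parent thread is invisible in a child thread unless it is explicitly handed over when the child starts, which is the fix direction the description suggests.)

```python
# A threading.local value set in the parent thread is not inherited by a
# child thread; it must be re-registered inside the new thread.
import threading

operation_log = threading.local()
operation_log.value = "query-log-handle"

result = {}

def task_without_handoff():
    # Child thread: the parent's thread-local value is not visible here.
    result["without"] = getattr(operation_log, "value", None)

def task_with_handoff(parent_value):
    # Fix direction: pass the handle in and re-register it in the new thread.
    operation_log.value = parent_value
    result["with"] = operation_log.value

t1 = threading.Thread(target=task_without_handoff)
t1.start(); t1.join()

t2 = threading.Thread(target=task_with_handoff, args=(operation_log.value,))
t2.start(); t2.join()

print(result["without"])  # None
print(result["with"])     # query-log-handle
```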
[jira] [Assigned] (HIVE-9120) Hive Query log does not work when hive.exec.parallel is true
[ https://issues.apache.org/jira/browse/HIVE-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Chen reassigned HIVE-9120: --- Assignee: Dong Chen Hive Query log does not work when hive.exec.parallel is true Key: HIVE-9120 URL: https://issues.apache.org/jira/browse/HIVE-9120 Project: Hive Issue Type: Bug Components: HiveServer2, Logging Reporter: Dong Chen Assignee: Dong Chen When hive.exec.parallel is true, the query log is not saved and Beeline cannot retrieve it. When running in parallel, Driver.launchTask() may run the task in a new thread if the other conditions also hold: TaskRunner.start() is invoked instead of TaskRunner.runSequential(). This causes the thread-local variable OperationLog to be null in the new thread, so query logs are not logged. The OperationLog object should be set in the new thread in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9107) Non-lowercase field names in structs causes NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247953#comment-14247953 ] Lefty Leverenz commented on HIVE-9107: -- Is this related to HIVE-8386 or HIVE-8870 (also HIVE-6198)? Non-lowercase field names in structs causes NullPointerException Key: HIVE-9107 URL: https://issues.apache.org/jira/browse/HIVE-9107 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Jim Pivarski If an HQL query references a struct field with mixed or upper case, Hive throws a NullPointerException instead of giving a better error message or simply lower-casing the name. For example, if I have a struct in column mystruct with a field named myfield, a query like select mystruct.MyField from tablename; passes the local initialize (it submits an M-R job) but the remote initialize jobs throw NullPointerExceptions. The exception is on line 61 of org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator, which is right after the field name is extracted and not forced to be lower-case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
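(A hypothetical sketch of the fix direction the HIVE-9107 report suggests — normalize the field name to lower case before the lookup, and fail with a clear message rather than dereferencing a missing entry. `struct_fields` and `get_field` are illustrative names, not Hive code; a `KeyError` here plays the role of the NullPointerException.)

```python
# Hive lower-cases declared struct field names; a lookup with the raw,
# mixed-case name therefore misses, and dereferencing the miss blows up.
struct_fields = {"myfield": 65549}

def get_field(fields, name):
    key = name.lower()  # forced lower-casing before the lookup
    if key not in fields:
        # Clear diagnostic instead of a null dereference.
        raise ValueError(f"No such field: {name}")
    return fields[key]

print(get_field(struct_fields, "MyField"))  # 65549
```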
[jira] [Assigned] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li reassigned HIVE-9094: --- Assignee: Chengxiang Li TimeoutException when trying get executor count from RSC [Spark Branch] --- Key: HIVE-9094 URL: https://issues.apache.org/jira/browse/HIVE-9094 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because:
{code}
2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException
org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException
    at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120)
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
    at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
    at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134)
    at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297)
    at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837)
    at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234)
    at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at junit.framework.TestCase.runTest(TestCase.java:176)
    at junit.framework.TestCase.runBare(TestCase.java:141)
    at junit.framework.TestResult$1.protect(TestResult.java:122)
    at junit.framework.TestResult.runProtected(TestResult.java:142)
    at junit.framework.TestResult.run(TestResult.java:125)
    at junit.framework.TestCase.run(TestCase.java:129)
    at junit.framework.TestSuite.runTest(TestSuite.java:255)
    at junit.framework.TestSuite.run(TestSuite.java:250)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Caused by:
[jira] [Updated] (HIVE-6468) HS2 Metastore using SASL out of memory error when curl sends a get request
[ https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-6468: - Labels: TODOC14 (was: ) HS2 Metastore using SASL out of memory error when curl sends a get request Key: HIVE-6468 URL: https://issues.apache.org/jira/browse/HIVE-6468 Project: Hive Issue Type: Bug Components: HiveServer2, Metastore Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.13.1 Environment: Centos 6.3, hive 12, hadoop-2.2 Reporter: Abin Shahab Assignee: Navis Labels: TODOC14 Fix For: 0.14.1 Attachments: HIVE-6468.0.patch, HIVE-6468.1.patch.txt, HIVE-6468.2.patch.txt, HIVE-6468.3.patch, HIVE-6468.4.patch, HIVE-6468.5.patch, HIVE-6468.branch-0.14.patch, HIVE-6468.branch-0.14.patch We see an out of memory error when we run simple beeline calls. (The hive.server2.transport.mode is binary) curl localhost:1 Exception in thread pool-2-thread-8 java.lang.OutOfMemoryError: Java heap space at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:181) at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253) at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6468) HS2 Metastore using SASL out of memory error when curl sends a get request
[ https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247972#comment-14247972 ] Lefty Leverenz commented on HIVE-6468: -- Doc note: Add *hive.thrift.sasl.message.limit* to the wiki in Configuration Properties. But which section? Perhaps it belongs in the Metastore section after *hive.metastore.sasl.enabled* -- most other mentions of SASL occur in the HiveServer2 section. * [Configuration Properties -- Metastore | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-MetaStore] ** [hive.metastore.sasl.enabled | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.metastore.sasl.enabled] * [Configuration Properties -- HiveServer2 | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HiveServer2] ** [hive.server2.authentication | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.authentication] ** [hive.server2.thrift.sasl.qop | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.thrift.sasl.qop] bq. On trunk this issue has been resolved with a thrift version upgrade. Does that mean *hive.thrift.sasl.message.limit* will only exist in the 0.14.1 release? 
HS2 Metastore using SASL out of memory error when curl sends a get request Key: HIVE-6468 URL: https://issues.apache.org/jira/browse/HIVE-6468 Project: Hive Issue Type: Bug Components: HiveServer2, Metastore Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.13.1 Environment: Centos 6.3, hive 12, hadoop-2.2 Reporter: Abin Shahab Assignee: Navis Labels: TODOC14 Fix For: 0.14.1 Attachments: HIVE-6468.0.patch, HIVE-6468.1.patch.txt, HIVE-6468.2.patch.txt, HIVE-6468.3.patch, HIVE-6468.4.patch, HIVE-6468.5.patch, HIVE-6468.branch-0.14.patch, HIVE-6468.branch-0.14.patch We see an out of memory error when we run simple beeline calls. (The hive.server2.transport.mode is binary) curl localhost:1 Exception in thread pool-2-thread-8 java.lang.OutOfMemoryError: Java heap space at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:181) at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253) at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9121) Enable beeline query progress information for Spark job[Spark Branch]
Chengxiang Li created HIVE-9121: --- Summary: Enable beeline query progress information for Spark job[Spark Branch] Key: HIVE-9121 URL: https://issues.apache.org/jira/browse/HIVE-9121 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical We could not get query progress information in Beeline, as SparkJobMonitor is filtered out of the operation log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
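(Editorial sketch of the filtering issue described above, in Python's logging framework — illustrative only, not Hive's operation-log code: an appender that only passes a fixed whitelist of logger names silently drops progress output from a monitor class unless that class is whitelisted. The logger names below are hypothetical.)

```python
# A name-whitelisting filter on a handler: records from non-whitelisted
# loggers never reach the operation-log stream.
import logging, io

stream = io.StringIO()
handler = logging.StreamHandler(stream)

# Whitelist that includes the progress monitor's logger name.
ALLOWED = {"ql.Driver", "exec.Task", "exec.spark.status.SparkJobMonitor"}
handler.addFilter(lambda record: record.name in ALLOWED)

monitor = logging.getLogger("exec.spark.status.SparkJobMonitor")
monitor.setLevel(logging.INFO)
monitor.addHandler(handler)
monitor.propagate = False
monitor.info("Stage-1: 2/10 tasks finished")   # passes the filter

other = logging.getLogger("some.other.Logger")
other.setLevel(logging.INFO)
other.addHandler(handler)
other.propagate = False
other.info("dropped")                          # filtered out

print(stream.getvalue().strip())  # Stage-1: 2/10 tasks finished
```

If the whitelist did not contain the monitor's name, the progress line would vanish exactly as described in the issue.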
[jira] [Updated] (HIVE-9121) Enable beeline query progress information for Spark job[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9121: Status: Patch Available (was: Open) Enable beeline query progress information for Spark job[Spark Branch] - Key: HIVE-9121 URL: https://issues.apache.org/jira/browse/HIVE-9121 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical Labels: Spark-M4 Attachments: HIVE-9121.1-spark.patch We could not get query progress information in Beeline as SparkJobMonitor is filtered out of operation log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9121) Enable beeline query progress information for Spark job[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9121: Attachment: HIVE-9121.1-spark.patch *NOTE*: This patch enables Beeline query progress information on Spark when hive.exec.parallel=false. With hive.exec.parallel=true, Hive hits another Beeline query progress issue, which is tracked by HIVE-9120. Enable beeline query progress information for Spark job[Spark Branch] - Key: HIVE-9121 URL: https://issues.apache.org/jira/browse/HIVE-9121 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical Labels: Spark-M4 Attachments: HIVE-9121.1-spark.patch We could not get query progress information in Beeline, as SparkJobMonitor is filtered out of the operation log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8993) Make sure Spark + HS2 work [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247996#comment-14247996 ] Chengxiang Li commented on HIVE-8993: - [~brocknoland], I talked with Dong offline, we found the reasons why Beeline query progress does not work, and created HIVE-9120 and HIVE-9121 to track them. Make sure Spark + HS2 work [Spark Branch] - Key: HIVE-8993 URL: https://issues.apache.org/jira/browse/HIVE-8993 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Labels: TODOC-SPARK Fix For: spark-branch Attachments: HIVE-8993.1-spark.patch, HIVE-8993.2-spark.patch, HIVE-8993.3-spark.patch We haven't formally tested this combination yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6679) HiveServer2 should support configurable the server side socket timeout for all transports types
[ https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248013#comment-14248013 ] Lefty Leverenz commented on HIVE-6679: -- Doc note: Patch 4 adds two configuration parameters (which are different from the two in patch 1): *hive.server2.tcp.socket.blocking.timeout* and *hive.server2.tcp.socket.keepalive*. Code review: The time specification for *hive.server2.tcp.socket.blocking.timeout* could be done with a TimeValidator like others in HiveConf.java since 0.14.0, for example: {code} METASTORE_CLIENT_CONNECT_RETRY_DELAY("hive.metastore.client.connect.retry.delay", "1s", new TimeValidator(TimeUnit.SECONDS), {code} HiveServer2 should support configurable the server side socket timeout for all transports types --- Key: HIVE-6679 URL: https://issues.apache.org/jira/browse/HIVE-6679 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0, 0.14.0 Reporter: Prasad Mujumdar Assignee: Navis Fix For: 0.14.1 Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, HIVE-6679.3.patch, HIVE-6679.4.patch HiveServer2 should make the server-side socket read timeout and TCP keep-alive option configurable. The metastore server already supports this (and so did the old Hive server). We now have multiple client connectivity options like Kerberos, Delegation Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The configuration should be applicable to all types (if possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
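Concretely, Lefty's suggestion would have the two new parameters declared in HiveConf.java like other time-valued entries. The following is a hypothetical sketch only (enum names, defaults, and descriptions are illustrative, not taken from any of the attached patches):

```java
// Hypothetical HiveConf.java entries; names, defaults, and descriptions are
// illustrative only and not taken from the HIVE-6679 patches.
HIVE_SERVER2_TCP_SOCKET_BLOCKING_TIMEOUT("hive.server2.tcp.socket.blocking.timeout",
    "1000s", new TimeValidator(TimeUnit.SECONDS),
    "Server-side blocking socket read timeout for HiveServer2."),
HIVE_SERVER2_TCP_SOCKET_KEEPALIVE("hive.server2.tcp.socket.keepalive", true,
    "Whether to enable TCP keep-alive on HiveServer2 server sockets."),
```

A TimeValidator entry lets users write the value with a unit suffix (e.g. "60s", "5m") instead of a bare number, matching the convention HiveConf adopted in 0.14.0.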
[jira] [Commented] (HIVE-9115) Hive build failure on hadoop-2.7 due to HADOOP-11356
[ https://issues.apache.org/jira/browse/HIVE-9115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248074#comment-14248074 ] Hive QA commented on HIVE-9115: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687365/HIVE-9115.1.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6705 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_covar_pop org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2089/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2089/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2089/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12687365 - PreCommit-HIVE-TRUNK-Build Hive build failure on hadoop-2.7 due to HADOOP-11356 Key: HIVE-9115 URL: https://issues.apache.org/jira/browse/HIVE-9115 Project: Hive Issue Type: Bug Reporter: Jason Dere Attachments: HIVE-9115.1.patch HADOOP-11356 removes org.apache.hadoop.fs.permission.AccessControlException, causing a build break in Hive when compiling against hadoop-2.7: {noformat} shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java:[808,63] cannot find symbol symbol: class AccessControlException location: package org.apache.hadoop.fs.permission [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9076) incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check
[ https://issues.apache.org/jira/browse/HIVE-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248156#comment-14248156 ] Hive QA commented on HIVE-9076: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687375/HIVE-9076.3.patch.txt {color:red}ERROR:{color} -1 due to 26 failed/errored test(s), 6706 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lb_fs_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_multiskew_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_multiskew_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_multiskew_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin9 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_truncate_column_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_truncate_column_list_bucketing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2090/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2090/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2090/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 26 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687375 - PreCommit-HIVE-TRUNK-Build incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check --- Key: HIVE-9076 URL: https://issues.apache.org/jira/browse/HIVE-9076 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-9076.1.patch.txt, HIVE-9076.2.patch.txt, HIVE-9076.3.patch.txt In some file composition, AbstractFileMergeOperator removes incompatible files. For example, {noformat} 00_0 (v12) 00_0_copy_1 (v12) 00_1 (v11) 00_1_copy_1 (v11) 00_1_copy_2 (v11) 00_2 (v12) {noformat} 00_1 (v11) will be removed because 00 is assigned to new merged file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
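The failure mode Navis describes can be sketched outside Hive. This is an illustrative model only, not AbstractFileMergeOperator's actual code: once the merged output claims a task id, every file whose name maps to that same id, including the incompatible ones that were deliberately set aside, looks like a stale duplicate and gets removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only -- NOT Hive's AbstractFileMergeOperator code.
// It models why the task-id check is dangerous for incompatible files.
public class MergeIdSketch {
    // Task-id prefix is everything before the first underscore: "00_1" -> "00".
    static String taskId(String name) {
        int i = name.indexOf('_');
        return i < 0 ? name : name.substring(0, i);
    }

    // Files the task-id check would delete once the merged output owns mergedId.
    static List<String> removedBy(String mergedId, List<String> dir) {
        List<String> removed = new ArrayList<>();
        for (String f : dir) {
            if (taskId(f).equals(mergedId)) {
                removed.add(f);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // The file layout from the HIVE-9076 description.
        List<String> dir = Arrays.asList("00_0", "00_0_copy_1", "00_1",
                "00_1_copy_1", "00_1_copy_2", "00_2");
        // If the merged file is assigned id "00", the incompatible v11 files
        // ("00_1" and its copies) get swept away along with the merged inputs,
        // which is why the patch marks them to skip the task-id check.
        System.out.println(removedBy("00", dir));
    }
}
```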
[jira] [Updated] (HIVE-8827) Remove SSLv2Hello from list of disabled protocols
[ https://issues.apache.org/jira/browse/HIVE-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8827: --- Affects Version/s: 0.14.0 Remove SSLv2Hello from list of disabled protocols - Key: HIVE-8827 URL: https://issues.apache.org/jira/browse/HIVE-8827 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.15.0 Attachments: HIVE-8827.1.patch Turns out SSLv2Hello is not the same as SSLv2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9121) Enable beeline query progress information for Spark job[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248160#comment-14248160 ] Hive QA commented on HIVE-9121: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687465/HIVE-9121.1-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7235 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/553/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/553/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-553/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687465 - PreCommit-HIVE-SPARK-Build Enable beeline query progress information for Spark job[Spark Branch] - Key: HIVE-9121 URL: https://issues.apache.org/jira/browse/HIVE-9121 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical Labels: Spark-M4 Attachments: HIVE-9121.1-spark.patch We could not get query progress information in Beeline as SparkJobMonitor is filtered out of operation log. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8827) Remove SSLv2Hello from list of disabled protocols
[ https://issues.apache.org/jira/browse/HIVE-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8827: --- Component/s: JDBC HiveServer2 Remove SSLv2Hello from list of disabled protocols - Key: HIVE-8827 URL: https://issues.apache.org/jira/browse/HIVE-8827 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.15.0 Attachments: HIVE-8827.1.patch Turns out SSLv2Hello is not the same as SSLv2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9097) Support runtime skew join for more queries [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9097: - Attachment: HIVE-9097.1-spark.patch The patch splits the original spark task into two tasks so that conditional map joins can be inserted to process skewed data. Changes to golden files are all in query plan. Support runtime skew join for more queries [Spark Branch] - Key: HIVE-9097 URL: https://issues.apache.org/jira/browse/HIVE-9097 Project: Hive Issue Type: Improvement Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9097.1-spark.patch After HIVE-8913, runtime skew join is enabled for spark. But currently the optimization only supports the simplest case where join is the leaf ReduceWork in a work graph. This is because the results from the original join and the conditional map join have to be unioned to feed to downstream works, which can be a little tricky for spark. This JIRA is to research and find a way to relax the above restriction. A possible solution is to break the original task into two tasks on the join work, and insert the conditional task in between. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9097) Support runtime skew join for more queries [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9097: - Status: Patch Available (was: Open) Support runtime skew join for more queries [Spark Branch] - Key: HIVE-9097 URL: https://issues.apache.org/jira/browse/HIVE-9097 Project: Hive Issue Type: Improvement Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9097.1-spark.patch After HIVE-8913, runtime skew join is enabled for spark. But currently the optimization only supports the simplest case where join is the leaf ReduceWork in a work graph. This is because the results from the original join and the conditional map join have to be unioned to feed to downstream works, which can be a little tricky for spark. This JIRA is to research and find a way to relax the above restriction. A possible solution is to break the original task into two tasks on the join work, and insert the conditional task in between. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9097) Support runtime skew join for more queries [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248217#comment-14248217 ] Hive QA commented on HIVE-9097: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687476/HIVE-9097.1-spark.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7235 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/554/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/554/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-554/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687476 - PreCommit-HIVE-SPARK-Build Support runtime skew join for more queries [Spark Branch] - Key: HIVE-9097 URL: https://issues.apache.org/jira/browse/HIVE-9097 Project: Hive Issue Type: Improvement Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9097.1-spark.patch After HIVE-8913, runtime skew join is enabled for spark. But currently the optimization only supports the simplest case where join is the leaf ReduceWork in a work graph. 
This is because the results from the original join and the conditional map join have to be unioned to feed to downstream works, which can be a little tricky for spark. This JIRA is to research and find a way to relax the above restriction. A possible solution is to break the original task into two tasks on the join work, and insert the conditional task in between. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9113) Explain on query failed with NPE
[ https://issues.apache.org/jira/browse/HIVE-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248235#comment-14248235 ] Prabhu Joseph commented on HIVE-9113: - Hi Chao, The subquery inside the IN clause does not have a FROM clause. The correct query is below: {noformat} select p.p_partkey, li.suppkey from (select distinct partkey as p_partkey from lineitem) p join lineitem li on p.p_partkey = li.partkey where li.l_linenumber = 1 and li.l_orderkey in (select l_orderkey from lineitem where l_linenumber = li.l_linenumber); {noformat} Explain on query failed with NPE Key: HIVE-9113 URL: https://issues.apache.org/jira/browse/HIVE-9113 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Chao Run explain on the following query: {noformat} select p.p_partkey, li.l_suppkey from (select distinct l_partkey as p_partkey from lineitem) p join lineitem li on p.p_partkey = li.l_partkey where li.l_linenumber = 1 and li.l_orderkey in (select l_orderkey where l_linenumber = li.l_linenumber) ; {noformat} gave me an NPE: {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.parse.QBSubQuery.validateAndRewriteAST(QBSubQuery.java:516) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:2605) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8866) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9745) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9638) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10125) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:720) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:639) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:578) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {noformat} Is this query invalid? If so, Hive should at least give some explanation rather than a plain NPE that leaves the user clueless. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8819) Create unit test where we read from a read-only encrypted table
[ https://issues.apache.org/jira/browse/HIVE-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reassigned HIVE-8819: -- Assignee: Ferdinand Xu Create unit test where we read from a read-only encrypted table Key: HIVE-8819 URL: https://issues.apache.org/jira/browse/HIVE-8819 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu The table should be chmoded 555. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8131) Support timestamp in Avro
[ https://issues.apache.org/jira/browse/HIVE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-8131: --- Attachment: HIVE-8131.1.patch Support timestamp in Avro - Key: HIVE-8131 URL: https://issues.apache.org/jira/browse/HIVE-8131 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Attachments: HIVE-8131.1.patch, HIVE-8131.patch, HIVE-8131.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9109) Add support for Java 8 specific q-test out files
[ https://issues.apache.org/jira/browse/HIVE-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248266#comment-14248266 ] Hive QA commented on HIVE-9109: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687376/HIVE-9109.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6705 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2091/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2091/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2091/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687376 - PreCommit-HIVE-TRUNK-Build Add support for Java 8 specific q-test out files Key: HIVE-9109 URL: https://issues.apache.org/jira/browse/HIVE-9109 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Attachments: HIVE-9109.1.patch, HIVE-9109.patch Hash function differences between Java 7 and Java 8 lead to result order differences. While we have been able to fix a good number by converting hash maps to insert order hash maps, there are several cases where doing so is either not possible (because changes originate in external APIs) or change leads to even more out file differences. 
For example: (1) In TestCliDriver.testCliDriver_varchar_udf1, for the following query: {code} select str_to_map('a:1,b:2,c:3',',',':'), str_to_map(cast('a:1,b:2,c:3' as varchar(20)),',',':') from varchar_udf_1 limit 1; {code} the {{StandardMapObjectInspector}} used in {{LazySimpleSerDe}} to serialize the final output uses a {{HashMap}}. Changing it to {{LinkedHashMap}} will lead to several other q-test output differences. (2) In TestCliDriver.testCliDriver_parquet_map_null, data with a {{map}} column is read from an Avro table into a Parquet table. The Avro API, specifically {{GenericData.Record}}, uses a {{HashMap}} and returns data in a different order. This patch adds support for specifying a hint called {{JAVA_VERSION_SPECIFIC_OUTPUT}}, which may be added to a q-test only if different outputs are expected for different Java versions. For example: Under Java 7, the test output file has a .java1.7.out extension. Under Java 8, the test output file has a .java1.8.out extension. If the hint is not added, we continue to generate a single .out file for the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
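The root cause is easy to reproduce outside Hive; a minimal sketch (plain Java, not Hive code) of why HashMap-backed output is unstable across JDKs while LinkedHashMap output is not:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal demonstration of the ordering issue: HashMap's iteration order is
// unspecified (and its hashing internals changed between Java 7 and Java 8),
// while LinkedHashMap always iterates in insertion order.
public class MapOrderDemo {
    static List<String> insertAndListKeys(Map<String, Integer> m) {
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        return new ArrayList<>(m.keySet());
    }

    public static void main(String[] args) {
        // Stable across JDKs: [a, b, c]
        System.out.println("LinkedHashMap: " + insertAndListKeys(new LinkedHashMap<>()));
        // No order is guaranteed; golden files keyed to one JDK's order break on another.
        System.out.println("HashMap:       " + insertAndListKeys(new HashMap<>()));
    }
}
```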
[jira] [Commented] (HIVE-8131) Support timestamp in Avro
[ https://issues.apache.org/jira/browse/HIVE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248271#comment-14248271 ] Ferdinand Xu commented on HIVE-8131: Thank [~mohitsabharwal] for your review. [~brocknoland], can you help me review it when you have some time? Support timestamp in Avro - Key: HIVE-8131 URL: https://issues.apache.org/jira/browse/HIVE-8131 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Attachments: HIVE-8131.1.patch, HIVE-8131.patch, HIVE-8131.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8893) Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode
[ https://issues.apache.org/jira/browse/HIVE-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248272#comment-14248272 ] Lefty Leverenz commented on HIVE-8893: -- Doc issue: [~prasadm], I noticed that you put *hive.server2.builtin.udf.whitelist* and *hive.server2.builtin.udf.blacklist* in the Configuration Properties doc after *hive.security.authorization.sqlstd.confwhitelist*, which is in the SQL Standard Based Authorization section. Don't they belong in the HiveServer2 section instead? Or do they only apply to SQL standard-based authorization? Wherever they go, I'll add links in the Restricted List and Whitelist subsection of Authentication/Authorization just like the link for *hive.security.authorization.sqlstd.confwhitelist*. If you have better ideas about how to organize all these parameters, please let me know. Quick reference: * [hive.security.authorization.sqlstd.confwhitelist | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.security.authorization.sqlstd.confwhitelist] followed by hive.server2.builtin.udf.whitelist and hive.server2.builtin.udf.blacklist * [HiveServer2 | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HiveServer2] * [Restricted List and Whitelist | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-RestrictedListandWhitelist] Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode --- Key: HIVE-8893 URL: https://issues.apache.org/jira/browse/HIVE-8893 Project: Hive Issue Type: Bug Components: Authorization, HiveServer2, SQL Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.15.0 Attachments: HIVE-8893.3.patch, HIVE-8893.4.patch, HIVE-8893.5.patch, HIVE-8893.6.patch UDFs like reflect() or java_method() enable executing a Java method as a UDF. 
While this offers a lot of flexibility in standalone mode, it can become a security loophole in a secure multiuser environment. For example, in HiveServer2 one can execute any available Java code with the hive user's credentials. We need a whitelist and blacklist to restrict builtin UDFs in HiveServer2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8611) grant/revoke syntax should support additional objects for authorization plugins
[ https://issues.apache.org/jira/browse/HIVE-8611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248301#comment-14248301 ] Lefty Leverenz commented on HIVE-8611: -- Added *hive.security.authorization.task.factory* to Configuration Properties in the SQL Standard Based Authorization section: * [hive.security.authorization.task.factory | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.security.authorization.task.factory] grant/revoke syntax should support additional objects for authorization plugins --- Key: HIVE-8611 URL: https://issues.apache.org/jira/browse/HIVE-8611 Project: Hive Issue Type: Bug Components: Authentication, SQL Affects Versions: 0.13.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.15.0 Attachments: HIVE-8611.1.patch, HIVE-8611.2.patch, HIVE-8611.2.patch, HIVE-8611.3.patch, HIVE-8611.4.patch The authorization framework supports URI and global objects. The SQL syntax however doesn't allow granting privileges on these objects. We should allow the compiler to parse these so that it can be handled by authorization plugins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9106) improve the performance of null scan optimizer when several table scans share a physical path
[ https://issues.apache.org/jira/browse/HIVE-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248366#comment-14248366 ] Hive QA commented on HIVE-9106: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687378/HIVE-9106.00.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6705 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2092/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2092/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2092/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687378 - PreCommit-HIVE-TRUNK-Build improve the performance of null scan optimizer when several table scans share a physical path - Key: HIVE-9106 URL: https://issues.apache.org/jira/browse/HIVE-9106 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Priority: Minor Attachments: HIVE-9106.00.patch Current fix HIVE-9053 addresses the correctness issue. The solution can be improved further when several table scans share a physical path. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: How to debug hive unit test in eclipse ?
Besides running the test in Eclipse (if possible), the other way is to enable debugging for Surefire and attach Eclipse to it using remote application debugging. http://maven.apache.org/surefire/maven-surefire-plugin/examples/debugging.html --Xuefu On Tue, Dec 16, 2014 at 12:17 AM, Jeff Zhang zjf...@gmail.com wrote: The wiki page looks like it's already out-dated, https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-DebuggingHiveCode Is there any new wiki page for how to debug a hive unit test in eclipse? -- Best Regards Jeff Zhang
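For reference, the Surefire remote-debugging flow Xuefu describes boils down to the following (the test class name is just an example; the default debug port is 5005):

```shell
# Run one test with Surefire suspended, waiting for a debugger on port 5005.
mvn test -Dtest=TestCliDriver -Dmaven.surefire.debug

# Then in Eclipse: Run > Debug Configurations... > Remote Java Application,
# host localhost, port 5005. Set breakpoints and attach; the test resumes.
```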
[jira] [Commented] (HIVE-9107) Non-lowercase field names in structs causes NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248386#comment-14248386 ] Jim Pivarski commented on HIVE-9107: It's the same stack trace as HIVE-8870, but not the other two. (I couldn't find it when I searched for preexisting issues, and I see that it's been fixed for 0.14.) Thanks! Non-lowercase field names in structs causes NullPointerException Key: HIVE-9107 URL: https://issues.apache.org/jira/browse/HIVE-9107 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Jim Pivarski If an HQL query references a struct field with mixed or upper case, Hive throws a NullPointerException instead of giving a better error message or simply lower-casing the name. For example, if I have a struct in column mystruct with a field named myfield, a query like select mystruct.MyField from tablename; passes the local initialize (it submits an M-R job) but the remote initialize jobs throw NullPointerExceptions. The exception is on line 61 of org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator, which is right after the field name is extracted and not forced to be lower-case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
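A fix along the lines the report suggests can be sketched as a case-insensitive field lookup. This is purely illustrative and not the actual HIVE-9107 patch; the field table and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch: Hive stores struct field names in lower case, so a
// lookup using the user's original casing ("MyField") returns null and later
// NPEs. Normalizing to lower case (or failing with a clear message when the
// field truly does not exist) avoids the bare NullPointerException.
public class FieldLookupSketch {
    static final Map<String, Integer> FIELDS = new HashMap<>();
    static { FIELDS.put("myfield", 0); }

    static Integer resolve(String name) {
        Integer ref = FIELDS.get(name.toLowerCase(Locale.ROOT));
        if (ref == null) {
            throw new IllegalArgumentException("No such struct field: " + name);
        }
        return ref;
    }

    public static void main(String[] args) {
        System.out.println(resolve("MyField")); // resolves despite mixed case
    }
}
```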
[jira] [Created] (HIVE-9122) Need to remove additional references to hive-shims-common-secure, hive-shims-0.20
Jason Dere created HIVE-9122:
--------------------------------
Summary: Need to remove additional references to hive-shims-common-secure, hive-shims-0.20
Key: HIVE-9122
URL: https://issues.apache.org/jira/browse/HIVE-9122
Project: Hive
Issue Type: Bug
Components: Build Infrastructure
Reporter: Jason Dere
Assignee: Jason Dere

HIVE-8828/HIVE-8979 removed hive-shims-0.20 and hive-shims-common-secure, but we still have a few references to those removed packages:

{noformat}
$ find . -name pom.xml -exec egrep 'hive-shims-common-secure|hive-shims-0.20[^S]' {} \; -print
<artifact>org.apache.hive.shims:hive-shims-common-secure</artifact>
./jdbc/pom.xml
<include>org.apache.hive.shims:hive-shims-0.20</include>
<include>org.apache.hive.shims:hive-shims-common-secure</include>
./ql/pom.xml
<artifactId>hive-shims-common-secure</artifactId>
./shims/0.20S/pom.xml
<artifactId>hive-shims-common-secure</artifactId>
./shims/0.23/pom.xml
<artifactId>hive-shims-common-secure</artifactId>
./shims/scheduler/pom.xml
{noformat}

While building from trunk, you can see that maven is still pulling an old snapshot of hive-shims-common-secure.jar from repository.apache.org:

{noformat}
[INFO]
[INFO] Building Hive Shims 0.20S 0.15.0-SNAPSHOT
[INFO]
Downloading: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/maven-metadata.xml
Downloaded: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/maven-metadata.xml (801 B at 7.2 KB/sec)
Downloading: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/hive-shims-common-secure-0.15.0-20141127.164914-20.pom
Downloaded: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/hive-shims-common-secure-0.15.0-20141127.164914-20.pom (4 KB at 47.0 KB/sec)
Downloading: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/hive-shims-common-secure-0.15.0-20141127.164914-20.jar
Downloaded: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/hive-shims-common-secure-0.15.0-20141127.164914-20.jar (33 KB at 277.0 KB/sec)
{noformat}
[jira] [Updated] (HIVE-9122) Need to remove additional references to hive-shims-common-secure, hive-shims-0.20
[ https://issues.apache.org/jira/browse/HIVE-9122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-9122:
-----------------------------
Attachment: HIVE-9122.1.patch

Actually the references to hive-shims-common-secure need to be changed to hive-shims-common, or Hive would not build. Attaching patch.
[jira] [Updated] (HIVE-9122) Need to remove additional references to hive-shims-common-secure, hive-shims-0.20
[ https://issues.apache.org/jira/browse/HIVE-9122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-9122:
-----------------------------
Status: Patch Available (was: Open)
[jira] [Created] (HIVE-9123) Query with join fails with NPE when using join auto conversion
Kamil Gorlo created HIVE-9123:
---------------------------------
Summary: Query with join fails with NPE when using join auto conversion
Key: HIVE-9123
URL: https://issues.apache.org/jira/browse/HIVE-9123
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.1
Environment: CDH5 with Hive 0.13.1
Reporter: Kamil Gorlo

I have two simple tables:

{noformat}
desc kgorlo_comm;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| id        | bigint     |          |
| dest_id   | bigint     |          |
+-----------+------------+----------+

desc kgorlo_log;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| id        | bigint     |          |
| dest_id   | bigint     |          |
| tstamp    | bigint     |          |
+-----------+------------+----------+
{noformat}

With data:

{noformat}
select * from kgorlo_comm;
+-----------------+----------------------+
| kgorlo_comm.id  | kgorlo_comm.dest_id  |
+-----------------+----------------------+
| 1               | 2                    |
| 2               | 1                    |
| 1               | 3                    |
| 2               | 3                    |
| 3               | 5                    |
| 4               | 5                    |
+-----------------+----------------------+

select * from kgorlo_log;
+----------------+---------------------+--------------------+
| kgorlo_log.id  | kgorlo_log.dest_id  | kgorlo_log.tstamp  |
+----------------+---------------------+--------------------+
| 1              | 2                   | 0                  |
| 1              | 3                   | 0                  |
| 1              | 5                   | 0                  |
| 3              | 1                   | 0                  |
+----------------+---------------------+--------------------+
{noformat}

Following query fails in second stage of execution:

{code}
select v.id, v.dest_id
from kgorlo_log v
join (select id, dest_id, count(*) as wiad
      from kgorlo_comm
      group by id, dest_id) com1
on com1.id = v.id and com1.dest_id = v.dest_id;
{code}

with following exception:

{noformat}
2014-12-16 17:09:17,629 ERROR [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Unxpected exception: null
java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.getRefKey(MapJoinOperator.java:198)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.computeMapJoinKey(MapJoinOperator.java:186)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:216)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2014-12-16 17:09:17,659 FATAL [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:1,_col1:2}
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
	at
{noformat}
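Setting aside the crash, the expected answer of the query above is easy to derive by hand; a plain-Java re-computation of the join on (id, dest_id) (a hypothetical helper for illustration, not Hive code) shows the query should return exactly the two rows (1, 2) and (1, 3):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExpectedJoinResult {
    // Join kgorlo_log rows against the distinct group-by keys of kgorlo_comm
    // on (id, dest_id), mirroring the semantics of the failing query.
    public static List<long[]> join(List<long[]> log, Set<List<Long>> commKeys) {
        List<long[]> out = new ArrayList<>();
        for (long[] row : log) {
            if (commKeys.contains(Arrays.asList(row[0], row[1]))) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<long[]> log = Arrays.asList(
            new long[]{1, 2}, new long[]{1, 3}, new long[]{1, 5}, new long[]{3, 1});
        Set<List<Long>> commKeys = new HashSet<>(Arrays.asList(
            Arrays.asList(1L, 2L), Arrays.asList(2L, 1L), Arrays.asList(1L, 3L),
            Arrays.asList(2L, 3L), Arrays.asList(3L, 5L), Arrays.asList(4L, 5L)));
        for (long[] r : join(log, commKeys)) {
            System.out.println(r[0] + "\t" + r[1]); // 1 2, then 1 3
        }
    }
}
```

That the data set is this small is also why map-join auto conversion kicks in at all: the grouped side easily fits in memory.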
[jira] [Commented] (HIVE-9120) Hive Query log does not work when hive.exec.parallel is true
[ https://issues.apache.org/jira/browse/HIVE-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248443#comment-14248443 ]

Brock Noland commented on HIVE-9120:
------------------------------------

Nice find! Thank you [~dongc]!!

Hive Query log does not work when hive.exec.parallel is true
------------------------------------------------------------
Key: HIVE-9120
URL: https://issues.apache.org/jira/browse/HIVE-9120
Project: Hive
Issue Type: Bug
Components: HiveServer2, Logging
Reporter: Dong Chen
Assignee: Dong Chen

When hive.exec.parallel is true, the query log is not saved and Beeline can not retrieve it. When parallel, Driver.launchTask() may run the task in a new thread if other conditions are also met: TaskRunner.start() is invoked instead of TaskRunner.runSequential(). This causes the thread-local OperationLog variable to be null, so query logs are not logged. The OperationLog object should be set in the new thread in this case.
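The mechanics behind HIVE-9120 are easy to reproduce outside Hive: a plain ThreadLocal set in the parent thread is not visible in a freshly spawned task thread, so its value must be handed over explicitly before the task body runs. A minimal sketch with generic names, not Hive's actual OperationLog API:

```java
public class ThreadLocalHandoff {
    // Stand-in for the per-operation log handle.
    static final ThreadLocal<String> OPERATION_LOG = new ThreadLocal<>();

    static String[] demo() throws InterruptedException {
        OPERATION_LOG.set("query-42-log");
        final String[] seen = new String[2];

        // Parallel path: a new thread starts with an empty ThreadLocal,
        // so anything logged against it is lost.
        Thread broken = new Thread(() -> seen[0] = OPERATION_LOG.get());
        broken.start();
        broken.join();

        // Fix analogous to the one described: capture the parent's value
        // and set it inside the child before doing any work.
        final String parentLog = OPERATION_LOG.get();
        Thread fixed = new Thread(() -> {
            OPERATION_LOG.set(parentLog);
            seen[1] = OPERATION_LOG.get();
        });
        fixed.start();
        fixed.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] seen = demo();
        System.out.println(seen[0] + " / " + seen[1]); // null / query-42-log
    }
}
```

An InheritableThreadLocal would also propagate the value, but only to threads created by the parent, which is why an explicit handoff in the task runner is the more robust shape here.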
[jira] [Commented] (HIVE-9121) Enable beeline query progress information for Spark job[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248446#comment-14248446 ]

Brock Noland commented on HIVE-9121:
------------------------------------

+1

Enable beeline query progress information for Spark job[Spark Branch]
---------------------------------------------------------------------
Key: HIVE-9121
URL: https://issues.apache.org/jira/browse/HIVE-9121
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Critical
Labels: Spark-M4
Attachments: HIVE-9121.1-spark.patch

We could not get query progress information in Beeline as SparkJobMonitor is filtered out of the operation log.
[jira] [Updated] (HIVE-9123) Query with join fails with NPE when using join auto conversion
[ https://issues.apache.org/jira/browse/HIVE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamil Gorlo updated HIVE-9123: -- Description: I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | Following query fails in second stage of execution: bq. select v.id, v.dest_id from kgorlo_log v join (select id, dest_id, count(*) as wiad from kgorlo_comm group by id, dest_id)com1 on com1.id=v.id and com1.dest_id=v.dest_id; with following exception: {quote} 2014-12-16 17:09:17,629 ERROR [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Unxpected exception: null java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.getRefKey(MapJoinOperator.java:198) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.computeMapJoinKey(MapJoinOperator.java:186) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:216) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-12-16 17:09:17,659 FATAL [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:1,_col1:2} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unxpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:254) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at
[jira] [Updated] (HIVE-9123) Query with join fails with NPE when using join auto conversion
[ https://issues.apache.org/jira/browse/HIVE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamil Gorlo updated HIVE-9123: -- Description: I have two simple tables:

desc kgorlo_comm;
| col_name | data_type | comment |
| id | bigint | |
| dest_id | bigint | |

desc kgorlo_log;
| col_name | data_type | comment |
| id | bigint | |
| dest_id | bigint | |
| tstamp | bigint | |

With data:

select * from kgorlo_comm;
| kgorlo_comm.id | kgorlo_comm.dest_id |
| 1 | 2 |
| 2 | 1 |
| 1 | 3 |
| 2 | 3 |
| 3 | 5 |
| 4 | 5 |

select * from kgorlo_log;
| kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 5 | 0 |
| 3 | 1 | 0 |

The following query fails in the second stage of execution:

bq. select v.id, v.dest_id from kgorlo_log v join (select id, dest_id, count(*) as wiad from kgorlo_comm group by id, dest_id) com1 on com1.id=v.id and com1.dest_id=v.dest_id;

with the following exception:

2014-12-16 17:09:17,629 ERROR [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Unxpected exception: null
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.getRefKey(MapJoinOperator.java:198)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.computeMapJoinKey(MapJoinOperator.java:186)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:216)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2014-12-16 17:09:17,659 FATAL [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:1,_col1:2}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unxpected exception: null
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:254)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
    at
[jira] [Updated] (HIVE-9121) Enable beeline query progress information for Spark job[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9121: --- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Thank you very much [~chengxiang li]! I have committed this to branch. Enable beeline query progress information for Spark job[Spark Branch] - Key: HIVE-9121 URL: https://issues.apache.org/jira/browse/HIVE-9121 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical Labels: Spark-M4 Fix For: spark-branch Attachments: HIVE-9121.1-spark.patch We could not get query progress information in Beeline as SparkJobMonitor is filtered out of operation log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7248) UNION ALL in hive returns incorrect results on Hbase backed table
[ https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248466#comment-14248466 ] Hive QA commented on HIVE-7248: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687391/HIVE-7248.3.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6705 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2093/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2093/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2093/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12687391 - PreCommit-HIVE-TRUNK-Build UNION ALL in hive returns incorrect results on Hbase backed table - Key: HIVE-7248 URL: https://issues.apache.org/jira/browse/HIVE-7248 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: Mala Chikka Kempanna Assignee: Navis Attachments: HIVE-7248.1.patch.txt, HIVE-7248.2.patch.txt, HIVE-7248.3.patch.txt

The issue can be recreated with the following steps:

1) In hbase:
create 'TABLE_EMP','default'

2) On hive:
sudo -u hive hive
CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string, LAST_NAME string, CDS_UPDATED_DATE string, CDS_PK string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES('hbase.columns.mapping' = 'default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key', 'hbase.scan.cache' = '500', 'hbase.scan.cacheblocks' = 'false')
TBLPROPERTIES('hbase.table.name' = 'TABLE_EMP', 'serialization.null.format'='');

3) On hbase insert the following data:
put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini'
put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P'
put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'
put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind'
put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K'
put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'

4) On hive execute the following query:
SELECT * FROM (
SELECT CDS_PK FROM TABLE_EMP WHERE CDS_PK >= '0' AND CDS_PK <= '9' AND CDS_UPDATED_DATE IS NOT NULL
UNION ALL
SELECT CDS_PK FROM TABLE_EMP WHERE CDS_PK >= 'a' AND CDS_PK <= 'z' AND CDS_UPDATED_DATE IS NOT NULL
) t;

5) Output of the query:
1
1
2
2

6) Output of just
SELECT CDS_PK FROM TABLE_EMP WHERE CDS_PK >= '0' AND CDS_PK <= '9' AND CDS_UPDATED_DATE IS NOT NULL
is:
1
2

7) Output of just
SELECT CDS_PK FROM TABLE_EMP WHERE CDS_PK >= 'a' AND CDS_PK <= 'z' AND CDS_UPDATED_DATE IS NOT NULL
is empty.

8) UNION is used to combine the result from multiple SELECT statements into a single result set. Hive currently only supports UNION ALL (bag union), in which duplicates are not eliminated. Accordingly, the above query should return
1
2
but instead it gives the wrong output
1
1
2
2
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
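The "bag union" expectation in step 8 can be sketched outside Hive: UNION ALL simply concatenates the two result sets, neither eliminating nor duplicating rows. A minimal illustrative Java sketch (not Hive code; the row values mirror the steps above):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BagUnionDemo {
    // UNION ALL (bag union): concatenate both inputs; duplicates across
    // inputs are kept, but no row is ever emitted twice from one input.
    static List<String> unionAll(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("1", "2");  // rows matching the '0'..'9' branch
        List<String> second = Collections.emptyList(); // the 'a'..'z' branch matches nothing
        System.out.println(unionAll(first, second));   // [1, 2] -- not the doubled 1 1 2 2
    }
}
```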
[jira] [Commented] (HIVE-8131) Support timestamp in Avro
[ https://issues.apache.org/jira/browse/HIVE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248467#comment-14248467 ] Brock Noland commented on HIVE-8131: I think it'd be great to have [~rdblue] review this as well. Support timestamp in Avro - Key: HIVE-8131 URL: https://issues.apache.org/jira/browse/HIVE-8131 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Attachments: HIVE-8131.1.patch, HIVE-8131.patch, HIVE-8131.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248489#comment-14248489 ] Jimmy Xiang commented on HIVE-8843: --- Thanks a lot for reviewing it. Good point. Yes, it is a little intrusive for the RSC one. Let me fix it. Release RDD cache when Hive query is done [Spark Branch] Key: HIVE-8843 URL: https://issues.apache.org/jira/browse/HIVE-8843 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang Attachments: HIVE-8843.1-spark.patch In some multi-insert cases, RDD.cache() is called to improve performance. RDD is SparkContext specific, but the caching is useful only for the query. Thus, once the query is executed, we need to release the cache by calling RDD.uncache(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8815) Create unit test join of encrypted and unencrypted table
[ https://issues.apache.org/jira/browse/HIVE-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248502#comment-14248502 ] Brock Noland commented on HIVE-8815: Hi, Looks good to me! Say, do we need to add {{encrypt_data.txt}} or can we re-use one of the existing files? Cheers! Create unit test join of encrypted and unencrypted table Key: HIVE-8815 URL: https://issues.apache.org/jira/browse/HIVE-8815 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Attachments: HIVE-8815.patch NO PRECOMMIT TESTS The results should be inserted into a third table encrypted with a separate key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8816) Create unit test join of two encrypted tables with different keys
[ https://issues.apache.org/jira/browse/HIVE-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248504#comment-14248504 ] Brock Noland commented on HIVE-8816: Can we add {{explain extended}} and then verify in the {{.out.orig}} file that the {{.hive-staging}} used is inside the table with the stronger key? Create unit test join of two encrypted tables with different keys - Key: HIVE-8816 URL: https://issues.apache.org/jira/browse/HIVE-8816 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Fix For: encryption-branch Attachments: HIVE-8816.patch NO PRECOMMIT TESTS The results should be inserted into a third table encrypted with a separate key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29061: HIVE-9109 : Add support for Java 8 specific q-test out files
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29061/#review65208 --- Ship it! Ship It! - Brock Noland On Dec. 16, 2014, 2:16 a.m., Mohit Sabharwal wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29061/ --- (Updated Dec. 16, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE-9109 https://issues.apache.org/jira/browse/HIVE-9109 Repository: hive-git Description --- HIVE-9109 : Add support for Java 8 specific q-test out files Hash function differences between Java 7 and Java 8 lead to result order differences. While we have been able to fix a good number by converting hash maps to insert order hash maps, there are several cases where doing so is either not possible (because changes originate in external APIs) or the change leads to even more out file differences. For example: (1) In TestCliDriver.testCliDriver_varchar_udf1, for the following query: select str_to_map('a:1,b:2,c:3',',',':'), str_to_map(cast('a:1,b:2,c:3' as varchar(20)),',',':') from varchar_udf_1 limit 1; the StandardMapObjectInspector used in LazySimpleSerDe to serialize the final output uses a HashMap. Changing it to LinkedHashMap will lead to several other q-test output differences. (2) In TestCliDriver.testCliDriver_parquet_map_null, data with a map column is read from an Avro table into a Parquet table. The Avro API, specifically GenericData.Record, uses a HashMap and returns data in a different order. This patch adds support for specifying a hint called JAVA_VERSION_SPECIFIC_OUTPUT, which may be added to a q-test only if different outputs are expected for different Java versions. Under Java 7, the test output file has a .java1.7.out extension. Under Java 8, the test output file has a .java1.8.out extension. If the hint is not added, we continue to generate a .out file for the test.
Diffs - itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java e06d6828aa8924de2b56c8a63cc0955c5bd514d2 ql/src/test/queries/clientpositive/varchar_udf1.q 0a3012b5cd6d3e0cf065e51e7a680af1f0db859d ql/src/test/results/clientpositive/varchar_udf1.q.java1.7.out PRE-CREATION ql/src/test/results/clientpositive/varchar_udf1.q.java1.8.out PRE-CREATION ql/src/test/results/clientpositive/varchar_udf1.q.out 842bd38cb5070994df3a264cc691372384433ae3 Diff: https://reviews.apache.org/r/29061/diff/ Testing --- Tested using varchar_udf1.q. Out file changes for this test are included in the patch Thanks, Mohit Sabharwal
[jira] [Commented] (HIVE-9109) Add support for Java 8 specific q-test out files
[ https://issues.apache.org/jira/browse/HIVE-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248508#comment-14248508 ] Brock Noland commented on HIVE-9109: +1 thank you [~mohitsabharwal] Add support for Java 8 specific q-test out files Key: HIVE-9109 URL: https://issues.apache.org/jira/browse/HIVE-9109 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Attachments: HIVE-9109.1.patch, HIVE-9109.patch Hash function differences between Java 7 and Java 8 lead to result order differences. While we have been able to fix a good number by converting hash maps to insert order hash maps, there are several cases where doing so is either not possible (because changes originate in external APIs) or the change leads to even more out file differences. For example: (1) In TestCliDriver.testCliDriver_varchar_udf1, for the following query: {code} select str_to_map('a:1,b:2,c:3',',',':'), str_to_map(cast('a:1,b:2,c:3' as varchar(20)),',',':') from varchar_udf_1 limit 1; {code} the {{StandardMapObjectInspector}} used in {{LazySimpleSerDe}} to serialize the final output uses a {{HashMap}}. Changing it to {{LinkedHashMap}} will lead to several other q-test output differences. (2) In TestCliDriver.testCliDriver_parquet_map_null, data with a {{map}} column is read from an Avro table into a Parquet table. The Avro API, specifically {{GenericData.Record}}, uses a {{HashMap}} and returns data in a different order. This patch adds support for specifying a hint called {{JAVA_VERSION_SPECIFIC_OUTPUT}}, which may be added to a q-test only if different outputs are expected for different Java versions. For example: Under Java 7, the test output file has a .java1.7.out extension. Under Java 8, the test output file has a .java1.8.out extension. If the hint is not added, we continue to generate a single .out file for the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
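The ordering difference driving this patch can be reproduced with plain JDK collections: a HashMap's iteration order is unspecified and changed between Java 7 and Java 8, while a LinkedHashMap iterates in insertion order on every JVM. An illustrative sketch (not Hive code; the a:1,b:2,c:3 values mirror the str_to_map example above):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapOrderDemo {
    // Keys come back in insertion order regardless of JVM version,
    // which is why converting to insert-order maps fixes many .q.out diffs.
    static List<String> insertionOrderKeys(String... keys) {
        Map<String, Integer> m = new LinkedHashMap<>();
        for (String k : keys) {
            m.put(k, k.charAt(0) - 'a' + 1); // a:1, b:2, c:3
        }
        return new ArrayList<>(m.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap: deterministic across Java 7 and Java 8.
        System.out.println(insertionOrderKeys("a", "b", "c")); // [a, b, c]

        // HashMap: iteration order is an implementation detail; it changed
        // between Java 7 and Java 8, which is what breaks .q.out files.
        Map<String, Integer> hash = new HashMap<>();
        hash.put("a", 1);
        hash.put("b", 2);
        hash.put("c", 3);
        System.out.println(new ArrayList<>(hash.keySet()));
    }
}
```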
[jira] [Updated] (HIVE-9087) The move task does not handle properly in the case of loading data from the local file system path.
[ https://issues.apache.org/jira/browse/HIVE-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9087: --- Resolution: Fixed Fix Version/s: encryption-branch Status: Resolved (was: Patch Available) Thank you [~Ferd] for the patch and [~spena] for the review!! I have committed this to branch! The move task does not handle properly in the case of loading data from the local file system path. --- Key: HIVE-9087 URL: https://issues.apache.org/jira/browse/HIVE-9087 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Fix For: encryption-branch Attachments: HIVE-9087.1.patch, HIVE-9087.patch NO PRECOMMIT TESTS

The following exception will be thrown:

load data local inpath '/root/testdata/encrypt_data.txt' overwrite into table unencrypteddb.src;

Getting log thread is interrupted, since query is done!
14/12/12 19:17:12 ERROR exec.Task: Failed with exception Wrong FS: file:/root/testdata/encrypt_data.txt, expected: hdfs://localhost:9000
java.lang.IllegalArgumentException: Wrong FS: file:/root/testdata/encrypt_data.txt, expected: hdfs://localhost:9000
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
    at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:465)
    at org.apache.hadoop.hive.common.FileUtils.isSubDir(FileUtils.java:616)
    at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2340)
    at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2659)
    at org.apache.hadoop.hive.ql.metadata.Table.replaceFiles(Table.java:666)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1571)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:289)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1644)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1404)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1217)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:144)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:196)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
    at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:208)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
14/12/12 19:17:12 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
14/12/12 19:17:12 ERROR operation.Operation: Error running hive query: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:314)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:146)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:196)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
    at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:208)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at
[jira] [Commented] (HIVE-9115) Hive build failure on hadoop-2.7 due to HADOOP-11356
[ https://issues.apache.org/jira/browse/HIVE-9115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248529#comment-14248529 ] Jason Dere commented on HIVE-9115: -- Hmm, not sure why any of these failures would be related. The last 3 do not seem to be as they were failing in previous unit test runs. udaf_covar_pop.q is a new failure, will run locally to see if there is any issue. Hive build failure on hadoop-2.7 due to HADOOP-11356 Key: HIVE-9115 URL: https://issues.apache.org/jira/browse/HIVE-9115 Project: Hive Issue Type: Bug Reporter: Jason Dere Attachments: HIVE-9115.1.patch HADOOP-11356 removes org.apache.hadoop.fs.permission.AccessControlException, causing build break on Hive when compiling against hadoop-2.7: {noformat} shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java:[808,63] cannot find symbol symbol: class AccessControlException location: package org.apache.hadoop.fs.permission [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9115) Hive build failure on hadoop-2.7 due to HADOOP-11356
[ https://issues.apache.org/jira/browse/HIVE-9115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248533#comment-14248533 ] Jason Dere commented on HIVE-9115: -- Some discussion from [~ste...@apache.org] on HADOOP-11411, about concerns this will cause Hive to fail at runtime when running on Hadoop-2.7. If so, then this fix may also be a candidate to add to 0.14 branch. {quote} Steve Loughran added a comment - 7 hours ago Does this mean that at run time, older versions of Hive will not run against Hadoop 2.7? Jason Dere added a comment - 1 hour ago No, this just means that Hive will not compile against Hadoop 2.7. Run time should work on older versions. Steve Loughran added a comment - 9 minutes ago looking at the patch, there's enough referencing of the Hadoop class that there may be some link problems {quote} Hive build failure on hadoop-2.7 due to HADOOP-11356 Key: HIVE-9115 URL: https://issues.apache.org/jira/browse/HIVE-9115 Project: Hive Issue Type: Bug Reporter: Jason Dere Attachments: HIVE-9115.1.patch HADOOP-11356 removes org.apache.hadoop.fs.permission.AccessControlException, causing build break on Hive when compiling against hadoop-2.7: {noformat} shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java:[808,63] cannot find symbol symbol: class AccessControlException location: package org.apache.hadoop.fs.permission [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
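One common way to sidestep a compile-time (and potential link-time) dependence on a class that an upstream project may remove is to probe for it reflectively instead of referencing it statically. A hypothetical sketch of that pattern, not the actual HIVE-9115 patch; the Hadoop class name is only an example of something to probe for:

```java
public class ClassProbe {
    // Returns true if the named class is loadable at runtime.
    // Probing via Class.forName avoids a static reference to a class
    // that may not exist in every version on the classpath, so no
    // NoClassDefFoundError can be triggered by merely loading this class.
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Always present in the JDK:
        System.out.println(isPresent("java.util.HashMap"));
        // Present only when a matching Hadoop version is on the classpath:
        System.out.println(isPresent("org.apache.hadoop.fs.permission.AccessControlException"));
    }
}
```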
[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248536#comment-14248536 ] Jimmy Xiang commented on HIVE-8843: --- Thought about it again. The current solution seems to be the simplest one. Did I miss anything? Release RDD cache when Hive query is done [Spark Branch] Key: HIVE-8843 URL: https://issues.apache.org/jira/browse/HIVE-8843 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang Attachments: HIVE-8843.1-spark.patch In some multi-insert cases, RDD.cache() is called to improve performance. RDD is SparkContext specific, but the caching is useful only for the query. Thus, once the query is executed, we need to release the cache by calling RDD.uncache(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9124) Performance of query 28 from tpc-ds
Brock Noland created HIVE-9124: -- Summary: Performance of query 28 from tpc-ds Key: HIVE-9124 URL: https://issues.apache.org/jira/browse/HIVE-9124 Project: Hive Issue Type: Sub-task Reporter: Brock Noland As you can see from the attached screenshot, one stage was submitted at {{2014/12/16 12:06:30}} and took 6 minutes (ending around 12:12). However, the next stage was not submitted until {{2014/12/16 12:18:42}}. We should understand: * What is going on in the meantime * Why it is taking so long -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9124) Performance of query 28 from tpc-ds
[ https://issues.apache.org/jira/browse/HIVE-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9124: --- Description: As you can see the from the attached screenshot, one stage was submitted at {{2014/12/16 12:06:30}} and took 6 minutes (ending around 12:12). However the next stage was not submitted until {{2014/12/16 12:18:42}}. We should understand: * What is going on the mean time * Why is it taking so long {noformat} select * from (select avg(ss_list_price) B1_LP ,count(ss_list_price) B1_CNT ,count(distinct ss_list_price) B1_CNTD from store_sales where ss_quantity between 0 and 5 and (ss_list_price between 11 and 11+10 or ss_coupon_amt between 460 and 460+1000 or ss_wholesale_cost between 14 and 14+20)) B1, (select avg(ss_list_price) B2_LP ,count(ss_list_price) B2_CNT ,count(distinct ss_list_price) B2_CNTD from store_sales where ss_quantity between 6 and 10 and (ss_list_price between 91 and 91+10 or ss_coupon_amt between 1430 and 1430+1000 or ss_wholesale_cost between 32 and 32+20)) B2, (select avg(ss_list_price) B3_LP ,count(ss_list_price) B3_CNT ,count(distinct ss_list_price) B3_CNTD from store_sales where ss_quantity between 11 and 15 and (ss_list_price between 66 and 66+10 or ss_coupon_amt between 920 and 920+1000 or ss_wholesale_cost between 4 and 4+20)) B3, (select avg(ss_list_price) B4_LP ,count(ss_list_price) B4_CNT ,count(distinct ss_list_price) B4_CNTD from store_sales where ss_quantity between 16 and 20 and (ss_list_price between 142 and 142+10 or ss_coupon_amt between 3054 and 3054+1000 or ss_wholesale_cost between 80 and 80+20)) B4, (select avg(ss_list_price) B5_LP ,count(ss_list_price) B5_CNT ,count(distinct ss_list_price) B5_CNTD from store_sales where ss_quantity between 21 and 25 and (ss_list_price between 135 and 135+10 or ss_coupon_amt between 14180 and 14180+1000 or ss_wholesale_cost between 38 and 38+20)) B5, (select avg(ss_list_price) B6_LP ,count(ss_list_price) B6_CNT ,count(distinct 
ss_list_price) B6_CNTD from store_sales where ss_quantity between 26 and 30 and (ss_list_price between 28 and 28+10 or ss_coupon_amt between 2513 and 2513+1000 or ss_wholesale_cost between 42 and 42+20)) B6 limit 100 {noformat} was: As you can see the from the attached screenshot, one stage was submitted at {{2014/12/16 12:06:30}} and took 6 minutes (ending around 12:12). However the next stage was not submitted until {{2014/12/16 12:18:42}}. We should understand: * What is going on the mean time * Why is it taking so long Performance of query 28 from tpc-ds --- Key: HIVE-9124 URL: https://issues.apache.org/jira/browse/HIVE-9124 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Attachments: Screen Shot 2014-12-16 at 9.30.41 AM.png As you can see the from the attached screenshot, one stage was submitted at {{2014/12/16 12:06:30}} and took 6 minutes (ending around 12:12). However the next stage was not submitted until {{2014/12/16 12:18:42}}. We should understand: * What is going on the mean time * Why is it taking so long {noformat} select * from (select avg(ss_list_price) B1_LP ,count(ss_list_price) B1_CNT ,count(distinct ss_list_price) B1_CNTD from store_sales where ss_quantity between 0 and 5 and (ss_list_price between 11 and 11+10 or ss_coupon_amt between 460 and 460+1000 or ss_wholesale_cost between 14 and 14+20)) B1, (select avg(ss_list_price) B2_LP ,count(ss_list_price) B2_CNT ,count(distinct ss_list_price) B2_CNTD from store_sales where ss_quantity between 6 and 10 and (ss_list_price between 91 and 91+10 or ss_coupon_amt between 1430 and 1430+1000 or ss_wholesale_cost between 32 and 32+20)) B2, (select avg(ss_list_price) B3_LP ,count(ss_list_price) B3_CNT ,count(distinct ss_list_price) B3_CNTD from store_sales where ss_quantity between 11 and 15 and (ss_list_price between 66 and 66+10 or ss_coupon_amt between 920 and 920+1000 or ss_wholesale_cost between 4 and 4+20)) B3, (select avg(ss_list_price) B4_LP ,count(ss_list_price) B4_CNT 
,count(distinct ss_list_price) B4_CNTD from store_sales where ss_quantity between 16 and 20 and
[jira] [Updated] (HIVE-9124) Performance of query 28 from tpc-ds
[ https://issues.apache.org/jira/browse/HIVE-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9124: --- Attachment: Screen Shot 2014-12-16 at 9.30.41 AM.png Performance of query 28 from tpc-ds --- Key: HIVE-9124 URL: https://issues.apache.org/jira/browse/HIVE-9124 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Attachments: Screen Shot 2014-12-16 at 9.30.41 AM.png

As you can see from the attached screenshot, one stage was submitted at {{2014/12/16 12:06:30}} and took 6 minutes (ending around 12:12). However, the next stage was not submitted until {{2014/12/16 12:18:42}}. We should understand:
* What is going on in the meantime
* Why it is taking so long

{noformat}
select * from
(select avg(ss_list_price) B1_LP, count(ss_list_price) B1_CNT, count(distinct ss_list_price) B1_CNTD
 from store_sales
 where ss_quantity between 0 and 5
 and (ss_list_price between 11 and 11+10 or ss_coupon_amt between 460 and 460+1000 or ss_wholesale_cost between 14 and 14+20)) B1,
(select avg(ss_list_price) B2_LP, count(ss_list_price) B2_CNT, count(distinct ss_list_price) B2_CNTD
 from store_sales
 where ss_quantity between 6 and 10
 and (ss_list_price between 91 and 91+10 or ss_coupon_amt between 1430 and 1430+1000 or ss_wholesale_cost between 32 and 32+20)) B2,
(select avg(ss_list_price) B3_LP, count(ss_list_price) B3_CNT, count(distinct ss_list_price) B3_CNTD
 from store_sales
 where ss_quantity between 11 and 15
 and (ss_list_price between 66 and 66+10 or ss_coupon_amt between 920 and 920+1000 or ss_wholesale_cost between 4 and 4+20)) B3,
(select avg(ss_list_price) B4_LP, count(ss_list_price) B4_CNT, count(distinct ss_list_price) B4_CNTD
 from store_sales
 where ss_quantity between 16 and 20
 and (ss_list_price between 142 and 142+10 or ss_coupon_amt between 3054 and 3054+1000 or ss_wholesale_cost between 80 and 80+20)) B4,
(select avg(ss_list_price) B5_LP, count(ss_list_price) B5_CNT, count(distinct ss_list_price) B5_CNTD
 from store_sales
 where ss_quantity between 21 and 25
 and (ss_list_price between 135 and 135+10 or ss_coupon_amt between 14180 and 14180+1000 or ss_wholesale_cost between 38 and 38+20)) B5,
(select avg(ss_list_price) B6_LP, count(ss_list_price) B6_CNT, count(distinct ss_list_price) B6_CNTD
 from store_sales
 where ss_quantity between 26 and 30
 and (ss_list_price between 28 and 28+10 or ss_coupon_amt between 2513 and 2513+1000 or ss_wholesale_cost between 42 and 42+20)) B6
limit 100
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9125) RSC stdout is logged twice
Brock Noland created HIVE-9125:
----------------------------------

             Summary: RSC stdout is logged twice
                 Key: HIVE-9125
                 URL: https://issues.apache.org/jira/browse/HIVE-9125
             Project: Hive
          Issue Type: Sub-task
            Reporter: Brock Noland

This is quite strange and I don't see the issue at first glance.

{noformat}
2014-12-16 12:44:48,826 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - 2014-12-16T12:44:48.638-0500: [Full GC [PSYoungGen: 111616K->50711K(143360K)] [ParOldGen: 349385K->349385K(349696K)] 461001K->400097K(493056K) [PSPermGen: 58684K->58684K(58880K)], 0.1879000 secs] [Times: user=1.14 sys=0.00, real=0.19 secs]
2014-12-16 12:44:48,826 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - 2014-12-16T12:44:48.638-0500: [Full GC [PSYoungGen: 111616K->50711K(143360K)] [ParOldGen: 349385K->349385K(349696K)] 461001K->400097K(493056K) [PSPermGen: 58684K->58684K(58880K)], 0.1879000 secs] [Times: user=1.14 sys=0.00, real=0.19 secs]
{noformat}

{noformat}
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - sparkDriver-akka.actor.default-dispatcher-3 daemon prio=10 tid=0x7f9e3c5cc000 nid=0x3698 runnable [0x7f9e30376000]
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - sparkDriver-akka.actor.default-dispatcher-3 daemon prio=10 tid=0x7f9e3c5cc000 nid=0x3698 runnable [0x7f9e30376000]
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - java.lang.Thread.State: RUNNABLE
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - java.lang.Thread.State: RUNNABLE
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:767)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:767)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:131)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:131)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
2014-12-16 12:44:48,554 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
2014-12-16 12:44:48,555 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
2014-12-16 12:44:48,555 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
2014-12-16 12:44:48,555 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
2014-12-16 12:44:48,555 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
2014-12-16 12:44:48,555 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
2014-12-16 12:44:48,555 INFO [stdout-redir-1]:
{noformat}
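Stripped of the JIRA markup, the symptom in the dumps above is simply that every redirected stdout line appears exactly twice. A minimal sketch of checking a log for that pattern (an illustrative helper, not part of SparkClientImpl):

```python
# Detect the "every line logged twice" symptom shown in the dumps above.
# Hypothetical checker, not Hive code.
def adjacent_duplicates(lines):
    """Return each line that is immediately repeated by the next one."""
    return [a for a, b in zip(lines, lines[1:]) if a == b]

log = [
    "java.lang.Thread.State: RUNNABLE",
    "java.lang.Thread.State: RUNNABLE",
    "at kryo.Kryo.readClass(Kryo.java:656)",
    "at kryo.Kryo.readClass(Kryo.java:656)",
]
dups = adjacent_duplicates(log)  # every distinct dump line shows up once here
```

Running this over the {noformat} dumps would flag every line, which is what makes the double logging easy to confirm mechanically.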
[jira] [Commented] (HIVE-9125) RSC stdout is logged twice
[ https://issues.apache.org/jira/browse/HIVE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248559#comment-14248559 ]

Brock Noland commented on HIVE-9125:
------------------------------------

FYI [~vanzin] [~xuefuz]

RSC stdout is logged twice
--------------------------
                 Key: HIVE-9125
                 URL: https://issues.apache.org/jira/browse/HIVE-9125
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
    Affects Versions: spark-branch
            Reporter: Brock Noland

This is quite strange and I don't see the issue at first glance.
[jira] [Commented] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248560#comment-14248560 ]

Marcelo Vanzin commented on HIVE-9094:
--------------------------------------

60s sounds reasonable. This initial timeout will always be hard to figure out, since launching the app will depend a lot on the cluster being used and a bunch of other things... :-/ Perhaps we could add some kind of context status event that the client side can listen to, and that the driver side can periodically send as an update... but that would probably still need some kind of timeout. Anyway, raising the timeout sounds fine for now.

TimeoutException when trying get executor count from RSC [Spark Branch]
-----------------------------------------------------------------------
                 Key: HIVE-9094
                 URL: https://issues.apache.org/jira/browse/HIVE-9094
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
            Reporter: Xuefu Zhang
            Assignee: Chengxiang Li

In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because:

{code}
2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException
org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException
	at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120)
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
	at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
	at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134)
	at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837)
	at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234)
	at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at junit.framework.TestCase.runTest(TestCase.java:176)
	at junit.framework.TestCase.runBare(TestCase.java:141)
	at junit.framework.TestResult$1.protect(TestResult.java:122)
	at junit.framework.TestResult.runProtected(TestResult.java:142)
	at junit.framework.TestResult.run(TestResult.java:125)
	at junit.framework.TestCase.run(TestCase.java:129)
	at junit.framework.TestSuite.runTest(TestSuite.java:255)
	at junit.framework.TestSuite.run(TestSuite.java:250)
	at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
	at
{code}
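The call being tuned here is a blocking wait on a value computed by the remote context, and the fix raises the bound rather than removing it. A sketch of that client-side shape (illustrative names and a toy stand-in for the round trip, not the actual SparkClient API):

```python
# Wait for a remotely computed value with a bounded timeout, the shape of
# call that surfaced as java.util.concurrent.TimeoutException above.
# get_executor_count and the 60s bound are illustrative stand-ins.
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
import time

def get_executor_count():
    time.sleep(0.05)  # stands in for launching the app / RSC round trip
    return 4

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(get_executor_count)
    try:
        count = future.result(timeout=60)  # raised from a smaller default
    except FutureTimeout:
        count = None  # the caller surfaces this as a SemanticException
```

As the comment notes, any fixed bound is a guess about cluster launch time; a status event pushed by the driver would shrink the window but still needs a final timeout.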
[jira] [Updated] (HIVE-9125) RSC stdout is logged twice
[ https://issues.apache.org/jira/browse/HIVE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-9125:
-------------------------------
    Priority: Minor  (was: Major)

RSC stdout is logged twice
--------------------------
                 Key: HIVE-9125
                 URL: https://issues.apache.org/jira/browse/HIVE-9125
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
    Affects Versions: spark-branch
            Reporter: Brock Noland
            Priority: Minor

This is quite strange and I don't see the issue at first glance.
[jira] [Updated] (HIVE-9125) RSC stdout is logged twice [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-9125:
-------------------------------
    Affects Version/s: spark-branch
              Summary: RSC stdout is logged twice [Spark Branch]  (was: RSC stdout is logged twice)

RSC stdout is logged twice [Spark Branch]
-----------------------------------------
                 Key: HIVE-9125
                 URL: https://issues.apache.org/jira/browse/HIVE-9125
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
    Affects Versions: spark-branch
            Reporter: Brock Noland
            Priority: Minor

This is quite strange and I don't see the issue at first glance.
[jira] [Assigned] (HIVE-9115) Hive build failure on hadoop-2.7 due to HADOOP-11356
[ https://issues.apache.org/jira/browse/HIVE-9115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere reassigned HIVE-9115:
--------------------------------
    Assignee: Jason Dere

Hive build failure on hadoop-2.7 due to HADOOP-11356
----------------------------------------------------
                 Key: HIVE-9115
                 URL: https://issues.apache.org/jira/browse/HIVE-9115
             Project: Hive
          Issue Type: Bug
            Reporter: Jason Dere
            Assignee: Jason Dere
         Attachments: HIVE-9115.1.patch

HADOOP-11356 removes org.apache.hadoop.fs.permission.AccessControlException, causing a build break in Hive when compiling against hadoop-2.7:

{noformat}
shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java:[808,63] cannot find symbol
  symbol:   class AccessControlException
  location: package org.apache.hadoop.fs.permission
[INFO] 1 error
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9106) improve the performance of null scan optimizer when several table scans share a physical path
[ https://issues.apache.org/jira/browse/HIVE-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248564#comment-14248564 ]

Pengcheng Xiong commented on HIVE-9106:
---------------------------------------

[~ashutoshc], I studied the 4 failures. Two of the four, vector_partition_diff_num_cols and vector_partition_diff_num_cols, are probably due to Sergey's CBO-enablement patch (and Vikram's follow-up patch was not updated, so the column names are different). I ran groupby3_map_multi_distinct.q and optimize_nullscan.q, and both of them passed on my laptop. Thus, I think it is safe to check in. [~jpullokkaran], please let us know if you have other opinions.

improve the performance of null scan optimizer when several table scans share a physical path
---------------------------------------------------------------------------------------------
                 Key: HIVE-9106
                 URL: https://issues.apache.org/jira/browse/HIVE-9106
             Project: Hive
          Issue Type: Improvement
            Reporter: Pengcheng Xiong
            Assignee: Pengcheng Xiong
            Priority: Minor
         Attachments: HIVE-9106.00.patch

The current fix, HIVE-9053, addresses the correctness issue. The solution can be improved further when several table scans share a physical path.
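The improvement described is the classic share-work-by-key move: when several table scans resolve to the same physical path, the per-path work should run once and be reused. A sketch of the idea (illustrative helper, not the actual optimizer code):

```python
# Do expensive per-path work once when several scans share a physical
# path, the optimization direction HIVE-9106 describes. Illustrative only.
def process_paths(scan_paths, probe):
    """probe(path) is an expensive per-path check; cache it per path."""
    cache = {}
    results = []
    for p in scan_paths:
        if p not in cache:
            cache[p] = probe(p)  # computed once per distinct path
        results.append(cache[p])
    return results, len(cache)

calls = []
def probe(p):
    calls.append(p)
    return p.upper()

# Three scans, only two distinct physical paths: probe runs twice.
res, unique = process_paths(["/warehouse/t1", "/warehouse/t1", "/warehouse/t2"], probe)
```

The cache keyed by path is the whole trick: the number of expensive probes drops from one per table scan to one per distinct path.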
What more Hive can do when compared to PIG
*Hello all*

*Can somebody help me in getting the answer for the question below?*

*It's regarding Pig vs. Hive.*

We know that Pig is for large data set analysis and Hive is good at data summarization and ad-hoc queries. But I want to know of a use case that Hive can handle which cannot be achieved with Pig. In other words, what more can a Hive query achieve when the same is not possible with a Pig Latin script? If possible, I would like to know the vice-versa case as well.

Thanks
Mohan
469-274-5677
[jira] [Commented] (HIVE-7024) Escape control characters for explain result
[ https://issues.apache.org/jira/browse/HIVE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248574#comment-14248574 ]

Hive QA commented on HIVE-7024:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12687399/HIVE-7024.4.patch.txt

{color:red}ERROR:{color} -1 due to 120 failed/errored test(s), 6705 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_output_format
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_display_colstats_tbllvl
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input42
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join34
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join35
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters_overlap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_map_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_multiskew_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_louter_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_outer_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_vc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppr_allchildsarenull
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_push_or
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regexp_extract
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_router_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample2
{noformat}
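For context on what the patch under test does: escaping control characters in explain output means mapping non-printable bytes to visible escapes so the result is stable, diffable text. A toy version of such an escape (hypothetical helper, not the HIVE-7024 patch itself):

```python
# Map control characters (except tab and newline) to visible \uXXXX
# escapes, the kind of normalization an "escape control characters for
# explain result" change performs. Illustrative only.
def escape_ctrl(s):
    out = []
    for ch in s:
        if ord(ch) < 0x20 and ch not in "\t\n":
            out.append("\\u%04x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)
```

A change like this rewrites every golden .q.out file containing explain output, which is consistent with the very large failed-test list above: the failures are expected-output diffs, not behavioral regressions.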
[jira] [Commented] (HIVE-9124) Performance of query 28 from tpc-ds
[ https://issues.apache.org/jira/browse/HIVE-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248622#comment-14248622 ] Brock Noland commented on HIVE-9124: There is 5.6 minutes of split generation: {noformat} 2014-12-16 12:06:30,757 INFO [sparkDriver-akka.actor.default-dispatcher-15]: log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=getSplits start=1418749551759 end=1418749590756 duration=38997 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat 2014-12-16 12:13:39,512 INFO [sparkDriver-akka.actor.default-dispatcher-15]: log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=getSplits start=1418749978847 end=1418750019512 duration=40665 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat 2014-12-16 12:14:52,475 INFO [sparkDriver-akka.actor.default-dispatcher-15]: log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=getSplits start=1418750020483 end=1418750092475 duration=71992 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat 2014-12-16 12:16:19,405 INFO [sparkDriver-akka.actor.default-dispatcher-15]: log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=getSplits start=1418750094132 end=1418750179405 duration=85273 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat 2014-12-16 12:18:42,716 INFO [sparkDriver-akka.actor.default-dispatcher-15]: log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=getSplits start=1418750181259 end=1418750322716 duration=141457 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat 2014-12-16 12:45:13,361 INFO [sparkDriver-akka.actor.default-dispatcher-3]: log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - /PERFLOG method=getSplits start=1418751829900 end=1418751913361 duration=83461 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat {noformat} Performance of query 28 from tpc-ds --- Key: HIVE-9124 URL: https://issues.apache.org/jira/browse/HIVE-9124 Project: Hive Issue Type: Sub-task 
Components: Spark Reporter: Brock Noland Attachments: Screen Shot 2014-12-16 at 9.30.41 AM.png, query28-explain.txt As you can see the from the attached screenshot, one stage was submitted at {{2014/12/16 12:06:30}} and took 6 minutes (ending around 12:12). However the next stage was not submitted until {{2014/12/16 12:18:42}}. We should understand: * What is going on the mean time * Why is it taking so long {noformat} select * from (select avg(ss_list_price) B1_LP ,count(ss_list_price) B1_CNT ,count(distinct ss_list_price) B1_CNTD from store_sales where ss_quantity between 0 and 5 and (ss_list_price between 11 and 11+10 or ss_coupon_amt between 460 and 460+1000 or ss_wholesale_cost between 14 and 14+20)) B1, (select avg(ss_list_price) B2_LP ,count(ss_list_price) B2_CNT ,count(distinct ss_list_price) B2_CNTD from store_sales where ss_quantity between 6 and 10 and (ss_list_price between 91 and 91+10 or ss_coupon_amt between 1430 and 1430+1000 or ss_wholesale_cost between 32 and 32+20)) B2, (select avg(ss_list_price) B3_LP ,count(ss_list_price) B3_CNT ,count(distinct ss_list_price) B3_CNTD from store_sales where ss_quantity between 11 and 15 and (ss_list_price between 66 and 66+10 or ss_coupon_amt between 920 and 920+1000 or ss_wholesale_cost between 4 and 4+20)) B3, (select avg(ss_list_price) B4_LP ,count(ss_list_price) B4_CNT ,count(distinct ss_list_price) B4_CNTD from store_sales where ss_quantity between 16 and 20 and (ss_list_price between 142 and 142+10 or ss_coupon_amt between 3054 and 3054+1000 or ss_wholesale_cost between 80 and 80+20)) B4, (select avg(ss_list_price) B5_LP ,count(ss_list_price) B5_CNT ,count(distinct ss_list_price) B5_CNTD from store_sales where ss_quantity between 21 and 25 and (ss_list_price between 135 and 135+10 or ss_coupon_amt between 14180 and 14180+1000 or ss_wholesale_cost between 38 and 38+20)) B5, (select avg(ss_list_price) B6_LP ,count(ss_list_price) B6_CNT ,count(distinct ss_list_price) B6_CNTD from store_sales where ss_quantity 
between 26 and 30 and (ss_list_price between 28 and 28+10 or ss_coupon_amt between 2513 and 2513+1000 or ss_wholesale_cost between 42 and 42+20)) B6 limit 100 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
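The six sub-selects above all share one shape: filter store_sales to a quantity bucket plus a list-price/coupon/wholesale-cost window, then compute avg, count, and count-distinct of ss_list_price. A minimal Python sketch of that shape (synthetic rows, not Hive code; column names follow the query):

```python
# Illustrative sketch of one TPC-DS query 28 sub-select over made-up rows.
rows = [
    # (ss_quantity, ss_list_price, ss_coupon_amt, ss_wholesale_cost)
    (3, 15.0, 0.0, 0.0),
    (4, 15.0, 500.0, 10.0),
    (5, 20.0, 470.0, 16.0),
    (9, 95.0, 0.0, 0.0),   # falls in the B2 quantity bucket, not B1
]

def bucket_aggs(rows, qty_lo, qty_hi, lp_lo, ca_lo, wc_lo):
    """avg/count/count-distinct of ss_list_price for one quantity bucket."""
    prices = [
        lp for (q, lp, ca, wc) in rows
        if qty_lo <= q <= qty_hi
        and (lp_lo <= lp <= lp_lo + 10
             or ca_lo <= ca <= ca_lo + 1000
             or wc_lo <= wc <= wc_lo + 20)
    ]
    avg = sum(prices) / len(prices) if prices else None
    return avg, len(prices), len(set(prices))

# B1: quantity 0..5, list price 11..21, coupon 460..1460, wholesale 14..34
b1 = bucket_aggs(rows, 0, 5, 11, 460, 14)
print(b1)
```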
[jira] [Commented] (HIVE-9115) Hive build failure on hadoop-2.7 due to HADOOP-11356
[ https://issues.apache.org/jira/browse/HIVE-9115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248648#comment-14248648 ] Steve Loughran commented on HIVE-9115: -- +1 for 0.14, though I was thinking more of reverting that hadoop change from branch-2. We're looking at a minimal 2.7 release before the end of the month (the first Java 7+-only release), and don't want any regressions from hadoop 2.6; this would be one. Hive build failure on hadoop-2.7 due to HADOOP-11356 Key: HIVE-9115 URL: https://issues.apache.org/jira/browse/HIVE-9115 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-9115.1.patch HADOOP-11356 removes org.apache.hadoop.fs.permission.AccessControlException, causing a build break on Hive when compiling against hadoop-2.7: {noformat} shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java:[808,63] cannot find symbol symbol: class AccessControlException location: package org.apache.hadoop.fs.permission [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8827) Remove SSLv2Hello from list of disabled protocols
[ https://issues.apache.org/jira/browse/HIVE-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248653#comment-14248653 ] Vaibhav Gumashta commented on HIVE-8827: I think we should backport this to 14.1 as well. Will create a new jira for that. Remove SSLv2Hello from list of disabled protocols - Key: HIVE-8827 URL: https://issues.apache.org/jira/browse/HIVE-8827 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.15.0 Attachments: HIVE-8827.1.patch Turns out SSLv2Hello is not the same as SSLv2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8988) Support advanced aggregation in Hive to Calcite path
[ https://issues.apache.org/jira/browse/HIVE-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248652#comment-14248652 ] Laljo John Pullokkaran commented on HIVE-8988: -- [~jcamachorodriguez] could you re-upload the patch? Now, after Sergey's patch, CBO is on by default; we want to make sure all Cube/Rollup/GroupingSet tests succeed. Support advanced aggregation in Hive to Calcite path - Key: HIVE-8988 URL: https://issues.apache.org/jira/browse/HIVE-8988 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Labels: grouping, logical, optiq Fix For: 0.15.0 Attachments: HIVE-8988.01.patch, HIVE-8988.02.patch, HIVE-8988.02.patch, HIVE-8988.patch CLEAR LIBRARY CACHE To close the gap between Hive and Calcite, we need to support the translation of GroupingSets into Calcite; currently this is not implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9126) Backport HIVE-8827 (Remove SSLv2Hello from list of disabled protocols) to 0.14 branch
Vaibhav Gumashta created HIVE-9126: -- Summary: Backport HIVE-8827 (Remove SSLv2Hello from list of disabled protocols) to 0.14 branch Key: HIVE-9126 URL: https://issues.apache.org/jira/browse/HIVE-9126 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.1 Check HIVE-8827 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8988) Support advanced aggregation in Hive to Calcite path
[ https://issues.apache.org/jira/browse/HIVE-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8988: -- Attachment: HIVE-8988.02.patch Support advanced aggregation in Hive to Calcite path - Key: HIVE-8988 URL: https://issues.apache.org/jira/browse/HIVE-8988 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Labels: grouping, logical, optiq Fix For: 0.15.0 Attachments: HIVE-8988.01.patch, HIVE-8988.02.patch, HIVE-8988.02.patch, HIVE-8988.02.patch, HIVE-8988.patch CLEAR LIBRARY CACHE To close the gap between Hive and Calcite, we need to support the translation of GroupingSets into Calcite; currently this is not implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8988) Support advanced aggregation in Hive to Calcite path
[ https://issues.apache.org/jira/browse/HIVE-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8988: -- Attachment: (was: HIVE-8988.02.patch) Support advanced aggregation in Hive to Calcite path - Key: HIVE-8988 URL: https://issues.apache.org/jira/browse/HIVE-8988 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Labels: grouping, logical, optiq Fix For: 0.15.0 Attachments: HIVE-8988.01.patch, HIVE-8988.02.patch, HIVE-8988.patch CLEAR LIBRARY CACHE To close the gap between Hive and Calcite, we need to support the translation of GroupingSets into Calcite; currently this is not implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8988) Support advanced aggregation in Hive to Calcite path
[ https://issues.apache.org/jira/browse/HIVE-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8988: -- Attachment: (was: HIVE-8988.02.patch) Support advanced aggregation in Hive to Calcite path - Key: HIVE-8988 URL: https://issues.apache.org/jira/browse/HIVE-8988 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Labels: grouping, logical, optiq Fix For: 0.15.0 Attachments: HIVE-8988.01.patch, HIVE-8988.02.patch, HIVE-8988.patch CLEAR LIBRARY CACHE To close the gap between Hive and Calcite, we need to support the translation of GroupingSets into Calcite; currently this is not implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9127) Cache Map/Reduce works in RSC [Spark Branch]
Brock Noland created HIVE-9127: -- Summary: Cache Map/Reduce works in RSC [Spark Branch] Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Reporter: Brock Noland In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 for how this impacts performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9124) Performance of query 28 from tpc-ds
[ https://issues.apache.org/jira/browse/HIVE-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248688#comment-14248688 ] Brock Noland commented on HIVE-9124: I created HIVE-9127 for the split generation issue. We'll keep this JIRA open for further investigation once that issue is closed. Performance of query 28 from tpc-ds --- Key: HIVE-9124 URL: https://issues.apache.org/jira/browse/HIVE-9124 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Attachments: Screen Shot 2014-12-16 at 9.30.41 AM.png, query28-explain.txt As you can see from the attached screenshot, one stage was submitted at {{2014/12/16 12:06:30}} and took 6 minutes (ending around 12:12). However, the next stage was not submitted until {{2014/12/16 12:18:42}}. We should understand: * What is going on in the meantime * Why it is taking so long {noformat} select * from (select avg(ss_list_price) B1_LP ,count(ss_list_price) B1_CNT ,count(distinct ss_list_price) B1_CNTD from store_sales where ss_quantity between 0 and 5 and (ss_list_price between 11 and 11+10 or ss_coupon_amt between 460 and 460+1000 or ss_wholesale_cost between 14 and 14+20)) B1, (select avg(ss_list_price) B2_LP ,count(ss_list_price) B2_CNT ,count(distinct ss_list_price) B2_CNTD from store_sales where ss_quantity between 6 and 10 and (ss_list_price between 91 and 91+10 or ss_coupon_amt between 1430 and 1430+1000 or ss_wholesale_cost between 32 and 32+20)) B2, (select avg(ss_list_price) B3_LP ,count(ss_list_price) B3_CNT ,count(distinct ss_list_price) B3_CNTD from store_sales where ss_quantity between 11 and 15 and (ss_list_price between 66 and 66+10 or ss_coupon_amt between 920 and 920+1000 or ss_wholesale_cost between 4 and 4+20)) B3, (select avg(ss_list_price) B4_LP ,count(ss_list_price) B4_CNT ,count(distinct ss_list_price) B4_CNTD from store_sales where ss_quantity between 16 and 20 and (ss_list_price between 142 and 142+10 or ss_coupon_amt between 3054 and
3054+1000 or ss_wholesale_cost between 80 and 80+20)) B4, (select avg(ss_list_price) B5_LP ,count(ss_list_price) B5_CNT ,count(distinct ss_list_price) B5_CNTD from store_sales where ss_quantity between 21 and 25 and (ss_list_price between 135 and 135+10 or ss_coupon_amt between 14180 and 14180+1000 or ss_wholesale_cost between 38 and 38+20)) B5, (select avg(ss_list_price) B6_LP ,count(ss_list_price) B6_CNT ,count(distinct ss_list_price) B6_CNTD from store_sales where ss_quantity between 26 and 30 and (ss_list_price between 28 and 28+10 or ss_coupon_amt between 2513 and 2513+1000 or ss_wholesale_cost between 42 and 42+20)) B6 limit 100 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7431) When run on spark cluster, some spark tasks may fail
[ https://issues.apache.org/jira/browse/HIVE-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248691#comment-14248691 ] Brock Noland commented on HIVE-7431: FYI, I created a jira to re-introduce some caching of these objects in RSC in HIVE-9127. When run on spark cluster, some spark tasks may fail Key: HIVE-7431 URL: https://issues.apache.org/jira/browse/HIVE-7431 Project: Hive Issue Type: Bug Reporter: Rui Li Assignee: Rui Li Fix For: spark-branch Attachments: HIVE-7431.1.patch, HIVE-7431.2.patch When running queries on spark, some spark tasks fail (usually the first couple of tasks) with the following stack trace: {quote} org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154) org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:60) org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:35) org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161) org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161) org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559) org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559) org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158) ... {quote} Observed for spark standalone cluster. Not verified for spark on yarn or mesos. NO PRECOMMIT TESTS. This is for spark branch only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9127) Cache Map/Reduce works in RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9127: --- Description: In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 for how this impacts performance. (was: In HIVE-7431 we disabling caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance.) Cache Map/Reduce works in RSC [Spark Branch] Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 for how this impacts performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
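The caching proposed here can be sketched generically (invented names; this is not Hive's or the Remote Spark Context's actual API): deserialize each plan at most once and serve repeated split-generation requests from a cache.

```python
# Minimal sketch of the caching idea: avoid re-deserializing the same
# Map/Reduce work object for every split-generation request.
deserialize_calls = 0

def deserialize_plan(path):
    """Stand-in for the expensive plan deserialization."""
    global deserialize_calls
    deserialize_calls += 1
    return {"path": path, "work": "MapWork"}

_plan_cache = {}

def get_plan(path):
    # First request pays the deserialization cost; later ones hit the cache.
    if path not in _plan_cache:
        _plan_cache[path] = deserialize_plan(path)
    return _plan_cache[path]

a = get_plan("/tmp/plan-1")
b = get_plan("/tmp/plan-1")
print(deserialize_calls)
```

The point HIVE-7431 raised still applies: a cache like this is only safe if the cached objects are immutable (or invalidated) across tasks.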
Review Request 29111: HIVE-9041 - Generate better plan for queries containing both union and multi-insert [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29111/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-9041 https://issues.apache.org/jira/browse/HIVE-9041 Repository: hive-git Description --- This JIRA removes UnionWork from the Spark plan. UnionWork right now is just a dummy work - in execution, it is translated to IdentityTran, which does nothing. The actual union operation is implemented with rdd.union, which happens when a BaseWork has multiple parent BaseWorks. For instance:

MW_1    MW_2
    \    /
    RW_1

In this case, MW_1 and MW_2 translate to RDD_1 and RDD_2, and then we create another RDD_3 which is the result of rdd.union(RDD_1, RDD_2). We then create RDD_4 for RW_1, whose parent is RDD_3. *Changes on GenSparkWork* To remove the UnionWork, most changes are in GenSparkWork. I got rid of a chunk of code that creates UnionWork and links the work with its parent works. But I still kept `currentUnionOperators` and `workWithUnionOperators`, since they are needed for removing union operators later. I also changed how `followingWork` is handled. This matters when we have the following operator tree:

TS_0    TS_1
    \    /
   UNION_2
      |
     RS_3
      |
     FS_4

(I ignored quite a few operators here; they are not required to illustrate the problem.) In this plan, we will reach `RS_3` via two different paths: `TS_0` and `TS_1`. The first time we get to `RS_3`, say via `TS_0`, we break `RS_3` from its child and create a work for the path `TS_0 - UNION_2 - RS_3`. Let's say the work is `MW_1`. We then proceed to `FS_4`, create another ReduceWork `RW_2` for it, and link `RW_2` with `MW_1`. We will then visit `RS_3` a second time, from `TS_1`, and create another work for the path `TS_1 - UNION_2 - RS_3`, say `MW_3`. But the problem is that `RS_3` is already disconnected from `FS_4`. In order to link `MW_3` with `RW_2`, we need to save that information somewhere. This is why we need `leafOpToChildWorkInfo`.
It is renamed from `leafOpToFollowingWork`. I found that we also need to save the edge property between `RS_3` and its child in order to connect them. I also encountered a case where two BaseWorks may be connected twice; I've explained that in the comments in the source code. *Changes on SparkPlanGenerator* Without UnionWork, SparkPlanGenerator can be a bit cleaner. The changes on this class are mostly refactoring. I got rid of some redundant code in the `generate(SparkWork)` method, and combined `generate(MapWork)` and `generate(ReduceWork)` into one. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/IdentityTran.java eb758e09888d7864acc9d88c7186ae2de48bc8f7 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 438efabb062112da8fefc1bed9d8bd90ade26c67 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java 78cbc6d2eebef5b8edc10fe693a1b580a6ee389c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java ad6b09be83a33c0cd97ab9c3bc7d02adb928f1f3 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 654ba333969cacaafddec38c3c3f45ccb4b81d4a ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 137df65d2bb2de20bca06e47b9e1386ddf511c68 ql/src/test/results/clientpositive/spark/auto_join27.q.out fb48351bea5df3a19c14c755eb3a3fbb7f503e61 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_10.q.out 8472df914b8f5bdcb7974fc6689313d33975a4ad ql/src/test/results/clientpositive/spark/column_access_stats.q.out 72b2bd7e9b48033ca9cb1bd96facad42f12b6450 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 1757d16a736741f90c5d84b7a0cc0c168cb7d3ad ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 04f481d4a304fdc8a86e8cfd305899084bab2e8d ql/src/test/results/clientpositive/spark/join34.q.out 9a58a228002a2b704541dfed1c713b3880e71f35 ql/src/test/results/clientpositive/spark/join35.q.out 851a98128dca74f0008c20faf717d3cc974150e0
ql/src/test/results/clientpositive/spark/load_dyn_part13.q.out 92693e69a08d1ab2ea0c019f2b7f0634316d1eaf ql/src/test/results/clientpositive/spark/load_dyn_part14.q.out 060745dcc80d69b5d17101c2641c228b949c2fb8 ql/src/test/results/clientpositive/spark/multi_insert.q.out 0a38beab815fb50fbb991d6228f48bb02b009998 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out 639f4bd729587ce21b509bd8e3595107c0cf71bc ql/src/test/results/clientpositive/spark/multi_join_union.q.out d8dc110c3562e5c1e925553df86fba8ceda55b4a
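The multi-parent translation the review describes can be modeled with plain lists standing in for RDDs (a toy sketch, not SparkPlanGenerator code; on Spark, rdd.union behaves like concatenation of the inputs' partitions):

```python
# Toy model of the implicit union: when a work has multiple parent works,
# its input "RDD" is the union of the parents' outputs.
def union(*rdds):
    out = []
    for rdd in rdds:
        out.extend(rdd)  # rdd.union does not deduplicate (UNION ALL semantics)
    return out

rdd_1 = [("a", 1), ("b", 2)]  # output of MW_1
rdd_2 = [("a", 3)]            # output of MW_2
rdd_3 = union(rdd_1, rdd_2)   # RW_1's single parent, replacing UnionWork
print(rdd_3)
```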
Re: Review Request 29111: HIVE-9041 - Generate better plan for queries containing both union and multi-insert [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29111/ --- (Updated Dec. 16, 2014, 7:02 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9041 https://issues.apache.org/jira/browse/HIVE-9041 Repository: hive-git Description --- This JIRA removes UnionWork from the Spark plan. UnionWork right now is just a dummy work - in execution, it is translated to IdentityTran, which does nothing. The actual union operation is implemented with rdd.union, which happens when a BaseWork has multiple parent BaseWorks. For instance:

MW_1    MW_2
    \    /
    RW_1

In this case, MW_1 and MW_2 translate to RDD_1 and RDD_2, and then we create another RDD_3 which is the result of rdd.union(RDD_1, RDD_2). We then create RDD_4 for RW_1, whose parent is RDD_3. *Changes on GenSparkWork* To remove the UnionWork, most changes are in GenSparkWork. I got rid of a chunk of code that creates UnionWork and links the work with its parent works. But I still kept `currentUnionOperators` and `workWithUnionOperators`, since they are needed for removing union operators later. I also changed how `followingWork` is handled. This matters when we have the following operator tree:

TS_0    TS_1
    \    /
   UNION_2
      |
     RS_3
      |
     FS_4

(I ignored quite a few operators here; they are not required to illustrate the problem.) In this plan, we will reach `RS_3` via two different paths: `TS_0` and `TS_1`. The first time we get to `RS_3`, say via `TS_0`, we break `RS_3` from its child and create a work for the path `TS_0 - UNION_2 - RS_3`. Let's say the work is `MW_1`. We then proceed to `FS_4`, create another ReduceWork `RW_2` for it, and link `RW_2` with `MW_1`. We will then visit `RS_3` a second time, from `TS_1`, and create another work for the path `TS_1 - UNION_2 - RS_3`, say `MW_3`. But the problem is that `RS_3` is already disconnected from `FS_4`. In order to link `MW_3` with `RW_2`, we need to save that information somewhere.
This is why we need `leafOpToChildWorkInfo`. It is renamed from `leafOpToFollowingWork`. I found that we also need to save the edge property between `RS_3` and its child in order to connect them. I also encountered a case where two BaseWorks may be connected twice; I've explained that in the comments in the source code. *Changes on SparkPlanGenerator* Without UnionWork, SparkPlanGenerator can be a bit cleaner. The changes on this class are mostly refactoring. I got rid of some redundant code in the `generate(SparkWork)` method, and combined `generate(MapWork)` and `generate(ReduceWork)` into one. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/IdentityTran.java eb758e09888d7864acc9d88c7186ae2de48bc8f7 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 438efabb062112da8fefc1bed9d8bd90ade26c67 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java 78cbc6d2eebef5b8edc10fe693a1b580a6ee389c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java ad6b09be83a33c0cd97ab9c3bc7d02adb928f1f3 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 654ba333969cacaafddec38c3c3f45ccb4b81d4a ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 137df65d2bb2de20bca06e47b9e1386ddf511c68 ql/src/test/results/clientpositive/spark/auto_join27.q.out fb48351bea5df3a19c14c755eb3a3fbb7f503e61 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_10.q.out 8472df914b8f5bdcb7974fc6689313d33975a4ad ql/src/test/results/clientpositive/spark/column_access_stats.q.out 72b2bd7e9b48033ca9cb1bd96facad42f12b6450 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 1757d16a736741f90c5d84b7a0cc0c168cb7d3ad ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 04f481d4a304fdc8a86e8cfd305899084bab2e8d ql/src/test/results/clientpositive/spark/join34.q.out 9a58a228002a2b704541dfed1c713b3880e71f35 ql/src/test/results/clientpositive/spark/join35.q.out
851a98128dca74f0008c20faf717d3cc974150e0 ql/src/test/results/clientpositive/spark/load_dyn_part13.q.out 92693e69a08d1ab2ea0c019f2b7f0634316d1eaf ql/src/test/results/clientpositive/spark/load_dyn_part14.q.out 060745dcc80d69b5d17101c2641c228b949c2fb8 ql/src/test/results/clientpositive/spark/multi_insert.q.out 0a38beab815fb50fbb991d6228f48bb02b009998 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out 639f4bd729587ce21b509bd8e3595107c0cf71bc ql/src/test/results/clientpositive/spark/multi_join_union.q.out d8dc110c3562e5c1e925553df86fba8ceda55b4a
[jira] [Updated] (HIVE-8988) Support advanced aggregation in Hive to Calcite path
[ https://issues.apache.org/jira/browse/HIVE-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8988: -- Attachment: HIVE-8988.03.patch .03 is the new patch with CBO enabled. Support advanced aggregation in Hive to Calcite path - Key: HIVE-8988 URL: https://issues.apache.org/jira/browse/HIVE-8988 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Labels: grouping, logical, optiq Fix For: 0.15.0 Attachments: HIVE-8988.01.patch, HIVE-8988.02.patch, HIVE-8988.03.patch, HIVE-8988.patch CLEAR LIBRARY CACHE To close the gap between Hive and Calcite, we need to support the translation of GroupingSets into Calcite; currently this is not implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9111) Potential NPE in OrcStruct for list and map types
[ https://issues.apache.org/jira/browse/HIVE-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-9111: Resolution: Fixed Fix Version/s: 0.14.1 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk and branch-0.14.1 Potential NPE in OrcStruct for list and map types - Key: HIVE-9111 URL: https://issues.apache.org/jira/browse/HIVE-9111 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.14.0, 0.15.0, 0.14.1 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Labels: orcfile Fix For: 0.15.0, 0.14.1 Attachments: HIVE-9111.1.patch Currently, the getters in the OrcStruct class for list and map object inspectors do not have null checks, which may throw an NPE when UDFs like size() are used on a list or map column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
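The failure mode and the null-check fix pattern can be sketched outside Java (a toy Python model, not the actual ObjectInspector code; Hive's size() conventionally returns -1 for a NULL list or map):

```python
# Toy model of the bug: a size()-style UDF calls a getter that assumes
# the underlying list is non-null. NULL list/map columns are legal in ORC,
# so the getters must tolerate them.
def get_list_length_unsafe(col):
    return len(col)  # len(None) raises TypeError - the Python analogue of the NPE

def get_list_length_safe(col):
    # Hive's size() returns -1 for a NULL argument instead of failing.
    return -1 if col is None else len(col)

print(get_list_length_safe(None))
print(get_list_length_safe([1, 2]))
```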
[jira] [Updated] (HIVE-9041) Generate better plan for queries containing both union and multi-insert [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-9041: --- Attachment: HIVE-9041.2-spark.patch Minor changes on patch v1 (variable names, comments, etc.). Also included optimize_nullscan.q.out, which is NOT in the patch for RB. Generate better plan for queries containing both union and multi-insert [Spark Branch] -- Key: HIVE-9041 URL: https://issues.apache.org/jira/browse/HIVE-9041 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Attachments: HIVE-9041.1-spark.patch, HIVE-9041.2-spark.patch This is a follow-up for HIVE-8920. For queries like: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently we generate the following plan: {noformat}
M1    M2
  \  /  \
   U3    R5
   |
   R4
{noformat} It's better, however, to have the following plan: {noformat}
M1  M2
|\  /|
| \/ |
| /\ |
R4  R5
{noformat} Also, we can do some research in this JIRA to see if it's possible to remove UnionWork once and for all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
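The desired rewrite, dropping the union node and wiring each map work directly to every downstream reduce work, can be sketched as a small graph transformation (toy adjacency map with invented helper names, not Hive's SparkWork API):

```python
# Sketch of removing a union node from a plan DAG: connect each of the
# union's parents directly to each of its children.
def remove_union(edges, union_node):
    """edges: dict mapping a node to the set of its child nodes."""
    parents = {n for n, kids in edges.items() if union_node in kids}
    children = edges.pop(union_node, set())
    for p in parents:
        edges[p] = (edges[p] - {union_node}) | set(children)
    return edges

plan = {"M1": {"U3"}, "M2": {"U3"}, "U3": {"R4", "R5"}}
rewritten = remove_union(plan, "U3")
print(rewritten)  # M1 and M2 now feed R4 and R5 directly
```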
[jira] [Commented] (HIVE-9114) union all query in cbo test has undefined ordering
[ https://issues.apache.org/jira/browse/HIVE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248744#comment-14248744 ] Vikram Dixit K commented on HIVE-9114: -- +1 for 0.14 union all query in cbo test has undefined ordering -- Key: HIVE-9114 URL: https://issues.apache.org/jira/browse/HIVE-9114 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.15.0, 0.14.1 Attachments: HIVE-9114-branch-0.14.patch, HIVE-9114.patch Ordering changes based on platform. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9122) Need to remove additional references to hive-shims-common-secure, hive-shims-0.20
[ https://issues.apache.org/jira/browse/HIVE-9122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248752#comment-14248752 ] Ashutosh Chauhan commented on HIVE-9122: Changes in jdbc/pom.xml may not be required. See HIVE-8270. So, for jdbc just change shims-common-secure to shims-common so that we have all the requisite classes. Need to remove additional references to hive-shims-common-secure, hive-shims-0.20 - Key: HIVE-9122 URL: https://issues.apache.org/jira/browse/HIVE-9122 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-9122.1.patch HIVE-8828/HIVE-8979 removed hive-shims-0.20 and hive-shims-common-secure, but we still have a few references to those removed packages: {noformat}
$ find . -name pom.xml -exec egrep 'hive-shims-common-secure|hive-shims-0.20[^S]' {} \; -print
<artifact>org.apache.hive.shims:hive-shims-common-secure</artifact>
./jdbc/pom.xml
<include>org.apache.hive.shims:hive-shims-0.20</include>
<include>org.apache.hive.shims:hive-shims-common-secure</include>
./ql/pom.xml
<artifactId>hive-shims-common-secure</artifactId>
./shims/0.20S/pom.xml
<artifactId>hive-shims-common-secure</artifactId>
./shims/0.23/pom.xml
<artifactId>hive-shims-common-secure</artifactId>
./shims/scheduler/pom.xml
{noformat} While building from trunk, you can see that maven is still pulling an old snapshot of hive-shims-common-secure.jar from repository.apache.org: {noformat} [INFO] [INFO] Building Hive Shims 0.20S 0.15.0-SNAPSHOT [INFO] Downloading: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/maven-metadata.xml Downloaded: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/maven-metadata.xml (801 B at 7.2 KB/sec) Downloading: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/hive-shims-common-secure-0.15.0-20141127.164914-20.pom
Downloaded: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/hive-shims-common-secure-0.15.0-20141127.164914-20.pom (4 KB at 47.0 KB/sec) Downloading: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/hive-shims-common-secure-0.15.0-20141127.164914-20.jar Downloaded: http://repository.apache.org/snapshots/org/apache/hive/shims/hive-shims-common-secure/0.15.0-SNAPSHOT/hive-shims-common-secure-0.15.0-20141127.164914-20.jar (33 KB at 277.0 KB/sec) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
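The find/egrep check quoted in the description can equally be expressed as a small script. A Python sketch of the same stale-reference scan (synthetic pom contents, not the real files), including the `[^S]` guard that keeps the still-valid hive-shims-0.20S out of the matches:

```python
import re

# Same pattern as the egrep above, with a lookahead playing the role of [^S].
STALE = re.compile(r"hive-shims-common-secure|hive-shims-0\.20(?!S)")

# Synthetic stand-ins for pom.xml contents.
poms = {
    "./jdbc/pom.xml": "<artifact>org.apache.hive.shims:hive-shims-common-secure</artifact>",
    "./beeline/pom.xml": "<artifactId>hive-shims-0.20S</artifactId>",  # 0.20S is fine
}

flagged = sorted(p for p, text in poms.items() if STALE.search(text))
print(flagged)
```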
[jira] [Created] (HIVE-9128) Evaluate hive.rpc.query.plan performance [Spark Branch]
Brock Noland created HIVE-9128: -- Summary: Evaluate hive.rpc.query.plan performance [Spark Branch] Key: HIVE-9128 URL: https://issues.apache.org/jira/browse/HIVE-9128 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Tez uses [hive.rpc.query.plan|https://github.com/apache/hive/blob/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1874] which is used in {{Utilities.java}}. Basically instead of writing the query plan to HDFS, the query plan is placed in the JobConf object and then de-serialized form there. We should do some evaluation to see which is more performant for us. We might need to place some timings in {{Utilities}} to understand this if the PerfLog doesn't have enough information today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
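The trade-off under evaluation, shipping the serialized plan inline in the job configuration versus writing it to a file that every task reads back, can be sketched as follows (toy JSON serialization and a local temp file standing in for Hive's actual plan serialization and HDFS):

```python
import base64
import json
import os
import tempfile

plan = {"work": "MapWork", "aliases": ["store_sales"]}

# Option 1: inline in the conf (the idea behind hive.rpc.query.plan).
conf = {"hive.exec.plan": base64.b64encode(json.dumps(plan).encode()).decode()}
inline = json.loads(base64.b64decode(conf["hive.exec.plan"]))

# Option 2: via a scratch file (a temp file stands in for the HDFS path).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    json.dump(plan, f)
with open(path) as f:
    from_file = json.load(f)
os.remove(path)

print(inline == plan == from_file)
```

Either route round-trips the plan; the evaluation the JIRA asks for is about latency (extra filesystem round trips) versus conf size limits.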
[jira] [Updated] (HIVE-9114) union all query in cbo test has undefined ordering
[ https://issues.apache.org/jira/browse/HIVE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9114: --- Resolution: Fixed Status: Resolved (was: Patch Available) committed to both union all query in cbo test has undefined ordering -- Key: HIVE-9114 URL: https://issues.apache.org/jira/browse/HIVE-9114 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.15.0, 0.14.1 Attachments: HIVE-9114-branch-0.14.patch, HIVE-9114.patch Ordering changes based on platform. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9113) Explain on query failed with NPE
[ https://issues.apache.org/jira/browse/HIVE-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248770#comment-14248770 ] Chao commented on HIVE-9113: [~Prabhu Joseph] Thanks! You're right, I forgot the from lineitem in the subquery. But in any case, I was expecting a more helpful error message, perhaps like Parsing Error: missing the FROM keyword at line 4 ..., or something similar. Explain on query failed with NPE Key: HIVE-9113 URL: https://issues.apache.org/jira/browse/HIVE-9113 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Chao Run explain on the following query: {noformat} select p.p_partkey, li.l_suppkey from (select distinct l_partkey as p_partkey from lineitem) p join lineitem li on p.p_partkey = li.l_partkey where li.l_linenumber = 1 and li.l_orderkey in (select l_orderkey where l_linenumber = li.l_linenumber) ; {noformat} gave me NPE: {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.parse.QBSubQuery.validateAndRewriteAST(QBSubQuery.java:516) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:2605) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8866) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9745) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9638) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10125) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at 
org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:720) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:639) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:578) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {noformat} Is this query invalid? If so, Hive should at least give some explanation rather than a plain NPE that leaves the user clueless. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
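Per the comment above, the NPE goes away once the inner select has a FROM clause. A corrected version of the query (assuming the inner select was meant to read from {{lineitem}}, as the reporter confirmed) would be:

{code}
select p.p_partkey, li.l_suppkey
from (select distinct l_partkey as p_partkey from lineitem) p
join lineitem li on p.p_partkey = li.l_partkey
where li.l_linenumber = 1
and li.l_orderkey in (select l_orderkey from lineitem
                      where l_linenumber = li.l_linenumber);
{code}

The remaining issue is purely about diagnostics: the semantic analyzer should reject the FROM-less correlated subquery with a parse/semantic error instead of dereferencing a null AST node in {{QBSubQuery.validateAndRewriteAST}}.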
[jira] [Assigned] (HIVE-9127) Cache Map/Reduce works in RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland reassigned HIVE-9127: -- Assignee: Brock Noland Cache Map/Reduce works in RSC [Spark Branch] Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Brock Noland In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 for how this impacts performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
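The caching idea can be sketched generically: keep the expensive-to-deserialize work objects in a map keyed by their plan path, so repeated split-generation calls inside the same Remote Spark Context reuse them instead of re-reading the plan. This is only an illustrative sketch; the names {{WorkCache}} and {{loadWork}} are hypothetical, not Hive's actual API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: memoize deserialized work objects per plan path so
// split generation pays the deserialization cost at most once per plan.
public class WorkCache {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();
    public final AtomicInteger loads = new AtomicInteger();

    // Stands in for the expensive step: deserializing a MapWork/ReduceWork
    // plan from its serialized form on the filesystem.
    private Object loadWork(String planPath) {
        loads.incrementAndGet();
        return "work:" + planPath;
    }

    // computeIfAbsent guarantees each plan path is loaded at most once,
    // even with concurrent getSplits() callers racing on the same key.
    public Object get(String planPath) {
        return cache.computeIfAbsent(planPath, this::loadWork);
    }
}
```

With a cache like this, the repeated {{CombineHiveInputFormat.getSplits}} calls visible in the stack trace below would only trigger the load on the first call for a given plan.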
[jira] [Updated] (HIVE-9127) Cache Map/Reduce works in RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9127: --- Description: In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 for how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 
2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at 
org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:301) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.getParentStages(DAGScheduler.scala:313) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.newStage(DAGScheduler.scala:247) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:735) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at
[jira] [Updated] (HIVE-9127) Cache Map/Reduce works in RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9127: --- Attachment: HIVE-9127.1-spark.patch.txt Cache Map/Reduce works in RSC [Spark Branch] Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 for how this impacts performance.
[jira] [Updated] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance in RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9127: --- Summary: Improve CombineHiveInputFormat.getSplit performance in RSC [Spark Branch] (was: Improve getSplit performance in RSC [Spark Branch]) Improve CombineHiveInputFormat.getSplit performance in RSC [Spark Branch] - Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 for how this impacts performance.
[jira] [Updated] (HIVE-9127) Improve getSplit performance in RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9127: --- Summary: Improve getSplit performance in RSC [Spark Branch] (was: Cache Map/Reduce works in RSC [Spark Branch]) Improve getSplit performance in RSC [Spark Branch] -- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9127.1-spark.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 for how this impacts performance.