[jira] [Commented] (HIVE-7119) Extended ACL's should be inherited if warehouse perm inheritance enabled
[ https://issues.apache.org/jira/browse/HIVE-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251351#comment-14251351 ] Lefty Leverenz commented on HIVE-7119: -- bq. This is already doc'ed ... So I deleted the release note, which was Document this addition. Extended ACL's should be inherited if warehouse perm inheritance enabled Key: HIVE-7119 URL: https://issues.apache.org/jira/browse/HIVE-7119 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Szehon Ho Fix For: 0.14.0 Attachments: HIVE-7119.2.patch, HIVE-7119.3.patch, HIVE-7119.4.patch, HIVE-7119.patch HDFS recently added support for extended ACLs, i.e., permissions for specific users/groups in addition to the general owner/group/other permissions. Hive permission inheritance should inherit those as well, if the user has set them at any point in the warehouse directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
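The inheritance idea behind this issue can be sketched outside HDFS with plain java.nio.file POSIX permissions. This is a minimal illustration under stated assumptions, not Hive's actual code: Hive applies the same copy-from-parent step to HDFS paths (and, with this patch, to extended ACL entries); `PermInheritance` and `createInheriting` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class PermInheritance {
    // Create a child directory and copy the parent's permission bits onto it,
    // mirroring what hive.warehouse.subdir.inherit.perms does for warehouse subdirs.
    public static Set<PosixFilePermission> createInheriting(Path parent, String name)
            throws IOException {
        Set<PosixFilePermission> parentPerms = Files.getPosixFilePermissions(parent);
        Path child = Files.createDirectory(parent.resolve(name));
        // Without this step the child's permissions come from the process umask,
        // not from the parent -- that is the gap permission inheritance closes.
        Files.setPosixFilePermissions(child, parentPerms);
        return Files.getPosixFilePermissions(child);
    }

    public static void main(String[] args) throws IOException {
        Path parent = Files.createTempDirectory("warehouse");
        Files.setPosixFilePermissions(parent,
                PosixFilePermissions.fromString("rwxr-x---"));
        System.out.println(createInheriting(parent, "tbl"));
    }
}
```

Extended ACL entries follow the same pattern on HDFS: read the parent's ACL and re-apply it to the new child, rather than letting the default take effect.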
[jira] [Commented] (HIVE-9141) HiveOnTez: mix of union all, distinct, group by generates error
[ https://issues.apache.org/jira/browse/HIVE-9141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251356#comment-14251356 ] Vikram Dixit K commented on HIVE-9141: -- [~navis] Nice patch. It simplifies the code as well. However, it looks like with this change, followingWork can never be a union work because of the moved code. Do you see any way followingWork can be a union work? If not, I think we can remove that piece of code and the getFollowingWorkIndex method as well. Thoughts? HiveOnTez: mix of union all, distinct, group by generates error --- Key: HIVE-9141 URL: https://issues.apache.org/jira/browse/HIVE-9141 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Navis Attachments: HIVE-9141.1.patch.txt Here is the way to reproduce it in the Hive q-test setting (with the src table):
{code}
set hive.execution.engine=tez;
SELECT key, value FROM (
  SELECT key, value FROM src
  UNION ALL
  SELECT key, key as value FROM (
    SELECT distinct key FROM (
      SELECT key, value FROM (
        SELECT key, value FROM src
        UNION ALL
        SELECT key, value FROM src
      ) t1 group by key, value
    ) t2
  ) t3
) t4 group by key, value;
{code}
will generate
{noformat}
2014-12-16 23:19:13,593 ERROR ql.Driver (SessionState.java:printError(834)) - FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork
java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork
        at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:361)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
        at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87)
        at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
        at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
        at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
        at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
        at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69)
        at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368)
        at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:419)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1107)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1155)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1034)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:206)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:158)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:369)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:304)
        at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:834)
        at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:136)
        at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_uniontez2(TestMiniTezCliDriver.java:120)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7081) HiveServer/HiveServer2 leaks jdbc connections when network interrupt
[ https://issues.apache.org/jira/browse/HIVE-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251358#comment-14251358 ] Thejas M Nair commented on HIVE-7081: - I don't think HIVE-5799 will help to free up this thread. HIVE-6679 (or the upgrade to Thrift 0.9.2) would be needed. HiveServer/HiveServer2 leaks jdbc connections when network interrupt - Key: HIVE-7081 URL: https://issues.apache.org/jira/browse/HIVE-7081 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.12.0, 0.13.0 Environment: hadoop 1.2.1 hive 0.12.0 / hive 0.13.0 linux 2.6.32 Reporter: Wang Zhiqiang Labels: ConnectoinLeak, HiveServer2, JDBC HiveServer/HiveServer2 leaks JDBC connections when the network between client and server is interrupted. I tested with both DBVisualizer and hand-written JDBC code; when the network between the client and HiveServer/HiveServer2 is interrupted, the TCP connection on the server side stays in the ESTABLISHED state until the server is stopped. Using jstack to dump the server's threads, I found the thread is blocked in socketRead0():
{quote}
pool-1-thread-13 prio=10 tid=0x7fd00c0c6800 nid=0x5d21 runnable [0x7fd00018]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:152)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked 0xebc24f28 (a java.io.BufferedInputStream)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
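The hung socketRead0() frame above is the classic symptom of a blocking read with no read timeout. A minimal, self-contained sketch (plain java.net rather than Thrift; a server-side transport would need its timeout set analogously) showing how SO_TIMEOUT turns an eternal block into a catchable SocketTimeoutException:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    // Try to read one byte from a peer that never writes. With soTimeoutMillis = 0
    // the read blocks forever (the leaked-connection case); with a positive value
    // it fails fast with SocketTimeoutException and the worker thread can exit.
    public static boolean readTimesOut(int soTimeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
            client.setSoTimeout(soTimeoutMillis);
            try {
                client.getInputStream().read(); // blocks: nobody ever writes
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readTimesOut(200)); // prints true after ~200 ms
    }
}
```

With no timeout configured, the only thing that unblocks such a read is the connection being closed, which never happens after a silent network interruption; that matches the ESTABLISHED-forever behavior reported here.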
[jira] [Commented] (HIVE-6892) Permission inheritance issues
[ https://issues.apache.org/jira/browse/HIVE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251359#comment-14251359 ] Lefty Leverenz commented on HIVE-6892: -- Can we remove the TODOC14 label now? Also, should any other docs have links to Permission Inheritance in Hive? For example, Authorization or Storage Based Authorization: * [Authorization | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization] * [Storage Based Authorization | https://cwiki.apache.org/confluence/display/Hive/Storage+Based+Authorization+in+the+Metastore+Server] Permission inheritance issues - Key: HIVE-6892 URL: https://issues.apache.org/jira/browse/HIVE-6892 Project: Hive Issue Type: Bug Components: Security Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC14 *HDFS Background* * When a file or directory is created, its owner is the user identity of the client process, and its group is inherited from the parent (the BSD rule). Permissions are taken from the default umask. Extended ACLs are taken from the parent unless they are set explicitly. *Goals* To reduce the need to set fine-grained file security properties after every operation, users may want the following Hive warehouse files/dirs to auto-inherit security properties from their parent directories: * Directories created by a new database/table/partition/bucket * Files added to tables via load/insert * Table directories exported/imported (open question: whether an exported table inheriting perms from its new parent needs another flag) What may be inherited: * Basic file permissions * Groups (already done by HDFS for new directories) * Extended ACLs (already done by HDFS for new directories) *Behavior* * When the hive.warehouse.subdir.inherit.perms flag is enabled in Hive, Hive will try to do all of the above inheritances. In the future, we can add more flags for finer-grained control. * A failure by Hive to inherit will not cause the operation to fail. 
The rule of thumb for when security-property inheritance will happen is the following: ** To run chmod, a user must be the owner of the file, or else a super-user. ** To run chgrp, a user must be the owner of the file, or else a super-user. ** Hence, the user that Hive runs as (either 'hive' or the logged-in user in the case of impersonation) must be a super-user or the owner of the file whose security properties are going to be changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8920) SplitSparkWorkResolver doesn't work with UnionWork [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251381#comment-14251381 ] Hive QA commented on HIVE-8920: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687976/HIVE-8920.1-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7237 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/571/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/571/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-571/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12687976 - PreCommit-HIVE-SPARK-Build SplitSparkWorkResolver doesn't work with UnionWork [Spark Branch] - Key: HIVE-8920 URL: https://issues.apache.org/jira/browse/HIVE-8920 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Xuefu Zhang Attachments: HIVE-8920.1-spark.patch The following query will not work:
{code}
from (select * from table0 union all select * from table1) s
insert overwrite table table3 select s.x, count(1) group by s.x
insert overwrite table table4 select s.y, count(1) group by s.y;
{code}
Currently, the plan for this query, before SplitSparkWorkResolver, looks like below:
{noformat}
M1  M2
 \  / \
  U3   R5
  |
  R4
{noformat}
In {{SplitSparkWorkResolver#splitBaseWork}}, it assumes that the {{childWork}} is a ReduceWork, but for this case, you can see that for M2 the childWork could be UnionWork U3. Thus, the code will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
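The fix implied by the description amounts to checking the runtime type before casting. A toy sketch of that guard, where BaseWork/ReduceWork/UnionWork are simplified stand-ins for Hive's plan classes, not the real ones:

```java
public class WorkDispatch {
    // Simplified stand-ins for Hive's plan work classes.
    abstract static class BaseWork {}
    static class ReduceWork extends BaseWork {}
    static class UnionWork extends BaseWork {}

    // Instead of blindly casting childWork to ReduceWork (the source of the
    // ClassCastException), branch on the actual type and handle each case.
    public static String classify(BaseWork childWork) {
        if (childWork instanceof ReduceWork) {
            return "reduce";
        } else if (childWork instanceof UnionWork) {
            return "union";
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify(new UnionWork()));  // prints union
        System.out.println(classify(new ReduceWork())); // prints reduce
    }
}
```

In the multi-insert plan above, M2 has both U3 (union) and R5 (reduce) as children, so any code path that assumes "child of a map work is a reduce work" needs exactly this kind of branch.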
[jira] [Commented] (HIVE-9148) Fix default value for HWI_WAR_FILE
[ https://issues.apache.org/jira/browse/HIVE-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251385#comment-14251385 ] Lefty Leverenz commented on HIVE-9148: -- Does this mean the description of configuration parameter *hive.hwi.war.file* is wrong in HiveConf.java and the wiki?
{code}
HIVEHWIWARFILE("hive.hwi.war.file", "${env:HWI_WAR_FILE}",
    "This sets the path to the HWI war file, relative to ${HIVE_HOME}. "),
{code}
* [Configuration Properties -- hive.hwi.war.file | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.hwi.war.file] Fix default value for HWI_WAR_FILE -- Key: HIVE-9148 URL: https://issues.apache.org/jira/browse/HIVE-9148 Project: Hive Issue Type: Bug Components: Web UI Affects Versions: 0.14.0, 0.13.1 Reporter: Peter Slawski Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9148.1.patch The path to the hwi war file should be relative to hive home. However, HWI_WAR_FILE is set in hwi.sh to be an absolute path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9004) Reset doesn't work for the default empty value entry
[ https://issues.apache.org/jira/browse/HIVE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251401#comment-14251401 ] Hive QA commented on HIVE-9004: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687952/HIVE-9004.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2123/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2123/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2123/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687952 - PreCommit-HIVE-TRUNK-Build Reset doesn't work for the default empty value entry Key: HIVE-9004 URL: https://issues.apache.org/jira/browse/HIVE-9004 Project: Hive Issue Type: Bug Components: Configuration Reporter: Cheng Hao Assignee: Cheng Hao Fix For: spark-branch, 0.15.0, 0.14.1 Attachments: HIVE-9004.patch To illustrate, in the Hive CLI:
{noformat}
hive> set hive.table.parameters.default;
hive.table.parameters.default is undefined
hive> set hive.table.parameters.default=key1=value1;
hive> reset;
hive> set hive.table.parameters.default;
hive.table.parameters.default=key1=value1
{noformat}
I think we expect the last output to be {{hive.table.parameters.default is undefined}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
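The expected reset semantics can be sketched as an overlay-over-defaults map. This is a simplified model, not HiveConf itself (`ResettableConf` is a hypothetical class): the point is that reset must discard every session override, including ones whose key has no default value at all.

```java
import java.util.HashMap;
import java.util.Map;

public class ResettableConf {
    private final Map<String, String> defaults = new HashMap<>();
    private final Map<String, String> overrides = new HashMap<>();

    public ResettableConf(Map<String, String> defaults) {
        this.defaults.putAll(defaults);
    }

    public void set(String key, String value) {
        overrides.put(key, value);
    }

    // Session value wins; otherwise fall back to the default, which may be
    // null, i.e. "undefined", as for hive.table.parameters.default.
    public String get(String key) {
        return overrides.containsKey(key) ? overrides.get(key) : defaults.get(key);
    }

    // The behavior the bug report expects: drop *all* overrides, not just
    // those whose key happens to have a non-empty default to restore.
    public void reset() {
        overrides.clear();
    }

    public static void main(String[] args) {
        ResettableConf conf = new ResettableConf(new HashMap<>());
        conf.set("hive.table.parameters.default", "key1=value1");
        conf.reset();
        System.out.println(conf.get("hive.table.parameters.default")); // prints null
    }
}
```

A reset that only re-applies known defaults leaves keys with empty/undefined defaults stuck at their session values, which is exactly the transcript shown above.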
[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251407#comment-14251407 ] Rui Li commented on HIVE-9153: -- I used our cluster B to test this. Results show that CombineHiveInputFormat still performs much better than HiveInputFormat for Spark. The test query is {code}select count(*) from store_sales where ss_sold_date_sk is not null;{code} With CombineHiveInputFormat, Spark spawns 1252 mappers and the query finishes in about 180s, while HiveInputFormat requires 13559 mappers and the query finishes in about 700s. I didn't find out why Tez uses HiveInputFormat by default. But for Tez, HiveInputFormat spawns 332 mappers while CombineHiveInputFormat spawns 1252, so I think Tez has its own way of combining the splits. With 332 mappers Tez finishes the query in about 90s, and with 1252 mappers it takes about 120s. Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] - Key: HIVE-9153 URL: https://issues.apache.org/jira/browse/HIVE-9153 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Rui Li The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in Spark, it might make sense for us to use {{HiveInputFormat}} as well. We should evaluate this on a query which has many input splits, such as {{select count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
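The mapper-count gap above (1252 vs 13559) comes from packing many small files into each split. A greedy sketch of that idea under simplified assumptions (the real CombineHiveInputFormat also respects node/rack locality and per-node limits, which are omitted; `SplitCombiner` is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitCombiner {
    // Greedily pack consecutive file sizes into splits of at most maxSplitBytes.
    // One split == one map task, so fewer splits means fewer task launches.
    public static List<List<Long>> combine(long[] fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);           // close the full split
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] tenSmallFiles = new long[10];
        Arrays.fill(tenSmallFiles, 10 * mb);
        // Ten 10 MB files fit into a single 128 MB split: 1 mapper instead of 10.
        System.out.println(combine(tenSmallFiles, 128 * mb).size()); // prints 1
    }
}
```

Whether fewer, larger tasks win depends on per-task overhead, which is why the same table gives different best answers on Spark and Tez.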
[jira] [Updated] (HIVE-9116) Add unit test for multi sessions.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9116: Attachment: HIVE-9116.1-spark.patch Add unit test for multi sessions.[Spark Branch] --- Key: HIVE-9116 URL: https://issues.apache.org/jira/browse/HIVE-9116 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-9116.1-spark.patch HS2 multi sessions support is enabled in HoS, we should add some unit tests for verification and regression test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9116) Add unit test for multi sessions.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9116: Status: Patch Available (was: Open) Add unit test for multi sessions.[Spark Branch] --- Key: HIVE-9116 URL: https://issues.apache.org/jira/browse/HIVE-9116 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-9116.1-spark.patch HS2 multi sessions support is enabled in HoS, we should add some unit tests for verification and regression test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 29200: HIVE-9116 Add unit test for multi sessions on Spark.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29200/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-9116 https://issues.apache.org/jira/browse/HIVE-9116 Repository: hive-git Description --- Test HS2 multi-session support with multi-threaded JDBC connections. Diffs - itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestMultiSessionsHS2WithLocalClusterSpark.java PRE-CREATION Diff: https://reviews.apache.org/r/29200/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 29145: HIVE-9094 TimeoutException when trying get executor count from RSC [Spark Branch]
On Dec. 17, 2014, 7:06 p.m., Marcelo Vanzin wrote: +1 to Xuefu's comments. The config name also looks very generic, since it's only applied to a couple of jobs submitted to the client. But I don't have a good suggestion here. In getExecutorCount/getJobInfo/getStageInfo we use JobHandle.get() to wait for the result, so I use SPARK_CLIENT_FUTURE_TIMEOUT here, which means Hive would use this setting as the timeout value when calling JobHandle.get(); that seems more reasonable than the previous name. - chengxiang --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/#review65348 --- On Dec. 17, 2014, 6:28 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/ --- (Updated Dec. 17, 2014, 6:28 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9094 https://issues.apache.org/jira/browse/HIVE-9094 Repository: hive-git Description --- RemoteHiveSparkClient::getExecutorCount times out after 5s because the Spark cluster has not launched yet. 1. Make the timeout value configurable. 2. Set the default timeout value to 60s. 3. Enable the timeout for getting Spark job info and Spark stage info. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 22f052a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 5d6a02c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java e1946d5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 6217de4 Diff: https://reviews.apache.org/r/29145/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 29145: HIVE-9094 TimeoutException when trying get executor count from RSC [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/ --- (Updated Dec. 18, 2014, 9:40 a.m.) Review request for hive and Xuefu Zhang. Changes --- Update the patch and the setting name/description. Bugs: HIVE-9094 https://issues.apache.org/jira/browse/HIVE-9094 Repository: hive-git Description --- RemoteHiveSparkClient::getExecutorCount times out after 5s because the Spark cluster has not launched yet. 1. Make the timeout value configurable. 2. Set the default timeout value to 60s. 3. Enable the timeout for getting Spark job info and Spark stage info. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 22f052a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 5d6a02c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 256d0b0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 1d3a9d8 Diff: https://reviews.apache.org/r/29145/diff/ Testing --- Thanks, chengxiang li
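The JobHandle.get() timeout being discussed follows the standard java.util.concurrent Future pattern. A minimal sketch with CompletableFuture standing in for the remote job handle (not Hive's actual spark-client code; the timeout argument plays the role of the configurable SPARK_CLIENT_FUTURE_TIMEOUT bound):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FutureTimeoutDemo {
    // Wait at most timeoutMillis for the handle's result. Returns true when the
    // bound is hit -- the TimeoutException case the patch makes configurable
    // (and whose default it raises from 5s to 60s).
    public static boolean timesOut(CompletableFuture<Integer> handle, long timeoutMillis)
            throws ExecutionException, InterruptedException {
        try {
            handle.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // A "cluster" that never reports an executor count within the bound.
        CompletableFuture<Integer> pending = new CompletableFuture<>();
        System.out.println(timesOut(pending, 100)); // prints true after ~100 ms

        CompletableFuture<Integer> done = CompletableFuture.completedFuture(8);
        System.out.println(timesOut(done, 100)); // prints false
    }
}
```

Using the same bound for getExecutorCount, getJobInfo, and getStageInfo is what makes a single generic setting name defensible: it is the timeout on any result fetched through the handle.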
[jira] [Updated] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9094: Attachment: HIVE-9094.2-spark.patch Update the setting name and description. TimeoutException when trying get executor count from RSC [Spark Branch] --- Key: HIVE-9094 URL: https://issues.apache.org/jira/browse/HIVE-9094 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-9094.1-spark.patch, HIVE-9094.2-spark.patch In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because:
{code}
2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException
org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException
        at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
        at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
        at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134)
        at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
        at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297)
        at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837)
        at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234)
        at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at junit.framework.TestCase.runTest(TestCase.java:176)
        at junit.framework.TestCase.runBare(TestCase.java:141)
        at junit.framework.TestResult$1.protect(TestResult.java:122)
        at junit.framework.TestResult.runProtected(TestResult.java:142)
        at junit.framework.TestResult.run(TestResult.java:125)
        at junit.framework.TestCase.run(TestCase.java:129)
        at junit.framework.TestSuite.runTest(TestSuite.java:255)
        at junit.framework.TestSuite.run(TestSuite.java:250)
        at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
        at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
[jira] [Resolved] (HIVE-9126) Backport HIVE-8827 (Remove SSLv2Hello from list of disabled protocols) to 0.14 branch
[ https://issues.apache.org/jira/browse/HIVE-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta resolved HIVE-9126. Resolution: Duplicate As suggested by [~thejas], committed HIVE-8827 to branch 14 instead. Backport HIVE-8827 (Remove SSLv2Hello from list of disabled protocols) to 0.14 branch - Key: HIVE-9126 URL: https://issues.apache.org/jira/browse/HIVE-9126 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.1 Attachments: HIVE-9126.1.patch Check HIVE-8827. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8827) Remove SSLv2Hello from list of disabled protocols
[ https://issues.apache.org/jira/browse/HIVE-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8827: --- Fix Version/s: 0.14.1 Also committed to 14.1. Remove SSLv2Hello from list of disabled protocols - Key: HIVE-8827 URL: https://issues.apache.org/jira/browse/HIVE-8827 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.15.0, 0.14.1 Attachments: HIVE-8827.1.patch Turns out SSLv2Hello is not the same as SSLv2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9158) Multiple LDAP server URLs in hive.server2.authentication.ldap.url
[ https://issues.apache.org/jira/browse/HIVE-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251452#comment-14251452 ] Hive QA commented on HIVE-9158: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687960/HIVE-9158.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2124/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2124/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2124/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687960 - PreCommit-HIVE-TRUNK-Build Multiple LDAP server URLs in hive.server2.authentication.ldap.url - Key: HIVE-9158 URL: https://issues.apache.org/jira/browse/HIVE-9158 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Attachments: HIVE-9158.1.patch, LDAPClient.java Support for multiple LDAP servers for failover in the event that one stops responding or is down for maintenance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
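The failover behavior requested here is essentially "try each configured URL in order until one responds." A sketch with the connection probe abstracted out; a real implementation would attempt an LDAP bind per URL with a short timeout, and `LdapFailover`/`firstAvailable` are hypothetical names, not the patch's actual code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class LdapFailover {
    // Return the first URL the probe can reach; throw if every server is down.
    public static String firstAvailable(List<String> urls, Predicate<String> probe) {
        for (String url : urls) {
            if (probe.test(url)) {  // e.g. attempt an LDAP bind with a short timeout
                return url;
            }
        }
        throw new IllegalStateException("no LDAP server reachable: " + urls);
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("ldap://primary:389", "ldap://backup:389");
        // Probe stub: pretend only the backup server answers.
        System.out.println(firstAvailable(urls, u -> u.contains("backup")));
    }
}
```

Keeping the probe as a parameter makes the ordering logic trivially testable without a live directory server, while the production probe carries the actual bind attempt.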
[jira] [Commented] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order
[ https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251495#comment-14251495 ] Kamil Gorlo commented on HIVE-9146: --- I've tested it on HDP 2.2 with Hive 0.14 and in fact everything is working as expected. Thanks. Query with left joins produces wrong result when join condition is written in different order - Key: HIVE-9146 URL: https://issues.apache.org/jira/browse/HIVE-9146 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Kamil Gorlo I have two queries which should be equal (I only swap two join conditions) but they are not. They are the simplest queries I could produce to reproduce the bug. I have two simple tables:
desc kgorlo_comm;
| col_name | data_type | comment |
| id | bigint | |
| dest_id | bigint | |
desc kgorlo_log;
| col_name | data_type | comment |
| id | bigint | |
| dest_id | bigint | |
| tstamp | bigint | |
With data:
select * from kgorlo_comm;
| kgorlo_comm.id | kgorlo_comm.dest_id |
| 1 | 2 |
| 2 | 1 |
| 1 | 3 |
| 2 | 3 |
| 3 | 5 |
| 4 | 5 |
select * from kgorlo_log;
| kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 5 | 0 |
| 3 | 1 | 0 |
And when I run this query (query no. 1):
{quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; {quote}
I get this result (which is correct):
| log.id | log.dest_id | com1.msgs | com2.msgs |
| 1 | 2 | 1 | 1 |
| 1 | 3 | 1 | NULL |
| 1 | 5 | NULL | NULL |
| 3 | 1 | NULL | 1 |
But when I run the second query (query no. 2):
{quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id; {quote}
I get a different (and, in my opinion, wrong) result:
| log.id | log.dest_id | com1.msgs | com2.msgs |
| 1 | 2 | 1 | 1 |
| 1 | 3 | 1 | 1 |
| 1 | 5 | NULL | NULL |
| 3 | 1 | NULL | NULL |
Query no. 1 and query no. 2 differ in only one place, the second join condition:
bq. com2.dest_id=log.id and com2.id=log.dest_id
vs
bq. com2.id=log.dest_id and com2.dest_id=log.id
which in my opinion are equal. Explains for both queries are of course slightly different (columns are swapped) and they are here: https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good) https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
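The semantics the reporter expects can be sketched with a toy hash join: a multi-column equi-join key is a set of (probe column, build column) pairs, so a correct planner must produce the same rows no matter which order the AND-ed conjuncts are written in. The sketch below uses the kgorlo_log/kgorlo_comm sample data from the report to show both spellings of the com2 condition selecting the same matches once the key pairs are canonicalized; it illustrates the expected behavior only, not Hive's join implementation.

```python
# Toy left outer hash join over the report's sample data. key_pairs lists
# (probe_col, build_col) index pairs; sorting them canonicalizes conjunct
# order, which is what the reporter expects the planner to guarantee.
log = [(1, 2), (1, 3), (1, 5), (3, 1)]                    # (id, dest_id)
comm = [(1, 2), (2, 1), (1, 3), (2, 3), (3, 5), (4, 5)]   # (id, dest_id)

def left_join(probe, build, key_pairs):
    pairs = sorted(key_pairs)       # conjunct order becomes irrelevant
    table = {}
    for row in build:
        table.setdefault(tuple(row[b] for _, b in pairs), []).append(row)
    out = []
    for row in probe:
        matches = table.get(tuple(row[p] for p, _ in pairs))
        out.extend((row, m) for m in (matches or [None]))
    return out

# "com2.dest_id=log.id and com2.id=log.dest_id" and the swapped spelling:
r1 = left_join(log, comm, [(0, 1), (1, 0)])
r2 = left_join(log, comm, [(1, 0), (0, 1)])
```

Both `r1` and `r2` match the com2.msgs column of the correct result above: log rows (1, 2) and (3, 1) find a reversed partner in kgorlo_comm, while (1, 3) and (1, 5) get NULL.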
[jira] [Commented] (HIVE-9116) Add unit test for multi sessions.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251521#comment-14251521 ] Hive QA commented on HIVE-9116: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687998/HIVE-9116.1-spark.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 7238 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/572/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/572/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-572/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12687998 - PreCommit-HIVE-SPARK-Build Add unit test for multi sessions.[Spark Branch] --- Key: HIVE-9116 URL: https://issues.apache.org/jira/browse/HIVE-9116 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-9116.1-spark.patch HS2 multi-session support is enabled in HoS; we should add some unit tests for verification and regression testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8681) CBO: Column names are missing from join expression in Map join with CBO enabled
[ https://issues.apache.org/jira/browse/HIVE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251553#comment-14251553 ] Hive QA commented on HIVE-8681: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687974/HIVE-8681.4.patch {color:red}ERROR:{color} -1 due to 1120 failed/errored test(s), 6716 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_table_null_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_array_map_access_nonconstant org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join1 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join17 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join19 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join21 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join24 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join26 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join33 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join6 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
[jira] [Commented] (HIVE-8722) Enhance InputSplitShims to extend InputSplitWithLocationInfo [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251557#comment-14251557 ] Rui Li commented on HIVE-8722: -- I got this exception which also seems related: {noformat} 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - 14/12/18 12:25:18 DEBUG rdd.HadoopRDD: SplitLocationInfo and other new Hadoop classes are unavailable. Using the older Hadoop location info code. 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) - java.lang.ClassNotFoundException: org.apache.hadoop.mapred.InputSplitWithLocationInfo 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at java.net.URLClassLoader$1.run(URLClassLoader.java:366) 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at java.net.URLClassLoader$1.run(URLClassLoader.java:355) 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at java.security.AccessController.doPrivileged(Native Method) 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at java.net.URLClassLoader.findClass(URLClassLoader.java:354) 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at java.lang.ClassLoader.loadClass(ClassLoader.java:425) 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at java.lang.ClassLoader.loadClass(ClassLoader.java:358) 2014-12-18 12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at java.lang.Class.forName0(Native Method) 2014-12-18 
12:25:18,399 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at java.lang.Class.forName(Class.java:190) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD$SplitInfoReflections.init(HadoopRDD.scala:381) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD$.liftedTree1$1(HadoopRDD.scala:391) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD$.init(HadoopRDD.scala:390) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD$.clinit(HadoopRDD.scala) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:179) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:197) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:206) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:204) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 
2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:206) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-18 12:25:18,400 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:204) 2014-12-18 12:25:18,400 INFO
[jira] [Commented] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251593#comment-14251593 ] Hive QA commented on HIVE-9094: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12688000/HIVE-9094.2-spark.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7236 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/573/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/573/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-573/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12688000 - PreCommit-HIVE-SPARK-Build TimeoutException when trying get executor count from RSC [Spark Branch] --- Key: HIVE-9094 URL: https://issues.apache.org/jira/browse/HIVE-9094 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-9094.1-spark.patch, HIVE-9094.2-spark.patch In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because: {code} 2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837) at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234) at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at
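The failure in HIVE-9094 is a bounded wait on a remote call: SetSparkReducerParallelism asks the remote Spark context for memory/core info and the future times out, failing the whole compile. A common pattern for this class of problem is to fall back to a default when the bounded wait expires rather than propagating the TimeoutException. A minimal sketch, with an illustrative timeout and fallback (not Hive's actual settings or fix):

```python
# Sketch of the failure mode: a slow "remote" call awaited with a timeout.
# The caller falls back to a default instead of failing query compilation.
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def get_executor_count_slowly():
    time.sleep(0.2)     # simulate a slow remote Spark context lookup
    return 8

def executor_count_or_default(timeout_s, default):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(get_executor_count_slowly)
        try:
            return future.result(timeout=timeout_s)
        except TimeoutError:
            return default      # degrade gracefully, don't abort the compile
```

Whether a silent default is acceptable here is a design question — a too-small parallelism guess hurts performance — which is presumably why the patch under review tunes the timeout/retry behavior instead.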
[jira] [Commented] (HIVE-9123) Query with join fails with NPE when using join auto conversion
[ https://issues.apache.org/jira/browse/HIVE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251596#comment-14251596 ] Kamil Gorlo commented on HIVE-9123: --- I've tried in HDP 2.2 (with Hive 0.14.0.2.2.0.0-1084) and also cannot reproduce. BUT, I've also tried with HDP 2.1 (with Hive 0.13.0.2.1.1.0-237) and also CANNOT reproduce. So it looks like this issue occurs only (?) with CDH 5.2.1 (with Hive 0.13.1-cdh5.2.1). Query with join fails with NPE when using join auto conversion -- Key: HIVE-9123 URL: https://issues.apache.org/jira/browse/HIVE-9123 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Environment: CDH5 with Hive 0.13.1 Reporter: Kamil Gorlo I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | The following query fails in the second stage of execution: bq.
select v.id, v.dest_id from kgorlo_log v join (select id, dest_id, count(*) as wiad from kgorlo_comm group by id, dest_id)com1 on com1.id=v.id and com1.dest_id=v.dest_id; with following exception: {quote} 2014-12-16 17:09:17,629 ERROR [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Unxpected exception: null java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.getRefKey(MapJoinOperator.java:198) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.computeMapJoinKey(MapJoinOperator.java:186) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:216) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-12-16 17:09:17,659 FATAL 
[uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:1,_col1:2} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at
[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251597#comment-14251597 ] Rui Li commented on HIVE-9153: -- Judging from the results, I think fewer mappers can improve overall performance, which is true for both Spark and Tez. The problem is why Spark is 60s slower than Tez with the same # of mappers. One possible reason is that we don't have data locality with CombineHiveInputFormat, which is tracked by HIVE-8722. I also noticed that the parallelism drops during execution (I'll attach a screenshot later). This may be due to the delay scheduling mechanism of Spark, which attempts to schedule tasks with some locality first. Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] - Key: HIVE-9153 URL: https://issues.apache.org/jira/browse/HIVE-9153 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Rui Li Attachments: screenshot.PNG The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in Spark, it might make sense for us to use {{HiveInputFormat}} as well. We should evaluate this on a query which has many input splits such as {{select count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
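The trade-off being evaluated here — CombineHiveInputFormat launching fewer, bigger tasks versus HiveInputFormat launching one task per split with better locality — reduces to packing small splits into combined splits up to a size cap. A rough sketch of that greedy packing, with an illustrative cap (the real format also groups splits by node and rack, which is exactly where the locality discussed above is kept or lost):

```python
# Greedy sketch of split combining: pack consecutive split sizes into
# combined splits no larger than max_combined, so fewer tasks launch.
# Purely illustrative; CombineHiveInputFormat also groups by node/rack.
def combine_splits(split_sizes, max_combined):
    combined, current, current_size = [], [], 0
    for size in split_sizes:
        if current and current_size + size > max_combined:
            combined.append(current)        # close the current combined split
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        combined.append(current)
    return combined
```

For example, five input splits of sizes [64, 64, 64, 200, 32] with a 128 MB cap become four tasks instead of five; with many small files the reduction is much larger, which is the effect measured in this JIRA.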
[jira] [Updated] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9153: - Attachment: screenshot.PNG Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] - Key: HIVE-9153 URL: https://issues.apache.org/jira/browse/HIVE-9153 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Rui Li Attachments: screenshot.PNG The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in Spark, it might make sense for us to use {{HiveInputFormat}} as well. We should evaluate this on a query which has many input splits such as {{select count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9160) Suspicious comparing logic in LazyPrimitive
[ https://issues.apache.org/jira/browse/HIVE-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251624#comment-14251624 ] Hive QA commented on HIVE-9160: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687981/HIVE-9160.1.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2126/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2126/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2126/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687981 - PreCommit-HIVE-TRUNK-Build Suspicious comparing logic in LazyPrimitive --- Key: HIVE-9160 URL: https://issues.apache.org/jira/browse/HIVE-9160 Project: Hive Issue Type: Bug Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-9160.1.patch.txt
{code}
@Override
public boolean equals(Object obj) {
  if (!(obj instanceof LazyPrimitive<?, ?>)) {
    return false;
  }
  if (data == obj) {
    return true;
  }
  if (data == null || obj == null) {
    return false;
  }
  return data.equals(((LazyPrimitive<?, ?>) obj).getWritableObject());
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
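What makes the snippet above suspicious is the middle check: `data == obj` compares the inner writable against the *other wrapper object* by reference, two things that play different roles, so it can only be true by accident; a sound equals compares inner value to inner value. A Python analogue of the like-with-like version (this only mirrors the smell and an obvious fix — the actual HIVE-9160 patch may differ):

```python
# Python analogue of LazyPrimitive equality done like-with-like: compare the
# wrapped "writable" values, never the wrapper against the inner value.
# Illustrative only; not the actual HIVE-9160 patch.
class LazyPrimitive:
    def __init__(self, data):
        self.data = data          # the wrapped "writable" value

    def __eq__(self, other):
        if not isinstance(other, LazyPrimitive):
            return False
        if self.data is None or other.data is None:
            return self.data is other.data    # equal only if both are None
        return self.data == other.data        # inner vs inner, symmetric

    def __hash__(self):
        return hash(self.data)
```

Comparing inner to inner also keeps equality symmetric, which the original's mixed-role identity check does not guarantee.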
[jira] [Commented] (HIVE-8722) Enhance InputSplitShims to extend InputSplitWithLocationInfo [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251643#comment-14251643 ] Rui Li commented on HIVE-8722: -- Never mind my last comments. That's because I used hadoop-2.4, which doesn't have that class. Enhance InputSplitShims to extend InputSplitWithLocationInfo [Spark Branch] --- Key: HIVE-8722 URL: https://issues.apache.org/jira/browse/HIVE-8722 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang We got the following exception in hive.log: {noformat} 2014-11-03 11:45:49,865 DEBUG rdd.HadoopRDD (Logging.scala:logDebug(84)) - Failed to use InputSplitWithLocations. java.lang.ClassCastException: Cannot cast org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit to org.apache.hadoop.mapred.InputSplitWithLocationInfo at java.lang.Class.cast(Class.java:3094) at org.apache.spark.rdd.HadoopRDD.getPreferredLocations(HadoopRDD.scala:278) at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:216) at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:216) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:215) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1303) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1313) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1312) {noformat} My understanding is that the split location info helps Spark to execute tasks more efficiently. This could help other execution engines too. So we should consider enhancing InputSplitShim to implement InputSplitWithLocationInfo if possible.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
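The mechanism behind HIVE-8722 is simple: Spark's HadoopRDD asks each split for its preferred hosts (via InputSplitWithLocationInfo) and the scheduler tries to place the task on one of them; a split class that cannot report locations, like the CombineHiveInputSplit in the exception above, always falls through to "any host". A sketch of that placement decision, with illustrative class and function names (not Spark's or Hive's actual API):

```python
# Sketch of locality-aware task placement: prefer a host that stores the
# split's data, otherwise run anywhere. Names here are illustrative.
class SplitWithLocations:
    def __init__(self, path, hosts):
        self.path = path
        self.hosts = hosts        # hosts that store this split's blocks

    def get_locations(self):
        return self.hosts

def place_task(split, available_hosts):
    # Splits without location info (no get_locations) report no hosts.
    locations = getattr(split, "get_locations", lambda: [])()
    for host in locations:
        if host in available_hosts:
            return host, "NODE_LOCAL"
    return available_hosts[0], "ANY"
```

This is also why the shim change matters for the performance comparison in HIVE-9153: without location info every task is an "ANY" placement and reads its input over the network.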
[jira] [Resolved] (HIVE-9123) Query with join fails with NPE when using join auto conversion
[ https://issues.apache.org/jira/browse/HIVE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-9123. Resolution: Cannot Reproduce Query with join fails with NPE when using join auto conversion -- Key: HIVE-9123 URL: https://issues.apache.org/jira/browse/HIVE-9123 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Environment: CDH5 with Hive 0.13.1 Reporter: Kamil Gorlo I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | Following query fails in second stage of execution: bq. select v.id, v.dest_id from kgorlo_log v join (select id, dest_id, count(*) as wiad from kgorlo_comm group by id, dest_id)com1 on com1.id=v.id and com1.dest_id=v.dest_id; with following exception: {quote} 2014-12-16 17:09:17,629 ERROR [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Unxpected exception: null java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.getRefKey(MapJoinOperator.java:198) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.computeMapJoinKey(MapJoinOperator.java:186) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:216) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) at 
org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-12-16 17:09:17,659 FATAL [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:1,_col1:2} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at 
java.util.concurrent.FutureTask.run(FutureTask.java:262) at
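Since the NPE above originates in MapJoinOperator, a possible triage step (a sketch, not part of the report) is to disable map-join auto conversion and force a common shuffle join. hive.auto.convert.join is a standard Hive setting; the query is the reporter's.

```sql
-- Triage sketch (assumption: the NPE is specific to the map-join
-- conversion path). Forcing a shuffle join may sidestep the failure.
set hive.auto.convert.join=false;

select v.id, v.dest_id
from kgorlo_log v
join (select id, dest_id, count(*) as wiad
      from kgorlo_comm
      group by id, dest_id) com1
  on com1.id = v.id
 and com1.dest_id = v.dest_id;
```

If the query succeeds with conversion disabled, that narrows the problem to the map-join key computation shown in the stack trace.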
[jira] [Resolved] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order
[ https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-9146. Resolution: Fixed Fix Version/s: 0.14.0 Assignee: Ashutosh Chauhan Fixed via HIVE-8298 Query with left joins produces wrong result when join condition is written in different order - Key: HIVE-9146 URL: https://issues.apache.org/jira/browse/HIVE-9146 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.13.1 Reporter: Kamil Gorlo Assignee: Ashutosh Chauhan Fix For: 0.14.0 I have two queries which should be equal (I only swap two join conditions) but they are not. They are simplest queries I could produce to reproduce bug. I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | And when I run this query (query no. 1): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; {quote} I get result (which is correct): | log.id | log.dest_id | com1.msgs | com2.msgs | | 1 | 2| 1 | 1 | | 1 | 3| 1 | NULL | | 1 | 5| NULL | NULL | | 3 | 1| NULL | 1 | But when I run second query (query no. 
2): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id; {quote} I get a different (and, in my opinion, wrong) result: |log.id | log.dest_id | com1.msgs | com2.msgs| |1|2|1|1| |1|3|1|1| |1|5|NULL|NULL| |3|1|NULL|NULL| Query no. 1 and query no. 2 differ in only one place, the second join condition: bq. com2.dest_id=log.id and com2.id=log.dest_id vs bq. com2.id=log.dest_id and com2.dest_id=log.id which in my opinion are equal. The explains for both queries are of course slightly different (columns are swapped) and they are here: https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good) https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order
[ https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-9146: --- Component/s: Logical Optimizer Query with left joins produces wrong result when join condition is written in different order - Key: HIVE-9146 URL: https://issues.apache.org/jira/browse/HIVE-9146 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.13.1 Reporter: Kamil Gorlo Assignee: Ashutosh Chauhan Fix For: 0.14.0 I have two queries which should be equal (I only swap two join conditions) but they are not. They are simplest queries I could produce to reproduce bug. I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | And when I run this query (query no. 1): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; {quote} I get result (which is correct): | log.id | log.dest_id | com1.msgs | com2.msgs | | 1 | 2| 1 | 1 | | 1 | 3| 1 | NULL | | 1 | 5| NULL | NULL | | 3 | 1| NULL | 1 | But when I run second query (query no. 
2): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id; {quote} I get a different (and, in my opinion, wrong) result: |log.id | log.dest_id | com1.msgs | com2.msgs| |1|2|1|1| |1|3|1|1| |1|5|NULL|NULL| |3|1|NULL|NULL| Query no. 1 and query no. 2 differ in only one place, the second join condition: bq. com2.dest_id=log.id and com2.id=log.dest_id vs bq. com2.id=log.dest_id and com2.dest_id=log.id which in my opinion are equal. The explains for both queries are of course slightly different (columns are swapped) and they are here: https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good) https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29145: HIVE-9094 TimeoutException when trying get executor count from RSC [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/#review65489 --- Ship it! Ship It! - Xuefu Zhang On Dec. 18, 2014, 9:40 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29145/ --- (Updated Dec. 18, 2014, 9:40 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9094 https://issues.apache.org/jira/browse/HIVE-9094 Repository: hive-git Description --- RemoteHiveSparkClient::getExecutorCount times out after 5s when the Spark cluster has not launched yet. 1. Make the timeout value configurable. 2. Set the default timeout value to 60s. 3. Enable the timeout for getting Spark job info and Spark stage info. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 22f052a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 5d6a02c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 256d0b0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 1d3a9d8 Diff: https://reviews.apache.org/r/29145/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251667#comment-14251667 ] Xuefu Zhang commented on HIVE-9094: --- +1. TimeoutException when trying get executor count from RSC [Spark Branch] --- Key: HIVE-9094 URL: https://issues.apache.org/jira/browse/HIVE-9094 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-9094.1-spark.patch, HIVE-9094.2-spark.patch In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because: {code} 2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837) at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234) at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at
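The fix discussed in this thread makes the RSC timeout configurable, with a 60s default. A hedged sketch of how a user might raise it for a slow-starting cluster follows; the property name below is an assumption inferred from the patch description, not something confirmed in this thread.

```sql
-- Assumption: the patch exposes the RSC future timeout as a HiveConf
-- property. The name below is a guess; the patch's stated default is 60s.
set hive.spark.client.future.timeout=120s;
```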
[jira] [Commented] (HIVE-9141) HiveOnTez: mix of union all, distinct, group by generates error
[ https://issues.apache.org/jira/browse/HIVE-9141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251668#comment-14251668 ] Ashutosh Chauhan commented on HIVE-9141: +1 This also fixes {{optimize_nullscan.q}} breakage introduced by HIVE-9053 HiveOnTez: mix of union all, distinct, group by generates error --- Key: HIVE-9141 URL: https://issues.apache.org/jira/browse/HIVE-9141 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Navis Attachments: HIVE-9141.1.patch.txt Here is the way to produce it: in Hive q test setting (with src table) set hive.execution.engine=tez; SELECT key, value FROM ( SELECT key, value FROM src UNION ALL SELECT key, key as value FROM ( SELECT distinct key FROM ( SELECT key, value FROM (SELECT key, value FROM src UNION ALL SELECT key, value FROM src )t1 group by key, value )t2 )t3 )t4 group by key, value; will generate 2014-12-16 23:19:13,593 ERROR ql.Driver (SessionState.java:printError(834)) - FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:361) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69) at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:419) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1107) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1155) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1034) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:206) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:158) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:369) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:304) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:834) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:136) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_uniontez2(TestMiniTezCliDriver.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
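The reporter's reproducing query above is easier to follow reformatted (same statements as in the report, only whitespace added):

```sql
set hive.execution.engine=tez;

SELECT key, value
FROM (
  SELECT key, value FROM src
  UNION ALL
  SELECT key, key AS value
  FROM (
    SELECT DISTINCT key
    FROM (
      SELECT key, value
      FROM (SELECT key, value FROM src
            UNION ALL
            SELECT key, value FROM src) t1
      GROUP BY key, value
    ) t2
  ) t3
) t4
GROUP BY key, value;
```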
[jira] [Comment Edited] (HIVE-9141) HiveOnTez: mix of union all, distinct, group by generates error
[ https://issues.apache.org/jira/browse/HIVE-9141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251668#comment-14251668 ] Ashutosh Chauhan edited comment on HIVE-9141 at 12/18/14 2:07 PM: -- +1 This also fixes {{optimize_nullscan.q}} breakage introduced by HIVE-9055 was (Author: ashutoshc): +1 This also fixes {{optimize_nullscan.q}} breakage introduced by HIVE-9053 HiveOnTez: mix of union all, distinct, group by generates error --- Key: HIVE-9141 URL: https://issues.apache.org/jira/browse/HIVE-9141 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Navis Attachments: HIVE-9141.1.patch.txt Here is the way to produce it: in Hive q test setting (with src table) set hive.execution.engine=tez; SELECT key, value FROM ( SELECT key, value FROM src UNION ALL SELECT key, key as value FROM ( SELECT distinct key FROM ( SELECT key, value FROM (SELECT key, value FROM src UNION ALL SELECT key, value FROM src )t1 group by key, value )t2 )t3 )t4 group by key, value; will generate 2014-12-16 23:19:13,593 ERROR ql.Driver (SessionState.java:printError(834)) - FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:361) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at 
org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69) at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:419) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1107) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1155) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1034) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:206) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:158) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:369) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:304) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:834) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:136) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_uniontez2(TestMiniTezCliDriver.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8920) IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8920: -- Summary: IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch] (was: SplitSparkWorkResolver doesn't work with UnionWork [Spark Branch]) IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch] --- Key: HIVE-8920 URL: https://issues.apache.org/jira/browse/HIVE-8920 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Xuefu Zhang Attachments: HIVE-8920.1-spark.patch The following query will not work: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently, the plan for this query, before SplitSparkWorkResolver, looks like below: {noformat}
M1    M2
  \  /  \
   U3    R5
   |
   R4
{noformat} In {{SplitSparkWorkResolver#splitBaseWork}}, it assumes that the {{childWork}} is a ReduceWork, but for this case, you can see that for M2 the childWork could be UnionWork U3. Thus, the code will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8920) IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8920: -- Description: The following query will not work: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently, the plan for this query, before SplitSparkWorkResolver, looks like below: {noformat}
M1    M2
  \  /  \
   U3    R5
   |
   R4
{noformat} In {{SplitSparkWorkResolver#splitBaseWork}}, it assumes that the {{childWork}} is a ReduceWork, but for this case, you can see that for M2 the childWork could be UnionWork U3. Thus, the code will fail. HIVE-9041 partially addressed the problem by removing the union task. However, it is still necessary to clone M1 and M2 to support multi-insert. Because M1 and M2 can run in a single JVM, the original solution of storing a global IOContext will not work, because M1 and M2 have different IOContexts, both of which need to be stored. was: The following query will not work: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently, the plan for this query, before SplitSparkWorkResolver, looks like below: {noformat}
M1    M2
  \  /  \
   U3    R5
   |
   R4
{noformat} In {{SplitSparkWorkResolver#splitBaseWork}}, it assumes that the {{childWork}} is a ReduceWork, but for this case, you can see that for M2 the childWork could be UnionWork U3. Thus, the code will fail. 
IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch] --- Key: HIVE-8920 URL: https://issues.apache.org/jira/browse/HIVE-8920 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Xuefu Zhang Attachments: HIVE-8920.1-spark.patch The following query will not work: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently, the plan for this query, before SplitSparkWorkResolver, looks like below: {noformat}
M1    M2
  \  /  \
   U3    R5
   |
   R4
{noformat} In {{SplitSparkWorkResolver#splitBaseWork}}, it assumes that the {{childWork}} is a ReduceWork, but for this case, you can see that for M2 the childWork could be UnionWork U3. Thus, the code will fail. HIVE-9041 partially addressed the problem by removing the union task. However, it is still necessary to clone M1 and M2 to support multi-insert. Because M1 and M2 can run in a single JVM, the original solution of storing a global IOContext will not work, because M1 and M2 have different IOContexts, both of which need to be stored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 29205: HIVE-8920: IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29205/ --- Review request for hive and Chao Sun. Bugs: HIVE-8920 https://issues.apache.org/jira/browse/HIVE-8920 Repository: hive-git Description --- See bug description. Patch in HIVE-9084 is included here. Diffs - itests/src/test/resources/testconfiguration.properties fd732c1 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 46894ac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 0bd18e0 ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java b4c2c1f ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java 5ba6612 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SplitSparkWorkResolver.java 67dda02 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 1efbb12 ql/src/test/queries/clientpositive/multi_insert_union_src.q PRE-CREATION ql/src/test/results/clientpositive/multi_insert_union_src.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/multi_insert_union_src.q.out PRE-CREATION Diff: https://reviews.apache.org/r/29205/diff/ Testing --- Added a new qtest. Thanks, Xuefu Zhang
[jira] [Commented] (HIVE-9133) CBO (Calcite Return Path): Refactor Semantic Analyzer to Move CBO code out
[ https://issues.apache.org/jira/browse/HIVE-9133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251689#comment-14251689 ] Hive QA commented on HIVE-9133: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687985/HIVE-9133.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2127/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2127/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2127/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687985 - PreCommit-HIVE-TRUNK-Build CBO (Calcite Return Path): Refactor Semantic Analyzer to Move CBO code out --- Key: HIVE-9133 URL: https://issues.apache.org/jira/browse/HIVE-9133 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.15.0 Attachments: HIVE-9133.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29205: HIVE-8920: IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29205/ --- (Updated Dec. 18, 2014, 2:37 p.m.) Review request for hive and Chao Sun. Bugs: HIVE-8920 https://issues.apache.org/jira/browse/HIVE-8920 Repository: hive-git Description --- See bug description. Patch in HIVE-9084 is included here. Diffs (updated) - itests/src/test/resources/testconfiguration.properties fd732c1 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 46894ac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 0bd18e0 ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java b4c2c1f ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java 5ba6612 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SplitSparkWorkResolver.java 67dda02 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 1efbb12 ql/src/test/queries/clientpositive/multi_insert_union_src.q PRE-CREATION ql/src/test/results/clientpositive/multi_insert_union_src.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/multi_insert_union_src.q.out PRE-CREATION Diff: https://reviews.apache.org/r/29205/diff/ Testing --- Added a new qtest. Thanks, Xuefu Zhang
[jira] [Updated] (HIVE-8920) IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8920: -- Attachment: HIVE-8920.2-spark.patch Patch #2 corrects some code styling issues. IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch] --- Key: HIVE-8920 URL: https://issues.apache.org/jira/browse/HIVE-8920 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Xuefu Zhang Attachments: HIVE-8920.1-spark.patch, HIVE-8920.2-spark.patch The following query will not work: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently, the plan for this query, before SplitSparkWorkResolver, looks like below: {noformat}
M1    M2
  \  /  \
   U3    R5
   |
   R4
{noformat} In {{SplitSparkWorkResolver#splitBaseWork}}, it assumes that the {{childWork}} is a ReduceWork, but for this case, you can see that for M2 the childWork could be UnionWork U3. Thus, the code will fail. HIVE-9041 partially addressed the problem by removing the union task. However, it is still necessary to clone M1 and M2 to support multi-insert. Because M1 and M2 can run in a single JVM, the original solution of storing a global IOContext will not work, because M1 and M2 have different IOContexts, both of which need to be stored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8722) Enhance InputSplitShims to extend InputSplitWithLocationInfo [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8722: -- Issue Type: Sub-task (was: Improvement) Parent: HIVE-9134 Enhance InputSplitShims to extend InputSplitWithLocationInfo [Spark Branch] --- Key: HIVE-8722 URL: https://issues.apache.org/jira/browse/HIVE-8722 Project: Hive Issue Type: Sub-task Reporter: Jimmy Xiang We got the following exception in hive.log: {noformat} 2014-11-03 11:45:49,865 DEBUG rdd.HadoopRDD (Logging.scala:logDebug(84)) - Failed to use InputSplitWithLocations. java.lang.ClassCastException: Cannot cast org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit to org.apache.hadoop.mapred.InputSplitWithLocationInfo at java.lang.Class.cast(Class.java:3094) at org.apache.spark.rdd.HadoopRDD.getPreferredLocations(HadoopRDD.scala:278) at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:216) at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:216) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:215) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1303) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1313) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1312) {noformat} My understanding is that the split location info helps Spark to execute tasks more efficiently. This could help other execution engines too. So we should consider enhancing InputSplitShim to implement InputSplitWithLocationInfo if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7122) Storage format for create like table
[ https://issues.apache.org/jira/browse/HIVE-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251751#comment-14251751 ] Vasanth kumar RJ commented on HIVE-7122: Hi [~Prabhu Joseph], sorry for the late reply. The CTAS restriction says the target table cannot be partitioned or external. CREATE TABLE ... LIKE, however, allows creating a similar table, including an external one. Storage format for create like table Key: HIVE-7122 URL: https://issues.apache.org/jira/browse/HIVE-7122 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Vasanth kumar RJ Assignee: Vasanth kumar RJ Attachments: HIVE-7122.patch Using CREATE TABLE ... LIKE, the user can specify the table storage format. Example: create table table1 like table2 stored as ORC; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251755#comment-14251755 ] Xuefu Zhang commented on HIVE-9153: --- Thanks for the findings, [~lirui]. I heard that the Spark snapshot we are using is 2X slower than the previous version; this might explain the slowness. Also, I think the number of mappers and locality both matter for speed, but the two may conflict with each other. For instance, if we have more executors than mappers, it's desirable to have more map tasks. However, doing so might hurt locality because some mappers might read remotely. On the other hand, if there are more mappers than executors, then fewer mappers will help the speed. Anyway, it would be good to find out how Tez generates splits using HiveInputFormat. Also, we should fix HIVE-8722. Is there a way to disable Spark's delay scheduling to try it out? Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] - Key: HIVE-9153 URL: https://issues.apache.org/jira/browse/HIVE-9153 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Rui Li Attachments: screenshot.PNG The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in Spark, it might make sense for us to use {{HiveInputFormat}} as well. We should evaluate this on a query which has many input splits such as {{select count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29200: HIVE-9116 Add unit test for multi sessions on Spark.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29200/#review65494 --- Ship it! Ship It! - Xuefu Zhang On Dec. 18, 2014, 9:14 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29200/ --- (Updated Dec. 18, 2014, 9:14 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9116 https://issues.apache.org/jira/browse/HIVE-9116 Repository: hive-git Description --- Test HS2 with multiple sessions using multi-threaded JDBC connections. Diffs - itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestMultiSessionsHS2WithLocalClusterSpark.java PRE-CREATION Diff: https://reviews.apache.org/r/29200/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-9116) Add unit test for multi sessions.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251760#comment-14251760 ] Xuefu Zhang commented on HIVE-9116: --- +1 Add unit test for multi sessions.[Spark Branch] --- Key: HIVE-9116 URL: https://issues.apache.org/jira/browse/HIVE-9116 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-9116.1-spark.patch HS2 multi-session support is enabled in HoS; we should add some unit tests for verification and regression testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29198: HIVE-9136 - Profile query compiler [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29198/#review65497 --- Ship it! Ship It! - Xuefu Zhang On Dec. 18, 2014, 7:36 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29198/ --- (Updated Dec. 18, 2014, 7:36 a.m.) Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-9136 https://issues.apache.org/jira/browse/HIVE-9136 Repository: hive-git Description --- Please check out the JIRA for a correspondence between Spark and Tez log names. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java 1e0a749 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 46894ac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java 3f23541 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 215d53f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java a5d73a7 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java a9fbf6c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 362072f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 90a2f9e ql/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java 4e2b130 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java b6a7ac2 Diff: https://reviews.apache.org/r/29198/diff/ Testing --- Thanks, Chao Sun
[jira] [Commented] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251798#comment-14251798 ] Xuefu Zhang commented on HIVE-9136: --- Patch looks good to me. However, in SparkCompiler, we are only measuring time to generate task tree. We should also gauge the time spent on logical optimization as well as physical optimization. This can be addressed in a followup JIRA though. Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1-spark.patch, HIVE-9136.1.patch, HIVE-9136.2-spark.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9154) Cache pathToPartitionInfo in context aware record reader
[ https://issues.apache.org/jira/browse/HIVE-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251820#comment-14251820 ] Xuefu Zhang commented on HIVE-9154: --- +1 Cache pathToPartitionInfo in context aware record reader Key: HIVE-9154 URL: https://issues.apache.org/jira/browse/HIVE-9154 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: HIVE-9154.1-spark.patch, HIVE-9154.2.patch This is similar to HIVE-9127. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9006) hiveserver thrift api version is still 6
[ https://issues.apache.org/jira/browse/HIVE-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251823#comment-14251823 ] Hive QA commented on HIVE-9006: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12687986/HIVE-9006.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2128/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2128/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2128/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12687986 - PreCommit-HIVE-TRUNK-Build hiveserver thrift api version is still 6 Key: HIVE-9006 URL: https://issues.apache.org/jira/browse/HIVE-9006 Project: Hive Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HIVE-9006.1.patch, HIVE-9006.2.patch Looking at TCLIService.thrift, when opening a session, the protocol version is still v6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8920) IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251886#comment-14251886 ] Hive QA commented on HIVE-8920: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12688043/HIVE-8920.2-spark.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 7237 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/574/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/574/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-574/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12688043 - PreCommit-HIVE-SPARK-Build IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch] --- Key: HIVE-8920 URL: https://issues.apache.org/jira/browse/HIVE-8920 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Xuefu Zhang Attachments: HIVE-8920.1-spark.patch, HIVE-8920.2-spark.patch The following query will not work: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently, the plan for this query, before SplitSparkWorkResolver, looks like below: {noformat} M1M2 \ / \ U3 R5 | R4 {noformat} In {{SplitSparkWorkResolver#splitBaseWork}}, it assumes that the {{childWork}} is a ReduceWork, but for this case, you can see that for M2 the childWork could be UnionWork U3. Thus, the code will fail. HIVE-9041 partially addressed the problem by removing the union task. However, it's still necessary to clone M1 and M2 to support multi-insert. Because M1 and M2 can run in a single JVM, the original solution of storing a global IOContext will not work, since M1 and M2 have different IOContexts, both of which need to be stored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
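Since M1 and M2 run in the same JVM but each need their own context, the general shape of a fix is to key contexts by input rather than keeping one global instance. The sketch below is a hedged illustration of that idea only, not Hive's actual implementation: `IOContext` is stubbed, and `IOContextRegistry` is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stubbed stand-in for Hive's IOContext, for illustration only.
class IOContext {
    String currentInputPath;
}

// Hypothetical registry: one IOContext per input (e.g. per MapWork name),
// so cloned MapWorks M1 and M2 in the same JVM don't clobber each other.
class IOContextRegistry {
    private static final Map<String, IOContext> CONTEXTS = new ConcurrentHashMap<>();

    static IOContext get(String inputName) {
        // Lazily create one context per input name; thread-safe via ConcurrentHashMap.
        return CONTEXTS.computeIfAbsent(inputName, k -> new IOContext());
    }
}
```

With this shape, each cloned MapWork looks up its own context by name instead of sharing a single global one.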
[jira] [Commented] (HIVE-9148) Fix default value for HWI_WAR_FILE
[ https://issues.apache.org/jira/browse/HIVE-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251912#comment-14251912 ] Peter Slawski commented on HIVE-9148: - No, the description is correct as *hive.hwi.war.file* is assumed to be relative to *$HIVE_HOME* in HWIServer.java. *$HWI_WAR_FILE* is set incorrectly in [hwi.sh|https://github.com/apache/hive/blob/b8250ac2f30539f6b23ce80a20a9e338d3d31458/bin/ext/hwi.sh#L29]. So if you didn't override *hive.hwi.war.file* in hive-site.xml, the path to the HWI war file would be wrong. From [hwi/src/java/org/apache/hadoop/hive/hwi/HWIServer.java:77|https://github.com/apache/hive/blob/release-0.14.0/hwi/src/java/org/apache/hadoop/hive/hwi/HWIServer.java#L77] {code:java} String hwiWAR = conf.getVar(HiveConf.ConfVars.HIVEHWIWARFILE); String hivehome = System.getenv().get("HIVE_HOME"); File hwiWARFile = new File(hivehome, hwiWAR); {code} Fix default value for HWI_WAR_FILE -- Key: HIVE-9148 URL: https://issues.apache.org/jira/browse/HIVE-9148 Project: Hive Issue Type: Bug Components: Web UI Affects Versions: 0.14.0, 0.13.1 Reporter: Peter Slawski Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9148.1.patch The path to the hwi war file should be relative to hive home. However, HWI_WAR_FILE is set in hwi.sh to be an absolute path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
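The behavior Peter describes follows from the `java.io.File(String parent, String child)` constructor, which resolves the child against the parent even when the child already looks absolute (on Unix-like systems). The paths below are hypothetical, chosen only to show why an absolute HWI_WAR_FILE produces a wrong, doubled path:

```java
import java.io.File;

public class HwiPathDemo {
    public static void main(String[] args) {
        // HWIServer effectively does: new File(System.getenv("HIVE_HOME"), hwiWar).
        // A relative hive.hwi.war.file resolves as intended:
        File relative = new File("/opt/hive", "lib/hive-hwi.war"); // hypothetical paths
        System.out.println(relative.getPath()); // /opt/hive/lib/hive-hwi.war

        // But if hwi.sh sets an absolute HWI_WAR_FILE, File still resolves it
        // against the parent, duplicating the prefix:
        File absolute = new File("/opt/hive", "/opt/hive/lib/hive-hwi.war");
        System.out.println(absolute.getPath()); // /opt/hive/opt/hive/lib/hive-hwi.war
    }
}
```

This is why the fix is to keep HWI_WAR_FILE relative to $HIVE_HOME rather than absolute.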
[jira] [Commented] (HIVE-7081) HiveServer/HiveServer2 leaks jdbc connections when network interrupt
[ https://issues.apache.org/jira/browse/HIVE-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251932#comment-14251932 ] Brock Noland commented on HIVE-7081: We upgraded to 0.9.2 already: https://github.com/apache/hive/blob/trunk/pom.xml#L141 HiveServer/HiveServer2 leaks jdbc connections when network interrupt - Key: HIVE-7081 URL: https://issues.apache.org/jira/browse/HIVE-7081 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.12.0, 0.13.0 Environment: hadoop 1.2.1 hive 0.12.0 / hive 0.13.0 linux 2.6.32 Reporter: Wang Zhiqiang Labels: ConnectoinLeak, HiveServer2, JDBC HiveServer/HiveServer2 leaks JDBC connections when the network between client and server is interrupted. I tested using both DBVisualizer and hand-written JDBC code: when the network between the client and HiveServer/HiveServer2 is interrupted, the TCP connection on the server side stays in the ESTABLISHED state forever until the server is stopped. Using jstack to dump the server's threads, I found a thread stuck in socketRead0(). {quote} pool-1-thread-13 prio=10 tid=0x7fd00c0c6800 nid=0x5d21 runnable [0x7fd00018] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:152) at java.net.SocketInputStream.read(SocketInputStream.java:122) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) - locked 0xebc24f28 (a java.io.BufferedInputStream) at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
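The thread dump above shows a worker blocked in socketRead0 with no read timeout, so it can never notice the client is gone. The fix in Hive belongs in the Thrift server configuration, but the underlying mechanism can be illustrated in isolation: this standalone sketch (not Hive's code) shows how SO_TIMEOUT turns an otherwise indefinite blocking read into a SocketTimeoutException the server can recover from.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    // Returns true if the blocking read gave up via SO_TIMEOUT rather than
    // hanging in socketRead0 forever.
    static boolean readTimesOut() throws IOException {
        try (ServerSocket server = new ServerSocket(0);
             // Peer that connects but never sends anything (simulates a dead client).
             Socket silentPeer = new Socket("localhost", server.getLocalPort());
             Socket accepted = server.accept()) {
            accepted.setSoTimeout(200); // ms; without this, read() blocks indefinitely
            try {
                accepted.getInputStream().read(); // peer never writes
                return false;
            } catch (SocketTimeoutException expected) {
                return true; // the worker thread can now clean up the connection
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut());
    }
}
```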
[jira] [Commented] (HIVE-7081) HiveServer/HiveServer2 leaks jdbc connections when network interrupt
[ https://issues.apache.org/jira/browse/HIVE-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251947#comment-14251947 ] Thejas M Nair commented on HIVE-7081: - Yes, to be more specific - trunk has a fix, 0.14 release does not have the fix. HiveServer/HiveServer2 leaks jdbc connections when network interrupt - Key: HIVE-7081 URL: https://issues.apache.org/jira/browse/HIVE-7081 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.12.0, 0.13.0 Environment: hadoop 1.2.1 hive 0.12.0 / hive 0.13.0 linux 2.6.32 Reporter: Wang Zhiqiang Labels: ConnectoinLeak, HiveServer2, JDBC HiveServer/HiveServer2 leaks JDBC connections when the network between client and server is interrupted. I tested using both DBVisualizer and hand-written JDBC code: when the network between the client and HiveServer/HiveServer2 is interrupted, the TCP connection on the server side stays in the ESTABLISHED state forever until the server is stopped. Using jstack to dump the server's threads, I found a thread stuck in socketRead0(). {quote} pool-1-thread-13 prio=10 tid=0x7fd00c0c6800 nid=0x5d21 runnable [0x7fd00018] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:152) at java.net.SocketInputStream.read(SocketInputStream.java:122) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) - locked 0xebc24f28 (a java.io.BufferedInputStream) at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9161) Fix ordering differences on UDF functions due to Java8
Sergio Peña created HIVE-9161: - Summary: Fix ordering differences on UDF functions due to Java8 Key: HIVE-9161 URL: https://issues.apache.org/jira/browse/HIVE-9161 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.1 Reporter: Sergio Peña Assignee: Sergio Peña Java 8 uses a different hash function for HashMap, which is leading to iteration order differences in several cases. (See Java8 vs Java7) This part is related to UDF functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
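The actual fix is in the attached patch; as a standalone illustration of the ordering problem only (not Hive's patch), iterating a HashMap directly yields JDK-dependent output, unsafe for golden-file tests, while copying into a TreeMap gives a deterministic order on every JDK. The UDF names below are arbitrary examples.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class StableIterationDemo {
    public static void main(String[] args) {
        Map<String, Integer> udfs = new HashMap<>(); // arbitrary example entries
        udfs.put("concat", 1);
        udfs.put("substr", 2);
        udfs.put("upper", 3);

        // HashMap iteration order depends on the JDK's internal hashing scheme
        // and may differ between Java 7 and Java 8 — no order is guaranteed.
        System.out.println(udfs.keySet());

        // Copying into a TreeMap yields a sorted, deterministic order everywhere.
        System.out.println(new TreeMap<>(udfs).keySet()); // [concat, substr, upper]
    }
}
```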
[jira] [Updated] (HIVE-9161) Fix ordering differences on UDF functions due to Java8
[ https://issues.apache.org/jira/browse/HIVE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9161: -- Status: Patch Available (was: Open) Fix ordering differences on UDF functions due to Java8 -- Key: HIVE-9161 URL: https://issues.apache.org/jira/browse/HIVE-9161 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.1 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9161.1.patch Java 8 uses a different hash function for HashMap, which is leading to iteration order differences in several cases. (See Java8 vs Java7) This part is related to UDF functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9161) Fix ordering differences on UDF functions due to Java8
[ https://issues.apache.org/jira/browse/HIVE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9161: -- Attachment: HIVE-9161.1.patch Fix ordering differences on UDF functions due to Java8 -- Key: HIVE-9161 URL: https://issues.apache.org/jira/browse/HIVE-9161 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.1 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9161.1.patch Java 8 uses a different hash function for HashMap, which is leading to iteration order differences in several cases. (See Java8 vs Java7) This part is related to UDF functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251974#comment-14251974 ] Chao commented on HIVE-9136: Sure, we can do that as follow-up. Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1-spark.patch, HIVE-9136.1.patch, HIVE-9136.2-spark.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9116) Add unit test for multi sessions.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9116: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks, Chengxiang. Add unit test for multi sessions.[Spark Branch] --- Key: HIVE-9116 URL: https://issues.apache.org/jira/browse/HIVE-9116 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Fix For: spark-branch Attachments: HIVE-9116.1-spark.patch HS2 multi-session support is enabled in HoS; we should add some unit tests for verification and regression testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251984#comment-14251984 ] Xuefu Zhang commented on HIVE-9136: --- +1. [~csun], please create the JIRA and link it with this one. Thanks. Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Attachments: HIVE-9136.1-spark.patch, HIVE-9136.1.patch, HIVE-9136.2-spark.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9136) Profile query compiler [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9136: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks, Chao. Profile query compiler [Spark Branch] - Key: HIVE-9136 URL: https://issues.apache.org/jira/browse/HIVE-9136 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Fix For: spark-branch Attachments: HIVE-9136.1-spark.patch, HIVE-9136.1.patch, HIVE-9136.2-spark.patch We should put some performance counters around the compiler and evaluate how long it takes to compile a query in Spark versus the other execution frameworks. Query 28 is a good one to use for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
[ https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251986#comment-14251986 ] Jonathan Bender commented on HIVE-7049: --- Seems like we can get away with the following patch (confirm that the fileSchema, i.e. the writer's schema, is actually a union type before trying to find the type that the reader schema expects; if not, just use the schema as is, since it should be promoted to a union by Avro). This worked for me in local testing. ```diff --git a/src/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java b/src/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java index ce933ff..032761c 100644 --- a/src/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java +++ b/src/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java @@ -265,9 +265,12 @@ private Object deserializeNullableUnion(Object datum, Schema fileSchema, Schema if(schema.getType().equals(Schema.Type.NULL)) { return null; } +Schema writerSchema = fileSchema; +if (writerSchema != null && writerSchema.getType().equals(Schema.Type.UNION)) { + writerSchema = writerSchema.getTypes().get(tag); +} -return worker(datum, fileSchema == null ? null : fileSchema.getTypes().get(tag), schema, -SchemaToTypeInfo.generateTypeInfo(schema)); +return worker(datum, writerSchema, schema, SchemaToTypeInfo.generateTypeInfo(schema)); } ``` Unable to deserialize AVRO data when file schema and record schema are different and nullable - Key: HIVE-7049 URL: https://issues.apache.org/jira/browse/HIVE-7049 Project: Hive Issue Type: Bug Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-7049.1.patch It mainly happens when 1) the file schema and record schema are not the same, and 2) the record schema is nullable but the file schema is not. 
The potential code location is in class AvroDeserializer: {noformat} if(AvroSerdeUtils.isNullableType(recordSchema)) { return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType); } {noformat} In the above code snippet, recordSchema is checked for nullability, but the file schema is not checked. I tested with these values: {noformat} recordSchema = [null,string] fileSchema = string {noformat} And I got the following exception (line numbers might not be the same due to my debugged code version): {noformat} org.apache.avro.AvroRuntimeException: Not a union: string at org.apache.avro.Schema.getTypes(Schema.java:272) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9131) MiniTez optimize_nullscan test is unstable
[ https://issues.apache.org/jira/browse/HIVE-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252022#comment-14252022 ] Vikram Dixit K commented on HIVE-9131: -- This test had also failed when I ran it without HIVE-9055, but this stack trace escaped my attention. This will be fixed as part of HIVE-9141. Sorry for the trouble. MiniTez optimize_nullscan test is unstable -- Key: HIVE-9131 URL: https://issues.apache.org/jira/browse/HIVE-9131 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Sometimes fails with: {noformat} 2014-12-16 11:55:04,139 ERROR ql.Driver (SessionState.java:printError(834)) - FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:361) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69) at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:419) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1107) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1155) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1034) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:206) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:158) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:369) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:304) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:834) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:136) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan(TestMiniTezCliDriver.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-9131) MiniTez optimize_nullscan test is unstable
[ https://issues.apache.org/jira/browse/HIVE-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K resolved HIVE-9131. -- Resolution: Duplicate MiniTez optimize_nullscan test is unstable -- Key: HIVE-9131 URL: https://issues.apache.org/jira/browse/HIVE-9131 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Sometimes fails with: {noformat} 2014-12-16 11:55:04,139 ERROR ql.Driver (SessionState.java:printError(834)) - FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.MapWork cannot be cast to org.apache.hadoop.hive.ql.plan.ReduceWork at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:361) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69) at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:419) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1107) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1155) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1034) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:206) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:158) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:369) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:304) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:834) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:136) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan(TestMiniTezCliDriver.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9055) Tez: union all followed by group by followed by another union all gives error
[ https://issues.apache.org/jira/browse/HIVE-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-9055: - Resolution: Fixed Status: Resolved (was: Patch Available) Tez: union all followed by group by followed by another union all gives error - Key: HIVE-9055 URL: https://issues.apache.org/jira/browse/HIVE-9055 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Vikram Dixit K Attachments: HIVE-9055.1.patch, HIVE-9055.2.patch, HIVE-9055.3.patch Here is the way to produce it: in Hive q test setting (with src table) set hive.execution.engine=tez; select key from ( select key from src union all select key from src ) tab group by key union all select key from src; will give you ERROR 2014-12-09 11:38:48,316 ERROR ql.Driver (SessionState.java:printError(834)) - FAILED: IndexOutOfBoundsException Index: -1, Size: 1 java.lang.IndexOutOfBoundsException: Index: -1, Size: 1 at java.util.LinkedList.checkElementIndex(LinkedList.java:553) at java.util.LinkedList.get(LinkedList.java:474) at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:354) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69) at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202) at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:834) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:136) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_uniontez(TestMiniTezCliDriver.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) btw: there is not problem when it is run with MR -- This message was sent by Atlassian JIRA (v6.3.4#6332)
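The `LinkedList.get` call failing with index -1 at GenTezWork.java:354 suggests an unguarded lookup after a miss: Java's `List.indexOf` returns -1 when the element is absent, and `get(-1)` then throws. A minimal sketch of that failure class and its guard (names like `find_work_index` are illustrative, not Hive's actual code):

```python
def find_work_index(children, target):
    """Mimic Java's List.indexOf: return -1 when target is absent."""
    for i, child in enumerate(children):
        if child == target:
            return i
    return -1

def get_following_work_unguarded(children, target):
    idx = find_work_index(children, target)
    # idx == -1 raises IndexOutOfBoundsException in Java's LinkedList.get;
    # in Python it silently returns the LAST element, which is arguably worse.
    return children[idx]

def get_following_work_guarded(children, target):
    idx = find_work_index(children, target)
    if idx < 0:
        return None  # caller must handle the "not a child" case explicitly
    return children[idx]
```

The point of the guard is that a miss becomes an explicit case to handle rather than an index error deep inside the graph walker.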
[jira] [Updated] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9127: -- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Brock. Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.15.0 Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO 
[stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 
14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:301) 2014-12-16
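The fix described above is to cache the Map/Reduce work objects in the remote Spark context so split generation does not repeat the expensive deserialization on every call. A minimal sketch of that caching idea, with hypothetical names (this is not Hive's actual implementation):

```python
class PlanCache:
    """Memoize an expensive per-key load (e.g. deserializing a work object)."""

    def __init__(self, loader):
        self._loader = loader  # expensive function, e.g. plan deserialization
        self._cache = {}
        self.misses = 0

    def get(self, key):
        # Load at most once per key; later split-generation calls reuse it.
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._loader(key)
        return self._cache[key]
```

Repeated `get` calls for the same key return the same object with a single load, which is the behavior HIVE-7431 had disabled and this issue restores for split generation.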
[jira] [Commented] (HIVE-6892) Permission inheritance issues
[ https://issues.apache.org/jira/browse/HIVE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252036#comment-14252036 ] Szehon Ho commented on HIVE-6892: - Strange, I thought I added a link from Storage Based Authorization, but I must have forgotten to save it. I'll try to add it and remove the label. Permission inheritance issues - Key: HIVE-6892 URL: https://issues.apache.org/jira/browse/HIVE-6892 Project: Hive Issue Type: Bug Components: Security Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC14 *HDFS Background* * When a file or directory is created, its owner is the user identity of the client process, and its group is inherited from parent (the BSD rule). Permissions are taken from default umask. Extended Acl's are taken from parent unless they are set explicitly. *Goals* To reduce need to set fine-grain file security props after every operation, users may want the following Hive warehouse file/dir to auto-inherit security properties from their directory parents: * Directories created by new database/table/partition/bucket * Files added to tables via load/insert * Table directories exported/imported (open question of whether exported table inheriting perm from new parent needs another flag) What may be inherited: * Basic file permission * Groups (already done by HDFS for new directories) * Extended ACL's (already done by HDFS for new directories) *Behavior* * When hive.warehouse.subdir.inherit.perms flag is enabled in Hive, Hive will try to do all above inheritances. In the future, we can add more flags for more finer-grained control. * Failure by Hive to inherit will not cause operation to fail. Rule of thumb of when security-prop inheritance will happen is the following: ** To run chmod, a user must be the owner of the file, or else a super-user. ** To run chgrp, a user must be the owner of files, or else a super-user. 
** Hence, the user that Hive runs as (either 'hive', or the logged-in user in the case of impersonation) must be a super-user or the owner of the file whose security properties are to be changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
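The inheritance rules above can be modeled in a few lines. This is an illustrative sketch of the described behavior only, not Hive's code: when the flag is on and the Hive user is allowed to chmod/chgrp, the child picks up the parent's basic permission bits, group, and extended ACLs; when inheritance is not possible, it is skipped rather than failing the operation.

```python
def inherit_security_props(parent, child, inherit_enabled, can_change):
    """Return the child's security props after (attempted) inheritance.

    parent/child are dicts with hypothetical keys: mode, group, acls.
    """
    if not inherit_enabled:
        return child
    if not can_change:
        # Not the owner or a super-user: skip inheritance, don't fail the op.
        return child
    child = dict(child)
    child["mode"] = parent["mode"]          # basic file permission
    child["group"] = parent["group"]        # group (HDFS already does this
                                            # for new directories)
    child["acls"] = list(parent["acls"])    # extended ACLs
    return child
```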
[jira] [Updated] (HIVE-9094) TimeoutException when trying get executor count from RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9094: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks, Chengxiang. TimeoutException when trying get executor count from RSC [Spark Branch] --- Key: HIVE-9094 URL: https://issues.apache.org/jira/browse/HIVE-9094 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-9094.1-spark.patch, HIVE-9094.2-spark.patch In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because: {code} 2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202) at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837) at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234) at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at 
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at
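The failure mode here is a bounded wait on the remote Spark context that expires before the executor count arrives. A hedged sketch of one way to make such a lookup resilient, falling back to a default parallelism instead of letting the `TimeoutException` abort compilation (the names and the fallback policy are illustrative, not what the committed patch necessarily does):

```python
import concurrent.futures
import time

def executor_count_with_fallback(fetch, timeout_s, default):
    """Run fetch() with a deadline; return `default` if it doesn't finish."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            return default
```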
[jira] [Commented] (HIVE-9004) Reset doesn't work for the default empty value entry
[ https://issues.apache.org/jira/browse/HIVE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252047#comment-14252047 ] Szehon Ho commented on HIVE-9004: - Thanks. Looks good to me, +1 Reset doesn't work for the default empty value entry Key: HIVE-9004 URL: https://issues.apache.org/jira/browse/HIVE-9004 Project: Hive Issue Type: Bug Components: Configuration Reporter: Cheng Hao Assignee: Cheng Hao Fix For: spark-branch, 0.15.0, 0.14.1 Attachments: HIVE-9004.patch To illustrate that, in the hive cli:
{noformat}
hive> set hive.table.parameters.default;
hive.table.parameters.default is undefined
hive> set hive.table.parameters.default=key1=value1;
hive> reset;
hive> set hive.table.parameters.default;
hive.table.parameters.default=key1=value1
{noformat}
I think we expect the last output to be: hive.table.parameters.default is undefined -- This message was sent by Atlassian JIRA (v6.3.4#6332)
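The shape of this bug is easy to reproduce outside Hive. An illustrative model (not Hive's code): if `reset` restores only the keys present in a defaults map, and properties whose default is empty/undefined have no entry in that map, a user-set override survives the reset.

```python
# Hypothetical defaults map; note there is no entry for
# hive.table.parameters.default because its default is "undefined".
DEFAULTS = {"hive.execution.engine": "mr"}

def reset_buggy(conf):
    # Only keys with a known default are restored; overridden keys that have
    # no default entry are left behind -- the reported symptom.
    for k, v in DEFAULTS.items():
        conf[k] = v
    return conf

def reset_fixed(conf):
    conf.clear()           # drop every override, including empty-default keys
    conf.update(DEFAULTS)  # then reapply the known defaults
    return conf
```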
[jira] [Created] (HIVE-9162) stats19 test is environment-dependant
Sergey Shelukhin created HIVE-9162: -- Summary: stats19 test is environment-dependant Key: HIVE-9162 URL: https://issues.apache.org/jira/browse/HIVE-9162 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Priority: Minor This is a very annoying test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9162) stats19 test is environment-dependant
[ https://issues.apache.org/jira/browse/HIVE-9162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-9162: -- Assignee: Sergey Shelukhin stats19 test is environment-dependant - Key: HIVE-9162 URL: https://issues.apache.org/jira/browse/HIVE-9162 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.15.0, 0.14.1 This is a very annoying test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9162) stats19 test is environment-dependant
[ https://issues.apache.org/jira/browse/HIVE-9162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9162: --- Fix Version/s: 0.14.1 0.15.0 stats19 test is environment-dependant - Key: HIVE-9162 URL: https://issues.apache.org/jira/browse/HIVE-9162 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.15.0, 0.14.1 This is a very annoying test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252059#comment-14252059 ] Xuefu Zhang commented on HIVE-9127: --- Spark patch is also committed to Spark branch. Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.15.0 Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: 
client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO 
[stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:301) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]:
[jira] [Updated] (HIVE-9127) Improve CombineHiveInputFormat.getSplit performance
[ https://issues.apache.org/jira/browse/HIVE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9127: -- Fix Version/s: spark-branch Improve CombineHiveInputFormat.getSplit performance --- Key: HIVE-9127 URL: https://issues.apache.org/jira/browse/HIVE-9127 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Fix For: spark-branch, 0.15.0 Attachments: HIVE-9127.1-spark.patch.txt, HIVE-9127.2-spark.patch.txt, HIVE-9127.3.patch.txt In HIVE-7431 we disabled caching of Map/Reduce works because some tasks would fail. However, we should be able to cache these objects in RSC for split generation. See: https://issues.apache.org/jira/browse/HIVE-9124?focusedCommentId=14248622page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14248622 how this impacts performance. Caller ST: {noformat} 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:328) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:421) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:510) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at 
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,202 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.ShuffleDependency.init(Dependency.scala:79) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:192) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(435)) -at scala.Option.getOrElse(Option.scala:120) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.rdd.RDD.dependencies(RDD.scala:190) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:301) 2014-12-16 14:36:22,203 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(435)) -
[jira] [Commented] (HIVE-9133) CBO (Calcite Return Path): Refactor Semantic Analyzer to Move CBO code out
[ https://issues.apache.org/jira/browse/HIVE-9133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252071#comment-14252071 ] Sergey Shelukhin commented on HIVE-9133: Left some partial comments on RB. Overall comment - is it possible to minimize the use of semanticAnalyzer, and esp. its fields and setters? Even if results in redundant args. If we cannot avoid dependency completely at least we should limit it to some logical methods... CBO (Calcite Return Path): Refactor Semantic Analyzer to Move CBO code out --- Key: HIVE-9133 URL: https://issues.apache.org/jira/browse/HIVE-9133 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.15.0 Attachments: HIVE-9133.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252076#comment-14252076 ] Szehon Ho commented on HIVE-8639: - [~brocknoland] yes there are tests that still do. The triggering factor is whether the tests have hive.auto.convert.sortmerge.join.to.mapjoin turned on. For example, all the auto_sortmerge_.* tests have at least one part that runs SMB join before that flag is turned on. [~xuefuz] can you review when you get a chance? Test failures seem unrelated. I looked at join32_lessSize, it seems caused by a TimeoutException in spark client's RPC layer. {noformat} Caused by: java.util.concurrent.TimeoutException: Timed out waiting for client connection. at org.apache.hive.spark.client.rpc.RpcServer$2.run(RpcServer.java:125) {noformat} Convert SMBJoin to MapJoin [Spark Branch] - Key: HIVE-8639 URL: https://issues.apache.org/jira/browse/HIVE-8639 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-8639.1-spark.patch, HIVE-8639.2-spark.patch, HIVE-8639.3-spark.patch, HIVE-8639.3-spark.patch, HIVE-8639.4-spark.patch HIVE-8202 supports auto-conversion of SMB Join. However, if the tables are partitioned, there could be a slow down as each mapper would need to get a very small chunk of a partition which has a single key. Thus, in some scenarios it's beneficial to convert SMB join to map join. The task is to research and support the conversion from SMB join to map join for Spark execution engine. See the equivalent of MapReduce in SortMergeJoinResolver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
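The conversion decision being discussed can be sketched as a size-based policy: keep the SMB join unless hive.auto.convert.sortmerge.join.to.mapjoin is on and everything except the biggest table fits in memory, in which case use a map join. The threshold name and exact policy below are illustrative, not the resolver's actual logic:

```python
def choose_join(table_sizes, to_mapjoin_enabled, small_table_limit):
    """Pick a join strategy from (hypothetical) per-table input sizes in bytes."""
    big = max(table_sizes)
    small_total = sum(table_sizes) - big  # everything that must fit in memory
    if to_mapjoin_enabled and small_total <= small_table_limit:
        return "mapjoin"
    return "smb_join"
```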
[jira] [Commented] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252086#comment-14252086 ] Xuefu Zhang commented on HIVE-8639: --- [~szehon], I'm reviewing at the moment. Thanks. Convert SMBJoin to MapJoin [Spark Branch] - Key: HIVE-8639 URL: https://issues.apache.org/jira/browse/HIVE-8639 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-8639.1-spark.patch, HIVE-8639.2-spark.patch, HIVE-8639.3-spark.patch, HIVE-8639.3-spark.patch, HIVE-8639.4-spark.patch HIVE-8202 supports auto-conversion of SMB Join. However, if the tables are partitioned, there could be a slow down as each mapper would need to get a very small chunk of a partition which has a single key. Thus, in some scenarios it's beneficial to convert SMB join to map join. The task is to research and support the conversion from SMB join to map join for Spark execution engine. See the equivalent of MapReduce in SortMergeJoinResolver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9006) hiveserver thrift api version is still 6
[ https://issues.apache.org/jira/browse/HIVE-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252092#comment-14252092 ] Szehon Ho commented on HIVE-9006: - Thanks, +1 hiveserver thrift api version is still 6 Key: HIVE-9006 URL: https://issues.apache.org/jira/browse/HIVE-9006 Project: Hive Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HIVE-9006.1.patch, HIVE-9006.2.patch Look at the TCLIService.thrift, when open session, the protocol version info is still v6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9162) stats19 test is environment-dependant
[ https://issues.apache.org/jira/browse/HIVE-9162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9162: --- Attachment: HIVE-9162.patch Simple q file change. [~jpullokkaran] can you take a look? Comment in q file says set prefix to high value so path doesn't have to be hashed, but value is too low for some environments and it still gets hashed. [~vikram.dixit] ok for 14? stats19 test is environment-dependant - Key: HIVE-9162 URL: https://issues.apache.org/jira/browse/HIVE-9162 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.15.0, 0.14.1 Attachments: HIVE-9162.patch This is a very annoying test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9162) stats19 test is environment-dependant
[ https://issues.apache.org/jira/browse/HIVE-9162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9162: --- Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-8395) CBO: enable by default
[ https://issues.apache.org/jira/browse/HIVE-8395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252137#comment-14252137 ] Sergey Shelukhin commented on HIVE-8395: modified CBO: enable by default -- Key: HIVE-8395 URL: https://issues.apache.org/jira/browse/HIVE-8395 Project: Hive Issue Type: Improvement Components: CBO Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-8395-27-28-delta.patch, HIVE-8395-28-29-delta.patch, HIVE-8395.01.patch, HIVE-8395.02.patch, HIVE-8395.03.patch, HIVE-8395.04.patch, HIVE-8395.05.patch, HIVE-8395.06.patch, HIVE-8395.07.patch, HIVE-8395.08.patch, HIVE-8395.09.patch, HIVE-8395.10.patch, HIVE-8395.11.patch, HIVE-8395.12.patch, HIVE-8395.12.patch, HIVE-8395.13.patch, HIVE-8395.13.patch, HIVE-8395.14.patch, HIVE-8395.15.patch, HIVE-8395.16.patch, HIVE-8395.17.patch, HIVE-8395.18.patch, HIVE-8395.18.patch, HIVE-8395.19.patch, HIVE-8395.20.patch, HIVE-8395.21.patch, HIVE-8395.22.patch, HIVE-8395.23.patch, HIVE-8395.23.withon.patch, HIVE-8395.24.patch, HIVE-8395.25.patch, HIVE-8395.25.patch, HIVE-8395.26.patch, HIVE-8395.27.patch, HIVE-8395.28.patch, HIVE-8395.29.patch, HIVE-8395.30.patch, HIVE-8395.31.patch, HIVE-8395.32.patch, HIVE-8395.33.patch, HIVE-8395.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 28930: HIVE-8639 : Convert SMBJoin to MapJoin [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28930/#review65528 --- Ship it! Ship It! - Xuefu Zhang On Dec. 18, 2014, 2:07 a.m., Szehon Ho wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28930/ --- (Updated Dec. 18, 2014, 2:07 a.m.) Review request for hive. Bugs: HIVE-8639 https://issues.apache.org/jira/browse/HIVE-8639 Repository: hive-git Description --- In MapReduce, for auto-SMB joins, SortedMergeJoinProc is run in the earlier Optimizer layer to convert a join to SMB join, and SortMergeJoinResolver is run in the later PhysicalOptimizer layer to convert it to MapJoin. For Spark, we have an opportunity to make this cleaner by putting both the SMB and MapJoin conversions in the logical layer and deciding there which one to call. This patch introduces a new unified join processor called 'SparkJoinOptimizer' in the logical layer. It calls 'SparkMapJoinOptimizer' and 'SparkSortMergeJoinOptimizer' in a certain order depending on the flags that are set, and whichever one is applicable converts the join. Thus there is no need to write an SMB-to-MapJoin path. 'SparkSortMergeJoinOptimizer' is a new class that wraps the logic of SortedMergeJoinProc, but for Spark. To put both the MapJoin and SMB processors at the same level, I had to make some fixes. 1. One fix is in 'NonBlockingOpDeDupProc', to fix the join context state, as it is now run before the SMB code that relies on it. For this I submitted a trunk patch at HIVE-9060. 2. The second fix is that MapReduce's SMB code did two graph walks: one processor to calculate all 'rejected' joins, and another processor to change the non-rejected ones to SMB joins. That would have meant doing multiple walks, so I refactored the 'rejected'-join logic into the same join-operator visit in SparkSortMergeJoinOptimizer.
Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java c2e643d ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkJoinOptimizer.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 680c6fd ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java 83625ef ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinOptimizer.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 5e432ac ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java b6a7ac2 ql/src/test/results/clientpositive/spark/auto_join32.q.out 28c022e ql/src/test/results/clientpositive/spark/auto_join_stats.q.out bccd246 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 842b4b3 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_1.q.out 2e35c66 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_12.q.out ee37010 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_13.q.out b2e928f ql/src/test/results/clientpositive/spark/auto_sortmerge_join_14.q.out 20ee657 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_15.q.out 0a48d00 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_2.q.out 5008a3f ql/src/test/results/clientpositive/spark/auto_sortmerge_join_3.q.out 3b081af ql/src/test/results/clientpositive/spark/auto_sortmerge_join_4.q.out 2a11fb2 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_5.q.out 0d971d2 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_6.q.out 9d455dc ql/src/test/results/clientpositive/spark/auto_sortmerge_join_7.q.out 61eb6ae ql/src/test/results/clientpositive/spark/auto_sortmerge_join_8.q.out 198d50d ql/src/test/results/clientpositive/spark/auto_sortmerge_join_9.q.out f59e57f ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_2.q.out b58091c ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_4.q.out 8ee392e 
ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_6.q.out 9c119df ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_7.q.out b9ad92d ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_8.q.out ed4d03f ql/src/test/results/clientpositive/spark/cross_product_check_2.q.out 6fb69a5 ql/src/test/results/clientpositive/spark/parquet_join.q.out 240989a ql/src/test/results/clientpositive/spark/smb_mapjoin_17.q.out 268ae23 ql/src/test/results/clientpositive/spark/smb_mapjoin_25.q.out df66cc2 ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out f635949 Diff:
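The ordered dispatch described in the review (try one join conversion, fall back to the next, else keep the common join) can be sketched as follows. This is a hypothetical illustration only: the class, method, and parameter names below are invented for the sketch and do not match Hive's actual optimizer API, and the real ordering and conditions live in 'SparkJoinOptimizer'.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

public class SparkJoinDispatchSketch {

    // Candidate conversions are tried in insertion order; the first one whose
    // predicate accepts the join wins, so no separate SMB-to-MapJoin
    // conversion path is needed afterwards.
    public static String choose(String join, Map<String, Predicate<String>> ordered) {
        for (Map.Entry<String, Predicate<String>> e : ordered.entrySet()) {
            if (e.getValue().test(join)) {
                return e.getKey();
            }
        }
        return "common-join"; // nothing applied: keep the shuffle join
    }

    // Flags and the size threshold are illustrative stand-ins for the real
    // configuration properties consulted by the optimizer.
    public static String optimize(String join, boolean mapJoinEnabled,
                                  boolean smbEnabled, long smallTableSize,
                                  long threshold) {
        Map<String, Predicate<String>> ordered = new LinkedHashMap<>();
        if (mapJoinEnabled) {
            ordered.put("map-join", j -> smallTableSize <= threshold);
        }
        if (smbEnabled) {
            ordered.put("smb-join", j -> true);
        }
        return choose(join, ordered);
    }
}
```

The point of the single ordered walk is that each join operator is visited once and one decision is made there, instead of two separate optimizer passes that must agree with each other.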
[jira] [Commented] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252142#comment-14252142 ] Xuefu Zhang commented on HIVE-8639: --- +1
[jira] [Updated] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8639: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks to Szehon for this nice piece.
[jira] [Commented] (HIVE-9162) stats19 test is environment-dependant
[ https://issues.apache.org/jira/browse/HIVE-9162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252155#comment-14252155 ] Laljo John Pullokkaran commented on HIVE-9162: -- +1
[jira] [Commented] (HIVE-9161) Fix ordering differences on UDF functions due to Java8
[ https://issues.apache.org/jira/browse/HIVE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252159#comment-14252159 ] Szehon Ho commented on HIVE-9161: - Looks good to me, +1 pending tests Fix ordering differences on UDF functions due to Java8 -- Key: HIVE-9161 URL: https://issues.apache.org/jira/browse/HIVE-9161 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.1 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9161.1.patch Java 8 uses a different hash function for HashMap, which is leading to iteration order differences in several cases. (See Java8 vs Java7) This part is related to UDF functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
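The underlying issue is that java.util.HashMap iteration order is an implementation detail, and Java 8's hashing changes reordered it relative to Java 7. As a sketch of the usual fix for golden-file tests (assuming nothing about Hive's internals; the class below is illustrative, not part of the patch): sort before emitting so the output is deterministic on any JVM.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class OrderIndependentOutput {

    // HashMap iteration order is unspecified and changed between Java 7 and
    // Java 8; sorting keys before emitting makes test output stable on any
    // JVM, which is the order-independence the q-file fixes are after.
    public static List<String> deterministicKeys(Map<String, String> m) {
        List<String> keys = new ArrayList<>(m.keySet());
        Collections.sort(keys);
        return keys;
    }
}
```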
[jira] [Commented] (HIVE-9161) Fix ordering differences on UDF functions due to Java8
[ https://issues.apache.org/jira/browse/HIVE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252165#comment-14252165 ] Hive QA commented on HIVE-9161: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12688078/HIVE-9161.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6714 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_varchar_udf1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2129/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2129/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2129/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12688078 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-9158) Multiple LDAP server URLs in hive.server2.authentication.ldap.url
[ https://issues.apache.org/jira/browse/HIVE-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252173#comment-14252173 ] Szehon Ho commented on HIVE-9158: - Seems reasonable, +1 Multiple LDAP server URLs in hive.server2.authentication.ldap.url - Key: HIVE-9158 URL: https://issues.apache.org/jira/browse/HIVE-9158 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Attachments: HIVE-9158.1.patch, LDAPClient.java Support for multiple LDAP servers for failover in the event that one stops responding or is down for maintenance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9158) Multiple LDAP server URLs in hive.server2.authentication.ldap.url
[ https://issues.apache.org/jira/browse/HIVE-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-9158: Labels: TODOC15 (was: ) Need to doc [Configuration Properties| https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HiveServer2]
[jira] [Commented] (HIVE-9158) Multiple LDAP server URLs in hive.server2.authentication.ldap.url
[ https://issues.apache.org/jira/browse/HIVE-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252190#comment-14252190 ] Naveen Gangam commented on HIVE-9158: - Thanks Szehon. I just updated the Configuration Properties to add this info.
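The failover behavior itself lives in the attached patch; as a rough sketch of the idea, assuming a whitespace-separated list of server URLs (the helper below is hypothetical, not Hive's code): try each server in order and use the first one that accepts a connection. The connectivity check is injected so the selection logic can be exercised without a real LDAP server.

```java
import java.util.function.Predicate;

public class LdapFailoverSketch {

    // Iterate over a whitespace-separated URL list and return the first
    // server that the injected check can reach. Assumption: the list
    // separator is whitespace; the real property's format is defined by the
    // HIVE-9158 patch and its documentation.
    public static String firstReachable(String urls, Predicate<String> canConnect) {
        for (String url : urls.trim().split("\\s+")) {
            if (canConnect.test(url)) {
                return url;
            }
        }
        throw new IllegalStateException("no LDAP server reachable: " + urls);
    }
}
```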
[jira] [Updated] (HIVE-8131) Support timestamp in Avro
[ https://issues.apache.org/jira/browse/HIVE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8131: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Support timestamp in Avro - Key: HIVE-8131 URL: https://issues.apache.org/jira/browse/HIVE-8131 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Fix For: 0.15.0 Attachments: HIVE-8131.1.patch, HIVE-8131.patch, HIVE-8131.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8131) Support timestamp in Avro
[ https://issues.apache.org/jira/browse/HIVE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252196#comment-14252196 ] Brock Noland commented on HIVE-8131: Thank you for your contribution! I have committed this to trunk!
[jira] [Updated] (HIVE-9161) Fix ordering differences on UDF functions due to Java8
[ https://issues.apache.org/jira/browse/HIVE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9161: -- Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-9161) Fix ordering differences on UDF functions due to Java8
[ https://issues.apache.org/jira/browse/HIVE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9161: -- Attachment: HIVE-9161.2.patch Fixed varchar_udf1.q for java7
[jira] [Updated] (HIVE-9161) Fix ordering differences on UDF functions due to Java8
[ https://issues.apache.org/jira/browse/HIVE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9161: -- Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-8816) Create unit test join of two encrypted tables with different keys
[ https://issues.apache.org/jira/browse/HIVE-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252225#comment-14252225 ] Brock Noland commented on HIVE-8816: Thank you! I think we want to do: {noformat} EXPLAIN EXTENDED SELECT * FROM encryptedWith256BitsKeyDB.encryptedTableIn256BitsKey t1 JOIN encryptedWith128BitsKeyDB.encryptedTableIn128BitsKey t2 WHERE t1.key = t2.key; {noformat} then in the same q-file: actually execute the join: {noformat} SELECT * FROM encryptedWith256BitsKeyDB.encryptedTableIn256BitsKey t1 JOIN encryptedWith128BitsKeyDB.encryptedTableIn128BitsKey t2 WHERE t1.key = t2.key; {noformat} Also please add {{--SORT_QUERY_RESULTS}} since different JVM's or execution engines could order results differently. Create unit test join of two encrypted tables with different keys - Key: HIVE-8816 URL: https://issues.apache.org/jira/browse/HIVE-8816 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Fix For: encryption-branch Attachments: HIVE-8816.1.patch, HIVE-8816.patch NO PRECOMMIT TESTS The results should be inserted into a third table encrypted with a separate key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9163) create temporary table will fail with wasb storage because MoveTask.moveFile tries to move data to hdfs dir instead of wasb dir
Hari Sankar Sivarama Subramaniyan created HIVE-9163: --- Summary: create temporary table will fail with wasb storage because MoveTask.moveFile tries to move data to hdfs dir instead of wasb dir Key: HIVE-9163 URL: https://issues.apache.org/jira/browse/HIVE-9163 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan {source} create temporary table s10k stored as orc as select * from studenttab10k; create temporary table v10k as select * from votertab10k; select registration from s10k s join v10k v on (s.name = v.name) join studentparttab30k p on (p.name = v.name) where s.age 25 and v.age 25 and p.age 25; {source} It fails because it tries to move data to hdfs dir instead of wasb dir: {source} 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.log.dir does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.server2.map.fair.scheduler.queue does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist Logging initialized using configuration in file:/C:/apps/dist/hive-0.14.0.2.2.1.0-2073/conf/hive-log4j.properties SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/C:/apps/dist/hadoop-2.6.0.2.2.1.0-2073/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLogger Binder.class] SLF4J: Found binding in [jar:file:/C:/apps/dist/hive-0.14.0.2.2.1.0-2073/lib/hive-jdbc-0.14.0.2.2.1.0-2073-standalone.jar!/org/slf4j/impl/StaticLogger Binder.class] SLF4J: Found binding in [jar:file:/C:/apps/dist/hbase-0.98.4.2.2.1.0-2073-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class ] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Query ID = hadoopqa_20141211002525_e36a9a92-7102-4bd7-8f4a-cb4bfd7d2012 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1418224548060_0070, Tracking URL = http://headnode0:9014/proxy/application_1418224548060_0070/ Kill Command = C:\apps\dist\hadoop-2.6.0.2.2.1.0-2073\bin\hadoop.cmd job -kill job_1418224548060_0070 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-12-11 00:25:39,949 Stage-1 map = 0%, reduce = 0% 2014-12-11 00:25:52,603 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.421 sec MapReduce Total cumulative CPU time: 4 seconds 421 msec Ended Job = job_1418224548060_0070 Stage-3 is selected by condition resolver. Stage-2 is filtered out by condition resolver. Stage-4 is filtered out by condition resolver. Moving data to: wasb://asvhive22-2014-12-1004-00...@hwxasvtesting.blob.core.windows.net/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/hiv e_2014-12-11_00-25-20_179_5217265991480659378-1/-ext-10001 Moving data to: hdfs://headnode0:9000/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/_tmp_space.db/452568d8-7ac2-4e7f-901c-e0c12dba2063 Failed with exception Unable to move source wasb://asvhive22-2014-12-1004-00...@hwxasvtesting.blob.core.windows.net/hive/scratch/hadoopqa/008c3436-c46 8-48da-b9a3-eb3ffa649594/hive_2014-12-11_00-25-20_179_5217265991480659378-1/-ext-10001 to destination hdfs://headnode0:9000/hive/scratch/hadoopqa/008c 3436-c468-48da-b9a3-eb3ffa649594/_tmp_space.db/452568d8-7ac2-4e7f-901c-e0c12dba2063 FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 4.421 sec HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 421 msec {source} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9163) create temporary table will fail with wasb storage because MoveTask.moveFile tries to move data to hdfs dir instead of wasb dir
[ https://issues.apache.org/jira/browse/HIVE-9163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-9163: Description: {quote} create temporary table s10k stored as orc as select * from studenttab10k; create temporary table v10k as select * from votertab10k; select registration from s10k s join v10k v on (s.name = v.name) join studentparttab30k p on (p.name = v.name) where s.age 25 and v.age 25 and p.age 25; {quote} It fails because it tries to move data to hdfs dir instead of wasb dir: {quote} 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.log.dir does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.server2.map.fair.scheduler.queue does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist Logging initialized using configuration in file:/C:/apps/dist/hive-0.14.0.2.2.1.0-2073/conf/hive-log4j.properties SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/C:/apps/dist/hadoop-2.6.0.2.2.1.0-2073/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLogger Binder.class] SLF4J: Found binding in [jar:file:/C:/apps/dist/hive-0.14.0.2.2.1.0-2073/lib/hive-jdbc-0.14.0.2.2.1.0-2073-standalone.jar!/org/slf4j/impl/StaticLogger Binder.class] SLF4J: Found binding in [jar:file:/C:/apps/dist/hbase-0.98.4.2.2.1.0-2073-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class ] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Query ID = hadoopqa_20141211002525_e36a9a92-7102-4bd7-8f4a-cb4bfd7d2012 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1418224548060_0070, Tracking URL = http://headnode0:9014/proxy/application_1418224548060_0070/ Kill Command = C:\apps\dist\hadoop-2.6.0.2.2.1.0-2073\bin\hadoop.cmd job -kill job_1418224548060_0070 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-12-11 00:25:39,949 Stage-1 map = 0%, reduce = 0% 2014-12-11 00:25:52,603 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.421 sec MapReduce Total cumulative CPU time: 4 seconds 421 msec Ended Job = job_1418224548060_0070 Stage-3 is selected by condition resolver. Stage-2 is filtered out by condition resolver. Stage-4 is filtered out by condition resolver. Moving data to: wasb://asvhive22-2014-12-1004-00...@hwxasvtesting.blob.core.windows.net/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/hiv e_2014-12-11_00-25-20_179_5217265991480659378-1/-ext-10001 Moving data to: hdfs://headnode0:9000/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/_tmp_space.db/452568d8-7ac2-4e7f-901c-e0c12dba2063 Failed with exception Unable to move source wasb://asvhive22-2014-12-1004-00...@hwxasvtesting.blob.core.windows.net/hive/scratch/hadoopqa/008c3436-c46 8-48da-b9a3-eb3ffa649594/hive_2014-12-11_00-25-20_179_5217265991480659378-1/-ext-10001 to destination hdfs://headnode0:9000/hive/scratch/hadoopqa/008c 3436-c468-48da-b9a3-eb3ffa649594/_tmp_space.db/452568d8-7ac2-4e7f-901c-e0c12dba2063 FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 4.421 sec HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 421 msec {quote}
[jira] [Created] (HIVE-9164) Profile query compiler #2 [Spark Branch]
Chao created HIVE-9164: -- Summary: Profile query compiler #2 [Spark Branch] Key: HIVE-9164 URL: https://issues.apache.org/jira/browse/HIVE-9164 Project: Hive Issue Type: Improvement Components: Spark Affects Versions: spark-branch Reporter: Chao In addition to the logs in HIVE-9136, we should also log logical/physical optimization in {{SparkCompiler}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9163) create temporary table will fail with wasb storage because MoveTask.moveFile tries to move data to hdfs dir instead of wasb dir
[ https://issues.apache.org/jira/browse/HIVE-9163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-9163: Attachment: HIVE-9163.1.patch create temporary table will fail with wasb storage because MoveTask.moveFile tries to move data to hdfs dir instead of wasb dir --- Key: HIVE-9163 URL: https://issues.apache.org/jira/browse/HIVE-9163 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-9163.1.patch {quote} create temporary table s10k stored as orc as select * from studenttab10k; create temporary table v10k as select * from votertab10k; select registration from s10k s join v10k v on (s.name = v.name) join studentparttab30k p on (p.name = v.name) where s.age 25 and v.age 25 and p.age 25; {quote} It fails because it tries to move data to hdfs dir instead of wasb dir: {quote} 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.log.dir does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.server2.map.fair.scheduler.queue does not exist 14/12/11 00:25:16 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist Logging initialized using configuration in file:/C:/apps/dist/hive-0.14.0.2.2.1.0-2073/conf/hive-log4j.properties SLF4J: Class path contains multiple SLF4J bindings. 
SLF4J: Found binding in [jar:file:/C:/apps/dist/hadoop-2.6.0.2.2.1.0-2073/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/C:/apps/dist/hive-0.14.0.2.2.1.0-2073/lib/hive-jdbc-0.14.0.2.2.1.0-2073-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/C:/apps/dist/hbase-0.98.4.2.2.1.0-2073-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Query ID = hadoopqa_20141211002525_e36a9a92-7102-4bd7-8f4a-cb4bfd7d2012 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1418224548060_0070, Tracking URL = http://headnode0:9014/proxy/application_1418224548060_0070/ Kill Command = C:\apps\dist\hadoop-2.6.0.2.2.1.0-2073\bin\hadoop.cmd job -kill job_1418224548060_0070 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-12-11 00:25:39,949 Stage-1 map = 0%, reduce = 0% 2014-12-11 00:25:52,603 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.421 sec MapReduce Total cumulative CPU time: 4 seconds 421 msec Ended Job = job_1418224548060_0070 Stage-3 is selected by condition resolver. Stage-2 is filtered out by condition resolver. Stage-4 is filtered out by condition resolver. 
Moving data to: wasb://asvhive22-2014-12-1004-00...@hwxasvtesting.blob.core.windows.net/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/hive_2014-12-11_00-25-20_179_5217265991480659378-1/-ext-10001 Moving data to: hdfs://headnode0:9000/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/_tmp_space.db/452568d8-7ac2-4e7f-901c-e0c12dba2063 Failed with exception Unable to move source wasb://asvhive22-2014-12-1004-00...@hwxasvtesting.blob.core.windows.net/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/hive_2014-12-11_00-25-20_179_5217265991480659378-1/-ext-10001 to destination hdfs://headnode0:9000/hive/scratch/hadoopqa/008c3436-c468-48da-b9a3-eb3ffa649594/_tmp_space.db/452568d8-7ac2-4e7f-901c-e0c12dba2063 FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 4.421 sec HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 421 msec {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
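The failure above comes from resolving the temporary-table destination against the default (HDFS) filesystem while the data lives on wasb. A minimal sketch of the scheme-preserving resolution the fix needs: qualify a scheme-less destination path with the source path's scheme and authority. This is an illustration of the idea only, not the actual MoveTask patch; `resolveDestination` is a hypothetical helper.

```java
import java.net.URI;

// Illustrative only: rebase a scheme-less destination onto the source
// filesystem so a wasb:// source is never moved to an hdfs:// destination.
public class SchemePreservingMove {
    static URI resolveDestination(URI source, String destPath) {
        URI dest = URI.create(destPath);
        if (dest.getScheme() != null) {
            return dest;  // destination is already fully qualified
        }
        // inherit scheme and authority from the source path
        return URI.create(source.getScheme() + "://" + source.getAuthority() + destPath);
    }

    public static void main(String[] args) {
        URI src = URI.create("wasb://container@account.blob.core.windows.net/hive/scratch/-ext-10001");
        URI out = resolveDestination(src, "/hive/scratch/_tmp_space.db/part-0");
        System.out.println(out);  // stays on the wasb filesystem
    }
}
```

In Hive itself the equivalent operation is `Path.makeQualified` against the right `FileSystem`; the point here is only that the qualification must use the warehouse/scratch filesystem of the data, not the cluster default.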
[jira] [Updated] (HIVE-9163) create temporary table will fail with wasb storage because MoveTask.moveFile tries to move data to hdfs dir instead of wasb dir
[ https://issues.apache.org/jira/browse/HIVE-9163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-9163: Status: Patch Available (was: Open) create temporary table will fail with wasb storage because MoveTask.moveFile tries to move data to hdfs dir instead of wasb dir --- Key: HIVE-9163 URL: https://issues.apache.org/jira/browse/HIVE-9163 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-9163.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)