[jira] [Commented] (HIVE-8192) Check DDL's writetype in DummyTxnManager
[ https://issues.apache.org/jira/browse/HIVE-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229612#comment-14229612 ] Hive QA commented on HIVE-8192: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12684356/HIVE-8192.5.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6696 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1938/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1938/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1938/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12684356 - PreCommit-HIVE-TRUNK-Build Check DDL's writetype in DummyTxnManager Key: HIVE-8192 URL: https://issues.apache.org/jira/browse/HIVE-8192 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.13.0, 0.13.1 Environment: hive0.13.1 Reporter: Wan Chang Priority: Minor Labels: patch Attachments: HIVE-8192.2.patch, HIVE-8192.3.patch, HIVE-8192.4.patch, HIVE-8192.5.patch The patch for HIVE-6734 added some DDL write types and checks the DDL write type in DbTxnManager.java.
We use DummyTxnManager as the default value of hive.txn.manager in hive-site.xml. We noticed that the CREATE TEMPORARY FUNCTION operation has a DDL_NO_LOCK write type but still requires an EXCLUSIVE lock. If we try to create a temporary function while a SELECT is running against the same database, the console prints 'conflicting lock present for default mode EXCLUSIVE' and the CREATE TEMPORARY FUNCTION operation won't get the lock until the SELECT is done. Maybe it's a good idea to check the DDL's writetype in DummyTxnManager too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
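[Editorial note] A standalone sketch of the proposed behavior may help. The enum names below mirror Hive's WriteEntity.WriteType values mentioned in the issue, but this is an illustration of the idea (map each DDL write type to the lock it actually needs, instead of defaulting every DDL statement to EXCLUSIVE), not DummyTxnManager's real code:

```java
// Sketch: choose a lock mode from the DDL write type rather than
// defaulting all DDL to EXCLUSIVE. Enum names mirror Hive's
// WriteEntity.WriteType, but this file is self-contained and hypothetical.
enum WriteType { DDL_NO_LOCK, DDL_SHARED, DDL_EXCLUSIVE }
enum LockMode { NONE, SHARED, EXCLUSIVE }

public class WriteTypeCheckSketch {
    static LockMode lockFor(WriteType writeType) {
        switch (writeType) {
            case DDL_NO_LOCK: return LockMode.NONE;      // e.g. CREATE TEMPORARY FUNCTION
            case DDL_SHARED:  return LockMode.SHARED;
            default:          return LockMode.EXCLUSIVE; // conservative fallback
        }
    }

    public static void main(String[] args) {
        // CREATE TEMPORARY FUNCTION carries DDL_NO_LOCK, so it should not
        // wait behind a concurrent SELECT on the same database.
        System.out.println(lockFor(WriteType.DDL_NO_LOCK)); // NONE
    }
}
```

With such a mapping, the conflicting-lock wait described above would not occur for DDL_NO_LOCK operations.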
[jira] [Created] (HIVE-8999) hiveserver2 CUSTOM authentication Fails
Amithsha created HIVE-8999: -- Summary: hiveserver2 CUSTOM authentication Fails Key: HIVE-8999 URL: https://issues.apache.org/jira/browse/HIVE-8999 Project: Hive Issue Type: Bug Components: Beeline, HiveServer2 Affects Versions: 0.14.0 Environment: Centos 6.5 Hadoop 2.4.1 Hive 0.14.0 Reporter: Amithsha We planned to secure HiveServer2 using the CUSTOM authentication method, but when Beeline starts and connects to the server IP and port, it hangs in the terminal after the username and password are provided.
*Procedure followed*
*Compiled the following Java file and packaged it into a jar:
{code}
import java.util.Hashtable;
import javax.security.sasl.AuthenticationException;
import org.apache.hive.service.auth.PasswdAuthenticationProvider;

public class SampleAuthenticator implements PasswdAuthenticationProvider {

  Hashtable<String, String> store = null;

  public SampleAuthenticator() {
    store = new Hashtable<String, String>();
    store.put("user1", "passwd1");
    store.put("user2", "passwd2");
  }

  @Override
  public void Authenticate(String user, String password) throws AuthenticationException {
    String storedPasswd = store.get(user);
    if (storedPasswd != null && storedPasswd.equals(password))
      return;
    throw new AuthenticationException("SampleAuthenticator: Error validating user");
  }
}
{code}
*Properties used in hive-site.xml:
{code}
<property>
  <name>hive.server2.authentication</name>
  <value>CUSTOM</value>
</property>
<property>
  <name>hive.server2.custom.authentication.class</name>
  <value>org.apache.hive.service.auth.PasswdAuthenticationProvider.SampleAuth</value>
</property>
{code}
*Started Beeline:
beeline
!connect jdbc:hive2://localhost:1/default
scan complete in 13ms
Connecting to jdbc:hive2://localhost:1/default
Enter username for jdbc:hive2://localhost:1/default: user1
Enter password for jdbc:hive2://localhost:1/default: ***
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apache-hive/lib/hive-jdbc-0.14.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
-- Can anyone help by providing a correct Java file and the procedure to use custom authentication? Thank you Amithsha.S -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-860) Persistent distributed cache
[ https://issues.apache.org/jira/browse/HIVE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-860: -- Attachment: HIVE-860.4.patch Persistent distributed cache Key: HIVE-860 URL: https://issues.apache.org/jira/browse/HIVE-860 Project: Hive Issue Type: Improvement Affects Versions: 0.12.0 Reporter: Zheng Shao Assignee: Ferdinand Xu Fix For: 0.15.0 Attachments: HIVE-860.1.patch, HIVE-860.2.patch, HIVE-860.2.patch, HIVE-860.3.patch, HIVE-860.4.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch DistributedCache is shared across multiple jobs if the HDFS file name is the same. We need to make sure Hive puts the same file into the same location every time and does not overwrite it if the file content is the same. We can achieve two different results: A1. Files added with the same name, timestamp, and md5 in the same session will have a single copy in distributed cache. A2. Files added with the same name, timestamp, and md5 will have a single copy in distributed cache. A2 has a bigger benefit in sharing but may raise a question of when Hive should clean it up in HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9000) LAST_VALUE Window function returns wrong results
Mark Grover created HIVE-9000: - Summary: LAST_VALUE Window function returns wrong results Key: HIVE-9000 URL: https://issues.apache.org/jira/browse/HIVE-9000 Project: Hive Issue Type: Bug Components: PTF-Windowing Affects Versions: 0.13.1 Reporter: Mark Grover Priority: Critical Fix For: 0.14.1 The LAST_VALUE windowing function has been returning bad results, as far as I can tell, from day 1. And it seems like the tests are also asserting that LAST_VALUE gives the wrong result. Here's the test output: https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587 The query is:
{code}
select t, s, i, last_value(i) over (partition by t order by s)
{code}
The result is:
{code}
t   s             i      last_value(i)
--  ------------  -----  -------------
10  oscar allen   65662  65662
10  oscar carson  65549  65549
{code}
LAST_VALUE(i) should have returned 65549 in both records; instead it simply ends up returning i. Another way you can see that LAST_VALUE is wrong is to verify its result against LEAD(i,1) over (partition by t order by s). LAST_VALUE, being the last value, should always be at least as far along (in terms of the specified 'order by s') as the lead by 1. While this doesn't directly apply to the above query, if the result set had more rows, you would clearly see records where lead is higher than last_value, which is semantically incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9000) LAST_VALUE Window function returns wrong results
[ https://issues.apache.org/jira/browse/HIVE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Grover updated HIVE-9000: -- Description: The LAST_VALUE windowing function has been returning bad results, as far as I can tell, from day 1. And it seems like the tests are also asserting that LAST_VALUE gives the wrong result. Here's the test output: https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587 The query is:
{code}
select t, s, i, last_value(i) over (partition by t order by s)
{code}
The result is:
{code}
t   s             i      last_value(i)
--  ------------  -----  -------------
10  oscar allen   65662  65662
10  oscar carson  65549  65549
{code}
{{LAST_VALUE( i )}} should have returned 65549 in both records; instead it simply ends up returning i. Another way you can see that LAST_VALUE is wrong is to verify its result against LEAD(i,1) over (partition by t order by s). LAST_VALUE, being the last value, should always be at least as far along (in terms of the specified 'order by s') as the lead by 1. While this doesn't directly apply to the above query, if the result set had more rows, you would clearly see records where lead is higher than last_value, which is semantically incorrect.
was: The LAST_VALUE windowing function has been returning bad results, as far as I can tell, from day 1. And it seems like the tests are also asserting that LAST_VALUE gives the wrong result. Here's the test output: https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587 The query is:
{code}
select t, s, i, last_value(i) over (partition by t order by s)
{code}
The result is:
{code}
t   s             i      last_value(i)
--  ------------  -----  -------------
10  oscar allen   65662  65662
10  oscar carson  65549  65549
{code}
LAST_VALUE(i) should have returned 65549 in both records; instead it simply ends up returning i. Another way you can see that LAST_VALUE is wrong is to verify its result against LEAD(i,1) over (partition by t order by s). LAST_VALUE, being the last value, should always be at least as far along (in terms of the specified 'order by s') as the lead by 1. While this doesn't directly apply to the above query, if the result set had more rows, you would clearly see records where lead is higher than last_value, which is semantically incorrect.
LAST_VALUE Window function returns wrong results Key: HIVE-9000 URL: https://issues.apache.org/jira/browse/HIVE-9000 Project: Hive Issue Type: Bug Components: PTF-Windowing Affects Versions: 0.13.1 Reporter: Mark Grover Priority: Critical Fix For: 0.14.1 The LAST_VALUE windowing function has been returning bad results, as far as I can tell, from day 1. And it seems like the tests are also asserting that LAST_VALUE gives the wrong result. Here's the test output: https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587 The query is:
{code}
select t, s, i, last_value(i) over (partition by t order by s)
{code}
The result is:
{code}
t   s             i      last_value(i)
--  ------------  -----  -------------
10  oscar allen   65662  65662
10  oscar carson  65549  65549
{code}
{{LAST_VALUE( i )}} should have returned 65549 in both records; instead it simply ends up returning i. Another way you can see that LAST_VALUE is wrong is to verify its result against LEAD(i,1) over (partition by t order by s). LAST_VALUE, being the last value, should always be at least as far along (in terms of the specified 'order by s') as the lead by 1. While this doesn't directly apply to the above query, if the result set had more rows, you would clearly see records where lead is higher than last_value, which is semantically incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
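[Editorial note] One plausible explanation for the reported output, not stated in the JIRA itself: under the SQL standard, a window with an ORDER BY but no explicit frame defaults to a frame ending at the current row, in which case last_value(i) degenerates to the current row's i. The standalone Java sketch below contrasts the two frames; it is an illustration of the semantics under that assumption, not Hive code:

```java
import java.util.List;

// Simulates last_value(i) over one partition ordered by s, under two frames:
// the SQL default frame (UNBOUNDED PRECEDING .. CURRENT ROW) and the full
// partition (UNBOUNDED PRECEDING .. UNBOUNDED FOLLOWING). Illustrative only.
public class LastValueFrames {
    // Default frame: the last visible value is the current row itself.
    static int lastValueDefaultFrame(List<Integer> ordered, int rowIdx) {
        return ordered.get(rowIdx);
    }

    // Full-partition frame: the last value in the whole partition.
    static int lastValueFullPartition(List<Integer> ordered, int rowIdx) {
        return ordered.get(ordered.size() - 1);
    }

    public static void main(String[] args) {
        // The i values of partition t=10 ordered by s, as in the q.out excerpt.
        List<Integer> i = List.of(65662, 65549);
        System.out.println(lastValueDefaultFrame(i, 0));  // 65662, matching Hive's output
        System.out.println(lastValueFullPartition(i, 0)); // 65549, what the reporter expects
    }
}
```

If Hive is applying the default frame, the output is standard-conformant but surprising; the fix the reporter wants corresponds to the full-partition frame.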
[jira] [Commented] (HIVE-8135) Pool zookeeper connections
[ https://issues.apache.org/jira/browse/HIVE-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229993#comment-14229993 ] Brock Noland commented on HIVE-8135: Curator would be fantastic to use. Do they have a connection pooling recipe or a read-write lock? Pool zookeeper connections -- Key: HIVE-8135 URL: https://issues.apache.org/jira/browse/HIVE-8135 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Today we create a ZK connection per client. We should instead have a connection pool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8998) Logging is not configured in spark-submit sub-process
[ https://issues.apache.org/jira/browse/HIVE-8998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8998: --- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Logging is not configured in spark-submit sub-process - Key: HIVE-8998 URL: https://issues.apache.org/jira/browse/HIVE-8998 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Brock Noland Fix For: spark-branch Attachments: HIVE-8998.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 28557: HIVE-8851 Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28557/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-8851 https://issues.apache.org/jira/browse/HIVE-8851 Repository: hive-git Description --- Deploy small files with Spark job, instead of loading them from HDFS. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java cfc1501 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 2fe33f5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClient.java a456d6c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java faa91e3 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java f46c1b4 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 30b7632 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 0bd18e0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSession.java 461f359 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java c95d868 Diff: https://reviews.apache.org/r/28557/diff/ Testing --- Unit tests. Thanks, Jimmy Xiang
[jira] [Updated] (HIVE-8851) Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-8851: -- Attachment: HIVE-8851.1-spark.patch Attached patch v1. It is also on RB: https://reviews.apache.org/r/28557/ Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch] --- Key: HIVE-8851 URL: https://issues.apache.org/jira/browse/HIVE-8851 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-8851.1-spark.patch Currently, files generated by SparkHashTableSinkOperator for small tables are written directly to HDFS with a high replication factor. When a map join happens, the map join operator loads these files into hash tables. Since multiple partitions can be processed on the same worker node, reading the same set of files multiple times is not ideal. The improvement can be made by calling SparkContext.addFile() on these files and using SparkFiles.get() to download them to the worker node just once. Please note that SparkFiles.get() is a static method, so code invoking it needs to be in a static method, and that calling method needs to be synchronized because it may get called from different threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
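[Editorial note] The "download once per worker, from a synchronized static method" pattern described above can be sketched in plain Java. In the real patch the download step would be SparkFiles.get(); here it is stubbed out so the sketch is self-contained, and the class and path names are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the pattern from the issue description: a static, synchronized
// helper that resolves a small-table file to a local path, downloading it
// at most once per worker even when several task threads race on it.
public class SmallTableFiles {
    private static final Map<String, String> localPaths = new ConcurrentHashMap<>();

    // Count of stubbed downloads, for demonstration only.
    static int downloads = 0;

    // Synchronized because, as the description notes, it may be called
    // from several task threads on the same worker at once.
    public static synchronized String getLocalFile(String name) {
        return localPaths.computeIfAbsent(name, SmallTableFiles::download);
    }

    // Stub standing in for SparkFiles.get(name).
    private static String download(String name) {
        downloads++;
        return "/tmp/spark-files/" + name; // hypothetical local path
    }

    public static void main(String[] args) {
        getLocalFile("smalltable.hashtable");
        getLocalFile("smalltable.hashtable"); // second call hits the cache
        System.out.println(downloads); // 1
    }
}
```

The real implementation would also need the file to have been registered on the driver via SparkContext.addFile() before tasks call the getter.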
[jira] [Updated] (HIVE-8851) Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-8851: -- Fix Version/s: spark-branch Status: Patch Available (was: Open) Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch] --- Key: HIVE-8851 URL: https://issues.apache.org/jira/browse/HIVE-8851 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-8851.1-spark.patch Currently, files generated by SparkHashTableSinkOperator for small tables are written directly to HDFS with a high replication factor. When a map join happens, the map join operator loads these files into hash tables. Since multiple partitions can be processed on the same worker node, reading the same set of files multiple times is not ideal. The improvement can be made by calling SparkContext.addFile() on these files and using SparkFiles.get() to download them to the worker node just once. Please note that SparkFiles.get() is a static method, so code invoking it needs to be in a static method, and that calling method needs to be synchronized because it may get called from different threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8875) hive.optimize.sort.dynamic.partition should be turned off for ACID
[ https://issues.apache.org/jira/browse/HIVE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230134#comment-14230134 ] Alan Gates commented on HIVE-8875: -- Already added. Search on hive.optimize.sort.dynamic.partition in https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML hive.optimize.sort.dynamic.partition should be turned off for ACID -- Key: HIVE-8875 URL: https://issues.apache.org/jira/browse/HIVE-8875 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.15.0 Attachments: HIVE-8875.2.patch, HIVE-8875.patch Turning this on causes ACID insert, updates, and deletes to produce non-optimal plans with extra reduce phases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230169#comment-14230169 ] Chao commented on HIVE-8970: I just regenerated all qfile outputs and noticed some result differences for join38.q, join_literals.q, join_nullsafe.q and subquery_in.q. Looks like they are not related to the map-join work, since if I reset my branch to the HIVE-8946 commit the results are correct. Maybe it's caused by the recent merge or the RSC commits. Enable map join optimization only when hive.auto.convert.join is true [Spark Branch] Key: HIVE-8970 URL: https://issues.apache.org/jira/browse/HIVE-8970 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8970.1-spark.patch, HIVE-8970.2-spark.patch Right now, in the Spark branch we enable MJ without looking at this configuration. The related code in {{SparkMapJoinOptimizer}} is commented out. We should only enable MJ when the flag is true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8946) Enable Map Join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230173#comment-14230173 ] Chao commented on HIVE-8946: Also, for join38.q I ran the query on CLI local mode with spark.master=local[4], and the result was correct, but running unit test gave different result. Enable Map Join [Spark Branch] -- Key: HIVE-8946 URL: https://issues.apache.org/jira/browse/HIVE-8946 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8946.1-spark.patch, HIVE-8946.2-spark.patch, HIVE-8946.3-spark.patch Since all the related issues have been identified and tracked by related JIRAs, in this JIRA we turn on the map join optimization for Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8946) Enable Map Join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230175#comment-14230175 ] Chao commented on HIVE-8946: Sorry for the mistake - the above comment should belong to HIVE-8970. Enable Map Join [Spark Branch] -- Key: HIVE-8946 URL: https://issues.apache.org/jira/browse/HIVE-8946 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8946.1-spark.patch, HIVE-8946.2-spark.patch, HIVE-8946.3-spark.patch Since all the related issues have been identified and tracked by related JIRAs, in this JIRA we turn on the map join optimization for Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8774) CBO: enable groupBy index
[ https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-8774: -- Attachment: HIVE-8774.10.patch CBO: enable groupBy index - Key: HIVE-8774 URL: https://issues.apache.org/jira/browse/HIVE-8774 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we are trying to make it use the groupby index that we built. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8774) CBO: enable groupBy index
[ https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-8774: -- Attachment: (was: HIVE-8774.9.1.patch) CBO: enable groupBy index - Key: HIVE-8774 URL: https://issues.apache.org/jira/browse/HIVE-8774 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we are trying to make it use the groupby index that we built. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8774) CBO: enable groupBy index
[ https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-8774: -- Status: Open (was: Patch Available) CBO: enable groupBy index - Key: HIVE-8774 URL: https://issues.apache.org/jira/browse/HIVE-8774 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we are trying to make it use the groupby index that we built. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8774) CBO: enable groupBy index
[ https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-8774: -- Status: Patch Available (was: Open) CBO: enable groupBy index - Key: HIVE-8774 URL: https://issues.apache.org/jira/browse/HIVE-8774 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we are trying to make it use the groupby index that we built. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8991) Fix custom_input_output_format [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230227#comment-14230227 ] Marcelo Vanzin commented on HIVE-8991: -- Hi [~lirui], the patch looks good if it unblocks the unit tests. I have to think a bit about whether it would work in a real deployment scenario, since IIRC hive-exec shades a lot of dependencies and it might cause problems with Spark. But the main one (Guava) should be solved in Spark, so hopefully there won't be other cases like that. Fix custom_input_output_format [Spark Branch] - Key: HIVE-8991 URL: https://issues.apache.org/jira/browse/HIVE-8991 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8991.1-spark.patch After HIVE-8836, {{custom_input_output_format}} fails because of missing hive-it-util in remote driver's class path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 27713: CBO: enable groupBy index
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27713/ --- (Updated Dec. 1, 2014, 6:57 p.m.) Review request for hive and John Pullokkaran. Repository: hive-git Description --- Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we are trying to make it use the groupby index that we built. The basic problem is that for SEL1-SEL2-GRY-...-SEL3, the previous version only modified SEL2, which immediately precedes GRY. Now, with CBO, we have lots of SELs, e.g., SEL1. So the solution is to modify all of them. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 9ffa708 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java 02216de ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 0f06ec9 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java 74614f3 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java d699308 ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_1.q PRE-CREATION ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_2.q PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out fdc1dc6 ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_1.q.out PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/27713/diff/ Testing --- Thanks, pengcheng xiong
[jira] [Commented] (HIVE-8995) Find thread leak in RSC Tests
[ https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230254#comment-14230254 ] Marcelo Vanzin commented on HIVE-8995: -- The three threads are from akka; I wonder if the test code is failing to properly shut down clients or the library itself (i.e. call {{SparkClientFactory.stop()}}). Find thread leak in RSC Tests - Key: HIVE-8995 URL: https://issues.apache.org/jira/browse/HIVE-8995 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland I was regenerating output as part of the merge: {noformat} mvn test -Dtest=TestSparkCliDriver -Phadoop-2 -Dtest.output.overwrite=true -Dqfile=annotate_stats_join.q,auto_join0.q,auto_join1.q,auto_join10.q,auto_join11.q,auto_join12.q,auto_join13.q,auto_join14.q,auto_join15.q,auto_join16.q,auto_join17.q,auto_join18.q,auto_join18_multi_distinct.q,auto_join19.q,auto_join2.q,auto_join20.q,auto_join21.q,auto_join22.q,auto_join23.q,auto_join24.q,auto_join26.q,auto_join27.q,auto_join28.q,auto_join29.q,auto_join3.q,auto_join30.q,auto_join31.q,auto_join32.q,auto_join9.q,auto_join_reordering_values.q auto_join_without_localtask.q,auto_smb_mapjoin_14.q,auto_sortmerge_join_1.q,auto_sortmerge_join_10.q,auto_sortmerge_join_11.q,auto_sortmerge_join_12.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,auto_sortmerge_join_2.q,auto_sortmerge_join_3.q,auto_sortmerge_join_4.q,auto_sortmerge_join_5.q,auto_sortmerge_join_6.q,auto_sortmerge_join_7.q,auto_sortmerge_join_8.q,auto_sortmerge_join_9.q,bucket_map_join_1.q,bucket_map_join_2.q,bucket_map_join_tez1.q,bucket_map_join_tez2.q,bucketmapjoin1.q,bucketmapjoin10.q,bucketmapjoin11.q,bucketmapjoin12.q,bucketmapjoin13.q,bucketmapjoin2.q,bucketmapjoin3.q,bucketmapjoin4.q,bucketmapjoin5.q,bucketmapjoin7.q 
bucketmapjoin8.q,bucketmapjoin9.q,bucketmapjoin_negative.q,bucketmapjoin_negative2.q,bucketmapjoin_negative3.q,column_access_stats.q,cross_join.q,ctas.q,custom_input_output_format.q,groupby4.q,groupby7_noskew_multi_single_reducer.q,groupby_complex_types.q,groupby_complex_types_multi_single_reducer.q,groupby_multi_single_reducer2.q,groupby_multi_single_reducer3.q,groupby_position.q,groupby_sort_1_23.q,groupby_sort_skew_1_23.q,having.q,index_auto_self_join.q,infer_bucket_sort_convert_join.q,innerjoin.q,input12.q,join0.q,join1.q,join11.q,join12.q,join13.q,join14.q,join15.q join17.q,join18.q,join18_multi_distinct.q,join19.q,join2.q,join20.q,join21.q,join22.q,join23.q,join25.q,join26.q,join27.q,join28.q,join29.q,join3.q,join30.q,join31.q,join32.q,join32_lessSize.q,join33.q,join35.q,join36.q,join37.q,join38.q,join39.q,join40.q,join41.q,join9.q,join_alt_syntax.q,join_cond_pushdown_1.q join_cond_pushdown_2.q,join_cond_pushdown_3.q,join_cond_pushdown_4.q,join_cond_pushdown_unqual1.q,join_cond_pushdown_unqual2.q,join_cond_pushdown_unqual3.q,join_cond_pushdown_unqual4.q,join_filters_overlap.q,join_hive_626.q,join_map_ppr.q,join_merge_multi_expressions.q,join_merging.q,join_nullsafe.q,join_rc.q,join_reorder.q,join_reorder2.q,join_reorder3.q,join_reorder4.q,join_star.q,join_thrift.q,join_vc.q,join_view.q,limit_pushdown.q,load_dyn_part13.q,load_dyn_part14.q,louter_join_ppr.q,mapjoin1.q,mapjoin_decimal.q,mapjoin_distinct.q,mapjoin_filter_on_outerjoin.q mapjoin_hook.q,mapjoin_mapjoin.q,mapjoin_memcheck.q,mapjoin_subquery.q,mapjoin_subquery2.q,mapjoin_test_outer.q,mergejoins.q,mergejoins_mixed.q,multi_insert.q,multi_insert_gby.q,multi_insert_gby2.q,multi_insert_gby3.q,multi_insert_lateral_view.q,multi_insert_mixed.q,multi_insert_move_tasks_share_dependencies.q,multi_join_union.q,optimize_nullscan.q,outer_join_ppr.q,parallel.q,parallel_join0.q,parallel_join1.q,parquet_join.q,pcr.q,ppd_gby_join.q,ppd_join.q,ppd_join2.q,ppd_join3.q,ppd_join4.q,ppd_join5.q,ppd_join_filter.q 
ppd_multi_insert.q,ppd_outer_join1.q,ppd_outer_join2.q,ppd_outer_join3.q,ppd_outer_join4.q,ppd_outer_join5.q,ppd_transform.q,reduce_deduplicate_exclude_join.q,router_join_ppr.q,sample10.q,sample8.q,script_pipe.q,semijoin.q,skewjoin.q,skewjoin_noskew.q,skewjoin_union_remove_1.q,skewjoin_union_remove_2.q,skewjoinopt1.q,skewjoinopt10.q,skewjoinopt11.q,skewjoinopt12.q,skewjoinopt13.q,skewjoinopt14.q,skewjoinopt15.q,skewjoinopt16.q,skewjoinopt17.q,skewjoinopt18.q,skewjoinopt19.q,skewjoinopt2.q,skewjoinopt20.q
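[Editorial note] The leak-hunting approach implied above (three stray akka threads after the tests) can be sketched with a plain-Java thread-snapshot diff, the kind of check a test harness can run in teardown to flag clients that were never stopped (e.g. a missing SparkClientFactory.stop() call). The class and thread names here are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: snapshot live threads before a test, diff afterwards, and report
// any still-alive threads that appeared in between as suspected leaks.
public class ThreadLeakCheck {
    static Set<Thread> snapshot() {
        return new HashSet<>(Thread.getAllStackTraces().keySet());
    }

    static Set<Thread> leaked(Set<Thread> before) {
        Set<Thread> after = snapshot();
        after.removeAll(before);          // keep only threads started since the snapshot
        after.removeIf(t -> !t.isAlive()); // a finished thread is not a leak
        return after;
    }

    public static void main(String[] args) throws Exception {
        Set<Thread> before = snapshot();
        Thread worker = new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
        }, "leaky-worker");
        worker.start();
        // This "test" never joined/stopped its worker, so the diff flags it.
        for (Thread t : leaked(before)) {
            System.out.println(t.getName()); // includes leaky-worker
        }
        worker.join();
    }
}
```

Running such a diff after the RSC tests would show whether the three akka threads belong to an unstopped client or to the library itself.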
[jira] [Issue Comment Deleted] (HIVE-8946) Enable Map Join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8946: -- Comment: was deleted (was: Also, for join38.q I ran the query on CLI local mode with spark.master=local[4], and the result was correct, but running unit test gave different result.) Enable Map Join [Spark Branch] -- Key: HIVE-8946 URL: https://issues.apache.org/jira/browse/HIVE-8946 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8946.1-spark.patch, HIVE-8946.2-spark.patch, HIVE-8946.3-spark.patch Since all the related issues have been identified and tracked by related JIRAs, in this JIRA we turn on the map join optimization for Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HIVE-8946) Enable Map Join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8946: -- Comment: was deleted (was: Sorry for the mistake - the above comment should belong to HIVE-8970.) Enable Map Join [Spark Branch] -- Key: HIVE-8946 URL: https://issues.apache.org/jira/browse/HIVE-8946 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8946.1-spark.patch, HIVE-8946.2-spark.patch, HIVE-8946.3-spark.patch Since all the related issues have been identified and tracked by related JIRAs, in this JIRA we turn on the map join optimization for Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8851) Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230262#comment-14230262 ] Hive QA commented on HIVE-8851: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/1268/HIVE-8851.1-spark.patch {color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 7229 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join30 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join31 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_10 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_custom_input_output_format org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_position org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join29 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join31 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_empty org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_leftsemijoin_mr org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join_filter org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ptf org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_table_access_keys_stats 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_tez_join_tests org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_tez_joins_explain org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_ptf org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/468/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/468/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-468/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 24 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 1268 - PreCommit-HIVE-SPARK-Build Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch] --- Key: HIVE-8851 URL: https://issues.apache.org/jira/browse/HIVE-8851 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-8851.1-spark.patch Currently, files generated by SparkHashTableSinkOperator for small tables are written directly to HDFS with a high replication factor. When a map join happens, the map join operator loads these files into hash tables. Since multiple partitions can be processed on the same worker node, reading the same set of files multiple times is not ideal. The improvement can be made by calling SparkContext.addFile() on these files and using SparkFiles.get() to download them to the worker node just once. Please note that SparkFiles.get() is a static method. Code invoking this method needs to be in a static method.
The calling method needs to be synchronized because it may be called from different threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
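The static/synchronized requirement described in HIVE-8851 above can be sketched as follows. This is a minimal, hypothetical sketch, not Hive's actual implementation: `sparkFilesGet()` is a stand-in for the static `org.apache.spark.SparkFiles.get(fileName)` so the sketch stays self-contained, and the class and method names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a synchronized static helper that resolves each
// small-table file on a worker node only once, even when several task
// threads request it concurrently.
public class SmallTableFileCache {
    // fileName -> local path already resolved on this worker
    private static final Map<String, String> localPaths = new HashMap<>();

    // Stand-in for SparkFiles.get(fileName), which is a static Spark method.
    private static String sparkFilesGet(String fileName) {
        return "/tmp/spark-files/" + fileName;
    }

    // Static (because the stubbed Spark call is static) and synchronized
    // (because multiple task threads on one worker may ask for the same file).
    public static synchronized String getLocalPath(String fileName) {
        return localPaths.computeIfAbsent(fileName, SmallTableFileCache::sparkFilesGet);
    }
}
```

The map keeps the download from being repeated; the `synchronized` keyword serializes the threads the description warns about.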
[jira] [Commented] (HIVE-8995) Find thread leak in RSC Tests
[ https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230264#comment-14230264 ] Brock Noland commented on HIVE-8995: We are not. However that appears to be a JVM wide impact and we had dozens of instances of the three akka threads and we won't be able to call {{SparkClientFactory.stop()}} after each session terminates when running inside HS2 since HS2 will have dozens of sessions concurrently and thousands of sessions over a few weeks. Find thread leak in RSC Tests - Key: HIVE-8995 URL: https://issues.apache.org/jira/browse/HIVE-8995 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland I was regenerating output as part of the merge: {noformat} mvn test -Dtest=TestSparkCliDriver -Phadoop-2 -Dtest.output.overwrite=true -Dqfile=annotate_stats_join.q,auto_join0.q,auto_join1.q,auto_join10.q,auto_join11.q,auto_join12.q,auto_join13.q,auto_join14.q,auto_join15.q,auto_join16.q,auto_join17.q,auto_join18.q,auto_join18_multi_distinct.q,auto_join19.q,auto_join2.q,auto_join20.q,auto_join21.q,auto_join22.q,auto_join23.q,auto_join24.q,auto_join26.q,auto_join27.q,auto_join28.q,auto_join29.q,auto_join3.q,auto_join30.q,auto_join31.q,auto_join32.q,auto_join9.q,auto_join_reordering_values.q auto_join_without_localtask.q,auto_smb_mapjoin_14.q,auto_sortmerge_join_1.q,auto_sortmerge_join_10.q,auto_sortmerge_join_11.q,auto_sortmerge_join_12.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,auto_sortmerge_join_2.q,auto_sortmerge_join_3.q,auto_sortmerge_join_4.q,auto_sortmerge_join_5.q,auto_sortmerge_join_6.q,auto_sortmerge_join_7.q,auto_sortmerge_join_8.q,auto_sortmerge_join_9.q,bucket_map_join_1.q,bucket_map_join_2.q,bucket_map_join_tez1.q,bucket_map_join_tez2.q,bucketmapjoin1.q,bucketmapjoin10.q,bucketmapjoin11.q,bucketmapjoin12.q,bucketmapjoin13.q,bucketmapjoin2.q,bucketmapjoin3.q,bucketmapjoin4.q,bucketmapjoin5.q,bucketmapjoin7.q 
bucketmapjoin8.q,bucketmapjoin9.q,bucketmapjoin_negative.q,bucketmapjoin_negative2.q,bucketmapjoin_negative3.q,column_access_stats.q,cross_join.q,ctas.q,custom_input_output_format.q,groupby4.q,groupby7_noskew_multi_single_reducer.q,groupby_complex_types.q,groupby_complex_types_multi_single_reducer.q,groupby_multi_single_reducer2.q,groupby_multi_single_reducer3.q,groupby_position.q,groupby_sort_1_23.q,groupby_sort_skew_1_23.q,having.q,index_auto_self_join.q,infer_bucket_sort_convert_join.q,innerjoin.q,input12.q,join0.q,join1.q,join11.q,join12.q,join13.q,join14.q,join15.q join17.q,join18.q,join18_multi_distinct.q,join19.q,join2.q,join20.q,join21.q,join22.q,join23.q,join25.q,join26.q,join27.q,join28.q,join29.q,join3.q,join30.q,join31.q,join32.q,join32_lessSize.q,join33.q,join35.q,join36.q,join37.q,join38.q,join39.q,join40.q,join41.q,join9.q,join_alt_syntax.q,join_cond_pushdown_1.q join_cond_pushdown_2.q,join_cond_pushdown_3.q,join_cond_pushdown_4.q,join_cond_pushdown_unqual1.q,join_cond_pushdown_unqual2.q,join_cond_pushdown_unqual3.q,join_cond_pushdown_unqual4.q,join_filters_overlap.q,join_hive_626.q,join_map_ppr.q,join_merge_multi_expressions.q,join_merging.q,join_nullsafe.q,join_rc.q,join_reorder.q,join_reorder2.q,join_reorder3.q,join_reorder4.q,join_star.q,join_thrift.q,join_vc.q,join_view.q,limit_pushdown.q,load_dyn_part13.q,load_dyn_part14.q,louter_join_ppr.q,mapjoin1.q,mapjoin_decimal.q,mapjoin_distinct.q,mapjoin_filter_on_outerjoin.q mapjoin_hook.q,mapjoin_mapjoin.q,mapjoin_memcheck.q,mapjoin_subquery.q,mapjoin_subquery2.q,mapjoin_test_outer.q,mergejoins.q,mergejoins_mixed.q,multi_insert.q,multi_insert_gby.q,multi_insert_gby2.q,multi_insert_gby3.q,multi_insert_lateral_view.q,multi_insert_mixed.q,multi_insert_move_tasks_share_dependencies.q,multi_join_union.q,optimize_nullscan.q,outer_join_ppr.q,parallel.q,parallel_join0.q,parallel_join1.q,parquet_join.q,pcr.q,ppd_gby_join.q,ppd_join.q,ppd_join2.q,ppd_join3.q,ppd_join4.q,ppd_join5.q,ppd_join_filter.q 
ppd_multi_insert.q,ppd_outer_join1.q,ppd_outer_join2.q,ppd_outer_join3.q,ppd_outer_join4.q,ppd_outer_join5.q,ppd_transform.q,reduce_deduplicate_exclude_join.q,router_join_ppr.q,sample10.q,sample8.q,script_pipe.q,semijoin.q,skewjoin.q,skewjoin_noskew.q,skewjoin_union_remove_1.q,skewjoin_union_remove_2.q,skewjoinopt1.q,skewjoinopt10.q,skewjoinopt11.q,skewjoinopt12.q,skewjoinopt13.q,skewjoinopt14.q,skewjoinopt15.q,skewjoinopt16.q,skewjoinopt17.q,skewjoinopt18.q,skewjoinopt19.q,skewjoinopt2.q,skewjoinopt20.q
[jira] [Commented] (HIVE-8995) Find thread leak in RSC Tests
[ https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230273#comment-14230273 ] Marcelo Vanzin commented on HIVE-8995: -- You don't need to call that method for every session. The pattern here is: * Call {{SparkClientFactory.initialize()}} once * Create / use as many clients as you want * When app shuts down, call {{SparkClientFactory.stop()}} So this should work nicely for HS2 (call initialize during bring up, call stop during shut down). I see {{RemoteHiveSparkClient}} calls initialize; that seems wrong, if my understanding of that class is correct (that it will be instantiated once for each session). Another option is to make {{initialize}} idempotent; right now it will just leak the old akka actor system, which is bad. This should be a trivial change (just add a check for {{initialized}}). Find thread leak in RSC Tests - Key: HIVE-8995 URL: https://issues.apache.org/jira/browse/HIVE-8995 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland
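The "idempotent initialize" suggestion above can be sketched as follows. This is a hedged, self-contained sketch, not the real `SparkClientFactory`: the names are hypothetical, and a plain counter stands in for the akka actor system the factory actually manages.

```java
import java.util.Map;

// Hypothetical sketch of the lifecycle described above: initialize once,
// use many clients, stop once at app shutdown - with the guard that makes
// a second initialize() a no-op instead of leaking another actor system.
class ClientFactory {
    private static boolean initialized = false;
    private static int actorSystemsStarted = 0;

    // Second and later calls (e.g. one per HS2 session) become no-ops.
    static synchronized void initialize(Map<String, String> conf) {
        if (initialized) {
            return;
        }
        actorSystemsStarted++; // stand-in for starting the actor system
        initialized = true;
    }

    // Called once when the app (e.g. HiveServer2) shuts down.
    static synchronized void stop() {
        if (!initialized) {
            return;
        }
        actorSystemsStarted--; // stand-in for tearing the actor system down
        initialized = false;
    }

    static synchronized int started() {
        return actorSystemsStarted;
    }
}
```

With this guard, a per-session caller such as `RemoteHiveSparkClient` could invoke `initialize` safely, though the cleaner fix remains calling it only once at bring-up.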
[jira] [Commented] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230272#comment-14230272 ] Chao commented on HIVE-8970: Also, for join38.q I ran the query on the CLI in local mode with spark.master=local[4], and the result was correct, but running the unit test gave a different result. Enable map join optimization only when hive.auto.convert.join is true [Spark Branch] Key: HIVE-8970 URL: https://issues.apache.org/jira/browse/HIVE-8970 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8970.1-spark.patch, HIVE-8970.2-spark.patch Right now, in the Spark branch we enable MJ without looking at this configuration. The related code in {{SparkMapJoinOptimizer}} is commented out. We should only enable MJ when the flag is true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8995) Find thread leak in RSC Tests
[ https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230291#comment-14230291 ] Brock Noland commented on HIVE-8995: bq. You don't need to call that method for every session Yes I was just saying that calling this might not be our problem. bq. I see {{RemoteHiveSparkClient}} calls initialize; This seems like it's the issue. We should change this and throw an exception if it's called twice. Find thread leak in RSC Tests - Key: HIVE-8995 URL: https://issues.apache.org/jira/browse/HIVE-8995 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland
[jira] [Comment Edited] (HIVE-8995) Find thread leak in RSC Tests
[ https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230291#comment-14230291 ] Brock Noland edited comment on HIVE-8995 at 12/1/14 7:31 PM: - bq. You don't need to call that method for every session Yes I was just saying that calling this might not be our problem. bq. I see {{RemoteHiveSparkClient}} calls initialize; This seems like it's the issue. We should change this and throw an exception if it's called twice. was (Author: brocknoland): bq. You don't need to call that method for every session Yes I was just saying that calling this might not be our problem. bq. I see {{RemoteHiveSparkClient}} calls initialize; This seems like it's the issue. We should change this and throw an exception if it's called twice. Find thread leak in RSC Tests - Key: HIVE-8995 URL: https://issues.apache.org/jira/browse/HIVE-8995 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland
[jira] [Updated] (HIVE-8851) Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-8851: -- Status: Open (was: Patch Available) Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch] --- Key: HIVE-8851 URL: https://issues.apache.org/jira/browse/HIVE-8851 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-8851.1-spark.patch Currently, files generated by SparkHashTableSinkOperator for small tables are written directly to HDFS with a high replication factor. When a map join happens, the map join operator loads these files into hash tables. Since multiple partitions can be processed on the same worker node, reading the same set of files multiple times is not ideal. The improvement can be made by calling SparkContext.addFile() on these files and using SparkFiles.get() to download them to the worker node just once. Please note that SparkFiles.get() is a static method. Code invoking this method needs to be in a static method. The calling method needs to be synchronized because it may be called from different threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
0.15 release
Hi, In 2014 we did two large releases. Thank you very much to the RMs for pushing those out! I've found that Apache projects gain traction by releasing often, so I think we should aim to increase the rate of releases in 2015. (Not that I can complain, since I did not volunteer to RM any release.) As such, I'd like to volunteer as RM for the 0.15 release. Cheers, Brock
[jira] [Updated] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8970: --- Attachment: HIVE-8970.3-spark.patch Excluded the following tests from this patch: {noformat} join38.q join_literals.q join_nullsafe.q subquery_in.q ppd_join4.q ppd_multi_insert.q {noformat} Their results are different from MR's. I will create a follow-up JIRA to address the issue. Enable map join optimization only when hive.auto.convert.join is true [Spark Branch] Key: HIVE-8970 URL: https://issues.apache.org/jira/browse/HIVE-8970 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8970.1-spark.patch, HIVE-8970.2-spark.patch, HIVE-8970.3-spark.patch Right now, in the Spark branch we enable MJ without looking at this configuration. The related code in {{SparkMapJoinOptimizer}} is commented out. We should only enable MJ when the flag is true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-860) Persistent distributed cache
[ https://issues.apache.org/jira/browse/HIVE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230346#comment-14230346 ] Hive QA commented on HIVE-860: -- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12684389/HIVE-860.4.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1939/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1939/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1939/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult [localFile=/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-1939/failed/TestParseNegative, remoteFile=/home/hiveptest/54.167.108.186-hiveptest-2/logs/, getExitCode()=12, getException()=null, getUser()=hiveptest, getHost()=54.167.108.186, getInstance()=2]: 'Address 54.167.108.186 maps to ec2-54-167-108-186.compute-1.amazonaws.com, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT! 
receiving incremental file list ./ hive.log 0 0%0.00kB/s0:00:00 41320448 1% 39.41MB/s0:01:14 85950464 2% 41.00MB/s0:01:10 130056192 4% 41.37MB/s0:01:08 174522368 5% 41.63MB/s0:01:07 218693632 7% 42.31MB/s0:01:05 262471680 8% 42.11MB/s0:01:04 306970624 10% 42.16MB/s0:01:03 350945280 11% 42.01MB/s0:01:02 384303104 12% 38.22MB/s0:01:08 395575296 12% 30.30MB/s0:01:25 425820160 13% 27.08MB/s0:01:34 rsync: write failed on /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-1939/failed/TestParseNegative/hive.log: No space left on device (28) rsync error: error in file IO (code 11) at receiver.c(301) [receiver=3.0.6] rsync: connection unexpectedly closed (107 bytes received so far) [generator] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [generator=3.0.6] Address 54.167.108.186 maps to ec2-54-167-108-186.compute-1.amazonaws.com, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT! receiving incremental file list ./ hive.log 0 0%0.00kB/s0:00:00 rsync: write failed on /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-1939/failed/TestParseNegative/hive.log: No space left on device (28) rsync error: error in file IO (code 11) at receiver.c(301) [receiver=3.0.6] rsync: connection unexpectedly closed (107 bytes received so far) [generator] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [generator=3.0.6] Address 54.167.108.186 maps to ec2-54-167-108-186.compute-1.amazonaws.com, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT! 
receiving incremental file list ./ hive.log 0 0%0.00kB/s0:00:00 rsync: write failed on /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-1939/failed/TestParseNegative/hive.log: No space left on device (28) rsync error: error in file IO (code 11) at receiver.c(301) [receiver=3.0.6] rsync: connection unexpectedly closed (107 bytes received so far) [generator] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [generator=3.0.6] Address 54.167.108.186 maps to ec2-54-167-108-186.compute-1.amazonaws.com, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT! receiving incremental file list ./ hive.log 0 0%0.00kB/s0:00:00 rsync: write failed on /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-1939/failed/TestParseNegative/hive.log: No space left on device (28) rsync error: error in file IO (code 11) at receiver.c(301) [receiver=3.0.6] rsync: connection unexpectedly closed (107 bytes received so far) [generator] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [generator=3.0.6] Address 54.167.108.186 maps to ec2-54-167-108-186.compute-1.amazonaws.com, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT! receiving incremental file list ./ hive.log 0 0%0.00kB/s0:00:00 rsync: write failed on /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-1939/failed/TestParseNegative/hive.log: No space left on device (28) rsync error: error in file IO (code 11) at receiver.c(301) [receiver=3.0.6] rsync: connection unexpectedly closed (107 bytes received so far) [generator] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [generator=3.0.6] ' {noformat} This message is automatically generated. ATTACHMENT ID: 12684389 - PreCommit-HIVE-TRUNK-Build Persistent distributed cache
Re: Review Request 28372: HIVE-8950: Add support in ParquetHiveSerde to create table schema from a parquet file
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28372/#review63424 ---

common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/28372/#comment105680
What does this change do?

ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ColInfoFromParquetFile.java
https://reviews.apache.org/r/28372/#comment105672
I don't think ColInfoFromParquetFile is a great name for this class. What you've implemented here is a schema converter, like HiveSchemaConverter but from MessageType to TypeInfo rather than the other way around. I think this should either go in HiveSchemaConverter or a new class, like ParquetToHiveSchemaConverter. This update would also help clarify the methods exposed. I think that the primary method this should expose is: StructType convert(GroupType parquetSchema). File reading should be done elsewhere if it is necessary. Also, I think this should return a StructType instead of a Pair with Strings. That would be a lot cleaner and would avoid the custom type-string building immediately followed by parsing those type strings.

ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java
https://reviews.apache.org/r/28372/#comment105675
Could you please undo the import moves? I like to avoid non-functional changes.

ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java
https://reviews.apache.org/r/28372/#comment105676
Is it possible to avoid setting a file property? There are lots of cases where a file in the dataset would be removed, so this is a brittle method of configuring the table. Ideally, we would check for the LIST_COLUMN_TYPES property and, if we don't find it, convert the schema of the first file that we find.

ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java
https://reviews.apache.org/r/28372/#comment105678
This non-functional change is fine with me because it fixes the style in a method you're editing.
But if you add a newline after this conditional, then you should also add one after line 149.

- Ryan Blue

On Nov. 26, 2014, 5:08 p.m., Ashish Singh wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28372/ ---
(Updated Nov. 26, 2014, 5:08 p.m.)

Review request for hive.

Bugs: HIVE-8950
https://issues.apache.org/jira/browse/HIVE-8950

Repository: hive-git

Description
---
HIVE-8950: Add support in ParquetHiveSerde to create table schema from a parquet file

Diffs
-
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fafd78e63e9b41c9fdb0e017b567dc719d151784
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ColInfoFromParquetFile.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe736fcf9d3715f03eed9885c299a7aa040dd
ql/src/test/queries/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q PRE-CREATION
ql/src/test/queries/clientpositive/parquet_array_of_optional_elements_gen_schema.q PRE-CREATION
ql/src/test/queries/clientpositive/parquet_array_of_required_elements_gen_schema.q PRE-CREATION
ql/src/test/queries/clientpositive/parquet_array_of_single_field_struct_gen_schema.q PRE-CREATION
ql/src/test/queries/clientpositive/parquet_array_of_structs_gen_schema.q PRE-CREATION
ql/src/test/queries/clientpositive/parquet_array_of_unannotated_groups_gen_schema.q PRE-CREATION
ql/src/test/queries/clientpositive/parquet_array_of_unannotated_primitives_gen_schema.q PRE-CREATION
ql/src/test/queries/clientpositive/parquet_avro_array_of_primitives_gen_schema.q PRE-CREATION
ql/src/test/queries/clientpositive/parquet_avro_array_of_single_field_struct_gen_schema.q PRE-CREATION
ql/src/test/queries/clientpositive/parquet_decimal_gen_schema.q PRE-CREATION
ql/src/test/queries/clientpositive/parquet_thrift_array_of_primitives_gen_schema.q PRE-CREATION
ql/src/test/queries/clientpositive/parquet_thrift_array_of_single_field_struct_gen_schema.q PRE-CREATION
ql/src/test/results/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q.out PRE-CREATION
ql/src/test/results/clientpositive/parquet_array_of_optional_elements_gen_schema.q.out PRE-CREATION
ql/src/test/results/clientpositive/parquet_array_of_required_elements_gen_schema.q.out PRE-CREATION
ql/src/test/results/clientpositive/parquet_array_of_single_field_struct_gen_schema.q.out PRE-CREATION
ql/src/test/results/clientpositive/parquet_array_of_structs_gen_schema.q.out PRE-CREATION
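Ryan's suggestion above, a converter going from the Parquet schema to Hive types that mirrors HiveSchemaConverter in the opposite direction, can be sketched roughly as follows. This is a hand-written illustration rather than the patch's code: the class name, the string-keyed stand-in for parquet's MessageType/GroupType, and the exact primitive mapping are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a Parquet-to-Hive schema converter, the direction
// suggested in the review. Real code would accept a parquet GroupType; here
// a field-name -> parquet-primitive-name map stands in for simplicity.
public class ParquetToHiveSchemaSketch {
    static final Map<String, String> PRIMITIVES = new LinkedHashMap<>();
    static {
        // Assumed mapping for illustration only.
        PRIMITIVES.put("int32", "int");
        PRIMITIVES.put("int64", "bigint");
        PRIMITIVES.put("binary", "string");
        PRIMITIVES.put("double", "double");
        PRIMITIVES.put("boolean", "boolean");
    }

    /** Convert one parquet primitive type name to its Hive counterpart. */
    static String convertPrimitive(String parquetType) {
        String hive = PRIMITIVES.get(parquetType);
        if (hive == null) {
            throw new IllegalArgumentException("unsupported: " + parquetType);
        }
        return hive;
    }

    /** Build a Hive struct type string from field names and parquet types. */
    static String convertGroup(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("struct<");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(",");
            sb.append(e.getKey()).append(":").append(convertPrimitive(e.getValue()));
            first = false;
        }
        return sb.append(">").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "int64");
        fields.put("name", "binary");
        System.out.println(convertGroup(fields)); // struct<id:bigint,name:string>
    }
}
```

Returning a typed struct (as Ryan proposes with StructType) instead of a type string would remove the build-then-parse round trip this string version still has.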
[jira] [Commented] (HIVE-8981) Not a directory error in mapjoin_hook.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230375#comment-14230375 ] Chao commented on HIVE-8981: [~szehon] I just ran unit test on this one and the result looks correct to me. Any idea on how to reproduce this issue? Thanks.

Not a directory error in mapjoin_hook.q [Spark Branch]
--
Key: HIVE-8981
URL: https://issues.apache.org/jira/browse/HIVE-8981
Project: Hive
Issue Type: Sub-task
Components: Spark
Affects Versions: spark-branch
Environment: Using remote-spark context with spark-master=local-cluster [2,2,1024]
Reporter: Szehon Ho
Assignee: Chao

Hits the following exception (stderr-redirect log prefixes trimmed):
{noformat}
14/11/26 15:17:11 WARN TaskSetManager: Lost task 0.0 in stage 8.0 (TID 18, 172.16.3.52): java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:160)
	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:28)
	at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
	at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
	at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
	at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:56)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
Re: Review Request 28510: HIVE-8974
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28510/#review63433 ---

Ship it!

Ship It!

- Julian Hyde

On Nov. 27, 2014, 2:37 p.m., Jesús Camacho Rodríguez wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28510/ ---
(Updated Nov. 27, 2014, 2:37 p.m.)

Review request for hive, John Pullokkaran and Julian Hyde.

Bugs: HIVE-8974
https://issues.apache.org/jira/browse/HIVE-8974

Repository: hive-git

Description
---
Upgrade to Calcite 1.0.0-SNAPSHOT

Diffs
-
pom.xml 630b10ce35032e4b2dee50ef3dfe5feb58223b78
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/HiveDefaultRelMetadataProvider.java e9e052ffe8759fa9c49377c58d41450feee0b126
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/HiveOptiqUtil.java 80f657e9b1e7e9e965e6814ae76de78316367135
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/HiveTypeSystemImpl.java 1bc5a2cfca071ea02a446ae517481f927193f23c
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/OptiqSemanticException.java d2b08fa64b868942b7636df171ed89f0081f7253
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/RelOptHiveTable.java 080d27fa873f071fb2e0f7932ad26819b79d0477
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/TraitsUtil.java 4b44a28ca77540fd643fc03b89dcb4b2155d081a
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/cost/HiveCost.java 72fe5d6f26d0fd9a34c8e89be3040cce4593fd4a
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/cost/HiveCostUtil.java 7436f12f662542c41e71a7fee37179e35e4e2553
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/cost/HiveVolcanoPlanner.java 5deb801649f47e0629b3583ef57c62d4a4699f78
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveAggregateRel.java fc198958735e12cb3503a0b4c486d8328a10a2fa
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveFilterRel.java 8b850463ac1c3270163725f876404449ef8dc5f9
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveJoinRel.java 3d6aa848cd4c83ec8eb22f7df449911d67a53b9b
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveLimitRel.java f8755d0175c10e5b5461649773bf44abe998b44e
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveProjectRel.java 7b434ea58451bef6a6566eb241933843ee855606
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveRel.java 4738c4ac2d33cd15d2db7fe4b8336e1f59dd5212
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveSortRel.java f85363d50c1c3eb9cef39072106057669454d4da
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveTableScanRel.java bd66459def099df6432f344a9d8439deef09daa6
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveUnionRel.java d34fe9540e239c13f6bd23894056305c0c402e0d
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/rules/HiveMergeProjectRule.java d6581e64fc8ea183666ea6c91397378456461088
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/rules/HivePartitionPrunerRule.java ee19a6cbab0597242214e915745631f76214f70f
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/rules/HivePushFilterPastJoinRule.java 1c483eabcc1aa43cc80d7b71e21a4ae4d30a7e12
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/rules/PartitionPruner.java bdc8373877c1684855d256c9d45743f383fc7615
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/FilterSelectivityEstimator.java 28bf2ad506656b78894467c30364d751b180676e
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/HiveRelMdDistinctRowCount.java 4be57b110c1a45819467d55e8a69e5529989c8f6
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/HiveRelMdRowCount.java 8c7f643940b74dd7743635c3eaa046d52d41346f
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/HiveRelMdSelectivity.java 49d2ee5a67b72fbf6134ce71de1d7260069cd16f
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/HiveRelMdUniqueKeys.java c3c8bdd2466b0f46d49437fcf8d49dbb689cfcda
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTBuilder.java 58320c73aafbfeec025f52ee813b3cfd06fa0821
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTConverter.java a217d70e48da0835fed3565ba510bcc9e86c0fa1
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ExprNodeConverter.java 65c6322d68ef234fbf55a4a36b4cb47e69c30cac
[jira] [Commented] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230427#comment-14230427 ] Xuefu Zhang commented on HIVE-8970: --- [~csun], just to clarify, w/o your patch here, do these tests give correct result? Do they give correct result when master=local[4]? Basically I'm unclear if the current golden files are correct. Enable map join optimization only when hive.auto.convert.join is true [Spark Branch] Key: HIVE-8970 URL: https://issues.apache.org/jira/browse/HIVE-8970 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8970.1-spark.patch, HIVE-8970.2-spark.patch, HIVE-8970.3-spark.patch Right now, in Spark branch we enable MJ without looking at this configuration. The related code in {{SparkMapJoinOptimizer}} is commented out. We should only enable MJ when the flag is true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
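The fix this issue asks for, honoring hive.auto.convert.join before attempting the map-join conversion, amounts to a guard like the sketch below. This is an illustration only: java.util.Properties stands in for HiveConf, and shouldConvertToMapJoin is a hypothetical helper name, not the actual SparkMapJoinOptimizer code.

```java
import java.util.Properties;

// Sketch of the gating logic described in HIVE-8970: only attempt the
// map-join conversion when hive.auto.convert.join is true. A Properties
// object stands in for HiveConf in this self-contained example.
public class MapJoinGateSketch {
    static final String AUTO_CONVERT_JOIN = "hive.auto.convert.join";

    // Hypothetical helper: returns whether the optimizer should even try
    // converting a common join into a map join.
    static boolean shouldConvertToMapJoin(Properties conf) {
        // Hive ships with this flag defaulting to true in 0.13-era releases.
        return Boolean.parseBoolean(conf.getProperty(AUTO_CONVERT_JOIN, "true"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(AUTO_CONVERT_JOIN, "false");
        // With the flag off, the optimizer skips the conversion entirely.
        System.out.println(shouldConvertToMapJoin(conf)); // false
    }
}
```

The point of the JIRA is simply that the Spark branch performed the conversion unconditionally; the commented-out check corresponds to a guard like this one.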
[jira] [Assigned] (HIVE-8995) Find thread leak in RSC Tests
[ https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reassigned HIVE-8995: - Assignee: Rui Li [~ruili], could you take a look at this? Thanks. Find thread leak in RSC Tests - Key: HIVE-8995 URL: https://issues.apache.org/jira/browse/HIVE-8995 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Rui Li I was regenerating output as part of the merge: {noformat} mvn test -Dtest=TestSparkCliDriver -Phadoop-2 -Dtest.output.overwrite=true -Dqfile=annotate_stats_join.q,auto_join0.q,auto_join1.q,auto_join10.q,auto_join11.q,auto_join12.q,auto_join13.q,auto_join14.q,auto_join15.q,auto_join16.q,auto_join17.q,auto_join18.q,auto_join18_multi_distinct.q,auto_join19.q,auto_join2.q,auto_join20.q,auto_join21.q,auto_join22.q,auto_join23.q,auto_join24.q,auto_join26.q,auto_join27.q,auto_join28.q,auto_join29.q,auto_join3.q,auto_join30.q,auto_join31.q,auto_join32.q,auto_join9.q,auto_join_reordering_values.q auto_join_without_localtask.q,auto_smb_mapjoin_14.q,auto_sortmerge_join_1.q,auto_sortmerge_join_10.q,auto_sortmerge_join_11.q,auto_sortmerge_join_12.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,auto_sortmerge_join_2.q,auto_sortmerge_join_3.q,auto_sortmerge_join_4.q,auto_sortmerge_join_5.q,auto_sortmerge_join_6.q,auto_sortmerge_join_7.q,auto_sortmerge_join_8.q,auto_sortmerge_join_9.q,bucket_map_join_1.q,bucket_map_join_2.q,bucket_map_join_tez1.q,bucket_map_join_tez2.q,bucketmapjoin1.q,bucketmapjoin10.q,bucketmapjoin11.q,bucketmapjoin12.q,bucketmapjoin13.q,bucketmapjoin2.q,bucketmapjoin3.q,bucketmapjoin4.q,bucketmapjoin5.q,bucketmapjoin7.q 
bucketmapjoin8.q,bucketmapjoin9.q,bucketmapjoin_negative.q,bucketmapjoin_negative2.q,bucketmapjoin_negative3.q,column_access_stats.q,cross_join.q,ctas.q,custom_input_output_format.q,groupby4.q,groupby7_noskew_multi_single_reducer.q,groupby_complex_types.q,groupby_complex_types_multi_single_reducer.q,groupby_multi_single_reducer2.q,groupby_multi_single_reducer3.q,groupby_position.q,groupby_sort_1_23.q,groupby_sort_skew_1_23.q,having.q,index_auto_self_join.q,infer_bucket_sort_convert_join.q,innerjoin.q,input12.q,join0.q,join1.q,join11.q,join12.q,join13.q,join14.q,join15.q join17.q,join18.q,join18_multi_distinct.q,join19.q,join2.q,join20.q,join21.q,join22.q,join23.q,join25.q,join26.q,join27.q,join28.q,join29.q,join3.q,join30.q,join31.q,join32.q,join32_lessSize.q,join33.q,join35.q,join36.q,join37.q,join38.q,join39.q,join40.q,join41.q,join9.q,join_alt_syntax.q,join_cond_pushdown_1.q join_cond_pushdown_2.q,join_cond_pushdown_3.q,join_cond_pushdown_4.q,join_cond_pushdown_unqual1.q,join_cond_pushdown_unqual2.q,join_cond_pushdown_unqual3.q,join_cond_pushdown_unqual4.q,join_filters_overlap.q,join_hive_626.q,join_map_ppr.q,join_merge_multi_expressions.q,join_merging.q,join_nullsafe.q,join_rc.q,join_reorder.q,join_reorder2.q,join_reorder3.q,join_reorder4.q,join_star.q,join_thrift.q,join_vc.q,join_view.q,limit_pushdown.q,load_dyn_part13.q,load_dyn_part14.q,louter_join_ppr.q,mapjoin1.q,mapjoin_decimal.q,mapjoin_distinct.q,mapjoin_filter_on_outerjoin.q mapjoin_hook.q,mapjoin_mapjoin.q,mapjoin_memcheck.q,mapjoin_subquery.q,mapjoin_subquery2.q,mapjoin_test_outer.q,mergejoins.q,mergejoins_mixed.q,multi_insert.q,multi_insert_gby.q,multi_insert_gby2.q,multi_insert_gby3.q,multi_insert_lateral_view.q,multi_insert_mixed.q,multi_insert_move_tasks_share_dependencies.q,multi_join_union.q,optimize_nullscan.q,outer_join_ppr.q,parallel.q,parallel_join0.q,parallel_join1.q,parquet_join.q,pcr.q,ppd_gby_join.q,ppd_join.q,ppd_join2.q,ppd_join3.q,ppd_join4.q,ppd_join5.q,ppd_join_filter.q 
ppd_multi_insert.q,ppd_outer_join1.q,ppd_outer_join2.q,ppd_outer_join3.q,ppd_outer_join4.q,ppd_outer_join5.q,ppd_transform.q,reduce_deduplicate_exclude_join.q,router_join_ppr.q,sample10.q,sample8.q,script_pipe.q,semijoin.q,skewjoin.q,skewjoin_noskew.q,skewjoin_union_remove_1.q,skewjoin_union_remove_2.q,skewjoinopt1.q,skewjoinopt10.q,skewjoinopt11.q,skewjoinopt12.q,skewjoinopt13.q,skewjoinopt14.q,skewjoinopt15.q,skewjoinopt16.q,skewjoinopt17.q,skewjoinopt18.q,skewjoinopt19.q,skewjoinopt2.q,skewjoinopt20.q skewjoinopt3.q,skewjoinopt4.q,skewjoinopt5.q,skewjoinopt6.q,skewjoinopt7.q,skewjoinopt8.q,skewjoinopt9.q,smb_mapjoin9.q,smb_mapjoin_1.q,smb_mapjoin_10.q,smb_mapjoin_13.q,smb_mapjoin_14.q,smb_mapjoin_15.q,smb_mapjoin_16.q,smb_mapjoin_17.q,smb_mapjoin_2.q,smb_mapjoin_25.q,smb_mapjoin_3.q,smb_mapjoin_4.q,smb_mapjoin_5.q,smb_mapjoin_6.q,smb_mapjoin_7.q,sort_merge_join_desc_1.q,sort_merge_join_desc_2.q,sort_merge_join_desc_3.q,sort_merge_join_desc_4.q,sort_merge_join_desc_5.q,sort_merge_join_desc_6.q,sort_merge_join_desc_7.q,sort_merge_join_desc_8.q
[jira] [Commented] (HIVE-8981) Not a directory error in mapjoin_hook.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230465#comment-14230465 ] Szehon Ho commented on HIVE-8981: - That's interesting, maybe it went away with some of the recent checkins. I guess we'll keep an eye out if it happens again.
[jira] [Commented] (HIVE-8991) Fix custom_input_output_format [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230468#comment-14230468 ] Xuefu Zhang commented on HIVE-8991: --- [~vanzin], this doesn't block anything, and so let's do it in the right way. In the meantime, does it make sense for you to take this JIRA while you're doing the research? Thanks. Fix custom_input_output_format [Spark Branch] - Key: HIVE-8991 URL: https://issues.apache.org/jira/browse/HIVE-8991 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8991.1-spark.patch After HIVE-8836, {{custom_input_output_format}} fails because of missing hive-it-util in remote driver's class path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230470#comment-14230470 ] Chao commented on HIVE-8970: Yes, I believe so. When I enabled map join, I compared the unit test results against the previous results in the spark branch, which had previously been compared against the MR results.
[jira] [Updated] (HIVE-8957) Remote spark context needs to clean up itself in case of connection timeout [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8957: -- Status: Open (was: Patch Available) Remote spark context needs to clean up itself in case of connection timeout [Spark Branch] -- Key: HIVE-8957 URL: https://issues.apache.org/jira/browse/HIVE-8957 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-8957.1-spark.patch In the current SparkClient implementation (class SparkClientImpl), the constructor does some initialization and in the end waits for the remote driver to connect. In case of timeout, it just throws an exception without cleaning itself. The cleanup is necessary to release system resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
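The cleanup the issue describes, releasing resources when the wait for the remote driver times out rather than just throwing, follows the familiar pattern sketched below. The names (connect, driverReady, the executor pool) are hypothetical stand-ins for SparkClientImpl's internals, not the actual patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of the fix described in HIVE-8957: when waiting for the remote
// driver to connect times out, clean up before propagating the failure,
// so system resources (threads, sockets) are not leaked.
public class TimeoutCleanupSketch {
    static String connect(Future<String> driverReady, ExecutorService pool,
                          long timeoutMs) throws Exception {
        try {
            return driverReady.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The cleanup step the JIRA asks for: cancel the pending wait
            // and shut down the pool before rethrowing.
            driverReady.cancel(true);
            pool.shutdownNow();
            throw new RuntimeException("Timed out waiting for remote driver", e);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> ready = pool.submit(() -> "connected");
        System.out.println(connect(ready, pool, 1000)); // connected
        pool.shutdown();
    }
}
```

The same structure applies to a constructor: anything allocated before the blocking wait must be released on the timeout path, since the caller never receives an instance to close.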
[jira] [Commented] (HIVE-8957) Remote spark context needs to clean up itself in case of connection timeout [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230473#comment-14230473 ] Xuefu Zhang commented on HIVE-8957: --- [~vanzin], would you mind owning the JIRA for now until you figure out a solution?
[jira] [Commented] (HIVE-8957) Remote spark context needs to clean up itself in case of connection timeout [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230478#comment-14230478 ] Marcelo Vanzin commented on HIVE-8957: -- If you don't mind the bug remaining unattended for several days, sure. I have my hands full with all sorts of other things at the moment.
[jira] [Commented] (HIVE-8981) Not a directory error in mapjoin_hook.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230486#comment-14230486 ] Xuefu Zhang commented on HIVE-8981: --- Yeah, the test seems to have passed in the latest test run: https://issues.apache.org/jira/browse/HIVE-8998?focusedCommentId=14229321&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14229321 Closing this for now.
[jira] [Resolved] (HIVE-8981) Not a directory error in mapjoin_hook.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8981. --- Resolution: Cannot Reproduce Not a directory error in mapjoin_hook.q [Spark Branch] -- Key: HIVE-8981 URL: https://issues.apache.org/jira/browse/HIVE-8981 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Environment: Using remote-spark context with spark-master=local-cluster [2,2,1024] Reporter: Szehon Ho Assignee: Chao Hits the following exception: {noformat} 2014-11-26 15:17:11,728 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - 14/11/26 15:17:11 WARN TaskSetManager: Lost task 0.0 in stage 8.0 (TID 18, 172.16.3.52): java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container 2014-11-26 15:17:11,728 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:160) 2014-11-26 15:17:11,728 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47) 2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:28) 2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) 2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) 2014-11-26 
15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at scala.collection.Iterator$class.foreach(Iterator.scala:727) 2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) 2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115) 2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115) 2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390) 2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390) 2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) 2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.scheduler.Task.run(Task.scala:56) 2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) 2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 2014-11-26 15:17:11,729 INFO 
[stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at java.lang.Thread.run(Thread.java:744) 2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container 2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:100) 2014-11-26 15:17:11,729
[jira] [Commented] (HIVE-8774) CBO: enable groupBy index
[ https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230484#comment-14230484 ] Hive QA commented on HIVE-8774: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12684451/HIVE-8774.10.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6697 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapjoin_mapjoin org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1940/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1940/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1940/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12684451 - PreCommit-HIVE-TRUNK-Build CBO: enable groupBy index - Key: HIVE-8774 URL: https://issues.apache.org/jira/browse/HIVE-8774 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch Right now, even when the groupby index is built, CBO is not able to use it. In this patch, we are trying to make it use the groupby index that we build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230489#comment-14230489 ] Hive QA commented on HIVE-8970: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12684466/HIVE-8970.3-spark.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7223 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_custom_input_output_format org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/469/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/469/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-469/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12684466 - PreCommit-HIVE-SPARK-Build Enable map join optimization only when hive.auto.convert.join is true [Spark Branch] Key: HIVE-8970 URL: https://issues.apache.org/jira/browse/HIVE-8970 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8970.1-spark.patch, HIVE-8970.2-spark.patch, HIVE-8970.3-spark.patch Right now, in Spark branch we enable MJ without looking at this configuration. The related code in {{SparkMapJoinOptimizer}} is commented out. We should only enable MJ when the flag is true.
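The change HIVE-8970 describes boils down to a configuration gate. A hedged, self-contained sketch follows; the class name and the plain `Map` are illustrative stand-ins, since the real code reads the flag from `HiveConf` inside `SparkMapJoinOptimizer`:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the gating logic HIVE-8970 describes: attempt the map-join
// conversion only when hive.auto.convert.join is true. The class name and the
// plain Map are stand-ins for the real SparkMapJoinOptimizer/HiveConf code.
public class MapJoinGate {

    public static boolean shouldConvertToMapJoin(Map<String, String> conf) {
        // hive.auto.convert.join defaults to true in recent Hive releases,
        // so treat a missing entry as enabled.
        return Boolean.parseBoolean(
                conf.getOrDefault("hive.auto.convert.join", "true"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hive.auto.convert.join", "false");
        // With the flag off, the optimizer should leave the join untouched.
        System.out.println(shouldConvertToMapJoin(conf)); // prints "false"
    }
}
```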
Re: Review Request 28283: HIVE-8900:Create encryption testing framework
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28283/#review63437 --- itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java https://reviews.apache.org/r/28283/#comment105686 Do we need to set this value? From what I know, AES/CTR/NoPadding is the only cipher mode that HDFS supports. itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java https://reviews.apache.org/r/28283/#comment105687 I think this method 'initEncryptionRelatedConfIfNeeded()' can be called inside the block at line 370 as it is only called when clusterType is encrypted. Also, we may rename the method to a shorter name as IfNeeded won't be used. itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java https://reviews.apache.org/r/28283/#comment105688 What if we move this line inside initEncryptionConf()? It is part of encryption initialization. itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java https://reviews.apache.org/r/28283/#comment105689 - May we rename this method so that it starts with the 'init' verb? This is just a good practice I've learned in order to read code much better. Also, IfNeeded() is the correct syntax. - We could also get rid of the IfNeeded() word (making the name shorter) if we add the validation when this method is called instead of inside the method. It is just an opinion. itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java https://reviews.apache.org/r/28283/#comment105690 Just to comment that AES-256 can be used only if JCE is installed in your environment. Otherwise, any encryption with this key will fail. Keys can be created, but when you try to encrypt something, it fails. We should put a comment here so that another developer knows this. ql/src/test/templates/TestEncrytedHDFSCliDriver.vm https://reviews.apache.org/r/28283/#comment105692 Why do we need this new class instead of TestCliDriver.vm?
shims/0.20/src/main/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java https://reviews.apache.org/r/28283/#comment105696 I think we should leave the 'hadoop.encryption.is.not.supported' key name on unsupported hadoop versions. This was left only as a comment for developers. Nobody will use this configuration key anyways. shims/0.20/src/main/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java https://reviews.apache.org/r/28283/#comment105695 Do we need these two configuration values in the configuration environment? These are used only for test purposes on QTestUtil. The user won't use these fields on hive-site.xml ever. Or not yet. shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java https://reviews.apache.org/r/28283/#comment105693 I think we should leave the 'hadoop.encryption.is.not.supported' key name on unsupported hadoop versions. This was left only as a comment for developers. Nobody will use this configuration key anyways. shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java https://reviews.apache.org/r/28283/#comment105694 Do we need these two configuration values in the configuration environment? These are used only for test purposes on QTestUtil. The user won't use these fields on hive-site.xml ever. Or not yet. shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java https://reviews.apache.org/r/28283/#comment105697 Let's import the necessary modules only. I think the IDE did this replacement. shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java https://reviews.apache.org/r/28283/#comment105698 Why was this block removed? I see the keyProvider variable is initialized inside getMiniDfs() method (testing). But what will happen with production code? - Sergio Pena On Nov. 28, 2014, 1:45 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28283/ --- (Updated Nov. 28, 2014, 1:45 a.m.) Review request for hive. 
Repository: hive-git Description --- The patch includes: 1. enable security properties for hive security cluster Diffs - .gitignore c5decaf data/scripts/q_test_cleanup_for_encryption.sql PRE-CREATION data/scripts/q_test_init_for_encryption.sql PRE-CREATION itests/qtest/pom.xml 376f4a9 itests/src/test/resources/testconfiguration.properties 3ae001d itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 31d5c29 ql/src/test/queries/clientpositive/create_encrypted_table.q PRE-CREATION ql/src/test/templates/TestEncrytedHDFSCliDriver.vm PRE-CREATION
[jira] [Commented] (HIVE-8957) Remote spark context needs to clean up itself in case of connection timeout [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230502#comment-14230502 ] Xuefu Zhang commented on HIVE-8957: --- That's all right. I think I can bug you on this when you have cycles. Remote spark context needs to clean up itself in case of connection timeout [Spark Branch] -- Key: HIVE-8957 URL: https://issues.apache.org/jira/browse/HIVE-8957 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-8957.1-spark.patch In the current SparkClient implementation (class SparkClientImpl), the constructor does some initialization and in the end waits for the remote driver to connect. In case of timeout, it just throws an exception without cleaning itself. The cleanup is necessary to release system resources.
[jira] [Commented] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230548#comment-14230548 ] Chao commented on HIVE-8982: I ran mapjoin_mapjoin and auto_join31 each 10 times on the latest spark branch, but couldn't reproduce the issue. Is this still occuring on jenkins? IndexOutOfBounds exception in mapjoin [Spark Branch] Key: HIVE-8982 URL: https://issues.apache.org/jira/browse/HIVE-8982 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho There are sometimes random failures in spark mapjoin during unit tests like: {noformat} org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115) at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115) at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365) at 
org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.first(MapJoinEagerRowContainer.java:70) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.write(MapJoinEagerRowContainer.java:150) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.persist(MapJoinTableContainerSerDe.java:167) at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:128) at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:77) ... 
20 more org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115) at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115) at
[jira] [Assigned] (HIVE-8992) Fix two bucket related test failures, infer_bucket_sort_convert_join.q and parquet_join.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HIVE-8992: - Assignee: Jimmy Xiang Fix two bucket related test failures, infer_bucket_sort_convert_join.q and parquet_join.q [Spark Branch] Key: HIVE-8992 URL: https://issues.apache.org/jira/browse/HIVE-8992 Project: Hive Issue Type: Sub-task Components: spark-branch Reporter: Xuefu Zhang Assignee: Jimmy Xiang Failures shown in HIVE-8836. They seemed related to wrong reducer numbers for the bucket join.
[jira] [Updated] (HIVE-8374) schematool fails on Postgres versions < 9.2
[ https://issues.apache.org/jira/browse/HIVE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-8374: --- Fix Version/s: 0.14.1 schematool fails on Postgres versions < 9.2 --- Key: HIVE-8374 URL: https://issues.apache.org/jira/browse/HIVE-8374 Project: Hive Issue Type: Bug Components: Database/Schema Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Fix For: 0.15.0, 0.14.1 Attachments: HIVE-8374.1.patch, HIVE-8374.2.patch, HIVE-8374.3.patch, HIVE-8374.patch The upgrade script for HIVE-5700 creates a UDF with language 'plpgsql', which is available by default only for Postgres 9.2+. For older Postgres versions, the language must be explicitly created, otherwise schematool fails with the error: {code} Error: ERROR: language plpgsql does not exist Hint: Use CREATE LANGUAGE to load the language into the database. (state=42704,code=0) {code}
[jira] [Commented] (HIVE-8374) schematool fails on Postgres versions < 9.2
[ https://issues.apache.org/jira/browse/HIVE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230555#comment-14230555 ] Sergey Shelukhin commented on HIVE-8374: backported to 0.14 schematool fails on Postgres versions < 9.2 --- Key: HIVE-8374 URL: https://issues.apache.org/jira/browse/HIVE-8374 Project: Hive Issue Type: Bug Components: Database/Schema Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Fix For: 0.15.0, 0.14.1 Attachments: HIVE-8374.1.patch, HIVE-8374.2.patch, HIVE-8374.3.patch, HIVE-8374.patch The upgrade script for HIVE-5700 creates a UDF with language 'plpgsql', which is available by default only for Postgres 9.2+. For older Postgres versions, the language must be explicitly created, otherwise schematool fails with the error: {code} Error: ERROR: language plpgsql does not exist Hint: Use CREATE LANGUAGE to load the language into the database. (state=42704,code=0) {code}
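The error hint above spells out the manual workaround for older Postgres releases. A hedged sketch (run as a database superuser against the Hive metastore database, before invoking schematool):

```sql
-- Manual workaround sketch for Postgres releases where plpgsql is not
-- installed by default: create the language before running the HIVE-5700
-- upgrade script. Note this statement errors if the language already exists.
CREATE LANGUAGE plpgsql;
```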
Re: 0.15 release
+1 . Regarding the next version being 0.15 - I have some thoughts on the versioning of hive. I will start a different thread on that. On Mon, Dec 1, 2014 at 11:43 AM, Brock Noland br...@cloudera.com wrote: Hi, In 2014 we did two large releases. Thank you very much to the RM's for pushing those out! I've found that Apache projects gain traction through releasing often, thus I think we should aim to increase the rate of releases in 2015. (Not that I cannot complain since I did not volunteer to RM any release.) As such I'd like to volunteer as RM for the 0.15 release. Cheers, Brock
[jira] [Updated] (HIVE-8886) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup
[ https://issues.apache.org/jira/browse/HIVE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8886: --- Status: In Progress (was: Patch Available) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup --- Key: HIVE-8886 URL: https://issues.apache.org/jira/browse/HIVE-8886 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8886.01.patch, HIVE-8886.02.patch {noformat} SELECT CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) AS `field` FROM vectortab2korc GROUP BY CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) LIMIT 50; {noformat}
[jira] [Created] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy
Hari Sankar Sivarama Subramaniyan created HIVE-9001: --- Summary: Ship with log4j.properties file that has a reliable time based rolling policy Key: HIVE-9001 URL: https://issues.apache.org/jira/browse/HIVE-9001 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan The Hive log gets locked by the Hive process and cannot be rolled on Windows. To reproduce: install Hive on Windows, start Hive, and try to rename the Hive log while Hive is running. When log4j tries to rename it, it will throw an error because the file is locked by the process. The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 should be integrated into Hive (internal as well as trunk) for a reliable rollover.
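For reference, the kind of configuration the summary asks for might look like the following fragment. This is a hedged sketch, not the shipped file: it assumes the apache-log4j-extras companion jar (which provides the `org.apache.log4j.rolling.TimeBasedRollingPolicy` added under the Bugzilla entry above) is on the classpath, and the appender name and pattern are illustrative.

```properties
# Hedged sketch, not the actual patch: roll hive.log on a daily time boundary
# using the TimeBasedRollingPolicy from the apache-log4j-extras jar.
log4j.rootLogger=INFO,DRFA
log4j.appender.DRFA=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.DRFA.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
# Active file, plus the pattern rolled files are renamed to once per day.
log4j.appender.DRFA.rollingPolicy.ActiveFileName=${hive.log.dir}/hive.log
log4j.appender.DRFA.rollingPolicy.FileNamePattern=${hive.log.dir}/hive.log.%d{yyyy-MM-dd}
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t]: %c{2} - %m%n
```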
[jira] [Updated] (HIVE-8886) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup
[ https://issues.apache.org/jira/browse/HIVE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8886: --- Attachment: HIVE-8886.02.patch Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup --- Key: HIVE-8886 URL: https://issues.apache.org/jira/browse/HIVE-8886 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8886.01.patch, HIVE-8886.02.patch {noformat} SELECT CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) AS `field` FROM vectortab2korc GROUP BY CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) LIMIT 50; {noformat}
[jira] [Commented] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230567#comment-14230567 ] Szehon Ho commented on HIVE-8982: - Yea. I still see some random failures in mapjoin tests like: [http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/464/testReport/junit/org.apache.hadoop.hive.cli/TestSparkCliDriver/testCliDriver_mapjoin_hook/|http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/464/testReport/junit/org.apache.hadoop.hive.cli/TestSparkCliDriver/testCliDriver_mapjoin_hook/] Usually when I get those, I see this exception. I didnt dig too deep into the latest random failure logs to confirm again though. IndexOutOfBounds exception in mapjoin [Spark Branch] Key: HIVE-8982 URL: https://issues.apache.org/jira/browse/HIVE-8982 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho There are sometimes random failures in spark mapjoin during unit tests like: {noformat} org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at 
scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115) at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115) at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365) at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.first(MapJoinEagerRowContainer.java:70) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.write(MapJoinEagerRowContainer.java:150) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.persist(MapJoinTableContainerSerDe.java:167) at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:128) at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:77) ... 
20 more org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108) at
[jira] [Commented] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230568#comment-14230568 ] Xuefu Zhang commented on HIVE-8982: --- It doesn't seem they are happening any more. Feel free to close this. IndexOutOfBounds exception in mapjoin [Spark Branch] Key: HIVE-8982 URL: https://issues.apache.org/jira/browse/HIVE-8982 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho There are sometimes random failures in spark mapjoin during unit tests like:
{noformat}
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
	at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
	at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
	at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
	at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
	at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:56)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
	at java.util.ArrayList.rangeCheck(ArrayList.java:635)
	at java.util.ArrayList.get(ArrayList.java:411)
	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.first(MapJoinEagerRowContainer.java:70)
	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.write(MapJoinEagerRowContainer.java:150)
	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.persist(MapJoinTableContainerSerDe.java:167)
	at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:128)
	at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:77)
	... 20 more
{noformat}
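The trace above bottoms out in java.util.ArrayList's range check: the row container reads one row past the end of a single-element list. As a minimal standalone illustration (not Hive's actual code; class and method names here are made up), reading index 1 from a one-element ArrayList reproduces the same failure shape that MapJoinEagerRowContainer.first() hits:

```java
import java.util.ArrayList;
import java.util.List;

public class RowContainerSketch {
    // Reads a row at the given index; like ArrayList.get, this throws
    // IndexOutOfBoundsException whenever index >= rows.size().
    public static Object readRow(List<Object> rows, int index) {
        return rows.get(index);
    }

    public static void main(String[] args) {
        List<Object> rows = new ArrayList<>();
        rows.add("only-row"); // size is 1, so the only valid index is 0
        try {
            readRow(rows, 1); // out of range: index 1 in a size-1 list
        } catch (IndexOutOfBoundsException e) {
            // On JDK 7/8 the message reads like "Index: 1, Size: 1",
            // matching the trace in the ticket; newer JDKs word it differently.
            System.out.println(e.getMessage());
        }
    }
}
```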
[jira] [Updated] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy
[ https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-9001: Attachment: HIVE-9001.1.patch cc-ing [~sushanth] for reviewing this change. Ship with log4j.properties file that has a reliable time based rolling policy - Key: HIVE-9001 URL: https://issues.apache.org/jira/browse/HIVE-9001 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-9001.1.patch The hive log gets locked by the hive process and cannot be rolled on Windows. To reproduce: install Hive on Windows, start Hive, and try to rename the hive log while Hive is running; when log4j tries to rename it, it throws the same error because the file is locked by the process. The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 should be integrated into Hive (internal as well as trunk) for a reliable rollover. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
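The failure mode described above stems from DailyRollingFileAppender renaming the active log file at rollover, which Windows refuses while the process holds the file open. One possible shape of the fix the ticket asks for is a time-based rolling policy from the log4j-extras companion, which writes into a date-stamped file instead of renaming the live one. A sketch of such a hive-log4j.properties fragment (the appender name, variable names, and patterns here are illustrative, not taken from the attached patch; it assumes the apache-log4j-extras jar is on the classpath):

```properties
# Illustrative appender name; requires org.apache.log4j:apache-log4j-extras.
log4j.appender.DRFA=org.apache.log4j.rolling.RollingFileAppender
# TimeBasedRollingPolicy rolls by rewriting the file name pattern rather than
# renaming the active log, sidestepping the Windows file-lock error.
log4j.appender.DRFA.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.DRFA.rollingPolicy.ActiveFileName=${hive.log.dir}/${hive.log.file}
log4j.appender.DRFA.rollingPolicy.FileNamePattern=${hive.log.dir}/${hive.log.file}.%d{yyyy-MM-dd}
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t]: %c{2} (%F:%M(%L)) - %m%n
```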
[jira] [Updated] (HIVE-8886) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup
[ https://issues.apache.org/jira/browse/HIVE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8886: --- Status: Patch Available (was: In Progress) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup --- Key: HIVE-8886 URL: https://issues.apache.org/jira/browse/HIVE-8886 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8886.01.patch, HIVE-8886.02.patch {noformat} SELECT CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) AS `field` FROM vectortab2korc GROUP BY CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) LIMIT 50; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
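The query that triggers the vectorization error above just buckets a date's month into a quarter label. A standalone sketch of the same arithmetic (illustrative names, not Hive's vectorized code path); note that Hive needs the CAST to INT because its `/` performs floating-point division, whereas Java's integer division buckets directly:

```java
public class QuarterLabel {
    // Mirrors CONCAT('Quarter ', CAST((MONTH(dt) - 1) / 3 + 1 AS INT), '-', YEAR(dt)).
    // monthOfYear is 1-12; year is a four-digit year.
    public static String quarterLabel(int monthOfYear, int year) {
        // Integer division maps months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4.
        int quarter = (monthOfYear - 1) / 3 + 1;
        return "Quarter " + quarter + "-" + year;
    }

    public static void main(String[] args) {
        System.out.println(quarterLabel(5, 2014)); // Quarter 2-2014
    }
}
```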
[jira] [Updated] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy
[ https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-9001: Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy
[ https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-9001: Description: The hive log gets locked by the hive process and cannot be rolled in windows OS. Install Hive in Windows, start hive, try and rename hive log while Hive is running. Wait for log4j tries to rename it and it will throw the same error as it is locked by the process. The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 should be integrated to Hive for a reliable rollover. was: The hive log gets locked by the hive process and cannot be rolled in windows OS. Install Hive in Windows, start hive, try and rename hive log while Hive is running. Wait for log4j tries to rename it and it will throw the same error as it is locked by the process. The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 should be integrated to Hive (Internal as well as trunk) for a reliable rollover.
[jira] [Commented] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230572#comment-14230572 ] Chao commented on HIVE-8982: OK, closing for now.
[jira] [Assigned] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao reassigned HIVE-8982: -- Assignee: Chao
[jira] [Updated] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy
[ https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-9001: Attachment: (was: HIVE-9001.1.patch)
[jira] [Updated] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy
[ https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-9001: Attachment: HIVE-9001.1.patch
[jira] [Resolved] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao resolved HIVE-8982. Resolution: Cannot Reproduce
[jira] [Commented] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230583#comment-14230583 ] Szehon Ho commented on HIVE-8982: - I dug a little and found the exception again here as part of run 464. See [http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-464/failed/TestSparkCliDriver-groupby_complex_types.q-auto_join9.q-groupby_map_ppr.q-and-12-more/spark.log]. I think it's still unresolved.
[jira] [Reopened] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao reopened HIVE-8982:
[jira] [Resolved] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao resolved HIVE-8982.
------------------------
    Resolution: Cannot Reproduce

IndexOutOfBounds exception in mapjoin [Spark Branch]
----------------------------------------------------
                Key: HIVE-8982
                URL: https://issues.apache.org/jira/browse/HIVE-8982
            Project: Hive
         Issue Type: Sub-task
         Components: Spark
   Affects Versions: spark-branch
           Reporter: Szehon Ho
           Assignee: Chao

There are sometimes random failures in spark mapjoin during unit tests like:
{noformat}
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
    at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
    at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
    at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
    at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
    at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.first(MapJoinEagerRowContainer.java:70)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.write(MapJoinEagerRowContainer.java:150)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.persist(MapJoinTableContainerSerDe.java:167)
    at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:128)
    at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:77)
    ... 20 more
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
    at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
    at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
    at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
    at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
    at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365) at
[jira] [Commented] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230588#comment-14230588 ]

Szehon Ho commented on HIVE-8982:
- Sorry this is the Not a directory exception that was closed in the other JIRA..
[GitHub] hive pull request: How to calculate the Kendall coefficient of cor...
GitHub user MarcinKosinski opened a pull request:

    https://github.com/apache/hive/pull/24

How to calculate the Kendall coefficient of correlation of a pair of numeric columns in the group?

In this [wiki page](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF) there is a function `corr()` that calculates the Pearson coefficient of correlation, but my question is: is there any function in Hive that enables calculating the Kendall coefficient of correlation of a pair of numeric columns in the group?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/hive HIVE-8065

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/24.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #24

commit 1628cb08e0bf1c6b168a9aa7b6f978a943cdc105
Author: Brock Noland br...@apache.org
Date: 2014-11-05T23:38:20Z
Creating branch for HIVE-8065
git-svn-id: https://svn.apache.org/repos/asf/hive/branches/HIVE-8065@1637006 13f79535-47bb-0310-9956-ffa450edef68

commit a9a413d6f4bd7273caf3d26bd4dd2b0d9672d56d
Author: Brock Noland br...@apache.org
Date: 2014-11-14T00:04:08Z
HIVE-8749 - Change Hadoop version on HIVE-8065 to 2.6-SNAPSHOT (Sergio Pena via Brock)
git-svn-id: https://svn.apache.org/repos/asf/hive/branches/HIVE-8065@1639558 13f79535-47bb-0310-9956-ffa450edef68

commit b45941d8b64e3b2553034cc6ae212a31084a694d
Author: Brock Noland br...@apache.org
Date: 2014-11-17T22:36:47Z
HIVE-8750 - Commit initial encryption work (Sergio Pena via Brock)
git-svn-id: https://svn.apache.org/repos/asf/hive/branches/HIVE-8065@1640247 13f79535-47bb-0310-9956-ffa450edef68

commit 184cf1ef21d7f9e8ce6b9d39044708d6daf1ffab
Author: Brock Noland br...@apache.org
Date: 2014-11-18T22:51:55Z
HIVE-8904 - Hive should support multiple Key provider modes (Ferdinand Xu via Brock)
git-svn-id:
https://svn.apache.org/repos/asf/hive/branches/HIVE-8065@1640446 13f79535-47bb-0310-9956-ffa450edef68 commit 61c468250512d7242aa343d59f2a81e3174ea112 Author: Brock Noland br...@apache.org Date: 2014-11-20T06:10:44Z HIVE-8919 - Fix FileUtils.copy() method to call distcp only for HDFS files (not local files) (Sergio Pena via Brock) git-svn-id: https://svn.apache.org/repos/asf/hive/branches/HIVE-8065@1640684 13f79535-47bb-0310-9956-ffa450edef68 commit 018b67cadc0dbad64df05819d92b87f2dc5bdaf8 Author: Brock Noland br...@apache.org Date: 2014-11-21T21:57:23Z HIVE-8945 - Allow user to read encrypted read-only tables only if the scratch directory is encrypted (Sergio Pena via Brock) git-svn-id: https://svn.apache.org/repos/asf/hive/branches/HIVE-8065@1641007 13f79535-47bb-0310-9956-ffa450edef68 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
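As of this thread, Hive's documented aggregate list includes `corr()` (Pearson) but no built-in Kendall correlation, so the statistic the question asks for would have to come from a custom UDAF or an external computation. A minimal sketch of Kendall's tau-a in Python (function name and data hypothetical, not part of Hive):

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) pairs over total pairs.

    A pair (i, j) is concordant when xs and ys order it the same way,
    discordant when they order it oppositely; ties count as neither.
    """
    assert len(xs) == len(ys) and len(xs) > 1
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Identical orderings give 1.0 and reversed orderings give -1.0, mirroring what `corr()` reports for perfectly linear data.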
[jira] [Reopened] (HIVE-8981) Not a directory error in mapjoin_hook.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao reopened HIVE-8981:

Not a directory error in mapjoin_hook.q [Spark Branch]
------------------------------------------------------
                Key: HIVE-8981
                URL: https://issues.apache.org/jira/browse/HIVE-8981
            Project: Hive
         Issue Type: Sub-task
         Components: Spark
   Affects Versions: spark-branch
        Environment: Using remote-spark context with spark-master=local-cluster [2,2,1024]
           Reporter: Szehon Ho
           Assignee: Chao

Hits the following exception:
{noformat}
2014-11-26 15:17:11,728 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - 14/11/26 15:17:11 WARN TaskSetManager: Lost task 0.0 in stage 8.0 (TID 18, 172.16.3.52): java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
2014-11-26 15:17:11,728 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:160)
2014-11-26 15:17:11,728 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:28)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at scala.collection.Iterator$class.foreach(Iterator.scala:727)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.scheduler.Task.run(Task.scala:56)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at java.lang.Thread.run(Thread.java:744)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(364)) - at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:100)
2014-11-26 15:17:11,729 INFO [stderr-redir-1]: client.SparkClientImpl
[jira] [Commented] (HIVE-8981) Not a directory error in mapjoin_hook.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230600#comment-14230600 ]

Chao commented on HIVE-8981:
[~szehon] Is this issue also happening randomly? What is the failing test? Any suggestion on how to reproduce it?
[jira] [Updated] (HIVE-6421) abs() should preserve precision/scale of decimal input
[ https://issues.apache.org/jira/browse/HIVE-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-6421:
-----------------------------
    Resolution: Fixed
    Fix Version/s: 0.15.0
    Status: Resolved (was: Patch Available)

Committed to trunk. Thanks for review Ashutosh

abs() should preserve precision/scale of decimal input
------------------------------------------------------
                Key: HIVE-6421
                URL: https://issues.apache.org/jira/browse/HIVE-6421
            Project: Hive
         Issue Type: Bug
         Components: UDF
           Reporter: Jason Dere
           Assignee: Jason Dere
            Fix For: 0.15.0
        Attachments: HIVE-6421.1.txt, HIVE-6421.2.patch, HIVE-6421.3.patch

{noformat}
hive> describe dec1;
OK
c1	decimal(10,2)	None
hive> explain select c1, abs(c1) from dec1;
...
      Select Operator
        expressions: c1 (type: decimal(10,2)), abs(c1) (type: decimal(38,18))
{noformat}

Given that abs() is a GenericUDF it should be possible for the return type precision/scale to match the input precision/scale.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
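The expected behavior can be illustrated with Python's `decimal` module, which tracks fractional digits the same way a decimal(precision, scale) type does: taking an absolute value never widens the scale, which is why abs of a decimal(10,2) input should stay decimal(10,2) rather than defaulting to decimal(38,18). A sketch, not Hive's actual GenericUDF code:

```python
from decimal import Decimal

def scale(d):
    # Number of digits after the decimal point, i.e. the "scale"
    # component of a decimal(precision, scale) type.
    return -d.as_tuple().exponent

v = Decimal("-1234.56")    # a value that fits a decimal(10,2) column
assert scale(v) == 2
assert scale(abs(v)) == 2  # abs() cannot change the number of fractional digits
assert abs(v) == Decimal("1234.56")
```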
Re: 0.15 release
Brock,

When you say more frequent releases, what schedule do you have in mind? I think an (approximately) quarterly release cycle would be good. We branched for Hive 0.14 on Sept 25, which means we have been adding new features not in 0.14 for more than 2 months. How about branching for the 0.15 equivalent in another month or two? Sometime in Jan?

On Mon, Dec 1, 2014 at 2:19 PM, Thejas Nair the...@hortonworks.com wrote:
+1. Regarding the next version being 0.15 - I have some thoughts on the versioning of hive. I will start a different thread on that.

On Mon, Dec 1, 2014 at 11:43 AM, Brock Noland br...@cloudera.com wrote:
Hi,
In 2014 we did two large releases. Thank you very much to the RMs for pushing those out! I've found that Apache projects gain traction through releasing often, thus I think we should aim to increase the rate of releases in 2015. (Not that I can complain, since I did not volunteer to RM any release.) As such I'd like to volunteer as RM for the 0.15 release.
Cheers,
Brock
Re: Review Request 27713: CBO: enable groupBy index
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27713/#review63459 ---

ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java
https://reviews.apache.org/r/27713/#comment105714

I don't think you can allow a function wrapping the index key, since we don't know whether the UDF is going to mutate the values (non-null -> null, null -> non-null). Example:

select a, count(b) from (select a, (case when a is null then 1 else a end) as b from r1) r2 group by a;

- John Pullokkaran

On Dec. 1, 2014, 6:57 p.m., pengcheng xiong wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27713/ --- (Updated Dec. 1, 2014, 6:57 p.m.)

Review request for hive and John Pullokkaran.

Repository: hive-git

Description
-----------
Right now, even when a group-by index is built, CBO is not able to use it. In this patch, we try to make CBO use the group-by index that we build. The basic problem is that for SEL1-SEL2-GRY-...-SEL3, the previous version only modified SEL2, which immediately precedes GRY. Now, with CBO, we have lots of SELs, e.g., SEL1. So, the solution is to modify all of them.
Diffs
-----
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 9ffa708
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java 02216de
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 0f06ec9
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java 74614f3
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java d699308
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_1.q PRE-CREATION
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_2.q PRE-CREATION
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out fdc1dc6
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_1.q.out PRE-CREATION
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_2.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/27713/diff/

Testing
-------

Thanks,
pengcheng xiong
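The reviewer's concern above - that a UDF wrapped around the index key can turn NULLs into non-NULLs, so aggregates computed over the wrapped key no longer match what the group-by index stores - can be sketched in Python (table contents hypothetical):

```python
# Model a tiny table r1(a) containing a NULL, plus the rewrite from the
# review example: b = CASE WHEN a IS NULL THEN 1 ELSE a END.
rows = [1, None, 1, 2]

def count_non_null(values):
    # SQL COUNT(col) semantics: NULLs are skipped.
    return sum(1 for v in values if v is not None)

b = [1 if a is None else a for a in rows]
assert count_non_null(rows) == 3  # COUNT(a): the NULL is not counted
assert count_non_null(b) == 4     # COUNT(b): the CASE turned NULL into 1
```

Because the two counts differ, an index built over `a` cannot safely answer a query grouped on the wrapped expression.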
Re: Review Request 28510: HIVE-8974
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28510/#review63461 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java https://reviews.apache.org/r/28510/#comment105717 Why can't we reuse HiveAggregateRel? - John Pullokkaran On Nov. 27, 2014, 2:37 p.m., Jesús Camacho Rodríguez wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28510/ --- (Updated Nov. 27, 2014, 2:37 p.m.) Review request for hive, John Pullokkaran and Julian Hyde. Bugs: HIVE-8974 https://issues.apache.org/jira/browse/HIVE-8974 Repository: hive-git Description --- Upgrade to Calcite 1.0.0-SNAPSHOT Diffs - pom.xml 630b10ce35032e4b2dee50ef3dfe5feb58223b78 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/HiveDefaultRelMetadataProvider.java e9e052ffe8759fa9c49377c58d41450feee0b126 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/HiveOptiqUtil.java 80f657e9b1e7e9e965e6814ae76de78316367135 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/HiveTypeSystemImpl.java 1bc5a2cfca071ea02a446ae517481f927193f23c ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/OptiqSemanticException.java d2b08fa64b868942b7636df171ed89f0081f7253 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/RelOptHiveTable.java 080d27fa873f071fb2e0f7932ad26819b79d0477 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/TraitsUtil.java 4b44a28ca77540fd643fc03b89dcb4b2155d081a ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/cost/HiveCost.java 72fe5d6f26d0fd9a34c8e89be3040cce4593fd4a ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/cost/HiveCostUtil.java 7436f12f662542c41e71a7fee37179e35e4e2553 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/cost/HiveVolcanoPlanner.java 5deb801649f47e0629b3583ef57c62d4a4699f78 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveAggregateRel.java fc198958735e12cb3503a0b4c486d8328a10a2fa ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveFilterRel.java 8b850463ac1c3270163725f876404449ef8dc5f9 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveJoinRel.java 3d6aa848cd4c83ec8eb22f7df449911d67a53b9b ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveLimitRel.java f8755d0175c10e5b5461649773bf44abe998b44e ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveProjectRel.java 7b434ea58451bef6a6566eb241933843ee855606 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveRel.java 4738c4ac2d33cd15d2db7fe4b8336e1f59dd5212 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveSortRel.java f85363d50c1c3eb9cef39072106057669454d4da ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveTableScanRel.java bd66459def099df6432f344a9d8439deef09daa6 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveUnionRel.java d34fe9540e239c13f6bd23894056305c0c402e0d ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/rules/HiveMergeProjectRule.java d6581e64fc8ea183666ea6c91397378456461088 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/rules/HivePartitionPrunerRule.java ee19a6cbab0597242214e915745631f76214f70f ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/rules/HivePushFilterPastJoinRule.java 1c483eabcc1aa43cc80d7b71e21a4ae4d30a7e12 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/rules/PartitionPruner.java bdc8373877c1684855d256c9d45743f383fc7615 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/FilterSelectivityEstimator.java 28bf2ad506656b78894467c30364d751b180676e ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/HiveRelMdDistinctRowCount.java 4be57b110c1a45819467d55e8a69e5529989c8f6 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/HiveRelMdRowCount.java 
8c7f643940b74dd7743635c3eaa046d52d41346f ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/HiveRelMdSelectivity.java 49d2ee5a67b72fbf6134ce71de1d7260069cd16f ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/stats/HiveRelMdUniqueKeys.java c3c8bdd2466b0f46d49437fcf8d49dbb689cfcda ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTBuilder.java 58320c73aafbfeec025f52ee813b3cfd06fa0821 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTConverter.java a217d70e48da0835fed3565ba510bcc9e86c0fa1
[jira] [Updated] (HIVE-8948) TestStreaming is flaky
[ https://issues.apache.org/jira/browse/HIVE-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8948: - Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Patch checked in. Thanks Eugene for the review. TestStreaming is flaky -- Key: HIVE-8948 URL: https://issues.apache.org/jira/browse/HIVE-8948 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.15.0 Attachments: HIVE-8948.patch TestStreaming seems to fail in one of its tests or another about 1 in 50 times. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
[ https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-8974: - Description: CLEAR LIBRARY CACHE Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before:after mapping. This task is to upgrade to the version of Calcite that has the renamed packages. There is a 1.0.0-SNAPSHOT in Apache nexus. Calcite functionality has not changed significantly, so it should be straightforward to rename. This task should be completed ASAP, before Calcite moves on. was: Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before:after mapping. This task is to upgrade to the version of Calcite that has the renamed packages. There is a 1.0.0-SNAPSHOT in Apache nexus. Calcite functionality has not changed significantly, so it should be straightforward to rename. This task should be completed ASAP, before Calcite moves on. Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames) Key: HIVE-8974 URL: https://issues.apache.org/jira/browse/HIVE-8974 Project: Hive Issue Type: Task Affects Versions: 0.15.0 Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Fix For: 0.15.0 Attachments: HIVE-8974.01.patch, HIVE-8974.patch CLEAR LIBRARY CACHE Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before:after mapping. This task is to upgrade to the version of Calcite that has the renamed packages. There is a 1.0.0-SNAPSHOT in Apache nexus. Calcite functionality has not changed significantly, so it should be straightforward to rename. This task should be completed ASAP, before Calcite moves on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
[ https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230718#comment-14230718 ] Laljo John Pullokkaran commented on HIVE-8974: -- The failure may be due to QA not clearing the local mvn repo. I have updated your bug description, which should prompt the QA run to clear the cache. Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames) Key: HIVE-8974 URL: https://issues.apache.org/jira/browse/HIVE-8974 Project: Hive Issue Type: Task Affects Versions: 0.15.0 Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Fix For: 0.15.0 Attachments: HIVE-8974.01.patch, HIVE-8974.patch CLEAR LIBRARY CACHE Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before:after mapping. This task is to upgrade to the version of Calcite that has the renamed packages. There is a 1.0.0-SNAPSHOT in Apache nexus. Calcite functionality has not changed significantly, so it should be straightforward to rename. This task should be completed ASAP, before Calcite moves on.
[jira] [Commented] (HIVE-8947) HIVE-8876 also affects Postgres 9.2
[ https://issues.apache.org/jira/browse/HIVE-8947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230726#comment-14230726 ] Sergey Shelukhin commented on HIVE-8947: [~vikram.dixit] ok for 14.1? HIVE-8876 also affects Postgres 9.2 - Key: HIVE-8947 URL: https://issues.apache.org/jira/browse/HIVE-8947 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.15.0 Attachments: HIVE-8947.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9002) union all does not generate correct result for order by and limit
Pengcheng Xiong created HIVE-9002: - Summary: union all does not generate correct result for order by and limit Key: HIVE-9002 URL: https://issues.apache.org/jira/browse/HIVE-9002 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong
Right now if we have
select col from A union all select col from B [Operator]
it is treated as
(select col from A) union all (select col from B [Operator])
Although this is correct for the where, group by (having), and join operators, it is not correct for order by and limit. Those should be treated as
(select col from A union all select col from B) [order by, limit]
[jira] [Updated] (HIVE-9002) union all does not generate correct result for order by and limit
[ https://issues.apache.org/jira/browse/HIVE-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9002: -- Description:
Right now if we have
select col from A union all select col from B [Operator]
it is treated as
(select col from A) union all (select col from B [Operator])
Although this is correct for the where, group by (having), and join operators, it is not correct for order by and limit. Those should be treated as
(select col from A union all select col from B) [order by, limit]
For order by, we can refer to MySQL, Oracle, and DB2:
mysql http://dev.mysql.com/doc/refman/5.1/en/union.html
oracle https://docs.oracle.com/cd/E17952_01/refman-5.0-en/union.html
ibm http://www-01.ibm.com/support/knowledgecenter/ssw_i5_54/sqlp/rbafykeyu.htm

was: Right now if we have select col from A union all select col from B [Operator] it is treated as (select col from A) union all (select col from B [Operator]) Although it is correct for where, group by (having) join operators, it is not correct for order by and limit operators. They should be (select col from A union all select col from B) [order by, limit]

union all does not generate correct result for order by and limit - Key: HIVE-9002 URL: https://issues.apache.org/jira/browse/HIVE-9002 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Right now if we have select col from A union all select col from B [Operator] it is treated as (select col from A) union all (select col from B [Operator]) Although this is correct for the where, group by (having), and join operators, it is not correct for order by and limit. Those should be treated as (select col from A union all select col from B) [order by, limit] For order by, we can refer to MySQL, Oracle, and DB2: mysql http://dev.mysql.com/doc/refman/5.1/en/union.html oracle https://docs.oracle.com/cd/E17952_01/refman-5.0-en/union.html ibm http://www-01.ibm.com/support/knowledgecenter/ssw_i5_54/sqlp/rbafykeyu.htm
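The intended binding can be checked against a standard SQL engine. The sketch below is an illustration only (it uses Python's sqlite3 as a reference implementation of standard semantics, not Hive): a trailing order by/limit applies to the combined result of both union all branches.

```python
# Illustration (not Hive code): in standard SQL, a trailing ORDER BY / LIMIT
# binds to the whole UNION ALL, which is the behavior this report asks for.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A(col INT);
    CREATE TABLE B(col INT);
    INSERT INTO A VALUES (5), (3);
    INSERT INTO B VALUES (4), (1);
""")

# Parsed as: (select col from A union all select col from B) ORDER BY col LIMIT 3
rows = conn.execute(
    "SELECT col FROM A UNION ALL SELECT col FROM B ORDER BY col LIMIT 3"
).fetchall()
print([r[0] for r in rows])  # globally sorted across both tables: [1, 3, 4]
```

If the order by/limit were instead attached only to the second branch, as in the behavior described above, the result would be B's rows sorted and truncated, with A's rows prepended unsorted.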
[jira] [Commented] (HIVE-9002) union all does not generate correct result for order by and limit
[ https://issues.apache.org/jira/browse/HIVE-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230734#comment-14230734 ] Pengcheng Xiong commented on HIVE-9002: --- Three candidate ways to fix it:
(1) fix it within HiveParser.g
(2) fix it in QB by rewriting
(3) partially reverse the patch of https://issues.apache.org/jira/browse/HIVE-6189 and use subqueries for union all
[~jpullokkaran], could you please take a look?
union all does not generate correct result for order by and limit - Key: HIVE-9002 URL: https://issues.apache.org/jira/browse/HIVE-9002 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Right now if we have select col from A union all select col from B [Operator] it is treated as (select col from A) union all (select col from B [Operator]) Although it is correct for where, group by (having) join operators, it is not correct for order by and limit operators. They should be (select col from A union all select col from B) [order by, limit] For order by, we can refer to MySQL, Oracle, DB2 mysql http://dev.mysql.com/doc/refman/5.1/en/union.html oracle https://docs.oracle.com/cd/E17952_01/refman-5.0-en/union.html ibm http://www-01.ibm.com/support/knowledgecenter/ssw_i5_54/sqlp/rbafykeyu.htm
[jira] [Created] (HIVE-9003) Vectorized IF expr broken for the scalar and scalar case
Matt McCline created HIVE-9003: -- Summary: Vectorized IF expr broken for the scalar and scalar case Key: HIVE-9003 URL: https://issues.apache.org/jira/browse/HIVE-9003 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 SELECT IF (bool_col, 'first', 'second') FROM ... is broken for Vectorization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
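As an illustration of the intended row-level semantics only (this is not Hive's vectorized execution code), the scalar/scalar case of IF reduces to selecting one of two constant branch values per row of the boolean column:

```python
# Plain-Python sketch of what IF(bool_col, 'first', 'second') should produce
# per row when both branches are scalars -- the case the report says is broken
# under vectorization. Function name is invented for illustration.
def if_scalar_scalar(bool_col, then_val, else_val):
    # One pass over the boolean vector, picking the appropriate constant
    # branch value for each row (NULL handling omitted for brevity).
    return [then_val if b else else_val for b in bool_col]

print(if_scalar_scalar([True, False, True], "first", "second"))
# ['first', 'second', 'first']
```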
[jira] [Updated] (HIVE-860) Persistent distributed cache
[ https://issues.apache.org/jira/browse/HIVE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-860: -- Attachment: HIVE-860.4.patch Reattaching, since the previous run failed with the error "No space left on device (28)". Persistent distributed cache Key: HIVE-860 URL: https://issues.apache.org/jira/browse/HIVE-860 Project: Hive Issue Type: Improvement Affects Versions: 0.12.0 Reporter: Zheng Shao Assignee: Ferdinand Xu Fix For: 0.15.0 Attachments: HIVE-860.1.patch, HIVE-860.2.patch, HIVE-860.2.patch, HIVE-860.3.patch, HIVE-860.4.patch, HIVE-860.4.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch DistributedCache is shared across multiple jobs if the hdfs file name is the same. We need to make sure Hive puts the same file into the same location every time and does not overwrite it if the file content is the same. We can achieve 2 different results: A1. Files added with the same name, timestamp, and md5 in the same session will have a single copy in distributed cache. A2. Files added with the same name, timestamp, and md5 will have a single copy in distributed cache. A2 has a bigger benefit in sharing but may raise a question on when Hive should clean it up in hdfs.
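A hedged sketch of goal A2 (the function, path layout, and values below are invented for illustration and are not taken from any attached patch): derive the cache location deterministically from the file's name, timestamp, and md5, so that the same file always maps to the same HDFS path and re-adding it reuses the existing copy.

```python
# Hypothetical sketch: a stable cache path keyed on (name, timestamp, md5),
# so identical files added in any session land in one location (goal A2)
# instead of being re-uploaded or overwritten.
import hashlib

def cache_path(name, mtime, md5_hex, root="/tmp/hive-cache"):
    # Identical (name, mtime, content-md5) triples yield identical paths.
    key = hashlib.sha256(f"{name}:{mtime}:{md5_hex}".encode()).hexdigest()
    return f"{root}/{key[:2]}/{key}/{name}"

p1 = cache_path("udf.jar", 1417478400, "d41d8cd98f00b204e9800998ecf8427e")
p2 = cache_path("udf.jar", 1417478400, "d41d8cd98f00b204e9800998ecf8427e")
assert p1 == p2  # same triple -> a single copy in the cache
```

The open question noted in the description remains with this scheme: nothing here decides when a path that no job references anymore should be cleaned up in HDFS.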
[jira] [Commented] (HIVE-8135) Pool zookeeper connections
[ https://issues.apache.org/jira/browse/HIVE-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230762#comment-14230762 ] Ferdinand Xu commented on HIVE-8135: I think so; see http://curator.apache.org/curator-recipes/index.html I will look into the details. Pool zookeeper connections -- Key: HIVE-8135 URL: https://issues.apache.org/jira/browse/HIVE-8135 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Today we create a ZK connection per client. We should instead have a connection pool.
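A minimal, library-agnostic sketch of the proposed change (this is not Curator's API; the class and names are invented for illustration): instead of constructing one connection per client, clients borrow from and return to a bounded pool.

```python
# Generic connection-pool sketch: a fixed number of connections are opened
# up front and shared, rather than one connection per client.
import queue

class ConnectionPool:
    def __init__(self, factory, size):
        self._pool = queue.Queue()
        for _ in range(size):          # pre-open a bounded set of connections
            self._pool.put(factory())

    def acquire(self):
        return self._pool.get()        # blocks when all connections are in use

    def release(self, conn):
        self._pool.put(conn)

# Usage with a dummy factory standing in for a ZooKeeper client constructor:
opened = []
pool = ConnectionPool(lambda: opened.append(1) or object(), size=2)
a, b = pool.acquire(), pool.acquire()
pool.release(a)
c = pool.acquire()                     # reuses the released connection
print(len(opened))  # 2 -> only `size` connections were ever created
```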
[jira] [Commented] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy
[ https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230774#comment-14230774 ] Hive QA commented on HIVE-9001: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12684492/HIVE-9001.1.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6695 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapjoin_mapjoin org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Delimited org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1941/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1941/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1941/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12684492 - PreCommit-HIVE-TRUNK-Build Ship with log4j.properties file that has a reliable time based rolling policy - Key: HIVE-9001 URL: https://issues.apache.org/jira/browse/HIVE-9001 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-9001.1.patch The hive log gets locked by the hive process and cannot be rolled in windows OS. 
To reproduce: install Hive on Windows, start Hive, and try to rename the hive log while Hive is running. Then wait until log4j tries to rename it; it will throw the same error, because the file is locked by the process. The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 should be integrated into Hive for a reliable rollover.
Re: Review Request 28500: HIVE-8943 : Fix memory limit check for combine nested mapjoins [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28500/ ---
(Updated Dec. 2, 2014, 1:34 a.m.)
Review request for hive, Chao Sun, Suhas Satish, and Xuefu Zhang.
Changes
---
Fixed the algorithm and cleaned up after discussion with Xuefu. The original code too aggressively incorporated connected mapjoins into its size calculation; the new code only looks at the big table's connected mapjoins.
Bugs: HIVE-8943
https://issues.apache.org/jira/browse/HIVE-8943
Repository: hive-git
Description
---
SparkMapJoinOptimizer by default combines nested mapjoins into one work, due to the removal of the RS for the big table. So we need to enhance the mapjoin check to calculate whether all the MapJoins in that work (spark-stage) will fit into memory; otherwise they might overwhelm the memory of that particular Spark executor.
Diffs (updated)
-
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 819eef1
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java 0c339a5
ql/src/test/queries/clientpositive/auto_join_stats.q PRE-CREATION
ql/src/test/queries/clientpositive/auto_join_stats2.q PRE-CREATION
ql/src/test/results/clientpositive/auto_join_stats.q.out PRE-CREATION
ql/src/test/results/clientpositive/auto_join_stats2.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/auto_join_stats.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/auto_join_stats2.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/28500/diff/
Testing
---
Added two unit tests:
1. auto_join_stats, which sets a memory limit and checks that the algorithm does not put more than one mapjoin in one BaseWork.
2. auto_join_stats2, which is the same query without a memory limit, and checks that the algorithm puts all mapjoins in one BaseWork because it can.
Thanks,
Szehon Ho
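The sizing rule described in the review can be sketched as follows (the function and field names here are invented for illustration, not taken from SparkMapJoinOptimizer): when nested mapjoins are combined into one work, sum only the small-table sizes of the mapjoins connected to the big table and compare the total against the memory threshold.

```python
# Hedged sketch of the memory-limit check: a combined work fits only if the
# small tables of the big table's connected mapjoins fit in memory together.
def fits_in_memory(big_table_connected_mapjoins, threshold_bytes):
    total = sum(mj["small_table_size"] for mj in big_table_connected_mapjoins)
    return total <= threshold_bytes

joins = [{"small_table_size": 40 * 1024**2},   # 40 MB small table
         {"small_table_size": 30 * 1024**2}]   # 30 MB small table
print(fits_in_memory(joins, threshold_bytes=100 * 1024**2))  # True: 70 MB fits
print(fits_in_memory(joins, threshold_bytes=50 * 1024**2))   # False: combine would overflow
```

Restricting the sum to the big table's connected mapjoins is what distinguishes the updated algorithm from the original, which counted every connected mapjoin and so rejected combinations that would in fact have fit.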
[jira] [Updated] (HIVE-8943) Fix memory limit check for combine nested mapjoins [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-8943: Attachment: HIVE-8943-4.spark.patch Fix memory limit check for combine nested mapjoins [Spark Branch] - Key: HIVE-8943 URL: https://issues.apache.org/jira/browse/HIVE-8943 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-8943-4.spark.patch, HIVE-8943.1-spark.patch, HIVE-8943.1-spark.patch, HIVE-8943.2-spark.patch, HIVE-8943.3-spark.patch Its the opposite problem of what we thought in HIVE-8701. SparkMapJoinOptimizer does combine nested mapjoins into one work due to removal of RS for big-table. So we need to enhance the check to calculate if all the MapJoins in that work (spark-stage) will fit into the memory, otherwise it might overwhelm memory for that particular spark executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8943) Fix memory limit check for combine nested mapjoins [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-8943: Attachment: HIVE-8943-4.spark.branch Fix algorithm and cleanup after discussion with Xuefu. Original code was too aggressively incorporating connected mapjoins into its size calculation, new code only looks at the big table's connected mapjoins. Fix memory limit check for combine nested mapjoins [Spark Branch] - Key: HIVE-8943 URL: https://issues.apache.org/jira/browse/HIVE-8943 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-8943-4.spark.patch, HIVE-8943.1-spark.patch, HIVE-8943.1-spark.patch, HIVE-8943.2-spark.patch, HIVE-8943.3-spark.patch Its the opposite problem of what we thought in HIVE-8701. SparkMapJoinOptimizer does combine nested mapjoins into one work due to removal of RS for big-table. So we need to enhance the check to calculate if all the MapJoins in that work (spark-stage) will fit into the memory, otherwise it might overwhelm memory for that particular spark executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8943) Fix memory limit check for combine nested mapjoins [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-8943: Attachment: (was: HIVE-8943-4.spark.branch) Fix memory limit check for combine nested mapjoins [Spark Branch] - Key: HIVE-8943 URL: https://issues.apache.org/jira/browse/HIVE-8943 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-8943-4.spark.patch, HIVE-8943.1-spark.patch, HIVE-8943.1-spark.patch, HIVE-8943.2-spark.patch, HIVE-8943.3-spark.patch Its the opposite problem of what we thought in HIVE-8701. SparkMapJoinOptimizer does combine nested mapjoins into one work due to removal of RS for big-table. So we need to enhance the check to calculate if all the MapJoins in that work (spark-stage) will fit into the memory, otherwise it might overwhelm memory for that particular spark executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8991) Fix custom_input_output_format [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230811#comment-14230811 ] Rui Li commented on HIVE-8991: -- Hi [~vanzin], just as [~xuefuz] said, this JIRA is only meant to fix the test {{custom_input_output_format.q}} after we enable unit tests with remote spark context. Please feel free to take it if you think of a better solution. Thanks! Fix custom_input_output_format [Spark Branch] - Key: HIVE-8991 URL: https://issues.apache.org/jira/browse/HIVE-8991 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8991.1-spark.patch After HIVE-8836, {{custom_input_output_format}} fails because of missing hive-it-util in remote driver's class path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8995) Find thread leak in RSC Tests
[ https://issues.apache.org/jira/browse/HIVE-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230812#comment-14230812 ] Rui Li commented on HIVE-8995: -- OK I'll have a look. Find thread leak in RSC Tests - Key: HIVE-8995 URL: https://issues.apache.org/jira/browse/HIVE-8995 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Rui Li I was regenerating output as part of the merge: {noformat} mvn test -Dtest=TestSparkCliDriver -Phadoop-2 -Dtest.output.overwrite=true -Dqfile=annotate_stats_join.q,auto_join0.q,auto_join1.q,auto_join10.q,auto_join11.q,auto_join12.q,auto_join13.q,auto_join14.q,auto_join15.q,auto_join16.q,auto_join17.q,auto_join18.q,auto_join18_multi_distinct.q,auto_join19.q,auto_join2.q,auto_join20.q,auto_join21.q,auto_join22.q,auto_join23.q,auto_join24.q,auto_join26.q,auto_join27.q,auto_join28.q,auto_join29.q,auto_join3.q,auto_join30.q,auto_join31.q,auto_join32.q,auto_join9.q,auto_join_reordering_values.q auto_join_without_localtask.q,auto_smb_mapjoin_14.q,auto_sortmerge_join_1.q,auto_sortmerge_join_10.q,auto_sortmerge_join_11.q,auto_sortmerge_join_12.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,auto_sortmerge_join_2.q,auto_sortmerge_join_3.q,auto_sortmerge_join_4.q,auto_sortmerge_join_5.q,auto_sortmerge_join_6.q,auto_sortmerge_join_7.q,auto_sortmerge_join_8.q,auto_sortmerge_join_9.q,bucket_map_join_1.q,bucket_map_join_2.q,bucket_map_join_tez1.q,bucket_map_join_tez2.q,bucketmapjoin1.q,bucketmapjoin10.q,bucketmapjoin11.q,bucketmapjoin12.q,bucketmapjoin13.q,bucketmapjoin2.q,bucketmapjoin3.q,bucketmapjoin4.q,bucketmapjoin5.q,bucketmapjoin7.q 
bucketmapjoin8.q,bucketmapjoin9.q,bucketmapjoin_negative.q,bucketmapjoin_negative2.q,bucketmapjoin_negative3.q,column_access_stats.q,cross_join.q,ctas.q,custom_input_output_format.q,groupby4.q,groupby7_noskew_multi_single_reducer.q,groupby_complex_types.q,groupby_complex_types_multi_single_reducer.q,groupby_multi_single_reducer2.q,groupby_multi_single_reducer3.q,groupby_position.q,groupby_sort_1_23.q,groupby_sort_skew_1_23.q,having.q,index_auto_self_join.q,infer_bucket_sort_convert_join.q,innerjoin.q,input12.q,join0.q,join1.q,join11.q,join12.q,join13.q,join14.q,join15.q join17.q,join18.q,join18_multi_distinct.q,join19.q,join2.q,join20.q,join21.q,join22.q,join23.q,join25.q,join26.q,join27.q,join28.q,join29.q,join3.q,join30.q,join31.q,join32.q,join32_lessSize.q,join33.q,join35.q,join36.q,join37.q,join38.q,join39.q,join40.q,join41.q,join9.q,join_alt_syntax.q,join_cond_pushdown_1.q join_cond_pushdown_2.q,join_cond_pushdown_3.q,join_cond_pushdown_4.q,join_cond_pushdown_unqual1.q,join_cond_pushdown_unqual2.q,join_cond_pushdown_unqual3.q,join_cond_pushdown_unqual4.q,join_filters_overlap.q,join_hive_626.q,join_map_ppr.q,join_merge_multi_expressions.q,join_merging.q,join_nullsafe.q,join_rc.q,join_reorder.q,join_reorder2.q,join_reorder3.q,join_reorder4.q,join_star.q,join_thrift.q,join_vc.q,join_view.q,limit_pushdown.q,load_dyn_part13.q,load_dyn_part14.q,louter_join_ppr.q,mapjoin1.q,mapjoin_decimal.q,mapjoin_distinct.q,mapjoin_filter_on_outerjoin.q mapjoin_hook.q,mapjoin_mapjoin.q,mapjoin_memcheck.q,mapjoin_subquery.q,mapjoin_subquery2.q,mapjoin_test_outer.q,mergejoins.q,mergejoins_mixed.q,multi_insert.q,multi_insert_gby.q,multi_insert_gby2.q,multi_insert_gby3.q,multi_insert_lateral_view.q,multi_insert_mixed.q,multi_insert_move_tasks_share_dependencies.q,multi_join_union.q,optimize_nullscan.q,outer_join_ppr.q,parallel.q,parallel_join0.q,parallel_join1.q,parquet_join.q,pcr.q,ppd_gby_join.q,ppd_join.q,ppd_join2.q,ppd_join3.q,ppd_join4.q,ppd_join5.q,ppd_join_filter.q 
ppd_multi_insert.q,ppd_outer_join1.q,ppd_outer_join2.q,ppd_outer_join3.q,ppd_outer_join4.q,ppd_outer_join5.q,ppd_transform.q,reduce_deduplicate_exclude_join.q,router_join_ppr.q,sample10.q,sample8.q,script_pipe.q,semijoin.q,skewjoin.q,skewjoin_noskew.q,skewjoin_union_remove_1.q,skewjoin_union_remove_2.q,skewjoinopt1.q,skewjoinopt10.q,skewjoinopt11.q,skewjoinopt12.q,skewjoinopt13.q,skewjoinopt14.q,skewjoinopt15.q,skewjoinopt16.q,skewjoinopt17.q,skewjoinopt18.q,skewjoinopt19.q,skewjoinopt2.q,skewjoinopt20.q skewjoinopt3.q,skewjoinopt4.q,skewjoinopt5.q,skewjoinopt6.q,skewjoinopt7.q,skewjoinopt8.q,skewjoinopt9.q,smb_mapjoin9.q,smb_mapjoin_1.q,smb_mapjoin_10.q,smb_mapjoin_13.q,smb_mapjoin_14.q,smb_mapjoin_15.q,smb_mapjoin_16.q,smb_mapjoin_17.q,smb_mapjoin_2.q,smb_mapjoin_25.q,smb_mapjoin_3.q,smb_mapjoin_4.q,smb_mapjoin_5.q,smb_mapjoin_6.q,smb_mapjoin_7.q,sort_merge_join_desc_1.q,sort_merge_join_desc_2.q,sort_merge_join_desc_3.q,sort_merge_join_desc_4.q,sort_merge_join_desc_5.q,sort_merge_join_desc_6.q,sort_merge_join_desc_7.q,sort_merge_join_desc_8.q
[jira] [Updated] (HIVE-8774) CBO: enable groupBy index
[ https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-8774: -- Status: Open (was: Patch Available) CBO: enable groupBy index - Key: HIVE-8774 URL: https://issues.apache.org/jira/browse/HIVE-8774 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we are trying to make CBO use the groupby index that we build.
[jira] [Updated] (HIVE-8774) CBO: enable groupBy index
[ https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-8774: -- Attachment: HIVE-8774.11.patch Addressing [~jpullokkaran]'s comments: remove support for constants and functions inside the parameters of count. CBO: enable groupBy index - Key: HIVE-8774 URL: https://issues.apache.org/jira/browse/HIVE-8774 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, HIVE-8774.11.patch, HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we are trying to make CBO use the groupby index that we build.
[jira] [Updated] (HIVE-8774) CBO: enable groupBy index
[ https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-8774: -- Status: Patch Available (was: Open) CBO: enable groupBy index - Key: HIVE-8774 URL: https://issues.apache.org/jira/browse/HIVE-8774 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, HIVE-8774.11.patch, HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we are trying to make CBO use the groupby index that we build.