[jira] [Commented] (HIVE-8915) Log file explosion due to non-existence of COMPACTION_QUEUE table
[ https://issues.apache.org/jira/browse/HIVE-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528428#comment-14528428 ]

Hive QA commented on HIVE-8915:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12729641/HIVE-8915.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 8895 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3733/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3733/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3733/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12729641 - PreCommit-HIVE-TRUNK-Build

Log file explosion due to non-existence of COMPACTION_QUEUE table
-----------------------------------------------------------------
Key: HIVE-8915
URL: https://issues.apache.org/jira/browse/HIVE-8915
Project: Hive
Issue Type: Bug
Components: Transactions
Affects Versions: 0.14.0, 0.15.0, 0.14.1
Reporter: Sushanth Sowmyan
Assignee: Alan Gates
Attachments: HIVE-8915.patch

I hit an issue with a fresh setup of hive in a vm, where the db tables specified by hive-txn-schema-0.14.0.mysql.sql had not been created. On metastore startup, I got an endless loop of errors being written to the log file, which caused the log file to grow to 1.7GB in 5 minutes, with 950k copies of the same error stack trace in it before I realized what was happening and killed it. We should either have a delay of sorts to make sure we don't endlessly respin on that error so quickly, or we should error out and fail if we're not able to start. The stack trace in question is as follows:

{noformat}
2014-11-19 01:44:57,654 ERROR compactor.Cleaner (Cleaner.java:run(143)) - Caught an exception in the main loop of compactor cleaner, MetaException(message:Unable to connect to transaction database com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'hive.COMPACTION_QUEUE' doesn't exist
	at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
	at com.mysql.jdbc.Util.getInstance(Util.java:386)
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2569)
	at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1524)
	at com.jolbox.bonecp.StatementHandle.executeQuery(StatementHandle.java:464)
	at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:266)
	at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86)
)
	at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:291)
	at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86)
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
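The "delay of sorts" suggested above is essentially retry backoff: when the cleaner loop keeps hitting the same error, the time between attempts should grow instead of staying near zero. A minimal, dependency-free sketch of capped exponential backoff; the class, constants, and method names are illustrative assumptions, not Hive's actual Cleaner code.

```java
/**
 * Illustrative capped exponential backoff for a failing background loop.
 * Not Hive's actual Cleaner; names and constants are assumptions.
 */
public class BackoffLoop {
    static final long BASE_MS = 1000;            // first retry after 1 second
    static final long MAX_MS = 5 * 60 * 1000;    // never wait more than 5 minutes

    /** Sleep time before the next retry, given consecutive failures so far. */
    static long nextDelayMs(int consecutiveFailures) {
        long delay = BASE_MS << Math.min(consecutiveFailures, 20); // bound the shift
        return Math.min(delay, MAX_MS);
    }

    public static void main(String[] args) {
        // With this schedule the same stack trace can no longer be logged
        // ~950k times in 5 minutes; after a few failures it appears at most
        // once per MAX_MS.
        for (int f = 0; f < 10; f++) {
            System.out.println("failures=" + f + " -> wait " + nextDelayMs(f) + " ms");
        }
    }
}
```

The alternative the reporter mentions (fail fast on startup) is simpler still: let the first MetaException propagate out of the loop instead of catching and retrying.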
[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528348#comment-14528348 ]

Hive QA commented on HIVE-9736:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12730299/HIVE-9736.6.patch

{color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 8895 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc
org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3731/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3731/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3731/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 24 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12730299 - PreCommit-HIVE-TRUNK-Build

StorageBasedAuthProvider should batch namenode-calls where possible.
--------------------------------------------------------------------
Key: HIVE-9736
URL: https://issues.apache.org/jira/browse/HIVE-9736
Project: Hive
Issue Type: Bug
Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch

Consider a table partitioned by 2 keys (dt, region). Say a dt partition could have many associated regions. Consider that the user does:

{code:sql}
ALTER TABLE my_table DROP PARTITION (dt='20150101');
{code}

As things stand now, {{StorageBasedAuthProvider}} will make individual {{DistributedFileSystem.listStatus()}} calls for each partition-directory, and authorize each one separately. It'd be faster to batch the calls, and examine multiple FileStatus objects at once.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
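Hadoop's FileSystem API does expose a bulk form, listStatus(Path[] files), so the main work of the change is grouping the per-partition directories into chunks and issuing one call per chunk. A dependency-free sketch of the chunking step; the class name and batch size are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a large list (e.g. partition directories) into fixed-size chunks,
 *  so each chunk can be sent to the namenode in a single bulk call. */
public class Batcher {
    public static <T> List<List<T>> batches(List<T> items, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            out.add(new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> dirs = new ArrayList<>();
        for (int r = 0; r < 1000; r++) dirs.add("/warehouse/my_table/dt=20150101/region=" + r);
        // 10 bulk listStatus calls instead of 1000 individual RPCs
        System.out.println(batches(dirs, 100).size());
    }
}
```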
[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)
[ https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reuben Kuhnert updated HIVE-10190:
----------------------------------
Attachment: HIVE-10190.09.patch

CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)
---------------------------------------------------------------------------------------
Key: HIVE-10190
URL: https://issues.apache.org/jira/browse/HIVE-10190
Project: Hive
Issue Type: Bug
Components: CBO
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Reuben Kuhnert
Priority: Trivial
Labels: perfomance
Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, HIVE-10190.07.patch, HIVE-10190.08.patch, HIVE-10190.09.patch

{code}
public static boolean validateASTForUnsupportedTokens(ASTNode ast) {
  String astTree = ast.toStringTree();
  // if any of following tokens are present in AST, bail out
  String[] tokens = { TOK_CHARSETLITERAL, TOK_TABLESPLITSAMPLE };
  for (String token : tokens) {
    if (astTree.contains(token)) {
      return false;
    }
  }
  return true;
}
{code}

This is an issue for a SQL query which is bigger in AST form (~700kb) than in text.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
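The snippet above renders the entire AST to a string and then scans that string once per forbidden token, which is what hurts on a ~700 KB tree. Walking the tree directly short-circuits on the first hit and never materializes the string. A self-contained sketch with a stand-in Node class (Hive's real ASTNode API differs; names here are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Checks for unsupported tokens by walking the tree directly.
 *  Node is a stand-in for Hive's ASTNode. */
public class TokenCheck {
    static class Node {
        final String token;
        final List<Node> children;
        Node(String token, Node... children) {
            this.token = token;
            this.children = Arrays.asList(children);
        }
    }

    /** True if any node carries a forbidden token; stops at the first match. */
    static boolean containsToken(Node n, Set<String> forbidden) {
        if (forbidden.contains(n.token)) return true;
        for (Node c : n.children) {
            if (containsToken(c, forbidden)) return true;
        }
        return false;
    }
}
```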
[jira] [Resolved] (HIVE-10552) hive 1.1.0 rename column fails: Invalid method name: 'alter_table_with_cascade'
[ https://issues.apache.org/jira/browse/HIVE-10552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Watzke resolved HIVE-10552.
---------------------------------
Resolution: Invalid

You're right, I haven't done that. Thanks for the tip. I don't have time right now to test it out, but let's close this bug and I'll reopen it in case it doesn't help - but it will ;)

hive 1.1.0 rename column fails: Invalid method name: 'alter_table_with_cascade'
-------------------------------------------------------------------------------
Key: HIVE-10552
URL: https://issues.apache.org/jira/browse/HIVE-10552
Project: Hive
Issue Type: Bug
Components: Database/Schema
Affects Versions: 1.1.0
Environment: centos 6.6, cloudera 5.3.3
Reporter: David Watzke
Assignee: Chaoyu Tang
Priority: Blocker

Hi, we're trying out hive 1.1.0 with cloudera 5.3.3, and since hive 1.0.0 there's (what appears to be) a regression. This ALTER command that renames a table column used to work fine in older versions, but in hive 1.1.0 it throws this error:

{noformat}
hive> CREATE TABLE test_change (a int, b int, c int);
OK
Time taken: 2.303 seconds
hive> ALTER TABLE test_change CHANGE a a1 INT;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. Invalid method name: 'alter_table_with_cascade'
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10521) TxnHandler.timeOutTxns only times out some of the expired transactions
[ https://issues.apache.org/jira/browse/HIVE-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528353#comment-14528353 ]

Hive QA commented on HIVE-10521:
--------------------------------

{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12729817/HIVE-10521.3.patch

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3732/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3732/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3732/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-3732/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 3f72f81 HIVE-5545 : HCatRecord getInteger method returns String when used on Partition columns of type INT (Sushanth Sowmyan, reviewed by Jason Dere)
+ git clean -f -d
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at 3f72f81 HIVE-5545 : HCatRecord getInteger method returns String when used on Partition columns of type INT (Sushanth Sowmyan, reviewed by Jason Dere)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12729817 - PreCommit-HIVE-TRUNK-Build

TxnHandler.timeOutTxns only times out some of the expired transactions
----------------------------------------------------------------------
Key: HIVE-10521
URL: https://issues.apache.org/jira/browse/HIVE-10521
Project: Hive
Issue Type: Bug
Components: Transactions
Affects Versions: 0.14.0, 1.0.0, 1.1.0
Reporter: Alan Gates
Assignee: Alan Gates
Attachments: HIVE-10521.2.patch, HIVE-10521.3.patch, HIVE-10521.patch

{code}
for (int i = 0; i < 20 && rs.next(); i++) deadTxns.add(rs.getLong(1));
// We don't care whether all of the transactions get deleted or not,
// if some didn't it most likely means someone else deleted them in the interim
if (deadTxns.size() > 0) abortTxns(dbConn, deadTxns);
{code}

While it makes sense to limit the number of transactions aborted in one pass (since this gets translated to an IN clause), we should still make sure all are timed out. Also, 20 seems pretty small as a batch size.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
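The fix the description asks for is to keep draining expired transactions in bounded batches until none remain, rather than aborting at most 20 and stopping. A dependency-free sketch of that loop shape; the class name, batch size, and method names are illustrative assumptions, not TxnHandler's actual code:

```java
import java.util.List;

/** Aborts every expired transaction, a bounded batch at a time, so the
 *  IN clause stays small but nothing is left timed-out forever. */
public class TimeOutAllTxns {
    static final int BATCH_SIZE = 1000; // illustrative; the point is "loop until done"

    /** Returns how many ids were aborted; stands in for repeated abortTxns calls. */
    static int abortAllInBatches(List<Long> expiredTxnIds) {
        int aborted = 0;
        for (int i = 0; i < expiredTxnIds.size(); i += BATCH_SIZE) {
            List<Long> batch = expiredTxnIds.subList(i, Math.min(i + BATCH_SIZE, expiredTxnIds.size()));
            // here TxnHandler would issue one bounded call: abortTxns(dbConn, batch)
            aborted += batch.size();
        }
        return aborted;
    }
}
```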
[jira] [Commented] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.
[ https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528436#comment-14528436 ]

Aihua Xu commented on HIVE-10454:
---------------------------------

It makes sense. It seems I misunderstood Xuefu's point earlier. I will resolve as won't fix then.

Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.
---------------------------------------------------------------------------------------------------------------------------------
Key: HIVE-10454
URL: https://issues.apache.org/jira/browse/HIVE-10454
Project: Hive
Issue Type: Bug
Reporter: Aihua Xu
Assignee: Aihua Xu
Attachments: HIVE-10454.2.patch, HIVE-10454.patch

The following queries fail:

{noformat}
create table t1 (c1 int) PARTITIONED BY (c2 string);
set hive.mapred.mode=strict;
select * from t1 where t1.c2 > to_date(date_add(from_unixtime( unix_timestamp() ),1));
{noformat}

The query failed with No partition predicate found for alias t1.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10526) CBO (Calcite Return Path): HiveCost epsilon comparison should take row count in to account
[ https://issues.apache.org/jira/browse/HIVE-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529502#comment-14529502 ]

Laljo John Pullokkaran commented on HIVE-10526:
-----------------------------------------------

Uploaded a modified patch last week. For some reason the QA run didn't kick in.

CBO (Calcite Return Path): HiveCost epsilon comparison should take row count in to account
------------------------------------------------------------------------------------------
Key: HIVE-10526
URL: https://issues.apache.org/jira/browse/HIVE-10526
Project: Hive
Issue Type: Sub-task
Components: CBO
Affects Versions: 0.12.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
Fix For: 1.2.0
Attachments: HIVE-10526.1.patch, HIVE-10526.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-9392:
----------------------------------
Attachment: (was: HIVE-9392.01.patch)

JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
------------------------------------------------------------------------------------------------------------------------
Key: HIVE-9392
URL: https://issues.apache.org/jira/browse/HIVE-9392
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Critical
Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch

In JoinStatsRule.process, the join column statistics are stored in the HashMap joinedColStats, keyed by ColStatistics.fqColName. This key is duplicated between join columns in the same vertex; as a result, distinctVals ends up with duplicated values, which negatively affects the join cardinality estimate. The duplicate keys are usually named KEY.reducesinkkey0.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-9392:
----------------------------------
Attachment: HIVE-9392.3.patch

JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
------------------------------------------------------------------------------------------------------------------------
Key: HIVE-9392
URL: https://issues.apache.org/jira/browse/HIVE-9392
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Critical
Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch

In JoinStatsRule.process, the join column statistics are stored in the HashMap joinedColStats, keyed by ColStatistics.fqColName. This key is duplicated between join columns in the same vertex; as a result, distinctVals ends up with duplicated values, which negatively affects the join cardinality estimate. The duplicate keys are usually named KEY.reducesinkkey0.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529626#comment-14529626 ]

Pengcheng Xiong commented on HIVE-9392:
---------------------------------------

Renamed the patch to get a QA run.

JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
------------------------------------------------------------------------------------------------------------------------
Key: HIVE-9392
URL: https://issues.apache.org/jira/browse/HIVE-9392
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Critical
Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch

In JoinStatsRule.process, the join column statistics are stored in the HashMap joinedColStats, keyed by ColStatistics.fqColName. This key is duplicated between join columns in the same vertex; as a result, distinctVals ends up with duplicated values, which negatively affects the join cardinality estimate. The duplicate keys are usually named KEY.reducesinkkey0.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
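The effect of the duplicated fqColName can be reproduced with a plain HashMap: a second put under the same "KEY.reducesinkkey0" key silently overwrites the first column's statistics. A minimal demonstration, plus one possible shape of a fix (qualifying the key per operator); the maps, values, and "RS_n" prefixes are illustrative, not Hive's actual structures:

```java
import java.util.HashMap;
import java.util.Map;

/** Shows the collision pattern behind the misestimated join cardinality. */
public class FqColNameCollision {
    public static void main(String[] args) {
        Map<String, Long> joinedColStats = new HashMap<>();
        joinedColStats.put("KEY.reducesinkkey0", 1000L); // NDV of one join column
        joinedColStats.put("KEY.reducesinkkey0", 5L);    // other column, same key: overwritten
        System.out.println(joinedColStats.size());       // one entry; one NDV is lost

        // One possible fix shape: make the key unique per operator/vertex.
        Map<String, Long> qualified = new HashMap<>();
        qualified.put("RS_1:KEY.reducesinkkey0", 1000L);
        qualified.put("RS_2:KEY.reducesinkkey0", 5L);
        System.out.println(qualified.size());            // both NDVs survive
    }
}
```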
[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529676#comment-14529676 ]

Thejas M Nair commented on HIVE-7018:
-------------------------------------

I think the change here was in the right direction; however, it breaks the preferred way to upgrade hive (using schematool). This is a release blocker for 1.2.0. A patch to revert the changes here has been uploaded to HIVE-10614. I think we should go ahead with that, and reopen this jira after it is committed. Once the schematool/beeline breakage is fixed, this change can go back into hive.

Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
-------------------------------------------------------------------------------------
Key: HIVE-7018
URL: https://issues.apache.org/jira/browse/HIVE-7018
Project: Hive
Issue Type: Bug
Reporter: Brock Noland
Assignee: Yongzhi Chen
Fix For: 1.2.0
Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch

It appears that at least postgres and oracle do not have the LINK_TARGET_ID column while mysql does.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate
[ https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529716#comment-14529716 ]

Chaoyu Tang commented on HIVE-9534:
-----------------------------------

Oracle 11.2 treats avg(distinct tsint.csint) over () as an analytic function instead of an aggregate function, so the query returns 4 rows of 2.5. Note that there is no order by clause or window clause inside the parentheses of over. Could you try a query like select avg(distinct tsint.csint) over (order by rnum rows between 1 preceding and 1 following) from tsint to see if it works in Oracle 12c? It did not work in 11.2.

incorrect result set for query that projects a windowed aggregate
-----------------------------------------------------------------
Key: HIVE-9534
URL: https://issues.apache.org/jira/browse/HIVE-9534
Project: Hive
Issue Type: Bug
Components: SQL
Reporter: N Campbell
Assignee: Chaoyu Tang

Result set returned by Hive has one row instead of 5

{code}
select avg(distinct tsint.csint) over () from tsint

create table if not exists TSINT (RNUM int , CSINT smallint)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE;

0|\N
1|-1
2|0
3|1
4|10
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly
[ https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529572#comment-14529572 ] Sushanth Sowmyan commented on HIVE-10565: - Hi Matt, who would be the ideal person to review this patch? LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly Key: HIVE-10565 URL: https://issues.apache.org/jira/browse/HIVE-10565 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.2.0, 1.3.0 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, HIVE-10565.06.patch Filtering can knock out some of the rows for a repeated key, but those knocked out rows need to be included in the LEFT OUTER JOIN result and are currently not when only some rows are filtered out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
[ https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10614:
-----------------------------------------------------
Attachment: HIVE-10614.1.patch

This happens because schematool runs via beeline, and when there is a ; in the command, beeline interprets it as the command terminator. Stored procedures use ; as a delimiter between statements, so the entire stored procedure does not get sent to mysql as a single command, hence the above error. I am uploading a patch to back out the fix for HIVE-7018 for now. Once we have the fix for HIVE-7018 working with schematool, we can add it back; that can be done via a follow-up jira. cc-ing [~sushanth], [~thejas] for reviewing the change. Thanks, Hari

schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
------------------------------------------------------
Key: HIVE-10614
URL: https://issues.apache.org/jira/browse/HIVE-10614
Project: Hive
Issue Type: Bug
Components: Metastore
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Critical
Attachments: HIVE-10614.1.patch

./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose

{code}
| HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from Mysql for other DBs do not have it |
1 row selected (0.004 seconds)
0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS RM_TLBS_LINKID
No rows affected (0.005 seconds)
0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS RM_PARTITIONS_LINKID
No rows affected (0.006 seconds)
0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS RM_LINKID
No rows affected (0.002 seconds)
0: jdbc:mysql://node-1.example.com/hive> CREATE PROCEDURE RM_TLBS_LINKID() BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END
Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1 (state=42000,code=1064)
Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true
org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !!
org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !!
	at org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229)
	at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Schema script failed, errorcode 2
	at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355)
	at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326)
	at org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224)
{code}

Looks like HIVE-7018 introduced a stored procedure as part of the mysql upgrade script, and it is causing issues with the schematool upgrade.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529775#comment-14529775 ]

Matt McCline commented on HIVE-9743:
------------------------------------

[~vikram.dixit] Ok, SMB removed. I think this one is good to go as soon as the Apache tests pass.

Incorrect result set for vectorized left outer join
---------------------------------------------------
Key: HIVE-9743
URL: https://issues.apache.org/jira/browse/HIVE-9743
Project: Hive
Issue Type: Bug
Components: SQL
Affects Versions: 0.14.0
Reporter: N Campbell
Assignee: Matt McCline
Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, HIVE-9743.06.patch, HIVE-9743.08.patch, HIVE-9743.09.patch

This query is supposed to return 3 rows and will when run without Tez, but returns 2 rows when run with Tez.

{noformat}
select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2
from tjoin1 left outer join tjoin2
on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 > 15 )
{noformat}

Actual:
{noformat}
tjoin1.rnum  tjoin1.c1  tjoin1.c2  c2j2
1            20         25         null
2            null       50         null
{noformat}

instead of:
{noformat}
tjoin1.rnum  tjoin1.c1  tjoin1.c2  c2j2
0            10         15         null
1            20         25         null
2            null       50         null
{noformat}

{noformat}
create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) STORED AS orc ;
0|10|15
1|20|25
2|\N|50

create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE ;
0|10|BB
1|15|DD
2|\N|EE
3|10|FF
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Slawski updated HIVE-10538:
---------------------------------
Attachment: HIVE-10538.2.patch

I've attached the second revision of the patch, which updates the failed Spark qtests.

Fix NPE in FileSinkOperator from hashcode mismatch
--------------------------------------------------
Key: HIVE-10538
URL: https://issues.apache.org/jira/browse/HIVE-10538
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 1.0.0, 1.2.0
Reporter: Peter Slawski
Assignee: Peter Slawski
Priority: Critical
Fix For: 1.2.0, 1.3.0
Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, HIVE-10538.1.patch, HIVE-10538.2.patch

A Null Pointer Exception occurs in FileSinkOperator when using bucketed tables and distribute by with multiFileSpray enabled. The following snippet query reproduces this issue:

{code}
set hive.enforce.bucketing = true;
set hive.exec.reducers.max = 20;

create table bucket_a(key int, value_a string) clustered by (key) into 256 buckets;
create table bucket_b(key int, value_b string) clustered by (key) into 256 buckets;
create table bucket_ab(key int, value_a string, value_b string) clustered by (key) into 256 buckets;

-- Insert data into bucket_a and bucket_b

insert overwrite table bucket_ab
select a.key, a.value_a, b.value_b
from bucket_a a join bucket_b b on (a.key = b.key)
distribute by key;
{code}

The following stack trace is logged.
{code}
2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer (ExecReducer.java:reduce(255)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}}
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
	... 8 more
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
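The "hashcode mismatch" in the title points at a classic invariant: the function that routes a row to a reducer and the function that picks the output writer must agree, and the index must be normalized to a non-negative value, or findWriterOffset can look up a writer slot that was never opened (hence the NPE). A minimal sketch of the safe normalization; this is the general pattern, not Hive's actual patch:

```java
/** Maps an arbitrary key hash to a bucket index in [0, numBuckets). */
public class BucketOffset {
    static int bucketFor(Object key, int numBuckets) {
        // mask the sign bit so negative hashCodes can't yield a negative index
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }
}
```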
[jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata
[ https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529566#comment-14529566 ] Sushanth Sowmyan commented on HIVE-9451: Hi, given the previous +1 pending tests, and tests having run, do the tests look okay to commit? Add max size of column dictionaries to ORC metadata --- Key: HIVE-9451 URL: https://issues.apache.org/jira/browse/HIVE-9451 Project: Hive Issue Type: Improvement Reporter: Owen O'Malley Assignee: Owen O'Malley Labels: ORC Fix For: 1.2.0 Attachments: HIVE-9451.patch, HIVE-9451.patch To predict the amount of memory required to read an ORC file we need to know the size of the dictionaries for the columns that we are reading. I propose adding the number of bytes for each column's dictionary to the stripe's column statistics. The file's column statistics would have the maximum dictionary size for each column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate
[ https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529608#comment-14529608 ] N Campbell commented on HIVE-9534: -- re your comment about ORACLE select avg(distinct tsint.csint) over () from tsint null, -1, 0, 1, 10 ORACLE Oracle Database 12c Enterprise Edition ( 12.1.0.2.0) returns 2.5, 2.5, 2.5, 2.5, 2.5 incorrect result set for query that projects a windowed aggregate - Key: HIVE-9534 URL: https://issues.apache.org/jira/browse/HIVE-9534 Project: Hive Issue Type: Bug Components: SQL Reporter: N Campbell Assignee: Chaoyu Tang Result set returned by Hive has one row instead of 5 {code} select avg(distinct tsint.csint) over () from tsint create table if not exists TSINT (RNUM int , CSINT smallint) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE; 0|\N 1|-1 2|0 3|1 4|10 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
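Editorial note: the per-row value Oracle reports can be checked directly. AVG(DISTINCT csint) ignores the NULL, collapses duplicates, and averages {-1, 0, 1, 10}; with OVER () the window is the whole table, so every one of the five rows should carry that one value. A minimal Java sketch of the arithmetic (the helper name is ours, not Hive's):

```java
import java.util.Arrays;

public class DistinctAvg {
    // Average of the distinct non-null values, as AVG(DISTINCT col) computes it.
    public static double avgDistinct(Integer[] vals) {
        return Arrays.stream(vals)
                .filter(v -> v != null)      // SQL aggregates ignore NULLs
                .distinct()                  // DISTINCT collapses duplicates
                .mapToInt(Integer::intValue)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // csint column from TSINT: null, -1, 0, 1, 10
        System.out.println(avgDistinct(new Integer[]{null, -1, 0, 1, 10})); // 2.5
    }
}
```

So the correct result set is five rows of 2.5, not the single row Hive returns.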
[jira] [Updated] (HIVE-10617) LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to partitioned locking
[ https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10617: Summary: LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to partitioned locking (was: fix allocator concurrency rarely causing spurious failure to allocate due to partitioned locking) LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to partitioned locking Key: HIVE-10617 URL: https://issues.apache.org/jira/browse/HIVE-10617 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin See HIVE-10482 and the comment in code. Simple case - thread can reserve memory from manager and bounce between checking arena 1 and arena 2 for memory as other threads allocate and deallocate from respective arenas in reverse order, making it look like there's no memory. More importantly this can happen when buddy blocks are split when lots of stuff is allocated. This can be solved either with some form of helping (esp. for split case) or by making allocator an actor (or set of actors, one per 1-N arenas that they would own), to satisfy alloc requests more deterministically (and also get rid of most sync). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529690#comment-14529690 ] Prasanth Jayachandran commented on HIVE-10538: -- The result difference seems to be an expected change because of hashcode difference. [~petersla] Can you put an updated patch by running the tests again with -Dtest.output.overwrite=true option? This will overwrite the q.out files. Fix NPE in FileSinkOperator from hashcode mismatch -- Key: HIVE-10538 URL: https://issues.apache.org/jira/browse/HIVE-10538 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.0.0, 1.2.0 Reporter: Peter Slawski Assignee: Peter Slawski Priority: Critical Fix For: 1.2.0, 1.3.0 Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, HIVE-10538.1.patch A Null Pointer Exception occurs when in FileSinkOperator when using bucketed tables and distribute by with multiFileSpray enabled. The following snippet query reproduces this issue: {code} set hive.enforce.bucketing = true; set hive.exec.reducers.max = 20; create table bucket_a(key int, value_a string) clustered by (key) into 256 buckets; create table bucket_b(key int, value_b string) clustered by (key) into 256 buckets; create table bucket_ab(key int, value_a string, value_b string) clustered by (key) into 256 buckets; -- Insert data into bucket_a and bucket_b insert overwrite table bucket_ab select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key = b.key) distribute by key; {code} The following stack trace is logged. 
{code} 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer (ExecReducer.java:reduce(255)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}} at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) ... 8 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
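Editorial note: the failure mode behind the NPE can be illustrated in isolation. If writers are created for slots computed under one hash scheme but looked up under another (as happens when the distribute-by partitioning hash and the bucketing hash disagree), the lookup lands on a slot for which no writer was ever created and returns null. A hypothetical simplification, not Hive's actual FileSinkOperator code; both hash functions below are stand-ins:

```java
import java.util.HashMap;
import java.util.Map;

public class WriterOffsetSketch {
    // Two hash schemes that should agree on a slot for a key, but don't.
    static int hashA(int key) { return key % 4; }          // scheme used at writer creation
    static int hashB(int key) { return (key * 31) % 4; }   // scheme used at lookup

    // Returns true when the lookup finds no writer for the key's slot,
    // i.e. the situation that leads to the NullPointerException.
    public static boolean lookupMisses(int key) {
        Map<Integer, String> writers = new HashMap<>();
        writers.put(hashA(key), "writer");       // writer created under scheme A
        return writers.get(hashB(key)) == null;  // fetched under scheme B: miss
    }
}
```

For key 113 (the row in the stack trace above), the two schemes pick different slots, so the lookup misses and the subsequent write dereferences null.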
[jira] [Updated] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9743: --- Attachment: HIVE-9743.09.patch Incorrect result set for vectorized left outer join --- Key: HIVE-9743 URL: https://issues.apache.org/jira/browse/HIVE-9743 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.14.0 Reporter: N Campbell Assignee: Matt McCline Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, HIVE-9743.06.patch, HIVE-9743.08.patch, HIVE-9743.09.patch This query is supposed to return 3 rows and will when run without Tez but returns 2 rows when run with Tez. select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 15 ) tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 1 20 25 null 2 null 50 null instead of tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 0 10 15 null 1 20 25 null 2 null 50 null create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) STORED AS orc ; 0|10|15 1|20|25 2|\N|50 create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE ; 0|10|BB 1|15|DD 2|\N|EE 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10591) Support limited integer type promotion in ORC
[ https://issues.apache.org/jira/browse/HIVE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529491#comment-14529491 ] Hive QA commented on HIVE-10591: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730595/HIVE-10591.2.patch {color:red}ERROR:{color} -1 due to 53 failed/errored test(s), 8901 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_non_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_tmp_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_no_match org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_non_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_update_delete org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_acid org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_all_non_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_tmp_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_where_no_match org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_where_non_partitioned org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys 
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_non_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_tmp_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_no_match org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_non_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_update_delete org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_all_non_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_tmp_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_where_no_match org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_where_non_partitioned org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc org.apache.hadoop.hive.ql.TestTxnCommands2.testBucketizedInputFormat org.apache.hadoop.hive.ql.TestTxnCommands2.testDeleteIn org.apache.hadoop.hive.ql.TestTxnCommands2.testUpdateMixedCase org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.majorCompactAfterAbort
[jira] [Commented] (HIVE-10506) CBO (Calcite Return Path): Disallow return path to be enabled if CBO is off
[ https://issues.apache.org/jira/browse/HIVE-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529489#comment-14529489 ] Laljo John Pullokkaran commented on HIVE-10506: --- +1 CBO (Calcite Return Path): Disallow return path to be enable if CBO is off -- Key: HIVE-10506 URL: https://issues.apache.org/jira/browse/HIVE-10506 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.2.0 Attachments: HIVE-10506.01.patch, HIVE-10506.patch If hive.cbo.enable=false and hive.cbo.returnpath=true then some optimizations would kick in. It's quite possible that in customer environment, they might end up in these scenarios; we should prevent it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-10564) webhcat should use webhcat-site.xml properties for controller job submission
[ https://issues.apache.org/jira/browse/HIVE-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reopened HIVE-10564: --- Unfortunately this has unexpected side effects. Every time a job is submitted, various properties are passed in cmd line using -Dfoo=bar This change causes AppConfig Configuration object to accumulate the union of all these properties so Job N+1 includes properties that belong previous jobs. for example, if you run a job with -D, templeton.statusdir=TestSqoop_1 and then another job that does not specify statusdir, the 2nd job will write to TestSqoop_1 this will cause a major problem webhcat should use webhcat-site.xml properties for controller job submission Key: HIVE-10564 URL: https://issues.apache.org/jira/browse/HIVE-10564 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-10564.1.patch webhcat should use webhcat-site.xml in configuration for the TempletonController map-only job that it launches. This will allow users to set any MR/hdfs properties that want to see used for the controller job. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
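Editorial note: the accumulation Eugene describes can be reproduced in miniature with a plain Properties object (a hypothetical stand-in for WebHCat's AppConfig, not the actual class): reusing one mutable config across submissions makes job N+1 inherit job N's -D overrides, whereas cloning a pristine base per job does not.

```java
import java.util.Properties;

public class ConfigReuseSketch {
    // Simulates submitting a job with some -Dkey=value overrides.
    static void submit(Properties conf, String... kv) {
        for (int i = 0; i < kv.length; i += 2) conf.setProperty(kv[i], kv[i + 1]);
    }

    public static String buggySecondJobStatusdir() {
        Properties shared = new Properties();                 // reused across jobs (the bug)
        submit(shared, "templeton.statusdir", "TestSqoop_1"); // job 1 sets statusdir
        submit(shared);                                       // job 2 sets nothing...
        return shared.getProperty("templeton.statusdir");     // ...but still sees TestSqoop_1
    }

    public static String fixedSecondJobStatusdir() {
        Properties base = new Properties();
        Properties job1 = (Properties) base.clone();          // fresh copy per job
        submit(job1, "templeton.statusdir", "TestSqoop_1");
        Properties job2 = (Properties) base.clone();
        submit(job2);
        return job2.getProperty("templeton.statusdir");       // null: no leakage
    }
}
```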
[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529623#comment-14529623 ] Pengcheng Xiong commented on HIVE-9392: --- [~mmokhtar], could you please take a look? Thanks. JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName Key: HIVE-9392 URL: https://issues.apache.org/jira/browse/HIVE-9392 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-9392.01.patch, HIVE-9392.1.patch, HIVE-9392.2.patch In JoinStatsRule.process the join column statistics are stored in HashMap joinedColStats, the key used which is the ColStatistics.fqColName is duplicated between join column in the same vertex, as a result distinctVals ends up having duplicated values which negatively affects the join cardinality estimation. The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
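Editorial note: the mechanics here are generic HashMap behavior. When two different join columns map to the same fqColName key (e.g. both named KEY.reducesinkkey0), the second put silently overwrites the first, and later NDV lookups read the wrong count. A minimal sketch with hypothetical NDV numbers:

```java
import java.util.HashMap;
import java.util.Map;

public class DuplicateKeySketch {
    public static long ndvSeenForFirstColumn() {
        Map<String, Long> joinedColStats = new HashMap<>();
        // Two distinct join columns from the same vertex collide on the key:
        joinedColStats.put("KEY.reducesinkkey0", 1_000_000L); // column A's NDV
        joinedColStats.put("KEY.reducesinkkey0", 50L);        // column B overwrites A
        // Cardinality estimation now reads 50 where it meant 1,000,000,
        // badly skewing the join selectivity.
        return joinedColStats.get("KEY.reducesinkkey0");
    }
}
```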
[jira] [Updated] (HIVE-10617) LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to partitioned locking
[ https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10617: Description: See HIVE-10482 and the comment in code. Right now this is worked around by retrying. Simple case - thread can reserve memory from manager and bounce between checking arena 1 and arena 2 for memory as other threads allocate and deallocate from respective arenas in reverse order, making it look like there's no memory. More importantly this can happen when buddy blocks are split when lots of stuff is allocated. This can be solved either with some form of helping (esp. for split case) or by making allocator an actor (or set of actors, one per 1-N arenas that they would own), to satisfy alloc requests more deterministically (and also get rid of most sync). was: See HIVE-10482 and the comment in code. Simple case - thread can reserve memory from manager and bounce between checking arena 1 and arena 2 for memory as other threads allocate and deallocate from respective arenas in reverse order, making it look like there's no memory. More importantly this can happen when buddy blocks are split when lots of stuff is allocated. This can be solved either with some form of helping (esp. for split case) or by making allocator an actor (or set of actors, one per 1-N arenas that they would own), to satisfy alloc requests more deterministically (and also get rid of most sync). LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to partitioned locking Key: HIVE-10617 URL: https://issues.apache.org/jira/browse/HIVE-10617 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin See HIVE-10482 and the comment in code. Right now this is worked around by retrying. 
Simple case - thread can reserve memory from manager and bounce between checking arena 1 and arena 2 for memory as other threads allocate and deallocate from respective arenas in reverse order, making it look like there's no memory. More importantly this can happen when buddy blocks are split when lots of stuff is allocated. This can be solved either with some form of helping (esp. for split case) or by making allocator an actor (or set of actors, one per 1-N arenas that they would own), to satisfy alloc requests more deterministically (and also get rid of most sync). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10617) LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to partitioned locking
[ https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-10617: --- Assignee: Sergey Shelukhin LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to partitioned locking Key: HIVE-10617 URL: https://issues.apache.org/jira/browse/HIVE-10617 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin See HIVE-10482 and the comment in code. Right now this is worked around by retrying. Simple case - thread can reserve memory from manager and bounce between checking arena 1 and arena 2 for memory as other threads allocate and deallocate from respective arenas in reverse order, making it look like there's no memory. More importantly this can happen when buddy blocks are split when lots of stuff is allocated. This can be solved either with some form of helping (esp. for split case) or by making allocator an actor (or set of actors, one per 1-N arenas that they would own), to satisfy alloc requests more deterministically (and also get rid of most sync). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
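Editorial note: of the options floated on this issue, the retry-loop variant can be sketched with a version counter: rescan only while some other thread has changed allocator state since the last attempt, so the loop terminates exactly when nothing moved during a full failed scan. A hypothetical single-file skeleton, not the actual BuddyAllocator code:

```java
import java.util.concurrent.atomic.AtomicLong;

public class RetryAllocSketch {
    // Bumped by every allocate/deallocate that changes free-list state.
    static final AtomicLong version = new AtomicLong();

    interface Arena { boolean tryAllocate(int size); }

    // Returns true on success; returns false only after a full scan during
    // which no other thread changed allocator state (a genuine out-of-memory),
    // avoiding the spurious failure where memory moved between arenas mid-scan.
    static boolean allocate(Arena[] arenas, int size) {
        while (true) {
            long seen = version.get();
            for (Arena a : arenas) {
                if (a.tryAllocate(size)) { version.incrementAndGet(); return true; }
            }
            if (version.get() == seen) return false; // nothing changed: truly full
            // else: state moved under us (e.g. a buddy block was split or freed); rescan
        }
    }
}
```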
[jira] [Commented] (HIVE-10595) Dropping a table can cause NPEs in the compactor
[ https://issues.apache.org/jira/browse/HIVE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529725#comment-14529725 ] Eugene Koifman commented on HIVE-10595: --- I'm not sure I understand how this works. The Initiator (if the table/partition is no longer there) will not add anything to compaction queue. So then there is nothing for Worker/Cleaner to do in this case. How will data from TXNS, COMPLETED_TXN_COMPONENTS, TXN_COMPONENTS which relates to these table get cleaned up? Dropping a table can cause NPEs in the compactor Key: HIVE-10595 URL: https://issues.apache.org/jira/browse/HIVE-10595 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.1.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-10595.patch Reproduction: # start metastore with compactor off # insert enough entries in a table to trigger a compaction # drop the table # stop metastore # restart metastore with compactor on Result: NPE in the compactor threads. I suspect this would also happen if the inserts and drops were done in between a run of the compactor, but I haven't proven it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10482) LLAP: AssertionError cannot allocate when reading from orc
[ https://issues.apache.org/jira/browse/HIVE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529493#comment-14529493 ] Sergey Shelukhin commented on HIVE-10482: - This happens when BuddyAllocator has one block of memory larger than target allocation. When memory is reserved and several threads go to allocate, they go from target size and then try to split larger sizes. If several threads try to split the block at the same time, one will split and re-add the remainder to lower level lists (e.g. 768k out of 1Mb block, after using 256k, will be added as one 512k block and one 256k block), but when the split is done, the others are waiting on the lock for the 1Mb-block list and will never again look at lower level lists. There are several ways to fix this; adding some sort of helping to get threads to provide blocks to other threads after split is very complex (many special cases) and may have perf overhead in common case, plus in general case it may not solve similar issues e.g. with multiple arenas, where we examine full arena 1, then go to non-full arena 2, meanwhile someone allocates from 2 and deallocates to 1, so we are screwed again; making allocator use actor-like model (removing all sync and having allocator thread that serves request queue); a retry loop that would retry as long as any changes have happened since last attempt. Not sure yet if 2 or 3 are best. LLAP: AsertionError cannot allocate when reading from orc - Key: HIVE-10482 URL: https://issues.apache.org/jira/browse/HIVE-10482 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Sergey Shelukhin Fix For: llap This was from a run of tpch query 1. [~sershe] - not sure if you've already seen this. Creating a jira so that it doesn't get lost. 
{code} 2015-04-24 13:11:54,180 [TezTaskRunner_attempt_1429683757595_0326_4_00_000199_0(container_1_0326_01_003216_sseth_20150424131137_8ec6200c-77c8-43ea-a6a3-a0ab1da6e1ac:4_Map 1_199_0)] ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: java.lang.AssertionError: Cannot allocate at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:314) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:329) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: java.io.IOException: java.lang.AssertionError: Cannot allocate at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137) at
[jira] [Updated] (HIVE-10526) CBO (Calcite Return Path): HiveCost epsilon comparison should take row count into account
[ https://issues.apache.org/jira/browse/HIVE-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10526: Attachment: HIVE-10526.2.patch Reuploading .1.patch as .2.patch CBO (Calcite Return Path): HiveCost epsilon comparison should take row count in to account -- Key: HIVE-10526 URL: https://issues.apache.org/jira/browse/HIVE-10526 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: 0.12.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 1.2.0 Attachments: HIVE-10526.1.patch, HIVE-10526.2.patch, HIVE-10526.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6679) HiveServer2 should support a configurable server-side socket timeout and keepalive for various transport types where applicable
[ https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6679: --- Affects Version/s: 1.2.0 HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable -- Key: HIVE-6679 URL: https://issues.apache.org/jira/browse/HIVE-6679 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0 Reporter: Prasad Mujumdar Assignee: Navis Labels: TODOC1.0, TODOC15 Fix For: 1.2.0 Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch HiveServer2 should support configurable the server side socket read timeout and TCP keep-alive option. Metastore server already support this (and the so is the old hive server). We now have multiple client connectivity options like Kerberos, Delegation Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The configuration should be applicable to all types (if possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6679) HiveServer2 should support a configurable server-side socket timeout and keepalive for various transport types where applicable
[ https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6679: --- Affects Version/s: 1.1.0 1.0.0 HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable -- Key: HIVE-6679 URL: https://issues.apache.org/jira/browse/HIVE-6679 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0 Reporter: Prasad Mujumdar Assignee: Navis Labels: TODOC1.0, TODOC15 Fix For: 1.2.0 Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch HiveServer2 should support configurable the server side socket read timeout and TCP keep-alive option. Metastore server already support this (and the so is the old hive server). We now have multiple client connectivity options like Kerberos, Delegation Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The configuration should be applicable to all types (if possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6679) HiveServer2 should support a configurable server-side socket timeout and keepalive for various transport types where applicable
[ https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6679: --- Fix Version/s: (was: 1.1.0) 1.2.0 HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable -- Key: HIVE-6679 URL: https://issues.apache.org/jira/browse/HIVE-6679 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0 Reporter: Prasad Mujumdar Assignee: Navis Labels: TODOC1.0, TODOC15 Fix For: 1.2.0 Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch HiveServer2 should support configurable the server side socket read timeout and TCP keep-alive option. Metastore server already support this (and the so is the old hive server). We now have multiple client connectivity options like Kerberos, Delegation Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The configuration should be applicable to all types (if possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10482) LLAP: AssertionError cannot allocate when reading from orc
[ https://issues.apache.org/jira/browse/HIVE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-10482. - Resolution: Fixed committed a workaround LLAP: AsertionError cannot allocate when reading from orc - Key: HIVE-10482 URL: https://issues.apache.org/jira/browse/HIVE-10482 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Sergey Shelukhin Fix For: llap This was from a run of tpch query 1. [~sershe] - not sure if you've already seen this. Creating a jira so that it doesn't get lost. {code} 2015-04-24 13:11:54,180 [TezTaskRunner_attempt_1429683757595_0326_4_00_000199_0(container_1_0326_01_003216_sseth_20150424131137_8ec6200c-77c8-43ea-a6a3-a0ab1da6e1ac:4_Map 1_199_0)] ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: java.lang.AssertionError: Cannot allocate at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:314) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:329) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) at 
org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: java.io.IOException: java.lang.AssertionError: Cannot allocate at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62) ... 16 more Caused by: java.io.IOException: java.lang.AssertionError: Cannot allocate at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:257) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:209) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:147) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:97) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) ... 
22 more Caused by: java.lang.AssertionError: Cannot allocate at org.apache.hadoop.hive.ql.io.orc.InStream.readEncodedStream(InStream.java:761) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:441) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:294) at
[jira] [Commented] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
[ https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529684#comment-14529684 ] Thejas M Nair commented on HIVE-10614: -- +1 for current patch, it would work with 1.2 branch. We need another one for master (that also has similar change for hive-schema-1.3.0.mysql.sql) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure -- Key: HIVE-10614 URL: https://issues.apache.org/jira/browse/HIVE-10614 Project: Hive Issue Type: Bug Components: Metastore Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Attachments: HIVE-10614.1.patch ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose {code} ++--+ | | ++--+ | HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from Mysql for other DBs do not have it | ++--+ 1 row selected (0.004 seconds) 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS RM_TLBS_LINKID No rows affected (0.005 seconds) 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS RM_PARTITIONS_LINKID No rows affected (0.006 seconds) 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS RM_LINKID No rows affected (0.002 seconds) 0: jdbc:mysql://node-1.example.com/hive CREATE PROCEDURE RM_TLBS_LINKID() BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1 (state=42000,code=1064) Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !! 
org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !! at org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229) at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: java.io.IOException: Schema script failed, errorcode 2 at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355) at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326) at org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224) {code} Looks like HIVE-7018 introduced a stored procedure as part of the mysql upgrade script, and it is causing issues with the schematool upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: HIVE-9392.4.patch JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName Key: HIVE-9392 URL: https://issues.apache.org/jira/browse/HIVE-9392 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Priority: Critical Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, HIVE-9392.4.patch In JoinStatsRule.process the join column statistics are stored in the HashMap joinedColStats; the key used, which is ColStatistics.fqColName, is duplicated between join columns in the same vertex. As a result, distinctVals ends up having duplicated values, which negatively affects the join cardinality estimation. The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
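The failure mode described above can be sketched in a few lines. This is illustrative only (the map and values are simplified stand-ins, not Hive's actual JoinStatsRule code): when two join columns in the same vertex share the fully-qualified name KEY.reducesinkkey0, the second put() silently replaces the first entry, so later lookups read the wrong column's statistics.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplification: the map holds NDV (number of distinct
// values) per fully-qualified column name, as JoinStatsRule's
// joinedColStats conceptually does.
class FqColNameCollision {
    public static void main(String[] args) {
        Map<String, Long> joinedColStats = new HashMap<>();
        joinedColStats.put("KEY.reducesinkkey0", 1_000_000L); // NDV of join column A
        joinedColStats.put("KEY.reducesinkkey0", 40L);        // column B clobbers A
        // Only one entry survives; column A's statistics are lost,
        // so the cardinality estimate uses the wrong NDV.
        System.out.println(joinedColStats.size());                    // 1
        System.out.println(joinedColStats.get("KEY.reducesinkkey0")); // 40
    }
}
```

The usual fix for this class of bug is to disambiguate the key, e.g. by prefixing it with the operator or vertex identifier, so that statistics from different columns can never collide.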
[jira] [Updated] (HIVE-10620) ZooKeeperHiveLock overrides equal() method but not hashcode()
[ https://issues.apache.org/jira/browse/HIVE-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-10620: --- Description: ZooKeeperHiveLock overrides the public boolean equals(Object o) method but does not for public int hashCode(). It violates the Java contract and may cause unexpected results. (was: ZooKeeperHiveLock overrides the public boolean equals(Object o) method but does not for public int hashCode(). It violates the Java contract that equal and may cause unexpected results.) ZooKeeperHiveLock overrides equal() method but not hashcode() - Key: HIVE-10620 URL: https://issues.apache.org/jira/browse/HIVE-10620 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang ZooKeeperHiveLock overrides the public boolean equals(Object o) method but does not for public int hashCode(). It violates the Java contract and may cause unexpected results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
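A minimal sketch of the contract in question, using a hypothetical ZkLock class rather than Hive's actual ZooKeeperHiveLock: once equals() is value-based, hashCode() must be too, or hash-based collections such as HashSet and HashMap can fail to find objects that are equal.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for ZooKeeperHiveLock: two locks are considered
// equal when they refer to the same ZooKeeper path.
class ZkLock {
    final String path;

    ZkLock(String path) {
        this.path = path;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ZkLock && ((ZkLock) o).path.equals(path);
    }

    // The Java contract: objects that are equal must return equal hash
    // codes. Without this override, two equal ZkLock instances may land
    // in different hash buckets, so HashSet.contains()/HashMap.get()
    // can report a held lock as absent.
    @Override
    public int hashCode() {
        return Objects.hash(path);
    }
}

class ZkLockDemo {
    public static void main(String[] args) {
        Set<ZkLock> held = new HashSet<>();
        held.add(new ZkLock("/hive/locks/default/t1"));
        // With a consistent hashCode(), value-based lookup works:
        System.out.println(held.contains(new ZkLock("/hive/locks/default/t1"))); // true
    }
}
```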
[jira] [Updated] (HIVE-10620) ZooKeeperHiveLock overrides equal() method but not hashcode()
[ https://issues.apache.org/jira/browse/HIVE-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-10620: --- Attachment: HIVE-10620.patch [~szehon] [~ashutoshc] could you review the code? Thanks ZooKeeperHiveLock overrides equal() method but not hashcode() - Key: HIVE-10620 URL: https://issues.apache.org/jira/browse/HIVE-10620 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Attachments: HIVE-10620.patch ZooKeeperHiveLock overrides the public boolean equals(Object o) method but does not for public int hashCode(). It violates the Java contract and may cause unexpected results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)
[ https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong reassigned HIVE-8769: - Assignee: Pengcheng Xiong (was: Prasanth Jayachandran) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected) -- Key: HIVE-8769 URL: https://issues.apache.org/jira/browse/HIVE-8769 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Fix For: 1.2.0 TPC-DS Q82 is running slower than hive 13 because the join type is not correct. The estimate for item x inventory x date_dim is 227 Million rows while the actual is 3K rows. Hive 13 finishes in 753 seconds. Hive 14 finishes in 1,267 seconds. Hive 14 + force map join finished in 431 seconds. Query {code} select i_item_id ,i_item_desc ,i_current_price from item, inventory, date_dim, store_sales where i_current_price between 30 and 30+30 and inv_item_sk = i_item_sk and d_date_sk=inv_date_sk and d_date between '2002-05-30' and '2002-07-30' and i_manufact_id in (437,129,727,663) and inv_quantity_on_hand between 100 and 500 and ss_item_sk = i_item_sk group by i_item_id,i_item_desc,i_current_price order by i_item_id limit 100 {code} Plan {code} STAGE PLANS: Stage: Stage-1 Tez Edges: Map 7 - Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE) Reducer 4 - Map 3 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE) Reducer 5 - Reducer 4 (SIMPLE_EDGE) Reducer 6 - Reducer 5 (SIMPLE_EDGE) DagName: mmokhtar_20141106005353_7a2eb8df-12ff-4fe9-89b4-30f1e4e3fb90:1 Vertices: Map 1 Map Operator Tree: TableScan alias: item filterExpr: ((i_current_price BETWEEN 30 AND 60 and (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: boolean) Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((i_current_price BETWEEN 30 AND 60 and (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is 
not null) (type: boolean) Statistics: Num rows: 115500 Data size: 34185680 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: i_item_sk (type: int), i_item_id (type: string), i_item_desc (type: string), i_current_price (type: float) outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 115500 Data size: 33724832 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 115500 Data size: 33724832 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: float) Execution mode: vectorized Map 2 Map Operator Tree: TableScan alias: date_dim filterExpr: (d_date BETWEEN '2002-05-30' AND '2002-07-30' and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (d_date BETWEEN '2002-05-30' AND '2002-07-30' and d_date_sk is not null) (type: boolean) Statistics: Num rows: 36524 Data size: 3579352 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int) outputColumnNames: _col0 Statistics: Num rows: 36524 Data size: 146096 Basic stats:
[jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata
[ https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529570#comment-14529570 ] Prasanth Jayachandran commented on HIVE-9451: - No. The test failures look related. [~owen.omalley] Can you take a look at the test failures? I am assuming all these are related to file size differences. Add max size of column dictionaries to ORC metadata --- Key: HIVE-9451 URL: https://issues.apache.org/jira/browse/HIVE-9451 Project: Hive Issue Type: Improvement Reporter: Owen O'Malley Assignee: Owen O'Malley Labels: ORC Fix For: 1.2.0 Attachments: HIVE-9451.patch, HIVE-9451.patch To predict the amount of memory required to read an ORC file we need to know the size of the dictionaries for the columns that we are reading. I propose adding the number of bytes for each column's dictionary to the stripe's column statistics. The file's column statistics would have the maximum dictionary size for each column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6679) HiveServer2 should support a configurable server-side socket timeout and keepalive for various transport types where applicable
[ https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529631#comment-14529631 ] Thejas M Nair commented on HIVE-6679: - +1. Just a minor comment. Can you also update the description of HIVE_SERVER2_TCP_SOCKET_BLOCKING_TIMEOUT to say that it's applicable only in binary mode, and that for http mode the equivalent is hive.server2.thrift.http.max.idle.time? HiveServer2 should support a configurable server-side socket timeout and keepalive for various transport types where applicable -- Key: HIVE-6679 URL: https://issues.apache.org/jira/browse/HIVE-6679 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0 Reporter: Prasad Mujumdar Assignee: Navis Labels: TODOC1.0, TODOC15 Fix For: 1.2.0 Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch HiveServer2 should support a configurable server-side socket read timeout and TCP keep-alive option. The metastore server already supports this (and so does the old hive server). We now have multiple client connectivity options like Kerberos, Delegation Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The configuration should be applicable to all types (if possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
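The two server-side knobs the issue asks to make configurable can be shown with the plain JDK socket API (this is a generic sketch, not HiveServer2's actual transport code; the class and method names here are hypothetical):

```java
import java.io.IOException;
import java.net.Socket;

// Generic sketch of a server-side read timeout and TCP keepalive,
// the options the issue proposes making configurable per transport.
class SocketOptionsSketch {
    static void configure(Socket client, int readTimeoutMs, boolean keepAlive)
            throws IOException {
        // SO_TIMEOUT: a blocked read() throws SocketTimeoutException after
        // this many milliseconds instead of hanging forever on a dead client.
        client.setSoTimeout(readTimeoutMs);
        // SO_KEEPALIVE: the OS periodically probes idle connections and
        // eventually tears down ones whose peer has vanished.
        client.setKeepAlive(keepAlive);
    }
}
```

In binary mode the timeout would come from a setting like HIVE_SERVER2_TCP_SOCKET_BLOCKING_TIMEOUT, while in http mode idle connections are governed by hive.server2.thrift.http.max.idle.time, as the comment above notes.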
[jira] [Updated] (HIVE-10616) TypeInfoUtils doesn't handle DECIMAL with just precision specified
[ https://issues.apache.org/jira/browse/HIVE-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Friedrich updated HIVE-10616: Attachment: HIVE-10616.1.patch TypeInfoUtils doesn't handle DECIMAL with just precision specified -- Key: HIVE-10616 URL: https://issues.apache.org/jira/browse/HIVE-10616 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.0.0 Reporter: Thomas Friedrich Assignee: Thomas Friedrich Priority: Minor Attachments: HIVE-10616.1.patch The parseType method in TypeInfoUtils doesn't handle decimal types with just precision specified although that's a valid type definition. As a result, TypeInfoUtils.getTypeInfoFromTypeString will always return decimal(10,0) for any decimal(precision) string. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
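The expected behavior can be sketched with a hypothetical parser (this is not Hive's TypeInfoUtils implementation): per the convention the issue relies on, DECIMAL(p) should mean DECIMAL(p, 0), rather than falling back to the default decimal(10,0).

```java
// Hypothetical sketch of parsing a decimal type string into
// {precision, scale}, handling the precision-only form.
class DecimalTypeParser {
    static int[] parse(String type) {
        String t = type.trim().toLowerCase();
        if (t.equals("decimal")) {
            return new int[] {10, 0}; // bare DECIMAL: default precision/scale
        }
        String args = t.substring(t.indexOf('(') + 1, t.indexOf(')'));
        String[] parts = args.split(",");
        int precision = Integer.parseInt(parts[0].trim());
        // Only precision given: scale defaults to 0, but the stated
        // precision is preserved instead of being discarded.
        int scale = parts.length > 1 ? Integer.parseInt(parts[1].trim()) : 0;
        return new int[] {precision, scale};
    }

    public static void main(String[] args) {
        int[] ps = parse("decimal(5)");
        System.out.println(ps[0] + "," + ps[1]); // 5,0
    }
}
```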
[jira] [Commented] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
[ https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529657#comment-14529657 ] Hive QA commented on HIVE-10614: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730668/HIVE-10614.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-METASTORE-Test/43/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-METASTORE-Test/43/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-METASTORE-Test-43/ This message is automatically generated. ATTACHMENT ID: 12730668 - PreCommit-HIVE-METASTORE-Test schemaTool upgrade from 0.14.0 to 1.3.0 causes failure -- Key: HIVE-10614 URL: https://issues.apache.org/jira/browse/HIVE-10614 Project: Hive Issue Type: Bug Components: Metastore Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Attachments: HIVE-10614.1.patch ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose {code} ++--+ | | ++--+ | HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from Mysql for other DBs do not have it | ++--+ 1 row selected (0.004 seconds) 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS RM_TLBS_LINKID No rows affected (0.005 seconds) 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS RM_PARTITIONS_LINKID No rows affected (0.006 seconds) 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS RM_LINKID No rows affected (0.002 seconds) 0: jdbc:mysql://node-1.example.com/hive CREATE PROCEDURE RM_TLBS_LINKID() BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` 
; ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1 (state=42000,code=1064) Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !! org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !! at org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229) at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: java.io.IOException: Schema script failed, errorcode 2 at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355) at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326) at org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224) {code} Looks like HIVE-7018 introduced a stored procedure as part of the mysql upgrade script, and it is causing issues with the schematool upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
[ https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-10614: - Attachment: HIVE-10614.1.master.patch [~thejas] Thanks for the review, added HIVE-10614.1.master.patch for the master branch schemaTool upgrade from 0.14.0 to 1.3.0 causes failure -- Key: HIVE-10614 URL: https://issues.apache.org/jira/browse/HIVE-10614 Project: Hive Issue Type: Bug Components: Metastore Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Attachments: HIVE-10614.1.master.patch, HIVE-10614.1.patch ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose {code} ++--+ | | ++--+ | HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from Mysql for other DBs do not have it | ++--+ 1 row selected (0.004 seconds) 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS RM_TLBS_LINKID No rows affected (0.005 seconds) 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS RM_PARTITIONS_LINKID No rows affected (0.006 seconds) 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS RM_LINKID No rows affected (0.002 seconds) 0: jdbc:mysql://node-1.example.com/hive CREATE PROCEDURE RM_TLBS_LINKID() BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1 (state=42000,code=1064) Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !! 
org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !! at org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229) at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: java.io.IOException: Schema script failed, errorcode 2 at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355) at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326) at org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224) {code} Looks like HIVE-7018 introduced a stored procedure as part of the mysql upgrade script, and it is causing issues with the schematool upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529707#comment-14529707 ] Peter Slawski commented on HIVE-10538: -- Great, I've been working on just that. I'll be able to post an updated patch tomorrow. Fix NPE in FileSinkOperator from hashcode mismatch -- Key: HIVE-10538 URL: https://issues.apache.org/jira/browse/HIVE-10538 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.0.0, 1.2.0 Reporter: Peter Slawski Assignee: Peter Slawski Priority: Critical Fix For: 1.2.0, 1.3.0 Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, HIVE-10538.1.patch A NullPointerException occurs in FileSinkOperator when using bucketed tables and distribute by with multiFileSpray enabled. The following query snippet reproduces this issue: {code} set hive.enforce.bucketing = true; set hive.exec.reducers.max = 20; create table bucket_a(key int, value_a string) clustered by (key) into 256 buckets; create table bucket_b(key int, value_b string) clustered by (key) into 256 buckets; create table bucket_ab(key int, value_a string, value_b string) clustered by (key) into 256 buckets; -- Insert data into bucket_a and bucket_b insert overwrite table bucket_ab select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key = b.key) distribute by key; {code} The following stack trace is logged. 
{code} 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer (ExecReducer.java:reduce(255)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}} at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) ... 8 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly
[ https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-10565: Attachment: HIVE-10565.07.patch LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly Key: HIVE-10565 URL: https://issues.apache.org/jira/browse/HIVE-10565 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.2.0, 1.3.0 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, HIVE-10565.06.patch, HIVE-10565.07.patch Filtering can knock out some of the rows for a repeated key, but those knocked out rows need to be included in the LEFT OUTER JOIN result and are currently not when only some rows are filtered out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10539) set default value of hive.repl.task.factory
[ https://issues.apache.org/jira/browse/HIVE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529819#comment-14529819 ] Hive QA commented on HIVE-10539: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730349/HIVE-10539.3.patch {color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 8900 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3745/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3745/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3745/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 24 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12730349 - PreCommit-HIVE-TRUNK-Build set default value of hive.repl.task.factory --- Key: HIVE-10539 URL: https://issues.apache.org/jira/browse/HIVE-10539 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-10539.1.patch, HIVE-10539.2.patch, HIVE-10539.3.patch hive.repl.task.factory does not have a default value set. It should be set to org.apache.hive.hcatalog.api.repl.exim.EximReplicationTaskFactory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10618) Fix invocation of toString on byteArray in VerifyFast (250, 254)
[ https://issues.apache.org/jira/browse/HIVE-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10618: --- Attachment: rb33877.patch patch #1 Fix invocation of toString on byteArray in VerifyFast (250, 254) Key: HIVE-10618 URL: https://issues.apache.org/jira/browse/HIVE-10618 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: rb33877.patch Arrays.toString(byteArray) can be used to convert byte[] to string -- This message was sent by Atlassian JIRA (v6.3.4#6332)
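The fix the issue describes is a one-liner: calling toString() directly on a byte[] yields the array's type and identity hash, while java.util.Arrays.toString renders the element values. A small sketch (the demo class and data are illustrative, not the VerifyFast code):

```java
import java.util.Arrays;

class ByteArrayToString {
    public static void main(String[] args) {
        byte[] data = {72, 105};
        // Object.toString() on an array prints something like "[B@1b6d3586",
        // which is useless in a verification/failure message.
        System.out.println(data.toString());
        // Arrays.toString renders the contents instead.
        System.out.println(Arrays.toString(data)); // [72, 105]
    }
}
```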
[jira] [Commented] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529560#comment-14529560 ] Vikram Dixit K commented on HIVE-9743: -- That seems to be because, with SMB, there is full delegation to the base class. I am not sure if we need the SMB changes at all. Incorrect result set for vectorized left outer join --- Key: HIVE-9743 URL: https://issues.apache.org/jira/browse/HIVE-9743 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.14.0 Reporter: N Campbell Assignee: Matt McCline Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, HIVE-9743.06.patch, HIVE-9743.08.patch This query is supposed to return 3 rows and will when run without Tez but returns 2 rows when run with Tez. select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 15 ) tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 1 20 25 null 2 null 50 null instead of tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 0 10 15 null 1 20 25 null 2 null 50 null create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) STORED AS orc ; 0|10|15 1|20|25 2|\N|50 create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE ; 0|10|BB 1|15|DD 2|\N|EE 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata
[ https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529580#comment-14529580 ] Sushanth Sowmyan commented on HIVE-9451: Okay, thanks for the update. Will wait to hear more. :) Add max size of column dictionaries to ORC metadata --- Key: HIVE-9451 URL: https://issues.apache.org/jira/browse/HIVE-9451 Project: Hive Issue Type: Improvement Reporter: Owen O'Malley Assignee: Owen O'Malley Labels: ORC Fix For: 1.2.0 Attachments: HIVE-9451.patch, HIVE-9451.patch To predict the amount of memory required to read an ORC file we need to know the size of the dictionaries for the columns that we are reading. I propose adding the number of bytes for each column's dictionary to the stripe's column statistics. The file's column statistics would have the maximum dictionary size for each column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10547) CBO (Calcite Return Path) : genFileSinkPlan uses wrong partition col to create FS
[ https://issues.apache.org/jira/browse/HIVE-10547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong resolved HIVE-10547. Resolution: Fixed CBO (Calcite Return Path) : genFileSinkPlan uses wrong partition col to create FS - Key: HIVE-10547 URL: https://issues.apache.org/jira/browse/HIVE-10547 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 1.2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: HIVE-9392.01.patch After discussing with [~jpullokkaran], we assume that this patch will solve the problem. And we already tried TPCDS 70,89 to confirm JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName Key: HIVE-9392 URL: https://issues.apache.org/jira/browse/HIVE-9392 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-9392.01.patch, HIVE-9392.1.patch, HIVE-9392.2.patch In JoinStatsRule.process the join column statistics are stored in HashMap joinedColStats, the key used which is the ColStatistics.fqColName is duplicated between join column in the same vertex, as a result distinctVals ends up having duplicated values which negatively affects the join cardinality estimation. The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529673#comment-14529673 ] Peter Slawski commented on HIVE-10538: -- The Spark driver failures are caused by this change. This would be expected if a row's hashcode affected its ordering in Spark. This patch makes it so that HiveKey's hashcode outputted from ReduceSinkOperator is no longer always multiplied by 31 (as explained previously). Also, for at least those failed qtests, the row ordering/output in the expected output differs across MapRed, Tez, and Spark. So, execution engine affects ordering. From [spark/groupby_complex_types_multi_single_reducer.q.out#L221|https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/spark/groupby_complex_types_multi_single_reducer.q.out#L221] {code} POSTHOOK: query: SELECT DEST2.* FROM DEST2 POSTHOOK: type: QUERY POSTHOOK: Input: default@dest2 A masked pattern was here {120:val_120} 2 {129:val_129} 2 {160:val_160} 1 {26:val_26} 2 {27:val_27} 1 {288:val_288} 2 {298:val_298} 3 {30:val_30} 1 {311:val_311} 3 {74:val_74} 1 {code} From [groupby_complex_types_multi_single_reducer.q.out#L240|https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/groupby_complex_types_multi_single_reducer.q.out#L240] {code} POSTHOOK: query: SELECT DEST2.* FROM DEST2 POSTHOOK: type: QUERY POSTHOOK: Input: default@dest2 A masked pattern was here {0:val_0} 3 {10:val_10} 1 {100:val_100} 2 {103:val_103} 2 {104:val_104} 2 {105:val_105} 1 {11:val_11} 1 {111:val_111} 1 {113:val_113} 2 {114:val_114} 1 {code} Fix NPE in FileSinkOperator from hashcode mismatch -- Key: HIVE-10538 URL: https://issues.apache.org/jira/browse/HIVE-10538 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.0.0, 1.2.0 Reporter: Peter Slawski Assignee: Peter Slawski Priority: Critical Fix For: 1.2.0, 1.3.0 Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, HIVE-10538.1.patch A Null 
Pointer Exception occurs when in FileSinkOperator when using bucketed tables and distribute by with multiFileSpray enabled. The following snippet query reproduces this issue: {code} set hive.enforce.bucketing = true; set hive.exec.reducers.max = 20; create table bucket_a(key int, value_a string) clustered by (key) into 256 buckets; create table bucket_b(key int, value_b string) clustered by (key) into 256 buckets; create table bucket_ab(key int, value_a string, value_b string) clustered by (key) into 256 buckets; -- Insert data into bucket_a and bucket_b insert overwrite table bucket_ab select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key = b.key) distribute by key; {code} The following stack trace is logged. {code} 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer (ExecReducer.java:reduce(255)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}} at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) ... 8 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
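The engine-dependent row ordering is unsurprising once you look at the partitioning arithmetic: multiplying a key's hash by 31 usually moves it to a different reducer, and any downstream code that recomputes the hash the other way disagrees about the bucket (the `findWriterOffset` NPE above). A toy sketch, assuming Hadoop's usual `(hash & Integer.MAX_VALUE) % numReducers` convention rather than Hive's exact code:

```java
public class HashPartitionDemo {
    // Conventional Hadoop-style partition function; numbers below are
    // illustrative, not taken from Hive.
    static int partition(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int hash = "113".hashCode(); // 48659
        int reducers = 20;
        // Without and with the extra *31 factor the same key lands in
        // different buckets, so writers keyed on one convention cannot find
        // files created under the other.
        System.out.println(partition(hash, reducers));      // 19
        System.out.println(partition(hash * 31, reducers)); // 9
    }
}
```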
[jira] [Updated] (HIVE-10619) Fix ConcurrentHashMap.get in MetadataListStructObjectInspector.getInstance (52)
[ https://issues.apache.org/jira/browse/HIVE-10619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10619: --- Attachment: rb33878.patch patch #1 Fix ConcurrentHashMap.get in MetadataListStructObjectInspector.getInstance (52) --- Key: HIVE-10619 URL: https://issues.apache.org/jira/browse/HIVE-10619 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: rb33878.patch cached.get(columnNames) should be replaced with cached.get(key) in the code block below {code} cached = new ConcurrentHashMap<List<List<String>>, MetadataListStructObjectInspector>(); public static MetadataListStructObjectInspector getInstance( List<String> columnNames) { ArrayList<List<String>> key = new ArrayList<List<String>>(1); key.add(columnNames); MetadataListStructObjectInspector result = cached.get(columnNames); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
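The one-character fix matters because the map is keyed by the wrapper list, not the inner column list: `get(columnNames)` compiles (since `Map.get` takes `Object`) but can never hit. A self-contained sketch of the miss, using a String value in place of the real ObjectInspector:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class CacheKeyMismatchDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<List<List<String>>, String> cached = new ConcurrentHashMap<>();

        List<String> columnNames = Arrays.asList("a", "b");
        List<List<String>> key = new ArrayList<>(1);
        key.add(columnNames);
        cached.put(key, "inspector");

        // Bug pattern from the report: looking up with the inner list instead
        // of the wrapped key silently misses, so the cache never serves hits.
        System.out.println(cached.get(columnNames)); // null
        System.out.println(cached.get(key));         // inspector
    }
}
```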
[jira] [Commented] (HIVE-10526) CBO (Calcite Return Path): HiveCost epsilon comparison should take row count in to account
[ https://issues.apache.org/jira/browse/HIVE-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529584#comment-14529584 ] Sushanth Sowmyan commented on HIVE-10526: - I don't see this picked up in the test commit queue, and it's possible it'll fail out saying it's already processed this file, so I'm going to re-upload .1.patch as .2.patch and manually submit this into the queue. CBO (Calcite Return Path): HiveCost epsilon comparison should take row count in to account -- Key: HIVE-10526 URL: https://issues.apache.org/jira/browse/HIVE-10526 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: 0.12.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 1.2.0 Attachments: HIVE-10526.1.patch, HIVE-10526.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10617) LLAP: allocator occasionally has a spurious failure to allocate due to partitioned locking and has to retry
[ https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10617: Summary: LLAP: allocator occasionally has a spurious failure to allocate due to partitioned locking and has to retry (was: LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to partitioned locking) LLAP: allocator occasionally has a spurious failure to allocate due to partitioned locking and has to retry - Key: HIVE-10617 URL: https://issues.apache.org/jira/browse/HIVE-10617 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin See HIVE-10482 and the comment in code. Right now this is worked around by retrying. Simple case - thread can reserve memory from manager and bounce between checking arena 1 and arena 2 for memory as other threads allocate and deallocate from respective arenas in reverse order, making it look like there's no memory. More importantly this can happen when buddy blocks are split when lots of stuff is allocated. This can be solved either with some form of helping (esp. for split case) or by making allocator an actor (or set of actors, one per 1-N arenas that they would own), to satisfy alloc requests more deterministically (and also get rid of most sync). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529972#comment-14529972 ] Matt McCline commented on HIVE-10609: - This doesn't fail on my combined build of HIVE-9743 and HIVE-10565. Will verify again when those JIRAs go in. Vectorization : Q64 fails with ClassCastException - Key: HIVE-10609 URL: https://issues.apache.org/jira/browse/HIVE-10609 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 1.2.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline Fix For: 1.2.0 TPC-DS Q64 fails with ClassCastException. Query {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN 
customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price)2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status cd2.cd_marital_status and i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and i_current_price between 35 and 35 + 10 and i_current_price between 35 + 1 and 35 + 15 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number ,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number ,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year ,d3.d_year ) cs1 JOIN (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 
,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON
[jira] [Commented] (HIVE-10482) LLAP: AsertionError cannot allocate when reading from orc
[ https://issues.apache.org/jira/browse/HIVE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529476#comment-14529476 ] Sergey Shelukhin commented on HIVE-10482: - I found the issue, not clear how to fix this yet though LLAP: AsertionError cannot allocate when reading from orc - Key: HIVE-10482 URL: https://issues.apache.org/jira/browse/HIVE-10482 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Sergey Shelukhin Fix For: llap This was from a run of tpch query 1. [~sershe] - not sure if you've already seen this. Creating a jira so that it doesn't get lost. {code} 2015-04-24 13:11:54,180 [TezTaskRunner_attempt_1429683757595_0326_4_00_000199_0(container_1_0326_01_003216_sseth_20150424131137_8ec6200c-77c8-43ea-a6a3-a0ab1da6e1ac:4_Map 1_199_0)] ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: java.lang.AssertionError: Cannot allocate at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:314) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:329) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: java.io.IOException: java.lang.AssertionError: Cannot allocate at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62) ... 
16 more Caused by: java.io.IOException: java.lang.AssertionError: Cannot allocate at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:257) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:209) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:147) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:97) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) ... 22 more Caused by: java.lang.AssertionError: Cannot allocate at org.apache.hadoop.hive.ql.io.orc.InStream.readEncodedStream(InStream.java:761) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:441) at
[jira] [Commented] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly
[ https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529578#comment-14529578 ] Vikram Dixit K commented on HIVE-10565: --- I am reviewing this one. LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly Key: HIVE-10565 URL: https://issues.apache.org/jira/browse/HIVE-10565 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.2.0, 1.3.0 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, HIVE-10565.06.patch Filtering can knock out some of the rows for a repeated key, but those knocked out rows need to be included in the LEFT OUTER JOIN result and are currently not when only some rows are filtered out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10604) update webhcat-default.xml with 1.2 version numbers
[ https://issues.apache.org/jira/browse/HIVE-10604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529633#comment-14529633 ] Hive QA commented on HIVE-10604: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730316/HIVE-10604.patch {color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 8900 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3743/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3743/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3743/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 24 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12730316 - PreCommit-HIVE-TRUNK-Build update webhcat-default.xml with 1.2 version numbers --- Key: HIVE-10604 URL: https://issues.apache.org/jira/browse/HIVE-10604 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Minor Fix For: 1.2.0 Attachments: HIVE-10604.patch no precommit tests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10620) ZooKeeperHiveLock overrides equal() method but not hashcode()
[ https://issues.apache.org/jira/browse/HIVE-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529931#comment-14529931 ] Ashutosh Chauhan commented on HIVE-10620: - +1 ZooKeeperHiveLock overrides equal() method but not hashcode() - Key: HIVE-10620 URL: https://issues.apache.org/jira/browse/HIVE-10620 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Attachments: HIVE-10620.patch ZooKeeperHiveLock overrides the public boolean equals(Object o) method but does not for public int hashCode(). It violates the Java contract and may cause unexpected results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
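The practical consequence is easy to demonstrate with a hypothetical lock class that mirrors the bug (this is not the actual ZooKeeperHiveLock code): equal locks get distinct identity hash codes, so hash-based collections treat them as different objects.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative class: equals() is overridden (locks are equal when their
// paths match) but hashCode() is inherited from Object -- the contract
// violation this JIRA fixes.
class BrokenLock {
    final String path;
    BrokenLock(String path) { this.path = path; }

    @Override public boolean equals(Object o) {
        return o instanceof BrokenLock && ((BrokenLock) o).path.equals(path);
    }
    // hashCode() deliberately NOT overridden.
}

public class EqualsHashCodeDemo {
    public static void main(String[] args) {
        Set<BrokenLock> locks = new HashSet<>();
        locks.add(new BrokenLock("/hive/lock-1"));
        locks.add(new BrokenLock("/hive/lock-1"));
        // Both copies survive even though they compare equals()-equal,
        // because they hash to different buckets.
        System.out.println(locks.size()); // 2, not the expected 1
    }
}
```

The standard fix is to override `hashCode()` from the same fields `equals()` compares, e.g. `return path.hashCode();`.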
[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive
[ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529892#comment-14529892 ] Eugene Koifman commented on HIVE-8065: -- How come the move restriction is not an issue for something like Insert Overwrite tableEZ1 select * from tableEZ2 inner join tableEZ3? Support HDFS encryption functionality on Hive - Key: HIVE-8065 URL: https://issues.apache.org/jira/browse/HIVE-8065 Project: Hive Issue Type: Improvement Affects Versions: 0.13.1 Reporter: Sergio Peña Assignee: Sergio Peña Labels: Hive-Scrum The new encryption support on HDFS makes Hive incompatible and unusable when this feature is used. HDFS encryption is designed so that a user can configure different encryption zones (or directories) for multi-tenant environments. An encryption zone has an exclusive encryption key, such as AES-128 or AES-256. Because of security compliance, HDFS does not allow moving/renaming files between encryption zones. Renames are allowed only inside the same encryption zone. A copy is allowed between encryption zones. See HDFS-6134 for more details about HDFS encryption design. Hive currently uses a scratch directory (like /tmp/$user/$random). This scratch directory is used for the output of intermediate data (between MR jobs) and for the final output of the hive query which is later moved to the table directory location. If Hive tables are in different encryption zones than the scratch directory, then Hive won't be able to rename those files/directories, and it will make Hive unusable. To handle this problem, we can change the scratch directory of the query/statement to be inside the same encryption zone as the table directory location. This way, the renaming process will be successful. Also, for statements that move files between encryption zones (e.g. LOAD DATA), a copy may be executed instead of a rename. 
This will cause an overhead when copying large data files, but it won't break the encryption on Hive. Another security issue to consider arises with joins and selects. If Hive joins tables with different encryption key strengths, then the results of the select might break the security compliance of the tables. Say two tables with 128-bit and 256-bit encryption are joined; the temporary results might be stored in the 128-bit encryption zone, which would temporarily weaken the protection of the data from the 256-bit table. To fix this, Hive should select the most strongly secured/encrypted scratch directory, so that the intermediate data is stored temporarily with no compliance issues. For instance: {noformat} SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id; {noformat} - This should use a scratch directory (or staging directory) inside the table-aes256 table location. {noformat} INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1; {noformat} - This should use a scratch directory inside the table-aes1 location. {noformat} FROM table-unencrypted INSERT OVERWRITE TABLE table-aes128 SELECT id, name INSERT OVERWRITE TABLE table-aes256 SELECT id, name {noformat} - This should use a scratch directory on each of the tables' locations. - The first SELECT will have its scratch directory on the table-aes128 directory. - The second SELECT will have its scratch directory on the table-aes256 directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
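The "pick the most secure zone" policy described above could be sketched as follows (a purely illustrative helper, not Hive's implementation; key length in bits stands in for zone strength, with 0 meaning unencrypted):

```java
import java.util.List;

public class ScratchDirChooser {
    // Given zone paths paired (by index) with their key lengths in bits,
    // return the path of the most strongly encrypted zone -- the one that
    // should host the scratch/staging directory.
    static String strongestZone(List<String> paths, List<Integer> keyBits) {
        int best = 0;
        for (int i = 1; i < paths.size(); i++) {
            if (keyBits.get(i) > keyBits.get(best)) best = i;
        }
        return paths.get(best);
    }

    public static void main(String[] args) {
        String zone = strongestZone(
            List.of("/warehouse/table-aes128", "/warehouse/table-aes256"),
            List.of(128, 256));
        System.out.println(zone); // /warehouse/table-aes256
    }
}
```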
[jira] [Updated] (HIVE-10621) serde typeinfo equals methods are not symmetric
[ https://issues.apache.org/jira/browse/HIVE-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10621: --- Attachment: rb33880.patch patch #1 serde typeinfo equals methods are not symmetric --- Key: HIVE-10621 URL: https://issues.apache.org/jira/browse/HIVE-10621 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: rb33880.patch correct equals method implementation should start with {code} if (this == other) { return true; } if (other == null || getClass() != other.getClass()) { return false; } {code} DecimalTypeInfo, PrimitiveTypeInfo, VarcharTypeInfo, CharTypeInfo, HiveDecimalWritable equals method implementation starts with {code} if (other == null || !(other instanceof class_name)) { return false } {code} - first of all check for null is redundant - the second issue is that other instanceof class_name check is not symmetric. contract of equals() implies that, a.equals(b) is true if and only if b.equals(a) is true Current implementation violates this contract. e.g. DecimalTypeInfo instanceof PrimitiveTypeInfo is true but PrimitiveTypeInfo instanceof DecimalTypeInfo is false See more details here http://stackoverflow.com/questions/6518534/equals-method-overrides-equals-in-superclass-and-may-not-be-symmetric -- This message was sent by Atlassian JIRA (v6.3.4#6332)
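A minimal pair of classes (illustrative names, not the actual TypeInfo hierarchy) shows why the `instanceof`-based check breaks symmetry while a `getClass()` check would not:

```java
// Base uses an instanceof check, so it accepts any subclass instance.
class Base {
    final String name;
    Base(String name) { this.name = name; }

    @Override public boolean equals(Object o) {
        return o instanceof Base && ((Base) o).name.equals(name);
    }
    @Override public int hashCode() { return name.hashCode(); }
}

// Derived narrows the check to its own type.
class Derived extends Base {
    Derived(String name) { super(name); }

    @Override public boolean equals(Object o) {
        return o instanceof Derived && super.equals(o);
    }
}

public class SymmetryDemo {
    public static void main(String[] args) {
        Base b = new Base("decimal");
        Derived d = new Derived("decimal");
        System.out.println(b.equals(d)); // true  (d instanceof Base)
        System.out.println(d.equals(b)); // false (b is not a Derived) -- asymmetric
    }
}
```

Replacing both checks with `getClass() != other.getClass()`, as the description recommends, makes each direction return false, restoring symmetry.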
[jira] [Commented] (HIVE-10618) Fix invocation of toString on byteArray in VerifyFast (250, 254)
[ https://issues.apache.org/jira/browse/HIVE-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529958#comment-14529958 ] Prasanth Jayachandran commented on HIVE-10618: -- +1 Fix invocation of toString on byteArray in VerifyFast (250, 254) Key: HIVE-10618 URL: https://issues.apache.org/jira/browse/HIVE-10618 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: rb33877.patch Arrays.toString(byteArray) can be used to convert byte[] to string -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10592) ORC file dump in JSON format
[ https://issues.apache.org/jira/browse/HIVE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529907#comment-14529907 ] Prasanth Jayachandran commented on HIVE-10592: -- Added multifile support in the new patch. The output will now look like {code}./bin/hive --orcfiledump --json --pretty file:///app/warehouse/alltypes_bloom/00_0 file:///app/warehouse/alltypes_orc/00_0{code} {code} {"orcFileDumps": [ { "fileName": "file:\/\/\/app\/warehouse\/alltypes_bloom\/00_0", "fileVersion": "0.12", "writerVersion": "HIVE_8732", "numberOfRows": 3, "compression": "ZLIB", ... }, { "fileName": "file:\/\/\/app\/warehouse\/alltypes_orc\/00_0", "fileVersion": "0.12", "writerVersion": "HIVE_8732", "numberOfRows": 2, "compression": "ZLIB", ... } ] } {code} ORC file dump in JSON format Key: HIVE-10592 URL: https://issues.apache.org/jira/browse/HIVE-10592 Project: Hive Issue Type: New Feature Affects Versions: 1.3.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10592.1.patch, HIVE-10592.2.patch, HIVE-10592.3.patch, HIVE-10592.4.patch ORC file dump uses custom format. Will be useful to dump ORC metadata in json format so that other tools can be built on top of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10621) serde typeinfo equals methods are not symmetric
[ https://issues.apache.org/jira/browse/HIVE-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10621: --- Description: A correct equals method implementation should start with {code} if (this == other) { return true; } if (other == null || getClass() != other.getClass()) { return false; } {code} The DecimalTypeInfo, PrimitiveTypeInfo, VarcharTypeInfo, CharTypeInfo, HiveDecimalWritable equals method implementations start with {code} if (other == null || !(other instanceof class_name)) { return false; } {code} - first of all, the check for null is redundant - the second issue is that the 'other instanceof class_name' check is not symmetric. The contract of equals() implies that a.equals(b) is true if and only if b.equals(a) is true. The current implementation violates this contract. e.g. DecimalTypeInfo instanceof PrimitiveTypeInfo is true but PrimitiveTypeInfo instanceof DecimalTypeInfo is false. See more details here http://stackoverflow.com/questions/6518534/equals-method-overrides-equals-in-superclass-and-may-not-be-symmetric serde typeinfo equals methods are not symmetric --- Key: HIVE-10621 URL: https://issues.apache.org/jira/browse/HIVE-10621 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor A correct equals method implementation should start with {code} if (this == other) { return true; } if (other == null || getClass() != other.getClass()) { return false; } {code} The DecimalTypeInfo, PrimitiveTypeInfo, VarcharTypeInfo, CharTypeInfo, HiveDecimalWritable equals method implementations start with {code} if (other == null || !(other instanceof class_name)) { return false; } {code} - first of all, the check for null is redundant - the second issue is that the 'other instanceof class_name' check is not symmetric. The contract of equals() implies that a.equals(b) is true if and only if b.equals(a) is true. The current implementation violates this contract. e.g.
DecimalTypeInfo instanceof PrimitiveTypeInfo is true but PrimitiveTypeInfo instanceof DecimalTypeInfo is false. See more details here http://stackoverflow.com/questions/6518534/equals-method-overrides-equals-in-superclass-and-may-not-be-symmetric -- This message was sent by Atlassian JIRA (v6.3.4#6332)
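The asymmetry described in this report can be reproduced with a small hypothetical Base/Derived pair (illustrative only, not Hive's actual TypeInfo classes):

```java
// Hypothetical hierarchy showing why an instanceof-based equals() is not
// symmetric: Derived instanceof Base holds, but Base instanceof Derived does not.
class Base {
    final int value;
    Base(int value) { this.value = value; }
    @Override public boolean equals(Object other) {
        // A Derived IS a Base, so this happily accepts a Derived...
        if (other == null || !(other instanceof Base)) return false;
        return value == ((Base) other).value;
    }
    @Override public int hashCode() { return value; }
}

class Derived extends Base {
    Derived(int value) { super(value); }
    @Override public boolean equals(Object other) {
        // ...but a Base is NOT a Derived, so the reverse call rejects it.
        if (other == null || !(other instanceof Derived)) return false;
        return value == ((Derived) other).value;
    }
}

class EqualsSymmetryDemo {
    public static void main(String[] args) {
        Base b = new Base(7);
        Derived d = new Derived(7);
        System.out.println(b.equals(d)); // true
        System.out.println(d.equals(b)); // false -- contract violated
        // Using getClass() != other.getClass() instead makes both directions
        // return false, restoring symmetry.
    }
}
```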
[jira] [Updated] (HIVE-10563) MiniTezCliDriver tests ordering issues
[ https://issues.apache.org/jira/browse/HIVE-10563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-10563: - Attachment: HIVE-10563.2.patch MiniTezCliDriver tests ordering issues -- Key: HIVE-10563 URL: https://issues.apache.org/jira/browse/HIVE-10563 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10563.1.patch, HIVE-10563.2.patch There are a bunch of tests related to TestMiniTezCliDriver which give ordering issues when run on CentOS/Windows/OSX -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10607) Combination of ReducesinkDedup + TopN optimization yields incorrect result if there are multiple GBY in reducer
[ https://issues.apache.org/jira/browse/HIVE-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529903#comment-14529903 ] Hive QA commented on HIVE-10607: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730369/HIVE-10607.patch {color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 8900 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_limit_pushdown org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3746/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3746/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3746/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 25 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12730369 - PreCommit-HIVE-TRUNK-Build Combination of ReducesinkDedup + TopN optimization yields incorrect result if there are multiple GBY in reducer --- Key: HIVE-10607 URL: https://issues.apache.org/jira/browse/HIVE-10607 Project: Hive Issue Type: Bug Components: Logical Optimizer, Tez Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.1.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-10607.patch {code:sql} select ctinyint, count(cdouble) from (select ctinyint, cdouble from alltypesorc group by ctinyint, cdouble) t1 group by ctinyint order by ctinyint limit 20; {code} This gives different result set depending on which set of optimizations are on. In particular in .q test environment following two invocations will give you different result set: {code} * mvn test -Phadoop-2 -Dtest.output.overwrite=true -Dtest=TestMiniTezCliDriver -Dqfile=test.q -Dhive.optimize.reducededuplication.min.reducer=1 -Dhive.limit.pushdown.memory.usage=0.3f * mvn test -Phadoop-2
[jira] [Updated] (HIVE-10592) ORC file dump in JSON format
[ https://issues.apache.org/jira/browse/HIVE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10592: - Attachment: HIVE-10592.4.patch ORC file dump in JSON format Key: HIVE-10592 URL: https://issues.apache.org/jira/browse/HIVE-10592 Project: Hive Issue Type: New Feature Affects Versions: 1.3.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10592.1.patch, HIVE-10592.2.patch, HIVE-10592.3.patch, HIVE-10592.4.patch ORC file dump uses a custom format. It will be useful to dump ORC metadata in JSON format so that other tools can be built on top of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8890) HiveServer2 dynamic service discovery: use persistent ephemeral nodes curator recipe
[ https://issues.apache.org/jira/browse/HIVE-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528575#comment-14528575 ] Thejas M Nair commented on HIVE-8890: - +1 Sorry about the delay in reviewing updated patch! HiveServer2 dynamic service discovery: use persistent ephemeral nodes curator recipe Key: HIVE-8890 URL: https://issues.apache.org/jira/browse/HIVE-8890 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0, 1.0.0, 1.1.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical Fix For: 1.2.0 Attachments: HIVE-8890.1.patch, HIVE-8890.2.patch, HIVE-8890.3.patch, HIVE-8890.4.patch Using this recipe gives better reliability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)
[ https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reuben Kuhnert updated HIVE-10190: -- Attachment: HIVE-10190.10.patch CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE) - Key: HIVE-10190 URL: https://issues.apache.org/jira/browse/HIVE-10190 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Reuben Kuhnert Priority: Trivial Labels: perfomance Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, HIVE-10190.07.patch, HIVE-10190.08.patch, HIVE-10190.09.patch, HIVE-10190.10.patch {code} public static boolean validateASTForUnsupportedTokens(ASTNode ast) { String astTree = ast.toStringTree(); // if any of following tokens are present in AST, bail out String[] tokens = { TOK_CHARSETLITERAL, TOK_TABLESPLITSAMPLE }; for (String token : tokens) { if (astTree.contains(token)) { return false; } } return true; } {code} This is an issue for a SQL query which is bigger in AST form than in text (~700kb). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
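As background on the issue above: the cost comes from materializing the whole AST as a string before calling contains(). A hedged sketch of the alternative direction (walk the tree and short-circuit on the first banned token) using a hypothetical Node class; Hive's real ASTNode is ANTLR-based and differs:

```java
import java.util.Arrays;
import java.util.List;

class AstTokenCheck {
    // Hypothetical stand-in for an AST node; not Hive's ASTNode.
    static class Node {
        final String token;
        final List<Node> children;
        Node(String token, Node... children) {
            this.token = token;
            this.children = Arrays.asList(children);
        }
    }

    // Depth-first walk that stops at the first banned token, instead of
    // rendering the full tree (~700KB for the reported query) to a string
    // and scanning it with contains().
    static boolean containsToken(Node node, List<String> banned) {
        if (banned.contains(node.token)) {
            return true;
        }
        for (Node child : node.children) {
            if (containsToken(child, banned)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node ast = new Node("TOK_QUERY",
            new Node("TOK_FROM", new Node("TOK_TABLESPLITSAMPLE")));
        List<String> banned = Arrays.asList("TOK_CHARSETLITERAL", "TOK_TABLESPLITSAMPLE");
        System.out.println(containsToken(ast, banned)); // prints true
    }
}
```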
[jira] [Updated] (HIVE-10597) Relative path doesn't work with CREATE TABLE LOCATION 'relative/path'
[ https://issues.apache.org/jira/browse/HIVE-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reuben Kuhnert updated HIVE-10597: -- Attachment: HIVE-10597.02.patch Relative path doesn't work with CREATE TABLE LOCATION 'relative/path' - Key: HIVE-10597 URL: https://issues.apache.org/jira/browse/HIVE-10597 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Reuben Kuhnert Assignee: Reuben Kuhnert Priority: Minor Attachments: HIVE-10597.01.patch, HIVE-10597.02.patch {code} 0: jdbc:hive2://a2110.halxg.cloudera.com:1000 CREATE EXTERNAL TABLE IF NOT EXISTS mydb.employees3 like mydb.employees LOCATION 'data/stock'; Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.NullPointerException) (state=08S01,code=1) 0: jdbc:hive2://a2110.halxg.cloudera.com:1000 CREATE EXTERNAL TABLE IF NOT EXISTS mydb.employees3 like mydb.employees LOCATION '/user/hive/data/stock'; No rows affected (0.369 seconds) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)
[ https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reuben Kuhnert updated HIVE-10190: -- Attachment: HIVE-10190.10.patch CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE) - Key: HIVE-10190 URL: https://issues.apache.org/jira/browse/HIVE-10190 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Reuben Kuhnert Priority: Trivial Labels: perfomance Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, HIVE-10190.07.patch, HIVE-10190.08.patch, HIVE-10190.09.patch, HIVE-10190.10.patch {code} public static boolean validateASTForUnsupportedTokens(ASTNode ast) { String astTree = ast.toStringTree(); // if any of following tokens are present in AST, bail out String[] tokens = { TOK_CHARSETLITERAL, TOK_TABLESPLITSAMPLE }; for (String token : tokens) { if (astTree.contains(token)) { return false; } } return true; } {code} This is an issue for a SQL query which is bigger in AST form than in text (~700kb). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10594) Remote Spark client doesn't use Kerberos keytab to authenticate [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528640#comment-14528640 ] Bruce Nelson commented on HIVE-10594: - I have confirmed the issue with a few more specifics: 1. Confirmed using CDH 5.4.0 with Kerberos, OpenLDAP/SSSD and Sentry (no impersonation) 2. The problem is seen even if beeline is run on the HS2 server. 3. Unless the hive/hs2 host princ@DOMAIN runs kinit, setting hive.execution.engine=spark will result in a failed SQL execution. Once the hive principal runs kinit, then the hive on spark query succeeds. 4. The problem is specific to HS2 - it must be able to find the TGT cache for the hive principal in the default or KRB5CCNAME location or hive on spark will fail. Remote Spark client doesn't use Kerberos keytab to authenticate [Spark Branch] -- Key: HIVE-10594 URL: https://issues.apache.org/jira/browse/HIVE-10594 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 1.1.0 Reporter: Chao Sun Reporting a problem found by one of the HoS users: Currently, if a user is running Beeline on a different host than HS2, and he/she didn't do kinit on the HS2 host, then he/she may get the following error: {code} 2015-04-29 15:49:34,614 INFO org.apache.hive.spark.client.SparkClientImpl: 15/04/29 15:49:34 WARN UserGroupInformation: PriviledgedActionException as:hive (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2015-04-29 15:49:34,652 INFO org.apache.hive.spark.client.SparkClientImpl: Exception in thread main java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: secure-hos-1.ent.cloudera.com/10.20.77.79;
destination host is: secure-hos-1.ent.cloudera.com:8032; 2015-04-29 15:49:34,653 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) 2015-04-29 15:49:34,653 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.ipc.Client.call(Client.java:1472) 2015-04-29 15:49:34,654 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.ipc.Client.call(Client.java:1399) 2015-04-29 15:49:34,654 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) 2015-04-29 15:49:34,654 INFO org.apache.hive.spark.client.SparkClientImpl: at com.sun.proxy.$Proxy11.getClusterMetrics(Unknown Source) 2015-04-29 15:49:34,655 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:202) 2015-04-29 15:49:34,655 INFO org.apache.hive.spark.client.SparkClientImpl: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2015-04-29 15:49:34,655 INFO org.apache.hive.spark.client.SparkClientImpl: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 2015-04-29 15:49:34,656 INFO org.apache.hive.spark.client.SparkClientImpl: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2015-04-29 15:49:34,656 INFO org.apache.hive.spark.client.SparkClientImpl: at java.lang.reflect.Method.invoke(Method.java:606) 2015-04-29 15:49:34,656 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at 
com.sun.proxy.$Proxy12.getClusterMetrics(Unknown Source) 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:461) 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:91) 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:91) 2015-04-29 15:49:34,657 INFO
[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)
[ https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reuben Kuhnert updated HIVE-10190: -- Attachment: (was: HIVE-10190.10.patch) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE) - Key: HIVE-10190 URL: https://issues.apache.org/jira/browse/HIVE-10190 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Reuben Kuhnert Priority: Trivial Labels: perfomance Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, HIVE-10190.07.patch, HIVE-10190.08.patch, HIVE-10190.09.patch, HIVE-10190.10.patch {code} public static boolean validateASTForUnsupportedTokens(ASTNode ast) { String astTree = ast.toStringTree(); // if any of following tokens are present in AST, bail out String[] tokens = { TOK_CHARSETLITERAL, TOK_TABLESPLITSAMPLE }; for (String token : tokens) { if (astTree.contains(token)) { return false; } } return true; } {code} This is an issue for a SQL query which is bigger in AST form than in text (~700kb). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10576) add jar command does not work with Windows OS
[ https://issues.apache.org/jira/browse/HIVE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528580#comment-14528580 ] Thejas M Nair commented on HIVE-10576: -- +1 Thanks Hari! add jar command does not work with Windows OS - Key: HIVE-10576 URL: https://issues.apache.org/jira/browse/HIVE-10576 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10576.1.patch, HIVE-10576.2.patch, HIVE-10576.3.patch Steps to reproduce this issue in Windows OS: hadoop.cmd fs -mkdir -p /tmp/testjars hadoop.cmd fs -copyFromLocal hive-hcatalog-core-*.jar /tmp/testjars from hive cli: add jar hdfs:///tmp/testjars/hive-hcatalog-core-*.jar; add jar D:\hdp\hive-1.2.0.2.3.0.0-1737\hcatalog\share\hcatalog\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar; {code} hive> add jar hdfs:///tmp/testjars/hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar; converting to local hdfs:///tmp/testjars/hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar Illegal character in opaque part at index 2: C:\Users\hadoopqa\AppData\Local\Temp\cf0c70a4-f8e5-43ae-8c94-aa528f90887d_resources\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar Query returned non-zero code: 1, cause: java.net.URISyntaxException: Illegal character in opaque part at index 2: C:\Users\hadoopqa\AppData\Local\Temp\cf0c70a4-f8e5-43ae-8c94-aa528f90887d_resources\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar hive> add jar D:\hdp\hive-1.2.0.2.3.0.0-1737\hcatalog\share\hcatalog\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar; Illegal character in opaque part at index 2: D:\hdp\hive-1.2.0.2.3.0.0-1737\hcatalog\share\hcatalog\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar Query returned non-zero code: 1, cause: java.net.URISyntaxException: Illegal character in opaque part at index 2: D:\hdp\hive-1.2.0.2.3.0.0-1737\hcatalog\share\hcatalog\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10576) add jar command does not work with Windows OS
[ https://issues.apache.org/jira/browse/HIVE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528591#comment-14528591 ] Hive QA commented on HIVE-10576: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730319/HIVE-10576.3.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 8895 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3734/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3734/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3734/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12730319 - PreCommit-HIVE-TRUNK-Build add jar command does not work with Windows OS - Key: HIVE-10576 URL: https://issues.apache.org/jira/browse/HIVE-10576 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10576.1.patch, HIVE-10576.2.patch, HIVE-10576.3.patch Steps to reproduce this issue in Windows OS: hadoop.cmd fs -mkdir -p /tmp/testjars hadoop.cmd fs -copyFromLocal hive-hcatalog-core-*.jar /tmp/testjars from hive cli: add jar hdfs:///tmp/testjars/hive-hcatalog-core-*.jar; add jar D:\hdp\hive-1.2.0.2.3.0.0-1737\hcatalog\share\hcatalog\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar; {code} hive> add jar hdfs:///tmp/testjars/hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar; converting to local hdfs:///tmp/testjars/hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar Illegal character in opaque part at index 2: C:\Users\hadoopqa\AppData\Local\Temp\cf0c70a4-f8e5-43ae-8c94-aa528f90887d_resources\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar Query returned non-zero code: 1, cause: java.net.URISyntaxException: Illegal character in opaque part at index 2: C:\Users\hadoopqa\AppData\Local\Temp\cf0c70a4-f8e5-43ae-8c94-aa528f90887d_resources\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar hive> add jar D:\hdp\hive-1.2.0.2.3.0.0-1737\hcatalog\share\hcatalog\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar; Illegal character in opaque part at index 2: D:\hdp\hive-1.2.0.2.3.0.0-1737\hcatalog\share\hcatalog\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar Query returned non-zero code: 1, cause: java.net.URISyntaxException: Illegal character in opaque part at index 2: D:\hdp\hive-1.2.0.2.3.0.0-1737\hcatalog\share\hcatalog\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10608) Fix useless 'if' statement in RetryingMetaStoreClient (135)
[ https://issues.apache.org/jira/browse/HIVE-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529430#comment-14529430 ] Szehon Ho commented on HIVE-10608: -- +1 Fix useless 'if' statement in RetryingMetaStoreClient (135) --- Key: HIVE-10608 URL: https://issues.apache.org/jira/browse/HIVE-10608 Project: Hive Issue Type: Bug Components: Metastore Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: rb33861.patch The if statement below is useless because it ends with ';' {code} } catch (MetaException e) { if (e.getMessage().matches("(?s).*(IO|TTransport)Exception.*")); caughtException = e; } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
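The bug above is Java's empty-statement trap: a ';' directly after 'if (...)' closes the if with an empty body, so the statement that follows always runs. A standalone sketch (hypothetical helper methods, not the RetryingMetaStoreClient code):

```java
class DanglingSemicolon {
    // Buggy shape from the report: the trailing ';' gives the if an empty
    // body, so the assignment below it runs whether or not the message matches.
    static boolean buggyMatch(String message) {
        boolean matched = false;
        if (message.matches("(?s).*(IO|TTransport)Exception.*"));
        matched = true; // executes unconditionally -- this is the bug
        return matched;
    }

    // Intended behavior: only flag messages that match the pattern.
    static boolean fixedMatch(String message) {
        boolean matched = false;
        if (message.matches("(?s).*(IO|TTransport)Exception.*")) {
            matched = true;
        }
        return matched;
    }

    public static void main(String[] args) {
        String unrelated = "some other failure";
        System.out.println(buggyMatch(unrelated)); // prints true, despite no match
        System.out.println(fixedMatch(unrelated)); // prints false
    }
}
```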
[jira] [Commented] (HIVE-10213) MapReduce jobs using dynamic-partitioning fail on commit.
[ https://issues.apache.org/jira/browse/HIVE-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529452#comment-14529452 ] Sushanth Sowmyan commented on HIVE-10213: - This patch set off some warning flags for me with regards to the traditional M-R usecase, but it's because it's been a while since I looked at this piece of code. The traditional M-R usecase is still fine, because the DynamicPartitionFileRecordWriterContainer.close() will register an appropriate TaskCommitterProxy, and a commit on the OutputCommitter will be called in the same process scope, thus making it okay. For pig-based optimizations also, it'd continue to be okay as the singleton retains it in memory. +1, and I'm okay with committing this patch as-is, tests have already run on this, and this section of code has not changed since then. MapReduce jobs using dynamic-partitioning fail on commit. - Key: HIVE-10213 URL: https://issues.apache.org/jira/browse/HIVE-10213 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-10213.1.patch I recently ran into a problem in {{TaskCommitContextRegistry}}, when using dynamic-partitions. Consider a MapReduce program that reads HCatRecords from a table (using HCatInputFormat), and then writes to another table (with identical schema), using HCatOutputFormat. 
The Map-task fails with the following exception: {code} Error: java.io.IOException: No callback registered for TaskAttemptID:attempt_1426589008676_509707_m_00_0@hdfs://crystalmyth.myth.net:8020/user/mithunr/mythdb/target/_DYN0.6784154320609959/grid=__HIVE_DEFAULT_PARTITION__/dt=__HIVE_DEFAULT_PARTITION__ at org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:56) at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitTask(FileOutputCommitterContainer.java:139) at org.apache.hadoop.mapred.Task.commit(Task.java:1163) at org.apache.hadoop.mapred.Task.done(Task.java:1025) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) {code} {{TaskCommitContextRegistry::commitTask()}} uses call-backs registered from {{DynamicPartitionFileRecordWriter}}. But when {{HCatInputFormat}} and {{HCatOutputFormat}} are both used in the same job, the {{DynamicPartitionFileRecordWriter}} might only be exercised in the Reducer. I'm relaxing the IOException and logging a warning message instead of just failing. (I'll post the fix shortly.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9743: --- Attachment: (was: HIVE-9743.07.patch) Incorrect result set for vectorized left outer join --- Key: HIVE-9743 URL: https://issues.apache.org/jira/browse/HIVE-9743 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.14.0 Reporter: N Campbell Assignee: Matt McCline Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, HIVE-9743.06.patch This query is supposed to return 3 rows and will when run without Tez but returns 2 rows when run with Tez. select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 15 ) tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 1 20 25 null 2 null 50 null instead of tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 0 10 15 null 1 20 25 null 2 null 50 null create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) STORED AS orc ; 0|10|15 1|20|25 2|\N|50 create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE ; 0|10|BB 1|15|DD 2|\N|EE 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive
[ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529461#comment-14529461 ] Brock Noland commented on HIVE-8065: bq. have you considered creating a single encrypted staging dir for all queries to use instead of creating new ones under the table namespace? (this could be owned by Hive and encrypted with Hive's key). If so, why did you choose the current design? This approach does not work since you cannot move files across encryption zones. Support HDFS encryption functionality on Hive - Key: HIVE-8065 URL: https://issues.apache.org/jira/browse/HIVE-8065 Project: Hive Issue Type: Improvement Affects Versions: 0.13.1 Reporter: Sergio Peña Assignee: Sergio Peña Labels: Hive-Scrum The new encryption support on HDFS makes Hive incompatible and unusable when this feature is used. HDFS encryption is designed so that a user can configure different encryption zones (or directories) for multi-tenant environments. An encryption zone has an exclusive encryption key, such as AES-128 or AES-256. Because of security compliance, HDFS does not allow moving/renaming files between encryption zones. Renames are allowed only inside the same encryption zone. A copy is allowed between encryption zones. See HDFS-6134 for more details about the HDFS encryption design. Hive currently uses a scratch directory (like /tmp/$user/$random). This scratch directory is used for the output of intermediate data (between MR jobs) and for the final output of the hive query which is later moved to the table directory location. If Hive tables are in different encryption zones than the scratch directory, then Hive won't be able to rename those files/directories, and it will make Hive unusable. To handle this problem, we can change the scratch directory of the query/statement to be inside the same encryption zone of the table directory location. This way, the renaming process will be successful.
Also, for statements that move files between encryption zones (i.e. LOAD DATA), a copy may be executed instead of a rename. This will cause an overhead when copying large data files, but it won't break the encryption on Hive. Another security thing to consider is when using joins selects. If Hive joins different tables with different encryption key strengths, then the results of the select might break the security compliance of the tables. Let's say two tables with 128 bits and 256 bits encryption are joined, then the temporary results might be stored in the 128 bits encryption zone. This will conflict with the table encrypted with 256 bits temporary. To fix this, Hive should be able to select the scratch directory that is more secured/encrypted in order to save the intermediate data temporary with no compliance issues. For instance: {noformat} SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id; {noformat} - This should use a scratch directory (or staging directory) inside the table-aes256 table location. {noformat} INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1; {noformat} - This should use a scratch directory inside the table-aes1 location. {noformat} FROM table-unencrypted INSERT OVERWRITE TABLE table-aes128 SELECT id, name INSERT OVERWRITE TABLE table-aes256 SELECT id, name {noformat} - This should use a scratch directory on each of the tables locations. - The first SELECT will have its scratch directory on table-aes128 directory. - The second SELECT will have its scratch directory on table-aes256 directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
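The "pick the most strongly encrypted zone" rule described above reduces to comparing key lengths. The following is an illustrative sketch only; EncryptionZone and pickStagingZone are hypothetical names, not Hive or HDFS API:

```java
// Illustrative sketch: choose the staging directory for a query that reads
// tables in different encryption zones, picking the zone with the strongest
// key so intermediate data is never written under a weaker key.
// EncryptionZone and pickStagingZone are made-up names, not Hive's API.
public class StagingZonePicker {
    // An encryption zone, reduced to its path and key length in bits.
    static final class EncryptionZone {
        final String path;
        final int keyBits; // 0 means unencrypted
        EncryptionZone(String path, int keyBits) { this.path = path; this.keyBits = keyBits; }
    }

    // Return the zone with the largest key length; ties go to the first seen.
    static EncryptionZone pickStagingZone(EncryptionZone... zones) {
        EncryptionZone best = zones[0];
        for (EncryptionZone z : zones) {
            if (z.keyBits > best.keyBits) best = z;
        }
        return best;
    }

    public static void main(String[] args) {
        EncryptionZone aes128 = new EncryptionZone("/warehouse/table-aes128", 128);
        EncryptionZone aes256 = new EncryptionZone("/warehouse/table-aes256", 256);
        // A join of the two tables should stage under the 256-bit zone.
        System.out.println(pickStagingZone(aes128, aes256).path);
    }
}
```

With an unencrypted table in the mix (key length 0), the same comparison naturally stages under whichever encrypted zone is present.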
[jira] [Updated] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9743: --- Attachment: HIVE-9743.08.patch Incorrect result set for vectorized left outer join --- Key: HIVE-9743 URL: https://issues.apache.org/jira/browse/HIVE-9743 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.14.0 Reporter: N Campbell Assignee: Matt McCline Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, HIVE-9743.06.patch, HIVE-9743.08.patch This query is supposed to return 3 rows, and will when run without Tez, but returns 2 rows when run with Tez.
{noformat}
select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2
from tjoin1 left outer join tjoin2
  on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 15 )
{noformat}
returns
{noformat}
tjoin1.rnum  tjoin1.c1  tjoin1.c2  c2j2
1            20         25         null
2            null       50         null
{noformat}
instead of
{noformat}
tjoin1.rnum  tjoin1.c1  tjoin1.c2  c2j2
0            10         15         null
1            20         25         null
2            null       50         null
{noformat}
{noformat}
create table if not exists TJOIN1 (RNUM int, C1 int, C2 int) STORED AS orc;
0|10|15
1|20|25
2|\N|50
create table if not exists TJOIN2 (RNUM int, C1 int, C2 char(2)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
0|10|BB
1|15|DD
2|\N|EE
3|10|FF
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10615) LLAP: Invalid containerId prefix
[ https://issues.apache.org/jira/browse/HIVE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529473#comment-14529473 ] Prasanth Jayachandran commented on HIVE-10615: -- [~sseth] fyi.. LLAP: Invalid containerId prefix Key: HIVE-10615 URL: https://issues.apache.org/jira/browse/HIVE-10615 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran I encountered this error when I ran a simple query in llap mode today. {code}org.apache.hadoop.ipc.RemoteException(java.io.IOException): java.lang.IllegalArgumentException: Invalid ContainerId prefix: at org.apache.hadoop.yarn.api.records.ContainerId.fromString(ContainerId.java:211) at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:178) at org.apache.tez.dag.app.TezTaskCommunicatorImpl$TezTaskUmbilicalProtocolImpl.heartbeat(TezTaskCommunicatorImpl.java:311) at org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator$LlapTaskUmbilicalProtocolImpl.heartbeat(LlapTaskCommunicator.java:398) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:514) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) at org.apache.hadoop.ipc.Client.call(Client.java:1468) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at 
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:244) at com.sun.proxy.$Proxy14.heartbeat(Unknown Source) at org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.heartbeat(LlapTaskReporter.java:256) at org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:184) at org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:126) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 15/05/05 15:24:22 [Task-Executor-0] INFO task.TezTaskRunner : Interrupted while waiting for task to complete. Interrupting task 15/05/05 15:24:22 [TezTaskRunner_attempt_1430816501738_0034_1_00_00_0] INFO task.TezTaskRunner : Encounted an error while executing task: attempt_1430816501738_0034_1_00_00_0 java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.initialize(LogicalIOProcessorRuntimeTask.java:218) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:177) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at
[jira] [Updated] (HIVE-10610) hive command fails to get hadoop version
[ https://issues.apache.org/jira/browse/HIVE-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10610: Assignee: Shwetha G S hive command fails to get hadoop version Key: HIVE-10610 URL: https://issues.apache.org/jira/browse/HIVE-10610 Project: Hive Issue Type: Bug Reporter: Shwetha G S Assignee: Shwetha G S Attachments: HIVE-10610.patch NO PRECOMMIT TESTS If debug-level logging is enabled, the hive command fails with the following error:
{noformat}
apache-hive-1.2.0-SNAPSHOT-bin$ ./bin/hive
Unable to determine Hadoop version information from 13:54:07,683 'hadoop version' returned:
2015-05-05 13:54:08,014 DEBUG - [main:] ~ version: 2.5.0-cdh5.3.3 (VersionInfo:171)
Hadoop 2.5.0-cdh5.3.3
Subversion http://github.com/cloudera/hadoop -r 82a65209d6e9e4a2b41fdbcd8190c7ea38730627
Compiled by jenkins on 2015-04-08T22:00Z
Compiled with protoc 2.5.0
From source with checksum 1531e104cdad7489656f44875f3334b
This command was run using /Users/sshivalingamurthy/installs/hadoop-2.5.0-cdh5.3.3/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.3.jar
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
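The failure mode above is that the version check trusts the first line of `hadoop version` output, which with debug logging enabled is a log line. One robust approach is to scan all lines for the one that starts with "Hadoop ". This is a sketch of the idea only, not the actual change in HIVE-10610.patch:

```java
// Sketch: parse the "Hadoop x.y.z" line out of `hadoop version` output even
// when log lines (DEBUG, timestamps, etc.) are interleaved with it.
// Not the actual HIVE-10610 fix; the class and method names are made up.
public class HadoopVersionParser {
    // Scan every line instead of trusting line 1; log lines carry a timestamp
    // prefix, so only the real version line starts with "Hadoop ".
    static String parseVersion(String output) {
        for (String line : output.split("\n")) {
            if (line.startsWith("Hadoop ")) {
                return line.substring("Hadoop ".length()).trim();
            }
        }
        return null; // no version line found
    }

    public static void main(String[] args) {
        String noisy =
            "2015-05-05 13:54:08,014 DEBUG - [main:] ~ version: 2.5.0-cdh5.3.3 (VersionInfo:171)\n"
          + "Hadoop 2.5.0-cdh5.3.3\n"
          + "Compiled by jenkins on 2015-04-08T22:00Z\n";
        System.out.println(parseVersion(noisy));
    }
}
```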
[jira] [Commented] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529477#comment-14529477 ] Matt McCline commented on HIVE-9743: [~vikram.dixit] I removed the annotations and the MR vector_left_outer_join3.q.out, and fiddled with environment variables so that it now has Sorted Merge Bucket Map Join operators; Tez has the Merge Join Operator, as you said. The original LEFT OUTER JOIN problem does not repro with vector_left_outer_join3.q, though. Incorrect result set for vectorized left outer join --- Key: HIVE-9743 URL: https://issues.apache.org/jira/browse/HIVE-9743 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.14.0 Reporter: N Campbell Assignee: Matt McCline Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, HIVE-9743.06.patch, HIVE-9743.08.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9508) MetaStore client socket connection should have a lifetime
[ https://issues.apache.org/jira/browse/HIVE-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528854#comment-14528854 ] Vaibhav Gumashta commented on HIVE-9508: Failures are unrelated. Will commit shortly. MetaStore client socket connection should have a lifetime - Key: HIVE-9508 URL: https://issues.apache.org/jira/browse/HIVE-9508 Project: Hive Issue Type: Sub-task Components: CLI, Metastore Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Labels: metastore, rolling_upgrade Fix For: 1.2.0 Attachments: HIVE-9508.1.patch, HIVE-9508.2.patch, HIVE-9508.3.patch, HIVE-9508.4.patch, HIVE-9508.5.patch, HIVE-9508.6.patch Currently HiveMetaStoreClient (or SessionHMSC) is connected to one Metastore server until the connection is closed or there is a problem. I would like to introduce the concept of a MetaStore client socket lifetime. The MS client will reconnect once the socket lifetime is reached. This will help during rolling upgrades of the Metastore. When there are multiple Metastore servers behind a VIP (load balancer), it is easy to take one server out of rotation and wait 10+ minutes for all existing connections to die down (if the lifetime is, say, 5 minutes), after which the server can be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
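The lifetime mechanism described above amounts to remembering when the socket was opened and forcing a reconnect once its age exceeds the configured lifetime. A much-simplified sketch with a hypothetical class name and an injected clock, not the actual HiveMetaStoreClient code:

```java
// Hypothetical sketch of a "connection lifetime" check, not real Hive code:
// remember when the socket was opened and force a reconnect once its age
// exceeds the configured lifetime, so servers behind a VIP can be drained.
public class LifetimeConnection {
    private final long lifetimeMs;
    private long connectedAtMs;
    private int reconnects = 0;

    LifetimeConnection(long lifetimeMs, long nowMs) {
        this.lifetimeMs = lifetimeMs;
        this.connectedAtMs = nowMs;
    }

    // Called before every RPC; nowMs is injected to keep the sketch testable.
    void beforeCall(long nowMs) {
        if (nowMs - connectedAtMs >= lifetimeMs) {
            // The real client would close the Thrift socket and reopen it,
            // possibly landing on a different server behind the VIP.
            connectedAtMs = nowMs;
            reconnects++;
        }
    }

    int reconnectCount() { return reconnects; }

    public static void main(String[] args) {
        LifetimeConnection c = new LifetimeConnection(5 * 60 * 1000, 0);
        c.beforeCall(60_000);      // 1 minute in: still fresh, no reconnect
        c.beforeCall(6 * 60_000);  // 6 minutes in: past the 5-minute lifetime
        System.out.println(c.reconnectCount());
    }
}
```

With every client checking this before each call, a server taken out of rotation drains within roughly one lifetime plus the longest in-flight call.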
[jira] [Commented] (HIVE-10607) Combination of ReducesinkDedup + TopN optimization yields incorrect result if there are multiple GBY in reducer
[ https://issues.apache.org/jira/browse/HIVE-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528939#comment-14528939 ] Ashutosh Chauhan commented on HIVE-10607: - yeah.. it will be good to have this in 1.2 too. [~sushanth] is that OK? Combination of ReducesinkDedup + TopN optimization yields incorrect result if there are multiple GBY in reducer --- Key: HIVE-10607 URL: https://issues.apache.org/jira/browse/HIVE-10607 Project: Hive Issue Type: Bug Components: Logical Optimizer, Tez Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.1.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-10607.patch {code:sql} select ctinyint, count(cdouble) from (select ctinyint, cdouble from alltypesorc group by ctinyint, cdouble) t1 group by ctinyint order by ctinyint limit 20; {code} This gives a different result set depending on which set of optimizations is on. In particular, in the .q test environment the following two invocations will give you different result sets: {code}
* mvn test -Phadoop-2 -Dtest.output.overwrite=true -Dtest=TestMiniTezCliDriver -Dqfile=test.q -Dhive.optimize.reducededuplication.min.reducer=1 -Dhive.limit.pushdown.memory.usage=0.3f
* mvn test -Phadoop-2 -Dtest.output.overwrite=true -Dtest=TestMiniTezCliDriver -Dqfile=test.q
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
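The bug class can be modelled outside Hive in a few lines of plain Java. This is a much-simplified sketch with made-up method names, not the actual reduce-sink code: when the top-N limit is applied to the (k,v) pairs feeding the first group-by instead of after the second group-by, the later per-key counts come out wrong.

```java
import java.util.*;

// Toy model (not Hive code) of why pushing a top-N limit into the first
// reduce stage of "group by (k,v), then count per k, then limit" is unsafe:
// the limit drops (k,v) pairs belonging to keys that survive the limit.
public class TopNPushdownDemo {
    // Correct plan: dedup (k,v), count distinct v per k, then apply the limit.
    static Map<Integer, Integer> correctPlan(List<int[]> rows, int limit) {
        TreeMap<Integer, Set<Integer>> byKey = new TreeMap<>();
        for (int[] r : rows) byKey.computeIfAbsent(r[0], k -> new HashSet<>()).add(r[1]);
        Map<Integer, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<Integer, Set<Integer>> e : byKey.entrySet()) {
            if (out.size() == limit) break;
            out.put(e.getKey(), e.getValue().size());
        }
        return out;
    }

    // Buggy plan: keep only the top `limit` (k,v) pairs before aggregating.
    static Map<Integer, Integer> buggyPlan(List<int[]> rows, int limit) {
        List<int[]> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.<int[]>comparingInt(r -> r[0]).thenComparingInt(r -> r[1]));
        return correctPlan(sorted.subList(0, Math.min(limit, sorted.size())), limit);
    }

    public static void main(String[] args) {
        List<int[]> rows = Arrays.asList(
            new int[]{1, 10}, new int[]{1, 20}, new int[]{1, 30}, new int[]{2, 40});
        System.out.println(correctPlan(rows, 2)); // key 1 has 3 values, key 2 has 1
        System.out.println(buggyPlan(rows, 2));   // key 1 undercounted, key 2 gone
    }
}
```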
[jira] [Updated] (HIVE-10542) Full outer joins in tez produce incorrect results in certain cases
[ https://issues.apache.org/jira/browse/HIVE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-10542: -- Attachment: HIVE-10542.6.patch Full outer joins in tez produce incorrect results in certain cases -- Key: HIVE-10542 URL: https://issues.apache.org/jira/browse/HIVE-10542 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Priority: Blocker Attachments: HIVE-10542.1.patch, HIVE-10542.2.patch, HIVE-10542.3.patch, HIVE-10542.4.patch, HIVE-10542.5.patch, HIVE-10542.6.patch If there are no records for one of the tables in the full outer join, we do not read the other input and end up not producing rows that we should. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
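The expected semantics can be illustrated with a toy full outer join (plain Java with hypothetical names, not Hive's join operator): a correct full outer join must still emit every row of the non-empty side, null-padded, even when the other input has no records at all.

```java
import java.util.*;

// Minimal full outer join of two key->value maps; "null" marks "no match".
// A full outer join must still read the non-empty side when the other input
// has no rows, emitting every one of its rows padded with nulls.
public class FullOuterJoinDemo {
    static List<String> fullOuterJoin(Map<Integer, String> left, Map<Integer, String> right) {
        List<String> out = new ArrayList<>();
        Set<Integer> keys = new TreeSet<>();
        keys.addAll(left.keySet());
        keys.addAll(right.keySet());
        for (Integer k : keys) {
            out.add(k + "|" + left.getOrDefault(k, "null") + "|" + right.getOrDefault(k, "null"));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> left = new HashMap<>();
        left.put(1, "a");
        left.put(2, "b");
        // Right side is empty: the join must still emit both left rows.
        System.out.println(fullOuterJoin(left, Collections.emptyMap()));
    }
}
```

The bug described above corresponds to skipping the union-of-keys step when one input is empty, which drops the other side's rows entirely.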
[jira] [Commented] (HIVE-9845) HCatSplit repeats information making input split data size huge
[ https://issues.apache.org/jira/browse/HIVE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529007#comment-14529007 ] Mithun Radhakrishnan commented on HIVE-9845: Here's the updated patch. Sorry for the delay. HCatSplit repeats information making input split data size huge --- Key: HIVE-9845 URL: https://issues.apache.org/jira/browse/HIVE-9845 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Rohini Palaniswamy Assignee: Mithun Radhakrishnan Attachments: HIVE-9845.1.patch, HIVE-9845.3.patch, HIVE-9845.4.patch, HIVE-9845.5.patch Pig on Tez jobs with larger tables hit PIG-4443. Running on HDFS data which has even triple the number of splits (100K+ splits and tasks) does not hit that issue.
{code}
// HCatBaseInputFormat.java:
// Call getSplits on the underlying InputFormat and create an HCatSplit
// for each underlying split. numSplits is 0 for our purposes.
org.apache.hadoop.mapred.InputSplit[] baseSplits = inputFormat.getSplits(jobConf, 0);
for (org.apache.hadoop.mapred.InputSplit split : baseSplits) {
  splits.add(new HCatSplit(partitionInfo, split, allCols));
}
{code}
Each HCatSplit duplicates the partition schema and table schema. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
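A back-of-the-envelope model shows why this blows up. The numbers and names below are illustrative, not measurements from Hive: with the schema serialized into every split, total split data grows as numSplits * schemaSize rather than schemaSize + numSplits * splitSize.

```java
// Rough model of the overhead: every HCatSplit serializes its own copy of the
// table/partition schema, so total size scales with numSplits * schemaSize
// instead of schemaSize + numSplits * splitSize. Numbers are hypothetical.
public class SplitSizeDemo {
    // Current behavior: each split carries its own schema copy.
    static long duplicated(int numSplits, int schemaBytes, int splitBytes) {
        return (long) numSplits * (schemaBytes + splitBytes);
    }

    // Proposed behavior: serialize the schema once, share it across splits.
    static long shared(int numSplits, int schemaBytes, int splitBytes) {
        return schemaBytes + (long) numSplits * splitBytes;
    }

    public static void main(String[] args) {
        // 100K splits, a 10 KB schema, 200 bytes of real split data each.
        System.out.println(duplicated(100_000, 10_240, 200)); // ~1 GB of split data
        System.out.println(shared(100_000, 10_240, 200));     // ~20 MB of split data
    }
}
```

At 100K+ splits the duplicated form is on the order of a gigabyte of split metadata, which is the kind of payload that trips PIG-4443.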
[jira] [Commented] (HIVE-10611) Mini tez tests wait for 5 minutes before shutting down
[ https://issues.apache.org/jira/browse/HIVE-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529024#comment-14529024 ] Ashutosh Chauhan commented on HIVE-10611: - +1 Thanks, [~vikram.dixit] for doing this. Mini tez tests wait for 5 minutes before shutting down -- Key: HIVE-10611 URL: https://issues.apache.org/jira/browse/HIVE-10611 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.3.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-10611.1.patch Currently, at shutdown, the tez mini cluster waits for the session to close before shutting down the cluster. This ends up being 5 minutes - the default value. We can shut down the session to alleviate this situation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529366#comment-14529366 ] Thejas M Nair commented on HIVE-7018: - This change breaks schematool upgrade - See HIVE-10614 Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others - Key: HIVE-7018 URL: https://issues.apache.org/jira/browse/HIVE-7018 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Yongzhi Chen Fix For: 1.2.0 Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch It appears that at least postgres and oracle do not have the LINK_TARGET_ID column while mysql does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive
[ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529378#comment-14529378 ] Eugene Koifman commented on HIVE-8065: -- [~spena], when implementing this, have you considered creating a single encrypted staging dir for all queries to use, instead of creating new ones under the table namespace? (This could be owned by Hive and encrypted with Hive's key.) If so, why did you choose the current design? Some possible issues with the current design: it requires write permission on the table dir; delete-on-exit (on the staging dir) is not completely reliable as far as I know, which may leave files around; and in a query like SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id, when the staging dir is created under table-aes256, someone who has a key for that EZ may (in theory at least) read data that came from table-aes128 even if they don't have a key for the EZ which contains table-aes128. Thanks. Support HDFS encryption functionality on Hive - Key: HIVE-8065 URL: https://issues.apache.org/jira/browse/HIVE-8065 Project: Hive Issue Type: Improvement Affects Versions: 0.13.1 Reporter: Sergio Peña Assignee: Sergio Peña Labels: Hive-Scrum -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)
[ https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reuben Kuhnert updated HIVE-10190: -- Attachment: HIVE-10190.09.patch CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE) - Key: HIVE-10190 URL: https://issues.apache.org/jira/browse/HIVE-10190 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Reuben Kuhnert Priority: Trivial Labels: perfomance Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, HIVE-10190.07.patch, HIVE-10190.08.patch, HIVE-10190.09.patch
{code}
public static boolean validateASTForUnsupportedTokens(ASTNode ast) {
  String astTree = ast.toStringTree();
  // if any of following tokens are present in AST, bail out
  String[] tokens = { TOK_CHARSETLITERAL, TOK_TABLESPLITSAMPLE };
  for (String token : tokens) {
    if (astTree.contains(token)) {
      return false;
    }
  }
  return true;
}
{code}
This is an issue for a SQL query which is bigger in AST form than in text (~700kb). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
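Since the check only needs to know whether certain token types occur anywhere in the tree, an alternative is to walk the AST and short-circuit on the first hit, instead of materializing a ~700 KB string with toStringTree() and scanning it with contains(). The sketch below uses a hypothetical minimal Node class, not the real ASTNode, and is not the patch's actual implementation:

```java
import java.util.*;

// Sketch: look for unsupported token types by walking the tree directly,
// short-circuiting on the first match, rather than rendering the whole AST
// to a string. Node is a stand-in for ASTNode; names are hypothetical.
public class AstTokenCheck {
    static final class Node {
        final String token;
        final List<Node> children = new ArrayList<>();
        Node(String token) { this.token = token; }
        Node add(Node c) { children.add(c); return this; }
    }

    // Iterative depth-first walk; returns as soon as a banned token is seen.
    static boolean containsToken(Node root, Set<String> unsupported) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (unsupported.contains(n.token)) return true;
            for (Node c : n.children) stack.push(c);
        }
        return false;
    }

    public static void main(String[] args) {
        Node query = new Node("TOK_QUERY")
            .add(new Node("TOK_FROM").add(new Node("TOK_TABLESPLITSAMPLE")));
        System.out.println(containsToken(query, Set.of("TOK_CHARSETLITERAL", "TOK_TABLESPLITSAMPLE")));
    }
}
```

Beyond avoiding the giant intermediate string, the walk stops early, so queries that do use TABLESAMPLE are rejected without visiting the rest of the tree.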
[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528696#comment-14528696 ] Sushanth Sowmyan commented on HIVE-9736: +1 : Have looked through patch and it makes sense. Tests pass, and I trust Chris' judgement on this for a more detailed verification. :) Will commit to master and branch-1.2 StorageBasedAuthProvider should batch namenode-calls where possible. Key: HIVE-9736 URL: https://issues.apache.org/jira/browse/HIVE-9736 Project: Hive Issue Type: Bug Components: Metastore, Security Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch Consider a table partitioned by 2 keys (dt, region). Say a dt partition could have 1 associated regions. Consider that the user does: {code:sql} ALTER TABLE my_table DROP PARTITION (dt='20150101'); {code} As things stand now, {{StorageBasedAuthProvider}} will make individual {{DistributedFileSystem.listStatus()}} calls for each partition-directory, and authorize each one separately. It'd be faster to batch the calls, and examine multiple FileStatus objects at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
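The batching idea in HIVE-9736 can be sketched with a toy RPC-count model; this is illustrative only (made-up names, no Hadoop APIs), not the StorageBasedAuthProvider code: partition directories under one parent can all be authorized from a single listStatus of that parent, rather than one call per directory.

```java
import java.util.*;

// Toy model contrasting one listStatus RPC per partition directory with a
// single batched call per distinct parent directory, whose response carries
// FileStatus for every child. Not the real StorageBasedAuthProvider code.
public class BatchedAuthDemo {
    // One namenode RPC per directory: N partitions => N calls.
    static int rpcCallsOneByOne(List<String> partitionDirs) {
        int calls = 0;
        for (String dir : partitionDirs) {
            calls++; // listStatus(dir) just to examine its permissions
        }
        return calls;
    }

    // One RPC per distinct parent: listing the parent once yields the
    // FileStatus of all partition directories under it.
    static int rpcCallsBatched(List<String> partitionDirs) {
        Set<String> parents = new HashSet<>();
        for (String dir : partitionDirs) {
            parents.add(dir.substring(0, dir.lastIndexOf('/')));
        }
        return parents.size();
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList(
            "/warehouse/my_table/dt=20150101/region=us",
            "/warehouse/my_table/dt=20150101/region=eu",
            "/warehouse/my_table/dt=20150101/region=ap");
        System.out.println(rpcCallsOneByOne(dirs)); // one call per partition
        System.out.println(rpcCallsBatched(dirs));  // one call for the shared parent
    }
}
```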
[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)
[ https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reuben Kuhnert updated HIVE-10190: -- Attachment: (was: HIVE-10190.10.patch) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE) - Key: HIVE-10190 URL: https://issues.apache.org/jira/browse/HIVE-10190 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Reuben Kuhnert Priority: Trivial Labels: perfomance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)
[ https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reuben Kuhnert updated HIVE-10190: -- Attachment: (was: HIVE-10190.09.patch) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE) - Key: HIVE-10190 URL: https://issues.apache.org/jira/browse/HIVE-10190 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Reuben Kuhnert Priority: Trivial Labels: perfomance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7375) Add option in test infra to compile in other profiles (like hadoop-1)
[ https://issues.apache.org/jira/browse/HIVE-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528733#comment-14528733 ] Xuefu Zhang commented on HIVE-7375: --- +1 Add option in test infra to compile in other profiles (like hadoop-1) - Key: HIVE-7375 URL: https://issues.apache.org/jira/browse/HIVE-7375 Project: Hive Issue Type: Test Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7375.2.patch, HIVE-7375.patch As we are seeing some commits break hadoop-1 compilation due to lack of pre-commit coverage, it might be nice to add an option in the test infra to compile optional profiles as a pre-step before testing on the main profile. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10563) MiniTezCliDriver tests ordering issues
[ https://issues.apache.org/jira/browse/HIVE-10563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528756#comment-14528756 ] Ashutosh Chauhan commented on HIVE-10563: - The comment directive {{-- SORT_QUERY_RESULTS}} was invented for exactly this use case. Please use that instead of adding ORDER BY to queries. If anything else (i.e. other than the query result set) is order-sensitive, then use {{-- SORT_BEFORE_DIFF}}. MiniTezCliDriver tests ordering issues -- Key: HIVE-10563 URL: https://issues.apache.org/jira/browse/HIVE-10563 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10563.1.patch There are a bunch of tests related to TestMiniTezCliDriver which give ordering issues when run on Centos/Windows/OSX -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9845) HCatSplit repeats information making input split data size huge
[ https://issues.apache.org/jira/browse/HIVE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528830#comment-14528830 ] Mithun Radhakrishnan commented on HIVE-9845: I'll upload a new patch shortly. HCatSplit repeats information making input split data size huge --- Key: HIVE-9845 URL: https://issues.apache.org/jira/browse/HIVE-9845 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Rohini Palaniswamy Assignee: Mithun Radhakrishnan Attachments: HIVE-9845.1.patch, HIVE-9845.3.patch, HIVE-9845.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528724#comment-14528724 ] Chris Nauroth commented on HIVE-9736: - [~sushanth], thank you for your review and the commit! StorageBasedAuthProvider should batch namenode-calls where possible. Key: HIVE-9736 URL: https://issues.apache.org/jira/browse/HIVE-9736 Project: Hive Issue Type: Bug Components: Metastore, Security Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Fix For: 1.2.0 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)