[jira] [Created] (HIVE-20858) Serializer is not correctly initialized with configuration in Utilities.createEmptyBuckets()
Wei Zheng created HIVE-20858: Summary: Serializer is not correctly initialized with configuration in Utilities.createEmptyBuckets() Key: HIVE-20858 URL: https://issues.apache.org/jira/browse/HIVE-20858 Project: Hive Issue Type: Bug Affects Versions: 3.1.0 Reporter: Wei Zheng Assignee: Wei Zheng Attachments: HIVE-20858.1.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20800) Use "posix" for property tarLongFileMode for maven-assembly-plugin
Wei Zheng created HIVE-20800: Summary: Use "posix" for property tarLongFileMode for maven-assembly-plugin Key: HIVE-20800 URL: https://issues.apache.org/jira/browse/HIVE-20800 Project: Hive Issue Type: Improvement Components: Build Infrastructure Affects Versions: 3.1.0 Reporter: Wei Zheng Assignee: Wei Zheng Fix For: 4.0.0 Came across this error when building Hive with "mvn clean install -DskipTests":
{code}
[INFO] Building tar: /Users/wei/apache/hive/standalone-metastore/target/apache-hive-standalone-metastore-4.0.0-SNAPSHOT-src.tar.gz
[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Hive Storage API 2.7.0-SNAPSHOT ........... SUCCESS [  5.656 s]
[INFO] Hive 4.0.0-SNAPSHOT ....................... SUCCESS [  0.779 s]
[INFO] Hive Classifications ...................... SUCCESS [  0.908 s]
[INFO] Hive Shims Common ......................... SUCCESS [  3.217 s]
[INFO] Hive Shims 0.23 ........................... SUCCESS [  7.102 s]
[INFO] Hive Shims Scheduler ...................... SUCCESS [  2.069 s]
[INFO] Hive Shims ................................ SUCCESS [  1.905 s]
[INFO] Hive Common ............................... SUCCESS [  8.185 s]
[INFO] Hive Service RPC .......................... SUCCESS [  3.603 s]
[INFO] Hive Serde ................................ SUCCESS [  7.438 s]
[INFO] Hive Standalone Metastore ................. FAILURE [  0.576 s]
[INFO] Hive Standalone Metastore Common Code ..... SKIPPED
[INFO] Hive Metastore ............................ SKIPPED
[INFO] Hive Vector-Code-Gen Utilities ............ SKIPPED
[INFO] Hive Llap Common .......................... SKIPPED
[INFO] Hive Llap Client .......................... SKIPPED
[INFO] Hive Llap Tez ............................. SKIPPED
[INFO] Hive Spark Remote Client .................. SKIPPED
[INFO] Hive Metastore Server ..................... SKIPPED
[INFO] Hive Query Language ....................... SKIPPED
[INFO] Hive Llap Server .......................... SKIPPED
[INFO] Hive Service .............................. SKIPPED
[INFO] Hive Accumulo Handler ..................... SKIPPED
[INFO] Hive JDBC ................................. SKIPPED
[INFO] Hive Beeline .............................. SKIPPED
[INFO] Hive CLI .................................. SKIPPED
[INFO] Hive Contrib .............................. SKIPPED
[INFO] Hive Druid Handler ........................ SKIPPED
[INFO] Hive HBase Handler ........................ SKIPPED
[INFO] Hive JDBC Handler ......................... SKIPPED
[INFO] Hive HCatalog ............................. SKIPPED
[INFO] Hive HCatalog Core ........................ SKIPPED
[INFO] Hive HCatalog Pig Adapter ................. SKIPPED
[INFO] Hive HCatalog Server Extensions ........... SKIPPED
[INFO] Hive HCatalog Webhcat Java Client ......... SKIPPED
[INFO] Hive HCatalog Webhcat ..................... SKIPPED
[INFO] Hive HCatalog Streaming ................... SKIPPED
[INFO] Hive HPL/SQL .............................. SKIPPED
[INFO] Hive Streaming ............................ SKIPPED
[INFO] Hive Llap External Client ................. SKIPPED
[INFO] Hive Shims Aggregator ..................... SKIPPED
[INFO] Hive Kryo Registrator ..................... SKIPPED
[INFO] Hive TestUtils ............................ SKIPPED
[INFO] Hive Kafka Storage Handler ................ SKIPPED
[INFO] Hive Packaging ............................ SKIPPED
[INFO] Hive Metastore Tools ...................... SKIPPED
[INFO] Hive Metastore Tools common libraries ..... SKIPPED
[INFO] Hive metastore benchmarks ................. SKIPPED
[INFO] Hive Upgrade Acid ......................... SKIPPED
[INFO] Hive Pre Upgrade Acid 4.0.0-SNAPSHOT ...... SKIPPED
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 42.026 s
[INFO] Finished at: 2018-10-24T15:34:40-07:00
[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:3.1.0:single (assemble) on project hive-standalone-metastore: Execution assemble of goal org.apache.maven.plugins:maven-assembly-plugin:3.1.0:single failed: group id '74715970' is too big ( > 2097151 ). Use STAR or POSIX extensions
{code}
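The proposed fix is a one-line plugin setting. A sketch of what it looks like in a pom (the tarLongFileMode parameter is documented by maven-assembly-plugin; the surrounding plugin declaration here is the standard shape, not necessarily the project's exact pom):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <!-- "posix" uses the PAX extended tar format, which allows UIDs/GIDs
         and path names beyond the classic ustar limits (hence fixing the
         "group id is too big" failure above). -->
    <tarLongFileMode>posix</tarLongFileMode>
  </configuration>
</plugin>
```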
[jira] [Created] (HIVE-17361) Support LOAD DATA for transactional tables
Wei Zheng created HIVE-17361: Summary: Support LOAD DATA for transactional tables Key: HIVE-17361 URL: https://issues.apache.org/jira/browse/HIVE-17361 Project: Hive Issue Type: Bug Components: Transactions Reporter: Wei Zheng Assignee: Wei Zheng LOAD DATA has not been supported since ACID was introduced. We need to close this gap between ACID tables and regular Hive tables.
[jira] [Created] (HIVE-16963) rely on AcidUtils.getAcidState() for read path
Wei Zheng created HIVE-16963: Summary: rely on AcidUtils.getAcidState() for read path Key: HIVE-16963 URL: https://issues.apache.org/jira/browse/HIVE-16963 Project: Hive Issue Type: Sub-task Reporter: Wei Zheng Assignee: Wei Zheng This is to make MM tables more consistent with full ACID tables. It is also a prerequisite for INSERT OVERWRITE support on MM tables (refer to HIVE-14988).
[jira] [Created] (HIVE-16850) Only open a new transaction when there's no currently opened transaction
Wei Zheng created HIVE-16850: Summary: Only open a new transaction when there's no currently opened transaction Key: HIVE-16850 URL: https://issues.apache.org/jira/browse/HIVE-16850 Project: Hive Issue Type: Sub-task Reporter: Wei Zheng Assignee: Wei Zheng
[jira] [Created] (HIVE-16819) Add MM test for temporary table
Wei Zheng created HIVE-16819: Summary: Add MM test for temporary table Key: HIVE-16819 URL: https://issues.apache.org/jira/browse/HIVE-16819 Project: Hive Issue Type: Sub-task Reporter: Wei Zheng Assignee: Wei Zheng
[jira] [Created] (HIVE-16817) Restore CTAS tests in mm_all.q
Wei Zheng created HIVE-16817: Summary: Restore CTAS tests in mm_all.q Key: HIVE-16817 URL: https://issues.apache.org/jira/browse/HIVE-16817 Project: Hive Issue Type: Sub-task Reporter: Wei Zheng Assignee: Wei Zheng In an earlier ACID integration patch, CTAS was not supported. (Previously I used a different approach, creating a new data operation type for INSERT ONLY, which errored out in TxnHandler; I later changed it to INSERT, which works fine.) Now that CTAS works, the corresponding tests should be restored. Note that MM tables still have the same CTAS limitations as regular Hive tables: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS) CTAS has these restrictions: The target table cannot be a partitioned table. The target table cannot be an external table. The target table cannot be a list bucketing table.
[jira] [Created] (HIVE-16810) Fix an export/import bug due to ACID integration
Wei Zheng created HIVE-16810: Summary: Fix an export/import bug due to ACID integration Key: HIVE-16810 URL: https://issues.apache.org/jira/browse/HIVE-16810 Project: Hive Issue Type: Sub-task Affects Versions: hive-14535 Reporter: Wei Zheng Assignee: Wei Zheng
[jira] [Created] (HIVE-16760) Update errata.txt for HIVE-16743
Wei Zheng created HIVE-16760: Summary: Update errata.txt for HIVE-16743 Key: HIVE-16760 URL: https://issues.apache.org/jira/browse/HIVE-16760 Project: Hive Issue Type: Bug Affects Versions: 3.0.0, hive-14535 Reporter: Wei Zheng Assignee: Wei Zheng Refer to: https://issues.apache.org/jira/browse/HIVE-16743?focusedCommentId=16024139&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16024139
[jira] [Created] (HIVE-16753) Add tests that cover createValidReadTxnList and createValidCompactTxnList in TxnUtils.java
Wei Zheng created HIVE-16753: Summary: Add tests that cover createValidReadTxnList and createValidCompactTxnList in TxnUtils.java Key: HIVE-16753 URL: https://issues.apache.org/jira/browse/HIVE-16753 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.0.0 Reporter: Wei Zheng Assignee: Wei Zheng Both are critical methods used in ACID paths, but there are no corresponding tests for them.
[jira] [Created] (HIVE-16743) BitSet set() is not correctly used in TxnUtils.createValidCompactTxnList()
Wei Zheng created HIVE-16743: Summary: BitSet set() is not correctly used in TxnUtils.createValidCompactTxnList() Key: HIVE-16743 URL: https://issues.apache.org/jira/browse/HIVE-16743 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.0.0 Reporter: Wei Zheng Assignee: Wei Zheng The second line is problematic:
{code}
BitSet bitSet = new BitSet(exceptions.length);
bitSet.set(0, bitSet.length()); // for ValidCompactorTxnList, everything in exceptions is aborted
{code}
For example, suppose exceptions has length 2. The first line declares a BitSet with an initial capacity of 2, but capacity is not the logical size: length() returns one plus the index of the highest set bit, which is 0 while no bits are set. The second line intends to set all the bits to true, but that is not achieved, because bitSet.set(0, bitSet.length()) is equivalent to bitSet.set(0, 0), which sets nothing.
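The bug reduces to the following standalone illustration (hypothetical class name, not the actual TxnUtils code):

```java
import java.util.BitSet;

public class BitSetLengthDemo {
    // Buggy: BitSet(n) only sets initial capacity; length() is
    // highestSetBit + 1, which is 0 for an empty set, so set(0, 0) is a no-op.
    static BitSet buggy(int n) {
        BitSet bitSet = new BitSet(n);
        bitSet.set(0, bitSet.length()); // equivalent to set(0, 0)
        return bitSet;
    }

    // Fixed: use the intended element count as the range bound.
    static BitSet fixed(int n) {
        BitSet bitSet = new BitSet(n);
        bitSet.set(0, n); // marks bits 0..n-1 as true
        return bitSet;
    }

    public static void main(String[] args) {
        System.out.println(buggy(2).cardinality()); // 0 bits set
        System.out.println(fixed(2).cardinality());  // 2 bits set
    }
}
```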
[jira] [Created] (HIVE-16728) Fix some regression caused by HIVE-14879
Wei Zheng created HIVE-16728: Summary: Fix some regression caused by HIVE-14879 Key: HIVE-16728 URL: https://issues.apache.org/jira/browse/HIVE-16728 Project: Hive Issue Type: Sub-task Affects Versions: hive-14535 Reporter: Wei Zheng Assignee: Wei Zheng HIVE-14879 integrates ACID logic with MM tables, but it broke some existing ACID tests.
[jira] [Created] (HIVE-16565) Improve how the open transactions and aborted transactions are deserialized in ValidReadTxnList.readFromString
Wei Zheng created HIVE-16565: Summary: Improve how the open transactions and aborted transactions are deserialized in ValidReadTxnList.readFromString Key: HIVE-16565 URL: https://issues.apache.org/jira/browse/HIVE-16565 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.0.0 Reporter: Wei Zheng Assignee: Wei Zheng This is a follow-up of HIVE-16534. In ValidReadTxnList.writeToString, we write out the open and aborted transactions as two sorted lists. We can take advantage of that and merge-sort them together when reading them back in readFromString. Note that the aborted bits should also be handled properly during the merge.
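The read-side merge described above is a standard two-way merge of sorted lists. A sketch (illustrative only; hypothetical class, with the aborted bits tracked in a parallel BitSet rather than the class's actual internal representation):

```java
import java.util.Arrays;
import java.util.BitSet;

public class SortedMergeDemo {
    // Merge two sorted arrays (open and aborted txn ids) into one sorted
    // array, recording in abortedBits which output positions came from 'aborted'.
    static long[] merge(long[] open, long[] aborted, BitSet abortedBits) {
        long[] out = new long[open.length + aborted.length];
        int i = 0, j = 0, k = 0;
        while (i < open.length && j < aborted.length) {
            if (open[i] <= aborted[j]) {
                out[k++] = open[i++];
            } else {
                abortedBits.set(k);
                out[k++] = aborted[j++];
            }
        }
        while (i < open.length) out[k++] = open[i++];
        while (j < aborted.length) { abortedBits.set(k); out[k++] = aborted[j++]; }
        return out;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        // open = {1, 4}, aborted = {2, 3} -> merged {1, 2, 3, 4}, bits 1 and 2 set
        System.out.println(Arrays.toString(merge(new long[]{1, 4}, new long[]{2, 3}, bits)));
        System.out.println(bits);
    }
}
```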
[jira] [Created] (HIVE-16534) Add capability to tell aborted transactions apart from open transactions in ValidTxnList
Wei Zheng created HIVE-16534: Summary: Add capability to tell aborted transactions apart from open transactions in ValidTxnList Key: HIVE-16534 URL: https://issues.apache.org/jira/browse/HIVE-16534 Project: Hive Issue Type: Bug Components: Transactions Reporter: Wei Zheng Assignee: Wei Zheng Currently in ValidReadTxnList, open transactions and aborted transactions are stored together in one array. That makes it impossible to extract just the aborted transactions or just the open transactions. For ValidCompactorTxnList this is fine, since it stores only aborted transactions and no open transactions.
[jira] [Created] (HIVE-16092) Generate and use universal mmId instead of per db/table
Wei Zheng created HIVE-16092: Summary: Generate and use universal mmId instead of per db/table Key: HIVE-16092 URL: https://issues.apache.org/jira/browse/HIVE-16092 Project: Hive Issue Type: Sub-task Reporter: Wei Zheng Assignee: Wei Zheng To facilitate its later replacement with txnId.
[jira] [Created] (HIVE-16063) instead of explicitly specifying mmWriteId during compilation phase, it should only be generated whenever needed during runtime
Wei Zheng created HIVE-16063: Summary: instead of explicitly specifying mmWriteId during compilation phase, it should only be generated whenever needed during runtime Key: HIVE-16063 URL: https://issues.apache.org/jira/browse/HIVE-16063 Project: Hive Issue Type: Sub-task Reporter: Wei Zheng Assignee: Wei Zheng For the ACID transaction logic to work with MM tables, the first step is to make the ID usage logic consistent. ACID stores the valid txn list under VALID_TXNS_KEY.
[jira] [Created] (HIVE-16028) Fail UPDATE/DELETE/MERGE queries when Ranger authorization manager is used
Wei Zheng created HIVE-16028: Summary: Fail UPDATE/DELETE/MERGE queries when Ranger authorization manager is used Key: HIVE-16028 URL: https://issues.apache.org/jira/browse/HIVE-16028 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng This is a follow-up of HIVE-15891. That jira added error-out logic, but its assumption was wrong: it assumed that a non-empty list of tables returned by applyRowFilterAndColumnMasking means row filtering/column masking is needed for those entries, whereas on the Ranger side RangerHiveAuthorizer#applyRowFilterAndColumnMasking unconditionally returns a list of tables regardless of whether row filtering/column masking is applicable to them. The fix in Hive, for now, is to move the error-out logic to after we determine there is no replacement text for the query. Ideally, though, Ranger should be modified to return only the tables that need to be masked.
[jira] [Created] (HIVE-15999) Fix flakiness in TestDbTxnManager2
Wei Zheng created HIVE-15999: Summary: Fix flakiness in TestDbTxnManager2 Key: HIVE-15999 URL: https://issues.apache.org/jira/browse/HIVE-15999 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng Right now TestDbTxnManager2 is flaky. The error looks like this:
{code}
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.checkExpectedLocks
Error Details
Table/View 'TXNS' already exists in Schema 'APP'.
{code}
The failure is due to the HiveConf used in the test being polluted by another test; e.g. testDummyTxnManagerOnAcidTable() sets the HIVE_TXN_MANAGER conf entry to "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager" but never switches it back.
[jira] [Created] (HIVE-15934) Downgrade Maven surefire plugin from 2.19.1 to 2.18.1
Wei Zheng created HIVE-15934: Summary: Downgrade Maven surefire plugin from 2.19.1 to 2.18.1 Key: HIVE-15934 URL: https://issues.apache.org/jira/browse/HIVE-15934 Project: Hive Issue Type: Bug Affects Versions: 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng Surefire 2.19.1 has an issue (https://issues.apache.org/jira/browse/SUREFIRE-1255) that causes debugging sessions to abort after a short period of time. Many IntelliJ users have seen this, although Eclipse appears unaffected. Version 2.18.1 works fine, so we should downgrade to avoid disrupting development for IntelliJ users, and upgrade again once the root cause is fixed. cc [~kgyrtkirk] [~ashutoshc]
[jira] [Created] (HIVE-15891) Detect query rewrite scenario for UPDATE/DELETE/MERGE and fail fast
Wei Zheng created HIVE-15891: Summary: Detect query rewrite scenario for UPDATE/DELETE/MERGE and fail fast Key: HIVE-15891 URL: https://issues.apache.org/jira/browse/HIVE-15891 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng Currently the ACID UpdateDeleteSemanticAnalyzer manipulates the AST directly, which differs from the general approach of modifying the token stream and thus causes an AST mismatch if any rewrite happens after UpdateDeleteSemanticAnalyzer. The long-term solution is to rewrite the AST handling logic in UpdateDeleteSemanticAnalyzer to follow the general approach. For now, this ticket detects the error-prone cases and fails early.
[jira] [Created] (HIVE-15774) Ensure DbLockManager backward compatibility for non-ACID resources
Wei Zheng created HIVE-15774: Summary: Ensure DbLockManager backward compatibility for non-ACID resources Key: HIVE-15774 URL: https://issues.apache.org/jira/browse/HIVE-15774 Project: Hive Issue Type: Bug Components: Hive, Transactions Reporter: Wei Zheng Assignee: Wei Zheng In pre-ACID days, users performed operations such as INSERT with either ZooKeeperHiveLockManager or no lock manager at all. For workflows designed to take advantage of no locking, where the user controls concurrency, this works well and performs well. With ACID, if users enable transactions (i.e. use DbTxnManager & DbLockManager), DbLockManager acquires the appropriate lock types for all operations, even on non-ACID resources. This may hurt the performance of workflows designed for pre-ACID use cases. A viable solution is to differentiate the locking mode for ACID and non-ACID resources, so that DbLockManager keeps its current behavior for ACID tables but acquires a less strict lock type for non-ACID resources, avoiding the performance loss for those workflows.
[jira] [Created] (HIVE-15681) Pull specified version of jetty for Hive
Wei Zheng created HIVE-15681: Summary: Pull specified version of jetty for Hive Key: HIVE-15681 URL: https://issues.apache.org/jira/browse/HIVE-15681 Project: Hive Issue Type: Bug Components: Hive Reporter: Wei Zheng Assignee: Wei Zheng
[jira] [Created] (HIVE-15644) Collect JVM metrics via JvmPauseMonitor
Wei Zheng created HIVE-15644: Summary: Collect JVM metrics via JvmPauseMonitor Key: HIVE-15644 URL: https://issues.apache.org/jira/browse/HIVE-15644 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.2.0 Reporter: Wei Zheng Similar to what Hadoop's JvmMetrics does.
[jira] [Created] (HIVE-15628) Add more logs for hybrid grace hash join during the initial hash table loading
Wei Zheng created HIVE-15628: Summary: Add more logs for hybrid grace hash join during the initial hash table loading Key: HIVE-15628 URL: https://issues.apache.org/jira/browse/HIVE-15628 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng This can be useful for debugging memory issues. Metrics that could be added: 1. Log memory usage after, say, every 50M of data has been loaded 2. Add a counter for the number of write buffers already allocated 3. Log a snapshot of the partitions (memory usage for each of them)
[jira] [Created] (HIVE-15623) Use customized version of netty for llap
Wei Zheng created HIVE-15623: Summary: Use customized version of netty for llap Key: HIVE-15623 URL: https://issues.apache.org/jira/browse/HIVE-15623 Project: Hive Issue Type: Bug Components: llap Affects Versions: 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng
[jira] [Created] (HIVE-15622) Remove HWI component from Hive
Wei Zheng created HIVE-15622: Summary: Remove HWI component from Hive Key: HIVE-15622 URL: https://issues.apache.org/jira/browse/HIVE-15622 Project: Hive Issue Type: Bug Components: Web UI Affects Versions: 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng
[jira] [Created] (HIVE-15621) Remove use of JvmPauseMonitor in LLAP
Wei Zheng created HIVE-15621: Summary: Remove use of JvmPauseMonitor in LLAP Key: HIVE-15621 URL: https://issues.apache.org/jira/browse/HIVE-15621 Project: Hive Issue Type: Bug Components: llap Affects Versions: 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng
[jira] [Created] (HIVE-15589) Remove redundant test from TestDbTxnManager.testHeartbeater
Wei Zheng created HIVE-15589: Summary: Remove redundant test from TestDbTxnManager.testHeartbeater Key: HIVE-15589 URL: https://issues.apache.org/jira/browse/HIVE-15589 Project: Hive Issue Type: Bug Affects Versions: 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng Case 1 claims there is no delay for the heartbeat startup, but the actual logic is that when the delay is specified as 0, we unconditionally set the delay to HiveConf.ConfVars.HIVE_TXN_TIMEOUT / 2. So case 1 is not needed; it is covered by case 2.
[jira] [Created] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource
Wei Zheng created HIVE-15421: Summary: Assumption in exception handling can be wrong in DagUtils.localizeResource Key: HIVE-15421 URL: https://issues.apache.org/jira/browse/HIVE-15421 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng In localizeResource, once we get an IOException we always assume it is due to another thread writing the same file. But that is not always the case: even without interference from other threads, copyFromLocalFile may fail with an IOException (RemoteException) in certain environments, for example in a kerberized HDFS encryption zone where the TGT has expired. We should fail early with a different message to avoid confusion.
[jira] [Created] (HIVE-15376) Improve heartbeater scheduling for transactions
Wei Zheng created HIVE-15376: Summary: Improve heartbeater scheduling for transactions Key: HIVE-15376 URL: https://issues.apache.org/jira/browse/HIVE-15376 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng
[jira] [Created] (HIVE-15362) Add the missing fields for 2.2.0 upgrade scripts
Wei Zheng created HIVE-15362: Summary: Add the missing fields for 2.2.0 upgrade scripts Key: HIVE-15362 URL: https://issues.apache.org/jira/browse/HIVE-15362 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng The 2.2.0 upgrade scripts were cut on 05/25/16, while HIVE-13354 (which added some fields to the upgrade scripts) was committed to master on 05/27/16 without any merge conflict, so we accidentally missed those fields in 2.2.0. cc [~ekoifman]
[jira] [Created] (HIVE-15267) Make query length calculation logic more accurate in TxnUtils.needNewQuery()
Wei Zheng created HIVE-15267: Summary: Make query length calculation logic more accurate in TxnUtils.needNewQuery() Key: HIVE-15267 URL: https://issues.apache.org/jira/browse/HIVE-15267 Project: Hive Issue Type: Bug Components: Hive, Transactions Affects Versions: 2.1.0, 1.2.1 Reporter: Wei Zheng Assignee: Wei Zheng HIVE-15181 received the following review comment, which this ticket will address:
{code}
in TxnUtils.needNewQuery() "sizeInBytes / 1024 > queryMemoryLimit" doesn't do the right thing. If the user sets METASTORE_DIRECT_SQL_MAX_QUERY_LENGTH to 1K, they most likely want each SQL string to be at most 1K. But if sizeInBytes=2047, this still returns false. It should include the length of "suffix" in the computation of sizeInBytes.

Along the same lines: the check for max query length is done after each batch is already added to the query. Suppose there are 1000 9-digit txn IDs in each IN(...). That's, conservatively, 18KB of text, so the length of each query increases in 18KB chunks. I think the check for query length should be done for each item in the IN clause. If some DB has a limit on query length of X, then any query > X will fail, so we must ensure we never produce a query > X, even by 1 char. For example, case 3.1 of the UT generates a query of almost 4000 characters - this is clearly > 1KB.
{code}
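The reviewer's suggestion, checking the prospective length before each IN-clause element is appended, reduces to a check like this (hypothetical helper, not the actual TxnUtils.needNewQuery signature):

```java
public class QueryLengthDemo {
    // Return true if appending 'next' (plus the closing suffix) would push
    // the query past maxBytes, i.e. a new query should be started first.
    // Checking before appending guarantees no emitted query ever exceeds
    // the limit, even by one character.
    static boolean needNewQuery(StringBuilder query, String next,
                                String suffix, long maxBytes) {
        return query.length() + next.length() + suffix.length() > maxBytes;
    }

    public static void main(String[] args) {
        StringBuilder query = new StringBuilder("IN(123456789");
        // 12 + 10 + 1 = 23 characters: over a 16-byte budget, under a 64-byte one.
        System.out.println(needNewQuery(query, ",987654321", ")", 16)); // true
        System.out.println(needNewQuery(query, ",987654321", ")", 64)); // false
    }
}
```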
[jira] [Created] (HIVE-15265) support snapshot isolation for MM tables
Wei Zheng created HIVE-15265: Summary: support snapshot isolation for MM tables Key: HIVE-15265 URL: https://issues.apache.org/jira/browse/HIVE-15265 Project: Hive Issue Type: Sub-task Reporter: Wei Zheng Since MM tables use the incremental "delta" insertion mechanism via ACID, it makes sense for MM tables to support snapshot isolation as well.
[jira] [Created] (HIVE-15181) buildQueryWithINClause didn't properly handle multiples of ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE
Wei Zheng created HIVE-15181: Summary: buildQueryWithINClause didn't properly handle multiples of ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE Key: HIVE-15181 URL: https://issues.apache.org/jira/browse/HIVE-15181 Project: Hive Issue Type: Bug Components: Hive, Transactions Affects Versions: 2.1.0, 1.2.1 Reporter: Wei Zheng Assignee: Wei Zheng
[jira] [Created] (HIVE-15099) PTFOperator.PTFInvocation didn't properly reset the input partition
Wei Zheng created HIVE-15099: Summary: PTFOperator.PTFInvocation didn't properly reset the input partition Key: HIVE-15099 URL: https://issues.apache.org/jira/browse/HIVE-15099 Project: Hive Issue Type: Bug Components: Hive, PTF-Windowing Affects Versions: 2.1.0, 1.2.1, 1.3.0, 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng There is an issue in PTFOperator.PTFInvocation where inputPart is not reset properly. The inputPart has been closed and its contents (member variables) cleaned up, but since the reference itself is not nullified, the stale object is reused in the next round, causing an NPE.
[jira] [Created] (HIVE-15087) integrate MM tables into ACID: replace "hivecommit" property with ACID property
Wei Zheng created HIVE-15087: Summary: integrate MM tables into ACID: replace "hivecommit" property with ACID property Key: HIVE-15087 URL: https://issues.apache.org/jira/browse/HIVE-15087 Project: Hive Issue Type: Sub-task Reporter: Wei Zheng Assignee: Wei Zheng The previously declared DDL
{code}
create table t1 (key int, key2 int) tblproperties("hivecommit"="true");
{code}
should be replaced with:
{code}
create table t1 (key int, key2 int) tblproperties("transactional"="true", "transactional_properties"="insert_only");
{code}
[jira] [Created] (HIVE-14479) Add some join tests for acid table
Wei Zheng created HIVE-14479: Summary: Add some join tests for acid table Key: HIVE-14479 URL: https://issues.apache.org/jira/browse/HIVE-14479 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng
[jira] [Created] (HIVE-14447) Set HIVE_TRANSACTIONAL_TABLE_SCAN to the correct job conf for FetchOperator
Wei Zheng created HIVE-14447: Summary: Set HIVE_TRANSACTIONAL_TABLE_SCAN to the correct job conf for FetchOperator Key: HIVE-14447 URL: https://issues.apache.org/jira/browse/HIVE-14447 Project: Hive Issue Type: Bug Components: Hive, Transactions Affects Versions: 1.3.0, 2.2.0, 2.1.1 Reporter: Wei Zheng Assignee: Wei Zheng
[jira] [Created] (HIVE-14446) Disable bloom filter for hybrid grace hash join when row count exceeds certain limit
Wei Zheng created HIVE-14446: Summary: Disable bloom filter for hybrid grace hash join when row count exceeds certain limit Key: HIVE-14446 URL: https://issues.apache.org/jira/browse/HIVE-14446 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.3.0, 2.2.0, 2.1.1 Reporter: Wei Zheng Assignee: Wei Zheng When the row count exceeds a certain limit, it doesn't make sense to generate a bloom filter, since its size will be a few hundred MB or even a few GB.
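To see why the filter gets so large: the standard bloom filter sizing formula m = -n·ln(p) / (ln 2)² gives the required bit count for n elements at false-positive probability p (generic formula; Hive's actual sizing code may differ):

```java
public class BloomSizeDemo {
    // Bits needed for n elements at false-positive probability p,
    // using m = -n * ln(p) / (ln 2)^2.
    static long bloomBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    public static void main(String[] args) {
        // ~1e9 rows at 1% FPP needs roughly 9.6e9 bits, i.e. over 1 GB.
        long bits = bloomBits(1_000_000_000L, 0.01);
        System.out.println(bits / 8 / 1024 / 1024 + " MB");
    }
}
```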
[jira] [Created] (HIVE-14400) Handle concurrent insert with dynamic partition
Wei Zheng created HIVE-14400: Summary: Handle concurrent insert with dynamic partition Key: HIVE-14400 URL: https://issues.apache.org/jira/browse/HIVE-14400 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng When multiple users concurrently issue insert statements on the same partition, some queries may not see the partition at the time they are issued, but will then find that it exists when trying to add it to the metastore, failing with AlreadyExistsException because an earlier query just created it (a race condition). For example, imagine such a table is created:
{code}
create table T (name char(50)) partitioned by (ds string) clustered by (name) into 2 buckets stored as orc tblproperties('transactional'='true');
{code}
and the following two queries are launched at the same time, from different sessions:
{code}
insert into table T partition (ds) values ('Bob', 'today'); -- creates the partition 'today'
insert into table T partition (ds) values ('Joe', 'today'); -- will fail with AlreadyExistsException
{code}
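One generic way to tolerate such a race is to treat "already exists" as success and reuse the partition the winner created. Illustrative only, with an in-memory map standing in for the metastore (Hive's actual fix may differ):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AddPartitionRaceDemo {
    // Stand-in for the metastore's partition registry (name -> location).
    static final ConcurrentMap<String, String> partitions = new ConcurrentHashMap<>();

    // Atomically create the partition if absent; whichever caller loses the
    // race simply gets back the location the winner registered, instead of
    // failing the whole query with AlreadyExistsException.
    static String addPartitionIfAbsent(String name, String location) {
        String existing = partitions.putIfAbsent(name, location);
        return existing != null ? existing : location;
    }

    public static void main(String[] args) {
        System.out.println(addPartitionIfAbsent("ds=today", "loc1")); // first caller wins
        System.out.println(addPartitionIfAbsent("ds=today", "loc2")); // loser reuses loc1
    }
}
```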
[jira] [Created] (HIVE-14381) Handle null value in WindowingTableFunction.WindowingIterator.next()
Wei Zheng created HIVE-14381: Summary: Handle null value in WindowingTableFunction.WindowingIterator.next() Key: HIVE-14381 URL: https://issues.apache.org/jira/browse/HIVE-14381 Project: Hive Issue Type: Bug Components: PTF-Windowing Affects Versions: 2.1.0, 1.3.0, 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng
[jira] [Created] (HIVE-14339) Fix UT failure for acid_globallimit.q
Wei Zheng created HIVE-14339: Summary: Fix UT failure for acid_globallimit.q Key: HIVE-14339 URL: https://issues.apache.org/jira/browse/HIVE-14339 Project: Hive Issue Type: Bug Affects Versions: 2.1.0, 1.3.0, 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng
[jira] [Created] (HIVE-14311) No need to schedule Heartbeat task if the query doesn't require locks
Wei Zheng created HIVE-14311: Summary: No need to schedule Heartbeat task if the query doesn't require locks Key: HIVE-14311 URL: https://issues.apache.org/jira/browse/HIVE-14311 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.3.0, 2.2.0, 2.1.1 Reporter: Wei Zheng Assignee: Wei Zheng Otherwise the Heartbeat task will just linger without being cleaned up, which may eventually cause an OOM.
[jira] [Created] (HIVE-14061) Add unit test for kerberos support in Hive streaming
Wei Zheng created HIVE-14061: Summary: Add unit test for kerberos support in Hive streaming Key: HIVE-14061 URL: https://issues.apache.org/jira/browse/HIVE-14061 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng
[jira] [Created] (HIVE-13972) Resolve class dependency issue introduced by HIVE-13354
Wei Zheng created HIVE-13972: Summary: Resolve class dependency issue introduced by HIVE-13354 Key: HIVE-13972 URL: https://issues.apache.org/jira/browse/HIVE-13972 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.3.0, 2.1.0, 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng Priority: Blocker HIVE-13354 moved the helper class StringableMap from ql/txn/compactor/CompactorMR.java to metastore/txn/TxnUtils.java. This introduced a dependency from the ql package on the metastore package, which is not allowed and fails in a real cluster. Instead of the metastore package, the class should be moved to the common package.
[jira] [Created] (HIVE-13961) ACID: Major compaction fails to include the original bucket files if there's no delta directory
Wei Zheng created HIVE-13961: Summary: ACID: Major compaction fails to include the original bucket files if there's no delta directory Key: HIVE-13961 URL: https://issues.apache.org/jira/browse/HIVE-13961 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.3.0, 2.1.0, 2.2.0 Reporter: Wei Zheng Assignee: Wei Zheng The issue can be reproduced by the steps below: 1. Insert a row into a non-ACID table 2. Convert the non-ACID table to an ACID table 3. Perform major compaction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13934) Tez needs to allocate extra buffer space for joins
Wei Zheng created HIVE-13934: Summary: Tez needs to allocate extra buffer space for joins Key: HIVE-13934 URL: https://issues.apache.org/jira/browse/HIVE-13934 Project: Hive Issue Type: Bug Reporter: Wei Zheng Assignee: Siddharth Seth Otherwise it's very easy to run out of memory (OOM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13834) Use LinkedHashMap instead of HashMap for LockRequestBuilder to maintain predictable iteration order
Wei Zheng created HIVE-13834: Summary: Use LinkedHashMap instead of HashMap for LockRequestBuilder to maintain predictable iteration order Key: HIVE-13834 URL: https://issues.apache.org/jira/browse/HIVE-13834 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.3.0, 2.1.0 Reporter: Wei Zheng Assignee: Wei Zheng Under Java 7 it was assumed that the iteration order is always the same as the insertion order, but that's not guaranteed. Under Java 8 some unit tests break because of this ordering change. The solution is to use LinkedHashMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
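The difference is easy to demonstrate with a minimal sketch (the lock names below are made up for illustration; this is not LockRequestBuilder's actual code):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LockOrderDemo {
    // Build a lock list whose iteration order must be predictable.
    static List<String> lockOrder() {
        // A plain HashMap makes no ordering guarantee, and its internal layout
        // changed between Java 7 and Java 8; LinkedHashMap preserves insertion order.
        Map<String, String> locks = new LinkedHashMap<>();
        locks.put("db1/t1", "SHARED_READ");
        locks.put("db1/t2", "SHARED_WRITE");
        locks.put("db2/t3", "EXCLUSIVE");
        return new ArrayList<>(locks.keySet());
    }

    public static void main(String[] args) {
        // Insertion order is preserved: [db1/t1, db1/t2, db2/t3]
        System.out.println(lockOrder());
    }
}
```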
[jira] [Created] (HIVE-13833) Add an initial delay when starting the heartbeat
Wei Zheng created HIVE-13833: Summary: Add an initial delay when starting the heartbeat Key: HIVE-13833 URL: https://issues.apache.org/jira/browse/HIVE-13833 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 2.0.0, 2.1.0 Reporter: Wei Zheng Assignee: Wei Zheng Since the scheduling of the heartbeat happens immediately after lock acquisition, it's unnecessary to send a heartbeat at the moment the locks are acquired. Add an initial delay to skip this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
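A minimal sketch of the idea using ScheduledExecutorService (the 100 ms interval is made up for the demo; Hive's actual heartbeater wiring differs):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class HeartbeatDelayDemo {
    // Returns {beats observed right after scheduling, beats after waiting a while}.
    static int[] run() throws InterruptedException {
        AtomicInteger beats = new AtomicInteger();
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        // Initial delay == period: nothing fires at the instant the locks are acquired.
        pool.scheduleAtFixedRate(beats::incrementAndGet, 100, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(50);
        int immediately = beats.get(); // still 0: the first beat waits out the initial delay
        Thread.sleep(300);
        pool.shutdownNow();
        return new int[] { immediately, beats.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] counts = run();
        System.out.println("right after scheduling: " + counts[0]
                + ", after waiting: " + counts[1]);
    }
}
```

Passing 0 as the initial delay would send the redundant heartbeat immediately after lock acquisition; passing the interval skips it.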
[jira] [Created] (HIVE-13809) Hybrid Grace Hash Join memory usage estimation didn't take into account the bloom filter size
Wei Zheng created HIVE-13809: Summary: Hybrid Grace Hash Join memory usage estimation didn't take into account the bloom filter size Key: HIVE-13809 URL: https://issues.apache.org/jira/browse/HIVE-13809 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.0.0, 2.1.0 Reporter: Wei Zheng Assignee: Wei Zheng Memory estimation is important during hash table loading, because we need to decide whether to load the next hash partition in memory or spill it. If we assume there's enough memory but that turns out not to be the case, we will run into an OOM problem. Currently the hybrid grace hash join memory usage estimation doesn't take into account the bloom filter size. In large test cases (TB scale) the bloom filter grows as big as hundreds of MB, big enough to cause estimation errors. The solution is to include the bloom filter size in the memory estimation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13755) Hybrid mapjoin allocates memory the same for multi broadcast
Wei Zheng created HIVE-13755: Summary: Hybrid mapjoin allocates memory the same for multi broadcast Key: HIVE-13755 URL: https://issues.apache.org/jira/browse/HIVE-13755 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.1.0 Reporter: Wei Zheng Assignee: Wei Zheng PROBLEM: When hybrid mapjoin computes the memory needed, it estimates the memory for each hashtable identically. This may cause problems when there are multiple broadcast inputs, as the total may exceed the memory intended to be allocated. An example reducer task log is attached. This task has 5 broadcast inputs: Reducer 3 <- Map 10 (BROADCAST_EDGE), Map 11 (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Map 8 (SIMPLE_EDGE), Map 9 (BROADCAST_EDGE), Reducer 2 (SIMPLE_EDGE). An excerpt: {code} 2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory manager allocates 0 bytes for the loading hashtable. 2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|: Key count from statistics is 210; setting map size to 280 2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Total available memory: 1968177152 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Estimated small table size: 155190 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of hash partitions to be created: 16 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Write buffer size: 524288 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions created: 16 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions spilled directly to disk on creation: 0 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using tableContainer HybridHashTableContainer 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Initializing container with org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe 2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|: Num Records read: 20 2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |log.PerfLogger|: 2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |tez.ObjectCache|: Caching key: svc-phx-efmhadoop_20160315191303_8c53ce88-e64f-4d36-bad0-846bbf096f57__HASH_MAP_MAPJOIN_126_container 2016-03-15 19:23:50,814 [INFO] [TezChild] |exec.HashTableDummyOperator|: Initializing operator HASHTABLEDUMMY[32] 2016-03-15 19:23:50,814 [INFO] [TezChild] |exec.MapJoinOperator|: Initializing operator MAPJOIN[26] 2016-03-15 19:23:50,816 [INFO] [TezChild] |exec.CommonJoinOperator|: JOIN struct<_col3:string,_col4:decimal(5,0),_col5:char(1),_col6:char(1),_col7:date,_col8:string,_col9:string,_col12:string,_col13:string,_col14:string,_col15:string,_col16:string,_col19:decimal(13,3),_col20:string,_col22:decimal(5,0),_col23:decimal(5,0),_col24:decimal(5,0),_col25:decimal(5,0),_col26:decimal(13,2),_col27:decimal(5,0),_col28:decimal(15,2),_col29:decimal(15,2),_col31:decimal(3,0),_col33:char(1),_col41:decimal(3,1),_col42:char(1),_col43:decimal(3,1),_col44:string,_col45:char(1),_col48:char(1),_col55:char(1),_col57:char(1),_col59:char(1),_col60:string,_col64:string,_col65:string,_col67:decimal(15,2),_col76:decimal(3,0),_col81:char(1),_col98:string,_col99:string,_col105:string,_col108:string,_col122:string,_col123:decimal(5,0),_col127:string,_col128:decimal(5,0),_col129:string,_col137:char(1),_col139:string,_col145:string,_col151:string,_col152:string,_col154:string,_col158:char(1),_col164:char(1),_col204:string,_col213:string,_col214:char(1),_col215:string,_col218:char(1),_col219:date,_col220:string,_col221:decimal(5,0),_col222:decimal(5,0),_col223:string,_col224:char(1),_col225:string,_col226:decimal(3,0),_col231:string,_col232:string,_col2
33:string,_col234:decimal(9,5),_col236:date,_col240:date,_col256:string,_col257:string,_col268:string,_col269:string,_col270:char(1),_col271:string,_col272:char(1),_col324:string,_col344:string,_col464:string,_col478:decimal(5,0),_col479:decimal(5,0),_col519:string,_col532:string,_col534:char(1),_col540:decimal(13,3),_col541:decimal(13,3),_col561:string,_col568:char(1),_col570:string> totalsz = 95 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |log.PerfLogger|: 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory manager allocates 0 bytes for the loading hashtable. 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|: Key count from statistics is 5942112; setting map size to 7922816 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Total
[jira] [Created] (HIVE-13753) Make metastore client thread safe in DbTxnManager
Wei Zheng created HIVE-13753: Summary: Make metastore client thread safe in DbTxnManager Key: HIVE-13753 URL: https://issues.apache.org/jira/browse/HIVE-13753 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.3.0, 2.1.0 Reporter: Wei Zheng Assignee: Wei Zheng The metastore client, which is used for Thrift RPC, is shared by multiple threads but is not thread safe. A race condition has occurred when one sees an "out of sequence response" error message from the Thrift server: it means the response received was for a different request (made by a different thread). The solution is to synchronize the methods on the client side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
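The general shape of that fix can be sketched as a wrapper that serializes all calls to the shared client. The `Client` interface below is a stand-in for illustration, not the real `IMetaStoreClient`:

```java
public class SynchronizedClientDemo {
    // Stand-in for the Thrift-backed client; the real interface has many more methods.
    interface Client {
        long openTxn(String user);
    }

    static class SynchronizedClient implements Client {
        private final Client delegate;

        SynchronizedClient(Client delegate) {
            this.delegate = delegate;
        }

        // Synchronizing every RPC method guarantees only one request is in flight
        // on the shared connection at a time, so a response can never be matched
        // to the wrong request ("out of sequence response").
        @Override
        public synchronized long openTxn(String user) {
            return delegate.openTxn(user);
        }
    }

    public static void main(String[] args) {
        Client client = new SynchronizedClient(user -> 42L); // fake backend for the demo
        System.out.println(client.openTxn("hive"));
    }
}
```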
[jira] [Created] (HIVE-13724) Backport HIVE-11591 to branch-1 to use undated annotations
Wei Zheng created HIVE-13724: Summary: Backport HIVE-11591 to branch-1 to use undated annotations Key: HIVE-13724 URL: https://issues.apache.org/jira/browse/HIVE-13724 Project: Hive Issue Type: Bug Components: Thrift API Affects Versions: 1.2.1 Reporter: Wei Zheng Assignee: Wei Zheng HIVE-12832 changed the branch-1 hive pom file and updated the thrift version from 0.9.2 to 0.9.3, but it didn't update the thrift args to use the undated annotation from HIVE-11591. So every time someone runs the maven thrift re-gen command, it updates a lot of unrelated files just because of the date change. HIVE-11591 needs to be backported to branch-1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13694) Prevent ACID table being unusable due to DDL changes
Wei Zheng created HIVE-13694: Summary: Prevent ACID table being unusable due to DDL changes Key: HIVE-13694 URL: https://issues.apache.org/jira/browse/HIVE-13694 Project: Hive Issue Type: Bug Components: Transactions Reporter: Wei Zheng Assignee: Wei Zheng Currently, in order to define an ACID table, the following three conditions need to be satisfied: * tblproperties ('transactional'='true') * the table has to be bucketed * the table has to be stored as ORC If any of the above conditions doesn't hold, the table won't be ACID compliant, and query results against the table will be unexpected. HIVE-12064 made sure that reverting the tblproperty 'transactional' from 'true' to 'false' is not allowed. But changes to the other two conditions are still not restrained. We need to make sure an ACID table cannot be un-bucketed, and cannot use a storage format other than ORC. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13684) Remove the deprecated IMetaStoreClient.showLocks()
Wei Zheng created HIVE-13684: Summary: Remove the deprecated IMetaStoreClient.showLocks() Key: HIVE-13684 URL: https://issues.apache.org/jira/browse/HIVE-13684 Project: Hive Issue Type: Bug Reporter: Wei Zheng Assignee: Wei Zheng IMetaStoreClient.showLocks() is deprecated in Hive 2.1. This method can be removed in Hive 2.2 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13563) Hive Streaming does not honor orc.compress.size and orc.stripe.size table properties
Wei Zheng created HIVE-13563: Summary: Hive Streaming does not honor orc.compress.size and orc.stripe.size table properties Key: HIVE-13563 URL: https://issues.apache.org/jira/browse/HIVE-13563 Project: Hive Issue Type: Bug Components: ORC Affects Versions: 2.1.0 Reporter: Wei Zheng Assignee: Wei Zheng According to the doc: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-HiveQLSyntax One should be able to specify tblproperties for many ORC options. But the settings for orc.compress.size and orc.stripe.size don't take effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13458) Heartbeater doesn't fail query when heartbeat fails
Wei Zheng created HIVE-13458: Summary: Heartbeater doesn't fail query when heartbeat fails Key: HIVE-13458 URL: https://issues.apache.org/jira/browse/HIVE-13458 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 2.1.0 Reporter: Wei Zheng Assignee: Wei Zheng When a heartbeat fails to locate a lock, it should fail the current query. That doesn't happen, which is a bug. Another thing is, we need to make sure stopHeartbeat really stops the heartbeat, i.e. no additional heartbeat will be sent, since that will break the assumption and cause the query to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13388) Fix inconsistent content due to Thrift changes
Wei Zheng created HIVE-13388: Summary: Fix inconsistent content due to Thrift changes Key: HIVE-13388 URL: https://issues.apache.org/jira/browse/HIVE-13388 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.1.0 Reporter: Wei Zheng Assignee: Wei Zheng HIVE-12442 and HIVE-12862 are related here. If one wants to make a thrift change by following the instructions here: https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-GeneratingThriftCode and first executes (i.e. in a clean environment) {code} mvn clean install -Pthriftif -DskipTests -Dthrift.home=/usr/local -Phadoop-2 {code} the following content will show up: {code} $ git status On branch master Your branch is up-to-date with 'origin/master'. Untracked files: (use "git add ..." to include in what will be committed) service-rpc/src/gen/thrift/gen-py/__init__.py service/src/gen/ nothing added to commit but untracked files present (use "git add" to track) {code} These files should have been included in the codebase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13249) Hard upper bound on number of open transactions
Wei Zheng created HIVE-13249: Summary: Hard upper bound on number of open transactions Key: HIVE-13249 URL: https://issues.apache.org/jira/browse/HIVE-13249 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 2.0.0 Reporter: Wei Zheng Assignee: Wei Zheng We need a safeguard in the form of an upper bound on open transactions, to avoid a huge number of open-transaction requests, usually caused by improper configuration of clients such as Storm. Once that limit is reached, clients will start failing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
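A minimal sketch of such a safeguard (the cap, method names, and error message are illustrative, not Hive's actual configuration or API):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class OpenTxnLimiter {
    private final int maxOpenTxns;
    private final AtomicInteger openTxns = new AtomicInteger();

    public OpenTxnLimiter(int maxOpenTxns) {
        this.maxOpenTxns = maxOpenTxns;
    }

    // Reject the request once the configured cap is reached: the client fails fast
    // instead of the metastore accumulating an unbounded number of open transactions.
    public void openTxn() {
        if (openTxns.incrementAndGet() > maxOpenTxns) {
            openTxns.decrementAndGet(); // roll back the tentative increment
            throw new IllegalStateException("maximum number of open transactions reached");
        }
    }

    // Commit or abort frees a slot for the next client.
    public void commitOrAbort() {
        openTxns.decrementAndGet();
    }

    public int current() {
        return openTxns.get();
    }
}
```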
[jira] [Created] (HIVE-13201) Compaction shouldn't be allowed on non-ACID table
Wei Zheng created HIVE-13201: Summary: Compaction shouldn't be allowed on non-ACID table Key: HIVE-13201 URL: https://issues.apache.org/jira/browse/HIVE-13201 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 2.0.0 Reporter: Wei Zheng Assignee: Wei Zheng Compaction appears to be allowed on non-ACID tables, although that makes no sense and does nothing. Moreover, the compaction request will be enqueued into the COMPACTION_QUEUE metastore table, which adds unnecessary overhead. We should prevent compaction commands from being run against non-ACID tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13186) ALTER TABLE RENAME should lowercase table name and hdfs location
Wei Zheng created HIVE-13186: Summary: ALTER TABLE RENAME should lowercase table name and hdfs location Key: HIVE-13186 URL: https://issues.apache.org/jira/browse/HIVE-13186 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.0.0 Reporter: Wei Zheng Assignee: Wei Zheng -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13175) Disallow making external tables transactional
Wei Zheng created HIVE-13175: Summary: Disallow making external tables transactional Key: HIVE-13175 URL: https://issues.apache.org/jira/browse/HIVE-13175 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.0.0 Reporter: Wei Zheng Assignee: Wei Zheng The fact that the compactor rewrites the contents of ACID tables is in conflict with what is expected of external tables. Conversely, an end user can write directly to an external table, which is certainly not what is expected of an ACID table. So we should explicitly disallow making an external table ACID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13174) Remove Vectorizer noise in logs
Wei Zheng created HIVE-13174: Summary: Remove Vectorizer noise in logs Key: HIVE-13174 URL: https://issues.apache.org/jira/browse/HIVE-13174 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.0.0 Reporter: Wei Zheng Assignee: Wei Zheng If you have a table with a binary column, your HS2/client logs are full of the stack traces below. These should either be made DEBUG level, or we should log just the message, not the full trace. {code} 2015-10-12 12:34:23,922 INFO [main]: physical.Vectorizer (Vectorizer.java:validateExprNodeDesc(1249)) - Failed to vectorize org.apache.hadoop.hive.ql.metadata.HiveException: No vector argument type for type name binary at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getConstantVectorExpression(VectorizationContext.java:872) at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:443) at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateExprNodeDesc(Vectorizer.java:1243) at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateExprNodeDesc(Vectorizer.java:1234) at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateSelectOperator(Vectorizer.java:1100) at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateMapWorkOperator(Vectorizer.java:911) at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$MapWorkValidationNodeProcessor.process(Vectorizer.java:581) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110) at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateMapWork(Vectorizer.java:412) 
at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(Vectorizer.java:355) at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Vectorizer.java:330) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125) at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(Vectorizer.java:890) at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(TezCompiler.java:469) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:227) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10188) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:211) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at 
org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {code} -- This message was sent
[jira] [Created] (HIVE-13151) Clean up UGI objects in FileSystem cache for transactions
Wei Zheng created HIVE-13151: Summary: Clean up UGI objects in FileSystem cache for transactions Key: HIVE-13151 URL: https://issues.apache.org/jira/browse/HIVE-13151 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.0.0 Reporter: Wei Zheng Assignee: Wei Zheng One issue with FileSystem.CACHE is that it does not clean itself up. The key in that cache includes the UGI object. When new UGI objects are created and used with the FileSystem API, new entries get added to the cache. We need to manually clean up those UGI objects once they are no longer in use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
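The leak pattern can be sketched with a plain map standing in for `FileSystem.CACHE`; the fix is an explicit eviction step once a UGI is finished with, analogous to Hadoop's `FileSystem.closeAllForUGI()` (the class and method names below are invented for the demo):

```java
import java.util.HashMap;
import java.util.Map;

public class UgiCacheDemo {
    // Stand-in for FileSystem.CACHE: keyed (in part) by UGI object identity.
    static final Map<Object, String> CACHE = new HashMap<>();

    static String getFileSystem(Object ugi) {
        // Every distinct UGI object adds a new entry; the cache never evicts on its own.
        return CACHE.computeIfAbsent(ugi, k -> "fs#" + System.identityHashCode(k));
    }

    // The fix: evict the entries for a UGI once it is no longer in use.
    static void closeAllForUgi(Object ugi) {
        CACHE.remove(ugi);
    }
}
```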
[jira] [Created] (HIVE-13126) Clean up MapJoinOperator properly to avoid object cache reuse with unintentional states
Wei Zheng created HIVE-13126: Summary: Clean up MapJoinOperator properly to avoid object cache reuse with unintentional states Key: HIVE-13126 URL: https://issues.apache.org/jira/browse/HIVE-13126 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.0.0 Reporter: Wei Zheng Assignee: Wei Zheng For a given job, one task may reuse another task's object cache (plan cache), such as MapJoinOperator. This is fine. But if some dirty state is left over, it may cause issues such as wrong results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12996) Temp tables shouldn't be stored in metastore tables for ACID
Wei Zheng created HIVE-12996: Summary: Temp tables shouldn't be stored in metastore tables for ACID Key: HIVE-12996 URL: https://issues.apache.org/jira/browse/HIVE-12996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.0.0 Reporter: Wei Zheng Assignee: Wei Zheng Internally, INSERT INTO ... VALUES statements use a temp table to accomplish their functionality. But temp tables shouldn't be stored in the metastore tables for ACID, because they are by definition only visible inside the session that created them, and we don't allow multiple threads inside a session. If a temp table is used in a query, it should be ignored by the lock manager.
{code}
mysql> select * from COMPLETED_TXN_COMPONENTS;
+-----------+--------------+-----------------------+------------------+
| CTC_TXNID | CTC_DATABASE | CTC_TABLE             | CTC_PARTITION    |
+-----------+--------------+-----------------------+------------------+
|         1 | acid         | t1                    | NULL             |
|         1 | acid         | values__tmp__table__1 | NULL             |
|         2 | acid         | t1                    | NULL             |
|         2 | acid         | values__tmp__table__2 | NULL             |
|         3 | acid         | values__tmp__table__3 | NULL             |
|         3 | acid         | t1                    | NULL             |
|         4 | acid         | values__tmp__table__1 | NULL             |
|         4 | acid         | t2p                   | ds=today         |
|         5 | acid         | values__tmp__table__1 | NULL             |
|         5 | acid         | t3p                   | ds=today/hour=12 |
+-----------+--------------+-----------------------+------------------+
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12837) Better memory estimation/allocation for hybrid grace hash join during hash table loading
Wei Zheng created HIVE-12837: Summary: Better memory estimation/allocation for hybrid grace hash join during hash table loading Key: HIVE-12837 URL: https://issues.apache.org/jira/browse/HIVE-12837 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.1.0 Reporter: Wei Zheng Assignee: Wei Zheng This is to avoid an edge case where the available memory is very little (less than a single write buffer size) when we start loading the hash table. Since the write buffer is lazily allocated, we will easily run out of memory before even checking whether we should spill any hash partition. e.g. Total memory available: 210 MB Size of ref array of BytesBytesMultiHashMap for each hash partition: ~16 MB Size of write buffer: 8 MB (lazy allocation) # hash partitions: 16 # hash partitions created in memory: 13 # hash partitions created on disk: 3 Available memory left after HybridHashTableContainer initialization: 210 - 16*13 = 2 MB Now suppose a row is to be loaded into an in-memory hash partition: it will try to allocate an 8 MB write buffer, but we only have 2 MB, thus OOM. The solution is to perform the check for possible spilling earlier, so we can spill partitions when memory is about to be full and avoid the OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
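The arithmetic in that example can be checked directly (sizes in MB, as given above):

```java
public class HybridJoinMemoryMath {
    public static void main(String[] args) {
        int totalMemoryMb = 210;
        int refArrayPerPartitionMb = 16;  // BytesBytesMultiHashMap ref array per partition
        int inMemoryPartitions = 13;
        int writeBufferMb = 8;            // lazily allocated when the first row arrives

        // Memory left after HybridHashTableContainer initialization: 210 - 16*13 = 2 MB.
        int leftMb = totalMemoryMb - refArrayPerPartitionMb * inMemoryPartitions;
        System.out.println("left after init: " + leftMb + " MB");

        // Loading the first row into an in-memory partition triggers an 8 MB
        // write-buffer allocation, but only 2 MB remain, hence the OOM.
        System.out.println("write buffer fits? " + (writeBufferMb <= leftMb));
    }
}
```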
[jira] [Created] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job
Wei Zheng created HIVE-12724: Summary: ACID: Major compaction fails to include the original bucket files into MR job Key: HIVE-12724 URL: https://issues.apache.org/jira/browse/HIVE-12724 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.0.0, 2.1.0 Reporter: Wei Zheng Assignee: Wei Zheng How the problem happens: * Create a non-ACID table * Before non-ACID to ACID table conversion, we inserted row one * After non-ACID to ACID table conversion, we inserted row two * Both rows can be retrieved before MAJOR compaction * After MAJOR compaction, row one is lost {code} hive> USE acidtest; OK Time taken: 0.77 seconds hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment STRING) > CLUSTERED BY (regionkey) INTO 2 BUCKETS > STORED AS ORC; OK Time taken: 0.179 seconds hive> DESC FORMATTED t1; OK # col_name data_type comment nationkey int namestring regionkey int comment string # Detailed Table Information Database: acidtest Owner: wzheng CreateTime: Mon Dec 14 15:50:40 PST 2015 LastAccessTime: UNKNOWN Retention: 0 Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1 Table Type: MANAGED_TABLE Table Parameters: transient_lastDdlTime 1450137040 # Storage Information SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat Compressed: No Num Buckets:2 Bucket Columns: [regionkey] Sort Columns: [] Storage Desc Params: serialization.format1 Time taken: 0.198 seconds, Fetched: 28 row(s) hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db; Found 1 items drwxr-xr-x - wzheng staff 68 2015-12-14 15:50 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1 hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1; hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states'); WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. 
tez, spark) or using Hive 1.X releases. Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 2 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= Job running in-process (local Hadoop) 2015-12-14 15:51:58,070 Stage-1 map = 100%, reduce = 100% Ended Job = job_local73977356_0001 Loading data to table acidtest.t1 MapReduce Jobs Launched: Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 0 msec OK Time taken: 2.825 seconds hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1; Found 2 items -rwxr-xr-x 1 wzheng staff112 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/00_0 -rwxr-xr-x 1 wzheng staff472 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/01_0 hive> SELECT * FROM t1; OK 1 USA 1 united states Time taken: 0.434 seconds, Fetched: 1 row(s) hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true'); OK Time taken: 0.071 seconds hive> DESC FORMATTED t1; OK # col_name data_type comment nationkey int namestring regionkey int comment string # Detailed Table Information Database: acidtest Owner: wzheng CreateTime: Mon Dec 14 15:50:40 PST 2015 LastAccessTime: UNKNOWN Retention: 0 Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1 Table Type: MANAGED_TABLE Table Parameters: COLUMN_STATS_ACCURATE false last_modified_bywzheng last_modified_time 1450137141 numFiles2 numRows -1 rawDataSize -1 totalSize 584 transactional true transient_lastDdlTime 1450137141 # Storage Information SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat Compressed: No Num Buckets:2 
Bucket Columns: [regionkey] Sort Columns: [] Storage Desc Params:
[jira] [Created] (HIVE-12685) Remove invalid property in common/src/test/resources/hive-site.xml
Wei Zheng created HIVE-12685: Summary: Remove invalid property in common/src/test/resources/hive-site.xml Key: HIVE-12685 URL: https://issues.apache.org/jira/browse/HIVE-12685 Project: Hive Issue Type: Bug Affects Versions: 2.0.0, 2.1.0 Reporter: Wei Zheng Assignee: Wei Zheng Currently there is a property like the one below, which is obviously wrong (the driver name is set to "hive-site.xml"):
{code}
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>hive-site.xml</value>
  <description>Override ConfVar defined in HiveConf</description>
</property>
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12610) Hybrid Grace Hash Join should fail task faster if processing first batch fails, instead of continuing processing the rest
Wei Zheng created HIVE-12610: Summary: Hybrid Grace Hash Join should fail task faster if processing first batch fails, instead of continuing processing the rest Key: HIVE-12610 URL: https://issues.apache.org/jira/browse/HIVE-12610 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: Wei Zheng Assignee: Wei Zheng While processing the in-memory partition(s), if there's any fatal error, such as a Kryo exception, we should exit early instead of moving on to process the spilled partition(s). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12453) Check SessionState status before performing cleanup
Wei Zheng created HIVE-12453: Summary: Check SessionState status before performing cleanup Key: HIVE-12453 URL: https://issues.apache.org/jira/browse/HIVE-12453 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.3.0, 2.0.0 Reporter: Wei Zheng Assignee: Wei Zheng -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12444) Queries against ACID table without base directory may throw exception
Wei Zheng created HIVE-12444: Summary: Queries against ACID table without base directory may throw exception Key: HIVE-12444 URL: https://issues.apache.org/jira/browse/HIVE-12444 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: Wei Zheng Assignee: Wei Zheng Steps to reproduce: set hive.fetch.task.conversion=minimal; set hive.limit.optimize.enable=true; create table acidtest1( c_custkey int, c_name string, c_nationkey int, c_acctbal double) clustered by (c_nationkey) into 3 buckets stored as orc tblproperties("transactional"="true"); insert into table acidtest1 select c_custkey, c_name, c_nationkey, c_acctbal from tpch_text_10.customer; select cast (c_nationkey as string) from acidtest.acidtest1 limit 10; {code} DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1447362491939_0020_1_00, diagnostics=[Vertex vertex_1447362491939_0020_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: acidtest1 initializer failed, vertex=vertex_1447362491939_0020_1_00 [Map 1], java.lang.RuntimeException: serious problem at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1035) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1062) at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:308) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:410) at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:246) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:240) at java.security.AccessController.doPrivileged(Native 
Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:240) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:227) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: delta_017_017 does not start with base_ at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1012) ... 15 more Caused by: java.lang.IllegalArgumentException: delta_017_017 does not start with base_ at org.apache.hadoop.hive.ql.io.AcidUtils.parseBase(AcidUtils.java:144) at org.apache.hadoop.hive.ql.io.AcidUtils.parseBaseBucketFilename(AcidUtils.java:172) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:667) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:625) ... 4 more ]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12366) Refactor Heartbeater logic for transaction
Wei Zheng created HIVE-12366: Summary: Refactor Heartbeater logic for transaction Key: HIVE-12366 URL: https://issues.apache.org/jira/browse/HIVE-12366 Project: Hive Issue Type: Bug Components: Hive Reporter: Wei Zheng Assignee: Wei Zheng Currently there is a gap between the time of lock acquisition and the first heartbeat being sent out. Normally the gap is negligible, but when it's big it will cause the query to fail, since the locks have timed out by the time the heartbeat is sent. We need to remove this gap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
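One way to close the gap described in HIVE-12366 is to schedule the first heartbeat with zero initial delay, so it fires as soon as the locks are acquired instead of one full interval later. The sketch below is illustrative only; the class and method names are invented and do not reflect Hive's actual Heartbeater API:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class HeartbeaterSketch {
    // Start heartbeating immediately (initialDelay = 0) and then once per
    // interval, so there is no window between lock acquisition and the
    // first heartbeat in which the locks could time out.
    static ScheduledFuture<?> start(ScheduledExecutorService pool,
                                    Runnable sendHeartbeat, long intervalMs) {
        return pool.scheduleAtFixedRate(sendHeartbeat, 0, intervalMs,
                                        TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        ScheduledFuture<?> task = start(pool,
                () -> System.out.println("heartbeat"), 60_000);
        Thread.sleep(200);   // the first heartbeat has already fired by now
        task.cancel(true);
        pool.shutdownNow();
    }
}
```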
[jira] [Created] (HIVE-12180) Use MapJoinDesc::isHybridHashJoin() instead of the HiveConf lookup in Vectorizer
Wei Zheng created HIVE-12180: Summary: Use MapJoinDesc::isHybridHashJoin() instead of the HiveConf lookup in Vectorizer Key: HIVE-12180 URL: https://issues.apache.org/jira/browse/HIVE-12180 Project: Hive Issue Type: Bug Components: Hive Reporter: Wei Zheng Assignee: Wei Zheng -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12074) Conditionally turn off hybrid grace hash join based on est. data size, etc
Wei Zheng created HIVE-12074: Summary: Conditionally turn off hybrid grace hash join based on est. data size, etc Key: HIVE-12074 URL: https://issues.apache.org/jira/browse/HIVE-12074 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1, 1.2.0 Reporter: Wei Zheng Assignee: Wei Zheng Currently, as long as the below flag is set to true, we always do grace hash join for map join. This may not be necessary, especially for cases where the data size is quite small and the number of distinct values is also small. hive.mapjoin.hybridgrace.hashtable -- This message was sent by Atlassian JIRA (v6.3.4#6332)
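The decision HIVE-12074 proposes could look roughly like the sketch below. The class name, method signature, and the one-half threshold are all invented for illustration; the point is only that the flag alone should not force hybrid grace hash join when the estimated small-table size comfortably fits in memory:

```java
public class HybridGraceDecision {
    // Hypothetical decision sketch: even when
    // hive.mapjoin.hybridgrace.hashtable is true, fall back to the regular
    // in-memory hash table when the estimated small-table size comfortably
    // fits in the memory available to the map join.
    static boolean useHybridGrace(boolean flagEnabled,
                                  long estimatedTableBytes,
                                  long memoryAvailableBytes) {
        // Only pay the grace-hash-join partitioning overhead when the
        // table might actually overflow memory.
        return flagEnabled && estimatedTableBytes > memoryAvailableBytes / 2;
    }

    public static void main(String[] args) {
        long oneGb = 1L << 30;
        System.out.println(useHybridGrace(true, 10L << 20, oneGb));  // false: 10 MB is tiny
        System.out.println(useHybridGrace(true, 900L << 20, oneGb)); // true: 900 MB may spill
    }
}
```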
[jira] [Created] (HIVE-12032) Add unit test for HIVE-9855
Wei Zheng created HIVE-12032: Summary: Add unit test for HIVE-9855 Key: HIVE-12032 URL: https://issues.apache.org/jira/browse/HIVE-12032 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1, 1.2.0 Reporter: Wei Zheng Assignee: Wei Zheng -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12041) Add unit test for HIVE-9386
Wei Zheng created HIVE-12041: Summary: Add unit test for HIVE-9386 Key: HIVE-12041 URL: https://issues.apache.org/jira/browse/HIVE-12041 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1, 1.1.1, 1.1.0, 1.2.0 Reporter: Wei Zheng Assignee: Wei Zheng -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11942) MetaException(message:The threadlocal Deadline is null, please register it first.)
Wei Zheng created HIVE-11942: Summary: MetaException(message:The threadlocal Deadline is null, please register it first.) Key: HIVE-11942 URL: https://issues.apache.org/jira/browse/HIVE-11942 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.2.1 Reporter: Wei Zheng I got such exception when running qtest unionDistinct_1.q with my WIP patch for another JIRA (attached). I tried the same qfile on master w/o my patch and couldn't reproduce. But I don't have any change that's related to metastore, so I guess maybe my code exposed some bug. {code} 2015-09-23T17:02:05,385 ERROR [main]: ql.Driver (SessionState.java:printError(967)) - FAILED: RuntimeException org.apache.hadoop.hive.ql.parse.SemanticException: MetaException(message:The threadlocal Deadline is null, please register it first.) java.lang.RuntimeException: org.apache.hadoop.hive.ql.parse.SemanticException: MetaException(message:The threadlocal Deadline is null, please register it first.) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:151) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:617) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:252) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10143) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:212) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:240) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:240) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:310) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1156) at 
org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1209) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1085) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1075) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1084) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1058) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:147) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_unionDistinct_1(TestMiniTezCliDriver.java:131) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: MetaException(message:The threadlocal Deadline is null, please register it first.) at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getPartitionsFromServer(PartitionPruner.java:431) at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:219) at org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.computePartitionList(RelOptHiveTable.java:253) at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HivePartitionPruneRule.perform(HivePartitionPruneRule.java:55) at
[jira] [Created] (HIVE-11889) Add unit test for HIVE-11449
Wei Zheng created HIVE-11889: Summary: Add unit test for HIVE-11449 Key: HIVE-11889 URL: https://issues.apache.org/jira/browse/HIVE-11889 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.3.0, 2.0.0 Reporter: Wei Zheng Assignee: Wei Zheng -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11714) Handle cross product join properly for Hybrid grace hashjoin
Wei Zheng created HIVE-11714: Summary: Handle cross product join properly for Hybrid grace hashjoin Key: HIVE-11714 URL: https://issues.apache.org/jira/browse/HIVE-11714 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Reporter: Wei Zheng Assignee: Wei Zheng The current partitioning calculation is based solely on the hash value of the key. For a cross product join, where keys are empty, all the rows will be put into partition 0. This falls back to the regular mapjoin behavior, where we only have one hashtable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
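The skew described in HIVE-11714 is easy to see with a toy version of hash-based partition assignment. This sketch is not Hive's code (which partition the empty key lands in depends on the hash function used); the point is that every row of a cross product carries the same empty key and therefore lands in the same single partition:

```java
import java.util.Arrays;

public class PartitionSketch {
    // Illustrative partition assignment: hash the serialized key and mask
    // with (numPartitions - 1), assuming numPartitions is a power of 2.
    static int partitionOf(byte[] keyBytes, int numPartitions) {
        return Arrays.hashCode(keyBytes) & (numPartitions - 1);
    }

    public static void main(String[] args) {
        // A cross product has empty join keys, so every row hashes to the
        // same constant and all rows pile into one partition.
        byte[] emptyKey = new byte[0];
        System.out.println(partitionOf(emptyKey, 16));
        // Distinct non-empty keys, by contrast, spread across partitions.
        System.out.println(partitionOf(new byte[]{1}, 16));
        System.out.println(partitionOf(new byte[]{2}, 16));
    }
}
```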
[jira] [Created] (HIVE-11566) Hybrid grace hash join should only allocate write buffer for a hash partition when first write happens
Wei Zheng created HIVE-11566: Summary: Hybrid grace hash join should only allocate write buffer for a hash partition when first write happens Key: HIVE-11566 URL: https://issues.apache.org/jira/browse/HIVE-11566 Project: Hive Issue Type: Bug Components: Hive Reporter: Wei Zheng Assignee: Wei Zheng Currently it allocates a write buffer for a fixed number of hash partitions up front, which causes GC pauses. It's better to do the write buffer allocation on demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
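The on-demand allocation proposed in HIVE-11566 can be sketched as below. The class is a hypothetical stand-in, not Hive's hash partition implementation: the buffer field stays null until the first write, so partitions that never receive a row never allocate anything:

```java
import java.nio.ByteBuffer;

public class LazyBufferSketch {
    // A hash partition whose write buffer is only allocated on the first
    // write, rather than eagerly in the constructor.
    private final int bufferSize;
    private ByteBuffer buffer;                 // stays null until needed

    LazyBufferSketch(int bufferSize) { this.bufferSize = bufferSize; }

    void write(byte b) {
        if (buffer == null) {                  // first write: allocate now
            buffer = ByteBuffer.allocate(bufferSize);
        }
        buffer.put(b);
    }

    boolean isAllocated() { return buffer != null; }

    public static void main(String[] args) {
        LazyBufferSketch[] partitions = new LazyBufferSketch[16];
        for (int i = 0; i < 16; i++) partitions[i] = new LazyBufferSketch(1 << 20);
        partitions[3].write((byte) 42);        // only partition 3 gets a buffer
        int allocated = 0;
        for (LazyBufferSketch p : partitions) if (p.isAllocated()) allocated++;
        System.out.println(allocated + " of 16 buffers allocated");
    }
}
```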
[jira] [Created] (HIVE-11467) WriteBuffers rounding wbSize to next power of 2 may cause OOM
Wei Zheng created HIVE-11467: Summary: WriteBuffers rounding wbSize to next power of 2 may cause OOM Key: HIVE-11467 URL: https://issues.apache.org/jira/browse/HIVE-11467 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0, 2.0.0 Reporter: Wei Zheng Assignee: Wei Zheng If the wbSize passed to the WriteBuffers constructor is not a power of 2, it first rounds up to the next power of 2 {code} public WriteBuffers(int wbSize, long maxSize) { this.wbSize = Integer.bitCount(wbSize) == 1 ? wbSize : (Integer.highestOneBit(wbSize) << 1); this.wbSizeLog2 = 31 - Integer.numberOfLeadingZeros(this.wbSize); this.offsetMask = this.wbSize - 1; this.maxSize = maxSize; writePos.bufferIndex = -1; nextBufferToWrite(); } {code} That may break existing memory consumption assumptions for mapjoin, and potentially cause OOM. The solution will be to pass a power-of-2 number as wbSize from upstream during hashtable creation, to avoid this late expansion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
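To see why this rounding can blow past a memory estimate, the snippet below isolates just the rounding expression from the constructor above (the surrounding class is a stand-alone wrapper for illustration). A non-power-of-2 request can grow by nearly 2x:

```java
public class RoundingSketch {
    // Mirrors the rounding in the WriteBuffers constructor: a wbSize that
    // is not a power of 2 is rounded up to the next power of 2.
    static int roundUp(int wbSize) {
        return Integer.bitCount(wbSize) == 1
                ? wbSize
                : (Integer.highestOneBit(wbSize) << 1);
    }

    public static void main(String[] args) {
        // A requested 5 MB buffer silently becomes 8 MB, a 60% jump that
        // upstream memory estimates never accounted for.
        System.out.println(roundUp(5 * 1024 * 1024)); // 8388608
        // Exact powers of 2 pass through unchanged.
        System.out.println(roundUp(1 << 20));         // 1048576
    }
}
```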
[jira] [Created] (HIVE-11256) Update release note to clarify hadoop compatibility
Wei Zheng created HIVE-11256: Summary: Update release note to clarify hadoop compatibility Key: HIVE-11256 URL: https://issues.apache.org/jira/browse/HIVE-11256 Project: Hive Issue Type: Bug Components: Documentation, Website Affects Versions: 1.2.0, 1.0.0, 0.14.0, 1.1.0, 1.0.1, 1.1.1, 1.2.1 Reporter: Wei Zheng On the Downloads page: http://hive.apache.org/downloads.html We should say "This release works with Hadoop 1.2.0+, 2.x.y" for Hive 0.14+. This is because HIVE-8189 started using the org.apache.hadoop.mapred.JobConf.unset method, which has only been available since Hadoop 1.2.0. Users on earlier Hadoop versions encountered a NoSuchMethodError exception: e.g. HIVE-11246 and http://stackoverflow.com/questions/28070003/error-while-executing-select-query-in-hive -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11193) ConstantPropagateProcCtx should use a Set instead of a List to hold operators to be deleted
Wei Zheng created HIVE-11193: Summary: ConstantPropagateProcCtx should use a Set instead of a List to hold operators to be deleted Key: HIVE-11193 URL: https://issues.apache.org/jira/browse/HIVE-11193 Project: Hive Issue Type: Bug Components: Logical Optimizer Reporter: Wei Zheng Assignee: Wei Zheng During Constant Propagation optimization, sometimes a node ends up being added to the opToDelete list more than once. Later, in the ConstantPropagate transform, we try to delete that operator multiple times, which causes a SemanticException since the node has already been removed in an earlier pass. The data structure for storing opToDelete is a List; we should use a Set to avoid the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
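The List-versus-Set distinction at the heart of HIVE-11193 is the standard one: a List accepts the same element twice, a Set silently drops the duplicate. A minimal illustration (the string "op1" stands in for an operator node; this is not Hive's code):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OpToDeleteSketch {
    public static void main(String[] args) {
        // "op1" stands in for an operator node scheduled for deletion.
        List<String> opToDeleteList = new ArrayList<>();
        Set<String> opToDeleteSet = new HashSet<>();
        for (int pass = 0; pass < 2; pass++) {  // two passes see the same node
            opToDeleteList.add("op1");          // List keeps the duplicate
            opToDeleteSet.add("op1");           // Set silently drops it
        }
        // The List would trigger two delete attempts; the Set only one.
        System.out.println(opToDeleteList.size() + " vs " + opToDeleteSet.size()); // 2 vs 1
    }
}
```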
[jira] [Created] (HIVE-11155) TestHiveMetaTool needs to cover updates for DBS and SDS tables in metastore as well
Wei Zheng created HIVE-11155: Summary: TestHiveMetaTool needs to cover updates for DBS and SDS tables in metastore as well Key: HIVE-11155 URL: https://issues.apache.org/jira/browse/HIVE-11155 Project: Hive Issue Type: Bug Components: Metastore Reporter: Wei Zheng Assignee: Wei Zheng Priority: Minor In HIVE-11147 we solved a Hive MetaTool bug. During testing, it was found that TestHiveMetaTool doesn't cover many cases, such as updating the FS root location for the DBS and SDS tables. The test coverage needs to be enhanced. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11147) MetaTool doesn't update FS root location for partitions with space in name
Wei Zheng created HIVE-11147: Summary: MetaTool doesn't update FS root location for partitions with space in name Key: HIVE-11147 URL: https://issues.apache.org/jira/browse/HIVE-11147 Project: Hive Issue Type: Bug Components: Metastore Reporter: Wei Zheng Assignee: Wei Zheng Problem happens when trying to update the FS root location: {code} # HIVE_CONF_DIR=/etc/hive/conf.server/ hive --service metatool -dryRun -updateLocation hdfs://mycluster hdfs://c6401.ambari.apache.org:8020 ... Looking for LOCATION_URI field in DBS table to update.. Dry Run of updateLocation on table DBS.. old location: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse new location: hdfs://mycluster/apps/hive/warehouse Found 1 records in DBS table to update Looking for LOCATION field in SDS table to update.. Dry Run of updateLocation on table SDS.. old location: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/web_sales/ws_web_site_sk=12 new location: hdfs://mycluster/apps/hive/warehouse/web_sales/ws_web_site_sk=12 old location: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/web_sales/ws_web_site_sk=13 new location: hdfs://mycluster/apps/hive/warehouse/web_sales/ws_web_site_sk=13 ... Found 143 records in SDS table to update Warning: Found records with bad LOCATION in SDS table.. 
bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=Advanced Degree bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=Advanced Degree bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=4 yr Degree bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=4 yr Degree bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=2 yr Degree bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=2 yr Degree {code} The reason some entries are marked as bad locations is that they have a space character in the partition name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11038) MiniTezCli tests are hanging
Wei Zheng created HIVE-11038: Summary: MiniTezCli tests are hanging Key: HIVE-11038 URL: https://issues.apache.org/jira/browse/HIVE-11038 Project: Hive Issue Type: Bug Components: Hive, Tez Affects Versions: 2.0.0 Reporter: Wei Zheng Priority: Blocker Whenever running a MiniTezCli test, it just hangs. Here's the maven command to run a test: {code} $ mvn test -Phadoop-2 -Dtest=TestMiniTezCliDriver -Dqfile=dynamic_partition_pruning.q {code} Here's the tail of org.apache.hadoop.hive.cli.TestMiniTezCliDriver-output.txt: {code} Status: Running (Executing on YARN cluster with App id application_1434574617753_0001) Map 1: -/- Reducer 2: 0/1 Map 1: 1/1 Reducer 2: 1/1 POSTHOOK: query: analyze table lineitem compute statistics for columns POSTHOOK: type: QUERY POSTHOOK: Input: default@lineitem POSTHOOK: Output: file:/Users/wzheng/bf/hive/itests/qtest/target/tmp/localscratchdir/c684ea6a-11b1-4253-a529-c3778695b72a/hive_2015-06-17_13-57-19_047_1275844087077606719-1/-mr-1 OK Time taken: 0.387 seconds Begin query: dynamic_partition_pruning.q ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,/Users/wzheng/bf/hive/conf/ivysettings.xml will be used {code} And here's the jstack output (partial): {code} main #1 prio=5 os_prio=31 tid=0x7fc75e805800 nid=0x1303 waiting on condition [0x000101d84000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor.monitorExecution(TezJobMonitor.java:378) at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:168) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1657) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1416) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1197) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1061) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1033) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1007) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:146) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning(TestMiniTezCliDriver.java:130) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) "VM Thread" os_prio=31 tid=0x7fc75e830800 nid=0x3103 runnable "GC task thread#0 (ParallelGC)" os_prio=31 tid=0x7fc75e811800 nid=0x2103 runnable "GC task thread#1 (ParallelGC)" os_prio=31 tid=0x7fc75f00 nid=0x2303 runnable "GC task thread#2 (ParallelGC)" os_prio=31 tid=0x7fc75f001000 nid=0x2503 runnable "GC task thread#3 (ParallelGC)" os_prio=31 tid=0x7fc75f80 nid=0x2703 runnable "GC task thread#4 (ParallelGC)" os_prio=31 tid=0x7fc75f801000 nid=0x2903
[jira] [Created] (HIVE-10559) IndexOutOfBoundsException with RemoveDynamicPruningBySize
Wei Zheng created HIVE-10559: Summary: IndexOutOfBoundsException with RemoveDynamicPruningBySize Key: HIVE-10559 URL: https://issues.apache.org/jira/browse/HIVE-10559 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 1.2.0 Reporter: Wei Zheng Assignee: Wei Zheng The problem can be reproduced by running the script attached. Backtrace {code} 2015-04-29 10:34:36,390 ERROR [main]: ql.Driver (SessionState.java:printError(956)) - FAILED: IndexOutOfBoundsException Index: 0, Size: 0 java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.optimizer.RemoveDynamicPruningBySize.process(RemoveDynamicPruningBySize.java:61) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79) at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:77) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110) at org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsDependentOptimizations(TezCompiler.java:281) at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:123) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10092) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9932) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1026) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1000) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:139) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_q85(TestMiniTezCliDriver.java:123) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) 
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10403) Add n-way join support for Hybrid Grace Hash Join
Wei Zheng created HIVE-10403: Summary: Add n-way join support for Hybrid Grace Hash Join Key: HIVE-10403 URL: https://issues.apache.org/jira/browse/HIVE-10403 Project: Hive Issue Type: Improvement Affects Versions: 1.2.0 Reporter: Wei Zheng Assignee: Wei Zheng Currently Hybrid Grace Hash Join only supports 2-way join (one big table and one small table). This task will enable n-way join (one big table and multiple small tables). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10368) VectorExpressionWriter doesn't match vectorColumn during row spilling in HybridGraceHashJoin
Wei Zheng created HIVE-10368: Summary: VectorExpressionWriter doesn't match vectorColumn during row spilling in HybridGraceHashJoin Key: HIVE-10368 URL: https://issues.apache.org/jira/browse/HIVE-10368 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Wei Zheng Assignee: Wei Zheng This problem was exposed by HIVE-10284 when testing vectorized_context. Below are the query and backtrace: {code} select store.s_city, ss_net_profit from store_sales JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN household_demographics ON store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk limit 100 {code} {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:175) at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.getRowObject(VectorMapJoinOperator.java:347) at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.spillBigTableRow(VectorMapJoinOperator.java:306) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:390) ... 24 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10072) Add vectorization support for Hybrid Grace Hash Join
Wei Zheng created HIVE-10072: Summary: Add vectorization support for Hybrid Grace Hash Join Key: HIVE-10072 URL: https://issues.apache.org/jira/browse/HIVE-10072 Project: Hive Issue Type: Improvement Affects Versions: 1.2.0 Reporter: Wei Zheng Assignee: Wei Zheng Fix For: 1.2.0 This task is to enable vectorization support for Hybrid Grace Hash Join feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9833) udaf_percentile_approx_23.q fails intermittently
Wei Zheng created HIVE-9833: Summary: udaf_percentile_approx_23.q fails intermittently Key: HIVE-9833 URL: https://issues.apache.org/jira/browse/HIVE-9833 Project: Hive Issue Type: Bug Components: UDF Reporter: Wei Zheng For the query below: select percentile_approx(case when key < 100 then cast('NaN' as double) else key end, 0.5) from bucket the baseline test result is 341.5, but sometimes it returns 342 during QA testing. It happens randomly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9277) Hybrid Hybrid Grace Hash Join
[ https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-9277: Attachment: HIVE-9277.03.patch Uploaded 3rd patch for testing Hybrid Hybrid Grace Hash Join - Key: HIVE-9277 URL: https://issues.apache.org/jira/browse/HIVE-9277 Project: Hive Issue Type: New Feature Components: Physical Optimizer Reporter: Wei Zheng Assignee: Wei Zheng Labels: join Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch, HIVE-9277.03.patch, High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf We are proposing an enhanced hash join algorithm called _“hybrid hybrid grace hash join”_. We can benefit from this feature as illustrated below: * The query will not fail even if the estimated memory requirement is slightly wrong * Expensive garbage collection overhead can be avoided when hash table grows * Join execution using a Map join operator even though the small table doesn't fit in memory as spilling some data from the build and probe sides will still be cheaper than having to shuffle the large fact table The design was based on Hadoop’s parallel processing capability and significant amount of memory available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9277) Hybrid Hybrid Grace Hash Join
[ https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-9277: Attachment: HIVE-9277.01.patch Uploading 1st patch for testing Hybrid Hybrid Grace Hash Join - Key: HIVE-9277 URL: https://issues.apache.org/jira/browse/HIVE-9277 Project: Hive Issue Type: New Feature Components: Physical Optimizer Reporter: Wei Zheng Assignee: Wei Zheng Labels: join Attachments: HIVE-9277.01.patch, High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf We are proposing an enhanced hash join algorithm called “hybrid hybrid grace hash join”. We can benefit from this feature as illustrated below: o The query will not fail even if the estimated memory requirement is slightly wrong o Expensive garbage collection overhead can be avoided when hash table grows o Join execution using a Map join operator even though the small table doesn't fit in memory as spilling some data from the build and probe sides will still be cheaper than having to shuffle the large fact table The design was based on Hadoop’s parallel processing capability and significant amount of memory available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9277) Hybrid Hybrid Grace Hash Join
[ https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-9277: Status: Patch Available (was: Open) Hybrid Hybrid Grace Hash Join - Key: HIVE-9277 URL: https://issues.apache.org/jira/browse/HIVE-9277 Project: Hive Issue Type: New Feature Components: Physical Optimizer Reporter: Wei Zheng Assignee: Wei Zheng Labels: join Attachments: HIVE-9277.01.patch, High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf We are proposing an enhanced hash join algorithm called “hybrid hybrid grace hash join”. We can benefit from this feature as illustrated below: o The query will not fail even if the estimated memory requirement is slightly wrong o Expensive garbage collection overhead can be avoided when hash table grows o Join execution using a Map join operator even though the small table doesn't fit in memory as spilling some data from the build and probe sides will still be cheaper than having to shuffle the large fact table The design was based on Hadoop’s parallel processing capability and significant amount of memory available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9277) Hybrid Hybrid Grace Hash Join
[ https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-9277: Attachment: HIVE-9277.02.patch
Uploading 2nd patch for testing.
[jira] [Commented] (HIVE-9676) add serialization to BytesBytes hashtable
[ https://issues.apache.org/jira/browse/HIVE-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319205#comment-14319205 ] Wei Zheng commented on HIVE-9676: Thanks [~sershe]! I will test out the code and give you an update.
add serialization to BytesBytes hashtable - Key: HIVE-9676 URL: https://issues.apache.org/jira/browse/HIVE-9676 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-9676.patch
[jira] [Commented] (HIVE-9277) Hybrid Hybrid Grace Hash Join
[ https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289866#comment-14289866 ] Wei Zheng commented on HIVE-9277: Thanks [~leftylev] for your review! I've updated the wording; hopefully it is better explained this time.
[jira] [Updated] (HIVE-9277) Hybrid Hybrid Grace Hash Join
[ https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-9277: Attachment: High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf
Uploaded design doc version 1.
[jira] [Commented] (HIVE-9382) Query got rerun with Global Limit optimization on and Fetch optimization off
[ https://issues.apache.org/jira/browse/HIVE-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280676#comment-14280676 ] Wei Zheng commented on HIVE-9382: The test failure is unrelated.
Query got rerun with Global Limit optimization on and Fetch optimization off - Key: HIVE-9382 URL: https://issues.apache.org/jira/browse/HIVE-9382 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Wei Zheng Assignee: Wei Zheng Attachments: HIVE-9382.1.patch
When Global Limit optimization is enabled, and Fetch optimization for simple queries is off or not applicable, some queries with a LIMIT clause will run twice:
set hive.limit.optimize.enable=true;
set hive.fetch.task.conversion=none;
For example,
{code:sql}
hive> select * from t1 limit 10;
Query ID = wzheng_20150107185252_4a6d0e65-9e58-464b-9ed3-9177740c30a9
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1420567249453_0039)

VERTICES   STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
Map 1 ..   SUCCEEDED      1          1        0        0       0       0
VERTICES: 01/01 [==] 100% ELAPSED TIME: 0.41 s

OK
20120899848   119820  32627  982976  509206  0.000100898
20120899745   119820  32627  982976  509206  0.000100898
20120899739   119820  32627  982976  509206  0.000100898
20120899847   119820  32627  982976  509206  0.000100898
201208613588  119820  32627  982976  509206  0.000100898
20120899809   119820  32627  982976  509206  0.000100898
20120899725   119820  32627  982976  509206  0.000100898
20120899666   119820  32627  982976  509206  0.000100898
20120899743   119820  32627  982976  509206  0.000100898
20120899801   119820  32627  982976  509206  0.000100898
Retry query with a different approach...
Query ID = wzheng_20150107185252_8a77f793-cad7-4c6b-b64a-07d8310970b9
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1420567249453_0039)

VERTICES   STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
Map 1 ..   SUCCEEDED    309        309        0        0       0       0
VERTICES: 01/01 [==] 100% ELAPSED TIME: 2.04 s

OK
20120899848   119820  32627  982976  509206  0.000100898
20120899745   119820  32627  982976  509206  0.000100898
20120899739   119820  32627  982976  509206  0.000100898
20120899847   119820  32627  982976  509206  0.000100898
201208613588  119820  32627  982976  509206  0.000100898
20120899809   119820  32627  982976  509206  0.000100898
20120899725   119820  32627  982976  509206  0.000100898
20120899666   119820  32627  982976  509206  0.000100898
20120899743   119820  32627  982976  509206  0.000100898
20120899801   119820  32627  982976  509206  0.000100898
Time taken: 2.748 seconds, Fetched: 10 row(s)
{code}
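The "Retry query with a different approach..." line in the log above reflects how the global-limit optimization behaves: it first runs the query over only a subset of the input, and if that subset yields fewer rows than the LIMIT asks for, the query is rerun over the full input. A minimal sketch of that control flow, with entirely hypothetical names (this is not Hive's driver code):

```java
import java.util.*;

// Illustrative sketch of the global-limit retry behavior (names hypothetical).
// First pass scans only a sample of the input; if it cannot satisfy the
// LIMIT, the query is rerun over the full input ("retry with a different
// approach"). runCount[0] records how many passes were executed.
class GlobalLimitRetrySketch {
    static List<Integer> runWithLimit(List<Integer> allRows, int limit,
                                      int sampleSize, int[] runCount) {
        runCount[0]++; // first, optimized pass over a sample of the input
        List<Integer> sample = allRows.subList(0, Math.min(sampleSize, allRows.size()));
        List<Integer> out = sample.subList(0, Math.min(limit, sample.size()));
        if (out.size() < limit && allRows.size() > sample.size()) {
            runCount[0]++; // "Retry query with a different approach..."
            out = allRows.subList(0, Math.min(limit, allRows.size()));
        }
        return new ArrayList<>(out);
    }
}
```

This also explains the bug's symptom: when the sample happens to satisfy the LIMIT the optimization is a clear win, but when it does not (as in the log, where the second pass scanned 309 tasks' worth of input), the query pays for both passes.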