[jira] [Created] (HIVE-26280) Copy more data into COMPLETED_COMPACTIONS for better supportability
Karen Coppage created HIVE-26280: Summary: Copy more data into COMPLETED_COMPACTIONS for better supportability Key: HIVE-26280 URL: https://issues.apache.org/jira/browse/HIVE-26280 Project: Hive Issue Type: Improvement Components: Transactions Reporter: Karen Coppage Assignee: Karen Coppage There is some information in COMPACTION_QUEUE that doesn't get copied over to COMPLETED_COMPACTIONS when compaction completes. It would help with supportability if COMPLETED_COMPACTIONS (and especially the view of it in the SYS database) also contained this information. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-25716) Fix flaky test TestCompactionMetrics#testOldestReadyForCleaningAge
Karen Coppage created HIVE-25716: Summary: Fix flaky test TestCompactionMetrics#testOldestReadyForCleaningAge Key: HIVE-25716 URL: https://issues.apache.org/jira/browse/HIVE-25716 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Karen Coppage Flaky check failed on run #59: [http://ci.hive.apache.org/job/hive-flaky-check/467/|http://ci.hive.apache.org/job/hive-flaky-check/467/] {code:java} java.lang.AssertionError at org.junit.Assert.fail(Assert.java:87) at org.junit.Assert.assertTrue(Assert.java:42) at org.junit.Assert.assertTrue(Assert.java:53) at org.apache.hadoop.hive.ql.txn.compactor.TestCompactionMetrics.testOldestReadyForCleaningAge(TestCompactionMetrics.java:214) {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25578) Tests are failing because operators can't be closed
Karen Coppage created HIVE-25578: Summary: Tests are failing because operators can't be closed Key: HIVE-25578 URL: https://issues.apache.org/jira/browse/HIVE-25578 Project: Hive Issue Type: Bug Reporter: Karen Coppage The following qtests are failing consistently ([example|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2667/6/tests/]) on the master branch: * TestMiniLlapCliDriver ([http://ci.hive.apache.org/job/hive-flaky-check/420/]) ** newline ** groupby_bigdata ** input20 ** input33 ** rcfile_bigdata ** remote_script * TestContribCliDriver ([http://ci.hive.apache.org/job/hive-flaky-check/421/]) ** serde_typedbytes5 The failure reason for all seems to be that operators can't be closed. I'm not 100% sure that the TestContribCliDriver#serde_typedbytes5 failure is related to the others – the issue seems to be the same, though the error message is a bit different. I'm about to disable these as they are blocking all work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25549) Need to transient init partition expressions in vectorized PTFs
Karen Coppage created HIVE-25549: Summary: Need to transient init partition expressions in vectorized PTFs Key: HIVE-25549 URL: https://issues.apache.org/jira/browse/HIVE-25549 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25513) Delta metrics collection may cause NPE
Karen Coppage created HIVE-25513: Summary: Delta metrics collection may cause NPE Key: HIVE-25513 URL: https://issues.apache.org/jira/browse/HIVE-25513 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage When collecting metrics about the number of deltas under specific partitions/tables, information about which partitions/tables are being read is stored in the Configuration object under key delta.files.metrics.metadata. This information is retrieved in DeltaFilesMetricsReporter#mergeDeltaFilesStats when collecting the actual information about the number of deltas. But if the information was never stored for some reason, an NPE will be thrown from DeltaFilesMetricsReporter#mergeDeltaFilesStats. -- This message was sent by Atlassian Jira (v8.3.4#803005)
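A minimal sketch of the kind of guard described above (the key name is taken from the description; the helper and its placement are hypothetical, not the actual Hive patch):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class DeltaMetricsGuardSketch {
  // Key under which the partition/table metadata is assumed to be stored (from the description above).
  private static final String DELTA_METRICS_METADATA_KEY = "delta.files.metrics.metadata";

  /** Returns false when no metadata was recorded, so the caller can skip merging instead of hitting an NPE. */
  static boolean shouldMergeDeltaFilesStats(Configuration conf) {
    String metadata = conf.get(DELTA_METRICS_METADATA_KEY);
    return metadata != null && !metadata.isEmpty();
  }
}
{code}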
[jira] [Created] (HIVE-25492) Major query-based compaction is skipped if partition is empty
Karen Coppage created HIVE-25492: Summary: Major query-based compaction is skipped if partition is empty Key: HIVE-25492 URL: https://issues.apache.org/jira/browse/HIVE-25492 Project: Hive Issue Type: Bug Reporter: Karen Coppage Currently if the result of query-based compaction is an empty base, delta, or delete delta, the empty directory is deleted. This is because of minor compaction – if there are only deltas to compact, then no compacted delete delta should be created (only a compacted delta). In the same way, if there are only delete deltas to compact, then no compacted delta should be created (only a compacted delete delta). There is an issue with major compaction. If all the data in the partition has been deleted, then we should get an empty base directory after compaction. Instead, the empty base directory is deleted because it's empty and compaction claims to succeed but we end up with the same deltas/delete deltas we started with – basically compaction does not run. Where to start? MajorQueryCompactor#commitCompaction -- This message was sent by Atlassian Jira (v8.3.4#803005)
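A hedged sketch of the fix idea at the spot named above (hypothetical helper; the real MajorQueryCompactor#commitCompaction differs): only drop an empty result directory for minor compaction, and keep an empty base for major compaction so the old deltas become obsolete.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CommitCompactionSketch {
  static void commitResultDir(FileSystem fs, Path resultDir, boolean isMajor) throws IOException {
    boolean empty = fs.listStatus(resultDir).length == 0;
    if (empty && !isMajor) {
      // Minor compaction: an empty compacted delta/delete delta should not be created.
      fs.delete(resultDir, true);
    }
    // Major compaction: keep the (possibly empty) base directory; it makes the old
    // deltas/delete deltas obsolete even when every row in the partition was deleted.
  }
}
{code}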
[jira] [Created] (HIVE-25450) Delta metrics keys should contain database name
Karen Coppage created HIVE-25450: Summary: Delta metrics keys should contain database name Key: HIVE-25450 URL: https://issues.apache.org/jira/browse/HIVE-25450 Project: Hive Issue Type: Sub-task Reporter: Karen Coppage Currently metrics about the number of deltas in a given partition or unpartitioned table include information about the table name and the partition name (if applicable), but they should also include the database name, since there could be 2 tables in different databases with the same name. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25430) compactor.Worker.markFailed should catch and log any kind of exception
Karen Coppage created HIVE-25430: Summary: compactor.Worker.markFailed should catch and log any kind of exception Key: HIVE-25430 URL: https://issues.apache.org/jira/browse/HIVE-25430 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25429) Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit
Karen Coppage created HIVE-25429: Summary: Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit Key: HIVE-25429 URL: https://issues.apache.org/jira/browse/HIVE-25429 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Karen Coppage Assignee: Karen Coppage There's a limit to the number of tez counters allowed (tez.counters.max). Delta metrics collection (i.e. DeltaFileMetricsReporter) was creating 3 counters for each partition touched by a given query, which can result in a huge number of counters; this is unnecessary because we're only interested in the n partitions with the most deltas. This change limits the number of counters created to hive.txn.acid.metrics.max.cache.size*3. Also, when tez.counters.max is reached, a LimitExceededException is thrown but isn't caught on the Hive side, and causes the query to fail. We should catch this and skip delta metrics collection in this case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25390) Metrics compaction_failed_initiator_ratio and compaction_failed_cleaner_ratio should be counters
Karen Coppage created HIVE-25390: Summary: Metrics compaction_failed_initiator_ratio and compaction_failed_cleaner_ratio should be counters Key: HIVE-25390 URL: https://issues.apache.org/jira/browse/HIVE-25390 Project: Hive Issue Type: Sub-task Reporter: Karen Coppage Assignee: Karen Coppage Metric compaction_failed_initiator_ratio represents the ratio of initiator failures to the total number of initiator runs, both computed since Metastore was restarted, represented by a double. This isn't really usable. It would be better if it counted the number of initiator failures since Metastore was restarted so other components can keep an eye on things like "number of initiator failures in the last day". Same goes for compaction_failed_cleaner_ratio. This commit removes metrics * compaction_failed_initiator_ratio * compaction_failed_cleaner_ratio and introduces metrics * compaction_initiator_failure_counter * compaction_cleaner_failure_counter -- This message was sent by Atlassian Jira (v8.3.4#803005)
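For illustration only, a counter of this shape using the Dropwizard/Codahale API (Hive's own Metrics wrapper differs; the metric names are taken from the description above):
{code:java}
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;

public class CompactionFailureCounterSketch {
  private final MetricRegistry registry = new MetricRegistry();
  // Monotonically increasing since process start, so "failures in the last day" can be derived downstream.
  private final Counter initiatorFailures = registry.counter("compaction_initiator_failure_counter");
  private final Counter cleanerFailures = registry.counter("compaction_cleaner_failure_counter");

  void onInitiatorFailure() {
    initiatorFailures.inc();
  }

  void onCleanerFailure() {
    cleanerFailures.inc();
  }
}
{code}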
[jira] [Created] (HIVE-25371) Add myself to thrift file reviewers
Karen Coppage created HIVE-25371: Summary: Add myself to thrift file reviewers Key: HIVE-25371 URL: https://issues.apache.org/jira/browse/HIVE-25371 Project: Hive Issue Type: Task Reporter: Karen Coppage Assignee: Karen Coppage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25363) Fix TestStatsUpdaterThread#testQueueingWithThreads
Karen Coppage created HIVE-25363: Summary: Fix TestStatsUpdaterThread#testQueueingWithThreads Key: HIVE-25363 URL: https://issues.apache.org/jira/browse/HIVE-25363 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Karen Coppage [http://ci.hive.apache.org/job/hive-flaky-check/330] Failed on first run with {code:java} java.lang.AssertionError at org.junit.Assert.fail(Assert.java:87) at org.junit.Assert.assertTrue(Assert.java:42) at org.junit.Assert.assertTrue(Assert.java:53) at org.apache.hadoop.hive.ql.stats.TestStatsUpdaterThread.verifyStatsUpToDate(TestStatsUpdaterThread.java:761) at org.apache.hadoop.hive.ql.stats.TestStatsUpdaterThread.verifyStatsUpToDate(TestStatsUpdaterThread.java:771) at org.apache.hadoop.hive.ql.stats.TestStatsUpdaterThread.verifyPartStatsUpToDate(TestStatsUpdaterThread.java:677) at org.apache.hadoop.hive.ql.stats.TestStatsUpdaterThread.testQueueingWithThreads(TestStatsUpdaterThread.java:365) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25359) Changes to metastore API in HIVE-24880 are not backwards compatible
Karen Coppage created HIVE-25359: Summary: Changes to metastore API in HIVE-24880 are not backwards compatible Key: HIVE-25359 URL: https://issues.apache.org/jira/browse/HIVE-25359 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage With HIVE-24880 find_next_compact(String workerId) was changed to find_next_compact(String workerId, String workerVersion). This isn't backwards compatible and could break other components. This commit reverts that change, deprecates find_next_compact, adds a new method: find_next_compact2(FindNextCompactRequest rqst) where FindNextCompactRequest has fields workerId and workerVersion, and makes Hive use find_next_compact2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25348) Skip metrics collection about writes to tables with tblproperty no_auto_compaction=true if CTAS
Karen Coppage created HIVE-25348: Summary: Skip metrics collection about writes to tables with tblproperty no_auto_compaction=true if CTAS Key: HIVE-25348 URL: https://issues.apache.org/jira/browse/HIVE-25348 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage We collect metrics about writes to tables with no_auto_compaction=true when allocating writeids. In the case of CTAS, if ACID is enabled on the new table, a writeid is allocated before the table object is created so we can't get tblproperties from it when allocating the writeid. In this case we should skip collecting the metric. This commit fixes errors like this: {code:java} 2021-07-16 18:48:04,350 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-9-thread-72]: java.lang.NullPointerException at org.apache.hadoop.hive.metastore.HMSMetricsListener.onAllocWriteId(HMSMetricsListener.java:104) at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.lambda$static$6(MetaStoreListenerNotifier.java:229) at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:291) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.allocate_table_write_ids(HiveMetaStore.java:8592) at sun.reflect.GeneratedMethodAccessor86.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:160) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:121) at com.sun.proxy.$Proxy33.allocate_table_write_ids(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$allocate_table_write_ids.getResult(ThriftHiveMetastore.java:21584) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$allocate_table_write_ids.getResult(ThriftHiveMetastore.java:21568) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
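A rough sketch of the guard (assuming the listener can end up with a null Table object during CTAS; this is not the actual HMSMetricsListener change):
{code:java}
import org.apache.hadoop.hive.metastore.api.Table;

public class WriteIdMetricGuardSketch {
  /** Skip the no_auto_compaction metric when the table object does not exist yet (CTAS case). */
  static boolean shouldCollectNoAutoCompactionMetric(Table table) {
    if (table == null || table.getParameters() == null) {
      return false; // CTAS: the writeid is allocated before the table is created, nothing to inspect
    }
    return "true".equalsIgnoreCase(table.getParameters().get("no_auto_compaction"));
  }
}
{code}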
[jira] [Created] (HIVE-25323) Fix TestVectorCastStatement
Karen Coppage created HIVE-25323: Summary: Fix TestVectorCastStatement Key: HIVE-25323 URL: https://issues.apache.org/jira/browse/HIVE-25323 Project: Hive Issue Type: Task Reporter: Karen Coppage org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorCastStatement tests were timing out after 5 hours. [http://ci.hive.apache.org/job/hive-flaky-check/307/] http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/749/pipeline/242 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25318) Number of initiator hosts metric should ignore manually initiated compactions
Karen Coppage created HIVE-25318: Summary: Number of initiator hosts metric should ignore manually initiated compactions Key: HIVE-25318 URL: https://issues.apache.org/jira/browse/HIVE-25318 Project: Hive Issue Type: Sub-task Affects Versions: 4.0.0 Reporter: Karen Coppage Assignee: Karen Coppage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25115) Compaction queue entries may accumulate in "ready for cleaning" state
Karen Coppage created HIVE-25115: Summary: Compaction queue entries may accumulate in "ready for cleaning" state Key: HIVE-25115 URL: https://issues.apache.org/jira/browse/HIVE-25115 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage If the Cleaner does not delete any files, the compaction queue entry is thrown back to the queue and remains in "ready for cleaning" state. Problem: If 2 compactions run on the same table and enter "ready for cleaning" state at the same time, only one "cleaning" will remove obsolete files, the other entry will remain in the queue in "ready for cleaning" state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25022) Metric about incomplete compactions
Karen Coppage created HIVE-25022: Summary: Metric about incomplete compactions Key: HIVE-25022 URL: https://issues.apache.org/jira/browse/HIVE-25022 Project: Hive Issue Type: Sub-task Reporter: Karen Coppage Assignee: Karen Coppage "Compactions in a state" metrics (for example compaction_num_working) count the sum of tables/partitions where the last compaction is in that state. I propose introducing a new metric about incomplete compactions: i.e. the number of tables/partitions where the last finished compaction* is unsuccessful (failed or "did not initiate"), or where major compaction was unsuccessful then minor compaction succeeded (compaction is not "complete" since major compaction has not succeeded in the time since it should have run). Example: {code:java} These compactions ran on a partition: major succeeded major working major failed major initiated major working major failed major initiated major working The "compactions in a state" metrics will consider the state of this table: working. The "incomplete compactions" metric will consider this: incomplete, since there have been failed compactions since the last succeeded compaction. {code} Another example: {code:java} These compactions ran on a partition: major succeeded major failed minor failed minor succeeded The "compactions in a state" metrics will consider the state of this table: succeeded. The "incomplete compactions" metric will consider this: incomplete, since there hasn't been a major succeeded since major failed.{code} Last example: {code:java} These compactions ran on a partition: major succeeded minor did not initiate The "compactions in a state" metrics will consider the state of this table: did not initiate. The "incomplete compactions" metric will consider this: incomplete, since the last compaction was "did not initiate"{code} *finished compaction: state in (succeeded, failed, attempted/did not initiate) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24980) Add timeout for failed and "not initiated" compaction cleanup
Karen Coppage created HIVE-24980: Summary: Add timeout for failed and "not initiated" compaction cleanup Key: HIVE-24980 URL: https://issues.apache.org/jira/browse/HIVE-24980 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage Clear failed and "not initiated" compactions from COMPLETED_COMPACTIONS that are older than a week (configurable) if there is already a newer successful compaction on the table/partition and either (1) the succeeded compaction is major, or (2) it is minor and the not-initiated or failed compaction is also minor – so a minor succeeded compaction will not cause the deletion of a major not-initiated or failed compaction from history. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24955) New metrics about aborted transactions
Karen Coppage created HIVE-24955: Summary: New metrics about aborted transactions Key: HIVE-24955 URL: https://issues.apache.org/jira/browse/HIVE-24955 Project: Hive Issue Type: Sub-task Reporter: Karen Coppage Assignee: Karen Coppage Fix For: 4.0.0 5 new metrics: * Number of aborted transactions in the TXNS table (collected in AcidMetricsService) * Oldest aborted transaction (collected in AcidMetricsService) * Number of aborted write transactions (counter incremented at abortTransaction) * Number of committed write transactions (counter incremented at commitTransaction) * Number of timed out transactions (removed by the cleaner after heartbeat timeout) The latter 3 reset to 0 after every HMS restart. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24881) Abort old open replication txns
Karen Coppage created HIVE-24881: Summary: Abort old open replication txns Key: HIVE-24881 URL: https://issues.apache.org/jira/browse/HIVE-24881 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage We should auto-abort/remove open replication txns that are older than a time threshold (default: 24h). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24860) DbTxnManager$HeartbeaterThread thread leak
Karen Coppage created HIVE-24860: Summary: DbTxnManager$HeartbeaterThread thread leak Key: HIVE-24860 URL: https://issues.apache.org/jira/browse/HIVE-24860 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24858) UDFClassLoader leak in session HiveConf
Karen Coppage created HIVE-24858: Summary: UDFClassLoader leak in session HiveConf Key: HIVE-24858 URL: https://issues.apache.org/jira/browse/HIVE-24858 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage If a UDF jar has been registered in a session and a temporary function created from it, when the session is closed its UDFClassLoader is not GC'd as it has been leaked to the session's HiveConf object's cache. Since the ClassLoader is not GC'd, the UDF jar's classes aren't GC'd from Metaspace. This can potentially lead to Metaspace OOM. Path to GC root is: {code:java} Class Name | Shallow Heap | Retained Heap --- contextClassLoader org.apache.hive.service.server.ThreadWithGarbageCleanup @ 0x7164deb50 HiveServer2-Handler-Pool: Thread-72 Thread| 128 | 79,072 referent java.util.WeakHashMap$Entry @ 0x7164e67d0 | 40 | 824 '- [6] java.util.WeakHashMap$Entry[16] @ 0x71581aac0 | 80 | 5,056 '- table java.util.WeakHashMap @ 0x71580f510 | 48 | 6,920 '- CACHE_CLASSES class org.apache.hadoop.conf.Configuration @ 0x71580f3d8 | 64 | 74,528 --- {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24719) There's a getAcidState() without impersonation in compactor.Worker
Karen Coppage created HIVE-24719: Summary: There's a getAcidState() without impersonation in compactor.Worker Key: HIVE-24719 URL: https://issues.apache.org/jira/browse/HIVE-24719 Project: Hive Issue Type: Improvement Reporter: Karen Coppage In Initiator and Cleaner, getAcidState is called by a proxy user (the table/partition dir owner) because the HS2 user might not have permission to list the files. In Worker getAcidState is not called by a proxy user. It's potentially a simple fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24652) If compactor worker times out, compaction is not cleared from queue
Karen Coppage created HIVE-24652: Summary: If compactor worker times out, compaction is not cleared from queue Key: HIVE-24652 URL: https://issues.apache.org/jira/browse/HIVE-24652 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage If the worker times out (that is, takes longer than the value of hive.compactor.worker.timeout), the corresponding entry is not cleared from the COMPACTION_QUEUE table; it is left in state "working" or "initiated", which means that compaction can't be run again on the table/partition. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24587) DataFileReader is not closed in AvroGenericRecordReader#extractWriterProlepticFromMetadata
Karen Coppage created HIVE-24587: Summary: DataFileReader is not closed in AvroGenericRecordReader#extractWriterProlepticFromMetadata Key: HIVE-24587 URL: https://issues.apache.org/jira/browse/HIVE-24587 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0 Reporter: Karen Coppage Assignee: Karen Coppage Same problem as HIVE-22981 appears in AvroGenericRecordReader#extractWriterProlepticFromMetadata. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24586) Rename compaction "attempted" status
Karen Coppage created HIVE-24586: Summary: Rename compaction "attempted" status Key: HIVE-24586 URL: https://issues.apache.org/jira/browse/HIVE-24586 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage A compaction with "attempted" status sounds like the compactor tried to compact the table/partition and failed. In reality it means one of these: * the Initiator did not queue compaction because the number of previously failed compactions has passed a threshold * the Initiator did not queue compaction because of an error In both of these cases the user is still able to initiate compaction manually. This should be made clearer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24474) Failed compaction always throws TxnAbortedException (again)
Karen Coppage created HIVE-24474: Summary: Failed compaction always throws TxnAbortedException (again) Key: HIVE-24474 URL: https://issues.apache.org/jira/browse/HIVE-24474 Project: Hive Issue Type: Bug Components: Hive Reporter: Karen Coppage Assignee: Karen Coppage Fix For: 4.0.0 Re-introduced with HIVE-24096. If there is an error during compaction, the compaction's txn is aborted but in the finally clause, we try to commit it (commitTxnIfSet), so Worker throws a TxnAbortedException. We should set compactorTxnId to TXN_ID_NOT_SET if the compaction's txn is aborted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
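A hedged sketch of the fix idea (names like compactorTxnId, TXN_ID_NOT_SET and commitTxnIfSet follow the description above; the real Worker code is structured differently):
{code:java}
public class CompactionTxnSketch {
  static final long TXN_ID_NOT_SET = -1L; // assumed sentinel value

  private long compactorTxnId = TXN_ID_NOT_SET;

  void onCompactionError() {
    // Abort the compaction's txn, then forget its id so the commitTxnIfSet() call
    // in the outer finally block becomes a no-op instead of throwing TxnAbortedException.
    abortTxn(compactorTxnId);
    compactorTxnId = TXN_ID_NOT_SET;
  }

  void commitTxnIfSet() {
    if (compactorTxnId != TXN_ID_NOT_SET) {
      commitTxn(compactorTxnId);
    }
  }

  private void abortTxn(long txnId) { /* metastore call elided */ }
  private void commitTxn(long txnId) { /* metastore call elided */ }
}
{code}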
[jira] [Created] (HIVE-24459) Qtest to simulate query-based compaction
Karen Coppage created HIVE-24459: Summary: Qtest to simulate query-based compaction Key: HIVE-24459 URL: https://issues.apache.org/jira/browse/HIVE-24459 Project: Hive Issue Type: Test Reporter: Karen Coppage Assignee: Karen Coppage AFAIK all compaction tests run on a local filesystem, and none run on HDFS. Since HDFS and local filesystem behavior differs sometimes, it would be good to test query-based compaction on HDFS. Compaction threads don't run in the qtest environment so this qtest would simulate QB compaction by running the queries that QB compaction runs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS
Karen Coppage created HIVE-24444: Summary: compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS Key: HIVE-24444 URL: https://issues.apache.org/jira/browse/HIVE-24444 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage This is an improvement on HIVE-24314, in which markCleaned() is called only if +any+ files are deleted by the cleaner. This could cause a problem in the following case: Say for table_1 compaction1 cleaning was blocked by an open txn, and compaction is run again on the same table (compaction2). Both compaction1 and compaction2 could be in "ready for cleaning" at the same time. By this time the blocking open txn could be committed. When the cleaner runs, one of compaction1 and compaction2 will remain in the "ready for cleaning" state: Say compaction2 is picked up by the cleaner first. The Cleaner deletes all obsolete files. Then compaction1 is picked up by the cleaner; the cleaner doesn't remove any files and compaction1 will stay in the queue in a "ready for cleaning" state. HIVE-24291 already solves this issue but if it isn't usable (for example if HMS schema changes are out of the question) then HIVE-24314 + this change will fix the issue of the Cleaner not removing all obsolete files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24429) Figure out a better way to test failed compactions
Karen Coppage created HIVE-24429: Summary: Figure out a better way to test failed compactions Key: HIVE-24429 URL: https://issues.apache.org/jira/browse/HIVE-24429 Project: Hive Issue Type: Improvement Reporter: Karen Coppage This block is executed during compaction: {code:java} if(conf.getBoolVar(HiveConf.ConfVars.HIVE_IN_TEST) && conf.getBoolVar(HiveConf.ConfVars.HIVETESTMODEFAILCOMPACTION)) { throw new RuntimeException(HiveConf.ConfVars.HIVETESTMODEFAILCOMPACTION.name() + "=true"); }{code} We should figure out a better way to test failed compaction than including test code in the source. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24410) Query-based compaction hangs because of doAs
Karen Coppage created HIVE-24410: Summary: Query-based compaction hangs because of doAs Key: HIVE-24410 URL: https://issues.apache.org/jira/browse/HIVE-24410 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage Fix For: 4.0.0 QB compaction runs within a doas +and+ hive.server2.enable.doAs is set to true (as of HIVE-24089). On a secure cluster with Worker threads running in HS2, this results in HMS client not receiving a login context during compaction queries, so kerberos prompts for a login via stdin which causes the worker thread to hang until it times out: {code:java} "node-x.com-44_executor" #47 daemon prio=1 os_prio=0 tid=0x01506000 nid=0x1348 runnable [0x7f1beea95000] java.lang.Thread.State: RUNNABLE at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:255) at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) at java.io.BufferedInputStream.read(BufferedInputStream.java:345) - locked <0x9fa38c90> (a java.io.BufferedInputStream) at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) - locked <0x8c7d5010> (a java.io.InputStreamReader) at java.io.InputStreamReader.read(InputStreamReader.java:184) at java.io.BufferedReader.fill(BufferedReader.java:161) at java.io.BufferedReader.readLine(BufferedReader.java:324) - locked <0x8c7d5010> (a java.io.InputStreamReader) at java.io.BufferedReader.readLine(BufferedReader.java:389) at com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:153) at com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:120) at com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:862) at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:708) at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) at javax.security.auth.login.LoginContext.login(LoginContext.java:587) at sun.security.jgss.GSSUtil.login(GSSUtil.java:258) at sun.security.jgss.krb5.Krb5Util.getInitialTicket(Krb5Util.java:175) at sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:341) at sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:337) at java.security.AccessController.doPrivileged(Native Method) at sun.security.jgss.krb5.Krb5InitCredential.getTgt(Krb5InitCredential.java:336) at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:146) at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122) at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:189) at 
sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224) at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212) at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179) at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192) at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:51) at org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:48) at java.security.AccessController.doPrivileged(Native Method) at
[jira] [Created] (HIVE-24347) Fix failing test: TestMiniLlapLocalCliDriver.testCliDriver[cardinality_preserving_join_opt2]
Karen Coppage created HIVE-24347: Summary: Fix failing test: TestMiniLlapLocalCliDriver.testCliDriver[cardinality_preserving_join_opt2] Key: HIVE-24347 URL: https://issues.apache.org/jira/browse/HIVE-24347 Project: Hive Issue Type: Bug Reporter: Karen Coppage Since HIVE-24325. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24345) Flaky test: TestCleanupService#testEventualCleanupService_finishesCleanupBeforeExit is f
Karen Coppage created HIVE-24345: Summary: Flaky test: TestCleanupService#testEventualCleanupService_finishesCleanupBeforeExit is f Key: HIVE-24345 URL: https://issues.apache.org/jira/browse/HIVE-24345 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Karen Coppage [http://ci.hive.apache.org/job/hive-flaky-check/137/] failed on #4. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24314) compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files
Karen Coppage created HIVE-24314: Summary: compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files Key: HIVE-24314 URL: https://issues.apache.org/jira/browse/HIVE-24314 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24302) Cleaner should not mark compaction queue entry as cleaned if it doesn't remove obsolete files
Karen Coppage created HIVE-24302: Summary: Cleaner should not mark compaction queue entry as cleaned if it doesn't remove obsolete files Key: HIVE-24302 URL: https://issues.apache.org/jira/browse/HIVE-24302 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage Example: # open txn 5, leave it open (maybe it's a long-running compaction) # insert into table t in txns 6, 7 with writeids 1, 2 # compactor.Worker runs on table t and compacts writeids 1, 2 # compactor.Cleaner picks up the compaction queue entry, but doesn't delete any files because the min global open txnid is 5, which cannot see writeIds 1, 2. # Cleaner marks the compactor queue entry as cleaned and removes the entry from the queue. delta_1 and delta_2 will remain in the file system until another compaction is run on table t. Step 5 should not happen; we should skip calling markCleaned() and leave the entry in the queue in "ready to clean" state. markCleaned() should be called only after txn 5 is closed and, following that, the cleaner runs successfully. This will potentially slow down the cleaner, but on the other hand it won't silently "fail" i.e. not do its job. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24235) Drop and recreate table during MR compaction leaves behind base/delta directory
Karen Coppage created HIVE-24235: Summary: Drop and recreate table during MR compaction leaves behind base/delta directory Key: HIVE-24235 URL: https://issues.apache.org/jira/browse/HIVE-24235 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage If a table is dropped and recreated during MR compaction, the table directory and a base (or delta, if minor compaction) directory could be created, with or without data, while the table "does not exist". E.g. {code:java} create table c (i int) stored as orc tblproperties ("NO_AUTO_COMPACTION"="true", "transactional"="true"); insert into c values (9); insert into c values (9); alter table c compact 'major'; While compaction job is running: { drop table c; create table c (i int) stored as orc tblproperties ("NO_AUTO_COMPACTION"="true", "transactional"="true"); } {code} The table directory should be empty, but table directory could look like this after the job is finished: {code:java} Oct 6 14:23 c/base_002_v101/._orc_acid_version.crc Oct 6 14:23 c/base_002_v101/.bucket_0.crc Oct 6 14:23 c/base_002_v101/_orc_acid_version Oct 6 14:23 c/base_002_v101/bucket_0 {code} or perhaps just: {code:java} Oct 6 14:23 c/base_002_v101/._orc_acid_version.crc Oct 6 14:23 c/base_002_v101/_orc_acid_version {code} Insert another row and you have: {code:java} Oct 6 14:33 base_002_v101/ Oct 6 14:33 base_002_v101/._orc_acid_version.crc Oct 6 14:33 base_002_v101/.bucket_0.crc Oct 6 14:33 base_002_v101/_orc_acid_version Oct 6 14:33 base_002_v101/bucket_0 Oct 6 14:35 delta_001_001_/._orc_acid_version.crc Oct 6 14:35 delta_001_001_/.bucket_0_0.crc Oct 6 14:35 delta_001_001_/_orc_acid_version Oct 6 14:35 delta_001_001_/bucket_0_0 {code} Selecting from the table will result in this error because the highest valid writeId for this table is 1: {code:java} thrift.ThriftCLIService: Error fetching results: org.apache.hive.service.cli.HiveSQLException: Unable to get the next row set at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:482) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] ... Caused by: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: java.io.IOException: Not enough history available for (1,x). Oldest available base: .../warehouse/b/base_004_v092 {code} Solution: Resolve the table again after compaction is finished; compare the id with the table id from when compaction began. If the ids do not match, abort the compaction's transaction. -- This message was sent by Atlassian Jira (v8.3.4#803005)
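A sketch of the proposed check, assuming the metastore Table object exposes an id that changes when the table is dropped and recreated (hypothetical helper, not the actual compactor code):
{code:java}
import org.apache.hadoop.hive.metastore.api.Table;

public class TableIdenticalCheckSketch {
  /** Returns true only if the table still exists and is the same incarnation compaction started on. */
  static boolean sameTableAsAtCompactionStart(long tableIdAtStart, Table tableAfterCompaction) {
    if (tableAfterCompaction == null || tableAfterCompaction.getId() != tableIdAtStart) {
      // The table was dropped (and possibly recreated) while the MR job ran:
      // the compaction's transaction should be aborted instead of committing the new base/delta.
      return false;
    }
    return true;
  }
}
{code}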
[jira] [Created] (HIVE-24191) Introduce configurable user to run compaction as
Karen Coppage created HIVE-24191: Summary: Introduce configurable user to run compaction as Key: HIVE-24191 URL: https://issues.apache.org/jira/browse/HIVE-24191 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage File listing in compaction Initiator and Cleaner, as well as compaction (MR as of HIVE-23929 and query-based as of HIVE-24089) run as the table directory owner. (CompactorThread#findUserToRunAs) We should create a configuration that enables setting the user that these would run as instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24168) Disable hdfsEncryptionShims cache during query-based compaction
Karen Coppage created HIVE-24168: Summary: Disable hdfsEncryptionShims cache during query-based compaction Key: HIVE-24168 URL: https://issues.apache.org/jira/browse/HIVE-24168 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage Hive keeps a cache of encryption shims in SessionState (Map hdfsEncryptionShims). Each encryption shim in the cache stores a FileSystem object. After compaction where the session user is not the same user as the owner of the partition/table directory, we close all FileSystem objects associated with the user running the compaction, possibly closing an FS stored in the encryption shim cache. The next time query-based compaction is run on a table/partition owned by the same user, compaction will fail in MoveTask[1] since the FileSystem stored in the cache was closed. This change disables the cache during query-based compaction (optionally; default: disabled). [1] Error: {code:java} 2020-09-08 11:23:50,170 ERROR org.apache.hadoop.hive.ql.Driver: [rncdpdev-2.fyre.ibm.com-27]: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Filesystem closed. org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Filesystem closed at org.apache.hadoop.hive.ql.metadata.Hive.needToCopy(Hive.java:4637) at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4147) at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4694) at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3120) at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:423) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:477) at org.apache.hadoop.hive.ql.DriverUtils.runOnDriver(DriverUtils.java:70) at org.apache.hadoop.hive.ql.txn.compactor.QueryCompactor.runCompactionQueries(QueryCompactor.java:116) at org.apache.hadoop.hive.ql.txn.compactor.MmMajorQueryCompactor.runCompaction(MmMajorQueryCompactor.java:72) at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:232) at org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:221) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:218) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24158) Cleanup isn't complete in OrcFileMergeOperator#closeOp
Karen Coppage created HIVE-24158: Summary: Cleanup isn't complete in OrcFileMergeOperator#closeOp Key: HIVE-24158 URL: https://issues.apache.org/jira/browse/HIVE-24158 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage The outWriters map field isn't cleared when the operator is closed: {code:java} if (outWriters != null) { for (Map.Entry outWriterEntry : outWriters.entrySet()) { Writer outWriter = outWriterEntry.getValue(); outWriter.close(); outWriter = null; } }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
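A sketch of what the cleanup could look like, mirroring the snippet above (assumed shape of the fix, with the map's generic types guessed; not the committed patch):
{code:java}
if (outWriters != null) {
  for (Map.Entry<Integer, Writer> outWriterEntry : outWriters.entrySet()) {
    outWriterEntry.getValue().close();
  }
  // The missing step: drop the references so closeOp leaves no stale writers behind.
  outWriters.clear();
}
{code}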
[jira] [Created] (HIVE-24135) Drop database doesn't delete directory in managed location
Karen Coppage created HIVE-24135: Summary: Drop database doesn't delete directory in managed location Key: HIVE-24135 URL: https://issues.apache.org/jira/browse/HIVE-24135 Project: Hive Issue Type: Sub-task Reporter: Karen Coppage Assignee: Naveen Gangam Repro: say the default managed location is managed/hive and the default external location is external/hive. {code:java} create database db1; -- creates: external/hive/db1.db create table db1.table1 (i int); -- creates: managed/hive/db1.db and managed/hive/db1.db/table1 drop database db1 cascade; -- removes : external/hive/db1.db and managed/hive/db1.db/table1 {code} Problem: Directory managed/hive/db1.db remains. Since HIVE-22995, dbs have a managed (managedLocationUri) and an external location (locationUri). I think the issue is that HiveMetaStore.HMSHandler#drop_database_core deletes only the db directory in the external location. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24096) Abort failed compaction's txn on TException|IOException
Karen Coppage created HIVE-24096: Summary: Abort failed compaction's txn on TException|IOException Key: HIVE-24096 URL: https://issues.apache.org/jira/browse/HIVE-24096 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage If compaction fails with a TException or IOException (e.g. an IOException from [getAcidState|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java#L500]) after the compaction txn has been opened, the compaction is marked 'failed' but the compaction txn is never aborted. We should abort an open compaction txn upon TExceptions/IOExceptions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
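A hedged sketch of the control flow being proposed (method names are hypothetical; the real Worker wiring differs):
{code:java}
import java.io.IOException;
import org.apache.thrift.TException;

public class AbortCompactionTxnSketch {
  void runCompactionSafely() {
    long compactorTxnId = openCompactionTxn();
    try {
      compact(compactorTxnId);
      commitTxn(compactorTxnId);
    } catch (TException | IOException e) {
      // Don't leave the compaction txn open forever: abort it, then mark the compaction failed.
      abortTxn(compactorTxnId);
      markFailed(e);
    }
  }

  private long openCompactionTxn() { return 1L; }
  private void compact(long txnId) throws IOException { }
  private void commitTxn(long txnId) throws TException { }
  private void abortTxn(long txnId) { }
  private void markFailed(Exception e) { }
}
{code}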
[jira] [Created] (HIVE-24089) Run QB compaction as table directory user with impersonation
Karen Coppage created HIVE-24089: Summary: Run QB compaction as table directory user with impersonation Key: HIVE-24089 URL: https://issues.apache.org/jira/browse/HIVE-24089 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage Currently QB compaction runs as the session user, unlike MR compaction which runs as the table/partition directory owner (see CompactorThread#findUserToRunAs). We should make QB compaction run as the table/partition directory owner and enable user impersonation during compaction to avoid any issues with temp directories. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24024) Improve logging around CompactionTxnHandler
Karen Coppage created HIVE-24024: Summary: Improve logging around CompactionTxnHandler Key: HIVE-24024 URL: https://issues.apache.org/jira/browse/HIVE-24024 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage CompactionTxnHandler often doesn't log the preparedStatement parameters, which is really painful when compaction isn't working the way it should. Also expand logging around compaction Cleaner, Initiator, Worker. And some formatting cleanup. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24023) Hive parquet reader can't read files with length=0
Karen Coppage created HIVE-24023: Summary: Hive parquet reader can't read files with length=0 Key: HIVE-24023 URL: https://issues.apache.org/jira/browse/HIVE-24023 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage Impala truncates insert-only parquet tables by creating a base directory containing a completely empty file. Hive throws an exception upon reading when it looks for metadata: {code:java} Error: java.io.IOException: java.lang.RuntimeException: is not a Parquet file (too small length: 0) (state=,code=0){code} We can introduce a check for an empty file before Hive tries to read the metadata. -- This message was sent by Atlassian Jira (v8.3.4#803005)
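A minimal sketch of the proposed guard (its placement in Hive's Parquet read path is assumed, not shown): a zero-length file cannot contain a Parquet footer, so treat it as containing no rows instead of asking the Parquet reader for metadata.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EmptyParquetFileCheckSketch {
  /** Returns true when the file is completely empty and metadata reading should be skipped. */
  static boolean isZeroLengthFile(FileSystem fs, Path file) throws IOException {
    return fs.getFileStatus(file).getLen() == 0;
  }
}
{code}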
[jira] [Created] (HIVE-24021) Read insert-only tables truncated by Impala correctly
Karen Coppage created HIVE-24021: Summary: Read insert-only tables truncated by Impala correctly Key: HIVE-24021 URL: https://issues.apache.org/jira/browse/HIVE-24021 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage Impala truncates insert-only tables by writing a base directory containing an empty file named "_empty". (Like Hive should, see HIVE-20137) Generally in Hive a file name beginning with an underscore connotes a temporary file that isn't supposed to be read by operations that didn't create it. Before HIVE-23495, getAcidState listed each directory in the table (HdfsUtils#listLocatedStatus) – and filtered out directories with names beginning with an underscore or period as they are presumably temporary. This allowed files called "_empty" to be read, since hive checked the directory name and not the file name. After HIVE-23495, we recursively list each file in the table (AcidUtils#getHdfsDirSnapshots) with a filter that doesn't accept files with names beginning with an underscore or period as they are presumably temporary. As a result Hive reads the table data as if the truncate operation had not happened. Since performance in getAcidState is important, probably the best solution is make an exception in the filter and accept files with the name "_empty". -- This message was sent by Atlassian Jira (v8.3.4#803005)
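A sketch of the filter exception being proposed (the constant name is illustrative, not the actual AcidUtils code): keep the usual hidden-file rule but let Impala's "_empty" marker through.
{code:java}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class AcidFileFilterSketch implements PathFilter {
  private static final String IMPALA_TRUNCATE_MARKER = "_empty"; // file name taken from the description

  @Override
  public boolean accept(Path path) {
    String name = path.getName();
    if (name.equals(IMPALA_TRUNCATE_MARKER)) {
      return true; // exception: a base written by an Impala truncate must be visible
    }
    // Usual rule: names starting with '_' or '.' are presumed temporary and are skipped.
    return !name.startsWith("_") && !name.startsWith(".");
  }
}
{code}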
[jira] [Created] (HIVE-24015) Disable query-based compaction on MR execution engine
Karen Coppage created HIVE-24015: Summary: Disable query-based compaction on MR execution engine Key: HIVE-24015 URL: https://issues.apache.org/jira/browse/HIVE-24015 Project: Hive Issue Type: Task Reporter: Karen Coppage Assignee: Karen Coppage Query-based major compaction can currently be run when the execution engine is MR. This can cause data loss a la HIVE-23703 (the fix for data loss when the execution engine is MR was reverted by HIVE-23763). Query-based minor compaction, in contrast, is only run when the execution engine is Tez; otherwise it falls back to MR (non-query-based) compaction. We should extend this behavior to major compaction as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24001) Don't cache MapWork in tez/ObjectCache during query-based compaction
Karen Coppage created HIVE-24001: Summary: Don't cache MapWork in tez/ObjectCache during query-based compaction Key: HIVE-24001 URL: https://issues.apache.org/jira/browse/HIVE-24001 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage Query-based major compaction can fail intermittently with the following issue: {code:java} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: One writer is supposed to handle only one bucket. We saw these 2 different buckets: 1 and 6 at org.apache.hadoop.hive.ql.udf.generic.GenericUDFValidateAcidSortOrder.evaluate(GenericUDFValidateAcidSortOrder.java:77) {code} This is consistently preceded in the application log with: {code:java} [INFO] [TezChild] |tez.ObjectCache|: Found hive_20200804185133_f04cca69-fa30-4f1b-a5fe-80fc2d749f48_Map 1__MAP_PLAN__ in cache with value: org.apache.hadoop.hive.ql.plan.MapWork@74652101 {code} Alternatively, when MapRecordProcessor doesn't find mapWork in tez/ObjectCache (but instead caches mapWork), major compaction succeeds. The failure happens because, if MapWork is reused, GenericUDFValidateAcidSortOrder (which is called during compaction) is also reused on splits belonging to two different buckets, which produces an error. Solution is to avoid storing MapWork in the ObjectCache during query-based compaction. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23966) Minor query-based compaction always results in delta dirs with minWriteId=1
Karen Coppage created HIVE-23966: Summary: Minor query-based compaction always results in delta dirs with minWriteId=1 Key: HIVE-23966 URL: https://issues.apache.org/jira/browse/HIVE-23966 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage Minor compaction after major/IOW will result in directories that look like: * base_z_v * delta_1_y_v * delete_delta_1_y_v Should be: * base_z_v * delta_(z+1)_y_v * delete_delta_(z+1)_y_v -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23838) KafkaRecordIteratorTest is flaky
Karen Coppage created HIVE-23838: Summary: KafkaRecordIteratorTest is flaky Key: HIVE-23838 URL: https://issues.apache.org/jira/browse/HIVE-23838 Project: Hive Issue Type: Bug Components: kafka integration Affects Versions: 4.0.0 Reporter: Karen Coppage Assignee: Karen Coppage Failed on [4th run of flaky test checker|http://ci.hive.apache.org/job/hive-flaky-check/69/] with org.apache.kafka.common.errors.TimeoutException: Timeout expired after 1milliseconds while awaiting InitProducerId -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23703) Major QB compaction with multiple FileSinkOperators results in data loss and one original file
Karen Coppage created HIVE-23703: Summary: Major QB compaction with multiple FileSinkOperators results in data loss and one original file Key: HIVE-23703 URL: https://issues.apache.org/jira/browse/HIVE-23703 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage h4. Problems Example: {code:java} drop table if exists tbl2; create transactional table tbl2 (a int, b int) clustered by (a) into 4 buckets stored as ORC TBLPROPERTIES('transactional'='true','transactional_properties'='default'); insert into tbl2 values(1,2),(1,3),(1,4),(2,2),(2,3),(2,4); insert into tbl2 values(3,2),(3,3),(3,4),(4,2),(4,3),(4,4); insert into tbl2 values(5,2),(5,3),(5,4),(6,2),(6,3),(6,4);{code} E.g. in the example above, bucketId=0 when a=2 and a=6. 1. Data loss In non-acid tables, an operator's temp files are named with their task id. Because of this snippet, temp files in the FileSinkOperator for compaction tables are identified by their bucket_id. {code:java} if (conf.isCompactionTable()) { fsp.initializeBucketPaths(filesIdx, AcidUtils.BUCKET_PREFIX + String.format(AcidUtils.BUCKET_DIGITS, bucketId), isNativeTable(), isSkewedStoredAsSubDirectories); } else { fsp.initializeBucketPaths(filesIdx, taskId, isNativeTable(), isSkewedStoredAsSubDirectories); } {code} So 2 temp files containing data with a=2 and a=6 will be named bucket_0 and not 00_0 and 00_1 as they would normally. In FileSinkOperator.commit, when data with a=2, filename: bucket_0 is moved from _task_tmp.-ext-10002 to _tmp.-ext-10002, it overwrites the files already there with a=6 data, because it too is named bucket_0. You can see in the logs: {code:java} WARN [LocalJobRunner Map Task Executor #0] exec.FileSinkOperator: Target path file:.../hive/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnNoBuckets-1591107230237/warehouse/testmajorcompaction/base_002_v013/.hive-staging_hive_2020-06-02_07-15-21_771_8551447285061957908-1/_tmp.-ext-10002/bucket_0 with a size 610 exists. Trying to delete it. {code} 2. Results in one original file OrcFileMergeOperator merges the results of the FSOp into 1 file named 00_0. h4. Fix 1. FSOp will store data as: taskid/bucketId. e.g. 0_0/bucket_0 2. OrcMergeFileOp, instead of merging a bunch of files into 1 file named 00_0, will merge all files named bucket_0 into one file named bucket_0, and so on. 3. MoveTask will get rid of the taskId directories if present and only move the bucket files in them, in case OrcMergeFileOp is not run. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23678) Don't enforce ASF license headers on target files
Karen Coppage created HIVE-23678: Summary: Don't enforce ASF license headers on target files Key: HIVE-23678 URL: https://issues.apache.org/jira/browse/HIVE-23678 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23608) Change an FS#exists call to FS#isFile call in AcidUtils
Karen Coppage created HIVE-23608: Summary: Change an FS#exists call to FS#isFile call in AcidUtils Key: HIVE-23608 URL: https://issues.apache.org/jira/browse/HIVE-23608 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage Currently S3AFileSystem#isFile and S3AFileSystem#exists have the same implementation. HADOOP-13230 will optimize S3AFileSystem#isFile by only doing a HEAD request for the file; no need for a LIST probe for a directory (isDir will do that). S3AFileSystem#exists will still need both. This and HIVE-23533 will get rid of the last exists() calls in AcidUtils. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23533) Remove an FS#exists call from AcidUtils#getLogicalLength
Karen Coppage created HIVE-23533: Summary: Remove an FS#exists call from AcidUtils#getLogicalLength Key: HIVE-23533 URL: https://issues.apache.org/jira/browse/HIVE-23533 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage {code:java} Path lengths = OrcAcidUtils.getSideFile(file.getPath()); if(!fs.exists(lengths)) { ... return file.getLen(); } long len = OrcAcidUtils.getLastFlushLength(fs, file.getPath()); {code} OrcAcidUtils.getLastFlushLength also has an exists() check and returns Long.MAX_VALUE if the side file does not exist. exists() is expensive on S3. -- This message was sent by Atlassian Jira (v8.3.4#803005)
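A hedged sketch of the replacement (not the committed patch): since getLastFlushLength already signals a missing side file by returning Long.MAX_VALUE, the separate exists() probe can be dropped and the sentinel used instead.
{code:java}
// Mirrors the snippet above, minus the fs.exists(lengths) round trip.
long len = OrcAcidUtils.getLastFlushLength(fs, file.getPath());
if (len == Long.MAX_VALUE) {
  return file.getLen(); // no side file: same behaviour as the old exists() == false branch
}
{code}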
[jira] [Created] (HIVE-23531) Major CRUD QB compaction failing with ClassCastException when vectorization off
Karen Coppage created HIVE-23531: Summary: Major CRUD QB compaction failing with ClassCastException when vectorization off Key: HIVE-23531 URL: https://issues.apache.org/jira/browse/HIVE-23531 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage Exception: {code:java} 2020-05-22T01:33:09,944 ERROR [TezChild] tez.MapRecordSource: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:403) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.hadoop.io.IntWritable at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:965) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:174) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552) ... 20 more {code} And some more in Tez. Because when vectorization is turned on, primitives in the row are wrapped in Writables by VectorFileSinkOperator; when it is off, they are not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23506) Move getAcidVersionFrom...File utility methods to TestTxnCommands
Karen Coppage created HIVE-23506: Summary: Move getAcidVersionFrom...File utility methods to TestTxnCommands Key: HIVE-23506 URL: https://issues.apache.org/jira/browse/HIVE-23506 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage They're only used in tests, and since they perform expensive file accesses, it's best to remove the temptation to use them elsewhere. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23492) Remove unnecessary FileSystem#exists calls from ql module
Karen Coppage created HIVE-23492: Summary: Remove unnecessary FileSystem#exists calls from ql module Key: HIVE-23492 URL: https://issues.apache.org/jira/browse/HIVE-23492 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage Wherever there is an exists() call before open() or delete(), remove it and infer from the FileNotFoundException raised by open/delete that the file does not exist. exists() itself just checks for a FileNotFoundException, so the extra call is a waste of time, especially on clunkier filesystems such as object stores. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23405) Bump org.scala-lang:scala-library version to 2.11.12
Karen Coppage created HIVE-23405: Summary: Bump org.scala-lang:scala-library version to 2.11.12 Key: HIVE-23405 URL: https://issues.apache.org/jira/browse/HIVE-23405 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage 2.11.8 is not secure. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23344) Remove org.scala-lang:scala-compiler 2.11.0 transitive dependency
Karen Coppage created HIVE-23344: Summary: Remove org.scala-lang:scala-compiler 2.11.0 transitive dependency Key: HIVE-23344 URL: https://issues.apache.org/jira/browse/HIVE-23344 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage Several modules pull in org.scala-lang:scala-compiler:jar:2.11.0:compile transitively via org.apache.spark:spark-core_2.11:jar:2.3.0:compile -> org.json4s:json4s-jackson_2.11:jar:3.2.11:compile -> org.json4s:json4s-core_2.11:jar:3.2.11:compile -> org.scala-lang:scalap:jar:2.11.0:compile -> org.scala-lang:scala-compiler:jar:2.11.0:compile. scala-compiler 2.11.0 has a known security vulnerability. The fix is to exclude org.json4s:json4s-jackson_2.11:jar:3.2.11 from the spark-core_2.11 dependencies. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23338) Bump jackson-databind version up to 2.9.10.4
Karen Coppage created HIVE-23338: Summary: Bump jackson-databind version up to 2.9.10.4 Key: HIVE-23338 URL: https://issues.apache.org/jira/browse/HIVE-23338 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage com.fasterxml.jackson.core:jackson-databind:2.9.9 has known exploitable vulnerabilities. Also exclude the transitive dependency on com.fasterxml.jackson.core:jackson-databind:2.6.5, which is exploitable as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23284) Remove dependency on mariadb-java-client
Karen Coppage created HIVE-23284: Summary: Remove dependency on mariadb-java-client Key: HIVE-23284 URL: https://issues.apache.org/jira/browse/HIVE-23284 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage It is licensed under the GNU Lesser General Public License, and its single use in the codebase is easily replaceable with a string. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23280) Trigger compaction with old aborted txns
Karen Coppage created HIVE-23280: Summary: Trigger compaction with old aborted txns Key: HIVE-23280 URL: https://issues.apache.org/jira/browse/HIVE-23280 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage When a txn is aborted and the compaction threshold for the number of aborted txns is not reached, the aborted transaction can remain in the backend RDBMS forever. This can cause several serious performance degradations: - getOpenTxns has to list this aborted txn forever - the TXN_TO_WRITE_ID table is never cleaned We should add a time-based threshold, so that after a given age compaction is started anyway (see the sketch below). -- This message was sent by Atlassian Jira (v8.3.4#803005)
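A hedged sketch of the age-based trigger proposed in HIVE-23280 above; every name here (thresholds, helper methods) is invented for illustration and not taken from the Hive codebase.
{code:java}
// Sketch only: in the Initiator, also request compaction when the oldest
// aborted txn has been sitting around longer than a configured age,
// even if the aborted-txn count threshold was never reached.
long abortedAgeMs = System.currentTimeMillis() - oldestAbortedTxnStartTimeMs;
boolean enoughAbortedTxns = numAbortedTxns >= abortedTxnCountThreshold;
boolean abortedTxnsTooOld = abortedAgeMs > abortedTxnTimeThresholdMs;
if (enoughAbortedTxns || abortedTxnsTooOld) {
  requestCompaction(table, partition); // hypothetical helper
}
{code}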
[jira] [Created] (HIVE-23109) Query-based compaction omits database
Karen Coppage created HIVE-23109: Summary: Query-based compaction omits database Key: HIVE-23109 URL: https://issues.apache.org/jira/browse/HIVE-23109 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage E.g. the MM major compaction query looks like:
{code:java}
insert into tmp_table select * from src_table;
{code}
It should be:
{code:java}
insert into tmp_table select * from src_db.src_table;
{code}
As a result, compaction fails if the source table's database isn't default. -- This message was sent by Atlassian Jira (v8.3.4#803005)
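A minimal sketch of the kind of fix HIVE-23109 above implies when the compaction query is assembled in Java; the variable names are assumptions, not the actual compactor code.
{code:java}
// Sketch only: qualify the source table with its database when building the
// compaction query, instead of interpolating the bare table name.
String qualifiedSource = sourceDbName + "." + sourceTableName; // e.g. "src_db.src_table"
String query = "insert into " + tmpTableName + " select * from " + qualifiedSource;
{code}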
[jira] [Created] (HIVE-23072) Can't select from table after minor compaction of MM table with original files
Karen Coppage created HIVE-23072: Summary: Can't select from table after minor compaction of MM table with original files Key: HIVE-23072 URL: https://issues.apache.org/jira/browse/HIVE-23072 Project: Hive Issue Type: Bug Reporter: Karen Coppage I'm assuming minor compaction shouldn't be run on tables containing original files (==files outside a delta/base directory) in general because minor compaction requires that the table have >1 delta directory (see Worker#isEnoughToCompact). However, it is possible under the following circumstance: 1. Create non-transactional table, stored as e.g. textfile or parquet (orc doesn't have this problem) 2. Run a couple inserts -> creates original files 3. Alter table, make insert-only. 4. Run a couple inserts -> creates delta dirs 5. Run minor compaction. The attached unit test recreates these steps and results in the error below [1]. Side notes: If the table was insert-only from the beginning (no original files), then no problem, the split examined/returned is: {code:java} file:/Users/karencoppage/upstream/hive/itests/hive-unit/target/tmp/org.apache.hadoop.hive.ql.txn.compactor.TestCompactor-1585083695675_-875862010/warehouse/mm_nonpart/delta_001_005_v010/00_0 {code} I tried playing around with mapreduce.input.fileinputformat.input.dir.recursive without success since it messes with the ability to ignore delta dirs that shouldn't be read. [1] {code:java} java.io.IOException: java.io.IOException: Not a file: file:/Users/karencoppage/upstream/hive/itests/hive-unit/target/tmp/org.apache.hadoop.hive.ql.txn.compactor.TestCompactor-1585081294879_556631282/warehouse/mm_nonpart/delta_001_003_v011 at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:638) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:880) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:241) at org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.verifyFooBarResult(TestCompactor.java:1153) at org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.mmTableOriginalsMinor(TestCompactor.java:941) at org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.mmTableOriginalsText(TestCompactor.java:838) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.apache.hive.common.util.Retry$RetryingStatement.evaluate(Retry.java:61) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) Caused by: java.io.IOException: Not a file: file:/Users/karencoppage/upstream/hive/itests/hive-unit/target/tmp/org.apache.hadoop.hive.ql.txn.compactor.TestCompactor-1585081294879_556631282/warehouse/mm_nonpart/delta_001_003_v011 at
[jira] [Created] (HIVE-23023) MR compaction ignores column schema evolution
Karen Coppage created HIVE-23023: Summary: MR compaction ignores column schema evolution Key: HIVE-23023 URL: https://issues.apache.org/jira/browse/HIVE-23023 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage Attachments: HIVE-23023.01.patch Repro:
{code:java}
create table compaction_error(i int) partitioned by (`part1` string) stored as orc TBLPROPERTIES ('transactional'='true');
insert into table compaction_error values (2, 'aa');
ALTER TABLE compaction_error ADD COLUMNS (newcol string);
update compaction_error set newcol='new' where i=2;
alter table compaction_error partition (part1='aa') compact 'minor'; --or major
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22985) Failed compaction always throws TxnAbortedException
Karen Coppage created HIVE-22985: Summary: Failed compaction always throws TxnAbortedException Key: HIVE-22985 URL: https://issues.apache.org/jira/browse/HIVE-22985 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage If compaction fails, its txn is aborted; however, Worker attempts to commit it again in a finally block. This results in a TxnAbortedException [1] thrown from TxnHandler#commitTxn. We need to add a check and only try to commit at the end if the txn is not aborted. (TxnHandler#commitTxn does nothing if the txn is already committed.)
[1]
{code:java}
ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler - TxnAbortedException(message:Transaction txnid:16 already aborted)
 at org.apache.hadoop.hive.metastore.txn.TxnHandler.raiseTxnUnexpectedState(TxnHandler.java:4843)
 at org.apache.hadoop.hive.metastore.txn.TxnHandler.commitTxn(TxnHandler.java:1141)
 at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.commit_txn(HiveMetaStore.java:8101)
 ...
 at org.apache.hadoop.hive.ql.txn.compactor.Worker.commitTxn(Worker.java:291)
 at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:269)
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
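A hedged sketch of the check proposed in HIVE-22985 above; the flag and method names are invented for illustration.
{code:java}
// Sketch only: skip the commit in the finally block when the compaction
// txn was already aborted after a failure.
try {
  runCompaction();          // hypothetical; may abort compactorTxnId on failure
} finally {
  if (!compactionTxnAborted) {
    commitTxn(compactorTxnId);
  }
}
{code}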
[jira] [Created] (HIVE-22981) DataFileReader is not closed in AvroGenericRecordReader#extractWriterTimezoneFromMetadata
Karen Coppage created HIVE-22981: Summary: DataFileReader is not closed in AvroGenericRecordReader#extractWriterTimezoneFromMetadata Key: HIVE-22981 URL: https://issues.apache.org/jira/browse/HIVE-22981 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage The method looks like:
{code}
private ZoneId extractWriterTimezoneFromMetadata(JobConf job, FileSplit split,
    GenericDatumReader gdr) throws IOException {
  if (job == null || gdr == null || split == null || split.getPath() == null) {
    return null;
  }
  try {
    DataFileReader dataFileReader = new DataFileReader(new FsInput(split.getPath(), job), gdr);
    [...return...]
  } catch (IOException e) {
    // Can't access metadata, carry on.
  }
  return null;
}
{code}
The DataFileReader is never closed, which can cause a memory leak. We need a try-with-resources here. -- This message was sent by Atlassian Jira (v8.3.4#803005)
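A hedged sketch of the try-with-resources fix suggested in HIVE-22981 above; the body of the try block is elided just as in the original snippet.
{code}
// Sketch only: the reader is now closed on every path, including the early
// returns and the swallowed IOException.
try (DataFileReader dataFileReader = new DataFileReader(new FsInput(split.getPath(), job), gdr)) {
  // [...read the writer time zone from the file metadata and return it...]
} catch (IOException e) {
  // Can't access metadata, carry on.
}
return null;
{code}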
[jira] [Created] (HIVE-22971) Eliminate file rename in insert-only compactor
Karen Coppage created HIVE-22971: Summary: Eliminate file rename in insert-only compactor Key: HIVE-22971 URL: https://issues.apache.org/jira/browse/HIVE-22971 Project: Hive Issue Type: Improvement Reporter: Karen Coppage File rename is expensive for object stores, so MM (insert-only) compaction should skip that step when committing and write directly to base_x_cZ or delta_x_y_cZ. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22968) Set hive.parquet.timestamp.time.unit default to micros
Karen Coppage created HIVE-22968: Summary: Set hive.parquet.timestamp.time.unit default to micros Key: HIVE-22968 URL: https://issues.apache.org/jira/browse/HIVE-22968 Project: Hive Issue Type: Task Reporter: Karen Coppage Assignee: Karen Coppage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22902) Incorrect Spark/SparkOnYarn result for auto_sortmerge_join_16.q
Karen Coppage created HIVE-22902: Summary: Incorrect Spark/SparkOnYarn result for auto_sortmerge_join_16.q Key: HIVE-22902 URL: https://issues.apache.org/jira/browse/HIVE-22902 Project: Hive Issue Type: Bug Reporter: Karen Coppage In files {code:java} auto_sortmerge_join_16.q.out_spark [TestMiniSparkOnYarnCliDriver] auto_sortmerge_join_16.q.out [TestSparkCliDriver] {code} at the first run of: {code:java} select a.key , a.value , b.value , 'day1' as day, 1 as pri from ( select key, value from bucket_big_n17 where day='day1' ) a left outer join ( select key, value from bucket_small_n17 where pri between 1 and 2 ) b on (a.key = b.key) {code} the output is (beginning line 444): {code:java} 0 val_0 val_0 day11 0 val_0 val_0 day11 0 val_0 val_0 day11 0 val_0 val_0 day11 0 val_0 val_0 day11 0 val_0 val_0 day11 103 val_103 val_103 day11 103 val_103 val_103 day11 103 val_103 val_103 day11 103 val_103 val_103 day11 169 val_169 NULLday11 172 val_172 val_172 day11 172 val_172 val_172 day11 172 val_172 val_172 day11 172 val_172 val_172 day11 374 val_374 NULLday11 {code} Result should not include NULLs. It should match llap/auto_sortmerge_join_16.q.out, beginning line 461: {code:java} 0 val_0 val_0 day11 0 val_0 val_0 day11 0 val_0 val_0 day11 0 val_0 val_0 day11 0 val_0 val_0 day11 0 val_0 val_0 day11 103 val_103 val_103 day11 103 val_103 val_103 day11 103 val_103 val_103 day11 103 val_103 val_103 day11 169 val_169 val_169 day11 169 val_169 val_169 day11 169 val_169 val_169 day11 169 val_169 val_169 day11 169 val_169 val_169 day11 169 val_169 val_169 day11 169 val_169 val_169 day11 169 val_169 val_169 day11 172 val_172 val_172 day11 172 val_172 val_172 day11 172 val_172 val_172 day11 172 val_172 val_172 day11 374 val_374 val_374 day11 374 val_374 val_374 day11 {code} Looks like this was changed in HIVE-20915. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22875) Refactor query creation in QueryCompactor implementations
Karen Coppage created HIVE-22875: Summary: Refactor query creation in QueryCompactor implementations Key: HIVE-22875 URL: https://issues.apache.org/jira/browse/HIVE-22875 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage There is a lot of repetition in how the creation/compaction/drop queries are built in MajorQueryCompactor, MinorQueryCompactor, MmMajorQueryCompactor and MmMinorQueryCompactor. The initial idea is to create a CompactionQueryBuilder that all four implementations would use (see the sketch below). -- This message was sent by Atlassian Jira (v8.3.4#803005)
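A hypothetical sketch of what such a builder's call site could look like; every method and enum name here is invented to illustrate the idea, not taken from an actual patch.
{code:java}
// Sketch only: each QueryCompactor describes what it needs and lets the
// builder assemble the CREATE/INSERT/DROP statements.
String createQuery = new CompactionQueryBuilder()
    .setOperation(CompactionQueryBuilder.Operation.CREATE)   // or INSERT, DROP
    .setSourceTable("src_db.src_table")
    .setResultTable("src_db.src_table_tmp_compactor")
    .setMinorCompaction(false)
    .setInsertOnly(true)
    .build();
{code}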
[jira] [Created] (HIVE-22863) Commit compaction txn if it is opened but compaction is skipped
Karen Coppage created HIVE-22863: Summary: Commit compaction txn if it is opened but compaction is skipped Key: HIVE-22863 URL: https://issues.apache.org/jira/browse/HIVE-22863 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage Currently, if a table does not have enough directories to compact, compaction is skipped and the compaction is either (a) marked ready for cleaning or (b) marked compacted. However, the txn the compaction runs in is never committed; it remains open, so TXNS and TXN_COMPONENTS will never be cleared of information about the attempted compaction. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22862) Remove unnecessary calls to isEnoughToCompact
Karen Coppage created HIVE-22862: Summary: Remove unnecessary calls to isEnoughToCompact Key: HIVE-22862 URL: https://issues.apache.org/jira/browse/HIVE-22862 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage QueryCompactor.Util#isEnoughToCompact is called once in Worker#run before any sort of compaction is run; after this it is unnecessarily called in 3 other places during compaction. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
Karen Coppage created HIVE-22826: Summary: ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names Key: HIVE-22826 URL: https://issues.apache.org/jira/browse/HIVE-22826 Project: Hive Issue Type: Bug Reporter: Karen Coppage Attachments: unitTest.patch Compaction fails for tables where a bucketed column has been renamed: the list of bucketed columns in the StorageDescriptor doesn't get updated when the column is renamed, so we can't recreate the table correctly during compaction. Attached is a unit test that fails. NO PRECOMMIT TESTS -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22762) Leap day is incorrectly parsed during cast in Hive
Karen Coppage created HIVE-22762: Summary: Leap day is incorrectly parsed during cast in Hive Key: HIVE-22762 URL: https://issues.apache.org/jira/browse/HIVE-22762 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Karen Coppage Assignee: Karen Coppage Fix For: 4.0.0 When casting a string to a date with a custom date format where the day token comes before the year and month tokens, the date is parsed incorrectly for leap days. h3. How to reproduce Execute {code}select cast("29 02 0" as date format "dd mm rr"){code} with Hive. The query incorrectly returns *2020-02-28*. Executing another cast with a slightly modified representation of the date (day preceded by year and month) is, however, parsed correctly: {code}select cast("0 02 29" as date format "rr mm dd"){code} It returns *2020-02-29*. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22763) 0 is accepted in 12-hour format during timestamp cast
Karen Coppage created HIVE-22763: Summary: 0 is accepted in 12-hour format during timestamp cast Key: HIVE-22763 URL: https://issues.apache.org/jira/browse/HIVE-22763 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Karen Coppage Assignee: Karen Coppage Fix For: 4.0.0 A timestamp string in 12-hour format is parsed even if the hour is 0; however, based on the [design document|https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/edit], it should be rejected. h3. How to reproduce Run {code}select cast("2020-01-01 0 am 00" as timestamp format "yyyy-mm-dd hh12 p.m. ss"){code} It shouldn't be parsed, as the hour component is 0. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22646) CTASing a dynamically partitioned MM table results in unreadable table
Karen Coppage created HIVE-22646: Summary: CTASing a dynamically partitioned MM table results in unreadable table Key: HIVE-22646 URL: https://issues.apache.org/jira/browse/HIVE-22646 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage Repro steps: {code:java} create table plain (i int, j int, s string); insert into plain values (1,1,'1'); create table ctas partitioned by (s) tblproperties ('transactional'='true', 'transactional_properties' = 'insert_only') as select * from plain; select * from ctas; {code} We get this error: {code:java} Error: java.io.IOException: java.io.IOException: Not a file: file:/Users/karencoppage/data/upstream/warehouse/ctas/s=1/delta_002_002_/delta_002_002_ (state=,code=0){code} This also happens when CTASing from a dynamically partitioned table. As seen in the error message, the issue is that a new delta directory is created in the temp directory, and during MoveTask another delta dir is unnecessarily created, then the first delta dir is moved into the second. The table is unreadable since a file and not another delta dir is expected in the top delta dir. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22610) Minor compaction for MM (insert-only) tables
Karen Coppage created HIVE-22610: Summary: Minor compaction for MM (insert-only) tables Key: HIVE-22610 URL: https://issues.apache.org/jira/browse/HIVE-22610 Project: Hive Issue Type: Sub-task Reporter: Karen Coppage Assignee: Karen Coppage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22593) Dynamically partitioned MM (insert-only ACID) table isn't compacting automatically
Karen Coppage created HIVE-22593: Summary: Dynamically partitioned MM (insert-only ACID) table isn't compacting automatically Key: HIVE-22593 URL: https://issues.apache.org/jira/browse/HIVE-22593 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage Dynamic partitions of MM tables aren't entered into the HMS table TXN_COMPONENTS. On inserting into such tables, we see this line in the HMS log: {code:java} Expected to move at least one record from txn_components to completed_txn_components when committing txn!{code} (This is not the case for non-partitioned MM tables.) Since the partitions aren't entered into COMPLETED_TXN_COMPONENTS, they aren't considered for automatic compaction. The culprit is probably org.apache.hadoop.hive.ql.metadata.Hive#loadDynamicPartitions, which has an isAcid parameter that is always false for MM tables; it also doesn't help that MM tables' "write type" is AcidUtils.Operation.NOT_ACID rather than INSERT. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22592) Remove redundant calls to AcidUtils#getAcidState in Worker and CompactorMR
Karen Coppage created HIVE-22592: Summary: Remove redundant calls to AcidUtils#getAcidState in Worker and CompactorMR Key: HIVE-22592 URL: https://issues.apache.org/jira/browse/HIVE-22592 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0 Reporter: Karen Coppage Assignee: Karen Coppage AcidUtils#getAcidState is called in the Worker before CompactorMR#run and again inside CompactorMR#run. Since it's costly to call, we can pass the value as an argument to CompactorMR#run. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22483) Vectorize UDF datetime_legacy_hybrid_calendar
Karen Coppage created HIVE-22483: Summary: Vectorize UDF datetime_legacy_hybrid_calendar Key: HIVE-22483 URL: https://issues.apache.org/jira/browse/HIVE-22483 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22422) Missing documentation from HiveSqlDateTimeFormatter: list of date-based patterns
Karen Coppage created HIVE-22422: Summary: Missing documentation from HiveSqlDateTimeFormatter: list of date-based patterns Key: HIVE-22422 URL: https://issues.apache.org/jira/browse/HIVE-22422 Project: Hive Issue Type: Task Components: Documentation Affects Versions: 4.0.0 Reporter: Karen Coppage Assignee: Karen Coppage Attachments: HIVE-22422.01.patch Documentation referenced a "List of Date-Based Patterns" but didn't include the list anywhere. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22416) MR-related operation logs missing when parallel execution is enabled
Karen Coppage created HIVE-22416: Summary: MR-related operation logs missing when parallel execution is enabled Key: HIVE-22416 URL: https://issues.apache.org/jira/browse/HIVE-22416 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage Repro steps: 1. Happy path, parallel execution disabled {code:java} 0: jdbc:hive2://localhost:1> set hive.exec.parallel=false; No rows affected (0.023 seconds) 0: jdbc:hive2://localhost:1> select count (*) from t1; INFO : Compiling command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d): select count (*) from t1 INFO : Semantic Analysis Completed INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, type:bigint, comment:null)], properties:null) INFO : Completed compiling command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d); Time taken: 0.309 seconds INFO : Executing command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d): select count (*) from t1 WARN : INFO : Query ID = karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d INFO : Total jobs = 1 INFO : Launching Job 1 out of 1 INFO : Starting task [Stage-1:MAPRED] in serial mode INFO : Number of reduce tasks determined at compile time: 1 INFO : In order to change the average load for a reducer (in bytes): INFO : set hive.exec.reducers.bytes.per.reducer= INFO : In order to limit the maximum number of reducers: INFO : set hive.exec.reducers.max= INFO : In order to set a constant number of reducers: INFO : set mapreduce.job.reduces= DEBUG : Configuring job job_local495362389_0008 with file:/tmp/hadoop/mapred/staging/karencoppage495362389/.staging/job_local495362389_0008 as the submit dir DEBUG : adding the following namenodes' delegation tokens:[file:///] DEBUG : Creating splits at file:/tmp/hadoop/mapred/staging/karencoppage495362389/.staging/job_local495362389_0008 INFO : number of splits:0 INFO : Submitting tokens for job: job_local495362389_0008 INFO : Executing with tokens: [] INFO : The url to track the job: http://localhost:8080/ INFO : Job running in-process (local Hadoop) INFO : 2019-10-28 15:26:22,537 Stage-1 map = 0%, reduce = 100% INFO : Ended Job = job_local495362389_0008 INFO : MapReduce Jobs Launched: INFO : Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS INFO : Total MapReduce CPU Time Spent: 0 msec INFO : Completed executing command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d); Time taken: 6.497 seconds INFO : OK DEBUG : Shutting down query select count (*) from t1 +-+ | c0 | +-+ | 0 | +-+ 1 row selected (11.874 seconds) {code} 2. 
Faulty path, parallel execution enabled {code:java} 0: jdbc:hive2://localhost:1> set hive.server2.logging.operation.level=EXECUTION; No rows affected (0.236 seconds) 0: jdbc:hive2://localhost:1> set hive.exec.parallel=true; No rows affected (0.01 seconds) 0: jdbc:hive2://localhost:1> select count (*) from t1; INFO : Compiling command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77): select count (*) from t1 INFO : Semantic Analysis Completed INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, type:bigint, comment:null)], properties:null) INFO : Completed compiling command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77); Time taken: 4.707 seconds INFO : Executing command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77): select count (*) from t1 WARN : INFO : Query ID = karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77 INFO : Total jobs = 1 INFO : Launching Job 1 out of 1 INFO : Starting task [Stage-1:MAPRED] in parallel INFO : MapReduce Jobs Launched: INFO : Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS INFO : Total MapReduce CPU Time Spent: 0 msec INFO : Completed executing command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77); Time taken: 44.577 seconds INFO : OK DEBUG : Shutting down query select count (*) from t1 +-+ | c0 | +-+ | 0 | +-+ 1 row selected (54.665 seconds) {code} The issue is that Log4J stores the session ID and query ID in some atomic thread metadata (org.apache.logging.log4j.ThreadContext.getImmutableContext()). If the queryId is missing from this metadata, then the RoutingAppender (which is defined programmatically in LogDivertAppender) will route the log to a NullAppender, which logs nothing. If the queryId is present, then the RoutingAppender routes the event to the "query-appender" logger, which will log the line in the operation log/Beeline. This is not happening in a multi-threaded context since new threads created for parallel query execution do not have the queryId/sessionId
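A hedged sketch of one possible remedy for HIVE-22416 above (not necessarily the actual fix): propagate the parent thread's org.apache.logging.log4j.ThreadContext map, which carries the queryId and sessionId, into the threads spawned for parallel execution so the RoutingAppender routes their events to the operation log instead of the NullAppender.
{code:java}
// Sketch only: wrap the task so the child thread inherits the Log4j
// ThreadContext (queryId/sessionId) of the thread that submitted it.
Runnable withOperationLogContext(final Runnable task) {
  final java.util.Map<String, String> parentContext = ThreadContext.getImmutableContext();
  return () -> {
    ThreadContext.putAll(parentContext); // make queryId visible to the RoutingAppender
    try {
      task.run();
    } finally {
      ThreadContext.clearMap();
    }
  };
}
{code}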
[jira] [Created] (HIVE-22330) Maximize smallBuffer usage in BytesColumnVector
Karen Coppage created HIVE-22330: Summary: Maximize smallBuffer usage in BytesColumnVector Key: HIVE-22330 URL: https://issues.apache.org/jira/browse/HIVE-22330 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0 Reporter: Karen Coppage Assignee: Karen Coppage When BytesColumnVector is populated with values, it normally creates a new byte[] buffer to hold them, but if the value is <= 1 MB, it reuses a single shared "smallBuffer" instead of allocating a new buffer. Every time the smallBuffer is too small for the data we want to store there, its size is doubled; once the size grows past 1 GB (Integer.MAX_VALUE / 2), the next doubling overflows and an error is thrown. A quick fix here is to set the smallBuffer size to Integer.MAX_VALUE in this case (see the sketch below). -- This message was sent by Atlassian Jira (v8.3.4#803005)
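A hedged sketch of the capping logic described in HIVE-22330 above; the variable names are invented and this is not the actual BytesColumnVector code.
{code:java}
// Sketch only: keep doubling the smallBuffer, but once doubling would
// overflow an int, jump straight to Integer.MAX_VALUE instead of wrapping
// around to a negative size.
int newSize = currentSmallBufferSize;
while (newSize < requiredSize) {
  if (newSize > Integer.MAX_VALUE / 2) {
    newSize = Integer.MAX_VALUE; // cap instead of overflowing
    break;
  }
  newSize *= 2;
}
{code}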
[jira] [Created] (HIVE-21888) Set hive.parquet.timestamp.skip.conversion default to true
Karen Coppage created HIVE-21888: Summary: Set hive.parquet.timestamp.skip.conversion default to true Key: HIVE-21888 URL: https://issues.apache.org/jira/browse/HIVE-21888 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21868) Vectorize CAST...FORMAT
Karen Coppage created HIVE-21868: Summary: Vectorize CAST...FORMAT Key: HIVE-21868 URL: https://issues.apache.org/jira/browse/HIVE-21868 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage Introduce FORMAT clause to CAST statements as well as the below limited list of SQL:2016 datetime formats to Hive in general. * YYYY * MM * DD * HH * MI * SS * YYY * YY * Y * RRRR * RR * DDD * HH12 * HH24 * S * FF[1..9] * AM/A.M. * PM/P.M. * TZH * TZM Definitions of these formats here: [https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/|https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/edit] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21580) Introduce ISO 8601 week numbering SQL:2016 formats
Karen Coppage created HIVE-21580: Summary: Introduce ISO 8601 week numbering SQL:2016 formats Key: HIVE-21580 URL: https://issues.apache.org/jira/browse/HIVE-21580 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Enable Hive to parse the following datetime formats when any combination/subset of these or previously implemented patterns is provided in one string. Also catch combinations that conflict. * IYYY * IYY * IY * I * IW [https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/edit] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21579) Introduce more complex SQL:2016 datetime formats
Karen Coppage created HIVE-21579: Summary: Introduce more complex SQL:2016 datetime formats Key: HIVE-21579 URL: https://issues.apache.org/jira/browse/HIVE-21579 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Enable Hive to parse the following datetime formats when any combination/subset of these or previously implemented patterns is provided in one string. Also catch combinations that conflict. * MONTH * MON * D * DAY * DY * Q * WW * W [https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/edit] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21578) Introduce SQL:2016 formats FM, FX, and nested strings
Karen Coppage created HIVE-21578: Summary: Introduce SQL:2016 formats FM, FX, and nested strings Key: HIVE-21578 URL: https://issues.apache.org/jira/browse/HIVE-21578 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Enable Hive to parse the following datetime formats when any combination or subset of these or previously implemented formats is provided in one string. * "text" (nested strings) * FM * FX [Definitions here|https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/edit] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21577) Introduce remaining basic SQL:2016 formats
Karen Coppage created HIVE-21577: Summary: Introduce remaining basic SQL:2016 formats Key: HIVE-21577 URL: https://issues.apache.org/jira/browse/HIVE-21577 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Introduce the following SQL:2016 formats to Hive: * YYY * YY * Y * RRRR * RR * DDD * HH12 * HH24 * S * FF[1..9] * AM/A.M. * PM/P.M. * TZH * TZM -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21576) Introduce CAST...FORMAT and limited list of SQL:2016 datetime formats
Karen Coppage created HIVE-21576: Summary: Introduce CAST...FORMAT and limited list of SQL:2016 datetime formats Key: HIVE-21576 URL: https://issues.apache.org/jira/browse/HIVE-21576 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Introduce FORMAT clause to CAST statements as well as the below limited list of SQL:2016 datetime formats to Hive in general. These can be used if a session-level feature flag is turned on. * YYYY * MM * DD * HH * MI * SS Definitions of these formats here: [https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/|https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/edit] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21575) Add support for SQL:2016 datetime templates/patterns/masks and CAST(... AS ... FORMAT )
Karen Coppage created HIVE-21575: Summary: Add support for SQL:2016 datetime templates/patterns/masks and CAST(... AS ... FORMAT ) Key: HIVE-21575 URL: https://issues.apache.org/jira/browse/HIVE-21575 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage *Summary* Timestamp and date handling and formatting are currently implemented in Hive using (sometimes very specific) [Java SimpleDateFormat patterns|http://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html]; however, this is not what most standard SQL systems use. For example, see [Vertica|https://my.vertica.com/docs/7.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TemplatePatternsForDateTimeFormatting.htm], [Netezza|http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_ntz_sql_extns_templ_patterns_date_time_conv.html], [Oracle|https://docs.oracle.com/database/121/SQLRF/sql_elements004.htm#SQLRF00212], and [PostgreSQL|https://www.postgresql.org/docs/9.5/static/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME-TABLE]. *Cast...Format* SQL:2016 also introduced the FORMAT clause for CAST, which is the standard way to do string <-> datetime conversions. For example:
{code:java}
CAST(<datetime> AS <char type> [FORMAT <template>])
CAST(<char> AS <datetime type> [FORMAT <template>])
cast(dt as string format 'DD-MM-YYYY')
cast('01-05-2017' as date format 'DD-MM-YYYY')
{code}
[Stuff like this|http://bigdataprogrammers.com/string-date-conversion-hive/] wouldn't need to happen. *New SQL:2016 Patterns* Examples:
{code:java}
--old SimpleDateFormat
select date_format('2015-05-15 12:00:00', 'MMM dd, HH:mm:ss');
--new SQL:2016 format
select date_format('2015-05-15 12:00:00', 'mon dd, hh:mi:ss');
{code}
Some other conflicting examples: SimpleDateFormat: 'MMM dd, HH:mm:ss' vs. SQL:2016: 'mon dd, hh:mi:ss'; SimpleDateFormat: 'yyyy-MM-dd HH:mm:ss' vs. SQL:2016: 'yyyy-mm-dd hh24:mi:ss'. We will have a session-level feature flag to revert to the legacy Java SimpleDateFormat patterns. This would allow users to choose the behavior they desire and scope it to a session if need be. For the full list of patterns, see subsection "Proposal for Impala’s datetime patterns" in this doc: [https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/edit] *Existing Hive functions affected* Other functions use SimpleDateFormat internally; these are the ones afaik where SimpleDateFormat or some similar format is part of the input: * from_unixtime(bigint unixtime[, string format]) * unix_timestamp(string date, string pattern) * to_unix_timestamp(date[, pattern]) * add_months(string start_date, int num_months, output_date_format) * trunc(string date, string format) - currently only supports 'MONTH'/'MON'/'MM', 'QUARTER'/'Q' and 'YEAR'/'YYYY'/'YY' as format. * date_format(date/timestamp/string ts, string fmt) This description is a heavily edited version of the IMPALA-4018 description. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21525) [cosmetic] reformat code in NanoTimeUtils.java
Karen Coppage created HIVE-21525: Summary: [cosmetic] reformat code in NanoTimeUtils.java Key: HIVE-21525 URL: https://issues.apache.org/jira/browse/HIVE-21525 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage Fix For: 4.0.0 indentation is off by 1 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21333) [trivial] Fix argument order in TestDateWritableV2#setupDateStrings
Karen Coppage created HIVE-21333: Summary: [trivial] Fix argument order in TestDateWritableV2#setupDateStrings Key: HIVE-21333 URL: https://issues.apache.org/jira/browse/HIVE-21333 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0 Reporter: Karen Coppage Assignee: Karen Coppage Fix For: 4.0.0 Calendar#add(int field, int amount) is given the parameters (1, Calendar.DAY_OF_YEAR), which I presume is backwards (it should presumably be (Calendar.DAY_OF_YEAR, 1)), especially since this method is called 365 times. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21234) Disable negative timestamps
Karen Coppage created HIVE-21234: Summary: Disable negative timestamps Key: HIVE-21234 URL: https://issues.apache.org/jira/browse/HIVE-21234 Project: Hive Issue Type: Improvement Components: Hive Reporter: Karen Coppage Our Wiki specifies a range for DATE, but not for TIMESTAMP (well, there's a specified format () but no explicitly specified range). [1] TIMESTAMP used to have inner representation of java.sql.Timestamp which couldn't handle timestamps outside of the range of years -. ( converted to 0001) Since the inner representation was changed to LocalDateTime (HIVE-20007), negative timestamps overflow because of a formatting error. I propose simply disabling negative timestamps. No data is much better than bad data. See [2] for more details. [1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-TimestampstimestampTimestamps [2] https://docs.google.com/document/d/1y-GcyzzALXM2AJB3bFuyTAEq5fq6p41gu5eH1pF8I7o/edit?usp=sharing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21215) Read Parquet INT64 timestamp
Karen Coppage created HIVE-21215: Summary: Read Parquet INT64 timestamp Key: HIVE-21215 URL: https://issues.apache.org/jira/browse/HIVE-21215 Project: Hive Issue Type: New Feature Reporter: Karen Coppage Assignee: Marta Kuczora [WIP] This patch enables Hive to start reading timestamps from Parquet written with the new semantics: With Parquet version 1.11, a new timestamp LogicalType with base INT64 and the following metadata is introduced: * boolean isAdjustedToUtc: marks whether the timestamp is converted to UTC (aka Instant semantics) or not (LocalDateTime semantics). * enum TimeUnit (NANOS, MICROS, MILLIS): granularity of timestamp Upon reading, the semantics of these new timestamps will be determined by their metadata, while the semantics of INT96 timestamps will continue to be deduced from the writer metadata. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21216) Write Parquet INT64 timestamp
Karen Coppage created HIVE-21216: Summary: Write Parquet INT64 timestamp Key: HIVE-21216 URL: https://issues.apache.org/jira/browse/HIVE-21216 Project: Hive Issue Type: New Feature Components: Hive Reporter: Karen Coppage Assignee: Karen Coppage [WIP] This patch enables Hive to start writing int64 timestamps in Parquet. With Parquet version 1.11, a new timestamp LogicalType with base INT64 and the following metadata is introduced: boolean isAdjustedToUtc: marks whether the timestamp is converted to UTC (aka Instant semantics) or not (LocalDateTime semantics) enum TimeUnit (NANOS, MICROS, MILLIS): granularity of timestamp The timestamp will have LocalDateTime semantics (not converted to UTC). Time unit (granularity) will be determined by the user. Default is milliseconds. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
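As a concrete illustration of the semantics described in HIVE-21216 above (a sketch only, not Hive's actual writer code): with isAdjustedToUtc = false (LocalDateTime semantics) and TimeUnit = MICROS, a wall-clock timestamp maps to an INT64 value roughly like this.
{code:java}
// Sketch only: encode a wall-clock timestamp as microseconds since
// 1970-01-01T00:00:00 with no time-zone conversion (LocalDateTime semantics).
long toInt64Micros(java.time.LocalDateTime ts) {
  long seconds = ts.toEpochSecond(java.time.ZoneOffset.UTC); // treat the wall clock as-is
  return seconds * 1_000_000L + ts.getNano() / 1_000L;
}
{code}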
[jira] [Created] (HIVE-21095) 'Show create table' should not display a time zone for timestamp with local time zone
Karen Coppage created HIVE-21095: Summary: 'Show create table' should not display a time zone for timestamp with local time zone Key: HIVE-21095 URL: https://issues.apache.org/jira/browse/HIVE-21095 Project: Hive Issue Type: Improvement Reporter: Karen Coppage Assignee: Karen Coppage SHOW CREATE TABLE shows the time zone that the table was created in (if it contains a TIMESTAMPTZ column). This is also misleading, since it might have nothing to do with the actual data. e.g. {code:java} hive> set time zone America/Los_Angeles; hive> create table text_local (ts timestamp with local time zone) stored as textfile; hive> show create table text_local; CREATE TABLE `text_local`( `ts` timestamp with local time zone('America/Los_Angeles')) {code} should be: {code:java} hive> show create table text_local; CREATE TABLE `text_local`( `ts` timestamp with local time zone) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21094) Store TIMESTAMP WITH LOCAL TIME ZONE in UTC instead of writer's time zone
Karen Coppage created HIVE-21094: Summary: Store TIMESTAMP WITH LOCAL TIME ZONE in UTC instead of writer's time zone Key: HIVE-21094 URL: https://issues.apache.org/jira/browse/HIVE-21094 Project: Hive Issue Type: Improvement Components: Hive Reporter: Karen Coppage Assignee: Karen Coppage TIMESTAMP WITH LOCAL TIME ZONE (aka TIMESTAMPTZ) is stored in writer's local time, and the writer's zone is stored with it. When reading, the timestamp in reader local time + reader zone is displayed. This is misleading for the user, since it looks like all the data was written in the reader's time zone. TIMESTAMPTZ should be stored in UTC time and be displayed in reader local time (as it was before) but without the reader's time zone. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
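A hedged sketch, using plain java.time, of the behavior proposed in HIVE-21094 above (not Hive internals): normalize to UTC once at write time, and render in the reader's local zone without attaching any zone to the value.
{code:java}
// Sketch only: the writer converts its zoned value to an Instant (UTC) for
// storage; the reader formats that Instant in its own zone, with no zone
// information attached to the displayed value.
java.time.Instant stored = java.time.ZonedDateTime
    .of(2019, 1, 1, 12, 0, 0, 0, java.time.ZoneId.of("America/Los_Angeles"))
    .toInstant();
String display = stored.atZone(java.time.ZoneId.systemDefault())
    .toLocalDateTime().toString();
{code}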
[jira] [Created] (HIVE-21050) Upgrade Parquet to 1.12.0 and use LogicalTypes
Karen Coppage created HIVE-21050: Summary: Upgrade Parquet to 1.12.0 and use LogicalTypes Key: HIVE-21050 URL: https://issues.apache.org/jira/browse/HIVE-21050 Project: Hive Issue Type: Improvement Components: File Formats Reporter: Karen Coppage Assignee: Karen Coppage [WIP; contains necessary jars until Parquet community releases version 1.12.0] The new Parquet version (1.12.0) uses [LogicalTypes|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md] instead of OriginalTypes. These are backwards-compatible with OriginalTypes. Thanks to [~kuczoram] for her work on this patch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)