[jira] [Created] (HIVE-26470) Remove stringifyException from MetaStore
David Mollitor created HIVE-26470: - Summary: Remove stringifyException from MetaStore Key: HIVE-26470 URL: https://issues.apache.org/jira/browse/HIVE-26470 Project: Hive Issue Type: Sub-task Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26469) Remove stringifyException Method From QL Package
David Mollitor created HIVE-26469: - Summary: Remove stringifyException Method From QL Package Key: HIVE-26469 URL: https://issues.apache.org/jira/browse/HIVE-26469 Project: Hive Issue Type: Sub-task Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26456) Remove stringifyException Method From Storage Handlers
David Mollitor created HIVE-26456: - Summary: Remove stringifyException Method From Storage Handlers Key: HIVE-26456 URL: https://issues.apache.org/jira/browse/HIVE-26456 Project: Hive Issue Type: Sub-task Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-25831) Report Progress on Every Record Read for CompactorMR
David Mollitor created HIVE-25831: - Summary: Report Progress on Every Record Read for CompactorMR Key: HIVE-25831 URL: https://issues.apache.org/jira/browse/HIVE-25831 Project: Hive Issue Type: Improvement Reporter: David Mollitor Progress should be updated for every read of an input {quote} reads an input, writes an output, nor updates its status string {quote} https://github.com/apache/hive/blob/fffb31f2346df2b8011a9949895de21f506c0117/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java#L813-L828 I think every loop iteration should simply call {{progress()}}. If there are a lot of deleted values during a major compaction, long gaps of time can occur without a progress update, and the job may be timed out by YARN. I'm not 100% sure this is happening, but it is something I wanted to point out. -- This message was sent by Atlassian Jira (v8.20.1#820001)
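Below is a minimal sketch of the idea. It makes several assumptions. {{ProgressReporter}} is a hypothetical stand-in for Hadoop's {{Progressable}}. The "DELETED" prefix is a made-up delete marker. The real CompactorMR loop reads ACID records, not strings. The only point is that {{progress()}} fires once per record read, whether or not anything gets written:

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for Hadoop's Progressable. */
interface ProgressReporter {
    void progress();
}

final class CompactionLoop {
    private CompactionLoop() {}

    /** Copies live records to output and reports progress on every record read. */
    static int copyLiveRecords(Iterator<String> input, List<String> output, ProgressReporter reporter) {
        int written = 0;
        while (input.hasNext()) {
            String record = input.next();
            reporter.progress(); // one heartbeat per read, even if the record is skipped
            if (record.startsWith("DELETED")) {
                continue; // hypothetical delete marker: nothing is written for this record
            }
            output.add(record);
            written++;
        }
        return written;
    }
}
```

With this shape, a long run of deleted rows still produces a steady stream of progress calls, so the task never looks idle to YARN.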
[jira] [Created] (HIVE-25544) Remove Dependency of hive-meta-common From hive-common
David Mollitor created HIVE-25544: - Summary: Remove Dependency of hive-meta-common From hive-common Key: HIVE-25544 URL: https://issues.apache.org/jira/browse/HIVE-25544 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor These two modules should not be linked: the dependency means that any HS2 client library pulling in the hive-common library also has to pull in a ton of metastore code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25543) Add Read-Only Capability to ObjectStore
David Mollitor created HIVE-25543: - Summary: Add Read-Only Capability to ObjectStore Key: HIVE-25543 URL: https://issues.apache.org/jira/browse/HIVE-25543 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor I recently saw some stack traces showing that calling "commit" triggers quite a bit of work within DataNucleus, as I understand it, to look for changes in the transaction and to commit those changes. Given that many of the RPCs within the Metastore are look-ups, Hive can avoid all of this needless work by making those transactions read-only (rollbackOnly). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25542) Remove References to hive.optimize.index.filter
David Mollitor created HIVE-25542: - Summary: Remove References to hive.optimize.index.filter Key: HIVE-25542 URL: https://issues.apache.org/jira/browse/HIVE-25542 Project: Hive Issue Type: Improvement Reporter: David Mollitor Hive indexes were removed in the 4.x series. Please remove all references to the index configurations, for example {{hive.optimize.index.filter}}. Also update the docs: https://cwiki.apache.org/confluence/display/hive/configuration+properties -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25495) Upgrade to JLine3
David Mollitor created HIVE-25495: - Summary: Upgrade to JLine3 Key: HIVE-25495 URL: https://issues.apache.org/jira/browse/HIVE-25495 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor JLine 2 was discontinued a long while ago. Hadoop uses JLine3, so Hive should match. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25477) Clean Up JDBC Code
David Mollitor created HIVE-25477: - Summary: Clean Up JDBC Code Key: HIVE-25477 URL: https://issues.apache.org/jira/browse/HIVE-25477 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor * Remove unused imports * Remove unused code -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25476) Remove Unused Dependencies for JDBC Driver
David Mollitor created HIVE-25476: - Summary: Remove Unused Dependencies for JDBC Driver Key: HIVE-25476 URL: https://issues.apache.org/jira/browse/HIVE-25476 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor I am using the JDBC driver in a project and was very surprised by the number of dependencies it has. Remove some unnecessary dependencies to make it a little easier to work with. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25320) Remove hive.optimize.sort.dynamic.partition
David Mollitor created HIVE-25320: - Summary: Remove hive.optimize.sort.dynamic.partition Key: HIVE-25320 URL: https://issues.apache.org/jira/browse/HIVE-25320 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: David Mollitor {{hive.optimize.sort.dynamic.partition}} has been replaced with {{hive.optimize.sort.dynamic.partition.threshold}}. It has been marked as "deprecated", but it's actually totally defunct in the current code base. Deprecation would allow an admin to continue to use it (maybe as an alias to {{threshold}} = 0/-1), but that is not the case here. Remove all references to "hive.optimize.sort.dynamic.partition" in the q tests and remove {{HIVEOPTSORTDYNAMICPARTITION}} altogether. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25319) Allow HiveDecimalWritable to Accept Java BigDecimal
David Mollitor created HIVE-25319: - Summary: Allow HiveDecimalWritable to Accept Java BigDecimal Key: HIVE-25319 URL: https://issues.apache.org/jira/browse/HIVE-25319 Project: Hive Issue Type: Improvement Components: storage-api Reporter: David Mollitor Add support for setting a Java BigDecimal value in {{HiveDecimalWritable}} via {{set}}. Also, the unit tests in {{TestHiveDecimalWritable}} are really lacking. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25241) Simplify Metrics System
David Mollitor created HIVE-25241: - Summary: Simplify Metrics System Key: HIVE-25241 URL: https://issues.apache.org/jira/browse/HIVE-25241 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor I took a look at the {{Metrics}} code in Hive and found a lot of boilerplate in client code just to interact with Metrics. It's too much, and it's done differently in different places. * Never allow the Metrics system to be "null"; supply a no-op version by default * The Metrics system should never throw an error to the client, just log-and-ignore. Metrics shouldn't break a query or other operation * General cleanup -- This message was sent by Atlassian Jira (v8.3.4#803005)
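Here is a rough sketch of the first two bullets. The {{Metrics}} interface below is hypothetical and has a single method; Hive's real interface is larger. The sketch uses {{java.util.logging}} in place of SLF4J so it has no dependencies. Callers always get a non-null instance, and any failure inside the metrics backend is logged and swallowed instead of reaching the query:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical, simplified metrics facade (not Hive's actual Metrics interface). */
interface Metrics {
    void incrementCounter(String name);
}

/** No-op default, so callers never need a null check. */
final class NoOpMetrics implements Metrics {
    static final NoOpMetrics INSTANCE = new NoOpMetrics();

    private NoOpMetrics() {}

    @Override
    public void incrementCounter(String name) {
        // intentionally empty
    }
}

/** Wrapper that logs and ignores failures so metrics can never break a query. */
final class SafeMetrics implements Metrics {
    private static final Logger LOG = Logger.getLogger(SafeMetrics.class.getName());
    private final Metrics delegate;

    SafeMetrics(Metrics delegate) {
        // Fall back to the no-op implementation instead of ever holding null.
        this.delegate = (delegate == null) ? NoOpMetrics.INSTANCE : delegate;
    }

    @Override
    public void incrementCounter(String name) {
        try {
            delegate.incrementCounter(name);
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Failed to update metric " + name, e);
        }
    }
}
```

Client code then reduces to a single call such as {{metrics.incrementCounter("open_sessions")}}, with no null checks and no try/catch at each call site.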
[jira] [Created] (HIVE-25235) Remove ThreadPoolExecutorWithOomHook
David Mollitor created HIVE-25235: - Summary: Remove ThreadPoolExecutorWithOomHook Key: HIVE-25235 URL: https://issues.apache.org/jira/browse/HIVE-25235 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: David Mollitor Assignee: David Mollitor While looking at [HIVE-24846] to better perform OOM logging, I realized that this is not a good way to handle OOM. https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java bq. there's likely no easy way for you to recover from it if you do catch it If we want to handle OOM, it's best to do it from outside, with the JVM facilities: {{-XX:+ExitOnOutOfMemoryError}} {{-XX:OnOutOfMemoryError}} It seems odd that the OOM handler attempts to load a handler and then do more work when the server is clearly hosed at this point; requesting more work will only add to the memory pressure. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
David Mollitor created HIVE-25185: - Summary: Improve Logging On Polling Tez Session from Pool Key: HIVE-25185 URL: https://issues.apache.org/jira/browse/HIVE-25185 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25184) ReExecDriver Only Run afterExecute If No Exceptions
David Mollitor created HIVE-25184: - Summary: ReExecDriver Only Run afterExecute If No Exceptions Key: HIVE-25184 URL: https://issues.apache.org/jira/browse/HIVE-25184 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25177) Add Additional Debugging Help for HBase Reader
David Mollitor created HIVE-25177: - Summary: Add Additional Debugging Help for HBase Reader Key: HIVE-25177 URL: https://issues.apache.org/jira/browse/HIVE-25177 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25176) Print DAG ID to Console
David Mollitor created HIVE-25176: - Summary: Print DAG ID to Console Key: HIVE-25176 URL: https://issues.apache.org/jira/browse/HIVE-25176 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25157) Clean up QueryResultsCache Code
David Mollitor created HIVE-25157: - Summary: Clean up QueryResultsCache Code Key: HIVE-25157 URL: https://issues.apache.org/jira/browse/HIVE-25157 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor * Remove superfluous code * Simplify lock usage (remove instances of {{synchronized}}) * Re-do the "LRU" map. The original contributor's understanding of {{LinkedHashMap}} as an LRU map is incorrect. * Simplify code with Guava {{Multimap}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
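The ticket does not spell out the exact misunderstanding. A common pitfall is assuming that a plain {{LinkedHashMap}} behaves as an LRU. By default it keeps insertion order, not access order, and it never evicts anything on its own. A minimal sketch of a correct bounded LRU built on it is below. {{LruCache}} is a hypothetical name, not the class in {{QueryResultsCache}}:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Bounded LRU map: access-ordered LinkedHashMap plus eviction in removeEldestEntry. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order (least recently used first); default is insertion order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently accessed entry once over capacity
    }
}
```

{{LinkedHashMap}} is not thread-safe. In access-order mode, even {{get}} reorders the map. So whatever replaces the {{synchronized}} blocks still has to guard every access.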
[jira] [Created] (HIVE-25152) Remove Superfluous Logging Code
David Mollitor created HIVE-25152: - Summary: Remove Superfluous Logging Code Key: HIVE-25152 URL: https://issues.apache.org/jira/browse/HIVE-25152 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor So much logging code can be removed to lessen the amount of code in the project (and perhaps some small performance gains). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25151) Remove Unused Interner from HiveMetastoreChecker
David Mollitor created HIVE-25151: - Summary: Remove Unused Interner from HiveMetastoreChecker Key: HIVE-25151 URL: https://issues.apache.org/jira/browse/HIVE-25151 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25144) Add NoReconnect Annotation to Create AlreadyExistsException Methods
David Mollitor created HIVE-25144: - Summary: Add NoReconnect Annotation to Create AlreadyExistsException Methods Key: HIVE-25144 URL: https://issues.apache.org/jira/browse/HIVE-25144 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor I have recently seen an issue where a Hive {{CREATE TABLE}} method fails with {{AlreadyExistsException}} even though the table definitely does not exist. I believe the issue is that there is a timeout/transient error between HMS and the backend database. So, the client submits the request to HMS, and the request does eventually succeed, but only after the client's connection has already failed. Therefore, when the HMS Client "retry" functionality kicks in, the table looks like it already exists the second time around. If something goes wrong during an HMS CREATE operation, we do not know the state of the operation, and therefore it should just fail. -- This message was sent by Atlassian Jira (v8.3.4#803005)
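Here is a sketch of the proposed behavior. It assumes a hypothetical {{@NoReconnect}} marker and a simplified client interface; it is not Hive's {{RetryingMetaStoreClient}}. A JDK dynamic proxy retries ordinary calls but fails fast on annotated, non-idempotent methods such as a create. That way a retry cannot turn a CREATE that succeeded after a timeout into a spurious {{AlreadyExistsException}}:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

/** Hypothetical marker: a failed call must not be retried, because its outcome is unknown. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface NoReconnect {}

/** Hypothetical, simplified metastore client. */
interface TableClient {
    @NoReconnect
    void createTable(String name);

    boolean tableExists(String name);
}

final class RetryingProxy {
    private RetryingProxy() {}

    /** Retries calls up to maxAttempts times; @NoReconnect calls are attempted exactly once. */
    static <T> T wrap(Class<T> iface, T target, int maxAttempts) {
        Object proxy = Proxy.newProxyInstance(
            iface.getClassLoader(),
            new Class<?>[] {iface},
            (self, method, args) -> {
                int attempts = method.isAnnotationPresent(NoReconnect.class) ? 1 : Math.max(1, maxAttempts);
                Throwable last = null;
                for (int i = 0; i < attempts; i++) {
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        last = e.getCause(); // unwrap the exception thrown by the real client
                    }
                }
                throw last;
            });
        return iface.cast(proxy);
    }
}
```

Keeping the decision on the method declaration means the retry layer needs no per-call special cases: marking a method as unsafe to retry is a one-line annotation.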
[jira] [Created] (HIVE-25143) Improve ERROR Logging in QL Package
David Mollitor created HIVE-25143: - Summary: Improve ERROR Logging in QL Package Key: HIVE-25143 URL: https://issues.apache.org/jira/browse/HIVE-25143 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25141) Review Error Level Logging in HMS Module
David Mollitor created HIVE-25141: - Summary: Review Error Level Logging in HMS Module Key: HIVE-25141 URL: https://issues.apache.org/jira/browse/HIVE-25141 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor * Remove "log *and* throw" (it should be one or the other) * Remove superfluous code * Ensure the stack traces are being logged (and not just the Exception message) to ease troubleshooting * Remove double-printing of the Exception message (SLF4J dictates that the Exception message will be printed as part of the logger's formatting) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25136) Remove MetaExceptions From RawStore First Cut
David Mollitor created HIVE-25136: - Summary: Remove MetaExceptions From RawStore First Cut Key: HIVE-25136 URL: https://issues.apache.org/jira/browse/HIVE-25136 Project: Hive Issue Type: Sub-task Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25128) Remove Thrift Exceptions From RawStore alterCatalog
David Mollitor created HIVE-25128: - Summary: Remove Thrift Exceptions From RawStore alterCatalog Key: HIVE-25128 URL: https://issues.apache.org/jira/browse/HIVE-25128 Project: Hive Issue Type: Sub-task Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25127) Update getCatalogs
David Mollitor created HIVE-25127: - Summary: Update getCatalogs Key: HIVE-25127 URL: https://issues.apache.org/jira/browse/HIVE-25127 Project: Hive Issue Type: Sub-task Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25126) Remove Thrift Exceptions From RawStore
David Mollitor created HIVE-25126: - Summary: Remove Thrift Exceptions From RawStore Key: HIVE-25126 URL: https://issues.apache.org/jira/browse/HIVE-25126 Project: Hive Issue Type: Improvement Reporter: David Mollitor Remove all references to NoSuchObjectException/InvalidOperationException/MetaException from the method signatures of RawStore. These Exceptions are generated by Thrift and are used to communicate error conditions across the wire. They are not designed for use as part of the underlying stack, yet in Hive, they have been pushed down into these data access operators. The RawStore should not be this tightly coupled to the transport layer. Remove all checked Exceptions from RawStore in favor of Hive runtime exceptions. This is a popular pattern, and it is used by (and therefore dovetails nicely with) the underlying database access library, DataNucleus. All of the logging of unchecked Exceptions, and transforming them into Thrift exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}). -- This message was sent by Atlassian Jira (v8.3.4#803005)
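A minimal sketch of the layering proposed above. All names are hypothetical stand-ins: {{Store}} plays the role of RawStore, {{ObjectNotFoundException}} is a Hive-side unchecked exception, and {{ThriftNoSuchObjectException}} mimics a Thrift-generated checked exception. The data-access interface has no transport exceptions in its signature, and only the bridge knows how to translate them:

```java
/** Hypothetical unchecked exception used below the Thrift layer. */
class ObjectNotFoundException extends RuntimeException {
    ObjectNotFoundException(String message) {
        super(message);
    }
}

/** Stand-in for a Thrift-generated checked exception such as NoSuchObjectException. */
class ThriftNoSuchObjectException extends Exception {
    ThriftNoSuchObjectException(String message) {
        super(message);
    }
}

/** Data-access layer: no transport-specific checked exceptions in the signature. */
interface Store {
    String getCatalog(String name);
}

/** The Hive-Thrift bridge (HMSHandler's role in the ticket): the only place that knows about Thrift. */
final class Handler {
    private final Store store;

    Handler(Store store) {
        this.store = store;
    }

    String getCatalog(String name) throws ThriftNoSuchObjectException {
        try {
            return store.getCatalog(name);
        } catch (ObjectNotFoundException e) {
            // Translate at the boundary; keep the original as the cause for server-side logs.
            ThriftNoSuchObjectException wire = new ThriftNoSuchObjectException(e.getMessage());
            wire.initCause(e);
            throw wire;
        }
    }
}
```

Because the translation lives in one place, a second transport could reuse the same {{Store}} without inheriting any Thrift types.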
[jira] [Created] (HIVE-25112) Simplify TXN Compactor Heartbeat Thread
David Mollitor created HIVE-25112: - Summary: Simplify TXN Compactor Heartbeat Thread Key: HIVE-25112 URL: https://issues.apache.org/jira/browse/HIVE-25112 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Simplify the Thread structure. Threads do not need a custom "start"/"stop" state; they already have one (running/interrupted), and it is designed to work this way with thread pools and forced exits. -- This message was sent by Atlassian Jira (v8.3.4#803005)
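Here is a minimal sketch of a heartbeater whose only lifecycle state is the thread's own interrupt flag. {{Heartbeater}} is a hypothetical class, and the counter stands in for the actual heartbeat RPC:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Heartbeat loop with no custom start/stop flags: it runs until its thread is interrupted. */
final class Heartbeater implements Runnable {
    private final long intervalMs;
    final AtomicInteger beats = new AtomicInteger();

    Heartbeater(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            beats.incrementAndGet(); // hypothetical: send the heartbeat to the metastore here
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag so the loop condition exits
            }
        }
    }
}
```

Starting it is {{executor.submit(heartbeater)}}. Stopping it is {{executor.shutdownNow()}} or {{future.cancel(true)}}, both of which interrupt the worker thread, so no extra volatile flag or stop method is needed.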
[jira] [Created] (HIVE-25111) Metastore Catalog Methods JDO Persistence
David Mollitor created HIVE-25111: - Summary: Metastore Catalog Methods JDO Persistence Key: HIVE-25111 URL: https://issues.apache.org/jira/browse/HIVE-25111 Project: Hive Issue Type: Sub-task Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25110) Upgrade JDO Persistence to Use DN5 Features
David Mollitor created HIVE-25110: - Summary: Upgrade JDO Persistence to Use DN5 Features Key: HIVE-25110 URL: https://issues.apache.org/jira/browse/HIVE-25110 Project: Hive Issue Type: Improvement Components: Metastore, Standalone Metastore Reporter: David Mollitor Assignee: David Mollitor Hive has updated DataNucleus for Hive v4 but is not taking advantage of its new features and paradigms. There's a ton of code in Hive that can be removed in favor of relying on the underlying libraries and their best practices. https://www.datanucleus.org/products/accessplatform_5_2/index.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25108) Do Not Log and Throw MetaExceptions
David Mollitor created HIVE-25108: - Summary: Do Not Log and Throw MetaExceptions Key: HIVE-25108 URL: https://issues.apache.org/jira/browse/HIVE-25108 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor "Log and throw" is a bad pattern and leads to logging the same error multiple times. There is code in Hive that explicitly implements this behavior and should therefore be removed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24875) Unify InetAddress.getLocalHost()
David Mollitor created HIVE-24875: - Summary: Unify InetAddress.getLocalHost() Key: HIVE-24875 URL: https://issues.apache.org/jira/browse/HIVE-24875 Project: Hive Issue Type: Improvement Reporter: David Mollitor There are lots of calls in the Hive code to {{InetAddress.getLocalHost()}}. These should be standardized onto hive-common {{ServerUtils.hostname()}}, which includes removing (deprecating) a similar method in {{HiveStringUtils}}. Open to anyone to improve. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24850) Don't Cache SQL Text in Hive Query Results Cache
David Mollitor created HIVE-24850: - Summary: Don't Cache SQL Text in Hive Query Results Cache Key: HIVE-24850 URL: https://issues.apache.org/jira/browse/HIVE-24850 Project: Hive Issue Type: Improvement Reporter: David Mollitor In class {{QueryResultsCache}}, the Map used to map queries to results is keyed on the query string, but we have no idea how large those strings are. Instead, key the Map on a hash (MD5 or SHA-256) of each query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
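Here is a sketch of computing such a key with the JDK's {{MessageDigest}}, using SHA-256 of the two options the ticket mentions. {{QueryKeys}} is a hypothetical helper, not existing Hive code. However long the query is, the key is always a 64-character hex string:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class QueryKeys {
    private QueryKeys() {}

    /** Returns the SHA-256 digest of the query text as lowercase hex (always 64 characters). */
    static String sha256Hex(String queryText) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(queryText.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // %x on a byte prints its unsigned value
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Keying only on the digest treats SHA-256 collisions as negligible, which is the usual assumption for a cache key of this kind.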
[jira] [Created] (HIVE-24847) Remove LOAD DATA Command
David Mollitor created HIVE-24847: - Summary: Remove LOAD DATA Command Key: HIVE-24847 URL: https://issues.apache.org/jira/browse/HIVE-24847 Project: Hive Issue Type: Improvement Reporter: David Mollitor Please remove this confusing feature. As I understand it, this is an artifact of a previous era in Hive, and the best way to do this now is to create an {{EXTERNAL}} table and then {{INSERT INTO ... SELECT * FROM ...}} from it into a managed table. The benefit of this is that table stats are collected during the {{INSERT}} statement, whereas stats are not calculated as part of {{LOAD DATA}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24846) Log When HS2 Goes OOM
David Mollitor created HIVE-24846: - Summary: Log When HS2 Goes OOM Key: HIVE-24846 URL: https://issues.apache.org/jira/browse/HIVE-24846 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Otherwise the server just shuts down without any justification. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24845) Make Return of InputEstimator estimate Optional
David Mollitor created HIVE-24845: - Summary: Make Return of InputEstimator estimate Optional Key: HIVE-24845 URL: https://issues.apache.org/jira/browse/HIVE-24845 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Make Return of InputEstimator estimate Optional so that an implementer can signal that it does not know the estimate and any optimizations around data size can be ignored. -- This message was sent by Atlassian Jira (v8.3.4#803005)
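A minimal sketch of the idea using {{OptionalLong}}. The {{SizeEstimator}} interface and the {{FetchPlanner}} rule are hypothetical simplifications; Hive's {{InputEstimator}} has a different signature. An empty result means "unknown", and the caller simply skips the size-based optimization:

```java
import java.util.OptionalLong;

/** Hypothetical, simplified estimator; an empty result means the size is unknown. */
interface SizeEstimator {
    OptionalLong estimateBytes(String tableName);
}

final class FetchPlanner {
    private FetchPlanner() {}

    /** Hypothetical rule: apply the small-input optimization only when the size is known and small. */
    static boolean useLocalFetch(SizeEstimator estimator, String table, long thresholdBytes) {
        OptionalLong estimate = estimator.estimateBytes(table);
        return estimate.isPresent() && estimate.getAsLong() <= thresholdBytes;
    }
}
```

Compared with a sentinel such as {{-1}}, the {{Optional}} return forces every caller to handle the "don't know" case explicitly.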

[jira] [Created] (HIVE-24833) Hive Should Only Pushdown EQ Predicate on HBaseStorageHandler
David Mollitor created HIVE-24833: - Summary: Hive Should Only Pushdown EQ Predicate on HBaseStorageHandler Key: HIVE-24833 URL: https://issues.apache.org/jira/browse/HIVE-24833 Project: Hive Issue Type: Improvement Reporter: David Mollitor I believe that a Hive query with an HBase Storage Handler incorrectly applies a predicate pushdown into the storage handler. I observed a FETCH optimization that took a long time to complete because it was performing a table scan across the entire HBase table. The only case in which a predicate should be pushed down to the storage layer is `SELECT * FROM my_hbase_table WHERE row_key=?`, i.e. EQ on the row key. Anything else involves a scan of the table, and there is no easy way to calculate how large that scan will be, so it should always be passed to the compute engine (Tez). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24832) Remove Spring Artifacts from Log4j Properties Files
David Mollitor created HIVE-24832: - Summary: Remove Spring Artifacts from Log4j Properties Files Key: HIVE-24832 URL: https://issues.apache.org/jira/browse/HIVE-24832 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Getting a warning about a bad FILE logger and it looks like it's coming from some antiquated copy & paste code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24814) Harmonize Hive Date-Time Formats
David Mollitor created HIVE-24814: - Summary: Harmonize Hive Date-Time Formats Key: HIVE-24814 URL: https://issues.apache.org/jira/browse/HIVE-24814 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Harmonize Hive on JDK date-time formats. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24811) Discover Other Areas that Can Benefit from Cached Dates
David Mollitor created HIVE-24811: - Summary: Discover Other Areas that Can Benefit from Cached Dates Key: HIVE-24811 URL: https://issues.apache.org/jira/browse/HIVE-24811 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Based on my work on [HIVE-24808], I noticed other places that call {{Date#valueOf}} that can probably also benefit from this cache mechanism. Locate those places and change those calls to use the cache. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24810) Use JDK 8 String Switch in TruncDateFromTimestamp
David Mollitor created HIVE-24810: - Summary: Use JDK 8 String Switch in TruncDateFromTimestamp Key: HIVE-24810 URL: https://issues.apache.org/jira/browse/HIVE-24810 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24808) Cache Dates Parsed
David Mollitor created HIVE-24808: - Summary: Cache Dates Parsed Key: HIVE-24808 URL: https://issues.apache.org/jira/browse/HIVE-24808 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Parsed Date strings should be cached, since parsing requires some amount of work and there are only so many distinct dates in a particular data set. -- This message was sent by Atlassian Jira (v8.3.4#803005)
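Here is a minimal sketch of such a cache. It uses {{java.time.LocalDate}} as a stand-in for Hive's Date type, and {{DateCache}} is a hypothetical class. Because only so many distinct dates appear in a data set, most lookups should hit. A size cap keeps arbitrary input from growing the map forever:

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Caches parsed dates (LocalDate stands in for Hive's Date type). */
final class DateCache {
    private final Map<String, LocalDate> cache = new ConcurrentHashMap<>();
    private final int maxEntries;

    DateCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    LocalDate parse(String text) {
        LocalDate cached = cache.get(text);
        if (cached != null) {
            return cached; // hit: no parsing work at all
        }
        LocalDate parsed = LocalDate.parse(text); // ISO yyyy-MM-dd
        if (cache.size() < maxEntries) { // approximate cap under concurrency; good enough for a cache
            cache.putIfAbsent(text, parsed);
        }
        return parsed;
    }

    int size() {
        return cache.size();
    }
}
```

{{LocalDate}} is immutable, so sharing one cached instance across callers and threads is safe. The same property would need to hold for whatever Hive type ends up being cached.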
[jira] [Created] (HIVE-24772) Revamp Server Error Logging
David Mollitor created HIVE-24772: - Summary: Revamp Server Error Logging Key: HIVE-24772 URL: https://issues.apache.org/jira/browse/HIVE-24772 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Most of the action takes place in {{ThriftCLIService}}, where errors are logged in response to client requests (though I know in many instances things are logged multiple times). I propose to improve this on multiple fronts: # Many log messages have the word "Error" in them, but log at the WARN level. I have changed all relevant logging to be at ERROR level and removed the word "Error" from the message # Some of the error messages in the logging code had copy & paste errors where they printed the wrong request name # Print the actual request object in the error message # Big one for me: Do not pass a stack trace to the client. This is bad practice from a security perspective: clients should not know that level of detail about the server. It is also very confusing from the client's perspective, since it is not obvious that the stack trace comes from the remote server rather than the local client. Finally, it's too messy for a typical user to deal with anyway. Stack traces should be presented in the HS2 logs only. # Various clean up # Log an IP address for the client as part of standard operating procedure -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24759) Lower Logging on "Worker thread finished one loop"
David Mollitor created HIVE-24759: - Summary: Lower Logging on "Worker thread finished one loop" Key: HIVE-24759 URL: https://issues.apache.org/jira/browse/HIVE-24759 Project: Hive Issue Type: Improvement Reporter: David Mollitor This logging is too spammy and provides almost zero value. Please lower it to DEBUG level. https://github.com/apache/hive/blob/6c285a89f5199c20747e02d3c793c4f2d1fd3373/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java#L133 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24758) Log Tez Task DAG ID, DAG Session ID, HS2 Hostname
David Mollitor created HIVE-24758: - Summary: Log Tez Task DAG ID, DAG Session ID, HS2 Hostname Key: HIVE-24758 URL: https://issues.apache.org/jira/browse/HIVE-24758 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor In order to get the logs for a particular query, submitted to Tez on YARN, the following pieces of information are required: * YARN Application ID * TEZ DAG ID * HS2 Host that ran the job Include this information in TezTask output. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24757) Add UDF To Obtain HS2 Host
David Mollitor created HIVE-24757: - Summary: Add UDF To Obtain HS2 Host Key: HIVE-24757 URL: https://issues.apache.org/jira/browse/HIVE-24757 Project: Hive Issue Type: New Feature Reporter: David Mollitor It can be confusing to troubleshoot an issue in Hive because it's not easy to determine which instance a connection is made to (in a multi-HS2 environment). Please add a UDF that displays the hostname of the currently connected HS2 instance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24739) Clarify Usage of Thrift TServerEventHandler and Count Number of Messages Processed
David Mollitor created HIVE-24739: - Summary: Clarify Usage of Thrift TServerEventHandler and Count Number of Messages Processed Key: HIVE-24739 URL: https://issues.apache.org/jira/browse/HIVE-24739 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Make the messages emitted from {{TServerEventHandler}} more meaningful. Also, track the number of messages that each client sends to aid in troubleshooting. I run into this issue all the time, and this would greatly help clarify the logging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24737) Remove Configuration TEZ_SIMPLE_CUSTOM_EDGE_TINY_BUFFER_SIZE_MB
David Mollitor created HIVE-24737: - Summary: Remove Configuration TEZ_SIMPLE_CUSTOM_EDGE_TINY_BUFFER_SIZE_MB Key: HIVE-24737 URL: https://issues.apache.org/jira/browse/HIVE-24737 Project: Hive Issue Type: Improvement Reporter: David Mollitor Please remove {{TEZ_SIMPLE_CUSTOM_EDGE_TINY_BUFFER_SIZE_MB}}. It is never actually used in practice. Can it just be assigned a sensible hard-coded value? This seems like an over-optimization at the cost of yet another configuration. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24723) Use ExecutorService in TezSessionPool
David Mollitor created HIVE-24723: - Summary: Use ExecutorService in TezSessionPool Key: HIVE-24723 URL: https://issues.apache.org/jira/browse/HIVE-24723 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Currently there is some wonky home-made thread pooling going on in {{TezSessionPool}}. Replace it with some JDK/Guava goodness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort
David Mollitor created HIVE-24707: - Summary: Apply Sane Default for Tez Containers as Last Resort Key: HIVE-24707 URL: https://issues.apache.org/jira/browse/HIVE-24707 Project: Hive Issue Type: Improvement Reporter: David Mollitor
{code:java|title=DagUtils.java}
public static Resource getContainerResource(Configuration conf) {
  int memory = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCONTAINERSIZE) > 0
      ? HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCONTAINERSIZE)
      : conf.getInt(MRJobConfig.MAP_MEMORY_MB, MRJobConfig.DEFAULT_MAP_MEMORY_MB);
  int cpus = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) > 0
      ? HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES)
      : conf.getInt(MRJobConfig.MAP_CPU_VCORES, MRJobConfig.DEFAULT_MAP_CPU_VCORES);
  return Resource.newInstance(memory, cpus);
}
{code}
If the Tez container size or VCores is an invalid value ( <= 0 ), then it falls back onto the MapReduce configurations, but if the MapReduce configurations have invalid values ( <= 0 ), they are accepted regardless, and this will cause failures down the road. This code should also check the MapReduce values and fall back to the MapReduce default values if they are <= 0. Also, some logging would be nice here too, reporting where the configuration values came from. -- This message was sent by Atlassian Jira (v8.3.4#803005)
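Here is a sketch of the proposed fallback chain with the requested logging, written as a hypothetical helper. In Hive, the three inputs would come from {{HiveConf}} and {{MRJobConfig}}. The sketch uses {{java.util.logging}} so it runs without dependencies. Any non-positive value is treated as unset, and the MapReduce default is the last resort:

```java
import java.util.logging.Logger;

/** Hypothetical helper: first positive value of Tez setting, then MR setting, else MR default. */
final class ContainerSizing {
    private static final Logger LOG = Logger.getLogger(ContainerSizing.class.getName());

    private ContainerSizing() {}

    static int resolve(String name, int tezValue, int mrValue, int mrDefault) {
        if (tezValue > 0) {
            LOG.info(name + "=" + tezValue + " (from Hive Tez setting)");
            return tezValue;
        }
        if (mrValue > 0) {
            LOG.info(name + "=" + mrValue + " (from MapReduce setting)");
            return mrValue;
        }
        LOG.info(name + "=" + mrDefault + " (MapReduce default; configured values were not positive)");
        return mrDefault;
    }
}
```

{{getContainerResource}} could then call it once for memory and once for vcores, for example {{resolve("memory", tezContainerSize, mapMemoryMb, defaultMapMemoryMb)}}, where the argument names are illustrative.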
[jira] [Created] (HIVE-24701) Remove String Manipulation from Date Parsing TimestampTZUtil
David Mollitor created HIVE-24701: - Summary: Remove String Manipulation from Date Parsing TimestampTZUtil Key: HIVE-24701 URL: https://issues.apache.org/jira/browse/HIVE-24701 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor This operation is pretty slow:
{code:java}
// Converts Date to TimestampTZ.
public static TimestampTZ convert(Date date, ZoneId defaultTimeZone) {
  return parse(date.toString(), defaultTimeZone);
}
{code}
To convert from Date to TimestampTZ, it creates a string, then parses it. It should be possible to just look at the epoch time and do the conversion without all the string manipulation/parsing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
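A sketch of a string-free conversion using {{java.time}}. {{LocalDate}} stands in for Hive's Date type, and {{ZonedDateTime}} stands in for {{TimestampTZ}}. The {{DateConversions}} helper is hypothetical. It assumes the intended result is midnight of that day in the default zone. {{atStartOfDay}} picks the first valid time when midnight falls in a DST gap, and that should be checked against the current parse behavior:

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class DateConversions {
    private DateConversions() {}

    /** Start of the given day in the given zone, computed without formatting or parsing a String. */
    static ZonedDateTime toZoned(LocalDate date, ZoneId zone) {
        return date.atStartOfDay(zone);
    }

    /** Same conversion starting from a day count since 1970-01-01 (the "epoch time" of a date). */
    static ZonedDateTime fromEpochDay(long epochDay, ZoneId zone) {
        return toZoned(LocalDate.ofEpochDay(epochDay), zone);
    }
}
```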
[jira] [Created] (HIVE-24693) Parquet Timestamp Values Read/Write Very Slow
David Mollitor created HIVE-24693: - Summary: Parquet Timestamp Values Read/Write Very Slow Key: HIVE-24693 URL: https://issues.apache.org/jira/browse/HIVE-24693 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Parquet {{DataWriteableWriter}} relies on {{NanoTimeUtils}} to convert a timestamp object into a binary value. To do this, it calls {{toString()}} on the timestamp object and then parses the String. This particular timestamp does not carry a time zone, so the string is something like: {{2021-03-21 12:32:23....}} The parse code tries to parse the string assuming there is a time zone, and if that fails, falls back and applies the provided "default time zone". As was noted in [HIVE-24353], if something fails to parse, it is very expensive to try to parse again. So, for each timestamp in the Parquet file, it: * Builds a string from the timestamp * Parses it (throws an exception, parses again) There is no need for this kind of string manipulation/parsing; it should just use the epoch millis/seconds/time stored internally in the Timestamp object.
{code:java}
// Converts Timestamp to TimestampTZ.
public static TimestampTZ convert(Timestamp ts, ZoneId defaultTimeZone) {
  return parse(ts.toString(), defaultTimeZone);
}
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
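A sketch of the direct conversion using {{java.time}}. {{LocalDateTime}} stands in for Hive's zone-less Timestamp, and {{ZonedDateTime}} stands in for {{TimestampTZ}}. The {{TimestampConversions}} helper is hypothetical. The epoch-based variant assumes that the timestamp stores its wall-clock time as seconds plus nanos counted as if in UTC; that is an assumption about Hive's internals that needs confirming. {{atZone}} shifts DST-gap times forward and takes the earlier offset in overlaps, which should also be compared with the existing parse path:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

final class TimestampConversions {
    private TimestampConversions() {}

    /** Attaches the default zone to a zone-less wall-clock timestamp; no toString()/parse round trip. */
    static ZonedDateTime toZoned(LocalDateTime wallClock, ZoneId defaultZone) {
        return wallClock.atZone(defaultZone);
    }

    /** Same conversion from epoch fields, assuming they encode the wall-clock time as if it were UTC. */
    static ZonedDateTime fromWallClockEpoch(long epochSecond, int nanos, ZoneId defaultZone) {
        return toZoned(LocalDateTime.ofEpochSecond(epochSecond, nanos, ZoneOffset.UTC), defaultZone);
    }
}
```

Neither path builds a String or relies on a parse attempt failing, so the per-value cost of the exception-driven fallback described above disappears.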
[jira] [Created] (HIVE-24684) Move HiveSchemaHelper to Common Package
David Mollitor created HIVE-24684: - Summary: Move HiveSchemaHelper to Common Package Key: HIVE-24684 URL: https://issues.apache.org/jira/browse/HIVE-24684 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Beeline needs access to this class, but it has to import the entire server project to get it. Ick. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24661) Do Not "Stringify" Exception in Logger messages
David Mollitor created HIVE-24661: - Summary: Do Not "Stringify" Exception in Logger messages Key: HIVE-24661 URL: https://issues.apache.org/jira/browse/HIVE-24661 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor SLF4J already supports printing exceptions, including the stack trace, when the exception is passed as the last argument of the logging call, so there is no need to do it manually. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24660) Remove Commons Logger from jdbc-handler Package
David Mollitor created HIVE-24660: - Summary: Remove Commons Logger from jdbc-handler Package Key: HIVE-24660 URL: https://issues.apache.org/jira/browse/HIVE-24660 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24659) Remove Commons Logger from serde Package
David Mollitor created HIVE-24659: - Summary: Remove Commons Logger from serde Package Key: HIVE-24659 URL: https://issues.apache.org/jira/browse/HIVE-24659 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24658) Move LogUtil Class to Metastore Proper from Common
David Mollitor created HIVE-24658: - Summary: Move LogUtil Class to Metastore Proper from Common Key: HIVE-24658 URL: https://issues.apache.org/jira/browse/HIVE-24658 Project: Hive Issue Type: Improvement Components: Metastore, Standalone Metastore Reporter: David Mollitor Assignee: David Mollitor Currently there is a dependency on Log4J from the Metastore Commons project. Log4J is not really something that should be common; the logging framework is specific to the application. Having this dependency in the 'common' package pushes it onto everything that uses this common library. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24657) Make Beeline Logging Dependencies Explicit
David Mollitor created HIVE-24657: - Summary: Make Beeline Logging Dependencies Explicit Key: HIVE-24657 URL: https://issues.apache.org/jira/browse/HIVE-24657 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Based on my work on [HIVE-24591], logging dependencies for beeline should be explicit. They currently come in transitively. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24627) Add Debug Logging to Hive JDBC Connection
David Mollitor created HIVE-24627: - Summary: Add Debug Logging to Hive JDBC Connection Key: HIVE-24627 URL: https://issues.apache.org/jira/browse/HIVE-24627 Project: Hive Issue Type: Improvement Components: JDBC Reporter: David Mollitor Assignee: David Mollitor Log the following:
# Session handle
# Version number
# Any configurations/variables set by the user on the client side
# The Hive configuration, dumped at session start

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24617) Review beeline Driver Scanning Code
David Mollitor created HIVE-24617: - Summary: Review beeline Driver Scanning Code Key: HIVE-24617 URL: https://issues.apache.org/jira/browse/HIVE-24617 Project: Hive Issue Type: Improvement Components: Beeline Reporter: David Mollitor Assignee: David Mollitor There seem to be quite a few code artifacts lying around in this area of the code that are no longer valid. Remove them and improve the code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24616) Add Logging to Track Query Status
David Mollitor created HIVE-24616: - Summary: Add Logging to Track Query Status Key: HIVE-24616 URL: https://issues.apache.org/jira/browse/HIVE-24616 Project: Hive Issue Type: Improvement Components: JDBC Reporter: David Mollitor Assignee: David Mollitor Add additional logging to JDBC to allow for tracking the status of a query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24592) Revert Hive-24550
David Mollitor created HIVE-24592: - Summary: Revert Hive-24550 Key: HIVE-24592 URL: https://issues.apache.org/jira/browse/HIVE-24592 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Broke the build.
{code:none}
[ERROR] COMPILATION ERROR :
[ERROR] /home/travis/build/apache/hive/itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2Acid.java:[23,44] cannot find symbol
  symbol: class TxnDbUtil
  location: package org.apache.hadoop.hive.metastore.txn
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.1:testCompile (default-testCompile) on project hive-it-unit: Compilation failure
[ERROR] /home/travis/build/apache/hive/itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2Acid.java:[23,44] cannot find symbol
[ERROR] symbol: class TxnDbUtil
[ERROR] location: package org.apache.hadoop.hive.metastore.txn
[ERROR]
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24591) Move Beeline To SLF4J Simple Logger
David Mollitor created HIVE-24591: - Summary: Move Beeline To SLF4J Simple Logger Key: HIVE-24591 URL: https://issues.apache.org/jira/browse/HIVE-24591 Project: Hive Issue Type: Improvement Components: Beeline Reporter: David Mollitor Assignee: David Mollitor To make beeline as simple as possible, move its SLF4J logger implementation to the SLF4J Simple logger. This will allow users to change the logging level simply on the command line. Currently users must create a Log4J configuration file, which is far too advanced/cumbersome for a data analyst who just wants to use SQL (and do some minor troubleshooting). With the simple logger, the level could be set like this:
{code:none}
beeline -Dorg.slf4j.simpleLogger.defaultLogLevel=debug ...
{code}
http://www.slf4j.org/api/org/slf4j/impl/SimpleLogger.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24583) Make Hive JDBC Java Executable
David Mollitor created HIVE-24583: - Summary: Make Hive JDBC Java Executable Key: HIVE-24583 URL: https://issues.apache.org/jira/browse/HIVE-24583 Project: Hive Issue Type: Improvement Components: JDBC Reporter: David Mollitor Running:
{code:none}
java -jar hive-jdbc.jar
{code}
should print driver version information. Something like this is already implemented, but it is probably better to move it into a {{main}} method in the {{HiveDriver}} class. https://github.com/apache/hive/blob/72d983ae76f420bdb719d33002a9c321a4e4f891/jdbc/src/java/org/apache/hive/jdbc/HiveDatabaseMetaData.java#L1218-L1222 -- This message was sent by Atlassian Jira (v8.3.4#803005)
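For HIVE-24583, here is a sketch of the kind of {{main}} method suggested above. It reads the version from the JAR manifest through the standard Package#getImplementationVersion(). It assumes that the hive-jdbc JAR's manifest sets both Main-Class and Implementation-Version. The class name is a hypothetical stand-in for HiveDriver.

```java
final class DriverVersionSketch {
    // Builds a one-line description from the JAR manifest's Implementation-Version entry,
    // which Package#getImplementationVersion() exposes; it is null when the class was not
    // loaded from a JAR carrying that entry, in which case "unknown" is printed.
    static String describe(Class<?> driverClass) {
        Package pkg = driverClass.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return driverClass.getName() + " version " + (version == null ? "unknown" : version);
    }

    // With Main-Class set in the manifest (an assumption), `java -jar` would land here.
    public static void main(String[] args) {
        System.out.println(describe(DriverVersionSketch.class));
    }
}
```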
[jira] [Created] (HIVE-24574) Add DIAGNOSE Statement
David Mollitor created HIVE-24574: - Summary: Add DIAGNOSE Statement Key: HIVE-24574 URL: https://issues.apache.org/jira/browse/HIVE-24574 Project: Hive Issue Type: Improvement Reporter: David Mollitor Add a new statement to Hive called {{DIAGNOSE}}
{code:sql}
DIAGNOSE ...
{code}
It returns a single binary (BLOB) column containing a TAR-GZ file composed of several other files:
* The query itself
* EXPLAIN
* SHOW CREATE for each table in the query
* The configuration of the session (set)
* The Hive logs generated by the query
* The processing engine logs generated by the query

-- This message was sent by Atlassian Jira (v8.3.4#803005)
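For HIVE-24574, here is a sketch of packing the collected artifacts into one in-memory archive whose bytes could fill the single BLOB column. Because the JDK has no TAR writer, this sketch uses ZIP (java.util.zip) in place of the TAR-GZ format named above. The artifact names and contents are hypothetical placeholders.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class DiagnoseBundleSketch {
    // Writes each (file name -> text content) pair as one archive entry, in map order, and
    // returns the finished archive bytes.
    static byte[] bundle(Map<String, String> artifacts) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> artifact : artifacts.entrySet()) {
                zip.putNextEntry(new ZipEntry(artifact.getKey()));
                zip.write(artifact.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The ZipOutputStream is closed (archive finished) before the bytes are taken.
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical artifact names with placeholder contents.
        Map<String, String> artifacts = new LinkedHashMap<>();
        artifacts.put("query.sql", "SELECT ...");
        artifacts.put("explain.txt", "<EXPLAIN output>");
        artifacts.put("session-config.txt", "<output of set>");
        System.out.println("archive size: " + bundle(artifacts).length + " bytes");
    }
}
```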
[jira] [Created] (HIVE-24566) Add Parquet Stats Optimization
David Mollitor created HIVE-24566: - Summary: Add Parquet Stats Optimization Key: HIVE-24566 URL: https://issues.apache.org/jira/browse/HIVE-24566 Project: Hive Issue Type: Improvement Reporter: David Mollitor Parquet files store min/max/count data in their footer metadata. When a query is submitted to a Parquet table and stats are not available, Hive should launch a single multi-threaded process that simply reads the metadata of each Parquet file instead of walking through every single record in the table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
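For HIVE-24566, here is a sketch of the "read every footer on a thread pool, then merge" idea for a single numeric column. The readFooter function is a hypothetical stand-in for a real Parquet footer reader, and ColumnStats is an illustrative type rather than a Hive or Parquet class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class FooterStatsSketch {
    // Hypothetical min/max/row-count summary of one column, as read from one file's footer.
    static final class ColumnStats {
        final long min;
        final long max;
        final long count;

        ColumnStats(long min, long max, long count) {
            this.min = min;
            this.max = max;
            this.count = count;
        }

        // Combining two files: smallest min, largest max, summed counts.
        ColumnStats merge(ColumnStats other) {
            return new ColumnStats(Math.min(min, other.min), Math.max(max, other.max), count + other.count);
        }
    }

    // Reads every file's footer on a fixed-size pool and folds the partial results together.
    // No data pages are touched. Returns null when the file list is empty.
    static ColumnStats aggregate(List<String> files, Function<String, ColumnStats> readFooter, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<ColumnStats>> pending = new ArrayList<>();
            for (String file : files) {
                Callable<ColumnStats> task = () -> readFooter.apply(file);
                pending.add(pool.submit(task));
            }
            ColumnStats total = null;
            for (Future<ColumnStats> result : pending) {
                ColumnStats stats = result.get();
                total = (total == null) ? stats : total.merge(stats);
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> files = Arrays.asList("part-0.parquet", "part-1.parquet");
        ColumnStats total = aggregate(files,
                f -> f.endsWith("0.parquet") ? new ColumnStats(3, 40, 1000) : new ColumnStats(-7, 12, 250), 2);
        System.out.println("min=" + total.min + " max=" + total.max + " count=" + total.count);
    }
}
```

The merge is only valid when every file actually carries these statistics. A real implementation would need to fall back to a full scan for files that do not.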
[jira] [Created] (HIVE-24560) Move Column Name and Type Parsing to AbstractSerde Class
David Mollitor created HIVE-24560: - Summary: Move Column Name and Type Parsing to AbstractSerde Class Key: HIVE-24560 URL: https://issues.apache.org/jira/browse/HIVE-24560 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24542) Prepare Guava for Upgrades
David Mollitor created HIVE-24542: - Summary: Prepare Guava for Upgrades Key: HIVE-24542 URL: https://issues.apache.org/jira/browse/HIVE-24542 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Hive currently uses some Guava methods that are removed in later versions. Also, in some projects, the Guava version is implicitly inherited from other projects even though Hive defines its own version; be explicit about it. These changes will make upgrading Guava easier in the future. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24484) Upgrade Hadoop to 3.2.1
David Mollitor created HIVE-24484: - Summary: Upgrade Hadoop to 3.2.1 Key: HIVE-24484 URL: https://issues.apache.org/jira/browse/HIVE-24484 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
David Mollitor created HIVE-24468: - Summary: Use Event Time instead of Current Time in Notification Log DB Entry Key: HIVE-24468 URL: https://issues.apache.org/jira/browse/HIVE-24468 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24463) Add special case for Derby and MySQL in Get Next ID DbNotificationListener
David Mollitor created HIVE-24463: - Summary: Add special case for Derby and MySQL in Get Next ID DbNotificationListener Key: HIVE-24463 URL: https://issues.apache.org/jira/browse/HIVE-24463 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor
* Derby does not support {{SELECT FOR UPDATE}} statements
* MySQL can be optimized to use {{LAST_INSERT_ID()}}

Derby tables are already locked in other parts of the code, but not in this path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
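For HIVE-24463, here is a sketch of choosing per-database SQL for "claim the next ID" inside one transaction. It uses Derby's LOCK TABLE ... IN EXCLUSIVE MODE and MySQL's LAST_INSERT_ID(expr) idiom. The table and column names are hypothetical and not the real metastore schema. The sketch also assumes the column holds the next ID to hand out.

```java
import java.util.Arrays;
import java.util.List;

final class NextIdSqlSketch {
    enum Dialect { DERBY, MYSQL, OTHER }

    // Hypothetical single-row sequence table; Hive's real table and column names may differ.
    static final String TABLE = "ID_SEQUENCE";
    static final String COLUMN = "NEXT_ID";

    // Returns, in order, the statements one transaction would run to claim the next ID.
    static List<String> nextIdStatements(Dialect dialect) {
        switch (dialect) {
            case DERBY:
                // No SELECT ... FOR UPDATE on this path: take an exclusive table lock first.
                return Arrays.asList(
                        "LOCK TABLE " + TABLE + " IN EXCLUSIVE MODE",
                        "SELECT " + COLUMN + " FROM " + TABLE,
                        "UPDATE " + TABLE + " SET " + COLUMN + " = " + COLUMN + " + 1");
            case MYSQL:
                // LAST_INSERT_ID(expr) remembers the new value for this connection, so no
                // separate locking SELECT is needed. The SELECT returns the post-increment
                // value, i.e. one more than the ID just claimed.
                return Arrays.asList(
                        "UPDATE " + TABLE + " SET " + COLUMN + " = LAST_INSERT_ID(" + COLUMN + " + 1)",
                        "SELECT LAST_INSERT_ID()");
            default:
                return Arrays.asList(
                        "SELECT " + COLUMN + " FROM " + TABLE + " FOR UPDATE",
                        "UPDATE " + TABLE + " SET " + COLUMN + " = " + COLUMN + " + 1");
        }
    }

    public static void main(String[] args) {
        for (String sql : nextIdStatements(Dialect.MYSQL)) {
            System.out.println(sql);
        }
    }
}
```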
[jira] [Created] (HIVE-24460) Refactor Get Next Event ID for DbNotificationListener
David Mollitor created HIVE-24460: - Summary: Refactor Get Next Event ID for DbNotificationListener Key: HIVE-24460 URL: https://issues.apache.org/jira/browse/HIVE-24460 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Refactor event ID generation to match notification log ID generation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24450) DbNotificationListener Request Notification IDs in Batches
David Mollitor created HIVE-24450: - Summary: DbNotificationListener Request Notification IDs in Batches Key: HIVE-24450 URL: https://issues.apache.org/jira/browse/HIVE-24450 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Every time a new notification event is logged into the database, the sequence number for the ID of the event is incremented by one. It is standard practice in database design to instead request a block of IDs with each fetch from the database. The sequence numbers are then handed out locally until the block of IDs is exhausted. This allows for fewer database round-trips and transactions, at the expense of perhaps burning a few IDs. Burning of IDs happens when the server is restarted in the middle of a block of sequence IDs. That is, if the HMS requests a block of 10 IDs and only three have been assigned, then after the restart the HMS will request another block of 10, burning (wasting) 7 IDs. As long as the blocks are not too large and restarts are infrequent, few IDs are lost. -- This message was sent by Atlassian Jira (v8.3.4#803005)
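For HIVE-24450, here is a sketch of the block allocation described above. The reserveBlock hook is hypothetical: it stands for one database transaction that advances the stored sequence by the block size and returns the first reserved ID. The main method replays the example of a block of 10, three IDs assigned, and then a restart.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

final class BlockIdAllocator {
    private final int blockSize;
    // Hypothetical hook: reserves `blockSize` IDs in the backing store (one round-trip)
    // and returns the first ID of the reserved block.
    private final LongUnaryOperator reserveBlock;
    private long next;   // next ID to hand out locally
    private long limit;  // first ID beyond the current block

    BlockIdAllocator(int blockSize, LongUnaryOperator reserveBlock) {
        if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
        this.blockSize = blockSize;
        this.reserveBlock = reserveBlock;
    }

    // Hands out IDs from memory and only goes back to the store when the block is used up.
    synchronized long nextId() {
        if (next == limit) {
            next = reserveBlock.applyAsLong(blockSize);
            limit = next + blockSize;
        }
        return next++;
    }

    public static void main(String[] args) {
        AtomicLong store = new AtomicLong(1); // simulated sequence row
        BlockIdAllocator beforeRestart = new BlockIdAllocator(10, n -> store.getAndAdd(n));
        for (int i = 0; i < 3; i++) {
            System.out.println("assigned " + beforeRestart.nextId());
        }
        // A fresh allocator models a restart: IDs 4..10 of the first block are burned.
        BlockIdAllocator afterRestart = new BlockIdAllocator(10, n -> store.getAndAdd(n));
        System.out.println("after restart " + afterRestart.nextId());
    }
}
```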
[jira] [Created] (HIVE-24432) Delete Notification Events in Batches
David Mollitor created HIVE-24432: - Summary: Delete Notification Events in Batches Key: HIVE-24432 URL: https://issues.apache.org/jira/browse/HIVE-24432 Project: Hive Issue Type: Improvement Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor Notification events are loaded in batches (which reduces memory pressure on the HMS), but all of the deletes happen in a single transaction, which can put a lot of pressure on the backend database when deleting many records. Instead, delete events in batches (in separate transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
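For HIVE-24432, here is a sketch of the batching loop. The deleteInOwnTransaction callback is hypothetical: it stands for issuing one DELETE for the given IDs and committing, so each batch is its own transaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

final class BatchedDeleteSketch {
    // Splits the IDs into consecutive batches of at most batchSize and hands each batch to
    // deleteInOwnTransaction, so no single transaction has to delete every row.
    // Returns the number of batches (and therefore transactions) used.
    static int deleteInBatches(List<Long> ids, int batchSize, Consumer<List<Long>> deleteInOwnTransaction) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        int batches = 0;
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            deleteInOwnTransaction.accept(new ArrayList<>(ids.subList(from, to)));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Long> expired = Arrays.asList(101L, 102L, 103L, 104L, 105L);
        int n = deleteInBatches(expired, 2, batch -> System.out.println("DELETE ... " + batch));
        System.out.println(n + " transactions");
    }
}
```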
[jira] [Created] (HIVE-24424) Use PreparedStatements in DbNotificationListener getNextNLId
David Mollitor created HIVE-24424: - Summary: Use PreparedStatements in DbNotificationListener getNextNLId Key: HIVE-24424 URL: https://issues.apache.org/jira/browse/HIVE-24424 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Simplify the code, remove debug logging concatenation, and make it more readable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24423) Improve DbNotificationListener Thread
David Mollitor created HIVE-24423: - Summary: Improve DbNotificationListener Thread Key: HIVE-24423 URL: https://issues.apache.org/jira/browse/HIVE-24423 Project: Hive Issue Type: Improvement Affects Versions: 3.1.0 Reporter: David Mollitor Assignee: David Mollitor Clean up and simplify the {{DbNotificationListener}} thread class. Most importantly, stop the existing thread and wait for it to finish before launching a new thread. -- This message was sent by Atlassian Jira (v8.3.4#803005)
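For HIVE-24423, here is a sketch of "stop the thread and wait for it to finish before launching a new one" using Thread.interrupt() and Thread.join(). The class and thread names are hypothetical, and the sketch assumes the background task exits when it is interrupted.

```java
final class RestartableWorker {
    private Thread worker;

    // Stops the current worker (if any), waits for it to exit, and only then starts a new
    // one, so two workers never run at the same time.
    synchronized Thread restart(Runnable task) throws InterruptedException {
        stop();
        Thread t = new Thread(task, "notification-cleaner"); // hypothetical thread name
        t.setDaemon(true);
        t.start();
        worker = t;
        return t;
    }

    synchronized void stop() throws InterruptedException {
        if (worker != null) {
            worker.interrupt();
            worker.join(); // block until the old thread has actually finished
            worker = null;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable pollUntilInterrupted = () -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(50); // stand-in for one unit of periodic cleanup work
                }
            } catch (InterruptedException e) {
                // Interrupted while sleeping: fall through and let the thread exit.
            }
        };
        RestartableWorker w = new RestartableWorker();
        Thread first = w.restart(pollUntilInterrupted);
        Thread second = w.restart(pollUntilInterrupted);
        System.out.println("old alive=" + first.isAlive() + ", new alive=" + second.isAlive());
        w.stop();
    }
}
```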
[jira] [Created] (HIVE-24332) Make AbstractSerDe Superclass of all Classes
David Mollitor created HIVE-24332: - Summary: Make AbstractSerDe Superclass of all Classes Key: HIVE-24332 URL: https://issues.apache.org/jira/browse/HIVE-24332 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Rework how the {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes are designed. Simplify, and consolidate more functionality into {{AbstractSerDe}}. Make it like Java's {{ByteChannel}}, which combines the {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces into a single type. -- This message was sent by Atlassian Jira (v8.3.4#803005)
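For HIVE-24332, here is a sketch of the ByteChannel-style layout: one combined interface extends the read and write sides, and an abstract base holds the shared plumbing. The interfaces here are hypothetical, simplified stand-ins. Hive's real Deserializer/Serializer have different, richer signatures.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified read and write sides.
interface RowDeserializer {
    Object deserialize(byte[] blob);
}

interface RowSerializer {
    byte[] serialize(Object row);
}

// Like java.nio.channels.ByteChannel, one type unifies both sides.
interface RowSerDe extends RowDeserializer, RowSerializer {
}

// Shared plumbing (here, column-name parsing) lives in the abstract base, so concrete
// SerDes only implement the actual encoding.
abstract class AbstractRowSerDe implements RowSerDe {
    private String[] columnNames = new String[0];

    void initialize(String commaSeparatedColumns) {
        columnNames = commaSeparatedColumns.isEmpty() ? new String[0] : commaSeparatedColumns.split(",");
    }

    String[] getColumnNames() {
        return columnNames.clone();
    }
}

final class Utf8TextSerDe extends AbstractRowSerDe {
    @Override
    public Object deserialize(byte[] blob) {
        return new String(blob, StandardCharsets.UTF_8);
    }

    @Override
    public byte[] serialize(Object row) {
        return row.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Utf8TextSerDe serde = new Utf8TextSerDe();
        serde.initialize("id,name");
        RowSerializer writer = serde;   // usable wherever only the write side is needed
        RowDeserializer reader = serde; // ...or only the read side
        System.out.println(reader.deserialize(writer.serialize("1,alice")));
    }
}
```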
[jira] [Created] (HIVE-24321) Implement Default getSerDeStats in AbstractSerDe
David Mollitor created HIVE-24321: - Summary: Implement Default getSerDeStats in AbstractSerDe Key: HIVE-24321 URL: https://issues.apache.org/jira/browse/HIVE-24321 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: David Mollitor Assignee: David Mollitor Seems like very few SerDes implement the getSerDeStats feature. Add a default implementation and remove all of the superfluous overrides in the implementing classes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24226) Avoid Copy of Bytes in Protobuf BinaryWriter
David Mollitor created HIVE-24226: - Summary: Avoid Copy of Bytes in Protobuf BinaryWriter Key: HIVE-24226 URL: https://issues.apache.org/jira/browse/HIVE-24226 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor
{code:java|title=ProtoWriteSupport.java}
class BinaryWriter extends FieldWriter {
  @Override
  final void writeRawValue(Object value) {
    ByteString byteString = (ByteString) value;
    Binary binary = Binary.fromConstantByteArray(byteString.toByteArray());
    recordConsumer.addBinary(binary);
  }
}
{code}
{{toByteArray()}} creates a copy of the buffer. Parquet and Protobuf already support passing a ByteBuffer instead, which avoids the copy. -- This message was sent by Atlassian Jira (v8.3.4#803005)
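For HIVE-24226, here is a JDK-only sketch of the difference between copying and viewing. A copy allocates and fills a new array, while a read-only ByteBuffer shares the original bytes. In the real writer, I believe the relevant calls would be Protobuf's ByteString.asReadOnlyByteBuffer() and Parquet's Binary.fromConstantByteBuffer(ByteBuffer), but treat that pairing as an assumption to verify against the library versions in use. The class below is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class CopyVersusViewSketch {
    // Same cost profile as toByteArray(): allocates and fills a brand-new array.
    static byte[] copyOf(byte[] source) {
        return Arrays.copyOf(source, source.length);
    }

    // Zero-copy alternative: a read-only window over the same backing bytes.
    static ByteBuffer viewOf(byte[] source) {
        return ByteBuffer.wrap(source).asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3};
        byte[] copy = copyOf(payload);
        ByteBuffer view = viewOf(payload);
        payload[0] = 42; // mutate the original to show which one shares storage
        System.out.println("copy sees " + copy[0] + ", view sees " + view.get(0));
    }
}
```

Because the view shares storage, it is only safe when the underlying bytes are immutable for the lifetime of the write. That holds for a Protobuf ByteString, which is immutable.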
[jira] [Created] (HIVE-23942) Use HADOOP-17141
David Mollitor created HIVE-23942: - Summary: Use HADOOP-17141 Key: HIVE-23942 URL: https://issues.apache.org/jira/browse/HIVE-23942 Project: Hive Issue Type: Bug Reporter: David Mollitor When available, use [HADOOP-17141] instead of the workaround produced here: [HIVE-23870] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23900) Replace Base64 in exec Package
David Mollitor created HIVE-23900: - Summary: Replace Base64 in exec Package Key: HIVE-23900 URL: https://issues.apache.org/jira/browse/HIVE-23900 Project: Hive Issue Type: Sub-task Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23899) Replace Base64 in llap Packages
David Mollitor created HIVE-23899: - Summary: Replace Base64 in llap Packages Key: HIVE-23899 URL: https://issues.apache.org/jira/browse/HIVE-23899 Project: Hive Issue Type: Sub-task Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23885) Remove Hive on Spark
David Mollitor created HIVE-23885: - Summary: Remove Hive on Spark Key: HIVE-23885 URL: https://issues.apache.org/jira/browse/HIVE-23885 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23865) Use More Java Collection Class
David Mollitor created HIVE-23865: - Summary: Use More Java Collection Class Key: HIVE-23865 URL: https://issues.apache.org/jira/browse/HIVE-23865 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23862) Clean Up StatsUtils and BasicStats
David Mollitor created HIVE-23862: - Summary: Clean Up StatsUtils and BasicStats Key: HIVE-23862 URL: https://issues.apache.org/jira/browse/HIVE-23862 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Miscellaneous improvements to readability and performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23856) Beeline Should Print Binary Data in Base64
David Mollitor created HIVE-23856: - Summary: Beeline Should Print Binary Data in Base64 Key: HIVE-23856 URL: https://issues.apache.org/jira/browse/HIVE-23856 Project: Hive Issue Type: Improvement Reporter: David Mollitor Format binary data as Base64 to make it easier for external applications to parse and easier for humans to convert using a Base64 tool. https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/Rows.java#L165 -- This message was sent by Atlassian Jira (v8.3.4#803005)
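For HIVE-23856, here is a sketch of rendering a binary cell with java.util.Base64. The "NULL" placeholder for SQL NULL is an assumption of this sketch, since Beeline's own null rendering would apply in practice.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BinaryCellFormatSketch {
    // Renders a binary cell as standard Base64 text (RFC 4648 alphabet, '=' padding, no line
    // breaks), which any Base64 tool can turn back into the original bytes.
    static String format(byte[] cell) {
        return cell == null ? "NULL" : Base64.getEncoder().encodeToString(cell);
    }

    public static void main(String[] args) {
        System.out.println(format("hello".getBytes(StandardCharsets.UTF_8)));
    }
}
```

A consumer of the output can recover the exact bytes with Base64.getDecoder().decode(text). Raw binary printed to a terminal offers no such guarantee.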
[jira] [Created] (HIVE-23836) Make "cols" dependent so that it cascade deletes
David Mollitor created HIVE-23836: - Summary: Make "cols" dependent so that it cascade deletes Key: HIVE-23836 URL: https://issues.apache.org/jira/browse/HIVE-23836 Project: Hive Issue Type: Bug Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23829) Compute Stats Incorrect for Binary Columns
David Mollitor created HIVE-23829: - Summary: Compute Stats Incorrect for Binary Columns Key: HIVE-23829 URL: https://issues.apache.org/jira/browse/HIVE-23829 Project: Hive Issue Type: Bug Reporter: David Mollitor Assignee: David Mollitor I came across an issue when working on [HIVE-22674]. The SerDe used for processing binary data tries to auto-detect if the data is in Base-64. It uses {{org.apache.commons.codec.binary.Base64#isArrayByteBase64}}, which has two issues:
# It's slow, since it checks whether the array is compatible and then processes the data (examining the array twice)
# More importantly, this method _Tests a given byte array to see if it contains only valid characters within the Base64 alphabet. Currently the method treats whitespace as valid._ https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html#isArrayByteBase64-byte:A-

The [qtest|https://github.com/apache/hive/blob/f98e136bdd5642e3de10d2fd1a4c14d1d6762113/ql/src/test/queries/clientpositive/compute_stats_binary.q] for this feature uses full sentences (which include spaces) [here|https://github.com/apache/hive/blob/f98e136bdd5642e3de10d2fd1a4c14d1d6762113/data/files/binary.txt], and therefore it thinks this data is Base-64 and returns an incorrect size estimate. It should not auto-detect Base64 data; instead, Base64 handling should be enabled with a table property. -- This message was sent by Atlassian Jira (v8.3.4#803005)
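For HIVE-23829, here is a sketch of why an alphabet-only check misfires on plain text. The lenient check below mimics the quoted behavior, where whitespace counts as valid. The strict check asks java.util.Base64 to actually decode the bytes and rejects whitespace. Both methods are hypothetical helpers, not the commons-codec implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64DetectionSketch {
    // Lenient check in the spirit of the quoted javadoc: every byte must be a Base64
    // character, '=' padding, or whitespace. Ordinary words separated by spaces pass.
    static boolean looksLikeBase64Lenient(byte[] data) {
        for (byte b : data) {
            boolean alphabet = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                    || (b >= '0' && b <= '9') || b == '+' || b == '/' || b == '=';
            boolean whitespace = b == ' ' || b == '\t' || b == '\r' || b == '\n';
            if (!alphabet && !whitespace) {
                return false;
            }
        }
        return true;
    }

    // Strict check: the bytes must decode as standard Base64; whitespace is illegal here.
    static boolean isStrictBase64(byte[] data) {
        try {
            Base64.getDecoder().decode(data);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] text = "hello world".getBytes(StandardCharsets.UTF_8);
        System.out.println("lenient=" + looksLikeBase64Lenient(text) + " strict=" + isStrictBase64(text));
    }
}
```

Even a strict check only proves that the bytes *could* be Base64. This is why the ticket's suggestion of an explicit table property is more reliable than any form of auto-detection.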
[jira] [Created] (HIVE-23818) Use String Switch-Case Statement in StatUtils
David Mollitor created HIVE-23818: - Summary: Use String Switch-Case Statement in StatUtils Key: HIVE-23818 URL: https://issues.apache.org/jira/browse/HIVE-23818 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
Failed CI Jobs
Hello Gang, Last week, all of my PRs showed "green" for successful CI runs. Today, they are all "red" and marked as failed. Did something happen over the weekend? Thanks.
[jira] [Created] (HIVE-23795) Add Additional Debugging Help for Import SQL
David Mollitor created HIVE-23795: - Summary: Add Additional Debugging Help for Import SQL Key: HIVE-23795 URL: https://issues.apache.org/jira/browse/HIVE-23795 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23793) Review of QueryInfo Class
David Mollitor created HIVE-23793: - Summary: Review of QueryInfo Class Key: HIVE-23793 URL: https://issues.apache.org/jira/browse/HIVE-23793 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
Reviewers List
Hello Gang, When I try to add reviewers in GitHub, the following committers do not come up automatically in the list of available reviewers: ashutoshc (Ashutosh Chauhan) nrg4878 (Naveen Gangam) Is there something that needs to happen administratively to allow this? Thanks, David
[jira] [Created] (HIVE-23731) Review of AvroInstance Cache
David Mollitor created HIVE-23731: - Summary: Review of AvroInstance Cache Key: HIVE-23731 URL: https://issues.apache.org/jira/browse/HIVE-23731 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Flaky test checker
Hey Zoltan, I've also seen this one several times recently: testExternalTablesReplLoadBootstrapIncr – org.apache.hadoop.hive.ql.parse.TestScheduledReplicationScenarios On Thu, Jun 11, 2020 at 10:54 AM David Mollitor wrote: > Zoltan, > > I've seen 'org.apache.hadoop.hive.kafka.TransactionalKafkaWriterTest' > failing quite a bit in some recent runs in GitHub precommit. > > Topic `TOPIC_TEST` to delete does not exist > Stacktrace > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: Topic > `TOPIC_TEST` to delete does not exist > > On Wed, Jun 10, 2020 at 9:53 AM Zoltan Haindrich wrote: > >> One more thing: there should be other builds running while the flaky >> check is being executed (otherwise it will be "alone" on a 12-core system) >> >> On 6/10/20 3:49 PM, Zoltan Haindrich wrote: >> > Hey All! >> > >> > I've fiddled around with whether to build this into the main test system or not; but >> in the end I've concluded that it will be more useful as a standalone tool >> (this makes the job a >> > bit uglier - but well...it would have made the main one uglier as well >> - so it doesn't matter which finger I'll bite) >> > >> > So...if you suspect that a test is causing trouble for no good >> reason, you could launch a run of this job, which will run it 100 times in >> a row...if it fails...well: >> > * you could open a jira which references the check you executed, which >> proves that the test is low quality >> > * please also add the "flaky-test" label to the jira >> > * add an Ignore to the test referencing the jira ticket >> > * push the commit which disables the test... >> > >> > The other use would be when re-enabling previously unreliable tests: >> > * push your branch which is supposed to stabilize the test to your own >> fork on github >> > * visit http://130.211.9.232/job/hive-flaky-check/ >> > * point the job to your user/repo/branch, and configure it to run the >> test in question to validate it >> > >> > >> > cheers, >> > Zoltan >> >
[jira] [Created] (HIVE-23704) Thrift HTTP Server Does Not Handle Auth Handle Correctly
David Mollitor created HIVE-23704: - Summary: Thrift HTTP Server Does Not Handle Auth Handle Correctly Key: HIVE-23704 URL: https://issues.apache.org/jira/browse/HIVE-23704 Project: Hive Issue Type: Bug Components: Security Affects Versions: 2.3.7, 3.1.2 Reporter: David Mollitor Assignee: David Mollitor Fix For: 4.0.0 Attachments: Base64NegotiationError.png
{code:java|title=ThriftHttpServlet.java}
  private String[] getAuthHeaderTokens(HttpServletRequest request,
      String authType) throws HttpAuthenticationException {
    String authHeaderBase64 = getAuthHeader(request, authType);
    String authHeaderString = StringUtils.newStringUtf8(
        Base64.decodeBase64(authHeaderBase64.getBytes()));
    String[] creds = authHeaderString.split(":");
    return creds;
  }
{code}
So here, it takes the authHeaderBase64 (which is a Base-64 string), converts it into bytes, and then tries to decode those bytes. That is incorrect. It should convert the Base-64 string directly into bytes. I tried to do this as part of [HIVE-22676], and the tests were failing because the string being decoded is not actually Base-64 (see attached image). Again, the existing code doesn't care because it is not parsing Base-64 text; it is parsing the bytes generated by converting Base-64 text to bytes. I'm not sure what effect this has or what security issues it may present, but it's definitely not correct. -- This message was sent by Atlassian Jira (v8.3.4#803005)
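For HIVE-23704, here is a sketch of decoding the header's Base64 text straight into bytes. It uses java.util.Base64 in place of the commons-codec calls in the original. The helper name is hypothetical, and the sketch assumes the decoded credentials are UTF-8 "user:password".

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class AuthHeaderSketch {
    // Decodes the Base64 text of the header value directly into bytes (an invalid value
    // raises IllegalArgumentException instead of being silently misread), reads the bytes
    // as UTF-8, and splits "user:password" at the first colon only.
    static String[] decodeCredentials(String authHeaderBase64) {
        byte[] decoded = Base64.getDecoder().decode(authHeaderBase64.trim());
        String credentials = new String(decoded, StandardCharsets.UTF_8);
        return credentials.split(":", 2);
    }

    public static void main(String[] args) {
        String[] creds = decodeCredentials("dXNlcjpwYXNz");
        System.out.println("user=" + creds[0] + " password=" + creds[1]);
    }
}
```

The split limit of 2 is a deliberate choice in this sketch. It keeps a password that itself contains ':' intact, whereas the original `split(":")` would break it into extra tokens.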
JUnit 5 Broke Unit Tests
Hello Gang, When I check out master and run 'mvn clean install', a bunch of unit tests are skipped and present the following error:
un 15, 2020 9:19:57 AM org.junit.platform.launcher.core.DefaultLauncher handleThrowable
WARNING: TestEngine with ID 'junit-vintage' failed to discover tests
java.lang.NoSuchMethodError: org.junit.platform.engine.EngineDiscoveryRequest.getDiscoveryListener()Lorg/junit/platform/engine/EngineDiscoveryListener;
    at org.junit.platform.engine.support.discovery.EngineDiscoveryRequestResolution.resolveCompletely(EngineDiscoveryRequestResolution.java:88)
    at org.junit.platform.engine.support.discovery.EngineDiscoveryRequestResolution.run(EngineDiscoveryRequestResolution.java:82)
    at org.junit.platform.engine.support.discovery.EngineDiscoveryRequestResolver.resolve(EngineDiscoveryRequestResolver.java:113)
    at org.junit.vintage.engine.discovery.VintageDiscoverer.discover(VintageDiscoverer.java:44)
    at org.junit.vintage.engine.VintageTestEngine.discover(VintageTestEngine.java:63)
Github PR Pre Commit Build Error
Hey Zoltan, A build just failed with: Timed out waiting for websocket connection. You should increase the value of system property org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator.websocketConnectionTimeout currently set at 60 seconds http://130.211.9.232/blue/organizations/jenkins/hive-precommit/detail/PR-1082/5/pipeline/94 Not sure if this needs to be increased. Thanks.
Re: Re-Running CI Tests
Hello, I am trying to re-run a test: http://130.211.9.232/job/hive-precommit/job/PR-1082/ Maybe it's a permission issue, but I don't see a way to do it manually. We really need a way through GitHub to re-launch the tests. On Wed, Jun 10, 2020 at 3:07 PM David Mollitor wrote: > Great, thanks! > > What triggers these builds? Do we need to add something to branch-2, > branch-3 projects to get this to trigger there? > > On Wed, Jun 10, 2020 at 3:03 PM Zoltán Haindrich wrote: > >> it's on cooldown...5 builds/day/branch >> http://34.66.156.144:8080/job/hive-precommit/job/PR-1082/ >> >> On June 10, 2020 6:55:45 PM GMT+02:00, David Mollitor >> wrote: >> >Zoltán, >> > >> >Even a PR against master is only running Travis. >> > >> >https://github.com/apache/hive/pull/1082 >> > >> >On Wed, Jun 10, 2020 at 12:52 PM David Mollitor >> >wrote: >> > >> >> Hey Zoltán, >> >> >> >> Also in regard to that PR, it only ran the travis build "mvn clean >> >> install -DskipTests -q -Pitests" >> >> >> >> What does this branch require (and 3.x) to enable running of tests? >> >> >> >> Thanks. >> >> >> >> >> >> >> >> On Wed, Jun 10, 2020 at 11:36 AM David Mollitor >> >wrote: >> >> >> >>> Hey Zoltán, >> >>> >> >>> Yes. That is correct. Community recently put out a 2.x release. >> >This >> >>> is in case someone wishes to release a new one. >> >>> >> >>> Does that have any bearing on re-running tests? >> >>> >> >>> Thanks. >> >>> >> >>> On Wed, Jun 10, 2020 at 11:32 AM Zoltán Haindrich >> >wrote: >> >>> >> >>>> That pr seems to be for branch-2 and not master >> >>>> >> >>>> On June 10, 2020 5:17:19 PM GMT+02:00, David Mollitor >> > >> >>>> wrote: >> >>>>> >> >>>>> Zoltan, >> >>>>> >> >>>>> I just tried to close/re-open a PR and I don't believe it >> >triggered a >> >>>>> new CI run: >> >>>>> >> >>>>> https://github.com/apache/hive/pull/1076 >> >>>>> >> >>>>> Thanks.
>> >>>>> >> >>>>> On Wed, Jun 10, 2020 at 10:59 AM David Mollitor >> > >> >>>>> wrote: >> >>>>> >> >>>>>> Hey Zoltan, >> >>>>>> >> >>>>>> Can you please research a way to initiate it from the GitHub >> >>>>>> interface? I have a strong feeling we're going to need such a >> >>>>>> capability regularly. >> >>>>>> >> >>>>>> Thanks. >> >>>>>> >> >>>>>> On Wed, Jun 10, 2020 at 9:29 AM Zoltan Haindrich >> >wrote: >> >>>>>> >> >>>>>>> Hey >> >>>>>>> >> >>>>>>> you could: >> >>>>>>> * push new commits to the branch >> >>>>>>>- this will create a new merge with the current master >> >>>>>>> * log in to the jenkins instance: and launch a new build of >> >that PR >> >>>>>>> * close the pr: will re-emit the github event triggering >> >the >> >>>>>>> testrun >> >>>>>>> * log in to the jenkins instance: and press retry button >> >>>>>>> >> >>>>>>> I don't know if the last method (retrigger button) will create a >> >new >> >>>>>>> merge with the current master's HEAD or not - I suspect that it >> >doesn't. >> >>>>>>> >> >>>>>>> cheers, >> >>>>>>> Zoltan >> >>>>>>> >> >>>>>>> On 6/10/20 3:18 PM, David Mollitor wrote: >> >>>>>>> > Hey Zoltan, >> >>>>>>> > >> >>>>>>> > What is the process to trigger a new CI build on GitHub if a >> >>>>>>> previous one >> >>>>>>> > failed on a flaky test, timeout, or something of that nature? >> >>>>>>> > >> >>>>>>> > Thanks. >> >>>>>>> > >> >>>>>>> >> >>>>>> >> >>>> -- >> >>>> Zoltán Haindrich >> >>>> >> >>> >> >> -- >> Zoltán Haindrich > >