[jira] [Created] (HIVE-27201) Inconsistency between session Hive and thread-local Hive may cause HS2 deadlock
Zhihua Deng created HIVE-27201: -- Summary: Inconsistency between session Hive and thread-local Hive may cause HS2 deadlock Key: HIVE-27201 URL: https://issues.apache.org/jira/browse/HIVE-27201 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Zhihua Deng Assignee: Zhihua Deng A HiveServer2 server handler can switch to processing an operation from another session. In that case, the Hive cached in ThreadLocal is not the same as the Hive in SessionState, and it can be referenced by another session. If two handlers swap their sessions to process a DatabaseMetaData request, and the HiveMetastoreClientFactory obtains the Hive via Hive.get(), a deadlock can occur. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27179) HS2 WebUI throws NPE when JspFactory loaded from jetty-runner
Zhihua Deng created HIVE-27179: -- Summary: HS2 WebUI throws NPE when JspFactory loaded from jetty-runner Key: HIVE-27179 URL: https://issues.apache.org/jira/browse/HIVE-27179 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Zhihua Deng In HIVE-17088, we resolved an NPE thrown from the HS2 WebUI by introducing javax.servlet.jsp-api. It works as expected when the javax.servlet.jsp-api jar takes precedence over the jetty-runner jar, but in some environments the NPE is still thrown when opening the HS2 web UI: {noformat} java.lang.NullPointerException at org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:286) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:71) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1443) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626) ...{noformat} The jetty-runner JspFactory.getDefaultFactory() just returns null.
[jira] [Created] (HIVE-27139) Log details when hiveserver2.sh doing sanity check with the process id
Zhihua Deng created HIVE-27139: -- Summary: Log details when hiveserver2.sh doing sanity check with the process id Key: HIVE-27139 URL: https://issues.apache.org/jira/browse/HIVE-27139 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Zhihua Deng HiveServer2 always persists the process id into a file after HIVE-22193. When some other process reuses the same pid, restarting HiveServer2 fails. Print the details of that process in this case, and delete the old pid file when HiveServer2 is decommissioned.
[jira] [Created] (HIVE-27091) Add double quotes for tables in PartitionProjectionEvaluator
Zhihua Deng created HIVE-27091: -- Summary: Add double quotes for tables in PartitionProjectionEvaluator Key: HIVE-27091 URL: https://issues.apache.org/jira/browse/HIVE-27091 Project: Hive Issue Type: Bug Components: Standalone Metastore Reporter: Zhihua Deng Assignee: Zhihua Deng When PartitionProjectionEvaluator requests partitions against PostgreSQL, an exception is thrown: {noformat} javax.jdo.JDODataStoreException: Error executing SQL query "select "SDS"."LOCATION","PARTITIONS"."CREATE_TIME","SDS"."SD_ID","PARTITIONS"."PART_ID" from PARTITIONS left outer join SDS on PARTITIONS."SD_ID" = SDS."SD_ID" left outer join SERDES on SDS."SERDE_ID" = SERDES."SERDE_ID" where "PART_ID" in (92731,92732,92733,92734,92735,92736) order by "PART_NAME" asc". … Caused by: org.postgresql.util.PSQLException: ERROR: relation "partitions" does not exist{noformat}
[jira] [Created] (HIVE-26965) Docker image for Apache Hive
Zhihua Deng created HIVE-26965: -- Summary: Docker image for Apache Hive Key: HIVE-26965 URL: https://issues.apache.org/jira/browse/HIVE-26965 Project: Hive Issue Type: Improvement Reporter: Zhihua Deng This work provides a Docker image for Hive and tracks further improvements.
[jira] [Created] (HIVE-26794) Explore changing TxnHandler#connPoolMutex to NoPoolConnectionPool
Zhihua Deng created HIVE-26794: -- Summary: Explore changing TxnHandler#connPoolMutex to NoPoolConnectionPool Key: HIVE-26794 URL: https://issues.apache.org/jira/browse/HIVE-26794 Project: Hive Issue Type: Improvement Components: Standalone Metastore Reporter: Zhihua Deng Instead of creating a fixed-size connection pool for TxnHandler#MutexAPI, the pool can be a NoPoolConnectionPool because: * TxnHandler#MutexAPI is primarily designed to provide coarse-grained mutex support to maintenance tasks running inside the Metastore, and these tasks are not user-facing; * A fixed-size connection pool, the same as the pool used in ObjectStore, is a waste for the non-leader instances in the warehouse. The NoPoolConnectionPool provides connections on demand, and TxnHandler#MutexAPI only uses the getConnection method to fetch a connection from the pool, so it is feasible to change the pool to NoPoolConnectionPool. This would make the HMS more scalable.
[jira] [Created] (HIVE-26773) Update Avro version to 1.10.2
Zhihua Deng created HIVE-26773: -- Summary: Update Avro version to 1.10.2 Key: HIVE-26773 URL: https://issues.apache.org/jira/browse/HIVE-26773 Project: Hive Issue Type: Improvement Components: Avro Reporter: Zhihua Deng Update the Avro version to 1.10.2; there is a transitive dependency on Velocity.
[jira] [Created] (HIVE-26667) Incompatible expression deserialization against latest HMS
Zhihua Deng created HIVE-26667: -- Summary: Incompatible expression deserialization against latest HMS Key: HIVE-26667 URL: https://issues.apache.org/jira/browse/HIVE-26667 Project: Hive Issue Type: Bug Components: Standalone Metastore Reporter: Zhihua Deng When an old Hive Metastore client issues listPartitionsByExpr against the latest HMS, an exception is thrown: {noformat} MetaException(message:Unable to find class: ) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_expr_result$get_partitions_by_expr_resultStandardScheme.read(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_expr_result$get_partitions_by_expr_resultStandardScheme.read(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_expr_result.read(ThriftHiveMetastore.java) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partitions_by_expr(ThriftHiveMetastore.java:3273) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions_by_expr(ThriftHiveMetastore.java:3260) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByExpr(HiveMetaStoreClient.java:1488){noformat} This is caused by a gap between the old client and the server in how the expression is (de)serialized. The old client does not write the expression's class type into the serialized bytes, while the server expects to read the class type from the serialized bytes first, which causes the failure. Other APIs that need to (de)serialize an expression may be affected as well.
[jira] [Created] (HIVE-26644) Introduce auto sizing in HMS
Zhihua Deng created HIVE-26644: -- Summary: Introduce auto sizing in HMS Key: HIVE-26644 URL: https://issues.apache.org/jira/browse/HIVE-26644 Project: Hive Issue Type: Improvement Components: Standalone Metastore Reporter: Zhihua Deng Assignee: Zhihua Deng HMS should have some ability to auto-size itself based on the enabled features, for example matching server thread pool sizes to HMS connection pool sizes, or using larger pool sizes on compaction-disabled instances for better performance.
[jira] [Created] (HIVE-26617) Remove some useless properties
Zhihua Deng created HIVE-26617: -- Summary: Remove some useless properties Key: HIVE-26617 URL: https://issues.apache.org/jira/browse/HIVE-26617 Project: Hive Issue Type: Improvement Reporter: Zhihua Deng Some properties in HiveConf or MetastoreConf are not used at all, so it is better to clean them up: * hive.metastore.initial.metadata.count.enabled * hive.timedout.txn.reaper.start * metastore.acid.housekeeper.start * metastore.initial.metadata.count.enabled
[jira] [Created] (HIVE-26561) Fix test TestMiniLlapLocalCliDriver#stats_part2
Zhihua Deng created HIVE-26561: -- Summary: Fix test TestMiniLlapLocalCliDriver#stats_part2 Key: HIVE-26561 URL: https://issues.apache.org/jira/browse/HIVE-26561 Project: Hive Issue Type: Test Components: Tests Reporter: Zhihua Deng The test is flaky and sometimes fails with: {noformat} Caused by: org.apache.derby.iapi.error.StandardException: Invalid character string format for type DECIMAL. at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) ~[derby-10.14.2.0.jar:?] at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) ~[derby-10.14.2.0.jar:?] at org.apache.derby.iapi.types.DataType.invalidFormat(Unknown Source) ~[derby-10.14.2.0.jar:?] at org.apache.derby.iapi.types.DataType.setValue(Unknown Source) ~[derby-10.14.2.0.jar:?] at org.apache.derby.exe.ac29cfd09cx0183x5e87xdb0ax2168460057f.e4(Unknown Source) ~[?:?] at org.apache.derby.impl.services.reflect.DirectCall.invoke(Unknown Source) ~[derby-10.14.2.0.jar:?] at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(Unknown Source) ~[derby-10.14.2.0.jar:?] at org.apache.derby.impl.sql.execute.NestedLoopJoinResultSet.getNextRowCore(Unknown Source) ~[derby-10.14.2.0.jar:?] at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(Unknown Source) ~[derby-10.14.2.0.jar:?] at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(Unknown Source) ~[derby-10.14.2.0.jar:?] at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(Unknown Source) ~[derby-10.14.2.0.jar:?] at org.apache.derby.impl.jdbc.EmbedResultSet.next(Unknown Source) ~[derby-10.14.2.0.jar:?] at org.apache.hive.com.zaxxer.hikari.pool.HikariProxyResultSet.next(HikariProxyResultSet.java) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT] at org.datanucleus.store.rdbms.query.ForwardQueryResult.initialise(ForwardQueryResult.java:93) ~[datanucleus-rdbms-5.2.10.jar:?] at org.datanucleus.store.rdbms.query.SQLQuery.performExecute(SQLQuery.java:687) ~[datanucleus-rdbms-5.2.10.jar:?] at org.datanucleus.store.query.Query.executeQuery(Query.java:1975) ~[datanucleus-core-5.2.10.jar:?] at org.datanucleus.store.rdbms.query.SQLQuery.executeWithArray(SQLQuery.java:818) ~[datanucleus-rdbms-5.2.10.jar:?] at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:433) ~[datanucleus-api-jdo-5.2.8.jar:?]{noformat}
[jira] [Created] (HIVE-26553) Decrease the overhead of Metastore benchmarks
Zhihua Deng created HIVE-26553: -- Summary: Decrease the overhead of Metastore benchmarks Key: HIVE-26553 URL: https://issues.apache.org/jira/browse/HIVE-26553 Project: Hive Issue Type: Improvement Components: Standalone Metastore Reporter: Zhihua Deng When running the Metastore micro-benchmarks, every partition-related method has to add new partitions before measuring, which adds a lot of overhead when running with a large number of partitions.
[jira] [Created] (HIVE-26539) Prevent unsafe deserialization in PartitionExpressionForMetastore
Zhihua Deng created HIVE-26539: -- Summary: Prevent unsafe deserialization in PartitionExpressionForMetastore Key: HIVE-26539 URL: https://issues.apache.org/jira/browse/HIVE-26539 Project: Hive Issue Type: Improvement Reporter: Zhihua Deng
[jira] [Created] (HIVE-26538) MetastoreDefaultTransformer should revise the location when it's empty
Zhihua Deng created HIVE-26538: -- Summary: MetastoreDefaultTransformer should revise the location when it's empty Key: HIVE-26538 URL: https://issues.apache.org/jira/browse/HIVE-26538 Project: Hive Issue Type: Improvement Components: Standalone Metastore Reporter: Zhihua Deng The table's location is treated as null when it is empty. This happens in places such as: [https://github.com/apache/hive/blob/82f319773cb2361a98963e861fb903ab8eecd9c4/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L2367] [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDefaultTransformer.java#L729] MetastoreDefaultTransformer should revise the empty location when altering/creating tables.
[jira] [Created] (HIVE-26509) Introduce dynamic leader election in HMS
Zhihua Deng created HIVE-26509: -- Summary: Introduce dynamic leader election in HMS Key: HIVE-26509 URL: https://issues.apache.org/jira/browse/HIVE-26509 Project: Hive Issue Type: New Feature Components: Standalone Metastore Reporter: Zhihua Deng From HIVE-21841 we have a leader HMS selected by configuring metastore.housekeeping.leader.hostname on startup. This approach saves us from running duplicated HMS housekeeping tasks cluster-wide. In this jira, we introduce a dynamic leader election that uses a Hive lock. Once an HMS owns the lock, it becomes the leader, carries out the housekeeping tasks, and sends heartbeats to renew the lock before it times out. If the leader fails to reclaim the lock, it stops any tasks it has already started. The election event is audited. This gives us a more dynamic leader when the original one goes down, or in a public cloud where the property is not well configured, and it reduces the leader's burden by running these tasks among different leaders.
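A minimal sketch of the acquire/renew cycle described above, assuming a simple in-memory lease in place of the Hive lock. The class LeaderLease, its methods, and the instance names are hypothetical and only illustrate the idea; they are not the implementation proposed in the Jira.

```java
// Hypothetical in-memory lease; a real implementation would be backed by a Hive lock.
final class LeaderLease {
  private String owner;
  private long expiresAtMillis;

  // Acquires a free or expired lease, or renews it for the current owner (the "heartbeat").
  synchronized boolean tryAcquireOrRenew(String candidate, long nowMillis, long leaseMillis) {
    boolean free = owner == null || nowMillis >= expiresAtMillis;
    if (free || owner.equals(candidate)) {
      owner = candidate;
      expiresAtMillis = nowMillis + leaseMillis;
      return true;
    }
    return false;
  }

  // A candidate is the leader only while it owns an unexpired lease.
  synchronized boolean isLeader(String candidate, long nowMillis) {
    return candidate.equals(owner) && nowMillis < expiresAtMillis;
  }
}
```

An instance that sees tryAcquireOrRenew return false after having been the leader would stop its housekeeping tasks, while the instance that newly acquires the lease would start them.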
[jira] [Created] (HIVE-26494) Fix flaky test TestJdbcWithMiniHS2 testHttpRetryOnServerIdleTimeout
Zhihua Deng created HIVE-26494: -- Summary: Fix flaky test TestJdbcWithMiniHS2 testHttpRetryOnServerIdleTimeout Key: HIVE-26494 URL: https://issues.apache.org/jira/browse/HIVE-26494 Project: Hive Issue Type: Test Reporter: Zhihua Deng The TestJdbcWithMiniHS2#testHttpRetryOnServerIdleTimeout fails on master: [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/1362/tests] It can be fixed by setting hive.server2.thrift.http.max.idle.time to a value larger than 5ms. Flaky check: http://ci.hive.apache.org/job/hive-flaky-check/585/
[jira] [Created] (HIVE-26402) HiveSchemaTool does not honor metastore-site.xml
Zhihua Deng created HIVE-26402: -- Summary: HiveSchemaTool does not honor metastore-site.xml Key: HIVE-26402 URL: https://issues.apache.org/jira/browse/HIVE-26402 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Zhihua Deng When using the following script to initialize the metastore schema, {code:java} export HIVE_CONF_DIR='/path/to/metastore_conf' ./bin/schematool -dbType mysql -initSchema{code} the schematool command fails even though we have a valid metastore-site.xml under the config path; it tries to init the default embedded db. {noformat} Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true Metastore connection Driver : org.apache.derby.jdbc.EmbeddedDriver Metastore connection User: APP Initializing the schema to: 4.0.0-alpha-2{noformat}
[jira] [Created] (HIVE-26400) Provide a self-contained docker
Zhihua Deng created HIVE-26400: -- Summary: Provide a self-contained docker Key: HIVE-26400 URL: https://issues.apache.org/jira/browse/HIVE-26400 Project: Hive Issue Type: Improvement Components: Build Infrastructure Reporter: Zhihua Deng Assignee: Zhihua Deng
[jira] [Created] (HIVE-26322) Upgrade gson to 2.9.0 due to CVE
Zhihua Deng created HIVE-26322: -- Summary: Upgrade gson to 2.9.0 due to CVE Key: HIVE-26322 URL: https://issues.apache.org/jira/browse/HIVE-26322 Project: Hive Issue Type: Improvement Reporter: Zhihua Deng
[jira] [Created] (HIVE-26058) Choose meaningful names for the Metastore pool threads
Zhihua Deng created HIVE-26058: -- Summary: Choose meaningful names for the Metastore pool threads Key: HIVE-26058 URL: https://issues.apache.org/jira/browse/HIVE-26058 Project: Hive Issue Type: Improvement Reporter: Zhihua Deng Because TThreadPoolServer#createDefaultExecutorService sets the thread name with {code:java} thread.setName("TThreadPoolServer WorkerProcess-%d"); {code} the logger outputs thread names like: {noformat} [TThreadPoolServer WorkerProcess-%d] utils.FileUtils: Renaming pfile:/{noformat} which makes it hard to identify and debug a thread.
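One possible way to get distinguishable names is to hand the executor a ThreadFactory that formats the name itself. The sketch below assumes such a factory can be supplied; the class name NamedThreadFactory and the prefix used in the usage note are hypothetical and not taken from the Hive or Thrift codebase.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: names each new thread "<prefix>-<n>" so the "%d" never shows up in logs.
final class NamedThreadFactory implements ThreadFactory {
  private final String prefix;
  private final AtomicInteger counter = new AtomicInteger(0);

  NamedThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable task) {
    Thread thread = new Thread(task);
    // Thread#setName does no formatting by itself, so the counter is substituted here.
    thread.setName(String.format("%s-%d", prefix, counter.getAndIncrement()));
    // Daemon threads are an assumption of this sketch; they do not block JVM exit.
    thread.setDaemon(true);
    return thread;
  }
}
```

With a prefix such as "Metastore-HttpHandler-Pool" (an illustrative value), successive threads would be named with the suffixes -0, -1, and so on, which makes individual workers identifiable in log output.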
[jira] [Created] (HIVE-26057) Cleanup QueryWrapper
Zhihua Deng created HIVE-26057: -- Summary: Cleanup QueryWrapper Key: HIVE-26057 URL: https://issues.apache.org/jira/browse/HIVE-26057 Project: Hive Issue Type: Improvement Reporter: Zhihua Deng QueryWrapper currently implements Query, which brings dozens of overridden methods that are not used anywhere in the codebase. These methods can be removed to keep the class simple to maintain.
[jira] [Created] (HIVE-26056) Retire the api metrics of HMSHandler
Zhihua Deng created HIVE-26056: -- Summary: Retire the api metrics of HMSHandler Key: HIVE-26056 URL: https://issues.apache.org/jira/browse/HIVE-26056 Project: Hive Issue Type: Improvement Components: Standalone Metastore Reporter: Zhihua Deng We use PerfLogger to measure and log the time spent in the metastore Thrift APIs. This is more complete and simpler than inserting start/end functions in HMSHandler to do the same thing.
[jira] [Created] (HIVE-25896) Remove getThreadId from IHMSHandler
Zhihua Deng created HIVE-25896: -- Summary: Remove getThreadId from IHMSHandler Key: HIVE-25896 URL: https://issues.apache.org/jira/browse/HIVE-25896 Project: Hive Issue Type: Improvement Components: Standalone Metastore Reporter: Zhihua Deng IHMSHandler, which is annotated as 'InterfaceAudience.Private', currently uses getThreadId to log thread information. The thread id can be logged automatically if the logger is configured properly, so the method can be removed to make IHMSHandler easier to maintain.
[jira] [Created] (HIVE-25892) Group HMSHandler's thread locals into a single context
Zhihua Deng created HIVE-25892: -- Summary: Group HMSHandler's thread locals into a single context Key: HIVE-25892 URL: https://issues.apache.org/jira/browse/HIVE-25892 Project: Hive Issue Type: Improvement Components: Standalone Metastore Reporter: Zhihua Deng There are more than six ThreadLocal variables in HMSHandler. We can group them into a single context to improve the management of these variables and the readability of the code.
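The grouping could look roughly like the sketch below: one ThreadLocal holding a context object instead of several independent ThreadLocal fields. The class HandlerContext and its fields (ipAddress, transactionDepth) are hypothetical examples, not the actual thread locals of HMSHandler.

```java
// Hypothetical per-thread context; the field names are illustrative only.
final class HandlerContext {
  // A single ThreadLocal replaces one ThreadLocal per field.
  private static final ThreadLocal<HandlerContext> CURRENT =
      ThreadLocal.withInitial(HandlerContext::new);

  private String ipAddress;
  private int transactionDepth;

  private HandlerContext() {}

  // Returns the context bound to the calling thread, creating it on first use.
  static HandlerContext get() {
    return CURRENT.get();
  }

  // Drops all per-thread state in one call, e.g. when a connection is closed.
  static void clear() {
    CURRENT.remove();
  }

  String getIpAddress() { return ipAddress; }
  void setIpAddress(String ipAddress) { this.ipAddress = ipAddress; }
  int getTransactionDepth() { return transactionDepth; }
  void setTransactionDepth(int depth) { this.transactionDepth = depth; }
}
```

One benefit of this shape is that cleanup becomes a single remove() call, so no individual thread local can be forgotten.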
[jira] [Created] (HIVE-25783) Provide rat check to the CI
Zhihua Deng created HIVE-25783: -- Summary: Provide rat check to the CI Key: HIVE-25783 URL: https://issues.apache.org/jira/browse/HIVE-25783 Project: Hive Issue Type: Improvement Components: Build Infrastructure Reporter: Zhihua Deng This Jira investigates whether we can add a rat check to the CI to make sure that newly added source files contain the ASF license information.
[jira] [Created] (HIVE-25774) Add ASF license for newly created files in standalone-metastore
Zhihua Deng created HIVE-25774: -- Summary: Add ASF license for newly created files in standalone-metastore Key: HIVE-25774 URL: https://issues.apache.org/jira/browse/HIVE-25774 Project: Hive Issue Type: Bug Components: Standalone Metastore Affects Versions: 4.0.0 Reporter: Zhihua Deng
[jira] [Created] (HIVE-25729) ThriftUnionObjectInspector should be notified when fully inited
Zhihua Deng created HIVE-25729: -- Summary: ThriftUnionObjectInspector should be notified when fully inited Key: HIVE-25729 URL: https://issues.apache.org/jira/browse/HIVE-25729 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Zhihua Deng For thread safety, a ReflectionStructObjectInspector instance waits for 3 seconds at a time to ensure the returned ObjectInspector is fully inited: {code:java} synchronized (soi) { while (!soi.isFullyInited(checkedTypes)) { // soi.wait(3000); } } {code} It seems that we never notify ThriftUnionObjectInspector when it has been inited.
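The general wait/notify pattern at stake can be sketched as follows. InitAwareInspector is a hypothetical stand-in, not the Hive class: the point is only that the initializing thread calls notifyAll() once initialization completes, so waiters wake up immediately instead of sleeping until the timeout elapses.

```java
// Hypothetical stand-in for an inspector that one thread initializes while others wait on it.
final class InitAwareInspector {
  private boolean fullyInited = false;

  // Called by the initializing thread once setup is complete.
  synchronized void markFullyInited() {
    fullyInited = true;
    // Wake every waiter right away; without this they only re-check after the timeout.
    notifyAll();
  }

  synchronized boolean isFullyInited() {
    return fullyInited;
  }

  // Called by readers; returns the number of milliseconds spent waiting.
  synchronized long awaitFullyInited(long timeoutMillis) throws InterruptedException {
    long start = System.nanoTime();
    while (!fullyInited) {
      // The loop guards against spurious wakeups and timeouts that expire before init is done.
      wait(timeoutMillis);
    }
    return (System.nanoTime() - start) / 1_000_000;
  }
}
```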
[jira] [Created] (HIVE-25582) Empty result when using offset limit with MR
Zhihua Deng created HIVE-25582: -- Summary: Empty result when using offset limit with MR Key: HIVE-25582 URL: https://issues.apache.org/jira/browse/HIVE-25582 Project: Hive Issue Type: Bug Components: Operators Affects Versions: 4.0.0 Reporter: Zhihua Deng Assignee: Zhihua Deng The _mr.ObjectCache_ caches nothing, so every time the limit operator is [retrieving global counter from the cache|https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L150-L161], a new AtomicInteger is returned. This makes _offset <= currentCountForAllTasksInt_ always evaluate to false because _offset > 0_, so the operator skips all rows.
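The effect can be reproduced with a simplified model. This is not the LimitOperator code: OffsetLimitDemo, its Cache interface, and emittedRows are hypothetical, and the row-counting logic is reduced to the bare comparison against a shared counter.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class OffsetLimitDemo {
  interface Cache {
    AtomicInteger retrieve(String key, Supplier<AtomicInteger> loader);
  }

  // Mimics the described behaviour: nothing is stored, so each call yields a new counter.
  static final class NonCachingCache implements Cache {
    @Override
    public AtomicInteger retrieve(String key, Supplier<AtomicInteger> loader) {
      return loader.get();
    }
  }

  // Keeps the first counter created for a key and returns it on later calls.
  static final class RetainingCache implements Cache {
    private final Map<String, AtomicInteger> entries = new ConcurrentHashMap<>();

    @Override
    public AtomicInteger retrieve(String key, Supplier<AtomicInteger> loader) {
      return entries.computeIfAbsent(key, k -> loader.get());
    }
  }

  // Counts the rows emitted for an offset/limit pair over totalRows input rows.
  static int emittedRows(Cache cache, int offset, int limit, int totalRows) {
    int emitted = 0;
    for (int row = 0; row < totalRows; row++) {
      AtomicInteger counter = cache.retrieve("limit-counter", AtomicInteger::new);
      int seenBefore = counter.getAndIncrement();
      // With a fresh counter on every call, seenBefore is always 0 and offset <= 0 never holds.
      if (offset <= seenBefore && seenBefore < offset + limit) {
        emitted++;
      }
    }
    return emitted;
  }
}
```

In this model, a positive offset with NonCachingCache emits no rows at all, while RetainingCache emits the expected window of rows.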
[jira] [Created] (HIVE-25448) Invalid partition columns when skew with distinct
Zhihua Deng created HIVE-25448: -- Summary: Invalid partition columns when skew with distinct Key: HIVE-25448 URL: https://issues.apache.org/jira/browse/HIVE-25448 Project: Hive Issue Type: Bug Components: Logical Optimizer Reporter: Zhihua Deng When hive.groupby.skewindata is enabled, we spray by the grouping key and distinct key if distinct is present in the first reduce sink operator.
[jira] [Created] (HIVE-25383) Make TestMarkPartitionRemote more stable
Zhihua Deng created HIVE-25383: -- Summary: Make TestMarkPartitionRemote more stable Key: HIVE-25383 URL: https://issues.apache.org/jira/browse/HIVE-25383 Project: Hive Issue Type: Test Components: Standalone Metastore Reporter: Zhihua Deng Sometimes TestMarkPartitionRemote fails with {noformat} org.apache.hadoop.hive.metastore.api.MetaException: Exception determining external table location:Default location is not available for table: file:/path/to/table at org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.transformCreateTable(MetastoreDefaultTransformer.java:660) ~[classes/:?] at org.apache.hadoop.hive.metastore.HMSHandler.create_table_core(HMSHandler.java:2325) ~[classes/:?] at org.apache.hadoop.hive.metastore.HMSHandler.create_table_req(HMSHandler.java:2578) [classes/:?]{noformat} [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2441/15/tests] [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2473/3/tests] The cause is that the table path already exists before the test is executed, and TableLocationStrategy with prohibit does not allow alternate locations.
[jira] [Created] (HIVE-25365) Insufficient priviledges to show partitions when partition columns are authorized
Zhihua Deng created HIVE-25365: -- Summary: Insufficient priviledges to show partitions when partition columns are authorized Key: HIVE-25365 URL: https://issues.apache.org/jira/browse/HIVE-25365 Project: Hive Issue Type: Bug Components: Authorization Reporter: Zhihua Deng When privileges on the partition columns have been granted to a user, showing partitions still requires the select privilege on the table, even though the user is able to query the partition columns.
[jira] [Created] (HIVE-25294) Optimise the metadata count queries for local mode
Zhihua Deng created HIVE-25294: -- Summary: Optimise the metadata count queries for local mode Key: HIVE-25294 URL: https://issues.apache.org/jira/browse/HIVE-25294 Project: Hive Issue Type: Improvement Components: Standalone Metastore Reporter: Zhihua Deng Assignee: Zhihua Deng When the Metastore is in local mode, the client uses its own private HMSHandler to get the metadata, and the HMSHandler must be initialized before it is ready to serve. When metrics are enabled, HMSHandler counts the number of databases, tables and partitions, which could lead to some problems.
[jira] [Created] (HIVE-25261) RetryingHMSHandler should wrap the MetaException with short description of the target
Zhihua Deng created HIVE-25261: -- Summary: RetryingHMSHandler should wrap the MetaException with short description of the target Key: HIVE-25261 URL: https://issues.apache.org/jira/browse/HIVE-25261 Project: Hive Issue Type: Bug Components: Standalone Metastore Reporter: Zhihua Deng Assignee: Zhihua Deng [RetryingMetaStoreClient|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java#L267-L276] relies on the message of the MetaException to decide whether to retry the current operation after a failure. However, the RetryingHMSHandler only wraps the message into the MetaException, which may leave the client unable to retry with other metastore instances. For example, if we get the exception: {code:java} Caused by: javax.jdo.JDOFatalUserException: Persistence Manager has been closed at org.datanucleus.api.jdo.JDOPersistenceManager.assertIsOpen(JDOPersistenceManager.java:2235) at org.datanucleus.api.jdo.JDOPersistenceManager.evictAll(JDOPersistenceManager.java:481) at org.apache.hadoop.hive.metastore.ObjectStore.rollbackTransaction(ObjectStore.java:635) at org.apache.hadoop.hive.metastore.ObjectStore.getTable(ObjectStore.java:1415) at sun.reflect.GeneratedMethodAccessor153.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498){code} RetryingHMSHandler throws a MetaException with the message 'Persistence Manager has been closed', which does not match the recoverable pattern defined in the client.
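To illustrate why the bare message is not enough, here is a small sketch. The regular expression is an assumed, simplified version of a client-side recoverable pattern and may differ from the one in RetryingMetaStoreClient; RetryDecisionDemo, describe, and the nested JDOFatalUserException stand-in are hypothetical.

```java
import java.util.regex.Pattern;

final class RetryDecisionDemo {
  // Assumed, simplified recoverable pattern; the real client-side pattern may differ.
  private static final Pattern RECOVERABLE =
      Pattern.compile("(?s).*(JDO[a-zA-Z]*|TProtocol|TTransport)Exception.*");

  // Hypothetical stand-in for javax.jdo.JDOFatalUserException, to keep the sketch dependency-free.
  static final class JDOFatalUserException extends RuntimeException {
    JDOFatalUserException(String message) {
      super(message);
    }
  }

  // The client only sees the message text, so the decision is made on the string alone.
  static boolean isRecoverable(String message) {
    return message != null && RECOVERABLE.matcher(message).matches();
  }

  // Prefixes the message with the cause's class name so the client can recognise the failure.
  static String describe(Throwable cause) {
    return cause.getClass().getName() + ": " + cause.getMessage();
  }
}
```

Under these assumptions, the plain message 'Persistence Manager has been closed' is not recognised as recoverable, while the same message prefixed with the exception class name is.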
[jira] [Created] (HIVE-25192) No need to create table directory for the non-native table
Zhihua Deng created HIVE-25192: -- Summary: No need to create table directory for the non-native table Key: HIVE-25192 URL: https://issues.apache.org/jira/browse/HIVE-25192 Project: Hive Issue Type: Bug Reporter: Zhihua Deng When creating non-native tables such as Kudu or HBase tables, we always create a warehouse location for them, even though they may not use the location to store data or for the job plan, so there is no need to create it. We should also skip getting the input summary of non-native tables in some cases; this avoids the OOM problem of building the hash table when the non-native table is on the build side.
[jira] [Created] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json
Zhihua Deng created HIVE-25188: -- Summary: JsonSerDe: Unable to read the string value from a nested json Key: HIVE-25188 URL: https://issues.apache.org/jira/browse/HIVE-25188 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 4.0.0 Reporter: Zhihua Deng Assignee: Zhihua Deng Steps to reproduce: create table json_table(data string, messageid string, publish_time bigint, attributes string); If the data of the table is stored like: {code:java} {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}}{code} an exception is thrown when trying to deserialize the data: Caused by: java.lang.IllegalArgumentException at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108) at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitLeafNode(HiveJsonReader.java:374) at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:216) at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitStructNode(HiveJsonReader.java:327) at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:221) at org.apache.hadoop.hive.serde2.json.HiveJsonReader.parseStruct(HiveJsonReader.java:198) at org.apache.hadoop.hive.serde2.JsonSerDe.deserialize(JsonSerDe.java:181)
[jira] [Created] (HIVE-25055) Improve the exception handling in HMSHandler
Zhihua Deng created HIVE-25055: -- Summary: Improve the exception handling in HMSHandler Key: HIVE-25055 URL: https://issues.apache.org/jira/browse/HIVE-25055 Project: Hive Issue Type: Improvement Components: Standalone Metastore Reporter: Zhihua Deng Assignee: Zhihua Deng
[jira] [Created] (HIVE-25048) Refine the start/end functions in HMSHandler
Zhihua Deng created HIVE-25048: -- Summary: Refine the start/end functions in HMSHandler Key: HIVE-25048 URL: https://issues.apache.org/jira/browse/HIVE-25048 Project: Hive Issue Type: Improvement Components: Standalone Metastore Reporter: Zhihua Deng Assignee: Zhihua Deng Some start/end functions in the HMSHandler are incomplete. These functions can audit user actions, monitor performance, and notify the listeners.
[jira] [Created] (HIVE-24969) Predicates are removed by PPD when left semi join followed by lateral view
Zhihua Deng created HIVE-24969: -- Summary: Predicates are removed by PPD when left semi join followed by lateral view Key: HIVE-24969 URL: https://issues.apache.org/jira/browse/HIVE-24969 Project: Hive Issue Type: Bug Components: Logical Optimizer Reporter: Zhihua Deng Assignee: Zhihua Deng Steps to reproduce: {code:java} select count(distinct logItem.triggerId) from service_stat_log LATERAL VIEW explode(logItems) LogItemTable AS logItem where logItem.dsp in ('delivery', 'ocpa') and logItem.iswin = true and logItem.adid in ( select distinct adId from ad_info where subAccountId in (16010, 14863)); {code} The predicates _logItem.dsp in ('delivery', 'ocpa')_ and _logItem.iswin = true_ are removed when doing PPD: JOIN -> RS -> LVJ. The JOIN has candidates: logitem -> [logItem.dsp in ('delivery', 'ocpa'), logItem.iswin = true]. When pushing them to the RS followed by the LVJ, none of them are pushed, and the candidates of logitem are finally removed by default, which leads to the wrong result.
[jira] [Created] (HIVE-24901) Re-enable tests in TestBeeLineWithArgs
Zhihua Deng created HIVE-24901: -- Summary: Re-enable tests in TestBeeLineWithArgs Key: HIVE-24901 URL: https://issues.apache.org/jira/browse/HIVE-24901 Project: Hive Issue Type: Test Components: Test Reporter: Zhihua Deng Re-enable the tests in TestBeeLineWithArgs, because they are stable on master now: http://ci.hive.apache.org/job/hive-flaky-check/219/
[jira] [Created] (HIVE-24802) Show operation log at webui
Zhihua Deng created HIVE-24802: -- Summary: Show operation log at webui Key: HIVE-24802 URL: https://issues.apache.org/jira/browse/HIVE-24802 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Zhihua Deng Currently we provide getQueryLog in HiveStatement to fetch the operation log, and the operation log is deleted when the operation closes (after a delay for a canceled operation). Sometimes it is not easy for the user (JDBC) or administrators to dig into the details of a finished (failed) operation, so we present the operation log on the web UI and keep it for some time for later analysis. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24792) Potential thread leak in Operation
Zhihua Deng created HIVE-24792: -- Summary: Potential thread leak in Operation Key: HIVE-24792 URL: https://issues.apache.org/jira/browse/HIVE-24792 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Zhihua Deng The _scheduledExecutorService_ in _Operation_ is not shut down after scheduling the delayed operation log cleanup, which may result in a thread leak in HiveServer2... -- This message was sent by Atlassian Jira (v8.3.4#803005)
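The kind of leak described in HIVE-24792 can be illustrated with plain java.util.concurrent. This is a minimal sketch, not the Hive patch: the class and method names (DelayedCleanupSketch, cleanupRunsAfterShutdown) are hypothetical. It relies on the documented default of ScheduledThreadPoolExecutor that already-scheduled delayed tasks still run after shutdown(), so the executor can be shut down right after the cleanup is scheduled.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class DelayedCleanupSketch {
  // Hypothetical helper: schedules one delayed cleanup, shuts the executor down,
  // and reports whether the cleanup still ran and the worker thread terminated.
  static boolean cleanupRunsAfterShutdown(long delayMillis) {
    AtomicBoolean cleaned = new AtomicBoolean(false);
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.schedule(() -> cleaned.set(true), delayMillis, TimeUnit.MILLISECONDS);
    // shutdown() rejects new tasks but, by default, still runs the delayed task.
    // Without this call the worker thread would stay alive indefinitely.
    scheduler.shutdown();
    try {
      boolean terminated = scheduler.awaitTermination(5, TimeUnit.SECONDS);
      return terminated && cleaned.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("cleanup ran and executor terminated: " + cleanupRunsAfterShutdown(50));
  }
}
```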
[jira] [Created] (HIVE-24752) Returned operation's drilldown link may be broken since HIVE-23625
Zhihua Deng created HIVE-24752: -- Summary: Returned operation's drilldown link may be broken since HIVE-23625 Key: HIVE-24752 URL: https://issues.apache.org/jira/browse/HIVE-24752 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Zhihua Deng The path spec for the query page has changed from _query_page_ to _query_page.html_, {code:java} webServer.addServlet("query_page", "/query_page.html", QueryProfileServlet.class);{code} the drilldown link of the operation returned may be broken if hive.server2.show.operation.drilldown.link is enabled... -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24694) Early connection close to release server resources during creating
Zhihua Deng created HIVE-24694: -- Summary: Early connection close to release server resources during creating Key: HIVE-24694 URL: https://issues.apache.org/jira/browse/HIVE-24694 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Zhihua Deng Assignee: Zhihua Deng If an exception happens while we try to get the connection from HiveDriver, the opened transport or session may be left unclosed. Because the connection returned is null, we cannot call the close method to release the server resources (threads/connection quota). This gets worse if the user reaches the connection limit: subsequent calls to get a connection will fail until we restart HS2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
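The idea in HIVE-24694 (release what was already opened when construction fails, because the caller never receives an object to close) can be sketched as follows. Everything here is a hypothetical stand-in: Resource, Connection and tryConnect are illustrative names and do not correspond to the Hive JDBC classes.

```java
import java.util.ArrayList;
import java.util.List;

final class EarlyCloseSketch {
  // Stand-in for a transport or session; records when it is closed.
  static final class Resource implements AutoCloseable {
    private final String name;
    private final List<String> log;
    Resource(String name, List<String> log) { this.name = name; this.log = log; }
    @Override public void close() { log.add("closed " + name); }
  }

  // Hypothetical connection whose constructor may fail after the transport is open.
  static final class Connection implements AutoCloseable {
    private final Resource transport;
    Connection(List<String> log, boolean failWhileOpeningSession) {
      transport = new Resource("transport", log);
      try {
        if (failWhileOpeningSession) {
          throw new IllegalStateException("simulated failure while opening the session");
        }
      } catch (RuntimeException e) {
        // The caller never gets a reference, so release the transport here.
        transport.close();
        throw e;
      }
    }
    @Override public void close() { transport.close(); }
  }

  static List<String> tryConnect(boolean fail) {
    List<String> log = new ArrayList<>();
    try {
      new Connection(log, fail); // on success the caller would own and later close it
    } catch (IllegalStateException e) {
      log.add("connect failed");
    }
    return log;
  }

  public static void main(String[] args) {
    System.out.println(tryConnect(true));
    System.out.println(tryConnect(false));
  }
}
```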
[jira] [Created] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string
Zhihua Deng created HIVE-24666: -- Summary: Vectorized UDFToBoolean may unable to filter rows if input is string Key: HIVE-24666 URL: https://issues.apache.org/jira/browse/HIVE-24666 Project: Hive Issue Type: Bug Components: Vectorization Reporter: Zhihua Deng Assignee: Zhihua Deng If we use a cast to boolean in the where condition to filter rows, the filter fails to filter rows under vectorized execution. Steps to reproduce: {code:java} create table vtb (key string, value string); insert into table vtb values('0', 'val0'), ('false', 'valfalse'),('off', 'valoff'),('no','valno'),('vk', 'valvk'); select distinct value from vtb where cast(key as boolean); {code} It seems we don't generate a SelectColumnIsTrue to filter the rows if the input of the cast is a string: https://github.com/apache/hive/blob/ff6f3565e50148b7bcfbcf19b970379f2bd59290/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2995-L2996 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24639) Raises SemanticException other than ClassCastException when filter has non-boolean expressions
Zhihua Deng created HIVE-24639: -- Summary: Raises SemanticException other than ClassCastException when filter has non-boolean expressions Key: HIVE-24639 URL: https://issues.apache.org/jira/browse/HIVE-24639 Project: Hive Issue Type: Improvement Reporter: Zhihua Deng Sometimes we see a ClassCastException in filters when fetching some rows of a table or executing the query. The GenericUDFOPOr/GenericUDFOPAnd/FilterOperator assume that the output of their conditions is a boolean, but there is no guarantee of that. For example: _select * from ccn_table where src + 1;_ will throw ClassCastException: {code:java} Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Boolean at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:125) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:153) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:553) ...{code} We had better validate the filter during analysis instead of at runtime and produce a more meaningful message. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24632) Replace with null when GenericUDFBaseCompare has a non-interpretable val
Zhihua Deng created HIVE-24632: -- Summary: Replace with null when GenericUDFBaseCompare has a non-interpretable val Key: HIVE-24632 URL: https://issues.apache.org/jira/browse/HIVE-24632 Project: Hive Issue Type: Improvement Components: Parser Affects Versions: 4.0.0 Reporter: Zhihua Deng The query {code:java} create table ccn_table(key int, value string); set hive.cbo.enable=false; select * from ccn_table where key > '123a' ; {code} will scan all records (partitions), unlike older versions, as the plan shows: {noformat} STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: ccn_table filterExpr: (key > '123a') (type: boolean) Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column stats: COMPLETE GatherStats: false Filter Operator isSamplingPred: false predicate: (key > '123a') (type: boolean) Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: key (type: int), value (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: COMPLETE ListSink{noformat} When TypeCheckProcFactory#getXpathOrFuncExprNodeDesc validates the expr +key > '123a'+, the operator (>) is not the equality operator (=), so the factory returns +key > '123a'+ as is. However, all the subclasses of GenericUDFBaseCompare (except GenericUDFOPEqualNS and GenericUDFOPNotEqualNS) return null if either child of the function is null, so it is safe to return a constant null when processing the expr +`key > '123a'`+. This will benefit some queries when CBO is disabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24575) VectorGroupByOperator reusing keys can lead to wrong results
Zhihua Deng created HIVE-24575: -- Summary: VectorGroupByOperator reusing keys can lead to wrong results Key: HIVE-24575 URL: https://issues.apache.org/jira/browse/HIVE-24575 Project: Hive Issue Type: Bug Components: Vectorization Reporter: Zhihua Deng Assignee: Zhihua Deng A common SQL query like {code:java} select category as category, count(distinct maskdid) as uv from dwd_internal_inc_d group by category{code} can return a wrong result on trunk: the values of column category get mixed up and the distinct aggregate of maskdid is also wrong. After some debugging, we found that the problem is caused by a wrong byteStarts[i] when it is used to copy the current keys to the reusable keys: [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneral.java#L351-L362] The byteStarts[i] is always 0 due to Arrays.fill(byteStarts, 0); so when clone.byteValues[i].length >= byteValues[i].length holds, it copies len bytes of the current keys starting from 0 rather than from the real start index into the reusable keys, which causes the problem. -- This message was sent by Atlassian Jira (v8.3.4#803005)
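The effect of copying from offset 0 instead of the real start index, as reported in HIVE-24575, can be shown with a small stand-alone sketch. It does not reproduce the layout of VectorHashKeyWrapperGeneral; the class name, the method copyKey and the sample buffer are assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;

final class ByteStartSketch {
  // Copies `length` bytes of a key that lives inside a larger shared buffer.
  static String copyKey(byte[] source, int start, int length) {
    byte[] reusable = new byte[length];
    System.arraycopy(source, start, reusable, 0, length);
    return new String(reusable, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Hypothetical shared buffer; the key "books" starts at index 4 with length 5.
    byte[] batch = "foodbooks".getBytes(StandardCharsets.UTF_8);
    System.out.println(copyKey(batch, 4, 5)); // real start index: the intended key
    System.out.println(copyKey(batch, 0, 5)); // start forced to 0: bytes of another key leak in
  }
}
```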
[jira] [Created] (HIVE-24511) Fix typo in SerDeStorageSchemaReader
Zhihua Deng created HIVE-24511: -- Summary: Fix typo in SerDeStorageSchemaReader Key: HIVE-24511 URL: https://issues.apache.org/jira/browse/HIVE-24511 Project: Hive Issue Type: Improvement Components: Standalone Metastore Reporter: Zhihua Deng 1, Close the created classloader to release resources. 2, More detail error messages on MetaException when throwing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24422) Throw SemanticException when CTE alias is conflicted with table name
Zhihua Deng created HIVE-24422: -- Summary: Throw SemanticException when CTE alias is conflicted with table name Key: HIVE-24422 URL: https://issues.apache.org/jira/browse/HIVE-24422 Project: Hive Issue Type: Improvement Components: Parser Reporter: Zhihua Deng If the alias of CTE is conflicted with the table name, we use the alias fetching the table other than replacing it with the ASTNode tree, this may cause some confusing problems. For example: {noformat} create table game_info (game_name string); with game_info as ( select distinct ext_id, dev_app_id, game_name from game_info_extend ) select count(game_name) from game_info;{noformat} The query will return the number of rows of the table game_info, instead of the game_info_extend. Maybe we should better throw an exception to avoid such cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24411) Make ThreadPoolExecutorWithOomHook more awareness of OutOfMemoryError
Zhihua Deng created HIVE-24411: -- Summary: Make ThreadPoolExecutorWithOomHook more awareness of OutOfMemoryError Key: HIVE-24411 URL: https://issues.apache.org/jira/browse/HIVE-24411 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Zhihua Deng Assignee: Zhihua Deng Now the ThreadPoolExecutorWithOomHook invokes some oom hooks and stops the HiveServer2 in case of OutOfMemoryError when executing the tasks. The exception is obtained by calling method `future.get()`, however the exception may never be an instance of OutOfMemoryError, as the exception is wrapped in ExecutionException, see the method report in class FutureTask. -- This message was sent by Atlassian Jira (v8.3.4#803005)
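The wrapping behaviour mentioned in HIVE-24411 is standard java.util.concurrent behaviour and can be checked in isolation. The sketch below is not the ThreadPoolExecutorWithOomHook code; OomUnwrapSketch and its methods are hypothetical names, and the OutOfMemoryError is simulated by throwing it directly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class OomUnwrapSketch {
  // Runs a task that throws a simulated OutOfMemoryError and returns
  // whatever future.get() throws to the caller.
  static Throwable thrownByGet() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Callable<Void> task = () -> { throw new OutOfMemoryError("simulated"); };
      Future<Void> future = pool.submit(task);
      future.get();
      return null;
    } catch (Exception e) {
      return e;
    } finally {
      pool.shutdown();
    }
  }

  // A direct instanceof check misses the error because it is wrapped.
  static boolean isOomDirect(Throwable t) {
    return t instanceof OutOfMemoryError;
  }

  // Checking the cause of the wrapper finds it.
  static boolean isOomUnwrapped(Throwable t) {
    return t != null && t.getCause() instanceof OutOfMemoryError;
  }

  public static void main(String[] args) {
    Throwable t = thrownByGet();
    System.out.println(t.getClass().getName());
    System.out.println("direct check: " + isOomDirect(t));
    System.out.println("cause check: " + isOomUnwrapped(t));
  }
}
```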
[jira] [Created] (HIVE-24358) Some tasks should set exception on failures
Zhihua Deng created HIVE-24358: -- Summary: Some tasks should set exception on failures Key: HIVE-24358 URL: https://issues.apache.org/jira/browse/HIVE-24358 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Zhihua Deng Some tasks miss setting exception on failures. This information is useful for beeline users figuring out the problem and the configured failure hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24351) Report progress to prevent merge task from timeout
Zhihua Deng created HIVE-24351: -- Summary: Report progress to prevent merge task from timeout Key: HIVE-24351 URL: https://issues.apache.org/jira/browse/HIVE-24351 Project: Hive Issue Type: Improvement Reporter: Zhihua Deng If the MergeFileTask tries to merge lots of empty files, the task may be terminated due to task timeout. It’s rare, but it happens. Report the progress regularly to prevent the mapper from timeout. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24310) Allow specified number of deserialize errors to be ignored
Zhihua Deng created HIVE-24310: -- Summary: Allow specified number of deserialize errors to be ignored Key: HIVE-24310 URL: https://issues.apache.org/jira/browse/HIVE-24310 Project: Hive Issue Type: Improvement Components: Operators Reporter: Zhihua Deng Assignee: Zhihua Deng Sometimes we see some corrupted records in user's raw data, like one corrupted in a file which contains over thousands of records, user has to either give up all records or replay the whole data in order to run successfully on hive, we should provide a way to ignore such corrupted records. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24248) TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky
Zhihua Deng created HIVE-24248: -- Summary: TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky Key: HIVE-24248 URL: https://issues.apache.org/jira/browse/HIVE-24248 Project: Hive Issue Type: Bug Reporter: Zhihua Deng [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1205/26/tests] {code:java} java.lang.AssertionError: Client Execution succeeded but contained differences (error code = 1) after executing subquery_join_rewrite.q 241,244d240 < 1 1 < 1 2 < 2 1 < 2 2 245a242,243 > 2 2 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24146) Cleanup TaskExecutionException in GenericUDTFExplode
Zhihua Deng created HIVE-24146: -- Summary: Cleanup TaskExecutionException in GenericUDTFExplode Key: HIVE-24146 URL: https://issues.apache.org/jira/browse/HIVE-24146 Project: Hive Issue Type: Improvement Components: UDF Reporter: Zhihua Deng Assignee: Zhihua Deng - Remove TaskExecutionException, which may be not used anymore; - Remove the default handling in GenericUDTFExplode#process, which has been verified during the function initializing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24107) Fix typo in ReloadFunctionsOperation
Zhihua Deng created HIVE-24107: -- Summary: Fix typo in ReloadFunctionsOperation Key: HIVE-24107 URL: https://issues.apache.org/jira/browse/HIVE-24107 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Zhihua Deng Hive.get() will register all functions as doRegisterAllFns is true, so Hive.get().reloadFunctions() may load all functions from metastore twice, use Hive.get(false) instead may be better. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24106) Abort polling on the operation state when the current thread is interrupted
Zhihua Deng created HIVE-24106: -- Summary: Abort polling on the operation state when the current thread is interrupted Key: HIVE-24106 URL: https://issues.apache.org/jira/browse/HIVE-24106 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Zhihua Deng If a HiveStatement runs asynchronously as a task, for example in a thread or a future, and we interrupt the task, the HiveStatement continues to poll the operation state until it finishes. It may be better to provide a way to abort the execution in such a case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
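The behaviour requested in HIVE-24106 amounts to a polling loop that honours the thread's interrupt flag. The following is a generic sketch of that pattern, not the HiveStatement implementation; pollUntilDone and its parameters are hypothetical.

```java
import java.util.function.BooleanSupplier;

final class InterruptiblePollSketch {
  // Polls until isDone reports true. Returns false if the polling was
  // aborted because the current thread was interrupted.
  static boolean pollUntilDone(BooleanSupplier isDone, long intervalMillis) {
    while (!isDone.getAsBoolean()) {
      if (Thread.currentThread().isInterrupted()) {
        return false;
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        // Restore the flag so callers can still observe the interrupt.
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println("finished: " + pollUntilDone(() -> true, 10));
    Thread.currentThread().interrupt();
    System.out.println("finished: " + pollUntilDone(() -> false, 10));
    Thread.interrupted(); // clear the flag before exiting
  }
}
```

Restoring the interrupt flag rather than swallowing it leaves the decision about what to do next (for example, cancelling the server-side operation) to the caller.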
[jira] [Created] (HIVE-24069) HiveHistory should log the task that ends abnormally
Zhihua Deng created HIVE-24069: -- Summary: HiveHistory should log the task that ends abnormally Key: HIVE-24069 URL: https://issues.apache.org/jira/browse/HIVE-24069 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Zhihua Deng When the task returns with the exitVal not equal to 0, The Executor would skip marking the task return code and calling endTask. This may make the history log incomplete for such tasks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24063) SqlFunctionConverter#getHiveUDF handles cast before geting FunctionInfo
Zhihua Deng created HIVE-24063: -- Summary: SqlFunctionConverter#getHiveUDF handles cast before geting FunctionInfo Key: HIVE-24063 URL: https://issues.apache.org/jira/browse/HIVE-24063 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Zhihua Deng When the current SqlOperator is SqlCastFunction, FunctionRegistry.getFunctionInfo would return null, but when hive.allow.udf.load.on.demand is enabled, HiveServer2 will refer to metastore for the function definition, an exception stack trace can be seen here in HiveServer2 log: INFO exec.FunctionRegistry: Unable to look up default.cast in metastore org.apache.hadoop.hive.ql.metadata.HiveException: NoSuchObjectException(message:Function @hive#default.cast does not exist) at org.apache.hadoop.hive.ql.metadata.Hive.getFunction(Hive.java:5495) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfoFromMetastoreNoLock(Registry.java:788) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Registry.getQualifiedFunctionInfo(Registry.java:657) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfo(Registry.java:351) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:597) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.translator.SqlFunctionConverter.getHiveUDF(SqlFunctionConverter.java:158) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:112) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] So it's may be better to handle explicit cast before geting the FunctionInfo from Registry. Even if there is no cast in the query, the method handleExplicitCast returns null quickly when op.kind is not a SqlKind.CAST. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24044) Implement listPartitionNames with filter or order on temporary tables
Zhihua Deng created HIVE-24044: -- Summary: Implement listPartitionNames with filter or order on temporary tables Key: HIVE-24044 URL: https://issues.apache.org/jira/browse/HIVE-24044 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 4.0.0 Reporter: Zhihua Deng Temporary tables can have their own partitions, and IMetaStoreClient use {code:java} List listPartitionNames(PartitionsByExprRequest request){code} to filter or sort the results. This method can be implemented on temporary tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23997) Some logs in ConstantPropagateProcFactory are not straightforward
Zhihua Deng created HIVE-23997: -- Summary: Some logs in ConstantPropagateProcFactory are not straightforward Key: HIVE-23997 URL: https://issues.apache.org/jira/browse/HIVE-23997 Project: Hive Issue Type: Improvement Components: Logical Optimizer Reporter: Zhihua Deng Assignee: Zhihua Deng Some logs in ConstantPropagateProcFactory are not easy to understand, like query: select * from tbl where a = 'a1'; showing some logs like this: optimizer.ConstantPropagateProcFactory: Filter org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqual@78907a46 is identified as a value assignment, propagate it. Maybe It's better to log like this: optimizer.ConstantPropagateProcFactory: Filter (a = 'a1') is identified as a value assignment, propagate it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23893) Extract deterministic conditions for pdd when the predicate contains non-deterministic function
Zhihua Deng created HIVE-23893: -- Summary: Extract deterministic conditions for pdd when the predicate contains non-deterministic function Key: HIVE-23893 URL: https://issues.apache.org/jira/browse/HIVE-23893 Project: Hive Issue Type: Improvement Components: Logical Optimizer Reporter: Zhihua Deng Taken the following query for example, assume unix_timestamp is non-deterministic before version 1.3.0: {{SELECT}} {{ from_unixtime(unix_timestamp(a.first_dt), 'MMdd') AS ft,}} {{ b.game_id AS game_id,}} {{ b.game_name AS game_name,}} {{ count(DISTINCT a.sha1_imei) uv}} {{FROM}} {{ gamesdk_userprofile a}} {{ JOIN game_info_all b ON a.appid = b.dev_app_id}} {{WHERE}} {{ a.date = 20200704}} {{ AND from_unixtime(unix_timestamp(a.first_dt), 'MMdd') = 20200704}} {{ AND b.date = 20200704}} {{GROUP BY}} {{ from_unixtime(unix_timestamp(a.first_dt), 'MMdd'),}} {{ b.game_id,}} {{ b.game_name}} {{ORDER BY}} {{ uv DESC}} {{LIMIT 200;}} The predicates(a.date = 20200704, b.date = 20200704) are unable to push down to join op, make the optimizer unable to prune partitions, which may result to a full scan on tables gamesdk_userprofile and game_info_all. {{}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23850) Allow PPD when subject is not a column with grouping sets present
Zhihua Deng created HIVE-23850: -- Summary: Allow PPD when subject is not a column with grouping sets present Key: HIVE-23850 URL: https://issues.apache.org/jira/browse/HIVE-23850 Project: Hive Issue Type: Bug Components: Logical Optimizer Reporter: Zhihua Deng After [HIVE-19653|https://issues.apache.org/jira/browse/HIVE-19653], filters with only columns and constants are pushed down, but in some cases, this may not work as well, for example: SET hive.cbo.enable=false; SELECT a, b, sum(s) FROM T1 GROUP BY a, b GROUPING SETS ((a), (a, b)) HAVING upper(a) = "AAA" AND sum(s) > 100; SELECT upper(a), b, sum(s) FROM T1 GROUP BY upper(a), b GROUPING SETS ((upper(a)), (upper(a), b)) HAVING upper(a) = "AAA" AND sum(s) > 100; The filters pushed down to GBY can be f(gbyKey) or gbyKey with udf , not only the column groupby keys. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23800) Make HiveServer2 oom hook interface
Zhihua Deng created HIVE-23800: -- Summary: Make HiveServer2 oom hook interface Key: HIVE-23800 URL: https://issues.apache.org/jira/browse/HIVE-23800 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Zhihua Deng -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23797) Throwing exception when no metastore spec found in zookeeper
Zhihua Deng created HIVE-23797: -- Summary: Throwing exception when no metastore spec found in zookeeper Key: HIVE-23797 URL: https://issues.apache.org/jira/browse/HIVE-23797 Project: Hive Issue Type: Bug Reporter: Zhihua Deng When service discovery is enabled for the metastore, there is a chance that the client finds no metastore URIs available in ZooKeeper, such as during metastore startup or when the client is configured with the wrong path. This results in redundant retries and finally a MetaException with an "Unknown exception" message. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23727) Improve SQLOperation log handling when cleanup
Zhihua Deng created HIVE-23727: -- Summary: Improve SQLOperation log handling when cleanup Key: HIVE-23727 URL: https://issues.apache.org/jira/browse/HIVE-23727 Project: Hive Issue Type: Improvement Reporter: Zhihua Deng The SQLOperation checks _if (shouldRunAsync() && state != OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the background task. If true, the state should not be OperationState.CANCELED, so logging under the state == OperationState.CANCELED should never happen. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23722) Emit operation's drilldown link to client
Zhihua Deng created HIVE-23722: -- Summary: Emit operation's drilldown link to client Key: HIVE-23722 URL: https://issues.apache.org/jira/browse/HIVE-23722 Project: Hive Issue Type: Improvement Reporter: Zhihua Deng Now the HiveServer2 web UI provides a drilldown link for many collected metrics or messages about an operation, but it's not easy for an end user to find the target URL of a submitted query. Limited knowledge of the deployment, an HA-based environment (such as using LVS for balancing or routing), and multiple running queries can make things more difficult. This jira provides a way to emit the link to the interested end user when enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23720) Background task should be interrupted when operation being canceled or timeout
Zhihua Deng created HIVE-23720: -- Summary: Background task should be interrupted when operation being canceled or timeout Key: HIVE-23720 URL: https://issues.apache.org/jira/browse/HIVE-23720 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Zhihua Deng Currently SQLOperation cancels the background task only when the following condition is met: if (shouldRunAsync() && state != OperationState.CANCELED && state != OperationState.TIMEDOUT) The condition evaluates to false when the state is OperationState.CANCELED or OperationState.TIMEDOUT, but operations in such states should stop the background tasks to release resources. -- This message was sent by Atlassian Jira (v8.3.4#803005)
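The condition quoted in HIVE-23720 can be evaluated in isolation to see which states skip the cancellation. The enum below is a local stand-in with only a few states, and cancelsBackgroundTask is a hypothetical method that mirrors the quoted expression; neither is the Hive class.

```java
final class CancelConditionSketch {
  // Local stand-in for the operation states named in the report.
  enum OperationState { RUNNING, FINISHED, CANCELED, TIMEDOUT }

  // Mirrors the condition quoted in the issue description.
  static boolean cancelsBackgroundTask(boolean runAsync, OperationState state) {
    return runAsync && state != OperationState.CANCELED && state != OperationState.TIMEDOUT;
  }

  public static void main(String[] args) {
    for (OperationState state : OperationState.values()) {
      System.out.println(state + " -> " + cancelsBackgroundTask(true, state));
    }
  }
}
```

For an asynchronous operation the expression is false exactly for CANCELED and TIMEDOUT, which are the two states in which the report says the background task should be stopped.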
[jira] [Created] (HIVE-23633) Metastore some JDO query objects do not close properly
Zhihua Deng created HIVE-23633: -- Summary: Metastore some JDO query objects do not close properly Key: HIVE-23633 URL: https://issues.apache.org/jira/browse/HIVE-23633 Project: Hive Issue Type: Bug Reporter: Zhihua Deng After applying the patch for [HIVE-10895|https://issues.apache.org/jira/browse/HIVE-10895], the metastore still shows a memory leak on DB resources: many StatementImpls are left unclosed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23546) Skip authorization when user is a superuser
Zhihua Deng created HIVE-23546: -- Summary: Skip authorization when user is a superuser Key: HIVE-23546 URL: https://issues.apache.org/jira/browse/HIVE-23546 Project: Hive Issue Type: Improvement Reporter: Zhihua Deng If the current user is a superuser, there is no need to do authorization. This can speed up queries, especially for those ddl queries. For example, the superuser use show partitions to determine whether is OK to add partitions when the external data is ready, or take a work flow one step further in a busy hive cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23526) Out of sequence seen in Beeline may swallow the real problem
Zhihua Deng created HIVE-23526: -- Summary: Out of sequence seen in Beeline may swallow the real problem Key: HIVE-23526 URL: https://issues.apache.org/jira/browse/HIVE-23526 Project: Hive Issue Type: Improvement Components: Beeline Environment: Hive 1.2.2 Reporter: Zhihua Deng Sometimes we can see an 'out of sequence response' message in beeline, for example: Error: org.apache.thrift.TApplicationException: CloseOperation failed: out of sequence response (state=08S01,code=0) java.sql.SQLException: org.apache.thrift.TApplicationException: CloseOperation failed: out of sequence response at org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:198) at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:217) at org.apache.hive.beeline.Commands.execute(Commands.java:891) at org.apache.hive.beeline.Commands.sql(Commands.java:713) at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:976) at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:816) at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:774) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:487) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:470) and there are no other messages that help to figure it out. This makes the problem puzzling, as beeline does not have a concurrency problem on the underlying thrift transport. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23269) Unsafe compares bigints and chars
Zhihua Deng created HIVE-23269: -- Summary: Unsafe compares bigints and chars Key: HIVE-23269 URL: https://issues.apache.org/jira/browse/HIVE-23269 Project: Hive Issue Type: Improvement Reporter: Zhihua Deng Comparing bigints with varchars or chars may produce a wrong result, for example: CREATE TABLE test_a (appid1 varchar(256), appid2 char(20)); INSERT INTO test_a VALUES ('2882303761517473127', '2882303761517473127'), ('2882303761517473276','2882303761517473276'); SET hive.strict.checks.type.safety=false; SELECT appid1 FROM test_a WHERE appid1 = 2882303761517473127; SELECT appid2 FROM test_a WHERE appid2 = 2882303761517473127; Both queries will output the row: ('2882303761517473276','2882303761517473276') -- This message was sent by Atlassian Jira (v8.3.4#803005)
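One plausible explanation for the result in HIVE-23269 is loss of precision, assuming both sides are converted to double before they are compared (the report itself does not state the conversion path). A double has a 53-bit significand, so near 2.88e18 adjacent doubles are 512 apart and the two values from the example, which differ by 149, round to the same double. The class and method names below are hypothetical.

```java
final class BigintAsDoubleSketch {
  // Compares a bigint with a numeric string after converting both to double.
  static boolean equalAsDouble(long bigint, String text) {
    return (double) bigint == Double.parseDouble(text);
  }

  // Compares the same values exactly, as 64-bit integers.
  static boolean equalAsLong(long bigint, String text) {
    return bigint == Long.parseLong(text);
  }

  public static void main(String[] args) {
    long literal = 2882303761517473127L;
    String other = "2882303761517473276";
    System.out.println("equal as double: " + equalAsDouble(literal, other));
    System.out.println("equal as long:   " + equalAsLong(literal, other));
  }
}
```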
[jira] [Created] (HIVE-23185) Historic queries lost after HS2 restart
Zhihua Deng created HIVE-23185: -- Summary: Historic queries lost after HS2 restart Key: HIVE-23185 URL: https://issues.apache.org/jira/browse/HIVE-23185 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Zhihua Deng QueryInfoCache caches historic queries in memory, when HS2 restart due to OOM or upgrade, the queries are no longer seen at webui. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22989) Don't close parent classloader when session being closed
Zhihua Deng created HIVE-22989: -- Summary: Don't close parent classloader when session being closed Key: HIVE-22989 URL: https://issues.apache.org/jira/browse/HIVE-22989 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Zhihua Deng When HiveServer2 loads UDFs, Registry uses the session-specified classloader to load them and caches the classloader. When the user doesn't set the aux jars, the cached classloader is the same as the session's parent classloader. In our case, we don't set the aux jars, but we update the session's parent classloader periodically to update user jars dynamically. Registry should do a sanity check when it closes the cached classloaders. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22983) Address the comments on ConstantPropagate
Zhihua Deng created HIVE-22983: -- Summary: Address the comments on ConstantPropagate Key: HIVE-22983 URL: https://issues.apache.org/jira/browse/HIVE-22983 Project: Hive Issue Type: Improvement Components: Logical Optimizer Reporter: Zhihua Deng The constantPropagate traverse the DAG from root to child, the child won’t start until all his parents have been visited. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22458) Add more constraints on showing partitions
Zhihua Deng created HIVE-22458: -- Summary: Add more constraints on showing partitions Key: HIVE-22458 URL: https://issues.apache.org/jira/browse/HIVE-22458 Project: Hive Issue Type: New Feature Reporter: Zhihua Deng When we show the partitions of a table with thousands of partitions, all the partitions are returned and it's not easy to pick out the specified one, which makes show partitions hard to use. We can add where/limit/order by constraints to show partitions, like: show partitions table_name [partition_specs] where partition_field >= value order by partition_field desc limit n; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-19818) SessionState getQueryId returns an empty string
Zhihua Deng created HIVE-19818: -- Summary: SessionState getQueryId returns an empty string Key: HIVE-19818 URL: https://issues.apache.org/jira/browse/HIVE-19818 Project: Hive Issue Type: Bug Affects Versions: 1.2.2 Reporter: Zhihua Deng When we execute SQL asynchronously, a new configuration based on the one the session holds is created and passed to the driver instance, which results in an empty string being returned when SessionState#getQueryId is called later on. This problem can be seen in HadoopJobExecHelper.java. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-16114) NullPointerException in TezSessionPoolManager when getting the session
Zhihua Deng created HIVE-16114: -- Summary: NullPointerException in TezSessionPoolManager when getting the session Key: HIVE-16114 URL: https://issues.apache.org/jira/browse/HIVE-16114 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Zhihua Deng Priority: Minor hive version: apache-hive-2.1.1 we use hue(3.11.0) connecting to the HiveServer2. when hue starts up, it works with no problems, a few hours passed, when we use the same sql, an exception about unable to initialize TezTask will come into being. -- This message was sent by Atlassian JIRA (v6.3.15#6346)