[jira] [Created] (HIVE-27173) Add method for Spark to be able to trigger DML events
Naveen Gangam created HIVE-27173: Summary: Add method for Spark to be able to trigger DML events Key: HIVE-27173 URL: https://issues.apache.org/jira/browse/HIVE-27173 Project: Hive Issue Type: Improvement Reporter: Naveen Gangam Spark currently uses Hive.java from Hive as a convenient way to avoid having to deal with the HMS client and the Thrift objects directly. Hive has support for DML events (it can generate events on DML operations) but does not expose a public method to do so; it has a private method that takes Hive objects such as Table. It would be nice to have something that accepts more primitive datatypes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
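A rough sketch of what such a public, primitive-typed entry point could look like. The names (fireInsertEvent, DmlEvent) and the shape are illustrative assumptions, not existing Hive API; a real implementation would hand the event to the HMS client rather than just return it.

```java
import java.util.List;

// Hedged sketch of HIVE-27173's ask: a public, primitive-typed entry point
// for firing DML events. Illustrative names, not Hive's actual API.
public class DmlEventSketch {

    // Plain value object standing in for the notification event Hive builds
    // internally from Table/Partition objects.
    public static class DmlEvent {
        public final String dbName;
        public final String tableName;
        public final List<String> partitionVals;
        public final List<String> newFiles;

        DmlEvent(String dbName, String tableName,
                 List<String> partitionVals, List<String> newFiles) {
            this.dbName = dbName;
            this.tableName = tableName;
            this.partitionVals = partitionVals;
            this.newFiles = newFiles;
        }
    }

    // Instead of requiring org.apache.hadoop.hive.ql.metadata.Table, a caller
    // such as Spark passes plain strings and lists.
    public static DmlEvent fireInsertEvent(String dbName, String tableName,
                                           List<String> partitionVals,
                                           List<String> newFiles) {
        if (dbName == null || tableName == null) {
            throw new IllegalArgumentException("dbName and tableName are required");
        }
        // A real implementation would forward this to the metastore's
        // fire_listener_event path; here we only build the descriptor.
        return new DmlEvent(dbName, tableName, partitionVals, newFiles);
    }
}
```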
[jira] [Created] (HIVE-27063) LDAP+JWT auth forms not supported
Naveen Gangam created HIVE-27063: Summary: LDAP+JWT auth forms not supported Key: HIVE-27063 URL: https://issues.apache.org/jira/browse/HIVE-27063 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Naveen Gangam In HIVE-25875, support for multiple authentication forms was added for Hive Server. In HIVE-25575, support for JWT authentication was added. However, setting hive.server2.authentication="JWT,LDAP" will fail with the following validation error. {noformat} <12>1 2023-02-03T09:32:11.018Z hiveserver2-0 hiveserver2 1 0393cf91-48f7-49e3-b2b1-b983000d4cd6 [mdc@18060 class="server.HiveServer2" level="WARN" thread="main"] Error starting HiveServer2 on attempt 2, will retry in 6ms\rorg.apache.hive.service.ServiceException: Failed to Start HiveServer2\r at org.apache.hive.service.CompositeService.start(CompositeService.java:80)\r at org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:692)\r at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1154)\r at org.apache.hive.service.server.HiveServer2.access$1400(HiveServer2.java:145)\r at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:1503)\r at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:1316)\r at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\r at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\r at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\r at java.base/java.lang.reflect.Method.invoke(Method.java:566)\r at org.apache.hadoop.util.RunJar.run(RunJar.java:318)\r at org.apache.hadoop.util.RunJar.main(RunJar.java:232)\rCaused by: java.lang.RuntimeException: Failed to init HttpServer\r at org.apache.hive.service.cli.thrift.ThriftHttpCLIService.initServer(ThriftHttpCLIService.java:239)\r at 
org.apache.hive.service.cli.thrift.ThriftCLIService.start(ThriftCLIService.java:235)\r at org.apache.hive.service.CompositeService.start(CompositeService.java:70)\r ... 11 more\rCaused by: java.lang.Exception: The authentication types have conflicts: LDAP,JWT\r at org.apache.hive.service.auth.AuthType.verifyTypes(AuthType.java:69)\r at org.apache.hive.service.auth.AuthType.<init>(AuthType.java:43)\r at org.apache.hive.service.cli.thrift.ThriftHttpServlet.<init>(ThriftHttpServlet.java:124)\r at org.apache.hive.service.cli.thrift.ThriftHttpCLIService.initServer(ThriftHttpCLIService.java:197)\r ... 13 more\r {noformat} We never updated AuthType.verifyTypes() to support this combination. -- This message was sent by Atlassian Jira (v8.20.10#820010)
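As an illustration of what a relaxed check could look like, here is a standalone sketch (not Hive's actual AuthType class). The set of combinable mechanisms is an assumption drawn from the header-based HTTP mechanisms discussed in these issues (LDAP, SAML, JWT); password-less modes such as Kerberos stay exclusive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of a relaxed verifyTypes: combinations are valid when every
// configured mechanism is a header-based HTTP mechanism. Illustrative only.
public class AuthTypeSketch {

    // Assumption for this sketch: mechanisms that can coexist on HTTP transport.
    private static final Set<String> COMBINABLE = Set.of("LDAP", "SAML", "JWT");

    public static boolean verifyTypes(String configured) {
        List<String> types = Arrays.stream(configured.split(","))
                .map(s -> s.trim().toUpperCase())
                .collect(Collectors.toList());
        if (types.size() == 1) {
            return true;  // a single mechanism is always valid
        }
        // Multiple mechanisms are valid only if all of them are combinable.
        return types.stream().allMatch(COMBINABLE::contains);
    }
}
```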
[jira] [Created] (HIVE-26568) Upgrade Log4j2 to 2.18.0 due to CVEs
Naveen Gangam created HIVE-26568: Summary: Upgrade Log4j2 to 2.18.0 due to CVEs Key: HIVE-26568 URL: https://issues.apache.org/jira/browse/HIVE-26568 Project: Hive Issue Type: Bug Affects Versions: 3.1.2 Reporter: weidong Assignee: Hankó Gergely Fix For: 4.0.0, 4.0.0-alpha-1 High-severity security vulnerability (CVE-2021-44832) in the Log4j version bundled with Hive. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26566) Upgrade H2 database version to 2.1.214
Naveen Gangam created HIVE-26566: Summary: Upgrade H2 database version to 2.1.214 Key: HIVE-26566 URL: https://issues.apache.org/jira/browse/HIVE-26566 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Fix For: 4.0.0, 4.0.0-alpha-1 The 1.3.166 version, which is in use in Hive, suffers from the following security vulnerabilities: https://nvd.nist.gov/vuln/detail/CVE-2021-42392 https://nvd.nist.gov/vuln/detail/CVE-2022-23221 In the project, we use H2 only for testing purposes (inside the jdbc-handler module), so the H2 binaries are not present in the runtime classpath and these CVEs do not pose a problem for Hive or its users. Nevertheless, it would be good to upgrade to a more recent version to avoid Hive coming up in vulnerability scans due to this. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26502) Improve LDAP auth to support include generic user filters
Naveen Gangam created HIVE-26502: Summary: Improve LDAP auth to support include generic user filters Key: HIVE-26502 URL: https://issues.apache.org/jira/browse/HIVE-26502 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 4.0.0-alpha-1 Reporter: Naveen Gangam Assignee: Naveen Gangam Currently, Hive's LDAP user filtering is based on configuring a set of patterns in which wildcards are replaced by usernames and searched for. While this model supports advanced filtering options, where a corporate LDAP can have users in different orgs and trees, it does not quite support generic LDAP searches like this: (&(uid={0})(objectClass=person)) To support this without changing the semantics of the existing configuration params, and to remain backward compatible, we can enhance the existing custom query functionality. With a configuration like the following, we should be able to perform a search for the user whose uid matches the username being authenticated: hive.server2.authentication.ldap.baseDN dc=apache,dc=org hive.server2.authentication.ldap.customLDAPQuery (&(uid={0})(objectClass=person)) -- This message was sent by Atlassian Jira (v8.20.10#820010)
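The two properties at the end of the issue, written out as a hive-site.xml fragment; the property names are from the issue text, the values are the issue's example values:

```xml
<!-- Sketch of the configuration described above (hive-site.xml). -->
<property>
  <name>hive.server2.authentication.ldap.baseDN</name>
  <value>dc=apache,dc=org</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.customLDAPQuery</name>
  <!-- {0} is replaced with the username being authenticated. -->
  <value>(&amp;(uid={0})(objectClass=person))</value>
</property>
```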
[jira] [Created] (HIVE-26321) Upgrade commons-io to 2.11.0
Naveen Gangam created HIVE-26321: Summary: Upgrade commons-io to 2.11.0 Key: HIVE-26321 URL: https://issues.apache.org/jira/browse/HIVE-26321 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Upgrade commons-io to 2.11.0 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26118) [Standalone Beeline] Jar name mismatch between build and assembly
Naveen Gangam created HIVE-26118: Summary: [Standalone Beeline] Jar name mismatch between build and assembly Key: HIVE-26118 URL: https://issues.apache.org/jira/browse/HIVE-26118 Project: Hive Issue Type: Sub-task Components: Beeline Affects Versions: 3.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam The fix from HIVE-25750 has an issue where beeline builds a jar named "jar-with-dependencies.jar" but the assembly looks for a jar named "original-jar-with-dependencies.jar". Thus this uber jar never gets included in the distribution. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-26046) MySQL's bit datatype is default to void datatype in hive
Naveen Gangam created HIVE-26046: Summary: MySQL's bit datatype is default to void datatype in hive Key: HIVE-26046 URL: https://issues.apache.org/jira/browse/HIVE-26046 Project: Hive Issue Type: Sub-task Components: Standalone Metastore Affects Versions: 4.0.0 Reporter: Naveen Gangam Running DESCRIBE on a table that contains a "bit" datatype shows the column mapped to void. We need an explicit conversion rule in the MySQL ConnectorProvider to map it to a suitable Hive datatype. {noformat}
+--------------------+---------------+--------------------+
| col_name           | data_type     | comment            |
+--------------------+---------------+--------------------+
| tbl_id             | bigint        | from deserializer  |
| create_time        | int           | from deserializer  |
| db_id              | bigint        | from deserializer  |
| last_access_time   | int           | from deserializer  |
| owner              | varchar(767)  | from deserializer  |
| owner_type         | varchar(10)   | from deserializer  |
| retention          | int           | from deserializer  |
| sd_id              | bigint        | from deserializer  |
| tbl_name           | varchar(256)  | from deserializer  |
| tbl_type           | varchar(128)  | from deserializer  |
| view_expanded_text | string        | from deserializer  |
| view_original_text | string        | from deserializer  |
| is_rewrite_enabled | void          | from deserializer  |
| write_id           | bigint        | from deserializer  |
+--------------------+---------------+--------------------+
{noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-26045) Detect timed out connections for providers and auto-reconnect
Naveen Gangam created HIVE-26045: Summary: Detect timed out connections for providers and auto-reconnect Key: HIVE-26045 URL: https://issues.apache.org/jira/browse/HIVE-26045 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Naveen Gangam For the connectors, we use single connection, no pooling. But when the connection is idle for an extended period, the JDBC connection times out. We need to check for closed connections (Connection.isClosed()?) and re-establish the connection. Otherwise it renders the connector fairly useless. {noformat} 2022-03-17T13:02:16,635 WARN [HiveServer2-Handler-Pool: Thread-116] thrift.ThriftCLIService: Error executing statement: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Unable to fetch table temp_dbs. Error retrieving remote table:com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No operations allowed after connection closed. at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:373) ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:211) ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:265) ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at org.apache.hive.service.cli.operation.Operation.run(Operation.java:285) ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:576) ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:562) ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at 
sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) ~[?:?] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_231] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231] at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_231] at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_231] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) ~[hadoop-common-3.1.0.jar:?] at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at com.sun.proxy.$Proxy44.executeStatementAsync(Unknown Source) ~[?:?] 
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315) ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:567) ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1550) ~[hive-exec-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1530) ~[hive-exec-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) ~[hive-exec-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) ~[hive-exec-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) ~[hive-service-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313) ~[hive-exec-3.1.3000.7.2.15.0-SNAPSHOT.jar:3.1.3000.7.2.15.0-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
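The reconnect-on-demand behavior HIVE-26045 asks for can be sketched as follows. Handle is a stand-in for java.sql.Connection (where the real check would be Connection.isClosed() or isValid(timeout)); none of these class or method names are the provider's actual code.

```java
import java.util.function.Supplier;

// Sketch of a provider that transparently replaces a connection the
// server closed while it sat idle, instead of failing the next query.
public class ReconnectingProvider {

    // Minimal stand-in for a JDBC connection handle.
    public static class Handle {
        private boolean closed;
        public void close() { closed = true; }       // simulates an idle timeout
        public boolean isClosed() { return closed; }
    }

    private final Supplier<Handle> factory;  // opens a fresh connection
    private Handle current;

    public ReconnectingProvider(Supplier<Handle> factory) {
        this.factory = factory;
    }

    // Hand out a live connection: if the cached one has been closed
    // (e.g. by a server-side idle timeout), open a replacement first.
    public synchronized Handle getConnection() {
        if (current == null || current.isClosed()) {
            current = factory.get();
        }
        return current;
    }
}
```

With java.sql.Connection, isValid(timeout) is generally preferable to isClosed(), since isClosed() only reflects an explicit local close, not a connection dropped by the server.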
[jira] [Created] (HIVE-26012) HMS APIs to be enhanced for metadata replication
Naveen Gangam created HIVE-26012: Summary: HMS APIs to be enhanced for metadata replication Key: HIVE-26012 URL: https://issues.apache.org/jira/browse/HIVE-26012 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 3.1.0 Reporter: Naveen Gangam HMS currently has APIs like these that automatically create/delete the directories on the associated DFS. [create/drop]_database [create/drop]_table* [add/append/drop]_partition* This is expected and should be this way when query processors use these APIs. However, when tools that replicate Hive metadata use these APIs on the target cluster, creating these dirs on the target side causes the replication of DFS snapshots to fail. So if we provide an option to bypass this creation of dirs, DFS replication will be smoother. In the future we will need to restrict the users that can use these APIs, so we will have some sort of an authorization policy. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25875) Support multiple authentication mechanisms simultaneously
Naveen Gangam created HIVE-25875: Summary: Support multiple authentication mechanisms simultaneously Key: HIVE-25875 URL: https://issues.apache.org/jira/browse/HIVE-25875 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 3.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Currently, HS2 supports a single form of auth on any given instance of HiveServer2. Hive should be able to support multiple auth mechanisms on a single instance, especially with the HTTP transport; for example, LDAP and SAML. In both cases, HS2 ends up receiving an Authorization header in the request. Similarly, we should be able to support JWT or other forms of boundary authentication that are done outside of Hive. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25855) Make a branch-3 release
Naveen Gangam created HIVE-25855: Summary: Make a branch-3 release Key: HIVE-25855 URL: https://issues.apache.org/jira/browse/HIVE-25855 Project: Hive Issue Type: Bug Reporter: Naveen Gangam Assignee: Naveen Gangam This jira is to track commits for a hive release off branch-3 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25798) Update pom.xml
Naveen Gangam created HIVE-25798: Summary: Update pom.xml Key: HIVE-25798 URL: https://issues.apache.org/jira/browse/HIVE-25798 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Naveen Gangam Assignee: Naveen Gangam -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25214) Add hive authorization support for Data connectors.
Naveen Gangam created HIVE-25214: Summary: Add hive authorization support for Data connectors. Key: HIVE-25214 URL: https://issues.apache.org/jira/browse/HIVE-25214 Project: Hive Issue Type: Sub-task Reporter: Naveen Gangam Assignee: Naveen Gangam We need to add authorization support for data connectors in Hive. The default behavior should be: 1) Connectors can be created/dropped by users in the admin role. 2) Connectors have READ and WRITE permissions. * READ permissions are required to fetch a connector object or fetch all connector names. So to create a REMOTE database using a connector, users will need READ permission on the connector. DDL queries like "show connectors" and "describe " will check for read access on the connector as well. * WRITE permissions are required to alter/drop a connector. DDL queries like "alter connector" and "drop connector" will need WRITE access on the connector. With this support added, Ranger can integrate with it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25213) Implement List getTables() for existing connectors.
Naveen Gangam created HIVE-25213: Summary: Implement List getTables() for existing connectors. Key: HIVE-25213 URL: https://issues.apache.org/jira/browse/HIVE-25213 Project: Hive Issue Type: Sub-task Reporter: Naveen Gangam Assignee: Naveen Gangam In the initial implementation, connector providers do not implement the getTables(String pattern) SPI; we had deferred it for later. Only getTableNames() and getTable() were implemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24970) Reject location and managed locations in DDL for REMOTE databases.
Naveen Gangam created HIVE-24970: Summary: Reject location and managed locations in DDL for REMOTE databases. Key: HIVE-24970 URL: https://issues.apache.org/jira/browse/HIVE-24970 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam This was part of the review feedback from Yongzhi. Creating a followup jira to track this discussion. "So, using a DB connector for a DB will not create managed tables?" @nrg4878 (Author): We don't support create/drop/alter in REMOTE databases at this point; the concept of managed vs. external is not in the picture yet. When we do support it, it will be applicable to the hive connectors only (or other hive-based connectors like AWS Glue). @nrg4878 (Author): Will file a separate jira for this. Basically, instead of ignoring the location and managedlocation that may be specified for a remote database, the grammar needs to not accept any locations in the DDL at all. The argument is fair: why accept something we do not honor, or that is entirely irrelevant, for such databases? However, this requires some thought when we have additional connectors for remote hive instances; it might have some relevance in terms of security with Ranger etc. So will create a new jira for followup discussion. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24942) Consider use of lambda expressions in formatters.
Naveen Gangam created HIVE-24942: Summary: Consider use of lambda expressions in formatters. Key: HIVE-24942 URL: https://issues.apache.org/jira/browse/HIVE-24942 Project: Hive Issue Type: Sub-task Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Narayanan Venkateswaran
ArrayList<String> dcDescription = new ArrayList<>();
dcDescription.add(connector);
dcDescription.add(type);
dcDescription.add(ownerName);
dcDescription.add(ownerType);
dcDescription.add(HiveStringUtils.escapeJava(comment));
dcDescription.add(params.toString());
Consumer<String> description_handler = (param) -> { out.write(param.getBytes(StandardCharsets.UTF_8)); };
dcDescription.forEach(description_handler);
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24941) [Evaluate] if ReplicationSpec is needed for DataConnectors.
Naveen Gangam created HIVE-24941: Summary: [Evaluate] if ReplicationSpec is needed for DataConnectors. Key: HIVE-24941 URL: https://issues.apache.org/jira/browse/HIVE-24941 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam We have a ReplicationSpec on Connector. It is not clear this is needed if we do not want to replicate connectors. public ReplicationSpec getReplicationSpec() { return replicationSpec; } -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24938) [Evaluate] Dataconnector URL validation on create
Naveen Gangam created HIVE-24938: Summary: [Evaluate] Dataconnector URL validation on create Key: HIVE-24938 URL: https://issues.apache.org/jira/browse/HIVE-24938 Project: Hive Issue Type: Sub-task Reporter: Naveen Gangam Assignee: Naveen Gangam From the review feedback, there was a comment about validating the URL specified in the connector when it is created. Currently, there is no validation except for checking for an empty/null value. This is by design and the desired behavior, IMHO, but filing this to be discussed with a wider audience. {noformat} I tried creating a connector without the mysql JDBC URL specified properly and it went through, please see below: CREATE CONNECTOR mysql_test_2 TYPE 'mysql' URL 'jdbc://' COMMENT 'test connector' WITH DCPROPERTIES ( "hive.sql.dbcp.username"="hive1", "hive.sql.dbcp.password"="hive1"); CREATE CONNECTOR mysql_test_3 TYPE 'mysql' URL 'jdbc:derby://nightly1.apache.org:3306/hive1' COMMENT 'test connector' WITH DCPROPERTIES ( "hive.sql.dbcp.username"="hive1", "hive.sql.dbcp.password"="hive1"); I am not saying they are wrong, but we should probably call this out in the documentation. Document that URLs are not verified. Another thing I noticed is that the password is displayed in plain text on the command line. This used to be considered a security problem in a product I worked on in a past life. But I notice that an external table can be created with these semantics, so I guess it is acceptable here. It is also stored in plain text in the metastore, please see below: CREATE TABLE DATACONNECTOR_PARAMS ( NAME VARCHAR(128) NOT NULL, PARAM_KEY VARCHAR(180) NOT NULL, PARAM_VALUE VARCHAR(4000), PRIMARY KEY (NAME, PARAM_KEY), CONSTRAINT DATACONNECTOR_NAME_FK1 FOREIGN KEY (NAME) REFERENCES DATACONNECTORS (NAME) ON DELETE CASCADE ) ENGINE=InnoDB DEFAULT CHARSET=latin1; Again, I am not saying this is a problem, but I thought I would call it out to you. @nrg4878 (Author): We check for null/empty values for the URL and error out in those cases. Other than that, any non-empty value is accepted. I don't think we should check for correctness of the URL, or even can for that matter. a) The URL is meant to be a freeform value against dozens of datasource types (mysql, postgres, hive, AWS Glue, Redshift etc). For each such source type, there could be dozens of variations of the URL (including properties and other params specific to the source). So I don't think we can meaningfully detect incorrect URLs. For example, with MySQL, though the URL might look fine syntactically, we cannot confirm that dbName1 or dbName2 exist without actually attempting to connect to the DB: jdbc:mysql://<host>:3306/dbName1 jdbc:mysql://<host>:3306/dbName2 b) The format of the URLs could change over time as well. It is an unnecessary burden to maintain new formats in Hive; we want to be able to plug in a new datasource type by simply adding a provider. c) To validate the URL, we would have to establish the connection to the datasource at creation time. We are trying to delay making that connection as long as possible, until an actual "show tables" is called, so we avoid using up extra resources and leaking connections. d) Users can do "create connector" followed by "alter connector set url", so any incorrect URLs can be modified using alter. Also, in that case we would be checking the URL twice. Better to have the onus of configuring it correctly on the end user. Passwords can be secured using jceks files as described in the "Securing Password" section of the doc below: https://cwiki.apache.org/confluence/display/Hive/JDBC+Storage+Handler So users have an option of using non-CTVs. {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
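The jceks approach mentioned at the end can be sketched with Hadoop's credential CLI; the alias and keystore path below are illustrative, not taken from the issue, and the details are in the JDBC Storage Handler doc linked above.

```shell
# Store the password in a Hadoop credential keystore instead of plain text
# in DCPROPERTIES. Alias and jceks path are illustrative.
hadoop credential create hive.sql.dbcp.password \
    -provider jceks://hdfs/user/hive/secrets/dbcp.jceks
```

The connector would then reference the keystore rather than carrying "hive.sql.dbcp.password" as a cleartext property.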
[jira] [Created] (HIVE-24887) getDatabase() to call translation code even if client has no capabilities
Naveen Gangam created HIVE-24887: Summary: getDatabase() to call translation code even if client has no capabilities Key: HIVE-24887 URL: https://issues.apache.org/jira/browse/HIVE-24887 Project: Hive Issue Type: Sub-task Reporter: Naveen Gangam Assignee: Naveen Gangam We do this for other calls that go through the translation layer. For some reason, the current code only calls it when the client sets the capabilities. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24844) Add implementation for a 'hive' connector provider
Naveen Gangam created HIVE-24844: Summary: Add implementation for a 'hive' connector provider Key: HIVE-24844 URL: https://issues.apache.org/jira/browse/HIVE-24844 Project: Hive Issue Type: Sub-task Reporter: Naveen Gangam This connector implementation will allow HMS to communicate with remote HMS instances for metadata. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24770) Upgrade should update changed FQN in HMS DB.
Naveen Gangam created HIVE-24770: Summary: Upgrade should update changed FQN in HMS DB. Key: HIVE-24770 URL: https://issues.apache.org/jira/browse/HIVE-24770 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam While the parent change does not cause this directly, post upgrade the existing tables that use MultiDelimiterSerDe will be broken, as the hive-contrib jar will no longer exist. If the Hive schema upgrade script instead updates the SERDES table to alter the classname to the new classname, the old tables will work automatically, a much better user experience. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24464) Evaluate the need to have directSQL implementation for data connectors
Naveen Gangam created HIVE-24464: Summary: Evaluate the need to have directSQL implementation for data connectors Key: HIVE-24464 URL: https://issues.apache.org/jira/browse/HIVE-24464 Project: Hive Issue Type: Sub-task Reporter: Naveen Gangam I expect that there will be just a handful of connectors, not hundreds of them like databases. But creating a placeholder item to evaluate at a future time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24461) Provide CachedStore implementation for dataconnectors
Naveen Gangam created HIVE-24461: Summary: Provide CachedStore implementation for dataconnectors Key: HIVE-24461 URL: https://issues.apache.org/jira/browse/HIVE-24461 Project: Hive Issue Type: Sub-task Reporter: Naveen Gangam Currently, none of the connectors are cached. They are all delegated to the ObjectStore for every call. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24452) Add a generic JDBC implementation that can be used to other JDBC DBs
Naveen Gangam created HIVE-24452: Summary: Add a generic JDBC implementation that can be used to other JDBC DBs Key: HIVE-24452 URL: https://issues.apache.org/jira/browse/HIVE-24452 Project: Hive Issue Type: Sub-task Reporter: Naveen Gangam Currently, we added a custom provider for each of the JDBC DBs supported by Hive (MySQL, Postgres, MSSQL (pending), Oracle (pending) and Derby (pending)). But if there are other JDBC databases we want to add support for, a generic JDBC provider that Hive can default to would be useful. This means: 1) We have to support a means to indicate that a connector is for a JDBC datasource, so maybe add a property in DCPROPERTIES on the connector to indicate that the datasource supports JDBC. 2) If there is no custom connector for a data source, use the GenericJDBCDatasource connector that is to be added as part of this jira. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24451) Add schema changes for MSSQL
Naveen Gangam created HIVE-24451: Summary: Add schema changes for MSSQL Key: HIVE-24451 URL: https://issues.apache.org/jira/browse/HIVE-24451 Project: Hive Issue Type: Sub-task Reporter: Naveen Gangam The current patch does not include schema changes for the MSSQL backend. This should follow right after the initial commit. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24449) Implement connector provider for Derby DB
Naveen Gangam created HIVE-24449: Summary: Implement connector provider for Derby DB Key: HIVE-24449 URL: https://issues.apache.org/jira/browse/HIVE-24449 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Provide an implementation of Connector provider for Derby DB. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24448) Support case-sensitivity for tables in REMOTE database.
Naveen Gangam created HIVE-24448: Summary: Support case-sensitivity for tables in REMOTE database. Key: HIVE-24448 URL: https://issues.apache.org/jira/browse/HIVE-24448 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Naveen Gangam Hive tables are case-insensitive, so any case specified in user queries is converted to lower case for query planning, and all of the HMS metadata is also persisted with lower-case names. However, with REMOTE data sources, certain data sources support case-sensitive table names. So the HiveServer2 query planner needs to preserve the user-provided case when calling HMS APIs, for HMS to be able to fetch the metadata from the remote data source. We now see something like this {noformat} 2020-11-25T16:45:36,402 WARN [HiveServer2-Handler-Pool: Thread-76] thrift.ThriftCLIService: Error executing statement: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: RuntimeException MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error while trying to get column names: Table 'hive1.txns' doesn't exist) at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:365) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:262) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.service.cli.operation.Operation.run(Operation.java:277) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:560) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:545) 
~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source) ~[?:?] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_231] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231] at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_231] at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_231] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) ~[hadoop-common-3.1.0.jar:?] at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at com.sun.proxy.$Proxy43.executeStatementAsync(Unknown Source) ~[?:?] 
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:571) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1550) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1530) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_231] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_231] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_231] Caused by: java.lang.RuntimeException:
[jira] [Created] (HIVE-24447) Move create/drop/alter table to the provider interface
Naveen Gangam created HIVE-24447: Summary: Move create/drop/alter table to the provider interface Key: HIVE-24447 URL: https://issues.apache.org/jira/browse/HIVE-24447 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam The support for such operations on a table in a REMOTE database will be left to the discretion of the providers to support/implement. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24425) Create table in REMOTE db should fail
Naveen Gangam created HIVE-24425: Summary: Create table in REMOTE db should fail Key: HIVE-24425 URL: https://issues.apache.org/jira/browse/HIVE-24425 Project: Hive Issue Type: Sub-task Reporter: Naveen Gangam Assignee: Naveen Gangam Currently, the table gets created in that DB, but show tables does not show anything. Preventing the table creation will resolve this inconsistency as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24405) Missing datatype for table column in oracle
Naveen Gangam created HIVE-24405: Summary: Missing datatype for table column in oracle Key: HIVE-24405 URL: https://issues.apache.org/jira/browse/HIVE-24405 Project: Hive Issue Type: Sub-task Components: Hive Reporter: Naveen Gangam Assignee: Naveen Gangam The parent change introduced an issue in the Oracle schema script: no datatype is specified. {noformat} 1 row created. CQ_COMMIT_TIME(19) * ERROR at line 19: ORA-00902: invalid datatype {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24396) [New Feature] Add data connector support for remote datasources
Naveen Gangam created HIVE-24396: Summary: [New Feature] Add data connector support for remote datasources Key: HIVE-24396 URL: https://issues.apache.org/jira/browse/HIVE-24396 Project: Hive Issue Type: Improvement Components: Hive Reporter: Naveen Gangam Assignee: Naveen Gangam This feature adds support in Hive Metastore for configuring data connectors to remote datasources and for mapping their databases. We currently support remote tables via StorageHandlers like JDBCStorageHandler and HBaseStorageHandler. Data connectors are a natural extension to this, where we can map an entire database or catalog instead of individual tables. The tables within are automagically mapped at runtime. The metadata for these tables is not persisted in Hive; it is always mapped and built at runtime. With this feature, we introduce a concept of type for databases in Hive: NATIVE vs REMOTE. All current databases are NATIVE. To create a REMOTE database, the following syntax is to be used: CREATE REMOTE DATABASE remote_db USING WITH DCPROPERTIES (); Will attach a design doc to this jira. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24348) Beeline: Isolating dependencies and execution with java
Naveen Gangam created HIVE-24348: Summary: Beeline: Isolating dependencies and execution with java Key: HIVE-24348 URL: https://issues.apache.org/jira/browse/HIVE-24348 Project: Hive Issue Type: Improvement Components: Beeline Affects Versions: 3.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Currently, the beeline code, binaries, and executables are somewhat tightly coupled with the hive product. Executing beeline from a node with just a JRE installed and some jars in the classpath is impossible. * The beeline.sh/hive scripts rely on HADOOP_HOME being set and are designed to use the "hadoop" executable to run beeline. * Ideally, just the hive-beeline.jar and hive-jdbc-standalone jars should be enough, but sadly they aren't. The latter jar adds more problems than it solves: because all the class files are shaded, some dependencies cannot be resolved. * Beeline has many other dependencies like hive-exec, hive-common, hadoop-common, supercsv, jline, commons-cli, commons-io, commons-logging, etc. While it may not be possible to eliminate some of these, we should at least have a self-contained jar that bundles all of them to make beeline work on its own. * The underlying script used to run beeline should fall back to JAVA as an alternate means of execution if HADOOP_HOME is not set. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24288) Files created by CompileProcessor have incorrect permissions
Naveen Gangam created HIVE-24288: Summary: Files created by CompileProcessor have incorrect permissions Key: HIVE-24288 URL: https://issues.apache.org/jira/browse/HIVE-24288 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Compile processor generates some temporary files as part of processing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24271) Create managed table relies on hive.create.as.acid settings.
Naveen Gangam created HIVE-24271: Summary: Create managed table relies on hive.create.as.acid settings. Key: HIVE-24271 URL: https://issues.apache.org/jira/browse/HIVE-24271 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam 0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> set hive.create.as.acid; ++ |set | ++ | hive.create.as.acid=false | ++ 1 row selected (0.018 seconds) 0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> set hive.create.as.insert.only; +---+ |set| +---+ | hive.create.as.insert.only=false | +---+ 1 row selected (0.013 seconds) 0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> create managed table mgd_table(a int); INFO : Compiling command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140): create managed table mgd_table(a int) INFO : Semantic Analysis Completed (retrial = false) INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null) INFO : Completed compiling command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140); Time taken: 0.021 seconds INFO : Executing command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140): create managed table mgd_table(a int) INFO : Starting task [Stage-0:DDL] in serial mode INFO : Completed executing command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140); Time taken: 0.048 seconds INFO : OK No rows affected (0.107 seconds) 0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> describe formatted mgd_table; INFO : Compiling command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5): describe formatted mgd_table INFO : Semantic Analysis Completed (retrial = false) INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null) INFO : Completed compiling 
command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5); Time taken: 0.037 seconds INFO : Executing command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5): describe formatted mgd_table INFO : Starting task [Stage-0:DDL] in serial mode INFO : Completed executing command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5); Time taken: 0.03 seconds INFO : OK +---+++ | col_name| data_type | comment | +---+++ | a | int || | | NULL | NULL | | # Detailed Table Information | NULL | NULL | | Database: | bothfalseonhs2 | NULL | | OwnerType:| USER | NULL | | Owner:| hive | NULL | | CreateTime: | Wed Oct 14 05:35:26 UTC 2020 | NULL | | LastAccessTime: | UNKNOWN | NULL | | Retention:| 0 | NULL | | Location: | hdfs://ngangam-3.ngangam.root.hwx.site:8020/warehouse/tablespace/external/hive/bothfalseonhs2.db/mgd_table | NULL | | Table Type: | EXTERNAL_TABLE | NULL | | Table Parameters: | NULL | NULL | |
[jira] [Created] (HIVE-24175) Ease database managed location restrictions in HMS translation
Naveen Gangam created HIVE-24175: Summary: Ease database managed location restrictions in HMS translation Key: HIVE-24175 URL: https://issues.apache.org/jira/browse/HIVE-24175 Project: Hive Issue Type: Sub-task Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Currently, the HMS translation layer restricts the path of a database's managed location to be within the hive warehouse. So a getDatabase call will return a managedlocation path that adheres to this restriction regardless of what has been set in the HMS DB. This leads to issues like inconsistent paths when hive-site.xml is not in sync across HMS and HS2 instances, or even across different HMS instances, as each instance may have a different value for the warehouse root. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24152) Comment out test until it is investigated.
Naveen Gangam created HIVE-24152: Summary: Comment out test until it is investigated. Key: HIVE-24152 URL: https://issues.apache.org/jira/browse/HIVE-24152 Project: Hive Issue Type: Sub-task Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Looks like this test was re-enabled between the time the precommits were run and the time the change was committed (a few hours later). This is blocking all other commits, so we are commenting it out for now. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24086) CTAS with HMS translation enabled returns empty results.
Naveen Gangam created HIVE-24086: Summary: CTAS with HMS translation enabled returns empty results. Key: HIVE-24086 URL: https://issues.apache.org/jira/browse/HIVE-24086 Project: Hive Issue Type: Bug Components: Hive Reporter: Naveen Gangam Assignee: Naveen Gangam When you execute something like create table ctas_table as select * from mgd_table; and mgd_table is a managed table, the hive query planner creates a plan with ctas_table as a managed table, so the location is set to somewhere in the managed warehouse directory. However, with HMS translation enabled, non-acid MANAGED tables are converted to EXTERNAL with purge set to true, so the table location for this table is altered to be in the external warehouse directory. After the table creation, the rest of the query executes, but the data is copied to the location set in the query plan. As a result, a select from ctas_table returns no results, because the table's actual location is empty. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24076) MetastoreDirectSql.getDatabase() needs a space in the query
Naveen Gangam created HIVE-24076: Summary: MetastoreDirectSql.getDatabase() needs a space in the query Key: HIVE-24076 URL: https://issues.apache.org/jira/browse/HIVE-24076 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam String queryTextDbSelector = "select " + "\"DB_ID\", \"NAME\", \"DB_LOCATION_URI\", \"DESC\", " + "\"OWNER_NAME\", \"OWNER_TYPE\", \"CTLG_NAME\" , \"CREATE_TIME\", \"DB_MANAGED_LOCATION_URI\"" + "FROM " + DBS There needs to be a space before FROM for the query to be correct. Currently it falls back to JDO, so there is no lapse in functionality. -- This message was sent by Atlassian Jira (v8.3.4#803005)
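The missing space is easy to see in isolation. The sketch below is a hypothetical standalone reconstruction (the `DBS` constant and the method names are illustrative, not the actual MetastoreDirectSql source): concatenating the column list with `"FROM "` and no separating space makes the last quoted column name run straight into the FROM keyword, which is not valid SQL.

```java
// Hypothetical reconstruction of the string concatenation in
// MetastoreDirectSql.getDatabase(); names are illustrative.
public class QueryConcatDemo {
    static final String DBS = "\"DBS\"";

    // Buggy form: no space before FROM, so the last quoted column name
    // runs straight into the FROM keyword.
    static String buggyQuery() {
        return "select " + "\"DB_ID\", \"NAME\", \"DB_LOCATION_URI\", \"DESC\", "
                + "\"OWNER_NAME\", \"OWNER_TYPE\", \"CTLG_NAME\", \"CREATE_TIME\", \"DB_MANAGED_LOCATION_URI\""
                + "FROM " + DBS;
    }

    // Fixed form: a single leading space on " FROM " keeps the SQL valid.
    static String fixedQuery() {
        return "select " + "\"DB_ID\", \"NAME\", \"DB_LOCATION_URI\", \"DESC\", "
                + "\"OWNER_NAME\", \"OWNER_TYPE\", \"CTLG_NAME\", \"CREATE_TIME\", \"DB_MANAGED_LOCATION_URI\""
                + " FROM " + DBS;
    }

    public static void main(String[] args) {
        System.out.println(buggyQuery());
        System.out.println(fixedQuery());
    }
}
```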
[jira] [Created] (HIVE-23970) Reject database creation if managedlocation is incorrect
Naveen Gangam created HIVE-23970: Summary: Reject database creation if managedlocation is incorrect Key: HIVE-23970 URL: https://issues.apache.org/jira/browse/HIVE-23970 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam With some changes in HIVE-23387, the managed location check gets bypassed. This needs to be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23603) transformDatabase() should work with changes from HIVE-22995
Naveen Gangam created HIVE-23603: Summary: transformDatabase() should work with changes from HIVE-22995 Key: HIVE-23603 URL: https://issues.apache.org/jira/browse/HIVE-23603 Project: Hive Issue Type: Sub-task Components: Hive Reporter: Naveen Gangam Assignee: Naveen Gangam Fix For: 4.0.0 The translation layer alters the locationUri on a Database based on the capabilities of the client. Now that a database has separate managed and external locations, the implementation should be adjusted to work with both; the locationUri could already be the external location. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23588) create table like tabletype should match source tabletype and proper location
Naveen Gangam created HIVE-23588: Summary: create table like tabletype should match source tabletype and proper location Key: HIVE-23588 URL: https://issues.apache.org/jira/browse/HIVE-23588 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23562) Upgrade thrift version in hive
Naveen Gangam created HIVE-23562: Summary: Upgrade thrift version in hive Key: HIVE-23562 URL: https://issues.apache.org/jira/browse/HIVE-23562 Project: Hive Issue Type: Improvement Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Hive has been using thrift 0.9.3 for a long time. We might be able to take advantage of new features, such as deprecation support, in the newer releases of thrift. But this impacts interoperability between older clients and newer servers. We need to assess what can break, at least for the purposes of documenting it, before we make this change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23435) Full outer join result is missing rows
Naveen Gangam created HIVE-23435: Summary: Full outer join result is missing rows Key: HIVE-23435 URL: https://issues.apache.org/jira/browse/HIVE-23435 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 3.1.0 Reporter: Naveen Gangam Assignee: Jesus Camacho Rodriguez The full outer join result has missing rows; this appears to be a bug in the full outer join logic. The expected output is produced when we do a left and a right outer join. Reproducible steps are below.
SUPPORT ANALYSIS
Steps to Reproduce:
1. Create a table and insert data:
create table x (z char(5), x int, y int);
insert into x values ('one', 1, 50), ('two', 2, 30), ('three', 3, 30), ('four', 4, 60), ('five', 5, 70), ('six', 6, 80);
2. Try the full outer join below. The result is incomplete; it is missing the row: NULL NULL NULL three 3 30
Full Outer Join:
select x1.`z`, x1.`x`, x1.`y`, x2.`z`, x2.`x`, x2.`y` from `x` x1 full outer join `x` x2 on (x1.`x` > 3) and (x2.`x` < 4) and (x1.`x` = x2.`x`);
Result:
x1.z  x1.x  x1.y  x2.z  x2.x  x2.y
one   1     50    NULL  NULL  NULL
NULL  NULL  NULL  one   1     50
two   2     30    NULL  NULL  NULL
NULL  NULL  NULL  two   2     30
three 3     30    NULL  NULL  NULL
four  4     60    NULL  NULL  NULL
NULL  NULL  NULL  four  4     60
five  5     70    NULL  NULL  NULL
NULL  NULL  NULL  five  5     70
six   6     80    NULL  NULL  NULL
NULL  NULL  NULL  six   6     80
3. The expected output is produced when we use a left/right join + union:
select x1.`z`, x1.`x`, x1.`y`, x2.`z`, x2.`x`, x2.`y` from `x` x1 left outer join `x` x2 on (x1.`x` > 3) and (x2.`x` < 4) and (x1.`x` = x2.`x`) union select x1.`z`, x1.`x`, x1.`y`, x2.`z`, x2.`x`, x2.`y` from `x` x1 right outer join `x` x2 on (x1.`x` > 3) and (x2.`x` < 4) and (x1.`x` = x2.`x`);
Result:
z     x     y     _col3 _col4 _col5
NULL  NULL  NULL  five  5     70
NULL  NULL  NULL  four  4     60
NULL  NULL  NULL  one   1     50
four  4     60    NULL  NULL  NULL
one   1     50    NULL  NULL  NULL
six   6     80    NULL  NULL  NULL
three 3     30    NULL  NULL  NULL
two   2     30    NULL  NULL  NULL
NULL  NULL  NULL  six   6     80
NULL  NULL  NULL  three 3     30
NULL  NULL  NULL  two   2     30
five  5     70    NULL  NULL  NULL
EXPECTED ENGINEERING ACTION: Confirm this is a bug. If so, is there a workaround, or should we just use the left + right outer join? -- This message was sent by Atlassian Jira (v8.3.4#803005)
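As a cross-check of the expected semantics, here is a self-contained sketch (illustrative only, not Hive's join implementation) of a naive nested-loop full outer join over the repro table. Note that the ON clause, (x1.x > 3) and (x2.x < 4) and (x1.x = x2.x), can never be satisfied, so a correct full outer join must null-pad every row from both sides: twelve rows in total, including the NULL NULL NULL three 3 30 row that the reporter found missing.

```java
import java.util.ArrayList;
import java.util.List;

// Toy nested-loop full outer join over the repro table; illustrative only.
// Nulls are rendered as the string "NULL" to match the quoted output.
public class FullOuterJoinDemo {
    record Row(String z, int x, int y) {}

    static List<Row> table() {
        return List.of(new Row("one", 1, 50), new Row("two", 2, 30),
                new Row("three", 3, 30), new Row("four", 4, 60),
                new Row("five", 5, 70), new Row("six", 6, 80));
    }

    // The ON clause from the repro. It is unsatisfiable (x cannot be both
    // > 3 and < 4 and equal on both sides), so nothing ever matches.
    static boolean on(Row l, Row r) {
        return l.x() > 3 && r.x() < 4 && l.x() == r.x();
    }

    static List<String> fullOuterJoin(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        boolean[] rightMatched = new boolean[right.size()];
        for (Row l : left) {
            boolean matched = false;
            for (int i = 0; i < right.size(); i++) {
                Row r = right.get(i);
                if (on(l, r)) {
                    out.add(l.z() + " " + l.x() + " " + l.y() + " "
                            + r.z() + " " + r.x() + " " + r.y());
                    matched = true;
                    rightMatched[i] = true;
                }
            }
            // Left row with no match: pad the right side with NULLs.
            if (!matched) out.add(l.z() + " " + l.x() + " " + l.y() + " NULL NULL NULL");
        }
        // Right rows with no match: pad the left side with NULLs.
        for (int i = 0; i < right.size(); i++) {
            Row r = right.get(i);
            if (!rightMatched[i]) out.add("NULL NULL NULL " + r.z() + " " + r.x() + " " + r.y());
        }
        return out;
    }
}
```

Running the self-join produces all twelve null-padded rows, confirming that Hive's eleven-row answer drops exactly one of them.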
[jira] [Created] (HIVE-23388) CTAS queries should use target's location for staging.
Naveen Gangam created HIVE-23388: Summary: CTAS queries should use target's location for staging. Key: HIVE-23388 URL: https://issues.apache.org/jira/browse/HIVE-23388 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam In cloud-based storage systems, renaming files across different root-level buckets appears to be disallowed. The S3AFileSystem throws the following exception; this appears to be a bug in the S3FS impl. Failed with exception Wrong FS s3a://hive-managed/clusters/env-x/warehouse--/warehouse/tablespace/managed/hive/tpch.db/customer/delta_001_001_ -expected s3a://hive-external 2020-04-27T19:34:27,573 INFO [Thread-6] jdbc.TestDriver: java.lang.IllegalArgumentException: Wrong FS s3a://hive-managed//clusters/env-/warehouse--/warehouse/tablespace/managed/hive/tpch.db/customer/delta_001_001_ -expected s3a://hive-external But we should fix our query plans to use the target table's directory for staging as well. That should resolve this issue, and it is the right thing to do anyway (in case there are different encryption zones/keys for these buckets). The fix in HIVE-22995 probably changed this behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23387) Flip the Warehouse.getDefaultTablePath() to return path from ext warehouse
Naveen Gangam created HIVE-23387: Summary: Flip the Warehouse.getDefaultTablePath() to return path from ext warehouse Key: HIVE-23387 URL: https://issues.apache.org/jira/browse/HIVE-23387 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam For backward compatibility, the initial fix returned the path that was set on the db. It could have come from either the managed or the external warehouse, depending on what was set. There were tests relying on certain paths being returned; this fix is to address those tests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23260) Add support for unmodified_metadata capability
Naveen Gangam created HIVE-23260: Summary: Add support for unmodified_metadata capability Key: HIVE-23260 URL: https://issues.apache.org/jira/browse/HIVE-23260 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Currently, the translator removes bucketing info from tables for clients that do not possess the HIVEBUCKET2 capability. While this is desirable, some clients that have write access to these tables can turn around and overwrite the metadata, thus corrupting the original bucketing info. Adding support for a capability for clients that are capable of interpreting the original metadata would prevent such corruption. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23192) "default" database locationUri should be external warehouse root.
Naveen Gangam created HIVE-23192: Summary: "default" database locationUri should be external warehouse root. Key: HIVE-23192 URL: https://issues.apache.org/jira/browse/HIVE-23192 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam When creating the default database, the database locationUri should be set to external warehouse. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23121) Re-examine TestWarehouseExternalDir to see if it uses HMS translation.
Naveen Gangam created HIVE-23121: Summary: Re-examine TestWarehouseExternalDir to see if it uses HMS translation. Key: HIVE-23121 URL: https://issues.apache.org/jira/browse/HIVE-23121 Project: Hive Issue Type: Sub-task Reporter: Naveen Gangam Assignee: Naveen Gangam TestWarehouseExternalDir currently passes with just one change related to HIVE-22995. But that change assumed the test was using HMS Translation to convert non-acid managed tables to external. Ensure that it still does. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22995) Add support for location for managed tables on database
Naveen Gangam created HIVE-22995: Summary: Add support for location for managed tables on database Key: HIVE-22995 URL: https://issues.apache.org/jira/browse/HIVE-22995 Project: Hive Issue Type: Improvement Components: Hive Affects Versions: 3.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Attachments: Hive Metastore Support for Tenant-based storage heirarchy.pdf I have attached the initial spec to this jira. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22874) Beeline unable to use credentials from URL.
Naveen Gangam created HIVE-22874: Summary: Beeline unable to use credentials from URL. Key: HIVE-22874 URL: https://issues.apache.org/jira/browse/HIVE-22874 Project: Hive Issue Type: Bug Components: Beeline Reporter: Naveen Gangam Assignee: Naveen Gangam Fix For: 4.0.0 Beeline is not using the password value from the URL. Using LDAP auth in this case, so the failure is on connect. bin/beeline -u "jdbc:hive2://localhost:1/default;user=test1;password=test1" On the server side in LdapAuthenticator, the credentials come out as (via special debug logging) 2020-02-11T11:10:31,613 INFO [HiveServer2-Handler-Pool: Thread-67] auth.LdapAuthenticationProviderImpl: Connecting to ldap as user/password:test1:anonymous This bug may have been introduced via https://github.com/apache/hive/commit/749e831060381a8ae4775630efb72d5cd040652f pass = "" (an empty string on this line) https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/BeeLine.java#L848 But on this line of code, it checks whether pass is null, which will never be true, so it never picks up the value from the jdbc url: https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/BeeLine.java#L900 It has another chance here, but pass != null will always be true, so it never goes into the else condition: https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/BeeLine.java#L909 -- This message was sent by Atlassian Jira (v8.3.4#803005)
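A minimal sketch of the flaw as described above (illustrative pseudologic, not Beeline's actual code; the URL parsing is faked with a plain parameter): because pass starts out as the empty string rather than null, the later null check can never fire, so the ;password= value from the JDBC URL is silently dropped.

```java
// Illustrative pseudologic for HIVE-22874, not Beeline source.
public class PasswordFallbackDemo {
    // Buggy resolution: pass is pre-initialized to "" rather than null,
    // so the null check never fires and the URL password is ignored.
    static String resolveBuggy(String passwordFromUrl) {
        String pass = "";
        if (pass == null) {            // never true
            pass = passwordFromUrl;
        }
        return pass;                   // stays ""; LDAP sees "anonymous"
    }

    // Fixed resolution: leave pass null until a real value is found,
    // letting the URL's ;password=... take effect.
    static String resolveFixed(String passwordFromUrl) {
        String pass = null;
        if (pass == null) {
            pass = passwordFromUrl;
        }
        return pass;
    }
}
```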
[jira] [Created] (HIVE-22853) Beeline should use HS2 server defaults for fetchSize
Naveen Gangam created HIVE-22853: Summary: Beeline should use HS2 server defaults for fetchSize Key: HIVE-22853 URL: https://issues.apache.org/jira/browse/HIVE-22853 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Currently beeline uses a hard-coded default of 1000 rows for fetchSize. This default value can differ from what the server has set. While the beeline user can reset the value via the set command, it is cumbersome to change the workloads. Instead, beeline should default to the server-side value, with set used to override it within the session. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22794) Disallow ACID table location outside hive warehouse
Naveen Gangam created HIVE-22794: Summary: Disallow ACID table location outside hive warehouse Key: HIVE-22794 URL: https://issues.apache.org/jira/browse/HIVE-22794 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 3.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam The co-location of managed tables enables hive to govern them effectively, using common policies for security, S3Guard, quota support, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22708) To be updated later
Naveen Gangam created HIVE-22708: Summary: To be updated later Key: HIVE-22708 URL: https://issues.apache.org/jira/browse/HIVE-22708 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22498) Schema tool enhancements to merge catalogs
Naveen Gangam created HIVE-22498: Summary: Schema tool enhancements to merge catalogs Key: HIVE-22498 URL: https://issues.apache.org/jira/browse/HIVE-22498 Project: Hive Issue Type: Sub-task Reporter: Naveen Gangam Assignee: Naveen Gangam Schema tool currently supports relocation of a database from one catalog to another, one at a time. Having to do this one database at a time is painful, and the tool also lacks support for converting tables to external tables during migration, which is needed given the changes to the translation layer where a MANAGED table is strictly an ACID-only table. Hence we also need to convert tables to external tables during relocation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22497) Remove default value for Capabilities from HiveConf
Naveen Gangam created HIVE-22497: Summary: Remove default value for Capabilities from HiveConf Key: HIVE-22497 URL: https://issues.apache.org/jira/browse/HIVE-22497 Project: Hive Issue Type: Sub-task Reporter: Naveen Gangam This class is used and bundled in other jars consumed by 3rd-party connectors like teradata etc. So it would be good to remove this default value from HiveConf and rely on it being set in hive-site.xml instead. HiveServer2 should still set this as part of HS2 initialization or via hiveserver2-site.xml. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22406) TRUNCATE TABLE fails due to MySQL limitations on limit value
Naveen Gangam created HIVE-22406: Summary: TRUNCATE TABLE fails due to MySQL limitations on limit value Key: HIVE-22406 URL: https://issues.apache.org/jira/browse/HIVE-22406 Project: Hive Issue Type: Bug Reporter: Naveen Gangam HMS currently has some APIs that accept an integer limit value. Prior to the change in HIVE-21734, HMS was silently converting this int to a short, so we hadn't seen this issue. But semantically, it is incorrect to do so quietly. {noformat} at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191] Caused by: java.sql.SQLException: setMaxRows() out of range. 2147483647 > 5000. at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:996) ~[mysql-connector-java.jar:5.1.33] at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:935) ~[mysql-connector-java.jar:5.1.33] at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:924) ~[mysql-connector-java.jar:5.1.33] at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:870) ~[mysql-connector-java.jar:5.1.33] at com.mysql.jdbc.StatementImpl.setMaxRows(StatementImpl.java:2525) ~[mysql-connector-java.jar:5.1.33] at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.setMaxRows(HikariProxyPreparedStatement.java) ~[HikariCP-2.6.1.jar:?] {noformat} We cannot change the RawStore API to accept shorts instead of ints. So we have to fix the caller to use a lower limit instead of Integer.MAX_VALUE. {noformat} Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Exception thrown when executing query : SELECT DISTINCT 'org.apache.hadoop.hive.metastore.model.MPartition' AS `NUCLEUS_TYPE`,`A0`.`CREATE_TIME`,`A0`.`LAST_ACCESS_TIME`,`A0`.`PART_NAME`,`A0`.`WRITE_ID`,`A0`.`PART_ID`,`A0`.`PART_NAME` AS `NUCORDER0` FROM `PARTITIONS` `A0` LEFT OUTER JOIN `TBLS` `B0` ON `A0`.`TBL_ID` = `B0`.`TBL_ID` LEFT OUTER JOIN `DBS` `C0` ON `B0`.`DB_ID` = `C0`.`DB_ID` WHERE `B0`.`TBL_NAME` = ? AND `C0`.`NAME` = ? AND `C0`.`CTLG_NAME` = ? 
ORDER BY `NUCORDER0` LIMIT 0,2147483647 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$truncate_table_req_result$truncate_table_req_resultStandardScheme.read(ThriftHiveMetastore.java) ~[hive-exec-3.1.0.3.1.5.0-17.jar:3.1.0.3.1.5.0-17] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$truncate_table_req_result$truncate_table_req_resultStandardScheme.read(ThriftHiveMetastore.java) ~[hive-exec-3.1.0.3.1.5.0-17.jar:3.1.0.3.1.5.0-17] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$truncate_table_req_result.read(ThriftHiveMetastore.java) ~[hive-exec-3.1.0.3.1.5.0-17.jar:3.1.0.3.1.5.0-17] at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[hive-exec-3.1.0.3.1.5.0-17.jar:3.1.0.3.1.5.0-17] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_truncate_table_req(ThriftHiveMetastore.java:1999) ~[hive-exec-3.1.0.3.1.5.0-17.jar:3.1.0.3.1.5.0-17] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.truncate_table_req(ThriftHiveMetastore.java:1986) ~[hive-exec-3.1.0.3.1.5.0-17.jar:3.1.0.3.1.5.0-17] at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.truncateTableInternal(HiveMetaStoreClient.java:1450) ~[hive-exec-3.1.0.3.1.5.0-17.jar:3.1.0.3.1.5.0-17] at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.truncateTable(HiveMetaStoreClient.java:1427) ~[hive-exec-3.1.0.3.1.5.0-17.jar:3.1.0.3.1.5.0-17] at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.truncateTable(SessionHiveMetaStoreClient.java:171) ~[hive-exec-3.1.0.3.1.5.0-17.jar:3.1.0.3.1.5.0-17] at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source) ~[?:?] 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_191] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191] at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212) ~[hive-exec-3.1.0.3.1.5.0-17.jar:3.1.0.3.1.5.0-17] at com.sun.proxy.$Proxy59.truncateTable(Unknown Source) ~[?:?] at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source) ~[?:?] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_191] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191] at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:3122) ~[hive-exec-3.1.0.3.1.5.0-17.jar:3.1.0.3.1.5.0-17] at com.sun.proxy.$Proxy59.truncateTable(Unknown Source) ~[?:?] at org.apache.hadoop.hive.ql.metadata.Hive.truncateTable(Hive.java:1277) ~[hive-exec-3.1.0.3.1.5.0-17.jar:3.1.0.3.1.5.0-17] at org.apache.hadoop.hive.ql.exec.DDLTask.truncateTable(DDLTask.java:5111)
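A small sketch of why the older silent narrowing hid the problem (illustrative, not HMS source): casting Integer.MAX_VALUE to short truncates it to -1, so the huge value never reached the JDBC driver's setMaxRows() range check; the caller-side fix described in the report instead caps the limit explicitly. The 5000 ceiling below is taken from the quoted MySQL error and used only as an example value.

```java
// Illustrates the int-to-short narrowing that masked HIVE-22406 and a
// caller-side clamp; not HMS source code.
public class LimitNarrowingDemo {
    // Pre-HIVE-21734 behavior: the int limit was quietly narrowed to short.
    static short narrowed(int limit) {
        return (short) limit;   // keeps only the low 16 bits; MAX_VALUE -> -1
    }

    // Proposed caller-side fix: never forward Integer.MAX_VALUE, cap the
    // request at a server-appropriate ceiling instead.
    static int clamped(int limit, int serverCeiling) {
        return Math.min(limit, serverCeiling);
    }
}
```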
[jira] [Created] (HIVE-22342) HMS Translation: HIVE-22189 too strict with location for EXTERNAL tables
Naveen Gangam created HIVE-22342: Summary: HMS Translation: HIVE-22189 too strict with location for EXTERNAL tables Key: HIVE-22342 URL: https://issues.apache.org/jira/browse/HIVE-22342 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam HIVE-22189 restricts newly created EXTERNAL tables to the EXTERNAL_WAREHOUSE_DIR. This might be too strict: any other location should be allowed as long as it is outside the MANAGED warehouse directory. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22291) HMS Translation: Limit translation to hive default catalog only
Naveen Gangam created HIVE-22291: Summary: HMS Translation: Limit translation to hive default catalog only Key: HIVE-22291 URL: https://issues.apache.org/jira/browse/HIVE-22291 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam HMS Translation should be limited to a single catalog, the default hive catalog. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22266) Addendum fix to have HS2 pom add explicit curator dependency
Naveen Gangam created HIVE-22266: Summary: Addendum fix to have HS2 pom add explicit curator dependency Key: HIVE-22266 URL: https://issues.apache.org/jira/browse/HIVE-22266 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam It might be better to add an explicit dependency on apache-curator in the service/pom.xml. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22205) Upgrade zookeeper and curator versions
Naveen Gangam created HIVE-22205: Summary: Upgrade zookeeper and curator versions Key: HIVE-22205 URL: https://issues.apache.org/jira/browse/HIVE-22205 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Other components like hadoop have switched to using newer ZK versions, so those jars end up on the classpath for hive services and could cause issues due to incompatible curator versions that hive uses. It therefore makes sense for hive to upgrade its ZK and curator versions to keep up. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (HIVE-22189) HMS Translation: Enforce strict locations for managed vs external tables.
Naveen Gangam created HIVE-22189: Summary: HMS Translation: Enforce strict locations for managed vs external tables. Key: HIVE-22189 URL: https://issues.apache.org/jira/browse/HIVE-22189 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Currently, HMS allows flexibility in the location of a table: external tables can be located within Hive managed warehouse space, and managed tables can be located within the external warehouse directory if the user chooses to do so. There are certain advantages to restricting such flexibility; for example, we could have different encryption policies, different replication policies, etc. for different warehouses. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (HIVE-22158) HMS Translation layer - Disallow non-ACID MANAGED tables.
Naveen Gangam created HIVE-22158: Summary: HMS Translation layer - Disallow non-ACID MANAGED tables. Key: HIVE-22158 URL: https://issues.apache.org/jira/browse/HIVE-22158 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam In the recent commits, we have allowed non-ACID MANAGED tables to be created by clients that have some form of ACID WRITE capabilities. I think it would make sense to disallow this entirely. MANAGED tables should be ACID tables only. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (HIVE-22159) HMS Translation layer - Turn off HMS Translation by default.
Naveen Gangam created HIVE-22159: Summary: HMS Translation layer - Turn off HMS Translation by default. Key: HIVE-22159 URL: https://issues.apache.org/jira/browse/HIVE-22159 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Because of certain backward incompatibilities in behavior, I think it makes sense to turn off this translation in the Apache Hive codebase. Consumers can selectively enable it and even plug in their own set of translation rules as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-22123) Use GetDatabaseResponse to allow for future extension
Naveen Gangam created HIVE-22123: Summary: Use GetDatabaseResponse to allow for future extension Key: HIVE-22123 URL: https://issues.apache.org/jira/browse/HIVE-22123 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam As part of the review, it was suggested to use the GetDatabaseResponse object to allow for any potential future expansions for these requests. https://reviews.apache.org/r/71267/#comment304501 -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22109) Hive.renamePartition expects catalog name to be set instead of using default
Naveen Gangam created HIVE-22109: Summary: Hive.renamePartition expects catalog name to be set instead of using default Key: HIVE-22109 URL: https://issues.apache.org/jira/browse/HIVE-22109 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22095) Hive.get() resets the capabilities from HiveConf instead of set capabilities
Naveen Gangam created HIVE-22095: Summary: Hive.get() resets the capabilities from HiveConf instead of set capabilities Key: HIVE-22095 URL: https://issues.apache.org/jira/browse/HIVE-22095 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Hive.get() resets the capabilities on the HiveMetaStoreClient to those from HiveConf instead of preserving the capabilities that have already been set via setHMSClientCapabilties(). -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22087) HMS Translation: Translate getDatabase() API to alter warehouse location
Naveen Gangam created HIVE-22087: Summary: HMS Translation: Translate getDatabase() API to alter warehouse location Key: HIVE-22087 URL: https://issues.apache.org/jira/browse/HIVE-22087 Project: Hive Issue Type: Sub-task Reporter: Naveen Gangam Assignee: Naveen Gangam It makes sense to translate getDatabase() calls as well, to alter the location for the Database based on whether or not the processor has capabilities to write to the managed warehouse directory. Every DB has 2 locations, one external and the other in the managed warehouse directory. If the processor has any AcidWrite capability, then the location remains unchanged for the database. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22069) joda-time binary conflict between druid-handler and phoenix-hive jars.
Naveen Gangam created HIVE-22069: Summary: joda-time binary conflict between druid-handler and phoenix-hive jars. Key: HIVE-22069 URL: https://issues.apache.org/jira/browse/HIVE-22069 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.1.0, 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Hive's druid storage handler uses version 2.8.1 of the joda-time library, whereas the phoenix-hive.jar uses version 1.6. When both jars are on the classpath, bad things happen. Apache Phoenix has its own release cycle, and Hive should not count on it to take up a new version. Besides, Phoenix could later decide to move to yet another version of this library and we would still have this problem. So it is best that Hive use shaded jars for the version it is on. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22002) Insert into table partition fails partially with stats.autogather is on.
Naveen Gangam created HIVE-22002: Summary: Insert into table partition fails partially with stats.autogather is on. Key: HIVE-22002 URL: https://issues.apache.org/jira/browse/HIVE-22002 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Naveen Gangam create table test_double(id int) partitioned by (dbtest double); insert into test_double partition(dbtest) values (1,9.9); --> this works insert into test_double partition(dbtest) values (1,10); --> this fails But if we change it to insert into test_double partition(dbtest) values (1, cast (10 as double)); it succeeds -> the problem is only seen when trying to insert a whole number, i.e. 10, 10.0, 15, 14.0, etc. The issue is not seen when inserting a number whose decimal part is other than 0; an insert of 10.1 goes through. The underlying exception from the HMS is {code} 2019-07-11T07:58:16,670 ERROR [pool-6-thread-196]: server.TThreadPoolServer (TThreadPoolServer.java:run(297)) - Error occurred during processing of message. java.lang.IndexOutOfBoundsException: Index: 0 at java.util.Collections$EmptyList.get(Collections.java:4454) ~[?:1.8.0_112] at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartColumnStatsWithMerge(HiveMetaStore.java:7808) ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:7769) ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] {code} With {{hive.stats.column.autogather=false}}, this exception does not occur with or without the explicit cast. The issue stems from the fact that HS2 created a partition with value {{dbtest=10}} for the table while the stats processor is attempting to add column statistics for a partition with value {{dbtest=10.0}}. Thus HMS {{getPartitionsByNames}} cannot find a partition with that value and fails to insert the stats. So while the failure surfaces on the HMS side, the cause lies in HS2 query planning. 
It makes sense that turning off {{hive.stats.column.autogather}} resolves the issue, because there is then no StatsTask in the query plan. But {{SHOW PARTITIONS}} shows the partition as created, while the query planner does not include it in any plan because of the absence of stats on the partition. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
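The mismatch described above can be demonstrated in isolation: HS2 records the partition under the literal text of the value, while the stats path renders the value as a double, and the two names disagree exactly when the value is a whole number. A minimal sketch (the partition-name helpers are illustrative, not Hive code):

```java
public class PartitionValueMismatch {
    // Partition name as HS2 creates it, from the literal text in the query.
    static String literalName(String literal) {
        return "dbtest=" + literal;
    }

    // Partition name as the stats processor renders it, from the double
    // value; Double.toString always emits a decimal part ("10.0").
    static String statsName(double value) {
        return "dbtest=" + Double.toString(value);
    }
}
```

For the literal 10 the two names are "dbtest=10" vs "dbtest=10.0", so the stats lookup misses; for 10.1 both render "dbtest=10.1" and the lookup succeeds.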
[jira] [Created] (HIVE-21816) HMS Translation: Refactor tests to work with ACID tables.
Naveen Gangam created HIVE-21816: Summary: HMS Translation: Refactor tests to work with ACID tables. Key: HIVE-21816 URL: https://issues.apache.org/jira/browse/HIVE-21816 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam 1) The TestHiveMetaStore unit tests do not work for full ACID tables, as the TransactionalValidationListener enforces that such tables use AcidIO. The Orc IO files are only included in the hive-exec jars, which are not used by tests under the standalone-metastore module. Even adding a test-scoped dependency on hive-exec did not work, so I had to relocate these tests into itests. 2) Implementation of logic that allows skipping of translation via the use of the "MANAGERAWMETADATA" capability. 3) Fixed some test bugs: the tests were not failing originally when createTable failed because of the issue in #1, so about 3 tests never ran fully and never failed. The tests now fail if there are issues. 4) Refactoring of the code in the DefaultTransformer to make static lists of capabilities. The returned capabilities now depend on the table capabilities, the processor capabilities, and the accessType assigned to the table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21804) HMS Translation: External tables with no capabilities returns duplicate entries/
Naveen Gangam created HIVE-21804: Summary: HMS Translation: External tables with no capabilities returns duplicate entries/ Key: HIVE-21804 URL: https://issues.apache.org/jira/browse/HIVE-21804 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam 2019-05-24T12:50:52,978 WARN [pool-6-thread-4] metastore.HiveMetaStore: Unexpected resultset size:2 2019-05-24T12:50:52,981 ERROR [pool-6-thread-4] metastore.RetryingHMSHandler: MetaException(message:Unexpected result from metadata transformer:return list size=2) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTableInternal(HiveMetaStore.java:3154) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_req(HiveMetaStore.java:3118) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) at com.sun.proxy.$Proxy28.get_table_req(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16497) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16481) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21744) Make hive side changes to enforce table access type on queries.
Naveen Gangam created HIVE-21744: Summary: Make hive side changes to enforce table access type on queries. Key: HIVE-21744 URL: https://issues.apache.org/jira/browse/HIVE-21744 Project: Hive Issue Type: Sub-task Affects Versions: 4.0.0 Reporter: Naveen Gangam -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21734) HMS Translation: Pending items from code review
Naveen Gangam created HIVE-21734: Summary: HMS Translation: Pending items from code review Key: HIVE-21734 URL: https://issues.apache.org/jira/browse/HIVE-21734 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam A sub-task of HIVE-21663. Some items came from the review feedback and some were left out of the initial implementation. 1) Enforce the limit being passed into get_tables_ext; it is currently being ignored. 2) Filter the capabilities being returned to the caller based on the capabilities possessed by the processor. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
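Item 1 above amounts to truncating the result list to the requested size instead of ignoring the parameter. A minimal, hypothetical sketch of that enforcement (not the actual get_tables_ext implementation), treating a negative limit as "no limit":

```java
import java.util.List;

public class LimitEnforcement {
    // Honor a caller-supplied limit on a result list: return at most
    // `limit` entries; a negative limit means unlimited. Illustrative
    // only -- the real fix lives inside the HMS get_tables_ext path.
    static <T> List<T> applyLimit(List<T> results, int limit) {
        if (limit < 0 || limit >= results.size()) {
            return results;
        }
        return results.subList(0, limit);
    }
}
```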
[jira] [Created] (HIVE-21718) Improvement performance of UpdateInputAccessTimeHook
Naveen Gangam created HIVE-21718: Summary: Improvement performance of UpdateInputAccessTimeHook Key: HIVE-21718 URL: https://issues.apache.org/jira/browse/HIVE-21718 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 2.1.1 Reporter: Naveen Gangam Assignee: Naveen Gangam Currently, Hive does not update the lastAccessTime property for any entities when a query accesses them, so it has not been possible to know when a table was last accessed. Hive does provide a configurable HS2 hook that is executed as a pre-query hook before the query runs. However, this hook is inefficient because for each table or partition whose time it updates, it executes an "alter table ..." command internally. This is bad because 1) for a query touching thousands of partitions, the hook takes forever to update them all, and 2) meanwhile it holds up the original query from executing. So even though we do not recommend using the hook, because the reward is too little (having lastAccessTime updated), we realize there is no other means to achieve this. Also, we can improve the performance of the hook significantly by adding a new thrift API on HMS that updates lastAccessTime on the database rows directly, instead of going through the HMS front end one entity at a time (which leads to thousands of HMS calls and in turn multiple thousands of calls to the database). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
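The proposed improvement is essentially a batching change: collect every touched entity and hand the whole set to one bulk call, instead of one round trip per entity. The sketch below models only the round-trip count with a counter; the method names and the bulk API are hypothetical, standing in for the proposed thrift call.

```java
import java.util.List;

public class BatchedAccessTimeUpdate {
    int rpcCalls = 0; // stands in for HMS round trips

    // Current hook behavior: one "alter table ..." per entity -> N calls.
    void updateOnePerEntity(List<String> partitions) {
        for (String p : partitions) {
            rpcCalls++; // each entity costs a full HMS round trip
        }
    }

    // Proposed behavior: a single (hypothetical) bulk thrift call carries
    // the lastAccessTime update for all entities at once.
    void updateBatched(List<String> partitions) {
        rpcCalls++;
    }
}
```

For a query touching 1000 partitions the per-entity path costs 1000 round trips and the batched path one.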
[jira] [Created] (HIVE-21664) HMS Translation layer - Thrift API changes
Naveen Gangam created HIVE-21664: Summary: HMS Translation layer - Thrift API changes Key: HIVE-21664 URL: https://issues.apache.org/jira/browse/HIVE-21664 Project: Hive Issue Type: Sub-task Components: Standalone Metastore Reporter: Naveen Gangam Assignee: Naveen Gangam This jira is to track the HMS side changes of this feature. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21663) Hive Metastore Translation Layer
Naveen Gangam created HIVE-21663: Summary: Hive Metastore Translation Layer Key: HIVE-21663 URL: https://issues.apache.org/jira/browse/HIVE-21663 Project: Hive Issue Type: New Feature Components: Standalone Metastore Reporter: Naveen Gangam Assignee: Naveen Gangam This task is for the implementation of the default provider for translation, which is extensible if needed for a custom translator. Please refer to the spec for additional details on the translation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21533) Nested CTE's with join does not return any data.
Naveen Gangam created HIVE-21533: Summary: Nested CTE's with join does not return any data. Key: HIVE-21533 URL: https://issues.apache.org/jira/browse/HIVE-21533 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 2.1.0 Reporter: Naveen Gangam Attachments: testcase.sql Attached is the test case to reproduce the issue; the join on CTE6 is causing the problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21363) Ldap auth issue: group filter match should be case insensitive
Naveen Gangam created HIVE-21363: Summary: Ldap auth issue: group filter match should be case insensitive Key: HIVE-21363 URL: https://issues.apache.org/jira/browse/HIVE-21363 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 3.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Configure HiveServer2 with LDAP auth (enable ldap, ldap URI, baseDN, userDNPattern, groupDNPattern and groupFilter). If the case of the specified groupFilter differs from the actual group name in the directory, then Hive cannot find a match and errors out. For example: groupFilter value= group name in directory server=grouptest. A similar search works using other ldap clients like ldapsearch (ldap searches are case insensitive). While this is not a major issue, as the workaround is to configure the exact name, it is an easy fix that we should support out of the box. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
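The proposed fix reduces to comparing the configured groupFilter entries against the directory's group names case-insensitively, mirroring LDAP's own case-insensitive searches. A minimal sketch (the group names are made up; this is not HiveServer2's actual filter code):

```java
import java.util.List;

public class GroupFilterMatch {
    // A group returned by the directory matches a configured groupFilter
    // entry regardless of case, the same way ldapsearch would match it.
    static boolean matches(List<String> groupFilter, String groupFromDirectory) {
        for (String g : groupFilter) {
            if (g.equalsIgnoreCase(groupFromDirectory)) {
                return true;
            }
        }
        return false;
    }
}
```

With this comparison, a filter configured as "GroupTest" matches the directory's "grouptest" without requiring the admin to copy the exact case.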
[jira] [Created] (HIVE-21337) HMS Metadata migration from Postgres/Derby to other DBs fail
Naveen Gangam created HIVE-21337: Summary: HMS Metadata migration from Postgres/Derby to other DBs fail Key: HIVE-21337 URL: https://issues.apache.org/jira/browse/HIVE-21337 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 3.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam A customer was recently migrating the HMS metastore from Postgres to Oracle. During import of the data exported from the Postgres-backed HMS metastore, failures were seen because COLUMNS_V2.COMMENT is 4000 bytes long there, whereas Oracle and other schemas define it to be 256 bytes. This inconsistency in the schema makes the migration cumbersome and manual. This jira makes the column's length consistent across all databases. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21336) HMS Index PCS_STATS_IDX too long for Oracle when NLS_LENGTH_SEMANTICS=char
Naveen Gangam created HIVE-21336: Summary: HMS Index PCS_STATS_IDX too long for Oracle when NLS_LENGTH_SEMANTICS=char Key: HIVE-21336 URL: https://issues.apache.org/jira/browse/HIVE-21336 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 3.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME) Error: ORA-01450: maximum key length (6398) exceeded (state=72000,code=1450) The customer tried the same DDL in SQL Developer and got the same error. This could be the result of a combination of DB-level settings like db_block_size limiting the maximum key length, as per this doc: http://www.dba-oracle.com/t_ora_01450_maximum_key_length_exceeded.htm Also, {{NLS_LENGTH_SEMANTICS}} is BYTE by default, but users can set it to CHAR at the session level, thus reducing the maximum index key length. We have increased the size of COLUMN_NAME from 128 to 767 (used to be at 1000) and TABLE_NAME from 128 to 256, by setting {code} CREATE TABLE PART_COL_STATS ( CS_ID NUMBER NOT NULL, DB_NAME VARCHAR2(128) NOT NULL, TABLE_NAME VARCHAR2(256) NOT NULL, PARTITION_NAME VARCHAR2(767) NOT NULL, COLUMN_NAME VARCHAR2(767) NOT NULL, CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); {code} Reproducer: {code} SQL*Plus: Release 11.2.0.2.0 Production on Wed Feb 27 11:02:16 2019 Copyright (c) 1982, 2011, Oracle. All rights reserved. Connected to: Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production SQL> select * from v$nls_parameters where parameter = 'NLS_LENGTH_SEMANTICS'; PARAMETER VALUE NLS_LENGTH_SEMANTICS BYTE SQL> alter session set NLS_LENGTH_SEMANTICS=CHAR; Session altered. SQL> commit; Commit complete. 
SQL> select * from v$nls_parameters where parameter = 'NLS_LENGTH_SEMANTICS'; PARAMETER VALUE NLS_LENGTH_SEMANTICS CHAR SQL> CREATE TABLE PART_COL_STATS (CS_ID NUMBER NOT NULL, DB_NAME VARCHAR2(128) NOT NULL, TABLE_NAME VARCHAR2(256) NOT NULL, PARTITION_NAME VARCHAR2(767) NOT NULL, COLUMN_NAME VARCHAR2(767) NOT NULL); Table created. SQL> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME) * ERROR at line 1: ORA-01450: maximum key length (6398) exceeded SQL> alter session set NLS_LENGTH_SEMANTICS=BYTE; Session altered. SQL> commit; Commit complete. SQL> drop table PART_COL_STATS; Table dropped. SQL> commit; Commit complete. SQL> CREATE TABLE PART_COL_STATS (CS_ID NUMBER NOT NULL, DB_NAME VARCHAR2(128) NOT NULL, TABLE_NAME VARCHAR2(256) NOT NULL, PARTITION_NAME VARCHAR2(767) NOT NULL, COLUMN_NAME VARCHAR2(767) NOT NULL); Table created. SQL> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); Index created. SQL> commit; Commit complete. SQL> {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
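The arithmetic behind the failure above can be checked directly: the four indexed VARCHAR2 columns total 128 + 256 + 767 + 767 = 1918 length units. Under BYTE semantics that is 1918 bytes, well under the 6398-byte cap reported in this environment; under CHAR semantics with a multi-byte charset such as AL32UTF8 (up to 4 bytes per character, an assumption about the customer's charset) Oracle must reserve 1918 * 4 = 7672 bytes, which exceeds the cap and raises ORA-01450.

```java
public class IndexKeyLength {
    // Maximum index key length reported by this Oracle instance (it
    // varies with db_block_size; 6398 is the value from the error above).
    static final int MAX_KEY_BYTES = 6398;

    // Worst-case bytes reserved for the PCS_STATS_IDX key: the declared
    // lengths of DB_NAME + TABLE_NAME + PARTITION_NAME + COLUMN_NAME,
    // scaled by bytes-per-character (1 for BYTE semantics, up to 4 for
    // CHAR semantics with AL32UTF8).
    static int keyBytes(int bytesPerChar) {
        return (128 + 256 + 767 + 767) * bytesPerChar;
    }
}
```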
[jira] [Created] (HIVE-21209) [Improvement] Exchange partitition to be metadata only change?
Naveen Gangam created HIVE-21209: Summary: [Improvement] Exchange partitition to be metadata only change? Key: HIVE-21209 URL: https://issues.apache.org/jira/browse/HIVE-21209 Project: Hive Issue Type: Improvement Components: Hive Affects Versions: 2.1.1 Reporter: Naveen Gangam https://issues.apache.org/jira/browse/HIVE-14560 The current implementation of the above jira is a metadata change plus a "copy" of the partition data on the DFS. This can take a long time for large partition data, especially across different storage clusters. When exchanging a partition between HDFS and S3a (or vice versa) the data is copied, and this is a client-side copy operation that can be very slow if the partition is very large. The customer would like the "exchange partition" operation to be purely metadata. I would like to start a discussion on whether this improvement should be made. Obviously, the current behavior will be supported, but an option for it to be a metadata-only operation needs to be evaluated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20909) Just "MSCK" should throw SemanticException
Naveen Gangam created HIVE-20909: Summary: Just "MSCK" should throw SemanticException Key: HIVE-20909 URL: https://issues.apache.org/jira/browse/HIVE-20909 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 4.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Per documentation, the syntax for MSCK command is {{MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS];}} So just submitting "MSCK" should throw a SemanticException like it does for other queries with incorrect syntax. But instead it appears to be attempting to do something. $ hive --hiveconf hive.root.logger=INFO,console -e "msck;" 2018-11-08T15:21:25,016 INFO [main] SessionState: 2018-11-08T15:21:26,203 INFO [main] session.SessionState: Created HDFS directory: /tmp/hive/hive/b1b62e04-5a1c-4c6a-babd-31b4f1d2bd78 2018-11-08T15:21:26,222 INFO [main] session.SessionState: Created local directory: /tmp/root/b1b62e04-5a1c-4c6a-babd-31b4f1d2bd78 2018-11-08T15:21:26,229 INFO [main] session.SessionState: Created HDFS directory: /tmp/hive/hive/b1b62e04-5a1c-4c6a-babd-31b4f1d2bd78/_tmp_space.db 2018-11-08T15:21:26,244 INFO [main] conf.HiveConf: Using the default value passed in for log id: b1b62e04-5a1c-4c6a-babd-31b4f1d2bd78 2018-11-08T15:21:26,246 INFO [main] session.SessionState: Updating thread name to b1b62e04-5a1c-4c6a-babd-31b4f1d2bd78 main 2018-11-08T15:21:26,246 INFO [b1b62e04-5a1c-4c6a-babd-31b4f1d2bd78 main] conf.HiveConf: Using the default value passed in for log id: b1b62e04-5a1c-4c6a-babd-31b4f1d2bd78 2018-11-08T15:21:26,548 INFO [b1b62e04-5a1c-4c6a-babd-31b4f1d2bd78 main] ql.Driver: Compiling command(queryId=root_20181108152126_3babeb6f-8396-4ef3-8f85-2cbf12ebe9c1): msck 2018-11-08T15:21:28,140 INFO [b1b62e04-5a1c-4c6a-babd-31b4f1d2bd78 main] hive.metastore: Trying to connect to metastore with URI thrift://nightly61x-1.vpc.cloudera.com:9083 2018-11-08T15:21:28,184 INFO [b1b62e04-5a1c-4c6a-babd-31b4f1d2bd78 main] hive.metastore: Opened a connection to metastore, current connections: 
1 2018-11-08T15:21:28,185 INFO [b1b62e04-5a1c-4c6a-babd-31b4f1d2bd78 main] hive.metastore: Connected to metastore. FAILED: SemanticException empty table creation?? 2018-11-08T15:21:28,339 ERROR [b1b62e04-5a1c-4c6a-babd-31b4f1d2bd78 main] ql.Driver: FAILED: SemanticException empty table creation?? org.apache.hadoop.hive.ql.parse.SemanticException: empty table creation?? at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getTable(BaseSemanticAnalyzer.java:1670) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getTable(BaseSemanticAnalyzer.java:1652) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeMetastoreCheck(DDLSemanticAnalyzer.java:3118) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:414) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:600) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1414) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1543) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1332) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1321) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:342) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:802) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:774) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:701) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.hadoop.util.RunJar.run(RunJar.java:313) at org.apache.hadoop.util.RunJar.main(RunJar.java:227) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: empty table creation?? at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1273) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1234) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getTable(BaseSemanticAnalyzer.java:1663) ... 22 more 2018-11-08T15:21:28,340 INFO [b1b62e04-5a1c-4c6a-babd-31b4f1d2bd78 main] ql.Driver: Completed compiling command(queryId=root_20181108152126_3babeb6f-8396-4ef3-8f85-2cbf12ebe9c1); Time
[jira] [Created] (HIVE-20205) Upgrade HBase dependencies off alpha4 release
Naveen Gangam created HIVE-20205: Summary: Upgrade HBase dependencies off alpha4 release Key: HIVE-20205 URL: https://issues.apache.org/jira/browse/HIVE-20205 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam It appears Hive has dependencies on the hbase 2.0.0-alpha4 release. HBase 2.0.0 and 2.0.1 have been released; the HBase team recommends 2.0.1 and says there shouldn't be any API surprises (but we never know). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19700) Workaround for JLine issue with UnsupportedTerminal
Naveen Gangam created HIVE-19700: Summary: Workaround for JLine issue with UnsupportedTerminal Key: HIVE-19700 URL: https://issues.apache.org/jira/browse/HIVE-19700 Project: Hive Issue Type: Bug Reporter: Naveen Gangam Assignee: Naveen Gangam Fix For: 2.2.1 From JLine's ConsoleReader, readLine(prompt, mask) calls the following beforeReadLine() method. {code} try { // System.out.println("is terminal supported " + terminal.isSupported()); if (!terminal.isSupported()) { beforeReadLine(prompt, mask); } {code} So specifically when using UnsupportedTerminal ({{-Djline.terminal}}) with {{prompt=null}} and {{mask!=null}}, a "null" string gets printed to the console before and after the query result. {{UnsupportedTerminal}} is required when running beeline as a background process; it hangs otherwise. {code} private void beforeReadLine(final String prompt, final Character mask) { if (mask != null && maskThread == null) { final String fullPrompt = "\r" + prompt + " " + " " + " " + "\r" + prompt; maskThread = new Thread() { public void run() { while (!interrupted()) { try { Writer out = getOutput(); out.write(fullPrompt); {code} So the {{prompt}} is null and the {{mask}} is NOT null in at least 2 scenarios in beeline: when beeline's silent=true, the prompt is null * https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/BeeLine.java#L1264 and when running multiline queries * https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/Commands.java#L1093 When executing beeline in script mode (commands in a file), there should not be any masking while reading lines from the script file; i.e., the entire line is either a beeline command or part of a multiline hive query. So it should be safe to use a null mask instead of {{ConsoleReader.NULL_MASK}} when using UnsupportedTerminal as the jline terminal. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
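The guard implied by the workaround above can be sketched as a pure predicate: the masking thread should start only when both the mask AND the prompt are non-null, so an UnsupportedTerminal session with a null prompt never writes the spurious "null" string. This mirrors the beforeReadLine condition described in the report; it is an illustration, not JLine's or beeline's actual code.

```java
public class MaskThreadGuard {
    // Start the mask-redrawing thread only when there is both a mask to
    // apply and a prompt to redraw; a null prompt (silent mode, multiline
    // queries) would otherwise be concatenated as the string "null".
    static boolean shouldStartMaskThread(String prompt, Character mask) {
        return mask != null && prompt != null;
    }
}
```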
[jira] [Created] (HIVE-19250) Schema column definitions inconsistencies in MySQL
Naveen Gangam created HIVE-19250: Summary: Schema column definitions inconsistencies in MySQL Key: HIVE-19250 URL: https://issues.apache.org/jira/browse/HIVE-19250 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam There are some inconsistencies in column definitions in MySQL between a schema that was upgraded to 2.1 (from an older release) vs installing the 2.1.0 schema directly. > `CQ_TBLPROPERTIES` varchar(2048) DEFAULT NULL, 117d117 < `CQ_TBLPROPERTIES` varchar(2048) DEFAULT NULL, 135a136 > `CC_TBLPROPERTIES` varchar(2048) DEFAULT NULL, 143d143 < `CC_TBLPROPERTIES` varchar(2048) DEFAULT NULL, 156c156 < `CTC_TXNID` bigint(20) DEFAULT NULL, --- > `CTC_TXNID` bigint(20) NOT NULL, 158c158 < `CTC_TABLE` varchar(256) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL, --- > `CTC_TABLE` varchar(256) DEFAULT NULL, 476c476 < `TBL_NAME` varchar(256) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL, --- > `TBL_NAME` varchar(256) DEFAULT NULL, 664c664 < KEY `PCS_STATS_IDX` (`DB_NAME`,`TABLE_NAME`,`COLUMN_NAME`,`PARTITION_NAME`), --- > KEY `PCS_STATS_IDX` (`DB_NAME`,`TABLE_NAME`,`COLUMN_NAME`,`PARTITION_NAME`) > USING BTREE, 768c768 < `PARAM_VALUE` mediumtext, --- > `PARAM_VALUE` mediumtext CHARACTER SET latin1 COLLATE latin1_bin, 814c814 < `PARAM_VALUE` mediumtext, --- > `PARAM_VALUE` mediumtext CHARACTER SET latin1 COLLATE latin1_bin, 934c934 < `PARAM_VALUE` mediumtext, --- > `PARAM_VALUE` mediumtext CHARACTER SET latin1 COLLATE latin1_bin, 1066d1065 < `TXN_HEARTBEAT_COUNT` int(11) DEFAULT NULL, 1067a1067 > `TXN_HEARTBEAT_COUNT` int(11) DEFAULT NULL, 1080c1080 < `TC_TXNID` bigint(20) DEFAULT NULL, --- > `TC_TXNID` bigint(20) NOT NULL, 1082c1082 < `TC_TABLE` varchar(128) DEFAULT NULL, --- > `TC_TABLE` varchar(128) NOT NULL, 1084c1084 < `TC_OPERATION_TYPE` char(1) DEFAULT NULL, --- > `TC_OPERATION_TYPE` char(1) NOT NULL, -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19231) Beeline generates garbled output when using UnsupportedTerminal
Naveen Gangam created HIVE-19231: Summary: Beeline generates garbled output when using UnsupportedTerminal Key: HIVE-19231 URL: https://issues.apache.org/jira/browse/HIVE-19231 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 2.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam We had a customer whose front end would invoke beeline commands with query files on a node that is remote from the HS2 node. So beeline runs locally on this edge node but connects to a remote HS2. Since the fix made in HIVE-14342, beeline has started producing garbled lines in the output, something like {code:java} ^Mnull ^Mnull^Mnull ^Mnull00- All Occupations 135185230 42270 11- Management occupations 6152650 100310{code} I haven't been able to reproduce the issue locally as I do not have their system, but with some additional instrumentation I have been able to get some info about the beeline process. Essentially, such an invocation causes the beeline process to run with {{-Djline.terminal=jline.UnsupportedTerminal}} every time, which causes the issue. They can run the same beeline command directly in the shell on the same host without hitting the issue.
{noformat}
PID S TTY TIME COMMAND
44107 S S ? 00:00:00 bash beeline -u ...
PID S TTY TIME COMMAND
48453 S+ S pts/4 00:00:00 bash beeline -u ...
{noformat}
Somehow that process wasn't attached to any local terminal, so the check made for /dev/stdin wouldn't work. Instead, an additional check of the TTY session of the process before using UnsupportedTerminal (which really should only be used for backgrounded beeline sessions) seems to resolve the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
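A minimal sketch of the proposed guard (the helper and class names here are assumptions for illustration, not the actual patch): only fall back to jline's UnsupportedTerminal when the process truly has no attached console.

```java
// Hypothetical illustration of the proposed check, not Hive's actual code:
// prefer a real terminal whenever the JVM has an attached console, and only
// fall back to jline.UnsupportedTerminal otherwise.
public final class TerminalCheck {
    // Pure helper so the decision is easy to test in isolation.
    static String terminalClassFor(boolean hasAttachedTty) {
        return hasAttachedTty ? "jline.UnixTerminal" : "jline.UnsupportedTerminal";
    }

    public static void main(String[] args) {
        // System.console() is null when stdin/stdout are redirected,
        // which is the backgrounded/piped case described above.
        boolean hasTty = System.console() != null;
        System.out.println(terminalClassFor(hasTty));
    }
}
```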
[jira] [Created] (HIVE-19230) Schema column width inconsistency in Oracle
Naveen Gangam created HIVE-19230: Summary: Schema column width inconsistency in Oracle Key: HIVE-19230 URL: https://issues.apache.org/jira/browse/HIVE-19230 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam This is Oracle-only; it does not appear to be an issue with other databases. When you upgrade the Hive schema from 2.1.0 to 3.0.0, the width of TXN_COMPONENTS.TC_TABLE is 256 and COMPLETED_TXN_COMPONENTS.CTC_TABLE is 128. But if you install the Hive 3.0 schema directly, their widths are 128 and 256 respectively, which is consistent with the schemas for other databases. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18829) Inputs/Outputs are not propagated to SA hooks for explain commands.
Naveen Gangam created HIVE-18829: Summary: Inputs/Outputs are not propagated to SA hooks for explain commands. Key: HIVE-18829 URL: https://issues.apache.org/jira/browse/HIVE-18829 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 2.1.1 Reporter: Naveen Gangam Assignee: Naveen Gangam With Sentry enabled, commands like {{explain drop table foo}} fail with {code:java} explain drop table foo; Error: Error while compiling statement: FAILED: SemanticException No valid privileges Required privilege( Table) not available in input privileges The required privileges: (state=42000,code=4) {code} Sentry fails to authorize because the ExplainSemanticAnalyzer uses an instance of DDLSemanticAnalyzer to analyze the explain query.
{code}
BaseSemanticAnalyzer sem = SemanticAnalyzerFactory.get(conf, input);
sem.analyze(input, ctx);
sem.validate();
{code}
The input/output entities for this query are set in the above code. However, they are never set on the instance of ExplainSemanticAnalyzer itself and thus are not propagated into the HookContext in the calling Driver code.
{code}
sem.analyze(tree, ctx); // this results in calling the above code that uses DDLSA
hookCtx.update(sem);    // sem is an instance of ExplainSemanticAnalyzer; this attempts to update
                        // the HookContext with the input/output info from ESA, which is never set
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
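One possible shape of the fix (a sketch against Hive's internal APIs, not the committed patch) is for ExplainSemanticAnalyzer to copy the entities off the inner analyzer after it runs, so that the Driver's hook context sees them:

```java
// Sketch only: inside ExplainSemanticAnalyzer.analyzeInternal(...)
BaseSemanticAnalyzer sem = SemanticAnalyzerFactory.get(conf, input);
sem.analyze(input, ctx);
sem.validate();

// Propagate the inner analyzer's entities onto this analyzer so that
// the Driver's hookCtx.update(sem) picks them up.
inputs = sem.getInputs();
outputs = sem.getOutputs();
```

Whether the entities are copied onto the outer analyzer or the hook context reads them from the inner analyzer directly is an implementation choice; the key point is that they must reach the HookContext.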
[jira] [Created] (HIVE-18501) Typo in beeline code
Naveen Gangam created HIVE-18501: Summary: Typo in beeline code Key: HIVE-18501 URL: https://issues.apache.org/jira/browse/HIVE-18501 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 3.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam [https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/BeeLine.java#L744] The string literal used here should be "silent", not "slient". There is no functional bug here, just a silly typo. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18459) hive-exec.jar leaks contents fb303.jar into classpath
Naveen Gangam created HIVE-18459: Summary: hive-exec.jar leaks contents fb303.jar into classpath Key: HIVE-18459 URL: https://issues.apache.org/jira/browse/HIVE-18459 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.1.0 Environment: thrift classes are now in the Hive classpath inside hive-exec.jar (HIVE-11553). This makes it hard to test with other versions of this library. The library is already a declared dependency and is not required to be included in hive-exec.jar. I am proposing that we go back to not including these classes, as in past releases. Reporter: Naveen Gangam Assignee: Naveen Gangam -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18328) Improve schematool validator to report duplicate rows for column statistics
Naveen Gangam created HIVE-18328: Summary: Improve schematool validator to report duplicate rows for column statistics Key: HIVE-18328 URL: https://issues.apache.org/jira/browse/HIVE-18328 Project: Hive Issue Type: Improvement Components: Hive Affects Versions: 2.1.1 Reporter: Naveen Gangam Assignee: Naveen Gangam By design, in the {{TAB_COL_STATS}} table of the HMS schema, there should be ONE AND ONLY ONE row, representing its statistics, for each column defined in Hive. A combination of DB_NAME, TABLE_NAME and COLUMN_NAME constitutes a primary key/unique row. Each time the statistics are computed for a column, this row is updated. However, if somehow, via a BDR/replication process, we end up with multiple rows in this table for a given column, the HMS server fails to recompute the statistics thereafter. So it would be good to detect this data anomaly via the schema validation tool. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
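A validator query along these lines (a sketch; the key columns are the ones named above) could surface the anomaly:

```sql
-- Sketch: report columns that have more than one statistics row.
-- In the HMS schema, (DB_NAME, TABLE_NAME, COLUMN_NAME) should uniquely
-- identify a row in TAB_COL_STATS.
SELECT DB_NAME, TABLE_NAME, COLUMN_NAME, COUNT(*) AS row_count
FROM TAB_COL_STATS
GROUP BY DB_NAME, TABLE_NAME, COLUMN_NAME
HAVING COUNT(*) > 1;
```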
[jira] [Created] (HIVE-17333) Schema changes in HIVE-12274 for Oracle may not work for upgrade
Naveen Gangam created HIVE-17333: Summary: Schema changes in HIVE-12274 for Oracle may not work for upgrade Key: HIVE-17333 URL: https://issues.apache.org/jira/browse/HIVE-17333 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 3.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam According to https://asktom.oracle.com/pls/asktom/f?p=100:11:0P11_QUESTION_ID:1770086700346491686 (reported in HIVE-12274), the ALTER TABLE command to change the column datatype from {{VARCHAR}} to {{CLOB}} may not work. The correct way to accomplish this is to add a new temp column, copy the value from the current column, drop the current column, and rename the new column to the old column name. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
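The add/copy/drop/rename dance described above might look like the following Oracle sketch (the table and column names are placeholders, not the actual HIVE-12274 targets):

```sql
-- Sketch only: Oracle cannot ALTER a populated VARCHAR2 column directly to CLOB,
-- so stage the data through a temporary column instead.
ALTER TABLE MY_TABLE ADD (MY_COL_TMP CLOB);
UPDATE MY_TABLE SET MY_COL_TMP = MY_COL;
ALTER TABLE MY_TABLE DROP COLUMN MY_COL;
ALTER TABLE MY_TABLE RENAME COLUMN MY_COL_TMP TO MY_COL;
```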
[jira] [Created] (HIVE-16974) Change the sort key for the schema tool validator to be
Naveen Gangam created HIVE-16974: Summary: Change the sort key for the schema tool validator to be Key: HIVE-16974 URL: https://issues.apache.org/jira/browse/HIVE-16974 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: 3.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam In HIVE-16729, we introduced ordering of the results/failures returned by schematool's validators. This allows fault-injection testing to expect results that can be verified. However, they were sorted on NAME values, which in the HMS schema can be NULL. So if the introduced fault has a NULL/BLANK name column value, the result could differ depending on the backend database (whether it sorts NULLs first or last). So I think it is better to sort on a non-null column value. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-16912) Improve table validator's performance against Oracle
Naveen Gangam created HIVE-16912: Summary: Improve table validator's performance against Oracle Key: HIVE-16912 URL: https://issues.apache.org/jira/browse/HIVE-16912 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 3.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Currently, this validator uses DatabaseMetaData.getTables(), which takes on the order of minutes to return because of the number of SYSTEM tables present in Oracle. Providing a schema name via a system property would limit the number of tables returned and thus improve performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
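The proposed narrowing might look like the following JDBC sketch (the system property name and connection arguments are assumptions for illustration; this needs a live connection to run):

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public final class TableValidatorSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(args[0], args[1], args[2]);
        DatabaseMetaData md = conn.getMetaData();
        // Passing a schema pattern instead of null keeps Oracle from scanning
        // every SYSTEM schema; the property name here is hypothetical.
        String schema = System.getProperty("hive.metastore.schema.name");
        try (ResultSet rs = md.getTables(null, schema, "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                System.out.println(rs.getString("TABLE_NAME"));
            }
        }
    }
}
```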
[jira] [Created] (HIVE-16729) Improve location validator to check for blank paths.
Naveen Gangam created HIVE-16729: Summary: Improve location validator to check for blank paths. Key: HIVE-16729 URL: https://issues.apache.org/jira/browse/HIVE-16729 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 3.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Currently, the schema tool location validator succeeds even when the locations for Hive tables/partitions have paths like
hdfs://myhost.com:8020/
hdfs://myhost.com:8020
where there is actually no "real" path. Having the validator report such paths would be beneficial in preventing runtime errors. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
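A minimal sketch of such a check (a hypothetical helper, not schematool's actual code) using java.net.URI to detect locations with no real path component:

```java
import java.net.URI;

// Hypothetical helper showing how a location validator could flag
// URIs that carry no "real" path component.
public final class LocationPathCheck {
    static boolean hasRealPath(String location) {
        String path = URI.create(location).getPath();
        // "hdfs://host:8020" yields "" and "hdfs://host:8020/" yields "/";
        // neither names an actual directory or file.
        return path != null && !path.isEmpty() && !path.equals("/");
    }

    public static void main(String[] args) {
        System.out.println(hasRealPath("hdfs://myhost.com:8020/"));          // false
        System.out.println(hasRealPath("hdfs://myhost.com:8020/user/hive")); // true
    }
}
```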
[jira] [Created] (HIVE-16697) Schema table validator should return a sorted list of missing tables
Naveen Gangam created HIVE-16697: Summary: Schema table validator should return a sorted list of missing tables Key: HIVE-16697 URL: https://issues.apache.org/jira/browse/HIVE-16697 Project: Hive Issue Type: Sub-task Affects Versions: 3.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor SchemaTool's validate feature has a schema table validator that checks to see if the HMS schema is missing tables. This validator reports a list of tables that are deemed to be missing. This list is currently unsorted (depends on the order of create table statements in the schema file, which is different for different DB schema files). This makes it hard to write a unit test that parses the results. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-16316) Prepare master branch for 3.0.0 development.
Naveen Gangam created HIVE-16316: Summary: Prepare master branch for 3.0.0 development. Key: HIVE-16316 URL: https://issues.apache.org/jira/browse/HIVE-16316 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.3.0 Reporter: Naveen Gangam Assignee: Naveen Gangam branch-2 is now being used for 2.3.0 development. The build files will need to reflect this change. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-16301) Prepare branch-2 for 2.3 development.
Naveen Gangam created HIVE-16301: Summary: Prepare branch-2 for 2.3 development. Key: HIVE-16301 URL: https://issues.apache.org/jira/browse/HIVE-16301 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.3.0 Reporter: Naveen Gangam Assignee: Naveen Gangam branch-2 is now being used for 2.3.0 development. The build files will need to reflect this change. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-16257) Intermittent issue with incorrect resultset with Spark
Naveen Gangam created HIVE-16257: Summary: Intermittent issue with incorrect resultset with Spark Key: HIVE-16257 URL: https://issues.apache.org/jira/browse/HIVE-16257 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.1.0 Reporter: Naveen Gangam This issue is highly intermittent and only seems to occur with the Spark engine. The following is the test case.
{code}
drop table if exists test_hos_sample;
create table test_hos_sample (name string, val1 decimal(18,2), val2 decimal(20,3));
insert into test_hos_sample values ('test1',101.12,102.123),('test1',101.12,102.123),('test2',102.12,103.234),('test1',101.12,102.123),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test4',104.52,104.456),('test4',104.52,104.456),('test5',105.52,105.567),('test3',103.52,102.345),('test5',105.52,105.567);
set hive.execution.engine=spark;
select name, val1, val2 from test_hos_sample group by name, val1, val2;
{code}
Expected results:
{code}
name    val1    val2
test5   105.52  105.567
test3   103.52  102.345
test1   101.12  102.123
test4   104.52  104.456
test2   102.12  103.234
{code}
Incorrect results once in a while:
{code}
name    val1    val2
test5   105.52  105.567
test3   103.52  102.345
test1   104.52  102.123
test4   104.52  104.456
test2   102.12  103.234
{code}
1) Not reproducible with HoMR.
2) Not an issue when running from spark-shell.
3) Occurs with both parquet and text file formats (haven't tried other formats).
4) Occurs whether the table data is inside an encryption zone or outside.
5) Even in clusters where this is reproducible, it occurs only once in 20 times or more.
6) Occurs with both beeline and the Hive CLI.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)