[jira] [Created] (HIVE-25230) add position and occurrence to instr()
Quanlong Huang created HIVE-25230:
-------------------------------------
Summary: add position and occurrence to instr()
Key: HIVE-25230
URL: https://issues.apache.org/jira/browse/HIVE-25230
Project: Hive
Issue Type: New Feature
Components: UDF
Reporter: Quanlong Huang
Assignee: Quanlong Huang

Current instr() only supports two arguments:
{code:java}
instr(str, substr) - Returns the index of the first occurrence of substr in str
{code}
Other systems (Vertica, Oracle, Impala, etc.) support additional position and occurrence arguments:
{code:java}
instr(str, substr[, pos[, occurrence]])
{code}
Oracle doc: [https://docs.oracle.com/database/121/SQLRF/functions089.htm#SQLRF00651]

It'd be nice to support this as well; otherwise it remains a SQL difference between Impala and Hive. Impala added support for this in IMPALA-3973.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
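For reference, the Oracle/Impala semantics of the extra arguments can be sketched in Python (a minimal illustration of 1-based positions only, not Hive UDF code; negative pos, which Oracle uses for backward search, is omitted):

```python
def instr(s: str, sub: str, pos: int = 1, occurrence: int = 1) -> int:
    """1-based index of the occurrence-th match of sub in s, starting
    the search at 1-based position pos; returns 0 when no such match."""
    if pos < 1 or occurrence < 1:
        raise ValueError("pos and occurrence must be >= 1")
    idx = pos - 1                     # convert to 0-based search start
    for _ in range(occurrence):
        idx = s.find(sub, idx)
        if idx == -1:
            return 0                  # SQL instr() convention: 0 = not found
        idx += 1                      # next search starts past this match;
                                      # also equals the 1-based match index
    return idx
```

For example, `instr("abcabc", "b", 1, 2)` returns 5, the 1-based position of the second "b".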
[jira] [Created] (HIVE-25229) Hive lineage is not generated for columns on CREATE MATERIALIZED VIEW
Soumyakanti Das created HIVE-25229:
-------------------------------------
Summary: Hive lineage is not generated for columns on CREATE MATERIALIZED VIEW
Key: HIVE-25229
URL: https://issues.apache.org/jira/browse/HIVE-25229
Project: Hive
Issue Type: Bug
Reporter: Soumyakanti Das
Assignee: Soumyakanti Das

When creating a materialized view, the HookContext is supposed to carry lineage info, but it is missing:
{code:java}
CREATE MATERIALIZED VIEW tbl1_view AS SELECT * FROM tbl1;
{code}
The HookContext passed from hive.ql.Driver to the Atlas Hive hook through the hookRunner.runPostExecHooks call doesn't have lineage info.
[jira] [Created] (HIVE-25228) Thrift CLI Service Protocol: Watch for lack of interest by client and kill queries faster
Matt McCline created HIVE-25228:
-------------------------------------
Summary: Thrift CLI Service Protocol: Watch for lack of interest by client and kill queries faster
Key: HIVE-25228
URL: https://issues.apache.org/jira/browse/HIVE-25228
Project: Hive
Issue Type: Improvement
Reporter: Matt McCline
Assignee: Matt McCline

CONSIDER: Have HiveServer2 monitor operations (queries) for continuing client interest. If a client does not ask for status every 15 seconds, automatically kill the query and release its txn locks and job resources.

Users will see queries cleaned up much faster (15 to 30 seconds instead of minutes, possibly many minutes) when client communication is lost. Cleaning up those queries prevents other queries from being blocked on EXCLUSIVE txn locks and from having their scheduling blocked, including retries of the original query. Today, users can hit timeouts when they retry a query that got a connection error, which understandably upsets users.
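The proposed watchdog can be sketched as follows (all names here are hypothetical illustrations, not HiveServer2 APIs): every status poll refreshes a per-operation timestamp, and a periodic sweep kills whatever nobody has polled within the timeout.

```python
import time

class OperationWatchdog:
    """Sketch of the proposed idle-client watchdog: operations that no
    client has asked about within timeout_s get killed by the sweep."""

    def __init__(self, timeout_s: float = 15.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock            # injectable for testing
        self.last_poll = {}           # operation id -> last status-poll time
        self.killed = set()

    def on_status_poll(self, op_id: str) -> None:
        """Called on every GetOperationStatus-style request."""
        self.last_poll[op_id] = self.clock()

    def sweep(self) -> list:
        """Kill (here: just record) every operation idle past the timeout.
        A real server would also release txn locks and job resources."""
        now = self.clock()
        idle = [op for op, t in self.last_poll.items()
                if now - t > self.timeout_s]
        for op in idle:
            del self.last_poll[op]
            self.killed.add(op)
        return idle
```

A production version would run `sweep()` on a background thread; the injectable clock just keeps the sketch testable without real delays.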
[jira] [Created] (HIVE-25227) Thrift CLI Service Protocol: Eliminate long compile requests that proxies can time out
Matt McCline created HIVE-25227:
-------------------------------------
Summary: Thrift CLI Service Protocol: Eliminate long compile requests that proxies can time out
Key: HIVE-25227
URL: https://issues.apache.org/jira/browse/HIVE-25227
Project: Hive
Issue Type: Improvement
Reporter: Matt McCline
Assignee: Matt McCline

CONSIDER: Avoid proxy (GW) timeouts on long Hive query compiles. Use a request to start the operation, then poll for status like we do for execution.
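The start-then-poll shape being proposed can be sketched like this (hypothetical class and method names, not the Thrift CLI Service API): the start call returns a handle immediately, so no single RPC has to outlive a proxy's idle timeout.

```python
import threading
import uuid

class AsyncCompileService:
    """Sketch of a start-then-poll compile protocol: compilation runs in
    the background while the client cheaply polls short status RPCs."""

    def __init__(self):
        self._status = {}

    def start_compile(self, query: str, compile_fn) -> str:
        """Returns an operation handle immediately; compile_fn(query)
        runs on a worker thread."""
        handle = str(uuid.uuid4())
        self._status[handle] = "RUNNING"

        def run():
            compile_fn(query)
            self._status[handle] = "FINISHED"

        threading.Thread(target=run, daemon=True).start()
        return handle

    def get_status(self, handle: str) -> str:
        """Short, frequent call; each one is well under any proxy timeout."""
        return self._status[handle]
```

This mirrors how execution already works over the Thrift CLI protocol: a cheap status poll every few seconds keeps each individual request short.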
[jira] [Created] (HIVE-25226) Hive changes 'storage_handler' for existing Iceberg table when hive.engine.enabled is false
Zoltán Borók-Nagy created HIVE-25226:
-------------------------------------
Summary: Hive changes 'storage_handler' for existing Iceberg table when hive.engine.enabled is false
Key: HIVE-25226
URL: https://issues.apache.org/jira/browse/HIVE-25226
Project: Hive
Issue Type: Bug
Reporter: Zoltán Borók-Nagy

If Hive writes to an existing Iceberg table but the property 'hive.engine.enabled' is not set, Hive rewrites the table metadata with a different SerDe/InputFormat/OutputFormat than it had before. E.g. there's an existing table with the following metadata:
{noformat}
| storage_handler | org.apache.iceberg.mr.hive.HiveIcebergStorageHandler |      |
| SerDe Library:  | org.apache.iceberg.mr.hive.HiveIcebergSerDe          | NULL |
| InputFormat:    | org.apache.iceberg.mr.hive.HiveIcebergInputFormat    | NULL |
| OutputFormat:   | org.apache.iceberg.mr.hive.HiveIcebergOutputFormat   | NULL |
{noformat}
Now when Hive inserts into this table, it clears 'storage_handler' and rewrites the rest:
{noformat}
| SerDe Library:  | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat:    | org.apache.hadoop.mapred.FileInputFormat           | NULL |
| OutputFormat:   | org.apache.hadoop.mapred.FileOutputFormat          | NULL |
{noformat}
This makes the table unreadable:
{noformat}
Error: java.io.IOException: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in mapredWork! (state=,code=0)
{noformat}
I think Hive should always set 'hive.engine.enabled' for Iceberg tables.
[jira] [Created] (HIVE-25225) Update column stat throws NPE if direct sql is disabled
mahesh kumar behera created HIVE-25225:
-------------------------------------
Summary: Update column stat throws NPE if direct sql is disabled
Key: HIVE-25225
URL: https://issues.apache.org/jira/browse/HIVE-25225
Project: Hive
Issue Type: Sub-task
Reporter: mahesh kumar behera
Assignee: mahesh kumar behera

When direct SQL is disabled, the MetaStoreDirectSql object is not initialised, and that causes the NPE.
[jira] [Created] (HIVE-25224) Multi insert statements involving tables with different bucketing_versions results in error
Zoltan Haindrich created HIVE-25224:
-------------------------------------
Summary: Multi insert statements involving tables with different bucketing_versions results in error
Key: HIVE-25224
URL: https://issues.apache.org/jira/browse/HIVE-25224
Project: Hive
Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich

{code}
drop table if exists t;
drop table if exists t2;
drop table if exists t3;

create table t (a integer);
create table t2 (a integer);
create table t3 (a integer);

alter table t set tblproperties ('bucketing_version'='1');

explain
from t3
insert into t select a
insert into t2 select a;
{code}
results in
{code}
Error: Error while compiling statement: FAILED: RuntimeException Error setting bucketingVersion for group: [[op: FS[2], bucketingVersion=1], [op: FS[11], bucketingVersion=2]] (state=42000,code=4)
{code}
[jira] [Created] (HIVE-25223) Select with limit returns no rows on non native table
Attila Magyar created HIVE-25223:
-------------------------------------
Summary: Select with limit returns no rows on non native table
Key: HIVE-25223
URL: https://issues.apache.org/jira/browse/HIVE-25223
Project: Hive
Issue Type: Bug
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0

Steps to reproduce:
{code:java}
CREATE EXTERNAL TABLE hht (key string, value int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hht", "hbase.mapred.output.outputtable" = "hht");

insert into hht select uuid(), cast((rand() * 100) as int);
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;

set hive.fetch.task.conversion=none;
select * from hht limit 10;

+----------+------------+
| hht.key  | hht.value  |
+----------+------------+
+----------+------------+
No rows selected (5.22 seconds)
{code}
This is caused by GlobalLimitOptimizer. The table directory is always empty for a non-native table, since the data is not managed by Hive (but by HBase in this case). The optimizer scans the directory and sets the file list to an empty list.
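The shape of the fix can be sketched in a few lines (a hypothetical helper, not Hive's GlobalLimitOptimizer code): limit-based input pruning should only trust the directory listing for native tables, since a non-native table's directory says nothing about its data.

```python
def files_for_limit_scan(table: dict, list_dir):
    """Sketch: return the file list the limit optimizer may prune, or
    None to signal "do not prune; the storage handler owns the data".

    table is a hypothetical {"is_native": bool, "location": str} record;
    list_dir is whatever lists a directory (injected for testing)."""
    if not table.get("is_native", True):
        # HBase/Iceberg-style tables keep no data under the table
        # directory, so an empty listing here is meaningless.
        return None
    return list_dir(table["location"])
```

The key point is the early return: with it, the optimizer falls back to the storage handler's own input splits instead of concluding the table is empty.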
[jira] [Created] (HIVE-25222) Fix reading Iceberg tables with a comma in column names
Marton Bod created HIVE-25222:
-------------------------------------
Summary: Fix reading Iceberg tables with a comma in column names
Key: HIVE-25222
URL: https://issues.apache.org/jira/browse/HIVE-25222
Project: Hive
Issue Type: Bug
Reporter: Marton Bod
Assignee: Marton Bod

When a table has a column name containing a comma (e.g. `employ,ee`), reading the Iceberg table fails because we rely on the property "hive.io.file.readcolumn.names", which encodes the read columns as a comma-separated list put together by the ColumnProjectionUtils class. Because the list is comma-separated in all cases, it produces a string like "id,birth_date,employ,ee", which can confuse Iceberg readers that use this list to construct their expected read schema.
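The ambiguity is easy to demonstrate (a plain Python illustration of the encoding, not the ColumnProjectionUtils code): joining on commas and splitting back cannot recover a column name that itself contains a comma.

```python
def roundtrip(cols):
    """Demonstrates why a plain comma-separated encoding (as used for
    hive.io.file.readcolumn.names) is lossy: join followed by split
    cannot tell an embedded comma from a column separator."""
    encoded = ",".join(cols)   # e.g. "id,birth_date,employ,ee"
    return encoded.split(",")

# roundtrip(["id", "birth_date", "employ,ee"]) yields four names
# instead of the original three, so the reader's schema no longer
# matches the table.
```

Any fix has to either escape/quote commas in the encoded property or carry the read schema through a channel that is not comma-delimited.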
[jira] [Created] (HIVE-25221) Hive CLI: execute alter table command with Error: Unable to alter partitions because table or database does not exist
XixiHua created HIVE-25221:
-------------------------------------
Summary: Hive CLI: execute alter table command with Error: Unable to alter partitions because table or database does not exist
Key: HIVE-25221
URL: https://issues.apache.org/jira/browse/HIVE-25221
Project: Hive
Issue Type: Bug
Components: CLI
Affects Versions: 2.3.8, 2.3.7
Reporter: XixiHua
Assignee: XixiHua

Use the Hive CLI to execute the following command:
{code:java}
alter table xxx.xxx partition(xxx) set location 'xxx';
{code}
If you don't execute +*use*+ first, it fails with the error:

Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter partition. Unable to alter partitions because table or database does not exist.

even though the table does exist. More info:
{code:java}
2021-06-08T07:37:21,596 ERROR [pool-6-thread-173] metastore.RetryingHMSHandler: InvalidOperationException(message:Unable to alter partitions because table or database does not exist.)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:3908)
	at sun.reflect.GeneratedMethodAccessor100.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at com.sun.proxy.$Proxy27.alter_partitions_with_environment_context(Unknown Source)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions_with_environment_context.getResult(ThriftHiveMetastore.java:12598)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions_with_environment_context.getResult(ThriftHiveMetastore.java:12582)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{code}
However, when executing add/drop partition, this pattern is not required.