[jira] [Created] (HIVE-25230) add position and occurrence to instr()

2021-06-09 Thread Quanlong Huang (Jira)
Quanlong Huang created HIVE-25230:
-

 Summary: add position and occurrence to instr()
 Key: HIVE-25230
 URL: https://issues.apache.org/jira/browse/HIVE-25230
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Quanlong Huang
Assignee: Quanlong Huang


Current instr() only supports two arguments:
{code:java}
instr(str, substr) - Returns the index of the first occurrence of substr in str
{code}
Other systems (Vertica, Oracle, Impala, etc.) support additional position and 
occurrence arguments:
{code:java}
instr(str, substr[, pos[, occurrence]])
{code}
Oracle doc: 
[https://docs.oracle.com/database/121/SQLRF/functions089.htm#SQLRF00651]

It'd be nice to support this as well; otherwise it remains a SQL difference 
between Impala and Hive. Impala added support in IMPALA-3973.
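As a rough sketch of the extended semantics (a hypothetical Python model with Oracle-style 1-based indexing, not Hive code):

```python
def instr(s, sub, pos=1, occurrence=1):
    """Model of the extended instr(): 1-based index of the
    `occurrence`-th match of `sub` in `s`, starting at `pos`.
    Returns 0 when not found (Oracle-style semantics, as a sketch)."""
    if occurrence < 1:
        raise ValueError("occurrence must be >= 1")
    idx = pos - 1  # convert to 0-based
    for _ in range(occurrence):
        idx = s.find(sub, idx)
        if idx == -1:
            return 0
        idx += 1  # resume searching just past this match's start
    return idx  # already advanced by one, i.e. a 1-based position

print(instr("barbarbar", "bar"))        # 1
print(instr("barbarbar", "bar", 2))     # 4
print(instr("barbarbar", "bar", 1, 3))  # 7
```

Negative `pos` (backward search) is omitted here; Oracle supports it, so a full implementation would need to decide whether to follow that behavior.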



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25229) Hive lineage is not generated for columns on CREATE MATERIALIZED VIEW

2021-06-09 Thread Soumyakanti Das (Jira)
Soumyakanti Das created HIVE-25229:
--

 Summary: Hive lineage is not generated for columns on CREATE 
MATERIALIZED VIEW
 Key: HIVE-25229
 URL: https://issues.apache.org/jira/browse/HIVE-25229
 Project: Hive
  Issue Type: Bug
Reporter: Soumyakanti Das
Assignee: Soumyakanti Das


While creating a materialized view, the HookContext is supposed to carry 
lineage info, but it is missing.

CREATE MATERIALIZED VIEW tbl1_view as select * from tbl1;

The HookContext passed from hive.ql.Driver to the Atlas Hive hook through the 
hookRunner.runPostExecHooks call doesn't have lineage info.





[jira] [Created] (HIVE-25228) Thrift CLI Service Protocol: Watch for lack of interest by client and kill queries faster

2021-06-09 Thread Matt McCline (Jira)
Matt McCline created HIVE-25228:
---

 Summary: Thrift CLI Service Protocol: Watch for lack of interest 
by client and kill queries faster
 Key: HIVE-25228
 URL: https://issues.apache.org/jira/browse/HIVE-25228
 Project: Hive
  Issue Type: Improvement
Reporter: Matt McCline
Assignee: Matt McCline


CONSIDER: Have Hive Server 2 monitor operations (queries) for continuing client 
interest. If a client does not ask for status every 15 seconds, then 
automatically kill a query and release its txn locks and job resources.

 

Users will see queries cleaned up much faster (15 to 30 seconds instead of 
minutes, possibly many minutes) when client communication is lost. Cleaning up 
those queries prevents other queries from being blocked on EXCLUSIVE txn locks, 
and keeps the scheduling of subsequent queries, including retries of the 
original query, from being blocked. Today, users can hit timeouts when they 
retry a query that failed with a connection error, which understandably upsets 
them.
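The proposed monitoring could be sketched roughly like this (hypothetical names and structure; HS2's real operation manager is more involved):

```python
import time

# Hypothetical sketch of the proposed watchdog: HS2 would record the
# last time each client polled an operation's status, and a periodic
# reaper cancels operations that have not been polled within the
# threshold (the caller would then kill the query and release its
# txn locks and job resources).
CLIENT_INTEREST_TIMEOUT = 15.0  # seconds, per the proposal

class OperationWatchdog:
    def __init__(self, timeout=CLIENT_INTEREST_TIMEOUT, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_poll = {}  # operation id -> last status-poll timestamp

    def on_status_poll(self, op_id):
        # Called whenever the client asks for the operation's status.
        self.last_poll[op_id] = self.clock()

    def reap(self):
        # Return operations the client has apparently lost interest in.
        now = self.clock()
        dead = [op for op, t in self.last_poll.items()
                if now - t > self.timeout]
        for op in dead:
            del self.last_poll[op]
        return dead
```

One design question this raises: well-behaved clients that legitimately poll less often than every 15 seconds would need a longer (or configurable) threshold.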





[jira] [Created] (HIVE-25227) Thrift CLI Service Protocol: Eliminate long compile requests that proxies can time out

2021-06-09 Thread Matt McCline (Jira)
Matt McCline created HIVE-25227:
---

 Summary: Thrift CLI Service Protocol: Eliminate long compile 
requests that proxies can time out
 Key: HIVE-25227
 URL: https://issues.apache.org/jira/browse/HIVE-25227
 Project: Hive
  Issue Type: Improvement
Reporter: Matt McCline
Assignee: Matt McCline


CONSIDER: Avoid proxy (GW) timeouts on long Hive query compiles. Use a request 
to start the operation, then poll for status as we do for execution.
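The start-then-poll shape could look roughly like this (hypothetical names; not the actual Thrift API):

```python
import threading
import uuid

# Hypothetical sketch of "start then poll" for compilation: the first
# request only kicks off the compile and returns a handle immediately
# (so proxies never see a long-running request); the client then polls
# the handle for status, as it already does during execution.
class CompileService:
    def __init__(self):
        self.status = {}  # handle -> "COMPILING" | "READY" | "ERROR"

    def start_compile(self, query, compile_fn):
        handle = str(uuid.uuid4())
        self.status[handle] = "COMPILING"

        def run():
            try:
                compile_fn(query)
                self.status[handle] = "READY"
            except Exception:
                self.status[handle] = "ERROR"

        threading.Thread(target=run, daemon=True).start()
        return handle  # returned immediately; no proxy timeout

    def poll(self, handle):
        return self.status[handle]
```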





[jira] [Created] (HIVE-25226) Hive changes 'storage_handler' for existing Iceberg table when hive.engine.enabled is false

2021-06-09 Thread Zoltán Borók-Nagy (Jira)
Zoltán Borók-Nagy created HIVE-25226:


 Summary: Hive changes 'storage_handler' for existing Iceberg table 
when hive.engine.enabled is false
 Key: HIVE-25226
 URL: https://issues.apache.org/jira/browse/HIVE-25226
 Project: Hive
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


If Hive writes to an existing Iceberg table but property 'hive.engine.enabled' 
is not set, then Hive rewrites the table metadata with different 
SerDe/Input/Output format than it had before.

E.g. there's an existing table with the following metadata:
{noformat}
| storage_handler | org.apache.iceberg.mr.hive.HiveIcebergStorageHandler |      |
| SerDe Library:  | org.apache.iceberg.mr.hive.HiveIcebergSerDe          | NULL |
| InputFormat:    | org.apache.iceberg.mr.hive.HiveIcebergInputFormat    | NULL |
| OutputFormat:   | org.apache.iceberg.mr.hive.HiveIcebergOutputFormat   | NULL |
{noformat}
Now when Hive inserts to this table it clears 'storage_handler' and rewrites 
the rest:
{noformat}
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat:   | org.apache.hadoop.mapred.FileInputFormat           | NULL |
| OutputFormat:  | org.apache.hadoop.mapred.FileOutputFormat          | NULL |
{noformat}
This means the table becomes unreadable:
{noformat}
Error: java.io.IOException: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in mapredWork! (state=,code=0)
{noformat}

I think Hive should always set 'hive.engine.enabled' for Iceberg.





[jira] [Created] (HIVE-25225) Update column stat throws NPE if direct sql is disabled

2021-06-09 Thread mahesh kumar behera (Jira)
mahesh kumar behera created HIVE-25225:
--

 Summary: Update column stat throws NPE if direct sql is disabled
 Key: HIVE-25225
 URL: https://issues.apache.org/jira/browse/HIVE-25225
 Project: Hive
  Issue Type: Sub-task
Reporter: mahesh kumar behera
Assignee: mahesh kumar behera


If direct SQL is disabled, the MetaStoreDirectSql object is not initialised, 
which causes the NPE.
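The usual fix pattern for this kind of NPE can be sketched abstractly (hypothetical names; the actual Hive classes and methods differ):

```python
# Hypothetical sketch of the fix pattern: guard the direct-SQL fast
# path and fall back to the ORM (JDO) path when direct SQL is disabled,
# instead of dereferencing an uninitialised handle.
class StatsUpdater:
    def __init__(self, direct_sql=None):
        # direct_sql is None when the direct-SQL config flag is off
        self.direct_sql = direct_sql

    def update_column_stats(self, stats):
        if self.direct_sql is not None:
            return self.direct_sql.update(stats)  # fast path
        return self._update_via_orm(stats)        # safe fallback

    def _update_via_orm(self, stats):
        # Stand-in for the slower object-mapped update path.
        return ("orm", stats)
```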





[jira] [Created] (HIVE-25224) Multi insert statements involving tables with different bucketing_versions results in error

2021-06-09 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25224:
---

 Summary: Multi insert statements involving tables with different 
bucketing_versions results in error
 Key: HIVE-25224
 URL: https://issues.apache.org/jira/browse/HIVE-25224
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich



{code}
drop table if exists t;
drop table if exists t2;
drop table if exists t3;
create table t (a integer);
create table t2 (a integer);
create table t3 (a integer);
alter table t set tblproperties ('bucketing_version'='1');
explain from t3 insert into t select a insert into t2 select a;
{code}

results in
{code}
Error: Error while compiling statement: FAILED: RuntimeException Error setting 
bucketingVersion for group: [[op: FS[2], bucketingVersion=1], [op: FS[11], 
bucketingVersion=2]] (state=42000,code=4)
{code}





[jira] [Created] (HIVE-25223) Select with limit returns no rows on non native table

2021-06-09 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-25223:


 Summary: Select with limit returns no rows on non native table
 Key: HIVE-25223
 URL: https://issues.apache.org/jira/browse/HIVE-25223
 Project: Hive
  Issue Type: Bug
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


Steps to reproduce:
{code:java}
CREATE EXTERNAL TABLE hht (key string, value int) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hht", "hbase.mapred.output.outputtable" = 
"hht");

insert into hht select uuid(), cast((rand() * 100) as int);

insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;
insert into hht select uuid(), cast((rand() * 100) as int) from hht;

set hive.fetch.task.conversion=none;
select * from hht limit 10;

+----------+------------+
| hht.key  | hht.value  |
+----------+------------+
+----------+------------+
No rows selected (5.22 seconds)
{code}
 

This is caused by GlobalLimitOptimizer. The table directory is always empty 
for a non-native table, since the data is not managed by Hive (but by HBase in 
this case).

The optimizer scans the directory and sets the file list to an empty list.
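The failure mode can be sketched abstractly (hypothetical function, not the optimizer's real API):

```python
# Hypothetical sketch: a limit optimizer that infers the row source
# from files under the table directory must be bypassed for non-native
# tables, whose directory is always empty (the data lives in the
# storage handler, e.g. HBase). Without the guard, [] is taken to
# mean "no rows".
def plan_limit_scan(table, list_files):
    # list_files(table) lists data files under the table directory
    if table.get("non_native"):
        return "FULL_SCAN"  # let the storage handler produce the rows
    files = list_files(table)
    return files  # possibly pruned file list; [] means no rows
```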





[jira] [Created] (HIVE-25222) Fix reading Iceberg tables with a comma in column names

2021-06-09 Thread Marton Bod (Jira)
Marton Bod created HIVE-25222:
-

 Summary: Fix reading Iceberg tables with a comma in column names
 Key: HIVE-25222
 URL: https://issues.apache.org/jira/browse/HIVE-25222
 Project: Hive
  Issue Type: Bug
Reporter: Marton Bod
Assignee: Marton Bod


When using a table with a column name containing a comma (e.g. `employ,ee`), 
reading an Iceberg table fails because we rely on the property 
"hive.io.file.readcolumn.names" which encodes the read columns in a 
comma-separated list, put together by the ColumnProjectionUtils class.

Because the list is comma-separated in all cases, it produces a string like 
"id,birth_date,employ,ee", which can confuse Iceberg readers that use this 
list to construct their expected read schema.
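The ambiguity is easy to demonstrate (plain Python, illustrative only):

```python
# A comma-separated column list cannot round-trip column names that
# themselves contain commas: three names come back as four.
cols = ["id", "birth_date", "employ,ee"]
encoded = ",".join(cols)   # what hive.io.file.readcolumn.names would hold
decoded = encoded.split(",")
print(encoded)  # id,birth_date,employ,ee
print(decoded)  # ['id', 'birth_date', 'employ', 'ee'] -- 4 names, not 3
```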





[jira] [Created] (HIVE-25221) Hive CLI: alter table command fails with Error: Unable to alter partitions because table or database does not exist

2021-06-09 Thread XixiHua (Jira)
XixiHua created HIVE-25221:
--

 Summary: Hive CLI: alter table command fails with Error: Unable 
to alter partitions because table or database does not exist
 Key: HIVE-25221
 URL: https://issues.apache.org/jira/browse/HIVE-25221
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 2.3.8, 2.3.7
Reporter: XixiHua
Assignee: XixiHua


Use Hive CLI to execute the following command:

 
{code:java}
alter table xxx.xxx partition(xxx) set location 'xxx';{code}
 

If you don't execute +*use*+ first, the command fails with the error: 
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
Unable to alter partition. Unable to alter partitions because table or database 
does not exist. This happens even if the table does exist.



More info:
{code:java}
2021-06-08T07:37:21,596 ERROR [pool-6-thread-173] metastore.RetryingHMSHandler: InvalidOperationException(message:Unable to alter partitions because table or database does not exist.)
 at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:3908)
 at sun.reflect.GeneratedMethodAccessor100.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
 at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
 at com.sun.proxy.$Proxy27.alter_partitions_with_environment_context(Unknown Source)
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions_with_environment_context.getResult(ThriftHiveMetastore.java:12598)
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions_with_environment_context.getResult(ThriftHiveMetastore.java:12582)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
 at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
 at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
 at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
{code}
However, when executing add/drop partition commands, this is not required.


