[jira] [Created] (HIVE-25352) Optimise DBTokenStore for RDBMS

2021-07-19 Thread Sahana Bhat (Jira)
Sahana Bhat created HIVE-25352:
--

 Summary: Optimise DBTokenStore for RDBMS
 Key: HIVE-25352
 URL: https://issues.apache.org/jira/browse/HIVE-25352
 Project: Hive
  Issue Type: Bug
Reporter: Sahana Bhat
Assignee: Sahana Bhat






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25351) stddev(), stddev_pop() with CBO enable returning null

2021-07-19 Thread Ashish Sharma (Jira)
Ashish Sharma created HIVE-25351:


 Summary: stddev(), stddev_pop() with CBO enable returning null
 Key: HIVE-25351
 URL: https://issues.apache.org/jira/browse/HIVE-25351
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Sharma
Assignee: Ashish Sharma


Script used to repro:

create table cbo_test (key string, v1 double, v2 decimal(30,2), v3 decimal(30,2));

insert into cbo_test values
  ("00140006375905", 10230.72, 10230.72, 10230.69),
  ("00140006375905", 10230.72, 10230.72, 10230.69),
  ("00140006375905", 10230.72, 10230.72, 10230.69),
  ("00140006375905", 10230.72, 10230.72, 10230.69),
  ("00140006375905", 10230.72, 10230.72, 10230.69),
  ("00140006375905", 10230.72, 10230.72, 10230.69);

select stddev(v1), stddev(v2), stddev(v3) from cbo_test;


Enable CBO

Plan optimized by CBO.

Vertex dependency in root stage
Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 2 vectorized
      File Output Operator [FS_13]
        Select Operator [SEL_12] (rows=1 width=24)
          Output:["_col0","_col1","_col2"]
          Group By Operator [GBY_11] (rows=1 width=72)
            Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","count(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","count(VALUE._col5)","sum(VALUE._col6)","sum(VALUE._col7)","count(VALUE._col8)"]
          <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized
            PARTITION_ONLY_SHUFFLE [RS_10]
              Group By Operator [GBY_9] (rows=1 width=72)
                Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(_col3)","sum(_col0)","count(_col0)","sum(_col5)","sum(_col4)","count(_col1)","sum(_col7)","sum(_col6)","count(_col2)"]
                Select Operator [SEL_8] (rows=6 width=232)
                  Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"]
                  TableScan [TS_0] (rows=6 width=232)
                    default@cbo_test,cbo_test, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["v1","v2","v3"]



Disable CBO

Vertex dependency in root stage
Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 2 vectorized
      File Output Operator [FS_11]
        Group By Operator [GBY_10] (rows=1 width=24)
          Output:["_col0","_col1","_col2"],aggregations:["stddev(VALUE._col0)","stddev(VALUE._col1)","stddev(VALUE._col2)"]
        <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized
          PARTITION_ONLY_SHUFFLE [RS_9]
            Group By Operator [GBY_8] (rows=1 width=240)
              Output:["_col0","_col1","_col2"],aggregations:["stddev(v1)","stddev(v2)","stddev(v3)"]
              Select Operator [SEL_7] (rows=6 width=232)
                Output:["v1","v2","v3"]
                TableScan [TS_0] (rows=6 width=232)
                  default@cbo_test,cbo_test, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["v1","v2","v3"]
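
In the CBO plan, each stddev() call is replaced by two sum() aggregations and
one count(), which is consistent with a rewrite into sums of squares. The
following is a minimal Python sketch, assuming the rewrite has the textbook
form sqrt((sum(x^2) - sum(x)^2 / n) / n); the ticket does not show the
rewritten expression, so that form and all function names here are
assumptions. It illustrates how rounding on identical values can leave a tiny
non-zero (possibly negative) radicand, for which a square root is undefined.
Whether this is the actual cause of the null result is not confirmed here.

```python
import math


def variance_from_sums(values):
    """Population variance via the one-pass identity (sum of squares, sum, count)."""
    n = len(values)
    sum_sq = sum(v * v for v in values)
    total = sum(values)
    return (sum_sq - total * total / n) / n


def stddev_guarded(values):
    """Clamp a slightly negative radicand (rounding noise) to zero before sqrt."""
    return math.sqrt(max(variance_from_sums(values), 0.0))


def stddev_two_pass(values):
    """Reference: subtract the mean first, so the radicand is never negative."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


values = [10230.72] * 6  # the v1 column from the repro script
print(variance_from_sums(values))  # rounding noise; the sign is not guaranteed
print(stddev_guarded(values), stddev_two_pass(values))
```

Clamping the radicand is only one possible mitigation; computing deviations
from the mean, as in stddev_two_pass, avoids the cancellation altogether.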






[jira] [Created] (HIVE-25350) Replication fails for external tables on setting owner/groups

2021-07-19 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-25350:
---

 Summary: Replication fails for external tables on setting 
owner/groups
 Key: HIVE-25350
 URL: https://issues.apache.org/jira/browse/HIVE-25350
 Project: Hive
  Issue Type: Bug
Reporter: Ayush Saxena
Assignee: Ayush Saxena


DirCopyTask tries to preserve user/group permissions, irrespective of whether
they have been specified to be preserved.

Changing the user/group requires superuser privileges, hence the task fails.





Re: [VOTE] Should we release Hive Storage API 2.8.0-rc0 ?

2021-07-19 Thread Owen O'Malley
+1 (binding):
* Built and tested
* Built hive main branch using it
* Verified signatures and checksums

It is too bad that we didn't get HIVE-25190 into it, but that can wait for
2.8.1.

.. Owen

On Mon, Jun 28, 2021 at 9:44 PM Pavan Lanka wrote:

> +1 (non-binding)
>
> I have done the following:
> * Built and Tested storage-release-2.8.0-rc0 using OpenJDK8
> * Built and Tested ORC with updated storage api version
>   - Had to fix a test class that implements PredicateLeaf which has a new
> method. This is a breaking change but I think this should be ok
> * Verified the performance gains of HIVE-24458
>
> Regards,
> Pavan
>
>
> > On Jun 21, 2021, at 8:07 AM, Panos Garefalakis wrote:
> >
> > Hello all,
> >
> > Following on previous discussions, I would like to propose a new
> > storage-api release including HIVE-24458.
> >
> > Shall we release the following artifacts as Hive Storage API 2.8.0?
> >
> > tar: http://home.apache.org/~pgaref/hive-storage-2.8.0/
> > tag: https://github.com/apache/hive/releases/tag/storage-release-2.8.0-rc0
> > jiras: https://issues.apache.org/jira/projects/HIVE/versions/12350287
> >
> > Cheers,
> > Panagiotis
>
>


[jira] [Created] (HIVE-25349) Skip password authentication when a trusted header is present in the Http request

2021-07-19 Thread Sai Hemanth Gantasala (Jira)
Sai Hemanth Gantasala created HIVE-25349:


 Summary: Skip password authentication when a trusted header is 
present in the Http request
 Key: HIVE-25349
 URL: https://issues.apache.org/jira/browse/HIVE-25349
 Project: Hive
  Issue Type: Improvement
  Components: Hive, HiveServer2
Reporter: Sai Hemanth Gantasala
Assignee: Sai Hemanth Gantasala


Whenever a trusted header is present in the HTTP servlet request, skip
password-based authentication, since the user is pre-authorized, and extract
the user name from the Authorization header.
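
The flow described above can be sketched roughly as follows. This is a minimal
Python illustration, not HiveServer2 code: the header name
X-Trusted-Proxy-Auth, the function names, and the assumption that the
Authorization header uses the Basic scheme are all hypothetical and not taken
from the ticket.

```python
import base64

TRUSTED_HEADER = "X-Trusted-Proxy-Auth"  # hypothetical header name


def parse_basic_auth(authorization):
    """Split a 'Basic <base64(user:password)>' header value into (user, password)."""
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        raise ValueError("expected a Basic Authorization header")
    decoded = base64.b64decode(encoded.strip()).decode("utf-8")
    user, _, password = decoded.partition(":")
    return user, password


def authenticate(headers, check_password):
    """Return the user name; verify the password only without the trusted header."""
    user, password = parse_basic_auth(headers["Authorization"])
    if TRUSTED_HEADER in headers:
        return user  # pre-authorized upstream, so the password is not checked
    if not check_password(user, password):
        raise PermissionError("password authentication failed for " + user)
    return user
```

A deployment relying on such a header would also need to ensure that only the
trusted proxy can set it; that aspect is outside this sketch.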





Need write access to update Hive wiki

2021-07-19 Thread Nikhil Gupta
Hello,

I need write access to the LanguageManual+UDF page to update the description
of the date_format function.

Confluence Username: gupta.nikhil0007

Associated Hive Jira:
https://issues.apache.org/jira/browse/HIVE-25268

Regards,
Nikhil Gupta


[jira] [Created] (HIVE-25348) Skip metrics collection about writes to tables with tblproperty no_auto_compaction=true if CTAS

2021-07-19 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-25348:


 Summary: Skip metrics collection about writes to tables with 
tblproperty no_auto_compaction=true if CTAS
 Key: HIVE-25348
 URL: https://issues.apache.org/jira/browse/HIVE-25348
 Project: Hive
  Issue Type: Bug
Reporter: Karen Coppage
Assignee: Karen Coppage


We collect metrics about writes to tables with no_auto_compaction=true when 
allocating writeids. In the case of CTAS, if ACID is enabled on the new table, 
a writeid is allocated before the table object is created so we can't get 
tblproperties from it when allocating the writeid.

In this case we should skip collecting the metric.

This commit fixes errors like this:
{code:java}
2021-07-16 18:48:04,350 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-9-thread-72]: java.lang.NullPointerException
    at org.apache.hadoop.hive.metastore.HMSMetricsListener.onAllocWriteId(HMSMetricsListener.java:104)
    at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.lambda$static$6(MetaStoreListenerNotifier.java:229)
    at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:291)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.allocate_table_write_ids(HiveMetaStore.java:8592)
    at sun.reflect.GeneratedMethodAccessor86.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:160)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:121)
    at com.sun.proxy.$Proxy33.allocate_table_write_ids(Unknown Source)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$allocate_table_write_ids.getResult(ThriftHiveMetastore.java:21584)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$allocate_table_write_ids.getResult(ThriftHiveMetastore.java:21568)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
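
The proposed behaviour (skip the metric when no table object is available yet)
can be illustrated with a small Python sketch. It is not the Hive
implementation: the table is modelled as a plain dict, and the function and
metric names are hypothetical.

```python
from collections import Counter

NO_AUTO_COMPACTION = "no_auto_compaction"
METRIC = "writes_to_disabled_compaction_table"  # hypothetical metric name


def on_alloc_write_id(table, metrics):
    """Count writes to tables that have no_auto_compaction=true.

    `table` is a dict with a "parameters" mapping, or None when the table
    object does not exist yet (the CTAS case described above).
    """
    if table is None:
        return  # CTAS: tblproperties are not available yet, so skip the metric
    params = table.get("parameters") or {}
    if str(params.get(NO_AUTO_COMPACTION, "")).lower() == "true":
        metrics[METRIC] += 1


metrics = Counter()
on_alloc_write_id(None, metrics)  # no exception, nothing counted
on_alloc_write_id({"parameters": {"no_auto_compaction": "true"}}, metrics)
print(metrics[METRIC])
```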





[jira] [Created] (HIVE-25347) RetryingMetastoreClients cannot retry

2021-07-19 Thread Zoltán Borók-Nagy (Jira)
Zoltán Borók-Nagy created HIVE-25347:


 Summary: RetryingMetastoreClients cannot retry
 Key: HIVE-25347
 URL: https://issues.apache.org/jira/browse/HIVE-25347
 Project: Hive
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


Even if the connection is broken, RetryingMetaStoreClient doesn't reconnect to 
HMS when both of the following are true:

* metastore.client.socket.lifetime has its default value of 0, which means 
"infinite lifetime"
* A non-retryable method is invoked, e.g. 
['lock()'|https://github.com/apache/hive/blob/a75b8680214c490be6b092b4fa5f790ae7c2e5ce/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java#L3413]

If we only invoke non-retryable methods, RetryingMetaStoreClient will never 
reconnect to HMS, so all RPCs will fail.

This is because RetryingMetaStoreClient only reconnects on the second attempt 
(which never happens for non-retryable methods), or if the connection 
lifetime has expired (by default, connections don't expire):
https://github.com/apache/hive/blob/a75b8680214c490be6b092b4fa5f790ae7c2e5ce/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java#L183
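
The reconnect condition described above can be modelled with a short Python
simulation. It is a simplified sketch, not the Java code: FakeConnection,
RetryingClient, max_retries and the method names passed to invoke() are
hypothetical, and only the two reconnect triggers named in this report are
modelled.

```python
import time


class FakeConnection:
    """Stand-in for a metastore connection; `broken` simulates a dead socket."""

    def __init__(self):
        self.broken = False

    def call(self, method):
        if self.broken:
            raise ConnectionError(method + " failed: connection is broken")
        return method + " ok"


class RetryingClient:
    def __init__(self, socket_lifetime=0, max_retries=3):
        self.socket_lifetime = socket_lifetime  # 0 means "infinite lifetime"
        self.max_retries = max_retries
        self.reconnects = 0
        self._connect()

    def _connect(self):
        self.conn = FakeConnection()
        self.connected_at = time.monotonic()

    def _lifetime_expired(self):
        age = time.monotonic() - self.connected_at
        return self.socket_lifetime > 0 and age > self.socket_lifetime

    def invoke(self, method, retryable):
        attempts = self.max_retries if retryable else 1
        last_error = None
        for attempt in range(attempts):
            # Reconnect only on a retry or once the lifetime has expired.
            if attempt > 0 or self._lifetime_expired():
                self._connect()
                self.reconnects += 1
            try:
                return self.conn.call(method)
            except ConnectionError as err:
                last_error = err
        raise last_error
```

With the default lifetime of 0 and a broken connection, every
invoke("lock", retryable=False) call in this model fails without reconnecting,
while a single retryable call repairs the connection for later calls.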






[jira] [Created] (HIVE-25346) cleanTxnToWriteIdTable breaks SNAPSHOT isolation

2021-07-19 Thread Zoltan Chovan (Jira)
Zoltan Chovan created HIVE-25346:


 Summary: cleanTxnToWriteIdTable breaks SNAPSHOT isolation
 Key: HIVE-25346
 URL: https://issues.apache.org/jira/browse/HIVE-25346
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Chovan
Assignee: Zoltan Chovan








[jira] [Created] (HIVE-25345) Add logging based on new compaction metrics

2021-07-19 Thread László Pintér (Jira)
László Pintér created HIVE-25345:


 Summary: Add logging based on new compaction metrics
 Key: HIVE-25345
 URL: https://issues.apache.org/jira/browse/HIVE-25345
 Project: Hive
  Issue Type: Improvement
Reporter: László Pintér
Assignee: László Pintér








[jira] [Created] (HIVE-25344) Add a possibility to query Iceberg table snapshots based on the timestamp or the snapshot id

2021-07-19 Thread Peter Vary (Jira)
Peter Vary created HIVE-25344:
-

 Summary: Add a possibility to query Iceberg table snapshots based 
on the timestamp or the snapshot id
 Key: HIVE-25344
 URL: https://issues.apache.org/jira/browse/HIVE-25344
 Project: Hive
  Issue Type: New Feature
Reporter: Peter Vary
Assignee: Peter Vary


Implement the following commands:
{code:java}
SELECT * FROM t FOR SYSTEM_TIME AS OF ;
SELECT * FROM t FOR SYSTEM_VERSION AS OF ;{code}
where SYSTEM_TIME selects the Iceberg table state at the given timestamp 
(UTC), and SYSTEM_VERSION selects the table state at the given Iceberg 
snapshot id.
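
How the two clauses might resolve to a snapshot can be sketched in Python. This
is an illustration only, assuming that "table state at the given timestamp"
means the most recent snapshot committed at or before that timestamp; the
history layout and function names are hypothetical and not taken from Hive or
Iceberg code.

```python
from bisect import bisect_right


def snapshot_as_of_time(history, timestamp_ms):
    """Return the id of the snapshot that was current at `timestamp_ms`.

    `history` is a list of (committed_at_ms, snapshot_id) pairs sorted by
    commit time.
    """
    times = [committed_at for committed_at, _ in history]
    index = bisect_right(times, timestamp_ms)
    if index == 0:
        raise LookupError("no snapshot exists at or before the given timestamp")
    return history[index - 1][1]


def snapshot_as_of_version(history, snapshot_id):
    """Return `snapshot_id` after checking that it exists in the history."""
    if not any(sid == snapshot_id for _, sid in history):
        raise LookupError("unknown snapshot id " + str(snapshot_id))
    return snapshot_id


history = [(1000, 11), (2000, 22), (3000, 33)]
print(snapshot_as_of_time(history, 2500))   # resolves to snapshot 22
print(snapshot_as_of_version(history, 33))  # resolves to snapshot 33
```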





[jira] [Created] (HIVE-25343) Create or replace view should clean the old table properties

2021-07-19 Thread Lantao Jin (Jira)
Lantao Jin created HIVE-25343:
-

 Summary: Create or replace view should clean the old table 
properties
 Key: HIVE-25343
 URL: https://issues.apache.org/jira/browse/HIVE-25343
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.3, 3.2.0
Reporter: Lantao Jin
 Attachments: Screen Shot 2021-07-19 at 15.36.29.png

In many cases, users use Spark and Hive together. When a user creates a view 
via Spark, the view's output columns are stored in the table properties, such as 
 !Screen Shot 2021-07-19 at 15.36.29.png|width=80!

After that, if the user runs "create or replace view" via Hive to change the 
schema, the old table properties added by Spark are not cleaned up by Hive. 
When users then read the view via Spark, the schema appears unchanged, which 
is very confusing.
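
One way to picture the requested cleanup is the following Python sketch. It is
not Hive code: the property keys and the "engine.schema." prefix are made-up
placeholders (the real keys are only visible in the attached screenshot), and
which properties count as stale is an assumption.

```python
def replace_view_properties(old_props, new_props, stale_prefixes):
    """Build the property map for a replaced view.

    Properties whose keys start with one of `stale_prefixes` describe the old
    schema and are dropped; other old properties are kept, then overridden by
    the properties of the new view definition.
    """
    prefixes = tuple(stale_prefixes)
    kept = {k: v for k, v in old_props.items() if not k.startswith(prefixes)}
    kept.update(new_props)
    return kept


old = {
    "engine.schema.numCols": "1",  # hypothetical schema property
    "engine.schema.col.0": "a",    # hypothetical schema property
    "comment": "old view",
}
print(replace_view_properties(old, {"comment": "new view"}, ["engine.schema."]))
```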


