Re: FYI: MetaStore running out of threads

2022-08-31 Thread Rajesh Balamohan
>> In hive, FileUtils.checkFileAccessWithImpersonation can be fixed to
>> create the UGI once to reduce the impact (suspecting this will have 50%
>> impact).

I looked closely at the implementation of
"FileUtils.checkFileAccessWithImpersonation". It doesn't make 2
connections, so the 50% estimate may not be relevant here.

On Thu, Sep 1, 2022 at 4:48 AM Rajesh Balamohan wrote:

>
> W.r.t. connection reuse issues, LLAP had a similar issue (not in HMS):
> https://issues.apache.org/jira/browse/HIVE-16020. It was making a
> connection in every task, and the UGI had to be persisted at the QueryInfo
> level to reduce the impact.
>
> In hive, FileUtils.checkFileAccessWithImpersonation can be fixed to
> create the UGI once to reduce the impact (suspecting this will have 50%
> impact).
>
>
> https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L418
>
> https://github.com/apache/hive/blob/d06957f254e026e719f30027d161264be43386b0/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L461
>
> May have to explore whether a local cache with expiry in FileUtils can
> help reduce the impact further.
>
> ~Rajesh.B
>
>
> On Thu, Sep 1, 2022 at 1:24 AM Owen O'Malley wrote:
>
>> We're using HMS with Storage-Based Authorization and have been having
>> trouble with the HMS running out of threads. Looking at the jstack & code,
>> it appears that the problem is that RPC's ConnectionId is using UGI's
>> equals/hash, which use the Subject's Object (identity) equals/hash. Proxy
>> user UGIs always create a new Subject and thus are always unique.
>>
>> This leads to the HMS creating too many threads. I've created a jira in
>> Hadoop. https://issues.apache.org/jira/browse/HADOOP-18434
>>
>> Thanks,
>>    Owen
>>
>


Re: FYI: MetaStore running out of threads

2022-08-31 Thread Rajesh Balamohan
W.r.t. connection reuse issues, LLAP had a similar issue (not in HMS):
https://issues.apache.org/jira/browse/HIVE-16020. It was making a
connection in every task, and the UGI had to be persisted at the QueryInfo
level to reduce the impact.

In hive, FileUtils.checkFileAccessWithImpersonation can be fixed to
create the UGI once to reduce the impact (suspecting this will have 50%
impact).

https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L418
https://github.com/apache/hive/blob/d06957f254e026e719f30027d161264be43386b0/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L461

May have to explore whether a local cache with expiry in FileUtils can help
reduce the impact further.
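
For illustration, the "local cache with expiry" idea could look roughly like the sketch below. This is not Hive code: ExpiringCacheSketch, its constructor parameters, and the injected clock are all hypothetical names, and the loader stands in for whatever expensive per-user object (for example a UGI) would be created once and reused. Whether reusing such an object is safe for impersonation is an open question that the sketch does not answer.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch: a small per-key cache whose entries expire after a fixed TTL. */
public class ExpiringCacheSketch<K, V> {

    /** A cached value together with the time at which it stops being valid. */
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;    // injected so that expiry can be tested without sleeping
    private final Function<K, V> loader; // creates the expensive object for a key

    public ExpiringCacheSketch(long ttlMillis, LongSupplier clock, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.loader = loader;
    }

    /** Returns the cached value for the key, reloading it if it is missing or expired. */
    public V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old != null && now < old.expiresAtMillis)
                        ? old
                        : new Entry<>(loader.apply(k), now + ttlMillis));
        return entry.value;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        int[] loads = {0};
        ExpiringCacheSketch<String, Object> cache = new ExpiringCacheSketch<>(
                1000L, () -> now[0], user -> { loads[0]++; return new Object(); });

        Object first = cache.get("alice");
        Object second = cache.get("alice"); // within the TTL: same instance
        now[0] = 1500L;                     // move the fake clock past the TTL
        Object third = cache.get("alice");  // expired: loaded again

        System.out.println(first == second); // true
        System.out.println(first == third);  // false
        System.out.println(loads[0]);        // 2
    }
}
```

The clock is passed in only to make the expiry behaviour easy to exercise; production code would more likely use System.currentTimeMillis() or an existing cache library.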

~Rajesh.B


On Thu, Sep 1, 2022 at 1:24 AM Owen O'Malley wrote:

> We're using HMS with Storage-Based Authorization and have been having
> trouble with the HMS running out of threads. Looking at the jstack & code,
> it appears that the problem is that RPC's ConnectionId is using UGI's
> equals/hash, which use the Subject's Object (identity) equals/hash. Proxy
> user UGIs always create a new Subject and thus are always unique.
>
> This leads to the HMS creating too many threads. I've created a jira in
> Hadoop. https://issues.apache.org/jira/browse/HADOOP-18434
>
> Thanks,
>    Owen
>


FYI: MetaStore running out of threads

2022-08-31 Thread Owen O'Malley
We're using HMS with Storage-Based Authorization and have been having
trouble with the HMS running out of threads. Looking at the jstack & code,
it appears that the problem is that RPC's ConnectionId is using UGI's
equals/hash, which use the Subject's Object (identity) equals/hash. Proxy
user UGIs always create a new Subject and thus are always unique.

This leads to the HMS creating too many threads. I've created a jira in
Hadoop. https://issues.apache.org/jira/browse/HADOOP-18434
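
To make the mechanism concrete, here is a minimal, JDK-only sketch of why identity-based equality defeats connection reuse. It does not use Hadoop: FakeUgi and FakeConnectionId are hypothetical stand-ins that only mimic the behaviour described above (equality by Subject identity, a fresh Subject per proxy user), and the address string is made up.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.Subject;

/** Hypothetical model of a connection cache keyed by an identity-compared user object. */
public class ProxyUserConnectionSketch {

    /** Stand-in for a UGI: two instances are equal only if they wrap the very same Subject. */
    static final class FakeUgi {
        final String user;
        final Subject subject = new Subject(); // fresh Subject per instance, as for proxy users

        FakeUgi(String user) {
            this.user = user;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof FakeUgi && subject == ((FakeUgi) o).subject;
        }

        @Override
        public int hashCode() {
            return System.identityHashCode(subject);
        }
    }

    /** Stand-in for an RPC connection key: remote address plus the caller's user object. */
    static final class FakeConnectionId {
        final String address;
        final FakeUgi ugi;

        FakeConnectionId(String address, FakeUgi ugi) {
            this.address = address;
            this.ugi = ugi;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FakeConnectionId)) {
                return false;
            }
            FakeConnectionId other = (FakeConnectionId) o;
            return address.equals(other.address) && ugi.equals(other.ugi);
        }

        @Override
        public int hashCode() {
            return 31 * address.hashCode() + ugi.hashCode();
        }
    }

    /** Returns how many distinct cache entries (connections) the given number of requests creates. */
    static int connectionsFor(int requests, boolean reuseUgi) {
        Map<FakeConnectionId, Integer> connections = new HashMap<>();
        FakeUgi shared = new FakeUgi("alice");
        for (int i = 0; i < requests; i++) {
            FakeUgi ugi = reuseUgi ? shared : new FakeUgi("alice");
            connections.merge(new FakeConnectionId("namenode:8020", ugi), 1, Integer::sum);
        }
        return connections.size();
    }

    public static void main(String[] args) {
        System.out.println(connectionsFor(100, false)); // 100: a new key per request
        System.out.println(connectionsFor(100, true));  // 1: the shared user object is reused
    }
}
```

In this model every request made with a freshly created user object yields a new cache key, which would correspond to a new connection and its thread, even though the user name is the same each time.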

Thanks,
   Owen


[jira] [Created] (HIVE-26508) Remove netty transitive dependencies from hcatalog and hbase pom files to avoid CVEs

2022-08-31 Thread Sai Hemanth Gantasala (Jira)
Sai Hemanth Gantasala created HIVE-26508:
-----------------------------------------

             Summary: Remove netty transitive dependencies from hcatalog and hbase pom files to avoid CVEs
                 Key: HIVE-26508
                 URL: https://issues.apache.org/jira/browse/HIVE-26508
             Project: Hive
          Issue Type: Bug
          Components: HBase Handler, HCatalog
    Affects Versions: 4.0.0-alpha-1, 4.0.0, 4.0.0-alpha-2
            Reporter: Sai Hemanth Gantasala
            Assignee: Sai Hemanth Gantasala


Remove the transitive netty dependencies (pulled in through Hadoop-related
dependencies) from the hcatalog and hbase pom files to avoid CVEs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-26507) Iceberg: In place metadata generation may not work for certain datatypes

2022-08-31 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-26507:
------------------------------------

             Summary: Iceberg: In place metadata generation may not work for certain datatypes
                 Key: HIVE-26507
                 URL: https://issues.apache.org/jira/browse/HIVE-26507
             Project: Hive
          Issue Type: Bug
            Reporter: Rajesh Balamohan


"alter table" statements can be used to generate Iceberg metadata in place
(i.e., to convert external tables to Iceberg tables).

As part of this process, certain datatypes are also converted to
Iceberg-compatible types (e.g. char -> string). The
"iceberg.mr.schema.auto.conversion" setting enables this conversion.

This can cause issues at runtime. Here is an example:
{noformat}

Before conversion:
==================
-- external table
select count(*) from customer_demographics where cd_gender = 'F' and 
cd_marital_status = 'U' and cd_education_status = '2 yr Degree';

27440

After conversion:
=================
-- iceberg table
select count(*) from customer_demographics where cd_gender = 'F' and 
cd_marital_status = 'U' and cd_education_status = '2 yr Degree';

0

select count(*) from customer_demographics where cd_gender = 'F' and 
cd_marital_status = 'U' and trim(cd_education_status) = '2 yr Degree';

27440
 {noformat}
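
One plausible reading of the example, which the report itself does not state, is that the char values keep their trailing space padding once the column is read as string, so a plain equality check no longer matches while a trimmed one does. The sketch below only models that assumption in plain Java; padToCharWidth and the width of 20 are hypothetical and are not taken from the table definition.

```java
/** Hypothetical model of comparing a space-padded char value against an unpadded literal. */
public class CharPaddingSketch {

    /** Pads a value with trailing spaces up to the given width, as a fixed-width char column would. */
    static String padToCharWidth(String value, int width) {
        StringBuilder padded = new StringBuilder(value);
        while (padded.length() < width) {
            padded.append(' ');
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        String literal = "2 yr Degree";
        String stored = padToCharWidth(literal, 20); // assumed width, for illustration only

        // String semantics: the trailing spaces are significant, so equality fails.
        System.out.println(stored.equals(literal));        // false

        // Trimming first restores the match, mirroring the trim() workaround in the query.
        System.out.println(stored.trim().equals(literal)); // true
    }
}
```

If this reading is right, the zero row count comes from the comparison semantics of the converted type rather than from missing data.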
 


