Re: FYI: MetaStore running out of threads

2022-08-31 Thread Rajesh Balamohan
>> In hive, FileUtils.checkFileAccessWithImpersonation can be fixed to
>> create the UGI once to reduce the impact (suspecting this will have 50%
>> impact).

I looked more closely at the implementation of
"FileUtils.checkFileAccessWithImpersonation". It doesn't make 2
connections, so the 50% estimate may not apply here.



Re: FYI: MetaStore running out of threads

2022-08-31 Thread Rajesh Balamohan
W.r.t. connection reuse issues, LLAP had a similar issue (not in HMS):
https://issues.apache.org/jira/browse/HIVE-16020. It was making a
connection in every task, and the UGI had to be persisted at the QueryInfo
level to reduce the impact.

In Hive, FileUtils.checkFileAccessWithImpersonation can be fixed to create
the UGI once to reduce the impact (I suspect this will cut the impact by
about 50%).

https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L418
https://github.com/apache/hive/blob/d06957f254e026e719f30027d161264be43386b0/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L461

May have to explore whether a local cache with expiry in FileUtils can help
reduce the impact further.
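
To make the "local cache with expiry" idea concrete, here is a minimal sketch in plain Java. Everything in it is an assumption for illustration: the ExpiringCache class, its get method, the TTL value, and the injected clock are hypothetical and are not part of FileUtils or Hadoop. The cached value is just a string standing in for whatever would be reused per user (for example a proxy UGI).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical time-based cache: an entry is reloaded once its TTL has elapsed. */
final class ExpiringCache<K, V> {
    /** A cached value plus the time after which it must be reloaded. */
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so expiry can be tested without sleeping

    ExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value for key, calling loader only if it is missing or expired. */
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old != null && now < old.expiresAtMillis)
                        ? old
                        : new Entry<>(loader.apply(k), now + ttlMillis));
        return entry.value;
    }
}

final class ExpiringCacheDemo {
    public static void main(String[] args) {
        long[] now = {0};   // fake clock, in milliseconds
        int[] loads = {0};  // how many times the loader actually ran
        ExpiringCache<String, String> cache = new ExpiringCache<>(1000, () -> now[0]);
        Function<String, String> loader = user -> {
            loads[0]++;
            return "ugi-for-" + user; // stand-in for creating the per-user object
        };

        cache.get("alice", loader);
        cache.get("alice", loader);  // within the TTL: served from the cache
        now[0] = 1500;
        cache.get("alice", loader);  // TTL elapsed: loaded again
        System.out.println("loads = " + loads[0]); // prints "loads = 2"
    }
}
```

Whether reusing a cached object per user is safe here (credential lifetime, cleanup on expiry) would still need to be checked against the real code; the sketch only shows the expiry mechanics.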

~Rajesh.B




FYI: MetaStore running out of threads

2022-08-31 Thread Owen O'Malley
We're using HMS with Storage-Based Authorization and have been having
trouble with the HMS running out of threads. Looking at the jstack & code,
it appears that the problem is that RPC's ConnectionId is using UGI's
equals/hash, which uses the Subject's Object equals/hash (that is, object
identity). Proxy user UGIs always create a new Subject and thus are always
unique.

This leads to the HMS creating too many threads. I've created a Jira in
Hadoop: https://issues.apache.org/jira/browse/HADOOP-18434

Thanks,
   Owen
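
The effect described above can be reproduced with a small standalone model. This is a sketch under stated assumptions, not Hadoop code: ModelUgi and ModelConnectionId are hypothetical stand-ins for UserGroupInformation and the RPC ConnectionId, the address string is made up, and a HashMap plays the role of the client's connection cache. The only behaviour carried over from the description is that equality and hashing go through the identity of the wrapped Subject, and that each proxy user gets a new Subject.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.Subject;

/** Hypothetical stand-in for a UGI: equal only if it wraps the very same Subject object. */
final class ModelUgi {
    private final String userName;
    private final Subject subject;

    private ModelUgi(String userName, Subject subject) {
        this.userName = userName;
        this.subject = subject;
    }

    /** Mimics proxy user creation: every call wraps a brand-new Subject. */
    static ModelUgi createProxyUser(String userName) {
        return new ModelUgi(userName, new Subject());
    }

    String getUserName() {
        return userName;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ModelUgi && subject == ((ModelUgi) o).subject;
    }

    @Override
    public int hashCode() {
        return System.identityHashCode(subject);
    }
}

/** Hypothetical stand-in for the connection cache key: server address plus UGI. */
final class ModelConnectionId {
    private final String address;
    private final ModelUgi ticket;

    ModelConnectionId(String address, ModelUgi ticket) {
        this.address = address;
        this.ticket = ticket;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ModelConnectionId)) {
            return false;
        }
        ModelConnectionId other = (ModelConnectionId) o;
        return address.equals(other.address) && ticket.equals(other.ticket);
    }

    @Override
    public int hashCode() {
        return 31 * address.hashCode() + ticket.hashCode();
    }
}

final class ProxyUgiDemo {
    /** Returns how many distinct cache entries exist after `calls` requests for one user. */
    static int connectionsCreated(int calls, boolean reuseUgi) {
        Map<ModelConnectionId, String> connections = new HashMap<>();
        ModelUgi shared = ModelUgi.createProxyUser("alice");
        for (int i = 0; i < calls; i++) {
            ModelUgi ugi = reuseUgi ? shared : ModelUgi.createProxyUser("alice");
            connections.putIfAbsent(new ModelConnectionId("namenode:8020", ugi), "connection");
        }
        return connections.size();
    }

    public static void main(String[] args) {
        System.out.println("fresh proxy UGI per call: " + connectionsCreated(5, false)); // 5
        System.out.println("one UGI reused:           " + connectionsCreated(5, true));  // 1
    }
}
```

In this model, five requests for the same user with a fresh proxy UGI each time produce five cache entries, while reusing one UGI produces a single entry.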