[ https://issues.apache.org/jira/browse/YARN-10967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420342#comment-17420342 ]

Steve Loughran commented on YARN-10967:
---------------------------------------

I think it's probably getContentSummary which is the enemy there. I don't know 
what HDFS does there, but for object stores it's very bad. I wish Hive would 
stop using it entirely.

> setPermission() call floods HDFS NN RPC queue
> ---------------------------------------------
>
>                 Key: YARN-10967
>                 URL: https://issues.apache.org/jira/browse/YARN-10967
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Anand Srinivasan
>            Priority: Major
>              Labels: performance
>
> Checking the code changes for the log aggregation feature, we could see that 
> when the log aggregator is initialized for each app, we verify and create the 
> remote dir, and in doing so make an additional call to setPermission() even 
> though the remote dir exists and the permissions are already set as expected.
> This code path was introduced to cater to cloud storage, where this 
> additional check was needed to ensure that the remote file system and the 
> corresponding cloud storage support setting permissions.
> Upstream JIRA that introduced this call:
> https://issues.apache.org/jira/browse/YARN-9030
> This additional setPermission() call for every app/job floods the HDFS NN and 
> its RPC queue, which degrades overall performance.
> The ask here is to see whether it's feasible to do the following:
> (a) put the code introduced via YARN-9030 behind a configuration option 
> (maybe defaulting it to false, assuming the storage used is HDFS, so that 
> this code is bypassed);
> (b) check in the code whether the customer is using HDFS storage (by 
> checking yarn.nodemanager.remote-app-log-dir) and bypass this code if the 
> storage is indeed HDFS, given that the code introduced in YARN-9030 was 
> mainly put in for cloud storage providers.
> Thanks
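
Options (a) and (b) boil down to one decision made before the extra setPermission() call. What follows is a minimal, hypothetical sketch in plain Java. It is not the actual Hadoop code and uses no Hadoop classes; the class name, `shouldRunPermissionCheck` and its parameters are illustrative only, and the `checkEnabled` boolean stands in for a configuration option that does not exist in YARN today. It also assumes that a scheme-less yarn.nodemanager.remote-app-log-dir (such as the default /tmp/logs) resolves against the scheme of fs.defaultFS.

```java
import java.net.URI;

/**
 * Hypothetical sketch, not Hadoop code: decides whether the log aggregator
 * should still perform the extra setPermission() check added by YARN-9030.
 */
public class RemoteLogDirPermissionCheck {

    /**
     * @param checkEnabled    stand-in for a hypothetical config option that
     *                        gates the YARN-9030 check (option (a))
     * @param remoteAppLogDir value of yarn.nodemanager.remote-app-log-dir
     * @param defaultFsScheme scheme assumed for a scheme-less dir, e.g. the
     *                        scheme of fs.defaultFS (assumption)
     * @return true if the extra setPermission() call should still be made
     */
    public static boolean shouldRunPermissionCheck(boolean checkEnabled,
                                                   String remoteAppLogDir,
                                                   String defaultFsScheme) {
        // Option (a): the check can be switched off entirely by configuration.
        if (!checkEnabled) {
            return false;
        }
        // Option (b): work out which file system the remote log dir lives on.
        String scheme = URI.create(remoteAppLogDir).getScheme();
        if (scheme == null) {
            // No scheme in the path, so fall back to the default file system.
            scheme = defaultFsScheme;
        }
        // On HDFS, skip the extra setPermission() RPC to the NameNode.
        return !"hdfs".equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        // Check disabled by configuration: no extra call.
        System.out.println(shouldRunPermissionCheck(false, "s3a://bucket/logs", "hdfs")); // false
        // Explicit HDFS URI: no extra call.
        System.out.println(shouldRunPermissionCheck(true, "hdfs://nn:8020/tmp/logs", "hdfs")); // false
        // Default-style /tmp/logs on an HDFS default file system: no extra call.
        System.out.println(shouldRunPermissionCheck(true, "/tmp/logs", "hdfs")); // false
        // Object store: keep the YARN-9030 check.
        System.out.println(shouldRunPermissionCheck(true, "s3a://bucket/logs", "hdfs")); // true
    }
}
```

Matching on the URI scheme alone is a simplification: a scheme such as viewfs can still map onto HDFS, so a real implementation would need to resolve the actual target file system rather than compare strings.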



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
