[jira] [Issue Comment Deleted] (YARN-9826) Blocked threads at EntityGroupFSTimelineStore#getCachedStore

Shen Yinjie (Jira) Mon, 28 Dec 2020 01:18:04 -0800


     [ https://issues.apache.org/jira/browse/YARN-9826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Shen Yinjie updated YARN-9826:
------------------------------
    Comment: was deleted

(was: Is there any progress on this issue? :))

> Blocked threads at EntityGroupFSTimelineStore#getCachedStore
> ------------------------------------------------------------
>
>                 Key: YARN-9826
>                 URL: https://issues.apache.org/jira/browse/YARN-9826
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: timelineserver
>    Affects Versions: 2.7.3
>            Reporter: Harunobu Daikoku
>            Priority: Minor
>
> We have observed this several times on our production cluster: hundreds of 
> TimelineServer threads are blocked at the following synchronized block in 
> EntityGroupFSTimelineStore#getCachedStore when our HDFS NameNode is under 
> high load.
> {code:java}
>     synchronized (this.cachedLogs) {
>       // Note that the content in the cache log storage may be stale.
>       cacheItem = this.cachedLogs.get(groupId);
>       if (cacheItem == null) {
>         LOG.debug("Set up new cache item for id {}", groupId);
>         cacheItem = new EntityCacheItem(groupId, getConfig());
>         AppLogs appLogs = getAndSetAppLogs(groupId.getApplicationId());
>         if (appLogs != null) {
>           LOG.debug("Set applogs {} for group id {}", appLogs, groupId);
>           cacheItem.setAppLogs(appLogs);
>           this.cachedLogs.put(groupId, cacheItem);
>         } else {
>           LOG.warn("AppLogs for groupId {} is set to null!", groupId);
>         }
>       }
>     }
> {code}
> The thread holding the lock performs multiple filesystem operations 
> (fs.exists) inside getAndSetAppLogs. When those calls are slow, for 
> instance because the NameNode RPC queue is full, every other thread waiting 
> for the lock is blocked as well.
> One possible solution is to move getAndSetAppLogs outside the synchronized 
> block.
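
The proposed change could look roughly like the sketch below. It is not the actual Hadoop patch: CachedStoreSketch, its nested AppLogs and EntityCacheItem classes, and the lookup function passed to the constructor are hypothetical stand-ins, and a plain String replaces the real group ID type. The idea is to hold the lock only for the map reads and writes, run the slow lookup unlocked, and re-check the map before inserting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for EntityGroupFSTimelineStore; not the Hadoop class.
class CachedStoreSketch {

  // Hypothetical stand-in for AppLogs.
  static final class AppLogs {
    final String appId;
    AppLogs(String appId) { this.appId = appId; }
  }

  // Hypothetical stand-in for EntityCacheItem.
  static final class EntityCacheItem {
    final String groupId;
    private AppLogs appLogs;
    EntityCacheItem(String groupId) { this.groupId = groupId; }
    void setAppLogs(AppLogs appLogs) { this.appLogs = appLogs; }
    AppLogs getAppLogs() { return appLogs; }
  }

  private final Map<String, EntityCacheItem> cachedLogs = new HashMap<>();

  // Assumption: models getAndSetAppLogs, i.e. the slow call that touches HDFS.
  private final Function<String, AppLogs> getAndSetAppLogs;

  CachedStoreSketch(Function<String, AppLogs> getAndSetAppLogs) {
    this.getAndSetAppLogs = getAndSetAppLogs;
  }

  EntityCacheItem getCachedItem(String groupId) {
    // Fast path: the lock is held only for the map lookup.
    synchronized (cachedLogs) {
      EntityCacheItem hit = cachedLogs.get(groupId);
      if (hit != null) {
        return hit;
      }
    }

    // Slow path: the lookup runs without the lock, so a stalled
    // filesystem call no longer blocks requests for other group IDs.
    AppLogs appLogs = getAndSetAppLogs.apply(groupId);
    EntityCacheItem fresh = new EntityCacheItem(groupId);
    if (appLogs == null) {
      // As in the original snippet, an item without app logs is not cached.
      return fresh;
    }
    fresh.setAppLogs(appLogs);

    synchronized (cachedLogs) {
      // Re-check: another thread may have inserted an item in the meantime.
      EntityCacheItem raced = cachedLogs.get(groupId);
      if (raced != null) {
        return raced;
      }
      cachedLogs.put(groupId, fresh);
      return fresh;
    }
  }
}
```

The trade-off is that two threads asking for the same uncached group ID may both run the slow lookup; the second result is discarded at the re-check. Whether that duplicated work is acceptable depends on whether getAndSetAppLogs is safe to call concurrently for the same application, which this sketch assumes.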



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
