[
https://issues.apache.org/jira/browse/YARN-9826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shen Yinjie updated YARN-9826:
------------------------------
Comment: was deleted
(was: Is there any progress on this issue? :))
> Blocked threads at EntityGroupFSTimelineStore#getCachedStore
> ------------------------------------------------------------
>
> Key: YARN-9826
> URL: https://issues.apache.org/jira/browse/YARN-9826
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: timelineserver
> Affects Versions: 2.7.3
> Reporter: Harunobu Daikoku
> Priority: Minor
>
> We have repeatedly observed on our production cluster that hundreds of
> TimelineServer threads are blocked at the following synchronized block in
> EntityGroupFSTimelineStore#getCachedStore while our HDFS NameNode is under
> high load.
> {code:java}
> synchronized (this.cachedLogs) {
>   // Note that the content in the cache log storage may be stale.
>   cacheItem = this.cachedLogs.get(groupId);
>   if (cacheItem == null) {
>     LOG.debug("Set up new cache item for id {}", groupId);
>     cacheItem = new EntityCacheItem(groupId, getConfig());
>     AppLogs appLogs = getAndSetAppLogs(groupId.getApplicationId());
>     if (appLogs != null) {
>       LOG.debug("Set applogs {} for group id {}", appLogs, groupId);
>       cacheItem.setAppLogs(appLogs);
>       this.cachedLogs.put(groupId, cacheItem);
>     } else {
>       LOG.warn("AppLogs for groupId {} is set to null!", groupId);
>     }
>   }
> }
> {code}
> The thread that holds the lock performs multiple file system operations
> (fs.exists) inside getAndSetAppLogs. If those calls stall, for instance
> because the NameNode RPC queue is full, every other thread waiting to enter
> the synchronized block is blocked as well.
> One possible solution is to move getAndSetAppLogs outside the synchronized
> block.
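
The proposed change could look roughly like the sketch below. It is not the actual Hadoop patch: CachedStoreSketch, CacheItem and slowLookup are hypothetical stand-ins for EntityGroupFSTimelineStore, EntityCacheItem and getAndSetAppLogs, group ids are plain strings, and returning null on a failed lookup is a simplification. The point is only that the lock is held for the map read and the map write, never across the slow lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified model of the cache; not the Hadoop class itself.
final class CachedStoreSketch {

    // Stand-in for EntityCacheItem: only carries the loaded logs.
    static final class CacheItem {
        final String appLogs;

        CacheItem(String appLogs) {
            this.appLogs = appLogs;
        }
    }

    private final Map<String, CacheItem> cachedLogs = new HashMap<>();

    // Stand-in for getAndSetAppLogs: may block on file system calls.
    private final Function<String, String> slowLookup;

    CachedStoreSketch(Function<String, String> slowLookup) {
        this.slowLookup = slowLookup;
    }

    CacheItem getCachedStore(String groupId) {
        // Step 1: hold the lock only long enough to check the cache.
        synchronized (cachedLogs) {
            CacheItem hit = cachedLogs.get(groupId);
            if (hit != null) {
                return hit;
            }
        }

        // Step 2: run the potentially slow lookup without holding the lock,
        // so a stalled call cannot block readers of other group ids.
        String appLogs = slowLookup.apply(groupId);
        if (appLogs == null) {
            // Assumption for this sketch: nothing is cached on failure.
            return null;
        }
        CacheItem fresh = new CacheItem(appLogs);

        // Step 3: re-take the lock to publish the item. If another thread
        // cached the same id in the meantime, keep and return its item.
        synchronized (cachedLogs) {
            CacheItem raced = cachedLogs.putIfAbsent(groupId, fresh);
            return raced != null ? raced : fresh;
        }
    }
}
```

The trade-off in this sketch is that two threads asking for the same uncached group id may both run the lookup. That duplicates some file system calls, but it does not let one stalled call hold up every other caller.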