[ 
https://issues.apache.org/jira/browse/YARN-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284731#comment-15284731
 ] 

Junping Du commented on YARN-4987:
----------------------------------

Thanks [~gtCarrera9] for posting a patch to fix this. I just went through the 
patch and have a few questions/comments:
1. I see we are adding groupId to EntityCacheItem and removing the parameter 
of releaseCache(). This is a good refactoring. However, I think we need to 
remove the groupId parameter from refreshCache() as well.
2. Do we really need a long for refCount? An int can support ~2 billion. Is 
there any case where int is not enough?
3. The same comment applies to activeStores. Given that our default 
appCacheMaxSize is only set to 10, do we really need a number so large that int 
cannot handle it? A long atomic operation should be more expensive than an int 
one.
Otherwise, it looks fine to me.
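
For illustration only, here is a minimal sketch of the reference-counting idea 
behind points 2 and 3, using an int-based counter. It is not the code in the 
patch: the class RefCountedCacheItem and the methods tryAcquire(), release(), 
evict(), isStoreClosed() and getRefCount() are hypothetical names, and the 
storeClosed flag stands in for stopping the real store. The assumption is that 
the cache holds one reference of its own, so an evicted item is only closed 
once the last reader has released it.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a cached store entry; not the real EntityCacheItem. */
class RefCountedCacheItem {
    private final String groupId;
    // Starts at 1: the cache itself holds one reference until eviction.
    // An int is enough here because the count is bounded by concurrent readers.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private final AtomicBoolean evicted = new AtomicBoolean(false);
    // Stand-in for the state of the underlying store.
    private volatile boolean storeClosed = false;

    RefCountedCacheItem(String groupId) {
        this.groupId = groupId;
    }

    String getGroupId() {
        return groupId;
    }

    /** Takes a reader reference; fails once the count has dropped to zero. */
    boolean tryAcquire() {
        while (true) {
            int current = refCount.get();
            if (current == 0) {
                return false; // already closed, never resurrect the item
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Drops one reference; whoever drops the last one closes the store. */
    void release() {
        if (refCount.decrementAndGet() == 0) {
            storeClosed = true; // the real code would stop the store here
        }
    }

    /** Called by the cache on eviction; drops the cache's own reference once. */
    void evict() {
        if (evicted.compareAndSet(false, true)) {
            release();
        }
    }

    boolean isStoreClosed() {
        return storeClosed;
    }

    int getRefCount() {
        return refCount.get();
    }
}
```

With this shape, a reader that acquired the item before eviction keeps a usable 
store until it calls release(), and a reader arriving after the close gets 
false from tryAcquire() instead of a null store.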

> Read cache concurrency issue between read and evict in EntityGroupFS timeline 
> store 
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-4987
>                 URL: https://issues.apache.org/jira/browse/YARN-4987
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Li Lu
>            Assignee: Li Lu
>            Priority: Critical
>         Attachments: YARN-4987-trunk.001.patch
>
>
> To handle concurrency issues, key-value based timeline storage may return 
> null on reads that are concurrent with service stop. This is actually caused 
> by a concurrency issue between cache reads and evictions. Specifically, if 
> the storage is being read when it gets evicted, the storage may become null. 
> EntityGroupFS timeline store needs to handle this case gracefully. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
