[ 
https://issues.apache.org/jira/browse/YARN-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394287#comment-15394287
 ] 

Li Lu commented on YARN-5432:
-----------------------------

Thanks for reporting this issue [~karams]! 

The main cause of this issue is that, after the concurrency changes in 
YARN-4987, a reader can keep a cache item from being released after the item 
has been evicted. If another read request for the same entity group id arrives 
during this period, the storage tries to create a new cache store at the same 
file location while the old one is still open, which causes the lock error in 
leveldb. This also explains why the problem is severe when the cache size is 
small and reader contention is high: with a smaller cache, evictions are more 
frequent, and higher reader contention makes it more likely that a reader is 
still "holding" a cache storage when it is evicted.
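
To make the interleaving concrete, here is a minimal single-threaded sketch 
that replays it. This is not the EntityGroupFSTimelineStore code: the class 
names (Store, CacheItem), the reference-counting scheme, and the in-memory set 
standing in for leveldb's LOCK file are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class CacheLockRaceSketch {
  // Stand-in for leveldb's per-directory LOCK file: at most one open store per path.
  static final Set<String> LOCKED_PATHS = new HashSet<>();

  /** Hypothetical store that takes an exclusive lock on its path when opened. */
  static final class Store {
    final String path;

    Store(String path) {
      if (!LOCKED_PATHS.add(path)) {
        throw new IllegalStateException(
            "IO error: lock " + path + "/LOCK: already held by process");
      }
      this.path = path;
    }

    void close() {
      LOCKED_PATHS.remove(path);
    }
  }

  /** Hypothetical cache item: the store closes only once evicted AND unreferenced. */
  static final class CacheItem {
    final Store store;
    int readers = 0;
    boolean evicted = false;

    CacheItem(String path) {
      store = new Store(path);
    }

    void acquire() {
      readers++;
    }

    void release() {
      readers--;
      closeIfUnused();
    }

    void evict() {
      evicted = true;
      closeIfUnused();
    }

    private void closeIfUnused() {
      if (evicted && readers == 0) {
        store.close();
      }
    }
  }

  /** Replays the interleaving; returns true if the second open hit the lock error. */
  static boolean reproduce(String path) {
    CacheItem first = new CacheItem(path);
    first.acquire(); // reader 1 is in the middle of a query
    first.evict();   // cache evicts the item; the store stays open for reader 1
    boolean lockError = false;
    try {
      new CacheItem(path); // reader 2 misses the cache and opens the same path
    } catch (IllegalStateException e) {
      lockError = true;
    }
    first.release(); // reader 1 finishes, so the store closes and the lock is freed
    return lockError;
  }

  public static void main(String[] args) {
    String path = "/tmp/example-timeline-cache.ldb"; // illustrative path only
    System.out.println("lock error while reader holds item: " + reproduce(path));
    new CacheItem(path).evict(); // opening the same path now succeeds
    System.out.println("reopen after release succeeded");
  }
}
```

In this sketch the second open fails only while the first reader still holds 
the evicted item. Once that reader releases it, the same path can be opened 
again, which matches the dependence on cache size and reader contention 
described above.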

> Lock already held by another process while LevelDB cache store creation for 
> dag
> -------------------------------------------------------------------------------
>
>                 Key: YARN-5432
>                 URL: https://issues.apache.org/jira/browse/YARN-5432
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Karam Singh
>            Assignee: Li Lu
>
> The ATS stress tests were run with 15 concurrent ATS readers (python threads 
> that issue calls such as ws/v1/time/TEZ_DAG_ID, 
> ws/v1/time/TEZ_VERTEX_ID?primaryFilter=TEZ_DAG_ID:<dag_id>, etc.).
> Note: The summary store for ATSv1.5 is RLD, but for each dag/application 
> ATS also creates a leveldb cache when vertex/task/taskattempt information is 
> queried from ATS.
>  
> The following type of exception appears very frequently in the ATS logs:
> 2016-07-23 00:01:56,089 [1517798697@qtp-1198158701-850] INFO 
> org.apache.hadoop.service.AbstractService: Service 
> LeveldbCache.timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832
>  failed in state INITED; cause: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock 
> /grid/4/yarn_ats/atsv15_rld/timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832-timeline-cache.ldb/LOCK:
>  already held by process
> org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock 
> /grid/4/yarn_ats/atsv15_rld/timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832-timeline-cache.ldb/LOCK:
>  already held by process
>         at 
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
>         at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
>         at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
>         at 
> org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.serviceInit(LevelDBCacheTimelineStore.java:108)
>         at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at 
> org.apache.hadoop.yarn.server.timeline.EntityCacheItem.refreshCache(EntityCacheItem.java:113)
>         at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getCachedStore(EntityGroupFSTimelineStore.java:1021)
>         at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresFromCacheIds(EntityGroupFSTimelineStore.java:936)
>         at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresForRead(EntityGroupFSTimelineStore.java:989)
>         at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getEntities(EntityGroupFSTimelineStore.java:1041)
>         at 
> org.apache.hadoop.yarn.server.timeline.TimelineDataManager.doGetEntities(TimelineDataManager.java:168)
>         at 
> org.apache.hadoop.yarn.server.timeline.TimelineDataManager.getEntities(TimelineDataManager.java:138)
>         at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:117)
>         at sun.reflect.GeneratedMethodAccessor82.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>         at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>         at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>         at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>         at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>         at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>         at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>         at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>         at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
>         at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
>         at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
>         at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
>         at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>         at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>         at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>         at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>         at 
> com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>         at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at 
> org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)



