[ 
https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512837#comment-16512837
 ] 

Eric Yang commented on YARN-8414:
---------------------------------

[~rohithsharma] We have 9 node managers running 1000 applications, and each app
has 2 containers.  The NM hosting the master container goes down when ATS-HBase
is unavailable.  Sometimes an NM also goes down while ATS-HBase is running: many
AMs try to talk to the NM, and it runs out of file descriptors.

On a healthy node manager, netstat -tnapl looks like this:

{code}
tcp        0      0 0.0.0.0:7447            0.0.0.0:*               LISTEN      
3400770/java        
tcp        0      0 0.0.0.0:13562           0.0.0.0:*               LISTEN      
3400770/java        
tcp        0      0 0.0.0.0:8040            0.0.0.0:*               LISTEN      
3400770/java        
tcp        0      0 0.0.0.0:46473           0.0.0.0:*               LISTEN      
3400770/java        
tcp        0      0 0.0.0.0:8042            0.0.0.0:*               LISTEN      
3400770/java        
tcp        0      0 0.0.0.0:45454           0.0.0.0:*               LISTEN      
3400770/java        
tcp        0      0 0.0.0.0:8048            0.0.0.0:*               LISTEN      
3400770/java        
tcp        1      0 172.26.32.105:59462     172.26.32.104:43939     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.106:50312     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.106:49858     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.111:41044     FIN_WAIT2   
3400770/java        
tcp        1      0 172.26.32.105:52339     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.109:59572     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.109:33316     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:37372     172.26.32.111:44675     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.106:48964     FIN_WAIT2   
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.106:48006     FIN_WAIT2   
3400770/java        
tcp        1      0 172.26.32.105:43014     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:46714     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:49158     172.26.32.104:43939     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.105:44576     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:42900     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.112:58558     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:35058     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:39134     172.26.32.104:43939     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.112:55064     FIN_WAIT2   
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.111:41752     FIN_WAIT2   
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.108:34892     FIN_WAIT2   
3400770/java        
tcp        1      0 172.26.32.105:41856     172.26.32.106:33915     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.109:56932     FIN_WAIT2   
3400770/java        
tcp        1      0 172.26.32.105:51486     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.108:35686     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:59954     172.26.32.106:33915     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:37614     172.26.32.104:43939     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:47254     172.26.32.104:43939     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.108:34356     FIN_WAIT2   
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.108:36030     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.106:50552     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.106:50826     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:39836     172.26.32.112:45839     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:47736     172.26.32.104:43939     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.111:41584     FIN_WAIT2   
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.105:51144     FIN_WAIT2   
3400770/java        
tcp        1      0 172.26.32.105:47411     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:39896     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.108:36704     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.106:49854     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.108:36246     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.108:36032     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:56782     172.26.32.109:35169     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:41272     172.26.32.112:17020     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:59512     172.26.32.104:43939     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:52320     172.26.32.104:43939     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:43803     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.111:41980     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:41118     172.26.32.111:44675     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.109:33690     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:47856     172.26.32.104:43939     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:39428     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:41128     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.106:48264     FIN_WAIT2   
3400770/java        
tcp        1      0 172.26.32.105:33813     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:43250     172.26.32.111:44675     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.106:50558     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.105:58766     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:38632     172.26.32.111:44675     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:52362     172.26.32.104:43939     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.106:48720     FIN_WAIT2   
3400770/java        
tcp        1      0 172.26.32.105:60629     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:59448     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.108:35158     FIN_WAIT2   
3400770/java        
tcp        1      0 172.26.32.105:58251     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:32900     172.26.32.111:44675     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:47098     172.26.32.104:43939     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.105:42236     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.108:36702     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:38479     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:34711     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:46894     172.26.32.104:43939     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:48698     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:37716     172.26.32.104:43939     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.105:51780     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:40948     172.26.32.111:44675     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:40582     172.26.32.111:44675     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.108:36540     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:32936     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.106:49620     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:40782     172.26.32.111:44675     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:56127     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:55422     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:54392     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.106:49724     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:51580     172.26.32.104:43939     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.108:36536     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.108:36254     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:59050     172.26.32.109:35169     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:56668     172.26.32.104:43939     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.109:59410     FIN_WAIT2   
3400770/java        
tcp        0      0 172.26.32.105:42604     172.26.32.101:8031      ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:43488     172.26.32.111:44675     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:47036     172.26.32.104:43939     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.105:46949     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:43440     172.26.32.111:44675     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.105:32820     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:55650     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.109:59570     FIN_WAIT2   
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.109:33688     ESTABLISHED 
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.108:35682     FIN_WAIT2   
3400770/java        
tcp        1      0 172.26.32.105:54020     172.26.32.104:43939     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.112:57912     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:38514     172.26.32.111:44675     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:38022     172.26.32.104:43939     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:46228     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        1      0 172.26.32.105:45375     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.108:35334     FIN_WAIT2   
3400770/java        
tcp        1      0 172.26.32.105:59081     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.108:34680     FIN_WAIT2   
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.105:34822     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:43142     172.26.32.105:46473     CLOSE_WAIT  
3400770/java        
tcp        0      0 172.26.32.105:46473     172.26.32.106:50160     ESTABLISHED 
3400770/java        
tcp        1      0 172.26.32.105:51678     172.26.32.104:43939     CLOSE_WAIT  
3400770/java  
{code}

This list has 386 entries.  On an unhealthy node manager, the number reaches
20,000 before the NM crashes.  We are losing 1 node manager every 12 hours even
with ATS-HBase running.
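
To make the healthy/unhealthy comparison above repeatable, the netstat output
can be tallied by TCP state for the NM process.  The following is only a minimal
sketch, not part of YARN: the function name count_states and the SAMPLE text are
hypothetical, and it assumes the net-tools netstat column layout shown above,
including the case where the PID/Program column wraps onto its own line.

```python
from collections import Counter


def count_states(netstat_text, pid=None):
    """Tally TCP socket states from `netstat -tnapl` output.

    If pid is given, only sockets owned by that PID are counted. The
    PID/Program column may sit on the same line or be wrapped onto the next.
    """
    counts = Counter()
    pending = None  # state of a tcp line still waiting for its wrapped owner
    for line in netstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("tcp", "tcp6"):
            pending = None
            if len(fields) < 6:
                continue  # malformed or truncated row
            # Columns: proto recv-q send-q local foreign state [pid/program]
            state = fields[5]
            if pid is None:
                counts[state] += 1
            elif len(fields) > 6:
                if fields[6].split("/")[0] == str(pid):
                    counts[state] += 1
            else:
                pending = state  # owner expected on the following line
        elif pending is not None:
            if fields[0].split("/")[0] == str(pid):
                counts[pending] += 1
            pending = None
    return counts


# A few rows copied from the healthy-node listing (trailing spaces removed).
SAMPLE = """\
tcp        0      0 0.0.0.0:8042            0.0.0.0:*               LISTEN
3400770/java
tcp        1      0 172.26.32.105:59462     172.26.32.104:43939     CLOSE_WAIT
3400770/java
tcp        0      0 172.26.32.105:46473     172.26.32.106:50312     ESTABLISHED
3400770/java
tcp        1      0 172.26.32.105:52339     172.26.32.105:46473     CLOSE_WAIT
3400770/java
"""

if __name__ == "__main__":
    for state, n in count_states(SAMPLE, pid=3400770).most_common():
        print(state, n)
```

CLOSE_WAIT means the peer has closed the connection but the local process has
not yet closed its socket, so a CLOSE_WAIT count that keeps growing for the NM
PID would be consistent with descriptors being leaked rather than released.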


> Nodemanager crashes soon if ATSv2 HBase is either down or absent
> ----------------------------------------------------------------
>
>                 Key: YARN-8414
>                 URL: https://issues.apache.org/jira/browse/YARN-8414
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 3.1.0
>            Reporter: Eric Yang
>            Priority: Critical
>
> The test cluster has 1000 apps running, and a user triggers capacity scheduler
> queue changes.  This crashes all node managers.  It looks like the node manager
> encounters "Too many open files" while aggregating logs for containers:
> {code}
> 2018-06-07 21:17:59,307 WARN  server.AbstractConnector 
> (AbstractConnector.java:handleAcceptFailure(544)) -
> java.io.IOException: Too many open files
>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>         at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>         at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>         at 
> org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:371)
>         at 
> org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:601)
>         at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>         at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>         at java.lang.Thread.run(Thread.java:745)
> 2018-06-07 21:17:59,758 WARN  util.SysInfoLinux 
> (SysInfoLinux.java:readProcMemInfoFile(238)) - Couldn't read /proc/meminfo; 
> can't determine memory settings
> 2018-06-07 21:17:59,758 WARN  util.SysInfoLinux 
> (SysInfoLinux.java:readProcMemInfoFile(238)) - Couldn't read /proc/meminfo; 
> can't determine memory settings
> 2018-06-07 21:18:00,842 WARN  client.ConnectionUtils 
> (ConnectionUtils.java:getStubKey(236)) - Can not resolve host12.example.com, 
> please check your network
> java.net.UnknownHostException: host1.example.com: System error
>         at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>         at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
>         at 
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
>         at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
>         at java.net.InetAddress.getAllByName(InetAddress.java:1192)
>         at java.net.InetAddress.getAllByName(InetAddress.java:1126)
>         at java.net.InetAddress.getByName(InetAddress.java:1076)
>         at 
> org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:233)
>         at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.getClient(ConnectionImplementation.java:1189)
>         at 
> org.apache.hadoop.hbase.client.ReversedScannerCallable.prepare(ReversedScannerCallable.java:111)
>         at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:399)
>         at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
>         at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> Timeline service has thousands of exceptions:
> {code}
> 2018-06-07 21:18:34,182 ERROR client.AsyncProcess 
> (AsyncProcess.java:submit(291)) - Failed to get region location
> java.io.InterruptedIOException
>         at 
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:265)
>         at 
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437)
>         at 
> org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312)
>         at 
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597)
>         at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:834)
>         at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732)
>         at 
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
>         at 
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:236)
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:307)
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:212)
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:170)
>         at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.mutate(TypedBufferedMutator.java:54)
>         at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.store(ColumnRWHelper.java:153)
>         at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.store(ColumnRWHelper.java:107)
>         at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.store(HBaseTimelineWriterImpl.java:395)
>         at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.write(HBaseTimelineWriterImpl.java:198)
>         at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.writeTimelineEntities(TimelineCollector.java:164)
>         at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.putEntitiesAsync(TimelineCollector.java:196)
>         at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorWebService.putEntities(TimelineCollectorWebService.java:173)
>         at sun.reflect.GeneratedMethodAccessor145.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>         at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
>         at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>         at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
>         at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>         at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>         at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>         at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>         at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
>         at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
>         at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
>         at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
>         at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>         at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>         at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>         at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
>         at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:304)
>         at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
>         at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>         at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
>         at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>         at 
> org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>         at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>         at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>         at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>         at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>         at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>         at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>         at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>         at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>         at org.eclipse.jetty.server.Server.handle(Server.java:534)
>         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>         at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>         at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>         at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>         at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>         at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>         at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>         at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>         at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>         at java.lang.Thread.run(Thread.java:745)
> 2018-06-07 21:18:36,266 INFO  retry.RetryInvocationHandler 
> (RetryInvocationHandler.java:log(411)) - java.net.UnknownHostException: 
> Invalid host name: local host is: (unknown); destination host is: 
> "host1.example.com":8020; java.net.UnknownHostException; For more details 
> see:  http://wiki.apache.org/hadoop/UnknownHost, while invoking 
> ClientNamenodeProtocolTranslatorPB.getServerDefaults over 
> host1.example.com:8020 after 10 failover attempts. Trying to failover after 
> sleeping for 9634ms.
> 2018-06-07 21:18:36,612 WARN  storage.HBaseTimelineWriterImpl 
> (HBaseTimelineWriterImpl.java:write(170)) - Found null for one of: 
> flowName=null appId=application_1528316765723_0030 userId=csingh 
> clusterId=yarn-cluster . Not proceeding with writing to hbase
> 2018-06-07 21:18:38,396 INFO  client.RpcRetryingCallerImpl 
> (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=6, 
> retries=6, started=4213 ms ago, cancelled=false, msg=Call to 
> host1.example.com/142.26.32.112:17020 failed on connection exception: 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  Connection refused: host12.example.com/142.26.32.112:17020, details=row 
> 'prod.timelineservice.entity,csingh!yarn-cluster!scale-1-182!^?���(�^@<!^?���)8��^?���!COMPONENT!^@^@^@^@^@^@^@^@!simple,99999999999999'
>  on table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
> hostname=host12.example.com,17020,1528302866813, seqNum=-1
> 2018-06-07 21:18:38,662 ERROR util.ShutdownHookManager 
> (ShutdownHookManager.java:run(82)) - ShutdownHookManger shutdown forcefully
> {code}
> Nodes were temporarily unable to resolve hostname to IP mapping.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

