[
https://issues.apache.org/jira/browse/YARN-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533285#comment-16533285
]
Rohith Sharma K S commented on YARN-8302:
-----------------------------------------
A couple of comments:
# monitorConn is not closed anywhere. This connection should be closed in
serviceStop().
# serviceStop() calls {{executor.shutdownNow()}}. That call might not actually
stop the executor if the ongoing task never waits or sleeps, because it
typically only interrupts the worker thread. It would be better to follow it
with {{executor.awaitTermination}} and wait for a bounded period rather than
indefinitely.
# nit: Since the HBaseMonitor thread runs every minute, it would be better to
log at debug level rather than info, right?
Just out of curiosity, what is the minimum time after which monitorThread gets
an exception if HBase is not available? Is it less than one minute?
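
Points 1 and 2 above could be sketched roughly as follows. This is a minimal illustration, not the code in YARN-8302.1.patch: the class name {{MonitorLifecycleSketch}}, the {{serviceStart}}/{{serviceStop}} signatures, and the timeout parameter are all assumptions, and a plain {{java.io.Closeable}} stands in for the HBase connection held in monitorConn.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the monitor service; names are illustrative only. */
class MonitorLifecycleSketch {
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor();
    // Stand-in for the HBase connection used by the monitor thread.
    private final Closeable monitorConn;

    MonitorLifecycleSketch(Closeable monitorConn) {
        this.monitorConn = monitorConn;
    }

    /** Schedules the periodic health check. */
    void serviceStart(Runnable check, long periodMs) {
        executor.scheduleAtFixedRate(check, 0, periodMs, TimeUnit.MILLISECONDS);
    }

    /**
     * Stops the monitor: interrupts the running task, waits a bounded time
     * for it to finish, then closes the connection in all cases.
     *
     * @return true if the executor terminated within the timeout
     */
    boolean serviceStop(long timeoutMs) {
        executor.shutdownNow();
        boolean terminated = false;
        try {
            // Bounded wait: a task that ignores interrupts cannot hang the stop.
            terminated = executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
        } finally {
            try {
                monitorConn.close();
            } catch (IOException e) {
                // A real service would log this; the sketch ignores it.
            }
        }
        return terminated;
    }
}
```

The return value lets the caller log a warning when the monitor task did not stop within the timeout, while the connection is still closed either way.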
> ATS v2 should handle HBase connection issue properly
> ----------------------------------------------------
>
> Key: YARN-8302
> URL: https://issues.apache.org/jira/browse/YARN-8302
> Project: Hadoop YARN
> Issue Type: Bug
> Components: ATSv2
> Affects Versions: 3.1.0
> Reporter: Yesha Vora
> Assignee: Billie Rinaldi
> Priority: Major
> Attachments: YARN-8302.1.patch
>
>
> ATS v2 call times out with below error when it can't connect to HBase
> instance.
> {code}
> bash-4.2$ curl -i -k -s -1 -H 'Content-Type: application/json' -H 'Accept:
> application/json' --max-time 5 --negotiate -u :
> 'https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/YARN_CONTAINER?fields=ALL&_=1526425686092'
> curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
> {code}
> {code:title=ATS log}
> 2018-05-15 23:10:03,623 INFO client.RpcRetryingCallerImpl
> (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=7,
> retries=7, started=8165 ms ago, cancelled=false, msg=Call to xxx/xxx:17020
> failed on connection exception:
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
> Connection refused: xxx/xxx:17020, details=row
> 'prod.timelineservice.app_flow,
> ,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740,
> hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:13,651 INFO client.RpcRetryingCallerImpl
> (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=8,
> retries=8, started=18192 ms ago, cancelled=false, msg=Call to xxx/xxx:17020
> failed on connection exception:
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
> Connection refused: xxx/xxx:17020, details=row
> 'prod.timelineservice.app_flow,
> ,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740,
> hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:23,730 INFO client.RpcRetryingCallerImpl
> (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=9,
> retries=9, started=28272 ms ago, cancelled=false, msg=Call to xxx/xxx:17020
> failed on connection exception:
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
> Connection refused: xxx/xxx:17020, details=row
> 'prod.timelineservice.app_flow,
> ,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740,
> hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:33,788 INFO client.RpcRetryingCallerImpl
> (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=10,
> retries=10, started=38330 ms ago, cancelled=false, msg=Call to xxx/xxx:17020
> failed on connection exception:
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
> Connection refused: xxx/xxx:17020, details=row
> 'prod.timelineservice.app_flow,
> ,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740,
> hostname=xxx,17020,1526348294182, seqNum=-1{code}
> There are two issues here.
> 1) Check why ATS can't connect to HBase
> 2) On a connection error, the ATS call should not time out. It should
> fail with a proper error.