[ https://issues.apache.org/jira/browse/YARN-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850809#comment-16850809 ]
Peter Bacsko commented on YARN-9581:
------------------------------------
Thanks for the patch [~Prabhu Joseph]. Overall it looks good, but this piece of
code appears repeatedly:
{noformat}
String webAppAddress = WebAppUtils.getRMWebAppURLWithScheme(conf, 0);
try {
  return getAMContainerInfoFromRM(appId, webAppAddress);
} catch (Exception e) {
  // With RM HA enabled, retry once against the second RM.
  if (HAUtil.isHAEnabled(conf)) {
    webAppAddress = WebAppUtils.getRMWebAppURLWithScheme(conf, 1);
    return getAMContainerInfoFromRM(appId, webAppAddress);
  }
  throw e;
}
{noformat}
I've been thinking about removing the duplication, and doing so requires lambdas
(method references). Right now the pattern appears in three places, so I'm not
sure it's worth the trouble, but I'll show my solution regardless.
Add this to, e.g., WebAppUtils:
{noformat}
/**
 * Runs func against the first RM's web address. If that throws and RM HA is
 * enabled, retries once against the second RM.
 */
public static <T, R> R execOnActiveRM(Configuration conf,
    ThrowingBiFunction<String, T, R> func, T arg) throws Exception {
  String rm1Address = WebAppUtils.getRMWebAppURLWithScheme(conf, 0);
  try {
    return func.apply(rm1Address, arg);
  } catch (Exception e) {
    if (HAUtil.isHAEnabled(conf)) {
      String rm2Address = WebAppUtils.getRMWebAppURLWithScheme(conf, 1);
      LOG.info("RM on host {} is unavailable, trying {}", rm1Address,
          rm2Address);
      return func.apply(rm2Address, arg);
    }
    LOG.error("Error connecting to RM");
    throw e;
  }
}

/** Like java.util.function.BiFunction, but apply() may throw. */
@FunctionalInterface
public interface ThrowingBiFunction<T, U, R> {
  R apply(T t, U u) throws Exception;
}
{noformat}
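If we ever need more than two RMs, the same idea generalizes to a loop. Below is a minimal standalone sketch. It assumes the web addresses have already been resolved into a list; in YARN that would presumably mean one WebAppUtils.getRMWebAppURLWithScheme(conf, i) call per configured RM ID, which is an assumption. RmFailover and execOnFirstAvailable are hypothetical names, and Hadoop classes are left out so it compiles on its own. A one-element list covers the non-HA case, so no separate HA check is needed.

```java
import java.util.List;

/**
 * Hypothetical sketch: try each RM web address in order and return the first
 * successful result. The address list stands in for repeated
 * WebAppUtils.getRMWebAppURLWithScheme(conf, i) calls.
 */
final class RmFailover {

  private RmFailover() {}

  /** Same shape as the proposed ThrowingBiFunction. */
  @FunctionalInterface
  public interface ThrowingBiFunction<T, U, R> {
    R apply(T t, U u) throws Exception;
  }

  /**
   * Calls func with each address in turn and returns the first result that
   * does not throw. If every address fails, rethrows the last failure; each
   * failure carries the previous one as a suppressed exception.
   */
  public static <T, R> R execOnFirstAvailable(List<String> rmAddresses,
      ThrowingBiFunction<String, T, R> func, T arg) throws Exception {
    if (rmAddresses.isEmpty()) {
      throw new IllegalArgumentException("No RM addresses configured");
    }
    Exception last = null;
    for (String address : rmAddresses) {
      try {
        return func.apply(address, arg);
      } catch (Exception e) {
        // Keep the earlier failure visible instead of silently dropping it.
        if (last != null && last != e) {
          e.addSuppressed(last);
        }
        last = e;
      }
    }
    // Non-null here: the list is non-empty and every attempt threw.
    throw last;
  }
}
```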
The repeated blocks then reduce to:
{{LogsCLI.java}}:
{noformat}
protected List<JSONObject> getAMContainerInfoForRMWebService(
    Configuration conf, String appId) throws Exception {
  return WebAppUtils.execOnActiveRM(conf,
      this::getAMContainerInfoFromRM, appId);
}
{noformat}
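One thing to double-check: execOnActiveRM calls func.apply(rmAddress, arg), so {{this::getAMContainerInfoFromRM}} only works correctly if that method takes the RM address as its first parameter. The first snippet calls getAMContainerInfoFromRM(appId, webAppAddress). Because both parameters are Strings, the compiler would accept the method reference even with the order reversed. Below is a minimal standalone sketch of the binding. AmInfoClient, callWithAddress and fetch are hypothetical stand-ins, and the URL layout is copied from the log in the issue description.

```java
/**
 * Hypothetical sketch: a method reference passed as
 * ThrowingBiFunction<String, T, R> is invoked as func.apply(rmAddress, arg),
 * so the referenced method must take the RM address first.
 */
class AmInfoClient {

  @FunctionalInterface
  interface ThrowingBiFunction<T, U, R> {
    R apply(T t, U u) throws Exception;
  }

  // Address first, then appId: matches func.apply(rmAddress, arg).
  String getAMContainerInfoFromRM(String webAppAddress, String appId) {
    return webAppAddress + "/ws/v1/cluster/apps/" + appId + "/appattempts";
  }

  // Mirrors execOnActiveRM minus the fallback; it only shows argument order.
  static <T, R> R callWithAddress(String rmAddress,
      ThrowingBiFunction<String, T, R> func, T arg) throws Exception {
    return func.apply(rmAddress, arg);
  }

  String fetch(String rmAddress, String appId) throws Exception {
    // Bound method reference: `this` is captured, parameters map in order.
    return callWithAddress(rmAddress, this::getAMContainerInfoFromRM, appId);
  }
}
```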
{{SchedConfCLI.java}}:
{noformat}
// ... (in SchedConfCLI.run())
return WebAppUtils.execOnActiveRM(conf,
    this::updateSchedulerConfOnRMNode, updateInfo);
{noformat}
{{YarnWebServiceUtils.java}}:
{noformat}
public static JSONObject getNodeInfoFromRMWebService(Configuration conf,
    String nodeId) throws ClientHandlerException,
    UniformInterfaceException {
  try {
    return WebAppUtils.execOnActiveRM(conf,
        YarnWebServiceUtils::getNodeInfoFromRM, nodeId);
  } catch (Exception e) {
    if (e instanceof ClientHandlerException) {
      throw (ClientHandlerException) e;
    } else if (e instanceof UniformInterfaceException) {
      throw (UniformInterfaceException) e;
    } else {
      throw new RuntimeException(e);
    }
  }
}
{noformat}
Exception handling causes minor issues. For example, {{getNodeInfoFromRMWebService}}
has to catch {{Exception}} and rethrow it, because {{execOnActiveRM}} declares
{{throws Exception}} while the call sites declare different {{throws}} clauses.
It's slightly more elegant, and the fallback logic lives in a single place.
From a clean-code perspective, that's a win.
[~adam.antal] [~snemeth] opinions?
> WebAppUtils#getRMWebAppURLWithScheme ignores rm2
> ------------------------------------------------
>
> Key: YARN-9581
> URL: https://issues.apache.org/jira/browse/YARN-9581
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client
> Affects Versions: 3.2.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Attachments: YARN-9581-001.patch, YARN-9581-002.patch,
> YARN-9581-003.patch
>
>
> yarn logs fails for a running job under RM HA when rm2 is active and rm1 is
> down.
> {code}
> hrt_qa@prabhuYarn:~> /usr/hdp/current/hadoop-yarn-client/bin/yarn logs -applicationId application_1558613472348_0004 -am 1
> 19/05/24 18:04:49 INFO client.AHSProxy: Connecting to Application History server at prabhuYarn/172.27.23.55:10200
> 19/05/24 18:04:50 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> Unable to get AM container informations for the application:application_1558613472348_0004
> java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: Error while authenticating with endpoint: https://prabhuYarn:8090/ws/v1/cluster/apps/application_1558613472348_0004/appattempts
> Can not get AMContainers logs for the application:application_1558613472348_0004 with the appOwner:hrt_qa
> {code}
> LogsCLI relies on getRMWebAppURLWithoutScheme, which only checks the first
> RM ID listed in yarn.resourcemanager.ha.rm-ids:
> {code}
> yarnConfig.set(YarnConfiguration.RM_HA_ID, rmIds.get(0));
> {code}
> SchedConfCLI also fails:
> {code}
> [ambari-qa@pjosephdocker-3 ~]$ yarn schedulerconf -update root.default:maximum-capacity=90
> Exception in thread "main" com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused (Connection refused)
>     at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
>     at com.sun.jersey.api.client.Client.handle(Client.java:652)
>     at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
> {code}