[
https://issues.apache.org/jira/browse/YARN-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434691#comment-15434691
]
Chackaravarthy commented on YARN-5445:
--------------------------------------
[~templedf] Thanks for taking a look at this.
{quote}
Moving DFS_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY to CommonConfigurationKeys doesn't
sound like the right solution. If the log aggregation code is going to depend
on specific behavior of HDFS why shouldn't the project depend on hadoop-hdfs?
{quote}
Yes, that makes sense. But I thought adding a dependency on hadoop-hdfs would be
a major change. If this can be done, then good.
{quote}
As you pointed out, there are other JIRAs that are attempting to resolve the
base issue that is prompting your unusual cluster config. I don't think
introducing a new configuration parameter to deal with a temporary issue is a
good idea. Config params, like diamonds, are forever, and we already have
entirely too many
{quote}
Yes, agreed on not introducing a new config for a workaround. But recently we
moved the app logs to a different namenode (federated cluster) and are seeing a
benefit from it, as it reduces the client/service RPC queue length and also RPC
latencies. Also, LogAggregationService is designed to write app logs to a
different filesystem, so I think adding a config here might be worthwhile. But
I am open to your input on this.
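For context, the aggregation target comes from
{{yarn.nodemanager.remote-app-log-dir}}, and the NodeManager resolves whatever
FileSystem that URI points at. A minimal sketch of that resolution (the
nameservice {{logs-ns}} and the path are made-up example values, not from this
issue):
{code:title=RemoteLogDirSketch.java|borderStyle=solid}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RemoteLogDirSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Made-up example: point app logs at a separate federated namespace.
    conf.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, "hdfs://logs-ns/app-logs");

    Path remoteRootLogDir = new Path(conf.get(
        YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
        YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR));
    // getFileSystem() resolves the scheme/authority of the path, so this
    // can be a different cluster than fs.defaultFS.
    FileSystem remoteFs = remoteRootLogDir.getFileSystem(conf);
    System.out.println("Aggregating to: " + remoteFs.getUri());
  }
}
{code}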
{quote}
I haven't looked at the code for log aggregation. What happens when the DFS
connection fails?
{quote}
It seems this is handled in ConfiguredFailoverProxyProvider, where
{{ipc.client.connect.max.retries}} is set to 0 (the default of
{{dfs.client.failover.connection.retries}}). Please correct me if I am missing
something here.
{code:title=ConfiguredFailoverProxyProvider.java|borderStyle=solid}
public ConfiguredFailoverProxyProvider(Configuration conf, URI uri,
    Class<T> xface) {
  // ... (unrelated setup elided)
  this.conf = new Configuration(conf);
  // Copies the failover connection-retries setting (default 0) over
  // ipc.client.connect.max.retries, so each failover attempt makes a
  // single connect attempt before failing over to the other NN.
  int maxRetries = this.conf.getInt(
      DFSConfigKeys.DFS_CLIENT_FAILOVER_CONNECTION_RETRIES_KEY,
      DFSConfigKeys.DFS_CLIENT_FAILOVER_CONNECTION_RETRIES_DEFAULT);
  this.conf.setInt(
      CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_MAX_RETRIES_KEY,
      maxRetries);
  // ...
}
{code}
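If we do want fail-fast behaviour without a new public key, one option (just a
sketch of the idea, not a tested patch) would be to lower the failover cap only
on the Configuration copy used for the log-aggregation FileSystem:
{code:title=FailFastSketch.java|borderStyle=solid}
// Hypothetical sketch: cap NN failover attempts only for log aggregation
// so a fully-down remote cluster fails fast instead of retrying for minutes.
// "dfs.client.failover.max.attempts" is the real HDFS key; the value 3 is
// an arbitrary illustrative number, not a recommendation.
Configuration logAggConf = new Configuration(conf);
logAggConf.setInt("dfs.client.failover.max.attempts", 3);
FileSystem remoteFs = remoteRootLogDir.getFileSystem(logAggConf);
{code}
Using the string key would sidestep the DFSConfigKeys / hadoop-hdfs dependency
question above, though hardcoding any value brings back the same trade-off.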
> Log aggregation configured to different namenode can fail fast
> --------------------------------------------------------------
>
> Key: YARN-5445
> URL: https://issues.apache.org/jira/browse/YARN-5445
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.7.1
> Reporter: Chackaravarthy
> Attachments: YARN-5445-1.patch
>
>
> Log aggregation is enabled and configured to write app logs to a different
> cluster or a different namespace (NN federation). In these cases, we would
> like to have some configs on attempts or retries so that log aggregation
> fails fast in case the other cluster is completely down.
> Currently it takes the default {{dfs.client.failover.max.attempts}} of 15 and
> hence adds a latency of 2 to 2.5 minutes to each container launch (per node
> manager).