[ https://issues.apache.org/jira/browse/YARN-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434691#comment-15434691 ]

Chackaravarthy commented on YARN-5445:
--------------------------------------

[~templedf] Thanks for taking a look at this.

{quote}
Moving DFS_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY to CommonConfigurationKeys doesn't 
sound like the right solution. If the log aggregation code is going to depend 
on specific behavior of HDFS why shouldn't the project depend on hadoop-hdfs?
{quote}

Yes, that makes sense. I had thought that adding a dependency on hadoop-hdfs 
would be a major change, but if it can be done, then good.
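
For illustration, I believe the dependency change itself would just be a pom 
entry along these lines in the NM module (a sketch only; the exact scope and 
placement would need review):
{code:title=hadoop-yarn-server-nodemanager/pom.xml (sketch)|borderStyle=solid}
<!-- Sketch only: a direct hadoop-hdfs dependency; the version is managed
     by the parent POM, and the scope would need review. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
</dependency>
{code}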

{quote}
As you pointed out, there are other JIRAs that are attempting to resolve the 
base issue that is prompting your unusual cluster config. I don't think 
introducing a new configuration parameter to deal with a temporary issue is a 
good idea. Config params, like diamonds, are forever, and we already have 
entirely too many
{quote}

Yes, agreed on not introducing a new config as a workaround. But we recently 
moved the applogs to a different namenode (federated cluster) and are seeing a 
benefit from it: it reduces the client/service RPC queue lengths and the RPC 
latencies. Also, LogAggregationService is designed to write applogs to a 
different filesystem, so I am thinking a config here might be worthwhile. But 
I am open to your input on this.
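
For context, our setup looks roughly like this ({{yarn.nodemanager.remote-app-log-dir}} 
is the actual key; the nameservice name below is made up):
{code:title=yarn-site.xml (sketch)|borderStyle=solid}
<!-- Sketch only: "logsNN" is a hypothetical federated nameservice
     dedicated to applogs; the key itself is the real YARN property. -->
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>hdfs://logsNN/app-logs</value>
</property>
{code}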

{quote}
I haven't looked at the code for log aggregation. What happens when the DFS 
connection fails?
{quote}

It seems this is handled in ConfiguredFailoverProxyProvider, where 
{{ipc.client.connect.max.retries}} is set to 0 (the default of 
{{dfs.client.failover.connection.retries}}). Please correct me if I am missing 
something here.
{code:title=ConfiguredFailoverProxyProvider.java|borderStyle=solid}
public ConfiguredFailoverProxyProvider(Configuration conf, URI uri,
    Class<T> xface) {
  // ... (other initialization elided)
  this.conf = new Configuration(conf);
  // Copy the failover connection-retry setting (default 0) over the generic
  // IPC connect-retry setting, so each individual connect attempt fails fast.
  int maxRetries = this.conf.getInt(
      DFSConfigKeys.DFS_CLIENT_FAILOVER_CONNECTION_RETRIES_KEY,
      DFSConfigKeys.DFS_CLIENT_FAILOVER_CONNECTION_RETRIES_DEFAULT);
  this.conf.setInt(
      CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_MAX_RETRIES_KEY,
      maxRetries);
  // ... (proxy setup elided)
}
{code}
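
So each individual connect attempt fails fast, but the failover proxy still 
retries up to {{dfs.client.failover.max.attempts}} (default 15) times, sleeping 
with exponential backoff between attempts ({{dfs.client.failover.sleep.base.millis}}, 
default 500, capped at {{dfs.client.failover.sleep.max.millis}}, default 15000). 
Roughly 0.5s + 1s + 2s + 4s + 8s and then ~15s per remaining attempt adds up to 
on the order of 150s, which matches the 2 to 2.5 min latency mentioned in the 
description. Those knobs can be lowered today, but only cluster-wide, e.g. (a 
sketch; the keys and defaults are real, the values illustrative, and the change 
would affect every HA filesystem the NM talks to, which is exactly why a 
log-aggregation-specific config was proposed):
{code:title=hdfs-site.xml (sketch)|borderStyle=solid}
<!-- Illustrative values only: caps total failover retry latency at a few
     seconds, but applies to all HA filesystems, not just the applog one. -->
<property>
  <name>dfs.client.failover.max.attempts</name>
  <value>3</value> <!-- default: 15 -->
</property>
<property>
  <name>dfs.client.failover.sleep.max.millis</name>
  <value>1000</value> <!-- default: 15000 -->
</property>
{code}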


> Log aggregation configured to different namenode can fail fast
> --------------------------------------------------------------
>
>                 Key: YARN-5445
>                 URL: https://issues.apache.org/jira/browse/YARN-5445
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: Chackaravarthy
>         Attachments: YARN-5445-1.patch
>
>
> Log aggregation is enabled and configured to write applogs to a different 
> cluster or a different namespace (NN federation). In these cases, we would 
> like to have some configs for attempts or retries so that we fail fast in 
> case the other cluster is completely down.
> Currently it takes the default {{dfs.client.failover.max.attempts}} of 15, 
> hence adding a latency of 2 to 2.5 mins to each container launch (per node 
> manager).


