[jira] [Commented] (YARN-5193) For long running services, aggregate logs when a container completes instead of when the app completes

Siddharth Seth (JIRA) Thu, 02 Jun 2016 14:38:21 -0700

    [ https://issues.apache.org/jira/browse/YARN-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313121#comment-15313121 ]


Siddharth Seth commented on YARN-5193:
--------------------------------------

bq. I don't think long-running necessarily means low container churn, although 
I'm sure it does for the use-case you have in mind. For example, an 
app-as-service that farms out work as containers on YARN and runs forever. High 
load with short work duration for such a service = high container churn but it 
never exits.
Fair point. I'm guessing this would end up being implemented as a parameter 
in the API, rather than a blanket 'long-running = aggregate after container 
completion' rule.

bq. Periodic aggregation would be more palatable for such a use-case. Also 
log-aggregation duration is not guaranteed. Even if we aggregate as the 
container completes there's no guarantee how long it will take, so any client 
that wants to see the logs in HDFS just as containers complete has to handle 
fetching them from the nodes in the worst-case scenario, or retry until they're 
available.
There would definitely still be a time window where the container has 
completed and its log hasn't yet been aggregated. That window will likely be a 
little shorter than a fixed periodic-aggregation interval, if that's worth anything.

The main problem seems to be discovering these completed containers, and where 
they ran. ATS/AHS would have been ideal, but publishing container information to 
it can't really be enabled on a reasonably sized cluster.
Maybe log aggregation could write out index information up front, so that the 
CLI can at least find all containers and the nodes they ran on.
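A minimal sketch of the up-front index idea, assuming a hypothetical append-only record of container ID / node ID pairs written when each container starts. None of these names or the tab-separated format are existing YARN APIs; the StringBuilder stands in for an index file that would live in HDFS.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: not a YARN API. An append-only index of
// "containerId<TAB>nodeId" lines, written before any logs are aggregated.
public final class ContainerLogIndex {
    // Stands in for an index file in HDFS (assumption for illustration).
    private final StringBuilder index = new StringBuilder();

    /** Record where a container runs as soon as it starts. */
    public void recordContainerStart(String containerId, String nodeId) {
        index.append(containerId).append('\t').append(nodeId).append('\n');
    }

    /** What a log CLI could do: list every known container and its node, in start order. */
    public Map<String, String> readAll() throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (BufferedReader r = new BufferedReader(new StringReader(index.toString()))) {
            String line;
            while ((line = r.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab > 0) { // skip malformed lines
                    result.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        ContainerLogIndex idx = new ContainerLogIndex();
        idx.recordContainerStart("container_01", "nodeA:45454"); // hypothetical IDs
        idx.recordContainerStart("container_02", "nodeB:45454");
        // Both containers are discoverable even if their logs are not yet in HDFS.
        System.out.println(idx.readAll());
    }
}
```

The point of writing the entry at container start rather than at aggregation time is that the CLI can locate a completed container (and fall back to fetching from its node) during the window before its logs have been aggregated.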

> For long running services, aggregate logs when a container completes instead 
> of when the app completes
> ------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5193
>                 URL: https://issues.apache.org/jira/browse/YARN-5193
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>
> For a long-running service, containers will typically not complete very 
> often. However, when a container completes, it would be useful to aggregate 
> its logs right then, instead of waiting for the app to complete.
> This will allow the command-line log tool to look up containers for an app 
> from the log file index itself, instead of having to go and talk to YARN. 
> Talking to YARN really only works if ATS is enabled and YARN is configured 
> to publish container information to ATS (that may not always be the case, 
> since this can overload ATS quite quickly).
> There are some added benefits, like cleaning out local disk space early instead 
> of waiting until the app completes. (There's probably a separate jira 
> somewhere about cleanup of containers for long-running services anyway.)
> cc [~vinodkv], [~xgong]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
