[
https://issues.apache.org/jira/browse/YARN-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313121#comment-15313121
]
Siddharth Seth commented on YARN-5193:
--------------------------------------
bq. I don't think long-running necessarily means low container churn, although
I'm sure it does for the use-case you have in mind. For example, an
app-as-service that farms out work as containers on YARN and runs forever. High
load with short work duration for such a service = high container churn but it
never exits.
Fair point. I'm guessing this would end up getting implemented as a parameter
in the API, rather than a blanket 'long-running = aggregate after each
container completes' rule.
bq. Periodic aggregation would be more palatable for such a use-case. Also
log-aggregation duration is not guaranteed. Even if we aggregate as the
container completes there's no guarantee how long it will take, so any client
that wants to see the logs in HDFS just as containers complete has to handle
fetching it from the nodes in the worst-case scenario or retrying until it's
available.
There would definitely still be a time window where the container has
completed and the log hasn't yet been aggregated. It would likely be a little
shorter than with a fixed periodic aggregation interval, if that's worth
anything.
The main problem seems to be discovering these dead containers, and where they
ran. ATS/AHS would have been ideal, but can't really be enabled on a reasonably
sized cluster to log container information.
Maybe log-aggregation can write out indexing information up front - so that the
CLI can at least find all containers / the node where containers ran.
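To make the indexing idea a bit more concrete, here is a rough sketch, not a
proposal for the actual format. It assumes a hypothetical per-app index file
with one JSON record per line, written as soon as a container is known and
before any of its logs are uploaded. The function names, field names, and file
layout are all made up for illustration; nothing here is an existing YARN API.

```python
import json
import os


def record_container(index_path, container_id, node_id):
    """Append one index entry up front, before any log data is aggregated.

    Hypothetical format: one JSON object per line with 'container' and 'node'.
    """
    with open(index_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"container": container_id, "node": node_id}) + "\n")


def list_containers(index_path):
    """Return {container_id: node_id} read from the index alone.

    This is the lookup a CLI could do without talking to YARN or ATS.
    A missing index simply yields an empty mapping.
    """
    containers = {}
    if not os.path.exists(index_path):
        return containers
    with open(index_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                entry = json.loads(line)
                containers[entry["container"]] = entry["node"]
    return containers
```

With something like this, the CLI could resolve which node a dead container
ran on from the index and fall back to fetching from that node if the
aggregated log is not there yet.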
> For long running services, aggregate logs when a container completes instead
> of when the app completes
> ------------------------------------------------------------------------------------------------------
>
> Key: YARN-5193
> URL: https://issues.apache.org/jira/browse/YARN-5193
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Siddharth Seth
>
> For a long running service, containers will typically not complete very
> often. However, when a container completes - it would be useful to aggregate
> the logs right then, instead of waiting for the app to complete.
> This will allow the command line log tool to look up containers for an app
> from the log file index itself, instead of having to go and talk to YARN.
> Talking to YARN really only works if ATS is enabled, and YARN is configured
> to publish container information to ATS (That may not always be the case -
> since this can overload ATS quite fast).
> There are some added benefits like cleaning out local disk space early, instead
> of waiting till the app completes. (There's probably a separate jira
> somewhere about cleanup of container for long running services anyway)
> cc [~vinodkv], [~xgong]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)