[ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daryn Sharp reopened YARN-3760:
-------------------------------
Line numbers are from an old release, but the error is evident.
{code}
java.lang.IllegalStateException: Cannot close TFile in the middle of key-value insertion.
	at org.apache.hadoop.io.file.tfile.TFile$Writer.close(TFile.java:310)
	at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.close(AggregatedLogFormat.java:456)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:326)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:429)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:388)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$2.run(LogAggregationService.java:387)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
{code}
_AggregatedLogFormat.LogWriter_
{code}
public void close() {
  try {
    this.writer.close();
  } catch (IOException e) {
    LOG.warn("Exception closing writer", e);
  }
  IOUtils.closeStream(fsDataOStream);
}
{code}
The TFile writer's close may throw {{IllegalStateException}} if the underlying FS data stream failed. Unfortunately, {{close()}} only catches IOE, so the ISE rips out without closing the FS data stream.
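One way to close the stream no matter what the TFile writer throws is to move the stream close into a {{finally}} block. The following is a minimal, self-contained sketch, not the actual patch: {{KeyValueWriter}}, {{TrackingStream}}, {{SketchLogWriter}} and {{closeQuietly}} are hypothetical stand-ins for {{TFile.Writer}}, {{FSDataOutputStream}}, {{LogWriter}} and {{IOUtils.closeStream}} so that it compiles without Hadoop. It assumes the ISE should still propagate to the caller once the stream is closed.

{code:java}
import java.io.Closeable;
import java.io.IOException;

class CloseDemo {
    public static void main(String[] args) {
        TrackingStream stream = new TrackingStream();
        // Simulate TFile.Writer.close() failing mid key-value insertion.
        KeyValueWriter failing = () -> {
            throw new IllegalStateException("Cannot close TFile in the middle of key-value insertion.");
        };
        try {
            new SketchLogWriter(failing, stream).close();
        } catch (IllegalStateException expected) {
            System.out.println("ISE propagated; stream closed = " + stream.closed);
        }
    }
}

// Hypothetical stand-in for TFile.Writer (close() may throw IOE or an unchecked ISE).
interface KeyValueWriter extends Closeable {}

// Hypothetical stand-in for FSDataOutputStream that records whether it was closed.
class TrackingStream implements Closeable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical LogWriter variant whose close() always releases the stream.
class SketchLogWriter {
    private final KeyValueWriter writer;
    private final TrackingStream fsDataOStream;

    SketchLogWriter(KeyValueWriter writer, TrackingStream fsDataOStream) {
        this.writer = writer;
        this.fsDataOStream = fsDataOStream;
    }

    public void close() {
        try {
            writer.close();
        } catch (IOException e) {
            System.err.println("Exception closing writer: " + e);
        } finally {
            // Runs even when writer.close() throws an unchecked
            // IllegalStateException, so the stream is not leaked.
            closeQuietly(fsDataOStream);
        }
    }

    // Same intent as IOUtils.closeStream: close and swallow any IOException.
    static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException ignored) {
            // best effort
        }
    }
}
{code}

Letting the ISE propagate (instead of also catching it) keeps the failure visible to the caller while still releasing the stream.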
Additionally, the ctor creates the FS data stream and then a TFile.Writer without a try/catch. If the TFile.Writer ctor throws an exception, the caller never receives the LogWriter, so it's impossible to close the stream.
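A similar guard can cover the constructor: if creating the writer fails, close the stream before the exception leaves the constructor, since no caller will ever hold a reference that could close it. Again this is a hedged sketch: {{RecordingStream}}, {{WriterFactory}} and {{GuardedLogWriter}} are hypothetical names, and {{WriterFactory.create}} merely stands in for the {{TFile.Writer}} constructor.

{code:java}
import java.io.Closeable;
import java.io.IOException;

class CtorDemo {
    public static void main(String[] args) {
        RecordingStream stream = new RecordingStream();
        try {
            new GuardedLogWriter(stream, out -> {
                throw new IOException("simulated TFile.Writer ctor failure");
            });
        } catch (IOException e) {
            System.out.println("ctor failed; stream closed = " + stream.closed);
        }
    }
}

// Hypothetical stand-in for FSDataOutputStream that records whether it was closed.
class RecordingStream implements Closeable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical stand-in for the TFile.Writer constructor.
interface WriterFactory {
    Closeable create(RecordingStream out) throws IOException;
}

// Hypothetical LogWriter variant whose ctor cannot leak the stream.
class GuardedLogWriter {
    final RecordingStream fsDataOStream;
    final Closeable writer;

    GuardedLogWriter(RecordingStream stream, WriterFactory factory) throws IOException {
        this.fsDataOStream = stream;
        boolean created = false;
        try {
            this.writer = factory.create(stream);
            created = true;
        } finally {
            if (!created) {
                // Writer construction failed (checked or unchecked exception):
                // nobody will get a reference to this object, so close here.
                stream.close();
            }
        }
    }
}
{code}

The {{created}} flag plus {{finally}} covers unchecked exceptions and errors as well as IOE, which a plain {{catch (IOException e)}} would miss.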
I haven't checked whether there are further issues with closing the writer higher in the stack.
> Log aggregation failures
> -------------------------
>
> Key: YARN-3760
> URL: https://issues.apache.org/jira/browse/YARN-3760
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.4.0
> Reporter: Daryn Sharp
> Priority: Critical
>
> The aggregated log file does not appear to be properly closed when writes
> fail. This leaves a lease renewer active in the NM that spams the NN with
> lease renewals. If the token is marked not to be cancelled, the renewals
> appear to continue until the token expires. If the token is cancelled, the
> periodic renew spam turns into a flood of failed connections until the lease
> renewer gives up.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)