[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Graves updated YARN-1670:
--------------------------------
Priority: Critical (was: Major)
> aggregated log writer can write more log data than it says is the log length
> ----------------------------------------------------------------------------
>
> Key: YARN-1670
> URL: https://issues.apache.org/jira/browse/YARN-1670
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 0.23.10, 2.2.0
> Reporter: Thomas Graves
> Priority: Critical
>
> We have seen exceptions when using 'yarn logs' to read log files.
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Long.parseLong(Long.java:441)
> at java.lang.Long.parseLong(Long.java:483)
> at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
> at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
> at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
> at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
> We traced it down to the reader trying to read the file type of the next file
> while it was still positioned in log data from the previous file. The log
> length had been written as a certain size, but the log data that followed was
> actually longer than that.
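To make the failure mode concrete, here is a minimal Java sketch of a reader that walks a simplified, assumed entry layout: a file-type string, the length written as a decimal string, then the log bytes. This is not the actual `AggregatedLogFormat` encoding, and the class and method names are hypothetical. Because the reader trusts the declared length, any extra bytes get parsed as the next entry's header. In the report this surfaced as a `NumberFormatException` from `Long.parseLong`; in a toy stream the misaligned read may instead fail earlier with an `EOFException`, depending on which bytes happen to land where.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of reading one container's aggregated logs.
 * Assumed entry layout: writeUTF(fileType), writeUTF(decimal length), raw bytes.
 */
final class LogEntryReaderSketch {

    /** Returns the file types of all entries, trusting each declared length. */
    static List<String> readFileTypes(byte[] stream) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream));
        List<String> fileTypes = new ArrayList<>();
        // available() is exact for an in-memory stream, so 0 means a clean end.
        while (in.available() > 0) {
            String fileType = in.readUTF();
            // The length is stored as text; if the reader is misaligned, the
            // bytes parsed here are log data, not a number.
            long fileLength = Long.parseLong(in.readUTF());
            // Consume exactly the declared number of bytes; anything beyond
            // that is (wrongly) treated as the start of the next entry.
            in.readFully(new byte[Math.toIntExact(fileLength)]);
            fileTypes.add(fileType);
        }
        return fileTypes;
    }
}
```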
> Inside the write() routine in LogValue, it first writes the log file's
> length, but when it then writes the log itself, it simply copies to the end
> of the file. This is a race condition: if someone is still writing to the file
> when it is aggregated, the length written can be too small.
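The race can be modeled in a few lines. The hedged Java sketch below assumes the same simplified layout (file type, decimal length, raw bytes) and uses hypothetical names; the `declaredLength` argument stands in for the file size sampled before copying, and the `InputStream` stands in for a log file that may still be growing. Because the copy loop runs until EOF instead of stopping at `declaredLength`, any bytes appended after the size was sampled end up in the payload without being accounted for in the header.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical model of the racy write: header from an early size sample, copy to EOF. */
final class RacyLogWriterSketch {

    /** Writes one entry and returns the number of payload bytes actually copied. */
    static long writeEntry(DataOutputStream out, String fileType,
                           long declaredLength, InputStream logData) throws IOException {
        out.writeUTF(fileType);
        out.writeUTF(String.valueOf(declaredLength)); // length sampled earlier
        byte[] buf = new byte[4096];
        long copied = 0;
        int n;
        // The bug being modeled: copy until EOF, ignoring declaredLength, so
        // data appended after the size was sampled is written too.
        while ((n = logData.read(buf)) != -1) {
            out.write(buf, 0, n);
            copied += n;
        }
        return copied;
    }
}
```

With a size sample of 5 bytes and a file that has since grown to 10, the entry carries 5 payload bytes the reader does not expect.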
> We should have the write() routine stop once it has written the number of
> bytes it declared as the length. It would be nice if we could somehow tell the
> user the log might be truncated, but I'm not sure of a good way to do this.
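One possible shape for the proposed fix, again only a sketch under the same assumed layout with hypothetical names: copy at most `declaredLength` bytes, then probe the source for one more byte. If the probe succeeds, the file grew after its size was sampled and the entry was truncated, which gives the caller a signal it could use to warn the user. A source that ends early is reported as an error, because a short payload would misalign the reader just as an overlong one does.

```java
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical bounded writer: never emits more payload than the header declares. */
final class BoundedLogWriterSketch {

    /**
     * Writes one entry with exactly declaredLength payload bytes.
     * Returns true if the source still had data left, i.e. the entry was truncated.
     */
    static boolean writeEntry(DataOutputStream out, String fileType,
                              long declaredLength, InputStream logData) throws IOException {
        out.writeUTF(fileType);
        out.writeUTF(String.valueOf(declaredLength));
        byte[] buf = new byte[4096];
        long remaining = declaredLength;
        while (remaining > 0) {
            // Never ask for more than what is still owed to the header.
            int n = logData.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n == -1) {
                throw new EOFException("log ended " + remaining + " bytes short of declared length");
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
        // Data beyond the declared length means the file grew during aggregation.
        return logData.read() != -1;
    }
}
```

The truncation flag only says that more data existed, not how much; a caller could compare the file's size after copying with `declaredLength` if it wants an estimate.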
> We also noticed a bug in readAContainerLogsForALogType: it uses an int for
> curRead where it should use a long:
> while (len != -1 && curRead < fileLength) {
> This isn't actually a problem right now, as it looks like the underlying
> decoder does the right thing and the loop exits on the len != -1 check.
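For the second issue, here is a hedged sketch of the read loop with `curRead` declared as a `long`. It keeps the loop condition quoted above but uses hypothetical names and plain `java.io` streams rather than the real log-aggregation reader. With an `int` counter, an entry longer than `Integer.MAX_VALUE` bytes would wrap `curRead` to a negative value, so `curRead < fileLength` would stay true and only the `len != -1` check could end the loop.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical reader loop for one entry, bounded by the declared fileLength. */
final class EntryCopySketch {

    /** Copies up to fileLength bytes from in to out; returns the number copied. */
    static long copyEntry(InputStream in, OutputStream out, long fileLength) throws IOException {
        byte[] buf = new byte[65536];
        long curRead = 0; // long: an int would overflow past Integer.MAX_VALUE bytes
        int len = 0;
        while (len != -1 && curRead < fileLength) {
            // Request no more than what remains of this entry.
            int toRead = (int) Math.min(buf.length, fileLength - curRead);
            len = in.read(buf, 0, toRead);
            if (len > 0) {
                out.write(buf, 0, len);
                curRead += len;
            }
        }
        return curRead;
    }
}
```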
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)