[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943356#comment-13943356
 ] 

Vinod Kumar Vavilapalli commented on YARN-1670:
-----------------------------------------------

Looks good, checking this in.

> aggregated log writer can write more log data than it says is the log length
> ----------------------------------------------------------------------------
>
>                 Key: YARN-1670
>                 URL: https://issues.apache.org/jira/browse/YARN-1670
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 0.23.10, 2.2.0
>            Reporter: Thomas Graves
>            Assignee: Mit Desai
>            Priority: Critical
>         Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, 
> YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, 
> YARN-1670.patch, YARN-1670.patch
>
>
> We have seen exceptions when using 'yarn logs' to read log files:
>        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>        at java.lang.Long.parseLong(Long.java:441)
>        at java.lang.Long.parseLong(Long.java:483)
>        at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
>        at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
>        at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
>        at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
> We traced it down to the reader trying to read the file type of the next 
> file, but the position it reads from still holds log data from the previous 
> file.  The Log Length was written as a certain size, but the log data was 
> actually longer than that.  
> Inside the write() routine in LogValue, it first writes the logfile length, 
> but when it then writes the log itself it just reads to the end of the 
> file.  There is a race condition here: if someone is still writing to the 
> file when it is aggregated, the length written can be too small.
> We should have the write() routine stop when it writes whatever it said was 
> the length.  It would be nice if we could somehow tell the user it might be 
> truncated but I'm not sure of a good way to do this.
> We also noticed a bug in readAContainerLogsForALogType, where it is using 
> an int for curRead whereas it should be using a long. 
>       while (len != -1 && curRead < fileLength) {
> This isn't actually a problem right now as it looks like the underlying 
> decoder is doing the right thing and the len condition exits.
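
The fix proposed in the description (stop writing once the declared length has been written, and track the count in a long) can be sketched roughly as below. This is only an illustration under stated assumptions, not the committed patch: the class name BoundedLogCopy, the method copyBounded, and the 4096-byte buffer are hypothetical and do not come from AggregatedLogFormat.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of Hadoop: copies at most declaredLength bytes.
final class BoundedLogCopy {

    // Copies bytes from in to out, stopping once declaredLength bytes have been
    // written, even if the source has grown since the length was recorded.
    // Returns the number of bytes actually copied.
    static long copyBounded(InputStream in, OutputStream out, long declaredLength)
            throws IOException {
        byte[] buf = new byte[4096];
        long written = 0; // long, so lengths above Integer.MAX_VALUE do not overflow
        while (written < declaredLength) {
            // Never request more than what remains of the declared length.
            int toRead = (int) Math.min(buf.length, declaredLength - written);
            int len = in.read(buf, 0, toRead);
            if (len == -1) {
                break; // source ended early; the caller can compare the return value
            }
            out.write(buf, 0, len);
            written += len;
        }
        return written;
    }
}
```

With a copy bounded like this, any bytes appended to the log file after its length was recorded are simply left out, so the reader finds the next file type where it expects it. The returned count also lets a caller notice when fewer bytes than declared were available.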



--
This message was sent by Atlassian JIRA
(v6.2#6252)
