[ 
https://issues.apache.org/jira/browse/YARN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181028#comment-14181028
 ] 

Zhijie Shen commented on YARN-2724:
-----------------------------------

[~xgong], w.r.t. the fix, I think we shouldn't put 
{{out.writeUTF(logFile.getName());}} and 
{{out.writeUTF(String.valueOf(fileLength));}} into the try block where the file 
content is written, because that would introduce other corner cases. For 
example, if {{out.writeUTF(logFile.getName());}} throws the exception, the 
exception message will be written into the file without the name and length 
being written first, corrupting the log file format again.
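A minimal sketch of the ordering suggested above, assuming a simplified entry layout of name ({{writeUTF}}), length ({{writeUTF}}), then raw contents, matching the LogType / LogLength / Log Contents fields in the issue description. {{SafeLogEntryWriter}}, its methods, and the {{Callable}} used to open the file are hypothetical stand-ins, not the actual aggregation code; only the error-message prefix is taken from the log output in the issue. The file is opened before anything is written, so an open failure yields a header whose length matches the error message, and a header for a readable file is never followed by an appended exception message.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;

// Hypothetical sketch, not the YARN-2724 patch: open the log file first and only
// then write the name/length header, so the declared length matches what follows.
class SafeLogEntryWriter {

  static void writeEntry(DataOutputStream out, String name, long fileLength,
                         Callable<InputStream> opener) throws IOException {
    InputStream in;
    try {
      in = opener.call(); // e.g. () -> new FileInputStream(logFile)
    } catch (Exception openFailure) {
      // Nothing has been written for this entry yet, so emit a self-consistent
      // entry whose declared length is the length of the error message.
      byte[] msg = ("Error aggregating log file. Log file : " + name + " "
          + openFailure.getMessage()).getBytes(StandardCharsets.UTF_8);
      out.writeUTF(name);
      out.writeUTF(String.valueOf(msg.length));
      out.write(msg);
      return;
    }
    try (InputStream src = in) {
      // The header is written outside any try block that could later append
      // an exception message after it.
      out.writeUTF(name);
      out.writeUTF(String.valueOf(fileLength));
      byte[] buf = new byte[4096];
      long remaining = fileLength;
      while (remaining > 0) {
        int n = src.read(buf, 0, (int) Math.min(buf.length, remaining));
        if (n == -1) {
          break; // file shorter than fileLength: a separate problem
        }
        out.write(buf, 0, n);
        remaining -= n;
      }
    }
  }

  /** Convenience wrapper returning the encoded entry bytes. */
  static byte[] entryBytes(String name, long fileLength, Callable<InputStream> opener) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try {
      writeEntry(new DataOutputStream(bytes), name, fileLength, opener);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }
}
```

Catching {{Exception}} around the opener is only because {{Callable.call}} declares it; with a plain {{FileInputStream}} constructor the catch would be {{IOException}}.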

Beyond the permission issue, an IOException can in general happen at any 
iteration while copying buf into the output stream, resulting in fewer bytes 
being written than the predefined file length. One possible solution is to fill 
the remaining space with 0s. For example, if the file length is 1000 and, for 
some reason, writing bytes 500 ~ 600 throws the exception, we can fill 
positions 500 to 999 with 0. I will file a separate ticket for the general 
IOException case while copying the file.
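A minimal sketch of the zero-fill idea, assuming the failure is on the read side; if the output stream itself is broken, the padding cannot be written either, so write failures are left to propagate. {{ZeroPaddingCopier}} and {{copyPadded}} are hypothetical names. Because the loop is bounded by the declared length, it also stops there if the file has grown since its length was recorded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch of the zero-fill idea: always emit exactly declaredLength
// bytes so the next entry's header starts where a reader expects it.
class ZeroPaddingCopier {

  /**
   * Copies up to declaredLength bytes from in to out. If a read fails or the
   * source ends early, the rest of the entry is written as 0 bytes.
   * Write failures are NOT handled: if out is broken, padding cannot help.
   * Returns the number of real (non-padding) bytes copied.
   */
  static long copyPadded(InputStream in, OutputStream out, long declaredLength)
      throws IOException {
    byte[] buf = new byte[4096];
    long copied = 0;
    while (copied < declaredLength) {
      int n;
      try {
        n = in.read(buf, 0, (int) Math.min(buf.length, declaredLength - copied));
      } catch (IOException readFailure) {
        break; // assumption: log readFailure here, then pad the remainder below
      }
      if (n == -1) {
        break; // source shorter than declared
      }
      out.write(buf, 0, n);
      copied += n;
    }
    byte[] zeros = new byte[buf.length]; // Java arrays start zero-filled
    for (long padding = declaredLength - copied; padding > 0; ) {
      int n = (int) Math.min(zeros.length, padding);
      out.write(zeros, 0, n);
      padding -= n;
    }
    return copied;
  }
}
```

With the numbers from the comment (declared length 1000, source failing after 500 bytes), {{copyPadded}} returns 500 and writes exactly 1000 bytes, the last 500 of them 0.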

> If an unreadable file is encountered during log aggregation then aggregated 
> file in HDFS badly formed
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2724
>                 URL: https://issues.apache.org/jira/browse/YARN-2724
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: log-aggregation
>    Affects Versions: 2.5.1
>            Reporter: Sumit Mohanty
>            Assignee: Xuan Gong
>         Attachments: YARN-2724.1.patch
>
>
> Look at the log output snippet below. It looks like there is an issue during 
> aggregation when an unreadable file is encountered. Likely, this results in 
> bad encoding.
> {noformat}
> LogType: command-13.json
> LogLength: 13934
> Log Contents:
> Error aggregating log file. Log file : 
> /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_000004/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_000004/command-13.json
>  (Permission denied)command-3.json13983Error aggregating log file. Log file : 
> /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_000004/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_000004/command-3.json
>  (Permission denied)
>               
> errors-3.txt0gc.log-20141021044514484052014-10-21T04:45:12.046+0000: 5.134: 
> [GC2014-10-21T04:45:12.046+0000: 5.134: [ParNew: 163840K->15575K(184320K), 
> 0.0488700 secs] 163840K->15575K(1028096K), 0.0492510 secs] [Times: user=0.06 
> sys=0.01, real=0.05 secs]
> 2014-10-21T04:45:14.939+0000: 8.027: [GC2014-10-21T04:45:14.939+0000: 8.027: 
> [ParNew: 179415K->11865K(184320K), 0.0941310 secs] 179415K->17228K(1028096K), 
> 0.0943140 secs] [Times: user=0.13 sys=0.04, real=0.09 secs]
> 2014-10-21T04:46:42.099+0000: 95.187: [GC2014-10-21T04:46:42.099+0000: 
> 95.187: [ParNew: 175705K->12802K(184320K), 0.0466420 secs] 
> 181068K->18164K(1028096K), 0.0468490 secs] [Times: user=0.06 sys=0.00, 
> real=0.04 secs]
> {noformat}
> Specifically, look at the text after the exception text. There should be two 
> more entries for log files, but none exist. This is likely because 
> command-13.json is declared with a length of 13934, but the content written 
> is not that long, as the file was never read.
> I think it should have been:
> {noformat}
> LogType: command-13.json
> LogLength: <Length of the exception text>
> Log Contents:
> Error aggregating log file. Log file : 
> /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_000004/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_000004/command-13.json
>  (Permission denied)command-3.json13983Error aggregating log file. Log file : 
> /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_000004/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_000004/command-3.json
>  (Permission denied)
> {noformat}
> {noformat}
> LogType: errors-3.txt
> LogLength:0
> Log Contents:
> {noformat}
> {noformat}
> LogType:gc.log
> LogLength:???
> Log Contents:
> ......-20141021044514484052014-10-21T04:45:12.046+0000: 5.134: 
> [GC2014-10-21T04:45:12.046+0000: 5.134: [ParNew: 163840K- .......
> {noformat}
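The reporter's explanation of the missing entries can be reproduced with a hypothetical reader that trusts the declared LogLength and skips exactly that many bytes per entry. {{LengthPrefixedLogDemo}} is illustrative only, not YARN's log reader, and it assumes the same simplified name / length / contents layout. An entry declared as 13934 bytes but followed by only a short error message swallows the headers of every entry after it, while a correctly declared length keeps errors-3.txt and gc.log readable.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a reader that trusts the declared LogLength.
class LengthPrefixedLogDemo {

  /** Writes name and declared length, then content (which may be shorter). */
  static void writeEntry(DataOutputStream out, String name, long declaredLength,
                         String content) throws IOException {
    out.writeUTF(name);
    out.writeUTF(String.valueOf(declaredLength));
    out.write(content.getBytes(StandardCharsets.UTF_8));
  }

  /** Returns the entry names a length-trusting reader would see. */
  static List<String> readNames(byte[] data) throws IOException {
    List<String> names = new ArrayList<>();
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    while (true) {
      String name;
      try {
        name = in.readUTF();
      } catch (EOFException end) {
        return names;
      }
      long length = Long.parseLong(in.readUTF());
      names.add(name);
      // Skip the declared contents; an over-declared length consumes the
      // following entries' headers as if they were log contents.
      long toSkip = length;
      while (toSkip > 0) {
        int skipped = in.skipBytes((int) Math.min(Integer.MAX_VALUE, toSkip));
        if (skipped == 0) {
          break; // reached end of data
        }
        toSkip -= skipped;
      }
    }
  }
}
```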



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
