[ 
https://issues.apache.org/jira/browse/YARN-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105859#comment-16105859
 ] 

Xuan Gong commented on YARN-6875:
---------------------------------

Thanks for the comments, [~jlowe]. I fully understand your concerns, but:

bq.  I'm not a big fan of having a separate file, even temporarily, because log 
aggregation can already be a large portion of the namenode's write load on 
large clusters. Having that separate file will increase the namenode write load 
significantly (approximately 2x per log aggregation cycle if I understand it 
correctly).

I agree with this. But the proposed solution will not be worse than the current 
solution (TFile). Also, the index file will be created only when partial 
log aggregation is enabled.
If we enable partial log aggregation:
* For the TFile solution (currently used), we would create a new file every time 
we do the log aggregation. If we have done log aggregation three times, we 
would have three TFiles.
* For the proposed solution, we would have at most two files: the log file and 
the index file.
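
To make the comparison above concrete, here is a rough sketch of the file counts for a single application on a single node. It is only an illustrative model of the two bullets, not code from the patch; the function names are hypothetical.

```python
# Illustrative model only: counts aggregated-log files for one application
# on one node. Function names are hypothetical and not part of YARN.

def tfile_file_count(cycles: int) -> int:
    """Current TFile approach as described above: one new file per aggregation cycle."""
    return cycles


def proposed_file_count_upper_bound(cycles: int) -> int:
    """Proposed approach: at most a log file plus an index file, however many cycles run."""
    return 0 if cycles == 0 else 2


if __name__ == "__main__":
    for cycles in (1, 3, 10):
        print(cycles, tfile_file_count(cycles), proposed_file_count_upper_bound(cycles))
```

In this model the TFile count grows with the number of cycles, while the proposed format stays bounded at two files.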

bq. Note that the separate index file doesn't solve all the race conditions for 
the reader.

Yes, this corner case is valid. But I think that this is OK. The reader would 
fail in this case, but we can always retry the read later.
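
The retry idea could look roughly like the sketch below. This is an assumption about how a client might handle the failure, not the actual reader in the patch; `read_with_retry` and the `read_once` callable are hypothetical names, and the retry count and delay are arbitrary.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def read_with_retry(read_once: Callable[[], T],
                    attempts: int = 3,
                    delay_seconds: float = 1.0) -> T:
    """Call read_once, retrying after a delay if it fails with OSError.

    read_once stands in for a single attempt to read the aggregated log
    (hypothetical; the real reader API is not shown here).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[OSError] = None
    for attempt in range(attempts):
        try:
            return read_once()
        except OSError as err:
            # The reader raced with a writer; wait and try again later.
            last_error = err
            if attempt + 1 < attempts:
                time.sleep(delay_seconds)
    assert last_error is not None
    raise last_error
```

A caller would pass in whatever performs one read attempt, and would see the original error only if every attempt fails.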

> New aggregated log file format for YARN log aggregation.
> --------------------------------------------------------
>
>                 Key: YARN-6875
>                 URL: https://issues.apache.org/jira/browse/YARN-6875
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-6875-NewLogAggregationFormat-design-doc.pdf
>
>
> T-file is the underlying log format for the aggregated logs in YARN. We have 
> seen several performance issues, especially for very large log files.
> We will introduce a new log format which has better performance for large 
> log files.



