[ 
https://issues.apache.org/jira/browse/YARN-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105871#comment-16105871
 ] 

Wangda Tan commented on YARN-6875:
----------------------------------

Thanks for the comments from [~jlowe]/[~xgong]. 

I think I misled Jason before: we didn't plan to add the separate index design 
at the beginning, but we figured out that it is required for recovery. 

I agree with these points from Jason:
- Log files are rarely read after write.
- Creating a separate index file during write means 2x the workload on the 
NameNode. 

However, if we don't write the (temp) index file, the approach listed in 
Jason's comment will make reads very slow, since the reader needs to repeatedly 
find where the last successful write is. And the worst part is that we only 
need to read logs when an app fails or is slow, so it is likely that we will 
read such an app's logs several times. I don't think it will be a good user 
experience to pay this cost every time. 
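
To make the read-cost argument concrete, here is a toy sketch in Python. It is 
not the YARN format or the design in the attached doc: the record layout 
(length + CRC32 header) and the names append_record, scan_valid_records and 
read_at are all assumptions made up for illustration. Without an index the 
reader has to walk and validate every record to find the last good write; with 
an index it can jump straight to a known offset.

```python
import io
import struct
import zlib

# Hypothetical record header: payload length, CRC32 of payload.
HEADER = struct.Struct(">II")


def append_record(buf, payload):
    """Append one length-prefixed, checksummed record; return its offset."""
    offset = buf.seek(0, io.SEEK_END)
    buf.write(HEADER.pack(len(payload), zlib.crc32(payload)))
    buf.write(payload)
    return offset


def scan_valid_records(buf):
    """No index: walk from byte 0, validating each record in turn."""
    data = buf.getvalue()
    pos, offsets = 0, []
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        payload = data[pos + HEADER.size: pos + HEADER.size + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn write: only the records before it are valid
        offsets.append(pos)
        pos += HEADER.size + length
    return offsets


def read_at(buf, offset):
    """With an index: jump straight to a known-good record."""
    data = buf.getvalue()
    length, _ = HEADER.unpack_from(data, offset)
    return data[offset + HEADER.size: offset + HEADER.size + length]


log = io.BytesIO()
index = [append_record(log, f"container-{i} output".encode()) for i in range(3)]
log.write(b"\x00\x00\x00\x40partial")  # simulate an interrupted write

recovered = scan_valid_records(log)   # touches every record, on every read
last = read_at(log, index[-1])        # one lookup when the index is kept
```

The scan cost grows with the size of the aggregated file and is paid again on 
each read, which is the repeated work I am worried about above.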

I agree with Xuan's comments: if partial log aggregation is not enabled, 
this design doesn't increase the workload at all. [~jlowe], what percentage of 
apps running in your cluster have partial log aggregation enabled? 

For the partial log aggregation case, an alternative solution is to write 
log+index to a separate file every time, which makes write perf exactly the 
same as TFile, but read performance can be much better. Jason, could you share 
your thoughts here?
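
A rough sketch of what "log+index in one separate file per upload" could look 
like, again in Python and again only as an illustration. The layout (log 
bodies, then a JSON index, then an 8-byte footer pointing at the index) and the 
names write_cycle_file and read_log are assumptions, not what the design doc 
specifies.

```python
import io
import json
import struct

# Hypothetical footer: byte offset where the index starts.
FOOTER = struct.Struct(">Q")


def write_cycle_file(logs):
    """Build one self-contained file: log bodies, then index, then footer."""
    out = io.BytesIO()
    index = {}
    for name, body in logs.items():
        index[name] = (out.tell(), len(body))  # where this log starts, its size
        out.write(body)
    index_offset = out.tell()
    out.write(json.dumps(index).encode())
    out.write(FOOTER.pack(index_offset))
    return out.getvalue()


def read_log(blob, name):
    """Locate one log via the footer and index, without scanning the bodies."""
    (index_offset,) = FOOTER.unpack_from(blob, len(blob) - FOOTER.size)
    index = json.loads(blob[index_offset: len(blob) - FOOTER.size])
    offset, length = index[name]
    return blob[offset: offset + length]


blob = write_cycle_file({"stdout": b"hello\n", "stderr": b"oops\n"})
stderr_log = read_log(blob, "stderr")
```

Each upload creates a single file, so the number of file creations stays at 
one per cycle, and a reader only needs the footer and index of a file to find 
a given log in it.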


> New aggregated log file format for YARN log aggregation.
> --------------------------------------------------------
>
>                 Key: YARN-6875
>                 URL: https://issues.apache.org/jira/browse/YARN-6875
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-6875-NewLogAggregationFormat-design-doc.pdf
>
>
> T-file is the underlying log format for the aggregated logs in YARN. We have 
> seen several performance issues, especially for very large log files.
> We will introduce a new log format which has better performance for large 
> log files.



