[ 
https://issues.apache.org/jira/browse/YARN-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103982#comment-16103982
 ] 

Robert Kanter commented on YARN-6875:
-------------------------------------

[~xgong], have you taken a look at YARN-2942 and subtasks?  I tried to do 
something like this a while ago and we went through a few different designs (I 
think there are 3 major different approaches, and some minor revisions for 
each); one of the approaches was very similar to your design, where there's an 
index file.

In the end, we decided to do something completely different (MAPREDUCE-6415) by 
adding a command to combine log files into HAR files.  This was to help with 
the too-many-small-files problem, though we still kept the T-files, so the goal 
was slightly different.

Anyway, I did write a bunch of code for YARN-2942 and some subtasks before we 
canned it, so you might want to take a look in case you find something useful 
in there or the design documents.

> New aggregated log file format for YARN log aggregation.
> --------------------------------------------------------
>
>                 Key: YARN-6875
>                 URL: https://issues.apache.org/jira/browse/YARN-6875
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-6875-NewLogAggregationFormat-design-doc.pdf
>
>
> T-file is the underlying log format for the aggregated logs in YARN. We have 
> seen several performance issues, especially for very large log files.
> We will introduce a new log format that has better performance for large 
> log files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

