[ https://issues.apache.org/jira/browse/YARN-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103982#comment-16103982 ]
Robert Kanter commented on YARN-6875:
-------------------------------------
[~xgong], have you taken a look at YARN-2942 and its subtasks? I tried to do
something like this a while ago, and we went through a few different designs (I
think there were 3 major approaches, with some minor revisions for each). One
of the approaches was very similar to your design, where there's an index file.
In the end, we decided to do something completely different (MAPREDUCE-6415) by
adding a command to combine log files into HAR files. This was to help with
the too-many-small-files problem; we still kept the T-files, though, so the goal
was slightly different.
Anyway, I did write a bunch of code for YARN-2942 and some subtasks before we
canned it, so you might want to take a look in case you find something useful
in there or the design documents.
> New aggregated log file format for YARN log aggregation.
> --------------------------------------------------------
>
> Key: YARN-6875
> URL: https://issues.apache.org/jira/browse/YARN-6875
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-6875-NewLogAggregationFormat-design-doc.pdf
>
>
> T-file is the underlying log format for the aggregated logs in YARN. We have
> seen several performance issues, especially for very large log files.
> We will introduce a new log format that has better performance for large
> log files.