[
https://issues.apache.org/jira/browse/YARN-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103346#comment-16103346
]
Jason Lowe commented on YARN-6875:
----------------------------------
Thanks for posting the doc! I'm not a big fan of having a separate file, even
temporarily, because log aggregation can already be a large portion of the
namenode's write load on large clusters. Having that separate file will
increase the namenode write load significantly (approximately 2x per log
aggregation cycle if I understand it correctly).
Note that the separate index file doesn't solve all the race conditions for the
reader. For example, this sequence:
# Reader checks for an index file which is not there
# Writer begins append and creates index file and starts appending
# Reader seeks to the end of the log file but does _not_ find the metainfo
structure because the writer is in the process of appending more data
This could be mitigated by having the reader restart the read process from the
beginning so it can rediscover the index file, but that requires the reader to
recognize that it is _not_ looking at a proper metainfo block on that first
attempt. The document covers neither this rinse-repeat cycle on the reader's
part nor how a reader can reliably identify the case where it is not looking
at a proper metainfo block because it happened to read just as an append
operation was in progress.
I'm wondering if we can eliminate the need for the index file, and thus reduce
the write load on the namenode, by having the reader be able to discover the
metainfo block even during an append operation. Similar to sync markers in
SequenceFile, we could create a unique, UUID-like sync marker that is written
out before every metainfo block. The reader would attempt to find the metainfo
block normally (i.e., seek to the last 64 bits of the file, read the 64-bit
offset, then seek back that far to check for a metainfo block). If it finds it
then great, the reader is ready to read whatever it is looking for. If it does
not find a proper metainfo block then it can start scanning backwards through
the file looking for a metainfo sync marker. This scan could be accomplished
in a number of ways, such as scanning backwards sequentially one fixed-size
block at a time, or seeking much farther back and scanning that larger region
forward in fixed-size chunks, repeating if the marker is not found.
Isn't this a lot slower for the reader when it has to scan for the marker?
Yep, it sure is. However, I would argue this is probably a rare occurrence in
practice for two reasons:
# Logs are often written and never read
# Appending is a relatively rare and short-lived operation during the lifespan
of a log file
By having the writer create the index file, we're essentially optimizing for
this rare read-during-append case at the expense of making every write more
expensive. The sync marker approach instead optimizes for the much more common
write path, putting the load on the reader side only if it happens to
encounter a log file mid-append during a read operation. Since that should be
relatively rare, I'd rather optimize for the more common case.
Another alternative to the index file is using xattrs to associate the last
good metainfo offset with the file. However that still leads to approximately
the same namenode write ops as the separate index file and requires special
support on the underlying filesystem. I'm not a fan of using xattrs myself,
but I thought I'd mention it in the interest of covering the potential
solutions.
> New aggregated log file format for YARN log aggregation.
> --------------------------------------------------------
>
> Key: YARN-6875
> URL: https://issues.apache.org/jira/browse/YARN-6875
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-6875-NewLogAggregationFormat-design-doc.pdf
>
>
> T-file is the underlying log format for the aggregated logs in YARN. We have
> seen several performance issues, especially for very large log files.
> We will introduce a new log format which has better performance for large
> log files.