[
https://issues.apache.org/jira/browse/YARN-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107375#comment-16107375
]
Jason Lowe commented on YARN-6875:
----------------------------------
To be clear, I'm not a fan of the current approach for partial aggregations
that generates a separate file per pass. I think we're all in agreement that
partial aggregations should not result in multiple files after the operation
completes. I'm just proposing a way to avoid any additional files, even
transient, during partial aggregation. We already need some kind of marker for
the metainfo block so the reader can know with certainty it has found a proper
metainfo block, otherwise the race condition I pointed out above will result in
undefined behavior for the reader. I'm proposing we leverage this marker so we
can avoid the need for a transient index file.
bq. However, if we don't write the (temp) index file, and the approach listed
in Jason's comment will make read become very slow since it need to repeatedly
find where's the last successful write. And the worst part is, we only need to
read logs when app fails or slow, it will be likely that we will read such app
logs for a couple of times. I don't think it will be a good user experience to
do this every-time.
Quite a few important points to note here:
# The read scan won't be as slow as it is today. Today it has to decompress
each block in order to locate the next block. The scan for the metainfo marker
would not require any decompression, just a straight read.
# The read scan today must start from the beginning of the file, so it has to
read (and decompress!) the worst-case amount of data to find logs at the end.
For the metainfo scan we only need to scan from the end of the file to the
first metainfo block we find. That means, worst-case, we're only going to read
(without decompressing) the amount of data for the last append operation
currently in progress to locate any log in the file.
# The read scan only needs to occur when we are trying to read during an append
operation. This will only be a repeating process if the append operation is
still ongoing when we try to do subsequent reads.
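To make the scan described in the points above concrete, here is a rough sketch of a backward search for the last complete metainfo block. Everything specific in it is an assumption for illustration only: the marker bytes, the layout (marker, then a 4-byte length, then the metainfo payload), and the names {{MARKER}}, {{read_block}} and {{find_last_metainfo}} are hypothetical and not part of any actual or proposed YARN format. A real format would also want something stronger than a bare marker (a checksum, for example) so that marker bytes appearing by chance inside compressed data are not mistaken for a metainfo block.

```python
import io
import struct

# Hypothetical marker and layout, NOT the actual YARN format:
#   [log data ...][MARKER][4-byte big-endian length][metainfo payload]
MARKER = b"\x00YARNMETA\x00"
CHUNK = 64 * 1024


def read_block(f, offset, file_size):
    """Return the metainfo payload at offset, or None if it is incomplete."""
    header_end = offset + len(MARKER) + 4
    if header_end > file_size:
        return None  # the length field itself was not fully written
    f.seek(offset + len(MARKER))
    (length,) = struct.unpack(">I", f.read(4))
    if header_end + length > file_size:
        return None  # payload still being written by an in-progress append
    return f.read(length)


def find_last_metainfo(f, chunk_size=CHUNK):
    """Scan backward from the end of the file, without decompressing anything.

    Returns (offset, payload) for the last complete metainfo block, or None.
    """
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    pos = file_size
    tail = b""  # carried-over bytes so a marker split across chunks is found
    while pos > 0:
        start = max(0, pos - chunk_size)
        f.seek(start)
        buf = f.read(pos - start) + tail
        idx = buf.rfind(MARKER)
        while idx != -1:
            payload = read_block(f, start + idx, file_size)
            if payload is not None:
                return start + idx, payload
            # Incomplete block: keep looking for an earlier marker in this chunk.
            idx = buf.rfind(MARKER, 0, idx + len(MARKER) - 1)
        tail = buf[:len(MARKER) - 1]
        pos = start
    return None
```

In this sketch the scan stops at the first complete block it meets while walking backward, so the bytes read are bounded by the size of whatever append is currently in progress. If no append is in progress, the block is found in the first chunk.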
I would argue this scan is going to be much faster than you are assuming, and
we only need to perform it when there is an ongoing append. What is the
anticipated duty cycle of append operations? How likely will the repeated read
scan scenario occur in practice, and to a point where the scan is not fast
enough?
bq. what's the percentage of apps running in your cluster which enabled partial
log aggregation?
We currently do not have any partial aggregations enabled in our clusters. The
number of additional files it creates today is one of the obstacles to
enabling it, but as we see longer and longer running apps on our clusters we
will eventually need a partial aggregation solution. Hopefully we're in
agreement that no transient index file should be created during a normal log
aggregation, and we're only debating what to do for partial aggregations.
> New aggregated log file format for YARN log aggregation.
> --------------------------------------------------------
>
> Key: YARN-6875
> URL: https://issues.apache.org/jira/browse/YARN-6875
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-6875-NewLogAggregationFormat-design-doc.pdf
>
>
> T-file is the underlying log format for the aggregated logs in YARN. We have
> seen several performance issues, especially for very large log files.
> We will introduce a new log format that has better performance for large
> log files.