[ 
https://issues.apache.org/jira/browse/YARN-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107375#comment-16107375
 ] 

Jason Lowe commented on YARN-6875:
----------------------------------

To be clear, I'm not a fan of the current approach for partial aggregations 
that generates a separate file per pass.  I think we're all in agreement that 
partial aggregations should not result in multiple files after the operation 
completes.  I'm just proposing a way to avoid any additional files, even 
transient, during partial aggregation.  We already need some kind of marker for 
the metainfo block so the reader can know with certainty it has found a proper 
metainfo block, otherwise the race condition I pointed out above will result in 
undefined behavior for the reader.  I'm proposing we leverage this marker so we 
can avoid the need for a transient index file.
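
To make the marker idea concrete, here is a minimal sketch of one way a metainfo block could be framed so a reader can tell a complete block from a partial write or a coincidental byte match. Everything specific here is an assumption for illustration and not part of any proposed format: the marker bytes, the length and CRC32 header, and the function names are all hypothetical.

```python
import struct
import zlib

# Hypothetical marker bytes; a real format would define its own.
MARKER = b"\x00YARN-METAINFO\x00"


def frame_metainfo(payload: bytes) -> bytes:
    """Return marker + 4-byte length + 4-byte CRC32 + payload."""
    header = struct.pack(">II", len(payload), zlib.crc32(payload))
    return MARKER + header + payload


def parse_metainfo(buf: bytes, offset: int):
    """Return the payload if a complete, valid block starts at offset, else None."""
    start = offset + len(MARKER)
    if buf[offset:start] != MARKER or len(buf) < start + 8:
        return None
    length, crc = struct.unpack(">II", buf[start:start + 8])
    payload = buf[start + 8:start + 8 + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        return None  # truncated write or coincidental marker match
    return payload
```

With a check like this, a reader that lands on marker bytes in the middle of an in-progress append gets None instead of acting on a half-written block.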

bq. However, if we don't write the (temp) index file, and the approach listed 
in Jason's comment will make read become very slow since it need to repeatedly 
find where's the last successful write. And the worst part is, we only need to 
read logs when app fails or slow, it will be likely that we will read such app 
logs for a couple of times. I don't think it will be a good user experience to 
do this every-time.

Quite a few important points to note here:
# The read scan won't be as slow as it is today.  Today it has to decompress 
each block in order to locate the next block.  The scan for the metainfo marker 
would not require any decompression, just a straight read.
# The read scan today must start from the beginning of the file, so it has to 
read (and decompress!) the worst-case amount of data to find logs at the end.  
For the metainfo scan we only need to scan from the end of the file to the 
first metainfo block we find.  That means, worst-case, we're only going to read 
(without decompressing) the amount of data for the last append operation 
currently in progress to locate any log in the file.
# The read scan only needs to occur when we are trying to read during an append 
operation.  This will only be a repeating process if the append operation is 
still ongoing when we try to do subsequent reads.
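
The backward scan described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual reader: the marker bytes, the chunk size, and the name `find_last_marker` are hypothetical. It reads raw bytes in chunks from the end of the file, with no decompression, and stops at the first marker it meets.

```python
import io

# Hypothetical marker bytes; a real format would define its own.
MARKER = b"\x00YARN-METAINFO\x00"


def find_last_marker(f, marker=MARKER, chunk_size=64 * 1024):
    """Return the offset of the last occurrence of marker in f, or -1."""
    f.seek(0, io.SEEK_END)
    pos = f.tell()
    tail = b""  # leading bytes of the later region, kept to catch straddling markers
    while pos > 0:
        start = max(0, pos - chunk_size)
        f.seek(start)
        chunk = f.read(pos - start)
        window = chunk + tail
        idx = window.rfind(marker)
        if idx != -1:
            return start + idx
        # Keep just enough bytes to detect a marker split across chunk boundaries.
        tail = window[:len(marker) - 1]
        pos = start
    return -1
```

The amount of data read is bounded by the distance from the end of the file to the most recent marker, which corresponds to the size of the append currently in progress. A real reader would still need to validate the block found at that offset before trusting it.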

I would argue this scan is going to be much faster than you are assuming, and 
we only need to perform it when there is an ongoing append.  What is the 
anticipated duty cycle of append operations?  How likely will the repeated read 
scan scenario occur in practice, and to a point where the scan is not fast 
enough?

bq. what's the percentage of apps running in your cluster which enabled partial 
log aggregation?

We currently do not have any partial aggregations enabled in our clusters.  The 
number of additional files it creates today is one of the obstacles to 
enabling it, but as we see longer and longer running apps on our clusters we 
will eventually need a partial aggregation solution.  Hopefully we're in 
agreement that no transient index file should be created during a normal log 
aggregation, and we're only debating what to do for partial aggregations.

> New aggregated log file format for YARN log aggregation.
> --------------------------------------------------------
>
>                 Key: YARN-6875
>                 URL: https://issues.apache.org/jira/browse/YARN-6875
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-6875-NewLogAggregationFormat-design-doc.pdf
>
>
> T-file is the underlying log format for the aggregated logs in YARN. We have 
> seen several performance issues, especially for very large log files.
> We will introduce a new log format which have better performance for large 
> log files.



