[ 
https://issues.apache.org/jira/browse/YARN-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103346#comment-16103346
 ] 

Jason Lowe commented on YARN-6875:
----------------------------------

Thanks for posting the doc!  I'm not a big fan of having a separate index file, 
even temporarily, because log aggregation can already be a large portion of the 
namenode's write load on large clusters.  Having that separate file will 
increase the namenode write load significantly (approximately 2x per log 
aggregation cycle, if I understand it correctly).

Note that the separate index file doesn't solve all the race conditions for the 
reader.  For example, consider this sequence:
# Reader checks for an index file, which is not there
# Writer creates the index file and starts appending
# Reader seeks to the end of the log file but does _not_ find the metainfo 
structure because the writer is in the middle of appending more data

This could be mitigated by having the reader restart the read process from 
the beginning so it can rediscover the index file, but this requires that the 
reader be capable of recognizing that it is _not_ looking at a proper metainfo 
block on that first attempt.  The document does not cover this necessary 
rinse-repeat cycle on the reader's part, nor how a reader can reliably identify 
that it is not looking at a proper metainfo block because it happened to read 
just as an append operation was occurring.
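One way a reader could recognize that it is _not_ looking at a proper metainfo block, and then rinse and repeat, is sketched below. This is a minimal Java sketch under assumptions that are not in the design doc: the file is modeled as an in-memory byte[], each metainfo block is followed by a hypothetical trailer holding its length and a CRC32 checksum, and the names MetaInfoRetryReader, appendMetaInfo, tryReadMetaInfo and readWithRetry are illustrative only. The reopen Supplier stands in for restarting from the beginning (re-checking for the index file and re-opening the log file), and the snapshots in main are simulated.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;
import java.util.zip.CRC32;

class MetaInfoRetryReader {
    // Hypothetical tail layout (not the design doc's): [metainfo][int length][long CRC32 of metainfo]
    static final int TRAILER = Integer.BYTES + Long.BYTES;

    /** Writer side: append a metainfo block followed by its self-describing trailer. */
    static byte[] appendMetaInfo(byte[] file, byte[] meta) {
        CRC32 crc = new CRC32();
        crc.update(meta);
        return ByteBuffer.allocate(file.length + meta.length + TRAILER)
                .put(file).put(meta).putInt(meta.length).putLong(crc.getValue())
                .array();
    }

    /** Returns the metainfo only if the file's tail is a complete, self-consistent block. */
    static Optional<byte[]> tryReadMetaInfo(byte[] file) {
        if (file.length < TRAILER) return Optional.empty();
        ByteBuffer tail = ByteBuffer.wrap(file, file.length - TRAILER, TRAILER);
        int len = tail.getInt();
        long expectedCrc = tail.getLong();
        // A length that cannot fit in the file means we are not looking at a real trailer.
        if (len < 0 || len > file.length - TRAILER) return Optional.empty();
        int start = file.length - TRAILER - len;
        byte[] meta = Arrays.copyOfRange(file, start, start + len);
        CRC32 crc = new CRC32();
        crc.update(meta);
        // Checksum mismatch: most likely we read while an append was in flight.
        return crc.getValue() == expectedCrc ? Optional.of(meta) : Optional.empty();
    }

    /** Rinse-repeat: restart the whole read from the beginning until a valid block is seen. */
    static Optional<byte[]> readWithRetry(Supplier<byte[]> reopen, int maxAttempts, long backoffMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // reopen stands in for re-checking the index file and re-opening the log file.
            Optional<byte[]> meta = tryReadMetaInfo(reopen.get());
            if (meta.isPresent()) return meta;
            if (attempt + 1 < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis); // give the writer time to finish its append
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] meta = "meta-v1".getBytes(StandardCharsets.UTF_8);
        byte[] good = appendMetaInfo("log data".getBytes(StandardCharsets.UTF_8), meta);
        // Simulated mid-append snapshot: 0xFF tail bytes make the length field negative.
        byte[] torn = Arrays.copyOf(good, good.length + TRAILER);
        Arrays.fill(torn, good.length, torn.length, (byte) 0xFF);
        Iterator<byte[]> snapshots = List.of(torn, good).iterator(); // first attempt races the writer
        Optional<byte[]> result = readWithRetry(snapshots::next, 3, 10);
        System.out.println(result.map(m -> new String(m, StandardCharsets.UTF_8)).orElse("gave up"));
    }
}
```

The checksum only makes a torn tail detectable with high probability, not with certainty, and that detection step is exactly what the retry loop depends on.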

I'm wondering if we can eliminate the need for the index file, and thus reduce 
the write load on the namenode, by having the reader be able to discover the 
metainfo block even during an append operation.  Similar to sync markers in 
SequenceFile, we could create a unique, UUID-like sync marker that is written 
out before every metainfo block.  The reader would attempt to find the metainfo 
block normally (i.e.: seek to the last 64 bits of the file, read the 64-bit 
offset, then seek back that far to check for a metainfo block).  If it finds it, 
then great: the reader is ready to read whatever it is looking for.  If it does 
not find a proper metainfo block, then it can start scanning backwards through 
the file looking for a metainfo sync marker.  This scan could be done in a 
number of ways, such as reading backwards one fixed-size block at a time, or 
seeking much farther back to a larger chunk, scanning that chunk forward in 
fixed-size blocks, and repeating with the preceding chunk if the marker is not 
found.
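The sync-marker approach could look roughly like the minimal Java sketch below. It assumes details that are not in the design doc: the file is an in-memory byte[]; each metainfo block is laid out as [16-byte sync marker][4-byte length][metainfo][8-byte distance from the marker back to the start of this trailer]; and the marker bytes, the 4096-byte scan block size, and the names SyncMarkerLog, append, blockAt, readFast, scanBackward and read are all hypothetical. It shows the normal path (read the last 64 bits, seek back that far, verify the marker) and the fallback of scanning backwards one fixed-size block at a time.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Optional;

class SyncMarkerLog {
    // Hypothetical 16-byte, UUID-like sync marker written before every metainfo block.
    static final byte[] SYNC = {
        (byte) 0x8f, 0x3a, 0x51, (byte) 0xc2, 0x07, (byte) 0xee, 0x4d, 0x19,
        (byte) 0xb6, 0x70, 0x2c, (byte) 0x9d, 0x13, (byte) 0xa8, 0x5e, (byte) 0xf4};
    static final int SCAN_BLOCK = 4096; // fixed-size block for the backward scan (assumed)

    /** Writer: append data, then [SYNC][int len][metainfo][long distance from SYNC to this trailer]. */
    static byte[] append(byte[] file, byte[] data, byte[] meta) {
        int markerPos = file.length + data.length;
        int trailerPos = markerPos + SYNC.length + Integer.BYTES + meta.length;
        return ByteBuffer.allocate(trailerPos + Long.BYTES)
                .put(file).put(data).put(SYNC).putInt(meta.length).put(meta)
                .putLong(trailerPos - markerPos).array();
    }

    /** Returns the metainfo if a complete block (marker, length, metainfo, matching trailer) starts at pos. */
    static Optional<byte[]> blockAt(byte[] file, long pos) {
        int header = SYNC.length + Integer.BYTES;
        if (pos < 0 || pos > file.length - header - Long.BYTES) return Optional.empty();
        int p = (int) pos;
        if (!Arrays.equals(file, p, p + SYNC.length, SYNC, 0, SYNC.length)) return Optional.empty();
        int len = ByteBuffer.wrap(file, p + SYNC.length, Integer.BYTES).getInt();
        long trailerPos = (long) p + header + len;
        if (len < 0 || trailerPos + Long.BYTES > file.length) return Optional.empty(); // torn block
        long back = ByteBuffer.wrap(file, (int) trailerPos, Long.BYTES).getLong();
        if (back != trailerPos - p) return Optional.empty(); // trailer must point back at this marker
        return Optional.of(Arrays.copyOfRange(file, p + header, p + header + len));
    }

    /** Normal path: read the last 64 bits, seek back that far, and verify the block there. */
    static Optional<byte[]> readFast(byte[] file) {
        if (file.length < Long.BYTES) return Optional.empty();
        long back = ByteBuffer.wrap(file, file.length - Long.BYTES, Long.BYTES).getLong();
        return blockAt(file, file.length - Long.BYTES - back);
    }

    /** Fallback: scan backwards in fixed-size blocks for the last marker that starts a complete block. */
    static Optional<byte[]> scanBackward(byte[] file) {
        for (int end = file.length; end > 0; end -= SCAN_BLOCK) {
            int start = Math.max(0, end - SCAN_BLOCK);
            // Read SYNC.length - 1 extra bytes so a marker straddling the block boundary is still seen.
            byte[] chunk = Arrays.copyOfRange(file, start, Math.min(file.length, end + SYNC.length - 1));
            for (int i = end - start - 1; i >= 0; i--) {
                if (i + SYNC.length <= chunk.length
                        && Arrays.equals(chunk, i, i + SYNC.length, SYNC, 0, SYNC.length)) {
                    Optional<byte[]> meta = blockAt(file, start + i);
                    if (meta.isPresent()) return meta; // otherwise a half-written block: keep scanning
                }
            }
        }
        return Optional.empty();
    }

    static Optional<byte[]> read(byte[] file) {
        Optional<byte[]> meta = readFast(file);
        return meta.isPresent() ? meta : scanBackward(file);
    }

    public static void main(String[] args) {
        byte[] v1 = "meta-v1".getBytes(StandardCharsets.UTF_8);
        byte[] file = append(new byte[0], "container logs".getBytes(StandardCharsets.UTF_8), v1);
        byte[] done = append(file, new byte[10_000], "meta-v2".getBytes(StandardCharsets.UTF_8));
        // A reader arriving mid-append sees the new marker but only 2 bytes of the new length field.
        byte[] torn = Arrays.copyOf(done, file.length + 10_000 + SYNC.length + 2);
        System.out.println(new String(read(torn).get(), StandardCharsets.UTF_8)); // prints meta-v1
        System.out.println(new String(read(done).get(), StandardCharsets.UTF_8)); // prints meta-v2
    }
}
```

Because blockAt also checks the trailer that follows the metainfo, a marker belonging to a half-written block is skipped and the reader falls back to the last complete block. Reading SYNC.length - 1 extra bytes per block keeps a marker that straddles a block boundary from being missed.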

Isn't this a lot slower for the reader when it has to scan for the marker?  
Yep, it sure is.  However, I would argue this is probably a rare occurrence in 
practice for two reasons:
# Logs are often written and never read
# Appending is a relatively rare and short-lived operation during the lifespan 
of a log file

By having the writer create the index file, we're essentially optimizing for 
this rare read-during-append case at the expense of making every writer more 
expensive.  Instead the sync marker approach optimizes for the much more common 
writing case, putting the load on the reader side if it happens to encounter a 
log file mid-append during a read operation.  I would argue that should be a 
relatively rare occurrence, and thus I'd rather optimize for the more common 
case.

Another alternative to the index file is using xattrs to associate the last 
good metainfo offset with the file.  However that still leads to approximately 
the same namenode write ops as the separate index file and requires special 
support on the underlying filesystem.  I'm not a fan of using xattrs myself, 
but I thought I'd mention it in the interest of covering the potential 
solutions.


> New aggregated log file format for YARN log aggregation.
> --------------------------------------------------------
>
>                 Key: YARN-6875
>                 URL: https://issues.apache.org/jira/browse/YARN-6875
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-6875-NewLogAggregationFormat-design-doc.pdf
>
>
> TFile is the underlying log format for the aggregated logs in YARN. We have 
> seen several performance issues, especially for very large log files.
> We will introduce a new log format which has better performance for large 
> log files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
