Mike Percy created KUDU-2693:
--------------------------------

             Summary: Buffer DiskRowSet flushes to more efficiently write many columns
                 Key: KUDU-2693
                 URL: https://issues.apache.org/jira/browse/KUDU-2693
             Project: Kudu
          Issue Type: Improvement
          Components: fs, tablet
    Affects Versions: 1.9.0
            Reporter: Mike Percy


A trace of some MRS (MemRowSet) flushes on a table with 280 columns showed 
that some 695 fdatasync() calls occurred during the course of the flush.

One possible way to reduce the number of fdatasync() calls would be to flush 
to memory buffers first and then determine the ideal on-disk layout for the 
flushed blocks (possibly striped across one log block container per data 
disk). The data could then be written out to the containers, potentially in 
parallel. This would require reserving some memory buffer space per 
maintenance manager thread, possibly 64MB since the DRS roll size is 32MB.
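
To make the striping idea concrete, here is a minimal sketch of the layout 
step only. It is not Kudu code: BufferedBlock, FlushPlan and 
PlanStripedFlush are hypothetical names invented for illustration, and the 
greedy "least bytes so far" placement is just one assumed policy. The point 
it shows is that once blocks are buffered, the number of syncs can be tied 
to the number of containers touched rather than to the number of blocks.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one flushed column block held in memory.
struct BufferedBlock {
  std::string data;
};

// Hypothetical layout plan: which buffered blocks go to which container,
// assuming one log block container per data disk.
struct FlushPlan {
  // For each container, the indexes of the buffered blocks assigned to it.
  std::vector<std::vector<size_t>> blocks_per_container;
  // For each container, the total number of bytes assigned to it.
  std::vector<size_t> bytes_per_container;

  // Number of syncs needed if each container that received data is synced
  // exactly once after all of its blocks have been written.
  size_t NumSyncs() const {
    size_t n = 0;
    for (const auto& assigned : blocks_per_container) {
      if (!assigned.empty()) ++n;
    }
    return n;
  }
};

// Assigns each block to the container holding the fewest bytes so far
// (ties go to the lowest container index). This is an assumed policy.
FlushPlan PlanStripedFlush(const std::vector<BufferedBlock>& blocks,
                           size_t num_containers) {
  FlushPlan plan;
  plan.blocks_per_container.resize(num_containers);
  plan.bytes_per_container.assign(num_containers, 0);
  if (num_containers == 0) return plan;

  for (size_t i = 0; i < blocks.size(); ++i) {
    size_t target = 0;
    for (size_t c = 1; c < num_containers; ++c) {
      if (plan.bytes_per_container[c] < plan.bytes_per_container[target]) {
        target = c;
      }
    }
    plan.blocks_per_container[target].push_back(i);
    plan.bytes_per_container[target] += blocks[i].data.size();
  }
  return plan;
}
```

Under these assumptions, 280 equally sized blocks planned across 4 
containers end up as 70 blocks per container with 4 syncs in total. The 
actual writes, and whether they run in parallel, are left out of the sketch.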

According to Todd, we could probably do all of this in LogBlockManager by 
adding a new flag to CreateBlockOptions that says whether to buffer, or 
something along those lines.
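
As a rough illustration of what such a flag might look like, the sketch 
below uses a simplified stand-in for CreateBlockOptions, not Kudu's actual 
definition. The buffer_in_memory field, the WritePath enum and 
ChooseWritePath are all hypothetical names; nothing like them is confirmed 
to exist in Kudu.

```cpp
#include <string>

// Simplified stand-in for the options struct; not Kudu's real definition.
struct CreateBlockOptions {
  std::string tablet_id;
  // Hypothetical flag: when true, the block manager would buffer the block
  // in memory instead of writing it straight to a container.
  bool buffer_in_memory = false;
};

// Hypothetical description of the two write paths.
enum class WritePath { kDirectToContainer, kMemoryBuffer };

// Picks the write path for a new block based on the hypothetical flag.
WritePath ChooseWritePath(const CreateBlockOptions& opts) {
  return opts.buffer_in_memory ? WritePath::kMemoryBuffer
                               : WritePath::kDirectToContainer;
}
```

Defaulting the flag to false would leave existing callers on the current 
direct-write path, so only the flush code would need to opt in.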



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
