Adar Dembo has posted comments on this change. ( )

Change subject: KUDU-2693. Buffer blocks while flushing rowsets

Patch Set 1:

Commit Message:
PS1, Line 9: This adds the ability to the LogBlockManager to write a block that 
           : fully buffered in memory before eventually being flushed.
Worth noting that if this behavior becomes the default, the LogWritableBlock 
finalize vs. close code could be drastically simplified (in effect, there'd be 
no difference between the two because an in-memory writable block does not 
occupy a container).
PS1, Line 13: This means that, instead of each flush
            : requiring its own LBM container, many of the output blocks can be
            : consolidated into a small number of new containers (typically one 
            : disk).
While certainly an improvement over the random block-to-container assignment we 
have going on now, I don't know how this will work when considering the longer 
term vision of "put all blocks belonging to one column in one container" (i.e. 
to reduce the number of seeks needed to scan a particular column's data).

I know you've voiced skepticism about this in the past; could you remind me 
what your concerns were? It's also interesting to consider this in light of 
what you write in the next paragraph, as storing DRS-colocated data in one 
container is decidedly "row-oriented" storage.
PS1, Line 24: --diskrowset-writer-buffer-data
Nit: although valid, we still prefer "--foo_bar_baz" as the canonical gflag 
name to make searching easier, and that ought to apply to searching commit 
history too.
PS1, Line 27: up to 2x overhead from buffer size-doubling
Not following this bit; there's just one buffer per writable block, so where's 
the doubling effect come from?
PS1, Line 42:    "data dirs.queue_time_us" : 1847,
            :    "data dirs.run_cpu_time_us" : 409,
            :    "data dirs.run_wall_time_us" : 21185,
Were these edited out of "with the change"? If so, could drop them from here 

I'm also curious as to what this actually was. FlushMRSOp doesn't delete 
blocks, so there's no hole punching. I can't see how else the flush path might 
do some work on the data dir threadpools.
PS1, Line 66:    "lbm_writes_lt_1ms" : 1419
This is almost an order of magnitude less than before the change. Any thoughts 
on why?
PS1, Line 73: On a spinning disk the reduction from 450
            : fsyncs to 5 fsyncs is likely to be even more impactful in terms 
of wall
            : clock.
Yeah I'd love to see some numbers on a spinning disk.
File src/kudu/fs/
PS1, Line 517:   // TODO(todd) add tests for aborting buffered blocks and 
making sure the metrics
             :   // are appropriate
Looks like you did this below.
File src/kudu/fs/
PS1, Line 20: #include <errno.h>
Let's use cerrno instead.
PS1, Line 1667:   RETURN_NOT_OK(manager_->GetOrCreateContainer(opts_, 
What does it mean for this to fail? Before this change, creating a container 
was tightly coupled to block creation; if the former failed, so did the latter. 
That's no longer the case. What happens if we run out of disk space trying to 
create a new container for a block with some data in its buffer?
File src/kudu/tablet/
PS1, Line 278: Status MajorDeltaCompaction::OpenRedoDeltaFileWriter() {
Why not extend buffering to the redo/undo blocks here? They're logically 
grouped into a transaction with the new base data, so why not buffer and 
colocate them too?

To view, visit
To unsubscribe, visit

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacc662ba812ece8e68b0ef28f4ccdf0b7475fbc0
Gerrit-Change-Number: 12425
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <>
Gerrit-Reviewer: Adar Dembo <>
Gerrit-Reviewer: Andrew Wong <>
Gerrit-Reviewer: Andrew Wong <>
Gerrit-Reviewer: Grant Henke <>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <>
Gerrit-Comment-Date: Wed, 13 Feb 2019 07:07:33 +0000
Gerrit-HasComments: Yes

Reply via email to