Alexey Serbin has posted comments on this change.
Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode
Patch Set 25:
Thank you for the review. I'm planning to post an updated patch soon.
PS25, Line 182: // Must be public to use as a thread closure.
> They can be static functions, though, right? I don't see them accessing any
Good observation: yes, now they can be static methods of this class.
Previously they could not.
But I don't want to make them just static functions -- they need access to private
members of KuduSession, and I don't like the idea of declaring many friend
functions for the KuduSession class.
PS25, Line 187: Microseconds
> How did you choose 10 us? It seems low to me (i.e. maybe we can get away wi
The idea was to monitor those values as closely as possible. Without resorting
to invasive changes to KuduSession::Data, or a separate class inheriting from
KuduSession::Data to monitor the maximum, I had nothing better to offer.
Why do you think 10 us is too low?
PS25, Line 199:
> Nit: extra space here.
PS25, Line 553: oprations
> Nit: operations
PS25, Line 556: WaitForAutoFlushBackground
> Hmm, when this returns, there's no actual guarantee that a batch flushed, r
Exactly, that's the idea. When a guaranteed flush is needed, it's necessary to
use the KuduSession::Flush() call.
PS25, Line 558: MAX_WAIT_MS
Line 1207: << "unexpected status: " << result.ToString();
> There's a better technique for handling situations like these:
OK, thanks. Will use.
PS25, Line 2153: BUFFER_SIZE
> kBufferSizeBytes (camel case, k prefix, and append the unit)
PS25, Line 2186: // Applying a big operation which does not fit into the buffer
: // return an error with session running in any supported flush
> Doesn't this test make the second half of TestApplyTooMuchWithoutFlushing r
Nope. This is different: this test verifies that an attempt to apply a
_single_ operation which does not fit into the buffer produces an error right
away, with no data buffered prior to that.
This test exercises a different code path (not the same path as the second half
of the TestApplyTooMuchWithoutFlushing test).
PS25, Line 2197: for (auto mode: FLUSH_MODES)
> Nit: auto mode : FLUSH_MODES
PS25, Line 2215: STLDeleteElements(&errors);
> Nit: use an ElementDeleter and declare it right after declaring 'errors'. T
PS25, Line 2258: WaitForAutoFlushBackground(session, FLUSH_INTERVAL_MS);
: // There should be some pending ops: the automatic
: // should not be active after switch to the MANUAL_FLUSH
> I see. How expensive is this in terms of test runtime? You've dropped the f
About 500 ms. Is that too much?
Line 2304: *result = t_end - t_begin;
> Have you looked at the StopWatch class? May be easier for this kind of thin
Will do, thanks!
PS25, Line 2335: // Check that call to Flush()/FlushAsync() is no-op if there
: // batcher for the session.
> Doesn't this test make TestEmptyBatch redundant?
Good observation -- I'll remove that.
PS25, Line 2348: ASSERT_OK(sync0.Wait());
: // The same with the synchronous flush.
: ASSERT_EQ(0, session->CountPendingErrors());
> How does this verify that flush was a no-op? Seems to me you need to find s
This is to test that no error happens when trying to call flush while the
current batcher is not there. The comment for the test is misleading -- will fix.
PS25, Line 2391: kudu::Thread::Create
> Now that we're building with C++11, we prefer to use std::thread for test t
OK, that sounds good. I thought I had to use KuduThread; otherwise I would have
used std::thread from the beginning.
This is just another instance where directly asking about my doubts would have
helped. It seems I need to communicate more on issues like this.
Line 2428: // AUTO_FLUSH_BACKGROUND should be faster than AUTO_FLUSH_SYNC.
> Hmm, is this expected to be true 100% of all test runs? Even with debug ins
Due to the nature of the test and the chosen parameters, the difference is an
order of magnitude. Therefore, no flakiness is expected. Besides, with
debug/TSAN/ASAN builds both versions become slower, so they are expected to
keep the same speed ratio.
Yes, I did that check locally: I ran it in a shell 'while true; do ... || break;
done' loop. It looked stable (more than 4096 iterations in release mode
and 1024 in debug). Will run it with dist-test.py as well.
Line 2429: EXPECT_GT(t_diff_afs.ToMilliseconds(),
> Can't we just EXPECT_GT on the MonoDeltas directly now?
Good observation! Yes, now we can compare them directly.
PS25, Line 2451: // AUTO_FLUSH_BACKGROUND should be faster than
> Same issues and questions as in the above test.
Yep, this test hasn't shown flakiness so far.
Yes, it's possible to compare MonoDeltas directly now -- will update.
PS25, Line 2456: // applying a bunch of small rows without a flush should not
: // an error, even with low limit on the buffer space. This is
: // Session::Apply() blocks and waits for buffer space to
> For this test, you may want to scan once you're done writing to make sure t
It's a good idea! Actually, it would be nice to do that for every test related
to batching while performing inserts via Apply() :) But in this context, this
test looks like the best place for that, since if we expect an error like that,
it's most likely to trigger here.
OK, I'll add that check.
Line 2488: // applying a bunch of rows every of which is so big in size that
> Nit: every one of which
PS25, Line 2489: does
> Nit: do
PS25, Line 2510: 3
> Why does i start at 3 and not 0?
Starting from i == 3: this is the lowest i for which
((i - 1) * BUFFER_SIZE / i) is greater than BUFFER_SIZE / 2. We want a pair of
operations that together don't fit into the buffer, but each one fits on its
own.
I will add a comment.
PS25, Line 2574: ASSERT_TRUE(session->HasPendingOperations());
> This is the first test where I've noticed this but others may be affected t
Yes, I also have this concern. The principled solution would be
delaying/stopping the Messenger's task queue for a while, or throttling the
target tablet server to 0 operations per second, checking for the expected
state, and then restoring the original state of affairs. However, I did not
find a way to do that. If you know how, please let me know. Probably that
would be a good to-do project that would help make many testcases more robust.
I ran multiple iterations of this test on my laptop and on the test machine: no
flakiness observed. Let's address this in a separate changelist.
PS25, Line 2612: const MonoTime t_begin(MonoTime::Now());
: const MonoTime t_end(MonoTime::Now());
> Consider StopWatch for this.
OK, will do. Thanks!
PS25, Line 2627: // It's supposed that sending an operation to the server
: // order of magnitude less than the periodic flush
> Nit: "Sending an operation to the server is supposed to be at least an orde
Actually, after some consideration I understood we can get rid of this check.
PS25, Line 2630: EXPECT_GT(FLUSH_INTERVAL_MS / 10, t_diff.ToMilliseconds());
> Again, worried about flakiness due to testing timing like this.
Yes, I share that concern. I ran the tests in a cycle many times (1000 times at
least, that's for sure). This has shown no flakiness so far.
I addressed the concern by increasing the flush timeout and using Flush()
while finalizing the test.
PS25, Line 2634: It's supposed the flushed operations
: // reach the tablet server faster than the max-wait flush
> Nit: "The flushed operations should reach the tablet server faster than the
After some consideration I found it's better to call Flush() here: the only
thing we want to ensure in this test is that the second Apply() does not take
too long. The functionality of the background flusher is checked in the other
tests.
PS25, Line 2636: WaitForAutoFlushBackground(session, FLUSH_INTERVAL_MS);
> This will be at least a 10s pause, right? If so, can you condition the enti
Actually, it should have been session->Flush() call here. Fixed.
PS25, Line 2645: accomodate
> Nit: accommodate
PS25, Line 2653: gurantees
> Nit: guarantees
PS25, Line 2663: TestAutoFlushBackgroundApplyBlocks
> Lots of opportunities for flakiness here due to timing measurements, please
The comparable measurements are set to differ by an order of magnitude. I
ran this test many times with TSAN; no issues appeared.
Having a provision to block processing of the Messenger's queue would give a
bullet-proof guarantee of this test's stability. Let's do that in a separate
changelist.
PS25, Line 2714: It's supposed the flushed operations
: // reach the tablet server faster than wait_timeout_ms.
> Nit: reword as per the above.
Actually, it should have been a session->Flush() call instead of
WaitForAutoFlushBackground().
PS25, Line 1351: successfull
PS25, Line 1352: left
> be left
PS25, Line 106: PRCs
PS25, Line 150: PRCs
PS25, Line 174: PRCs
PS25, Line 208: PRCs
PS25, Line 102: NoLock
> Kudu convention is to use an "Unlocked" suffix for methods like these.
PS25, Line 144: This method is used by tests only.
> Kudu convention is to add the suffix "ForTests" for test-only methods like
PS25, Line 151: is here to represent
> Nit: just "represents"
PS25, Line 152: // the first argument in expressions like
: // this is the watermark corresponding to 1 byte of data.
> Yes, but what is the effect? What is special about choosing a value of 1?
Oops, it seems I missed explaining the essence.
The effect is that using this watermark flushes the batcher only if it's
non-empty. This watermark level is the minimum possible for a non-empty
batcher, so any non-empty batcher will be flushed when calling
FlushCurrentBatcher() with this watermark.
I'll add the comment.
PS25, Line 181: // Note that this mutex should not be acquired if the thread
: // holding a Batcher's lock. This must be acquired first.
> How is such sequencing possible? Isn't the batcher's lock internal to the b
That came from the prior lock_ description -- I just moved it here. Frankly,
after reading this I was also puzzled.
As I understand it, this is outdated; the comment was already stale in the
original file. As I see from the git history, it came here from client.h during
the 'pimpl'-ization of the KuduSession class. It might date from the old times
when there was a separate batcher lock or the Batcher did not have its own
locks.
To view, visit http://gerrit.cloudera.org:8080/3952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: Dinesh Bhat <din...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>