[kudu-CR] async background flush provision for C++ client
Alexey Serbin has posted comments on this change. Change subject: async background flush provision for C++ client .. Patch Set 2: (59 comments) Thank you for review. I addressed most of the comments, and I'm working on testing the change more thoroughly by all means (more tests will follow) and also looking at the other option of implementing background flushing. Adar recommended to look at the option to utilize one of the reactor threads for that purpose. I'll post an update on this change, but more changes are expected to appear soon. http://gerrit.cloudera.org:8080/#/c/3668/2//COMMIT_MSG Commit Message: PS2, Line 7: provision > nit: remove provision Done Line 7: async background flush provision for C++ client > nit: capitalize (Async) Done PS2, Line 10: is > nit: avoid using the passive voice. i.e. "KuduSession starts a background f Done PS2, Line 13: virtual > what's a virtual buffer? There isn't a single buffer as is, there is a set of buffers used by multiple batchers. Will add clarification comment. Line 16: The flush criteria is based on buffer size for not-yet-scheduled > I'm having trouble parsing this sentence. Done PS2, Line 17: operatations > nit: type Done PS2, Line 20: operatations > nit: typo Done PS2, Line 23: KUDU > Mention this issue in the commit message title. Done http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: Line 267: #ifndef NDEBUG > spurious change It's not spurious, it fixes compilation warning of unused variables for debug builds. Line 383: > spurious change Done Line 553: //since the other time it's called in KuduSession::Apply() > nit: rephrase. How about: Done http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/batcher.h File src/kudu/client/batcher.h: Line 109: // Return number of bytes used for batcher buffer > nit: "Returns the" I see most of comments in this file does not use third person singular form. The only exception is external_consistency_mode(). Are you sure it should be 'Returns the number', but not 'Return the number'? PS2, Line 216: // > no need for this comment Done http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: Line 1991: // return an error with session running in any supported flush mode > nit: period Done PS2, Line 1993: static > not need for static. Done PS2, Line 1993: buffer_space_bytes_limit > style: kBufferSpaceBytesLimit or BUFFER_SPACE_BYTES_LIMIT Done PS2, Line 1994: static > same Done Line 1995: static const KuduSession::FlushMode modes[] = { > same Done Line 2001: for (const auto mode: modes) { > isn't 'auto' already const ? Done Line 2007: ASSERT_TRUE(s.IsIncomplete()) << "got unexpected status: " << s.ToString(); > nit: capitalize Got Done Line 2014: // Applying a bunch of small rows without a flush should not result in > this sentence is huge and hard to read. please break it down. Done Line 2016: // since there is a flow control which makes Session::Apply() block and wait > what: "a flow control"? Done Line 2018: { > no need for the extra { scope Done Line 2023: ASSERT_OK(ApplyInsertToSession(session.get(), client_table_, 1, 1, "x")); > How this testing that Apply() is eventually blocking, or making sure that t This test is to verify that there isn't an error about overrunning the buffer. Yes, it's not an explicit test to verify that Apply() is blocking. Will address that separately. Line 2029: // operations are put into the queue flushed after some time. > you mean : "into the queue and flushed" right? Done Line 2030: // The exact timeout interval is implementation-dependent, > which timeout interval? Done Line 2031: // but 100 ms is a good upper limit anyways. > s/anyways/anyway Done Line 2033: { > no need for the extra { Done Line 2041: SleepFor(MonoDelta::FromMilliseconds(100)); > loop instead of wait, otherwise this will be brittle in jenkins Done http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/client.cc File src/kudu/client/client.cc: Line 705: "Cannot change flush mode when writes are buffered"); > nit period Done Line 707: data_->SetFlushMode(m); > would it simplify things to now allow the flush mode to change if there is I think we can allow the flush mode to change regardless presence of buffered/pending write operations, but it requires proper design and some additional logic to be implemented. I would try to address this in a separate patch/change-list. Line 728: return data_->SetBufferBytesLimit(size); > why not use the same method name? Because from internal perspective there is no context of any mutation. It's just a buffer that accommodates byte representation of write operations. >From the other hand, the outer (API-facing) name cannot be changes now without >braking
[kudu-CR] async background flush provision for C++ client
David Ribeiro Alves has posted comments on this change. Change subject: async background flush provision for C++ client .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/client.cc File src/kudu/client/client.cc: Line 707: data_->SetFlushMode(m); > would it simplify things to now allow the flush mode to change if there is "to _not_ allow the flush mode" -- To view, visit http://gerrit.cloudera.org:8080/3668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idfde05bef342db24990c6e3da3b0270c3bb37a9d Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] async background flush provision for C++ client
David Ribeiro Alves has posted comments on this change. Change subject: async background flush provision for C++ client .. Patch Set 2: (62 comments) overall I think this needs more tests. Since the flusher is a thread, likely the unit tests will also have to use threads to: - make sure limits are not exceeded - make sure that operations are in fact buffered up to a point - Test the multiple triggers of a flush. Beyond the the unit tests, I think this also needs an integration test that stresses it all. http://gerrit.cloudera.org:8080/#/c/3668/2//COMMIT_MSG Commit Message: Line 7: async background flush provision for C++ client nit: capitalize (Async) PS2, Line 7: provision nit: remove provision PS2, Line 10: is nit: avoid using the passive voice. i.e. "KuduSession starts a background flusher thread" not a thread is started PS2, Line 13: virtual what's a virtual buffer? Line 16: The flush criteria is based on buffer size for not-yet-scheduled I'm having trouble parsing this sentence. PS2, Line 17: operatations nit: type PS2, Line 20: operatations nit: typo PS2, Line 23: KUDU Mention this issue in the commit message title. http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: Line 267: #ifndef NDEBUG spurious change Line 383: spurious change Line 553: //since the other time it's called in KuduSession::Apply() nit: rephrase. How about: Optimize and call write_op->SizeInBuffer() only once. We're currently calling it twice, here and in KuduSession::Apply() http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/batcher.h File src/kudu/client/batcher.h: Line 109: // Return number of bytes used for batcher buffer nit: "Returns the" nit: period at the end os sentences/ Line 114: // Compute in-buffer size for the given write operation nit: Computes nit: period PS2, Line 216: // no need for this comment http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: Line 1991: // return an error with session running in any supported flush mode nit: period PS2, Line 1993: buffer_space_bytes_limit style: kBufferSpaceBytesLimit or BUFFER_SPACE_BYTES_LIMIT PS2, Line 1993: static not need for static. PS2, Line 1994: static same Line 1995: static const KuduSession::FlushMode modes[] = { same Line 2001: for (const auto mode: modes) { isn't 'auto' already const ? Line 2007: ASSERT_TRUE(s.IsIncomplete()) << "got unexpected status: " << s.ToString(); nit: capitalize Got Line 2014: // Applying a bunch of small rows without a flush should not result in this sentence is huge and hard to read. please break it down. Line 2016: // since there is a flow control which makes Session::Apply() block and wait what: "a flow control"? Line 2018: { no need for the extra { scope Line 2023: ASSERT_OK(ApplyInsertToSession(session.get(), client_table_, 1, 1, "x")); How this testing that Apply() is eventually blocking, or making sure that the buffer value is not surpassed for that matter? Line 2029: // operations are put into the queue flushed after some time. you mean : "into the queue and flushed" right? Line 2030: // The exact timeout interval is implementation-dependent, which timeout interval? Line 2031: // but 100 ms is a good upper limit anyways. s/anyways/anyway Line 2033: { no need for the extra { Line 2041: SleepFor(MonoDelta::FromMilliseconds(100)); loop instead of wait, otherwise this will be brittle in jenkins http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/client.cc File src/kudu/client/client.cc: Line 705: "Cannot change flush mode when writes are buffered"); nit period Line 707: data_->SetFlushMode(m); would it simplify things to now allow the flush mode to change if there is a flusher thread? if not would it make sense to have this stop the thread (if the new mode is different) and wait for it to finish before allowing to change the mode? seems like there is some complexity attached to the fact that we allow this to change at anytime. simplifying the possible states there would be good Line 728: return data_->SetBufferBytesLimit(size); why not use the same method name? http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/client.h File src/kudu/client/client.h: Line 678: // The Flush() call can be used to block until the latest batch is sent. maybe add/replace: "a batch is sent and the buffer has available space again" http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 48: // intentionally left empty no ned for this, you can also join the brackets in the line above, i.e. {} Line 53: WARN_NOT_OK(Stop(), "unable to stop AUTO_FLUSH_BACKGROUND thread"); capitalize. Also a WARN? what happens if this fails? Line 58:
[kudu-CR] async background flush provision for C++ client
Kudu Jenkins has posted comments on this change. Change subject: async background flush provision for C++ client .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/2528/ -- To view, visit http://gerrit.cloudera.org:8080/3668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idfde05bef342db24990c6e3da3b0270c3bb37a9d Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] async background flush provision for C++ client
Alexey Serbin has posted comments on this change. Change subject: async background flush provision for C++ client .. Patch Set 1: > What was the rationale behind using a dedicated thread to manage > background flushing? I think synchronization would be net simpler > if instead, we reused the messenger's reactor threads for > background flushing. They're already used to run callbacks whenever > a Proxy::FooAsync() call is made, and you can use > Messenger::ScheduleOnReactor() > to schedule the background flushing to happen at some time in the > future. > > Some other arguments in favor of using reactor threads: > 1. It reduces the thread count of the client. > 2. Structurally it makes the implementation similar to the Java > client. I thought messenger reactor threads are for messaging, and I was not so familiar with that part of code. OK, I'll explore that option in this context. -- To view, visit http://gerrit.cloudera.org:8080/3668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idfde05bef342db24990c6e3da3b0270c3bb37a9d Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] async background flush provision for C++ client
Adar Dembo has posted comments on this change. Change subject: async background flush provision for C++ client .. Patch Set 1: What was the rationale behind using a dedicated thread to manage background flushing? I think synchronization would be net simpler if instead, we reused the messenger's reactor threads for background flushing. They're already used to run callbacks whenever a Proxy::FooAsync() call is made, and you can use Messenger::ScheduleOnReactor() to schedule the background flushing to happen at some time in the future. Some other arguments in favor of using reactor threads: 1. It reduces the thread count of the client. 2. Structurally it makes the implementation similar to the Java client. -- To view, visit http://gerrit.cloudera.org:8080/3668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idfde05bef342db24990c6e3da3b0270c3bb37a9d Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] async background flush provision for C++ client
Alexey Serbin has uploaded a new change for review. http://gerrit.cloudera.org:8080/3668 Change subject: async background flush provision for C++ client .. async background flush provision for C++ client Implemented AUTO_FLUSH_BACKGROUND for C++ client library. A dedicated background flusher thread is started by KuduSession instance for this purpose. In AUTO_FLUSH_BACKGROUND mode the KuduSession::Apply() method blocks if there is not enough place in the virtual buffer. The limit on the buffer size is set by KuduSession::SetMutationBufferSpace(). The flush criteria is based on buffer size for not-yet-scheduled session's write operatations and timeout. The flow control criteria in KuduSession::Apply() is based on total buffer size of session's write operatations. Addressed the following JIRA issues: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: Idfde05bef342db24990c6e3da3b0270c3bb37a9d --- M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/error_collector.cc M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h 7 files changed, 432 insertions(+), 103 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/3668/1 -- To view, visit http://gerrit.cloudera.org:8080/3668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Idfde05bef342db24990c6e3da3b0270c3bb37a9d Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin