[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for pending operations reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flush logic checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * The over-the-watermark criterion: check whether the total size of the freshly submitted (i.e. not-yet-scheduled-for-flush) write operations is over the threshold. The threshold can be set as the percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * The maximum wait time criterion: check whether the current batch of operations has been accumulating for more than the maximum wait time. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterval() method. A KuduSession object uses RPC messenger's thread pool to monitor batches' maximum wait time. Added functionality to control the maximum number of batchers per session. If number of batchers is at the limit already, KuduSession::Apply() blocks until it's possible to add a new batcher to accommodate the incoming operation. Modified behavior of the KuduSession::Flush(): now it waits until all batchers are flushed before returning. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Reviewed-on: http://gerrit.cloudera.org:8080/3952 Reviewed-by: Adar Dembo Tested-by: Kudu Jenkins --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/error_collector.cc M src/kudu/client/error_collector.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h 13 files changed, 1,509 insertions(+), 251 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 29 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Adar Dembo has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 28: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 28 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Dan Burkert, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#28). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for pending operations reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flush logic checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * The over-the-watermark criterion: check whether the total size of the freshly submitted (i.e. not-yet-scheduled-for-flush) write operations is over the threshold. The threshold can be set as the percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * The maximum wait time criterion: check whether the current batch of operations has been accumulating for more than the maximum wait time. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterval() method. A KuduSession object uses RPC messenger's thread pool to monitor batches' maximum wait time. Added functionality to control the maximum number of batchers per session. If number of batchers is at the limit already, KuduSession::Apply() blocks until it's possible to add a new batcher to accommodate the incoming operation. Modified behavior of the KuduSession::Flush(): now it waits until all batchers are flushed before returning. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/error_collector.cc M src/kudu/client/error_collector.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h 13 files changed, 1,509 insertions(+), 251 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/28 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 28 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 28: Build Started http://104.196.14.100/job/kudu-gerrit/3194/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 28 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 27: (9 comments) Will send a new version in a moment. http://gerrit.cloudera.org:8080/#/c/3952/27/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: PS27, Line 95: sing std::thread; : using std::bind; : using std::function; : using std::for_each; > Nit: can you order this list so it's alphabetical? Done Line 2173: const string LONG_STRING(kBufferSizeBytes + 1, 'x'); > Nit: kLongString. Done Line 2206: const string LONG_STRING(kBufferSizeBytes / 2, 'x'); > Nit: kLongString. Done Line 2378: thread monitor(monitor_func); > How about: I was under impression that elsewhere in the code we have those intermediate variables for clarity. OK, but it seems this time not having that would be an advantage. Will do. I was thinking about the lambda as well, but preferred to keep it as is to be in line with the MonitorSessionBufferSize counterpart (as you noticed). I would prefer to keep it as is. Do you insist on using a lambda here? PS27, Line 2455: function monitor_func( : bind(&ClientTest::MonitorSessionBufferSize, : session.get(), &monitor_run_ctl, : &monitor_max_buffer_size)); : thread monitor(monitor_func); > Likewise here, combine or use auto to elide the std::function syntax. Done PS27, Line 2493: function monitor_func( : bind(&ClientTest::MonitorSessionBufferSize, : session.get(), &monitor_run_ctl, &monitor_max_buffer_size)); : thread monitor(monitor_func); > And here. Done PS27, Line 2498: BUFFER_SIZE > Nit: replace with kBufferSizeBytes. Below too. Done Line 2501: for (size_t i = 3; i < 256; ++i) { > Nit: maybe store 256 in a constant like kRowNum and reuse it in EXPECT_EQ b Done PS27, Line 2533: function monitor_func( : bind(&ClientTest::MonitorSessionBufferSize, : session.get(), &monitor_run_ctl, &monitor_max_buffer_size)); : thread monitor(monitor_func); > Combine or auto. Done -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 27 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Adar Dembo has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 27: (9 comments) Almost there. http://gerrit.cloudera.org:8080/#/c/3952/27/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: PS27, Line 95: sing std::thread; : using std::bind; : using std::function; : using std::for_each; Nit: can you order this list so it's alphabetical? Line 2173: const string LONG_STRING(kBufferSizeBytes + 1, 'x'); Nit: kLongString. Line 2206: const string LONG_STRING(kBufferSizeBytes / 2, 'x'); Nit: kLongString. Line 2378: thread monitor(monitor_func); How about: thread monitor(bind(&ClientTest::MonitorSessionBatchersCount, session.get(), &monitor_run_ctl, &monitor_max_batchers_count)); That is, combine the two statements from above. This avoids exposing the "messy" std::function signature, though I suppose you could also elide it by using "auto" as the type. Ideally you'd use a lambda here. I think MonitorSessionBatchersCount is only used in this test, so maybe make it a lambda instead? MonitorSessionBatchersSize is used in two places, so you can't use it there. PS27, Line 2455: function monitor_func( : bind(&ClientTest::MonitorSessionBufferSize, : session.get(), &monitor_run_ctl, : &monitor_max_buffer_size)); : thread monitor(monitor_func); Likewise here, combine or use auto to elide the std::function syntax. PS27, Line 2493: function monitor_func( : bind(&ClientTest::MonitorSessionBufferSize, : session.get(), &monitor_run_ctl, &monitor_max_buffer_size)); : thread monitor(monitor_func); And here. PS27, Line 2498: BUFFER_SIZE Nit: replace with kBufferSizeBytes. Below too. Line 2501: for (size_t i = 3; i < 256; ++i) { Nit: maybe store 256 in a constant like kRowNum and reuse it in EXPECT_EQ below? PS27, Line 2533: function monitor_func( : bind(&ClientTest::MonitorSessionBufferSize, : session.get(), &monitor_run_ctl, &monitor_max_buffer_size)); : thread monitor(monitor_func); Combine or auto. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 27 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Dan Burkert, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#27). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for pending operations reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flush logic checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * The over-the-watermark criterion: check whether the total size of the freshly submitted (i.e. not-yet-scheduled-for-flush) write operations is over the threshold. The threshold can be set as the percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * The maximum wait time criterion: check whether the current batch of operations has been accumulating for more than the maximum wait time. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterval() method. A KuduSession object uses RPC messenger's thread pool to monitor batches' maximum wait time. Added functionality to control the maximum number of batchers per session. If number of batchers is at the limit already, KuduSession::Apply() blocks until it's possible to add a new batcher to accommodate the incoming operation. Modified behavior of the KuduSession::Flush(): now it waits until all batchers are flushed before returning. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/error_collector.cc M src/kudu/client/error_collector.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h 13 files changed, 1,515 insertions(+), 250 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/27 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 27 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 27: Build Started http://104.196.14.100/job/kudu-gerrit/3189/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 27 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 26: (2 comments) Thank you for the review! Will send an updated patch in a moment. http://gerrit.cloudera.org:8080/#/c/3952/26/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: Line 2393: //monitor->Join(); > This looks leftover? Yes, it is a left-over, will fix. PS26, Line 2503: every single > each individual Done -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 26 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Dan Burkert has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 26: Woops, I meant that to be a +1, definitely want to give Adar and others a chance to weigh in. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 26 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Dan Burkert has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 26: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/3952/26/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: Line 2393: //monitor->Join(); This looks leftover? PS26, Line 2503: every single each individual -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 26 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Dan Burkert has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 26: Code-Review+1 couple spelling/phrasing nits, but code LGTM! -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 26 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#26). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for pending operations reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flush logic checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * The over-the-watermark criterion: check whether the total size of the freshly submitted (i.e. not-yet-scheduled-for-flush) write operations is over the threshold. The threshold can be set as the percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * The maximum wait time criterion: check whether the current batch of operations has been accumulating for more than the maximum wait time. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterval() method. A KuduSession object uses RPC messenger's thread pool to monitor batches' maximum wait time. Added functionality to control the maximum number of batchers per session. If number of batchers is at the limit already, KuduSession::Apply() blocks until it's possible to add a new batcher to accommodate the incoming operation. Modified behavior of the KuduSession::Flush(): now it waits until all batchers are flushed before returning. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/error_collector.cc M src/kudu/client/error_collector.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h 13 files changed, 1,521 insertions(+), 253 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/26 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 26 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 26: Build Started http://104.196.14.100/job/kudu-gerrit/3184/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 26 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 25: (5 comments) http://gerrit.cloudera.org:8080/#/c/3952/25/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: PS25, Line 187: Microseconds > Just to be clear, I meant "coarser" where I wrote "higher" granularity. Tha I understand the concern. What if we put 1 ms? :) Frankly, this kind of check is not really reliable anyway. It can spot an issue if operations are slowly pushed to tablet servers, but otherwise this check might not catch issues. I just could not find anything else which would not be invasive. Keeping max value in the session itself does not seem to be a good solution. The other solution involves building mock Batcher or Messenger which seems to be an overkill. PS25, Line 2186: // Applying a big operation which does not fit into the buffer should : // return an error with session running in any supported flush mode. > Can you help me see the difference? AFAICT the second test in TestApplyTooM Yes, after looking at TestApplyTooMuchWithoutFlushing once more I understand you are right. For some reason I was thinking that the second part of the TestApplyTooMuchWithoutFlushing was about big operations where each fits the buffer, but eventually they overflow it. My bad. OK, I'll remove that second part of the TestApplyTooMuchWithoutFlushing test. Because this test is a superset of the former: it verifies the behavior is correct for all flush modes. PS25, Line 2258: WaitForAutoFlushBackground(session, FLUSH_INTERVAL_MS); : // There should be some pending ops: the automatic background flush : // should not be active after switch to the MANUAL_FLUSH mode. > Nah, I think that's fine. OK, I see. Thanks for the guideline. Line 2428: // AUTO_FLUSH_BACKGROUND should be faster than AUTO_FLUSH_SYNC. > Yes, and please try TSAN. It slows things down in unexpected ways (e.g. con Yep, did that as well. However, I'm not sure I got 1000 cycles -- that was too slow. Will run tonight. PS25, Line 2574: ASSERT_TRUE(session->HasPendingOperations()); > That's fine, just loop it using dist-test.py in TSAN mode, and I'll be sati ok, will do. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 25 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Adar Dembo has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 25: (5 comments) http://gerrit.cloudera.org:8080/#/c/3952/25/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: PS25, Line 187: Microseconds > The idea was to monitor those values as closely as possible. Without resor Just to be clear, I meant "coarser" where I wrote "higher" granularity. That is, perhaps this could be as large of a value as 10ms, for example. It just struck me as a very tight loop. Tight loops are useful when trying to exercise a data race, but in this case, I would expect a full end-to-end Flush() to take on the order of 1ms, especially in a debug or slower build. If that's true, checking every 10us is excessive. But, it shouldn't slow down the tests significantly. If I'm the only one who feels this way, I'm fine with you leaving it as is. PS25, Line 2186: // Applying a big operation which does not fit into the buffer should : // return an error with session running in any supported flush mode. > Nope. This is different: this test verifies that an attempt to apply a _si Can you help me see the difference? AFAICT the second test in TestApplyTooMuchWithoutFlushing: 1. Starts a new session in MANUAL_FLUSH mode. 2. Customizes the buffer size. 3. Applies a single insert, sized to twice the buffer size. 4. Asserts that this fails with Incomplete. TestCheckMutationBufferSpaceLimitInEffect does the same thing across all flush modes, except that the buffer size is a little different, and that the insert isn't twice the buffer size, but only one byte larger. PS25, Line 2258: WaitForAutoFlushBackground(session, FLUSH_INTERVAL_MS); : // There should be some pending ops: the automatic background flush : // should not be active after switch to the MANUAL_FLUSH mode. > About 500 ms. Is that too much? Nah, I think that's fine. Anything on the order of multiple seconds is something we should think about. Line 2428: // AUTO_FLUSH_BACKGROUND should be faster than AUTO_FLUSH_SYNC. > Due to the nature of the things and chosen parameters, the difference is in Yes, and please try TSAN. It slows things down in unexpected ways (e.g. context switches become much more expensive, as does thread startup). PS25, Line 2574: ASSERT_TRUE(session->HasPendingOperations()); > Yes, I also have this concern. The principal solution would be delaying/st That's fine, just loop it using dist-test.py in TSAN mode, and I'll be satisfied. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 25 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 25: (44 comments) Thank you for review. I'm planning to post updated patch soon. http://gerrit.cloudera.org:8080/#/c/3952/25/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: PS25, Line 182: // Must be public to use as a thread closure. > They can be static functions, though, right? I don't see them accessing any Good observation: yes, now they can be static methods of this class. Previously they could not. But I don't want make them just static functions -- they need access to private members of KuduSession, and I don't like idea of declaring many friend functions for the KuduSession class. PS25, Line 187: Microseconds > How did you choose 10 us? It seems low to me (i.e. maybe we can get away wi The idea was to monitor those values as closely as possible. Without resorting to invasive changes to KuduSession::Data or a separate class which would inherit from KuduSession::Data to monitor the maximum, I had nothing better than this. Why do you think 10 us is too low? PS25, Line 199: > Nit: extra space here. Done PS25, Line 553: oprations > Nit: operations Done PS25, Line 556: WaitForAutoFlushBackground > Hmm, when this returns, there's no actual guarantee that a batch flushed, r Exactly, that's the idea. When guaranteed flush is needed, it's necessary to use KuduSession::Flush() call. PS25, Line 558: MAX_WAIT_MS > kMaxWaitMs Done Line 1207: << "unexpected status: " << result.ToString(); > There's a better technique for handling situations like these: OK, thanks. Will use. PS25, Line 2153: BUFFER_SIZE > kBufferSizeBytes (camel case, k prefix, and append the unit) Done PS25, Line 2186: // Applying a big operation which does not fit into the buffer should : // return an error with session running in any supported flush mode. > Doesn't this test make the second half of TestApplyTooMuchWithoutFlushing r Nope. This is different: this test verifies that an attempt to apply a _single_ operation which does not fit the buffer produces an error right away with no buffered data prior to that. This test exercises different code path (not the same path as the second half of the TestApplyTooMuchWithoutFlushing test). PS25, Line 2197: for (auto mode: FLUSH_MODES) > Nit: auto mode : FLUSH_MODES Done PS25, Line 2215: STLDeleteElements(&errors); > Nit: use an ElementDeleter and declare it right after declaring 'errors'. T Done PS25, Line 2258: WaitForAutoFlushBackground(session, FLUSH_INTERVAL_MS); : // There should be some pending ops: the automatic background flush : // should not be active after switch to the MANUAL_FLUSH mode. > I see. How expensive is this in terms of test runtime? You've dropped the f About 500 ms. Is that too much? Line 2304: *result = t_end - t_begin; > Have you looked at the StopWatch class? May be easier for this kind of thin Will do, thanks! PS25, Line 2335: // Check that call to Flush()/FlushAsync() is no-op if there isn't current : // batcher for the session. > Doesn't this test make TestEmptyBatch redundant? good observation -- I'll remove that. PS25, Line 2348: ASSERT_OK(sync0.Wait()); : // The same with the synchronous flush. : ASSERT_OK(session->Flush()); : ASSERT_FALSE(session->HasPendingOperations()); : ASSERT_EQ(0, session->CountPendingErrors()); > How does this verify that flush was a no-op? Seems to me you need to find s This is to test that no error happens when trying to call flush and current batcher is not there. The comment for the test is misleading -- will fix. PS25, Line 2391: kudu::Thread::Create > Now that we're building with C++11, we prefer to use std::thread for test t OK, that's sounds good. I thought I had to use KuduThreads, otherwise I would use std::thread from the beginning. Just another instance when addressing my doubts by direct asking would help. It seems I need to communicate more on issues like this. Line 2428: // AUTO_FLUSH_BACKGROUND should be faster than AUTO_FLUSH_SYNC. > Hmm, is this expected to be true 100% of all test runs? Even with debug ins Due to the nature of the things and chosen parameters, the difference is in order of magnitude. Therefore, no flakiness is expected. Besides, if using debug/TSAN/ASAN builds, both versions become slower, so they are expected to keep the speed ratio. Yes, I did that check locally: ran it using shell while true; do ... || break; done cycle. It looked stable (more than 4096 iterations in the release mode and 1024 in debug one). Will do with dist-test.py as well. Line 2429: EXPECT_GT(t_diff_afs.ToMilliseconds(), t_diff_afb.ToMilliseconds()); > Can't we just EXPECT_GT
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Adar Dembo has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 25: (45 comments) I actually reviewed the changes to client-test this time. http://gerrit.cloudera.org:8080/#/c/3952/25/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: PS25, Line 182: // Must be public to use as a thread closure. They can be static functions, though, right? I don't see them accessing any test state. PS25, Line 187: Microseconds How did you choose 10 us? It seems low to me (i.e. maybe we can get away with a higher granularity), but maybe I'm missing something. PS25, Line 199: Nit: extra space here. PS25, Line 553: oprations Nit: operations PS25, Line 556: WaitForAutoFlushBackground Hmm, when this returns, there's no actual guarantee that a batch flushed, right? We may have just expired the deadline? PS25, Line 558: MAX_WAIT_MS kMaxWaitMs Line 1207: << "unexpected status: " << result.ToString(); There's a better technique for handling situations like these: Status s = DoSomething(); SCOPED_TRACE(s.ToString()); ASSERT_TRUE(s.IsIOError()); ASSERT_TRUE(...); If a test fails while SCOPED_TRACE() is still in scope, gtest will print s.ToString(). That way you don't have to worry about printing it yourself following every potential ASSERT/EXPECT that may fail. PS25, Line 2153: BUFFER_SIZE kBufferSizeBytes (camel case, k prefix, and append the unit) Below too. Please fix all of your "constants" to follow this naming convention. PS25, Line 2186: // Applying a big operation which does not fit into the buffer should : // return an error with session running in any supported flush mode. Doesn't this test make the second half of TestApplyTooMuchWithoutFlushing redundant? PS25, Line 2197: for (auto mode: FLUSH_MODES) Nit: auto mode : FLUSH_MODES (separate 'mode' from the colon with a space) PS25, Line 2215: STLDeleteElements(&errors); Nit: use an ElementDeleter and declare it right after declaring 'errors'. Then there's no possibility for a leak. PS25, Line 2258: WaitForAutoFlushBackground(session, FLUSH_INTERVAL_MS); : // There should be some pending ops: the automatic background flush : // should not be active after switch to the MANUAL_FLUSH mode. I see. How expensive is this in terms of test runtime? You've dropped the flush interval to 100 ms; how much time do we end up waiting here? Line 2304: *result = t_end - t_begin; Have you looked at the StopWatch class? May be easier for this kind of thing. PS25, Line 2335: // Check that call to Flush()/FlushAsync() is no-op if there isn't current : // batcher for the session. Doesn't this test make TestEmptyBatch redundant? PS25, Line 2348: ASSERT_OK(sync0.Wait()); : // The same with the synchronous flush. : ASSERT_OK(session->Flush()); : ASSERT_FALSE(session->HasPendingOperations()); : ASSERT_EQ(0, session->CountPendingErrors()); How does this verify that flush was a no-op? Seems to me you need to find some metrics indicating that no Write RPCs were sent. PS25, Line 2391: kudu::Thread::Create Now that we're building with C++11, we prefer to use std::thread for test threads. There's less boilerplate with them (i.e. no need to make up a thread category and name), and they have better support for lambdas, which I think you can use here pretty effectively. Same goes for the other new uses of kudu::Thread. Line 2428: // AUTO_FLUSH_BACKGROUND should be faster than AUTO_FLUSH_SYNC. Hmm, is this expected to be true 100% of all test runs? Even with debug instrumentation, TSAN, ASAN, etc? Wondering if this test will be flaky. A good way to find out is to use build-support/dist-test.py to loop your test 1000 times. You can use --gtest_filter on the command line to run just this one test case (as opposed to all of client-test, much of which is already flaky). Line 2429: EXPECT_GT(t_diff_afs.ToMilliseconds(), t_diff_afb.ToMilliseconds()); Can't we just EXPECT_GT on the MonoDeltas directly now? PS25, Line 2451: // AUTO_FLUSH_BACKGROUND should be faster than AUTO_FLUSH_SYNC. : EXPECT_GT(t_diff_afs.ToMilliseconds(), t_diff_afb.ToMilliseconds()); Same issues and questions as in the above test. PS25, Line 2456: // applying a bunch of small rows without a flush should not result in : // an error, even with low limit on the buffer space. This is because : // Session::Apply() blocks and waits for buffer space to become available. For this test, you may want to scan once you're done writing to make sure that all of the rows really made it and that Apply() didn't just decide to throw some rows away because we ran out of buffer space. Line 2488: // applying a bunch of rows ev
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 25: Build Started http://104.196.14.100/job/kudu-gerrit/3167/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 25 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#25). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for pending operations reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flush logic checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * The over-the-watermark criterion: check whether the total size of the freshly submitted (i.e. not-yet-scheduled-for-flush) write operations is over the threshold. The threshold can be set as the percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * The maximum wait time criterion: check whether the current batch of operations has been accumulating for more than the maximum wait time. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterval() method. A KuduSession object uses RPC messenger's thread pool to monitor batches' maximum wait time. Added functionality to control the maximum number of batchers per session. If number of batchers is at the limit already, KuduSession::Apply() blocks until it's possible to add a new batcher to accommodate the incoming operation. Modified behavior of the KuduSession::Flush(): now it waits until all batchers are flushed before returning. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/error_collector.cc M src/kudu/client/error_collector.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h 13 files changed, 1,516 insertions(+), 238 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/25 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 25 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#24). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for pending operations reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flush logic checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * The over-the-watermark criterion: check whether the total size of the freshly submitted (i.e. not-yet-scheduled-for-flush) write operations is over the threshold. The threshold can be set as the percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * The maximum wait time criterion: check whether the current batch of operations has been accumulating for more than the maximum wait time. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterval() method. A KuduSession object uses RPC messenger's thread pool to monitor batches' maximum wait time. Added functionality to control the maximum number of batchers per session. If number of batchers is at the limit already, KuduSession::Apply() blocks until it's possible to add a new batcher to accommodate the incoming operation. Modified behavior of the KuduSession::Flush(): now it waits until all batchers are flushed before returning. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/error_collector.cc M src/kudu/client/error_collector.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h 13 files changed, 1,517 insertions(+), 238 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/24 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 24 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 24: Build Started http://104.196.14.100/job/kudu-gerrit/3166/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 24 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 23: (2 comments) http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 118: // Thread-safety note: the buffer_bytes_limit_ is not supposed to be accessed > I think it's important to have safety notes like this in the header so that But from the method caller's perspective all what's needed to know is that it's not supposed to be called from multiple threads, plus generic info on the parameters. I don't see more. Am I missing something? I can put notion regarding the thread-safety constraint into the header. Would it be enough? http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: Line 191: std::unordered_set flushed_batchers_; > Isn't the count of bytes in currently pending operations kept in the buffer Correct. As a matter of fact, I've removed flished_batchers_ already in the patchset I'm preparing. I think now it's looks better. Probably, it would be nice to get some feedback on that from the original author of the Batcher class (Todd and/or David?). -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 23 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Dan Burkert has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 23: (3 comments) http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: Line 474: Status s = had_errors_ ? Status::Incomplete("Some errors occurred") > OK, that reasoning makes sense. Yah, I agree IOError isn't a great fit, but the Status codes are pretty general. http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 118: // Thread-safety note: the buffer_bytes_limit_ is not supposed to be accessed > I tried that, but from the reading perspective it looked worse -- there is I think it's important to have safety notes like this in the header so that callers can see these invariants without going to the implementation. This is, after all, an invariant that is imposed on the caller. http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: Line 191: std::unordered_set flushed_batchers_; > That's a good question! Besides using those for current implementation of Isn't the count of bytes in currently pending operations kept in the buffer_bytes_used_ counter? -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 23 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 23: (27 comments) Dan & Adar, thank you for the review. I'll post updated version as soon as I make sure the tests are passing OK. http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: Line 474: Status s = had_errors_ ? Status::Incomplete("Some errors occurred") > I don't think Incomplete is the right status here; the flush is complete. OK, that reasoning makes sense. However, I thought IOError might be irrelevant here as well. Or this is the best match among all possible status codes? There is seemingly more generic RunTimeError, but I'm not sure that's the right case to use it. I'll try to return IOError back to get back to the original, and meanwhile we can discuss it separately. http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: PS23, Line 2360: there the current batcher is at place > Not sure what this is trying to say, maybe: ops, my bad. Should have been something like 'OK, now the current batcher should be at place.' Will fix. http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/client.cc File src/kudu/client/client.cc: Line 322:hp); > this can go on the line above. Done http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 63: TimeBasedFlushInit(session); > How about checking if we are in AUTO_FLUSH_BACKGROUND first? I know this i The check you mentioned is done by TimeBasedFlushTask() method _before_ scheduling a task (i.e. no task scheduled if current mode is not AUTO_FLUSH_BACKGROUND). So, I see no need to perform that check here. I put all those ugly guts (synchronization, etc.) into that method with intention to avoid doing those checks elsewhere. Line 63: TimeBasedFlushInit(session); > The funny thing is that, by definition, we cannot be in AUTO_FLUSH_BACKGROU TimeBasedFlushInit() is no-op in other than AUTO_FLUSH_BACKGROUND mode. Yes, it might be a bad name for the method. What name would you recommend instead? PS23, Line 105:std::lock_guard l(lock_); > Since this isn't acquired on any hot path, what do you think of removing it I thought it was better to keep those locks separate to avoid having one giant lock for all the methods. Those locks based on simple_spinlock look lighter than based on mutex. But as you mentioned, since we already have that for the most critical paths, and the rest with simple_spinlock are just for methods which almost never called, I think it's a good idea. Will do, thanks! Line 118: // Thread-safety note: the buffer_bytes_limit_ is not supposed to be accessed > These thread safety notes (here and below) would be better in the header de I tried that, but from the reading perspective it looked worse -- there is not much of context when looking at the signature of a function/method. However, in the body of a method, there is much more relevant context since you can see the logic of the method, more comments, corresponding variables/members, etc. That's said, I'm hesitant to move those in the header. However, if you feel strong about that and have a compelling reasoning which addresses my concern regarding the lack of context, I'll move those. Line 156: Status KuduSession::Data::SetMaxBatchersNum(unsigned int max_num) { > How come this one doesn't check HasPendingOperations? I didn't see a restriction that would arise from the current design. Actually, we can remove that for SetBufferFlushInterval() and other except for the KuduSession::SetFlushMode(). But as I see from our recent HipChat discussion, we want to keep us more space for the future, because once published, it's easier to make those restrictions less strict. But making these more strict could be almost impossible in the future. So, I added the artificial restriction here as well. Line 181: FlushCurrentBatcher(1, cb); > Nit: add a comment explaining the significance of 1 (i.e. why not 0?) Done PS23, Line 188: nullptr > I spent a lot of time wondering if the loss of the synchronizer and RETURN_ I did a research on that prior to using nullptr as a callback in the new code and I didn't see any cases where an error would be dropped on the floor (i.e. in all cases where Batcher::had_errors_ is set to true, the error is registered into the Batcher::error_collector_). So, it's safe as far as I can see. But I'll add corresponding comments into the batcher and for the FlushCurrentBatcher() methods to clarify on that. I think it's a good idea to document the difference. I even thought I had done it, but re-visiting the client.h comments revealed that I missed that part. Thank you for the remin
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Adar Dembo has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 23: (11 comments) http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 63: TimeBasedFlushInit(session); > How about checking if we are in AUTO_FLUSH_BACKGROUND first? I know this i The funny thing is that, by definition, we cannot be in AUTO_FLUSH_BACKGROUND, because the default flush mode is AUTO_FLUSH_SYNC. But, that can change in the future. In any case, I think TimeBasedFlushInit() is a no-op in AUTO_FLUSH_BACKGROUND mode. PS23, Line 105:std::lock_guard l(lock_); Since this isn't acquired on any hot path, what do you think of removing it in favor of batch_ctl_mutex_? I think it's a little easier to reason about the flow if we can minimize the number of locks. If you do end up doing this, probably best to rename batch_ctl_mutex_ to lock_ (and maybe rename batch_ctl_cond_ too). Line 181: FlushCurrentBatcher(1, cb); Nit: add a comment explaining the significance of 1 (i.e. why not 0?) PS23, Line 188: nullptr I spent a lot of time wondering if the loss of the synchronizer and RETURN_NOT_OK(s.Wait()) meant we were dropping errors on the floor. Meaning, is there any case where the batcher invokes the flush callback with !Status::OK() and does NOT write errors to the error collector? I don't think there are, but the flow is tricky enough that I thought I'd ask you the same question. BTW, it would be worth documenting in client.h the difference in behavior between Flush() and FlushAsync() w.r.t. waiting for outstanding batches to flush. Or do you think it's implicitly understood? PS23, Line 296: The flush_mode_ is accessed : // from the background time-based flush task, but it accesses it : // only for reading. TSAN may complain about this. ISTR it doesn't like accesses to the same members with a mixed set of locks (i.e. one lock held in one access, and no locks in another). Line 354: batch_ctl_cond_.Wait(); This isn't going to work for ApplyAsync(). Now, we haven't actually implemented ApplyAsync() yet, so there's no reason to do anything. But we should start thinking about how ApplyWriteOp() would work in an ApplyAsync() world, hopefully without the creation of two separate apply code paths. http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: PS23, Line 42: It's Nit: Its PS23, Line 43: implemented Nit: don't need this word. PS23, Line 128: // The self-rescheduling task to flush write operations which have been : // accumulating for too long (controlled by flush_interval_). : // This does real work only in case of AUTO_FLUSH_BACKGROUND mode. Should document what 'do_startup_check' means. PS23, Line 172: which Nit: "whose" PS23, Line 172: requries Nit: "requires" -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 23 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Dan Burkert has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 23: (16 comments) Nice. This is much more understandable than previously. http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: Line 474: Status s = had_errors_ ? Status::Incomplete("Some errors occurred") I don't think Incomplete is the right status here; the flush is complete. http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: PS23, Line 2360: there the current batcher is at place Not sure what this is trying to say, maybe: there is a current batch http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/client.cc File src/kudu/client/client.cc: Line 322:hp); this can go on the line above. http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 63: TimeBasedFlushInit(session); How about checking if we are in AUTO_FLUSH_BACKGROUND first? I know this is basically a no-op if not, but it does schedule a background task. Line 118: // Thread-safety note: the buffer_bytes_limit_ is not supposed to be accessed These thread safety notes (here and below) would be better in the header declaration doc. Line 156: Status KuduSession::Data::SetMaxBatchersNum(unsigned int max_num) { How come this one doesn't check HasPendingOperations? PS23, Line 283: nil NULL PS23, Line 356: } : if ( should this be an else if? Line 368: buffer_bytes_used_ += required_size; hmm, this seems a little early. I can't think of a reason it would matter, but it seems to me this counter shouldn't be updated until the batcher is available and we are just about to or just have added the op to the batcher. Otherwise we are updating the counter, potentially dropping the lock via the cond_.Wait() call, and then later actually adding the op. Line 379: if (!batcher_) { Will the batcher_ ever be non-null here? I can't think of a reason why it would be. Maybe a DCHECK is in order instead of the conditional. Line 399: buffer_bytes_used_ -= required_size; Another good reason not to increment this counter until after the op is successfully added to the batcher_. http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: PS23, Line 46: Aply Apply PS23, Line 91: Return once no more pending operations are left. this sentence seems redundant. PS23, Line 91: finished their flushing finish flushing Line 191: std::unordered_set flushed_batchers_; Why is it necessary to hold on to these batchers? I see that it's used for calling HasPendingOperations, but wouldn't it be enough to check that batches_num_ > 0 in that case? PS23, Line 199: // protected by batch_ctl_mutex_ the setter doesn't take batch_ctl_mutex_ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 23 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 23: Build Started http://104.196.14.100/job/kudu-gerrit/3136/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 23 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#23). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for pending operations reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flush logic checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * The over-the-watermark criterion: check whether the total size of the freshly submitted (i.e. not-yet-scheduled-for-flush) write operations is over the threshold. The threshold can be set as the percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * The maximum wait time criterion: check whether the current batch of operations has been accumulating for more than the maximum wait time. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterval() method. A KuduSession object uses RPC messenger's thread pool to monitor batches' maximum wait time. Added functionality to control the maximum number of batchers per session. If number of batchers is at the limit already, KuduSession::Apply() blocks until it's possible to add a new batcher to accommodate the incoming operation. Modified behavior of the KuduSession::Flush(): now it waits until all batchers are flushed before returning. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/error_collector.cc M src/kudu/client/error_collector.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h 13 files changed, 1,440 insertions(+), 209 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/23 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 23 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 22: (1 comment) http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: PS22, Line 191: Synchronizer s; : KuduStatusMemberCallback ksmcb(&s, &Synchronizer::StatusCB); : NextBatcher(session, BATCHER_WATERMARK_NON_EMPTY, &ksmcb, BLOCK); : RETURN_NOT_OK(s.Wait()); > Cool. So regardless of what change you do or don't make to NextBatcher(), c Sure, I'll add those comments. And just another thought to justify separating the responsibilities of the NextBatcher() method (most likely you have spotted that already): it will allow more robust flushes in case if maximum number of batchers is reached. This is because NextBatch() in blocking mode first waits until it's possible to allocate a new batcher and only after that initiates flushing of the corresponding batcher. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Adar Dembo has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 22: (1 comment) http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: PS22, Line 191: Synchronizer s; : KuduStatusMemberCallback ksmcb(&s, &Synchronizer::StatusCB); : NextBatcher(session, BATCHER_WATERMARK_NON_EMPTY, &ksmcb, BLOCK); : RETURN_NOT_OK(s.Wait()); > OK, I see the point now. Cool. So regardless of what change you do or don't make to NextBatcher(), could you add a comment here explaining the various events we're waiting on? Separately, could you look into splitting NextBatcher()'s responsibilities? Naively I think it'd simplify things, but it might be one of those things that you can only really evaluate once you're staring at the new code. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 22: (1 comment) http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: PS22, Line 191: Synchronizer s; : KuduStatusMemberCallback ksmcb(&s, &Synchronizer::StatusCB); : NextBatcher(session, BATCHER_WATERMARK_NON_EMPTY, &ksmcb, BLOCK); : RETURN_NOT_OK(s.Wait()); > I wasn't asking about NextBatcher()'s wait; I was asking about s.Wait(). To OK, I see the point now. Right -- waiting on total_bytes_used_ should be more than enough given the fact the flush of the current batcher has been initiated. I was concerned with collecting the exact error, if any, from the current batcher and missed the point that error_collector_ would account for that error as well. And yes, if we separate two responsibilities of the NextBatch() method, then all that inconsistency will go away, so the next batcher would be allocated in Apply()/ApplyAsync() and FlushAsync() would not report on problems due to max number of batchers. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Adar Dembo has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 22: (2 comments) http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/client.h File src/kudu/client/client.h: Line 1188: /// The default and the minimum setting for this parameter is 2 (two). > Yes, technically the minimum could be 1. OK. If it simplifies the implementation, let's keep the minimum at 2. We can always remove the restriction later. http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: PS22, Line 191: Synchronizer s; : KuduStatusMemberCallback ksmcb(&s, &Synchronizer::StatusCB); : NextBatcher(session, BATCHER_WATERMARK_NON_EMPTY, &ksmcb, BLOCK); : RETURN_NOT_OK(s.Wait()); > NextBatcher() not only flushes the current batcher, but it also allocates a I wasn't asking about NextBatcher()'s wait; I was asking about s.Wait(). To help answer the question, here are all of the waits going on: 1. BLOCK in NextBatcher(). This ensures that a new batcher has become available. 2. s.Wait(). This ensures that the batcher we've flushed has finished flushing. 3. total_bytes_used_ == 0. This ensures that any other batchers in flight have finished flushing. Do we need all of these? Naively, I think waiting for total_bytes_used_ to reach 0 is a superset of s.Wait(), no? Moreover, don't we only need to get (and wait for) a new batcher on the next Apply()/ApplyAsync(), rather than here? -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 22: (17 comments) Thank you for the review. I'll try to take a fresh look in the context of simplifying the code. Frankly, it would help a lot if it were clear from the very beginning that the KuduSession interface is not supposed to be thread-safe. De-facto I started with the idea to have most of its methods thread-safe (as it was advertised by the comments in client.h). http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/client.h File src/kudu/client/client.h: PS22, Line 1037: current > Nit: "the current" Done PS22, Line 1143: when > Nit: "only when", right? Done PS22, Line 1180: called > Nit: "called the" Done PS22, Line 1183: taken care > Nit: "taken care of" Done PS22, Line 1184: Once flushing of the prior : /// mutation buffer is initiated, the new one is created and set current : /// to accommodate incoming write operations, limit permitting. > Nit: "Upon flushing the current mutation buffer, a new buffer is created to Done Line 1188: /// The default and the minimum setting for this parameter is 2 (two). > Hmm, technically couldn't the minimum be 1? If set to 1, an Apply() with an Yes, technically the minimum could be 1. Yes, the behavior of the KuduSession::Apply() you described is there, but it now comes into play only when the incoming operation triggers flush of the current buffer. I put 2 to keep current semantics of always having the current batcher available for incoming operations and allocating it when starting flush of the prior one. I can modify the code and make it 1. That would involve checking for the current batcher availability at every Apply() call. Right now it's guaranteed the current batcher is always at place. PS22, Line 1191: the > Nit: drop 'the' Done PS22, Line 1201: implicitly amended : /// as if it were 0. > Nit: "implicitly set to 0". Done http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: PS22, Line 80: the > Nit: double the Done Line 130: buffer_bytes_limit_ = size; > Likewise, I don't see why this needs to be protected. It seems that's a remnant from the older code which was written with multi-threading safety requirement in mind. Thanks, will remove. Line 145: buffer_watermark_pct_ = watermark_pct; > Why does this need to be protected? It's only ever accessed by the writer t Right. That's the same as for the KuduSession::SetBufferBytesLimit() -- remnant from the past. PS22, Line 191: Synchronizer s; : KuduStatusMemberCallback ksmcb(&s, &Synchronizer::StatusCB); : NextBatcher(session, BATCHER_WATERMARK_NON_EMPTY, &ksmcb, BLOCK); : RETURN_NOT_OK(s.Wait()); > Why do we need to use the synchronizer to wait on this batch if, below, we NextBatcher() not only flushes the current batcher, but it also allocates a new one for next incoming operations. Probably, it's better to separate those functions, and then it will be possible to make 1 the minimum number for the batchers_num_limit_. I'll do that. Back to your question. The NextBatcher() method has two modes of operation: blocking and non-blocking. In non-blocking mode it reports an error if it's not possible to allocate a new batcher due to the batchers_num_limit_ setting; in blocking mode it blocks until it's possible to allocate a new one and do its job. So, it's necessary to initiate flushing current batcher and then wait for total_bytes_used_ to become 0. BTW, the error collector does not capture errors due to batchers_num_limit_ limitations -- those errors are reported via the provided callback for the NextBatcher() (which usually comes from the FlushAsync()). PS22, Line 222: relevent > Nit: relevant Done http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: PS22, Line 104: // On successful return, the output flush_mode parameter : // is set to the effective flush mode. > I thought we no longer needed the output parameter for flush mode? Right. The reason of returning the effective flush_mode in output parameter is that I don't want to acquire the lock protecting flush_mode_ again just in the upper-level KuduSession::Apply() -- that needs to know the flush mode to decide whether it's necessary to do Flush() in case of AUTO_FLUSH_SYNC mode. Strictly speaking, it's necessary either to access it under lock or make it atomic. As for the task cancelation, I don't think it would be a better approach. PS22, Line 142: > Nit: looks like there are two spaces here. Below too. Done PS22, Line 147: BATCHER_WATERMARK_NON_EMPTY > Our convention for naming class constants like these
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Adar Dembo has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 22: (18 comments) Did another pass. I'm still blown away by the overall complexity; I wonder if reducing the number of locks will help here, or making other infrastructure improvements (e.g. allowing reactor tasks to be canceled). http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/batcher.h File src/kudu/client/batcher.h: Line 189: // The time when the very first operation was added into the batcher. Should note that this is protected by lock_. While you're there, can you make sure all other protected fields are annotated as such? http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/client.h File src/kudu/client/client.h: PS22, Line 1037: current Nit: "the current" PS22, Line 1143: when Nit: "only when", right? PS22, Line 1180: called Nit: "called the" PS22, Line 1183: taken care Nit: "taken care of" PS22, Line 1184: Once flushing of the prior : /// mutation buffer is initiated, the new one is created and set current : /// to accommodate incoming write operations, limit permitting. Nit: "Upon flushing the current mutation buffer, a new buffer is created to replace it, provided the limit is not exceeded." Line 1188: /// The default and the minimum setting for this parameter is 2 (two). Hmm, technically couldn't the minimum be 1? If set to 1, an Apply() with an outstanding background flush ought to block (and an ApplyAsync() should return some sort of backpressure error). Though I guess you'd need that kind of behavior when the limit is reached regardless of whether it's 1 or higher. PS22, Line 1191: the Nit: drop 'the' PS22, Line 1201: implicitly amended : /// as if it were 0. Nit: "implicitly set to 0". http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: PS22, Line 80: the Nit: double the Line 130: buffer_bytes_limit_ = size; Likewise, I don't see why this needs to be protected. Line 145: buffer_watermark_pct_ = watermark_pct; Why does this need to be protected? It's only ever accessed by the writer thread (not a reactor thread). PS22, Line 191: Synchronizer s; : KuduStatusMemberCallback ksmcb(&s, &Synchronizer::StatusCB); : NextBatcher(session, BATCHER_WATERMARK_NON_EMPTY, &ksmcb, BLOCK); : RETURN_NOT_OK(s.Wait()); Why do we need to use the synchronizer to wait on this batch if, below, we wait until total_bytes_used_ reaches 0? Is it just to report errors? Would the error collector usage below capture that? PS22, Line 222: relevent Nit: relevant http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: PS22, Line 104: // On successful return, the output flush_mode parameter : // is set to the effective flush mode. I thought we no longer needed the output parameter for flush mode? Oh, is this because we read flush_mode_ from the flusher task, which means we need to protect it with something, or make it an atomic type of some kind? And so you'd rather pass back the flush mode from ApplyWriteOp() to its caller to avoid taking the lock a second time? If so, it seems like that's working around the lack of "canceling" a scheduled task. That is, if we could cancel the flusher task (e.g. when the flush mode changes away from AUTO_BACKGROUND_FLUSH), it'd never need to read the flush mode. I wonder how much complexity is involved with adding such cancellation behavior; we've wanted it elsewhere and it would certainly simplify the flusher task considerably. PS22, Line 142: Nit: looks like there are two spaces here. Below too. PS22, Line 147: BATCHER_WATERMARK_NON_EMPTY Our convention for naming class constants like these "kCamelCaseFoo". We reserve ALL_UPPER_CASE for macros. PS22, Line 158: // A dedicated lock to prevent multiple threads executing the ApplyWriteOp() : // methods simultaneously. Should not be used for any other purpose. I don't understand; we've expressly forbidden multiple Apply() threads. So why bother to do this? It's just another lock to maintain; if this is to prevent people from shooting themselves in the foot, I'd rather remove it and simplify. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComment
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 22: Build Started http://104.196.14.100/job/kudu-gerrit/3072/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#22). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for pending operations reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flush logic checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * The over-the-watermark criterion: check whether the total size of the freshly submitted (i.e. not-yet-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * The maximum wait time criterion: check whether the current batch of operations has been accumulating for more than the maximum wait time. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterval() method. A KuduSession object uses RPC messenger's thread pool to monitor batches' maximum wait time. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h 11 files changed, 1,445 insertions(+), 183 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/22 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 15: (8 comments) Thank you for your feedback. I'm posting a new version in a minute. http://gerrit.cloudera.org:8080/#/c/3952/15/src/kudu/client/client.h File src/kudu/client/client.h: Line 1146: Status SetMutationBufferFlushWatermark(double watermark_pct) > I don't think it's a big deal either way. Sessions are cheap (especially no OK, that seems to be good reasoning to me. No getters then. http://gerrit.cloudera.org:8080/#/c/3952/15/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: PS15, Line 64: -1, > I don't think the existing behavior is necessarily worth preserving; an emp All right, I removed the semantics of flushing an empty batcher. That will make a difference when there is a tight limit on maximum number of batchers allowed to exist simultaneously. Line 66: PeriodicFlushTask(Status::OK(), messenger_, session); > Sorry, I don't understand what you're suggesting. Can you explain it in mor I'm sorry for that poorly articulated and poorly written piece. To clarify on that thought, I'm copy-pasting a comment from the code which implements that idea, please see below. Let's measure age of a batcher as the time elapsed from the moment of adding the very first operation into the batcher. The idea is to flush a batcher when its age is very close to the flush_interval_: let's call it 'batcher flush age'. If current batcher hasn't reached its flush age yet, just re-schedule the task to re-evaluate the age of then-will-be-current batcher, so if the current batcher is still current at that time, it will be exactly of its flush age. Doing so allows to address the issue you pointed at. Line 109: flow_control_cond_.Broadcast(); > Nope, simplifying the implementation is the underlying motivation. But I th OK, I see. It seems that after recent discussions an additional reason appeared: we want to have some sort of parity with the Java client. Line 135: CHECK_GE(timeout_ms, 0); > I think that's reasonable provided it's documented. Done PS15, Line 230: RETURN_NOT_OK(TryApplyWriteOp(write_op, &required_size, : ¤t_flush_mode, ¤t_flush_watermark)); > I don't understand why friendship factors into this: both ApplyWriteOp and I thought you were talking about calling the batcher directly. Yes, I addressed the required_size so it's no longer an output parameter of the TryApplyWriteOp() method. Line 398: data->NextBatcher(session, 0, nullptr); > I find the semantics of -1, 0, and >0 to be non-obvious. Perhaps we can con Done http://gerrit.cloudera.org:8080/#/c/3952/15/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: PS15, Line 87: On successful return, the output flush_mode parameter is set : // to the effective flush mode. > I don't understand how flush_mode_ can change in the middle of ApplyWriteOp Yes, this is a holdover. The comment was written when I was under impression we allow to have multiple threads working with a single KuduSession object. Also, calls of Apply() from different threads are not relevant here -- actually, the reason was that other thread might call KuduSession::SetFlushMode() while waiting on the condition variable. But now all those concerns are gone. We don't allow to change the mode while operations are pending and we don't allow making calls to a single KuduSession object from multiple threads. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Adar Dembo has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 15: (8 comments) http://gerrit.cloudera.org:8080/#/c/3952/15/src/kudu/client/client.h File src/kudu/client/client.h: Line 1146: Status SetMutationBufferFlushWatermark(double watermark_pct) > Will do. BTW, do we need getters for those parameters? I think getters mi I don't think it's a big deal either way. Sessions are cheap (especially now that they don't have a dedicated thread to flush in the background) and so I tend to think they're short-lived, which means it's easy for the application to remember what value was provided here. http://gerrit.cloudera.org:8080/#/c/3952/15/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: PS15, Line 64: -1, > This is to force getting a new batcher and flushing the prior, if any, even I don't think the existing behavior is necessarily worth preserving; an empty flush doesn't actually accomplish anything. In any case, that wasn't what my comment was about; I was asking about the semantic difference between '-1' and '0' as the watermark value. Line 66: PeriodicFlushTask(Status::OK(), messenger_, session); > OK, I'll change it then. Sorry, I don't understand what you're suggesting. Can you explain it in more detail? Line 109: flow_control_cond_.Broadcast(); > The original idea was to get rid of those limitations like empty buffer whe Nope, simplifying the implementation is the underlying motivation. But I think it's enough; the clients are generally complicated code and aren't as well tested as the rest of Kudu. Line 135: CHECK_GE(timeout_ms, 0); > Yep. What we can do instead is to set it to 0 if the specified parameter w I think that's reasonable provided it's documented. PS15, Line 230: RETURN_NOT_OK(TryApplyWriteOp(write_op, &required_size, : ¤t_flush_mode, ¤t_flush_watermark)); > Yep, having those facts simplifies this. Again, I was under impression the I don't understand why friendship factors into this: both ApplyWriteOp and TryApplyWriteOp are members of KuduSession::Data, so they should be equivalent w.r.t. calling Batcher::SizeInBuffer(). Line 398: data->NextBatcher(session, 0, nullptr); > Nope. It would, however, if the second parameter were -1 (as you noticed t I find the semantics of -1, 0, and >0 to be non-obvious. Perhaps we can convey the desired behavior from caller to PeriodicFlushTask() in another way? Through an additional parameter, an enum, or something like that? http://gerrit.cloudera.org:8080/#/c/3952/15/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: PS15, Line 87: On successful return, the output flush_mode parameter is set : // to the effective flush mode. > This is because the flush_mode_ can change in the middle and the code which I don't understand how flush_mode_ can change in the middle of ApplyWriteOp if the session may only be used by a single thread. Or is this a holdover from before, when it was OK for multiple threads to write to a session? -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 21: (7 comments) Thank you for the review. I addressed the issues you pointed at and will send the updated version with additional changes soon. http://gerrit.cloudera.org:8080/#/c/3952/21//COMMIT_MSG Commit Message: PS21, Line 27: va( > Interval Done http://gerrit.cloudera.org:8080/#/c/3952/21/src/kudu/client/batcher.h File src/kudu/client/batcher.h: PS21, Line 69: const scoped_refptr& error_collector, : const client::sp::weak_ptr& session, > both of these should be taken by value, and then moved into their field. Done http://gerrit.cloudera.org:8080/#/c/3952/21/src/kudu/client/client.h File src/kudu/client/client.h: PS21, Line 1143: when newly submitted : /// operations are about to overflow the buffer > when the newly applied operation would overflow the buffer Done http://gerrit.cloudera.org:8080/#/c/3952/21/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: PS21, Line 428: requied > required Done PS21, Line 451: Aplly > Apply Done http://gerrit.cloudera.org:8080/#/c/3952/21/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: PS21, Line 86: apply write operation > apply a write operation Done PS21, Line 86: Check whether it's possible > Does this not actually apply the op? If not I think the method should be r Oops, exactly. I messed up those comments, my bad. Will fix. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 21 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Dan Burkert has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 21: (7 comments) http://gerrit.cloudera.org:8080/#/c/3952/21//COMMIT_MSG Commit Message: PS21, Line 27: va( Interval http://gerrit.cloudera.org:8080/#/c/3952/21/src/kudu/client/batcher.h File src/kudu/client/batcher.h: PS21, Line 69: const scoped_refptr& error_collector, : const client::sp::weak_ptr& session, both of these should be taken by value, and then moved into their field. http://gerrit.cloudera.org:8080/#/c/3952/21/src/kudu/client/client.h File src/kudu/client/client.h: PS21, Line 1143: when newly submitted : /// operations are about to overflow the buffer when the newly applied operation would overflow the buffer http://gerrit.cloudera.org:8080/#/c/3952/21/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: PS21, Line 428: requied required PS21, Line 451: Aplly Apply http://gerrit.cloudera.org:8080/#/c/3952/21/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: PS21, Line 86: apply write operation apply a write operation PS21, Line 86: Check whether it's possible Does this not actually apply the op? If not I think the method should be renamed to 'CheckApplyWriteOp' or similar. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 21 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 21: Build Started http://104.196.14.100/job/kudu-gerrit/3023/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 21 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#21). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for pending operations reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flush logic checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * The over-the-watermark criterion: check whether the total size of the freshly submitted (i.e. not-yet-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * The maximum wait time criterion: check whether the current batch of operations has been accumulating for more than the maximum wait time. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterva() method. A KuduSession object uses RPC messenger's thread pool to monitor batches' maximum wait time. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h 11 files changed, 1,179 insertions(+), 174 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/21 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 21 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 20: Build Started http://104.196.14.100/job/kudu-gerrit/3019/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 20 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#20). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for pending operations reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flush logic checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * The over-the-watermark criterion: check whether the total size of the freshly submitted (i.e. not-yet-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * The maximum wait time criterion: check whether the current batch of operations has been accumulating for more than the maximum wait time. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterva() method. A KuduSession object uses RPC messenger's thread pool to monitor batches' maximum wait time. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h 11 files changed, 1,178 insertions(+), 174 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/20 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 20 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 18: (15 comments) http://gerrit.cloudera.org:8080/#/c/3952/18/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: PS18, Line 522: Add > Does it make sense to add a coment that callers are expected to serialize d I think the original design of the Batch class is to accommodate multiple callers of this method. http://gerrit.cloudera.org:8080/#/c/3952/16/src/kudu/client/batcher.h File src/kudu/client/batcher.h: Line 168: > typo: the the Done PS16, Line 169: ck > s/is/are/ Done http://gerrit.cloudera.org:8080/#/c/3952/18/src/kudu/client/client.cc File src/kudu/client/client.cc: Line 692: : data_(new KuduSession::Data(client, client->data_->messenger_)) { > I haven't looked at how data_ being set, just wondering if data_ could be N The data_ is allocated in the constructor of the Kudu::Client() object, so it should not be nullptr: if there is not enough memory, std::bad_alloc() should be thrown and it will be termination with SIGABRT due to the unexpected exception. PS18, Line 736: int > uint ? Good point -- I added this to be in line with SetTimeoutMillis(), but using unit would be better. Will change. PS18, Line 740: int > uint ? This comes from the interface which is already published. I'm not sure whether it's OK to change it in an incompatible way now. http://gerrit.cloudera.org:8080/#/c/3952/16/src/kudu/client/client.h File src/kudu/client/client.h: Line 1017: /// Modes of flush operations. > Is there a default mode of op here ? if so, can we mention it in comment ? Yes, there is. You could find it in the description for the AUTO_FLUSH_SYNC mode. Line 1038: /// batch is sent and the reclamed space is available for new operations. > typo: reclamed Done Line 1040: /// @todo Provide an API for the user to specify a callback to do their own > If this is not specific to this mode alone, we could move this todo in the No, this is specific only for this mode. In all other cases it's either possible to specify a callback or it's clear that the callback is not needed (because the flush operation is synchronous). Line 1044: /// (probably the messenger IO threads?). > I guess this todo is already addressed ? Yes, this is clear that it's not needed now -- will remove. http://gerrit.cloudera.org:8080/#/c/3952/19/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: PS19, Line 233: weak_session > any significance of usage of 'weak' session ? :) Yes -- this is to avoid circular dependencies later when NextBatcher is called. It's possible to use shared_ptr here as well, but it should be transformed into weak_ptr in the batcher anyway. PS19, Line 263: ( > Isn't this redundant ? We could as well place NextBatcher above ? No, it's not. Only in rare cases the additional pre-flush is needed (i.e. call of NextBatcher()). Please read on the diagram which you liked. PS19, Line 270: lock_ > I thought whole routine is serialized, so why another spinlock here ? Because there is another actor -- the background 'periodic' flush task which calls NextBatcher() and replaces batcher_. http://gerrit.cloudera.org:8080/#/c/3952/18/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: PS18, Line 126: to > s/to/for ? Done PS18, Line 165: cond_mutex_ > Trivial nit: I suggest to rename this to flow_control_cond_mutex_ which cou Good idea -- thanks! I'll rename it and leave the comment since it explains why this is different from the lock_ and gives info in which order it's necessary to acquire both, if needed (actually, the code never acquires both). -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 18 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Dinesh Bhat has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 19: (17 comments) Hi Alexey, went through this patch more from knowledge point of view, so feel free to ignore any review comments you find not useful. Also I haven't completely digested all the routines in sesion-internal.cc, so I may as well take a second pass tomorrow later if time permits. http://gerrit.cloudera.org:8080/#/c/3952/18/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: PS18, Line 522: we Does it make sense to add a coment that callers are expected to serialize during Add ? http://gerrit.cloudera.org:8080/#/c/3952/16/src/kudu/client/batcher.h File src/kudu/client/batcher.h: Line 168: typo: the the PS16, Line 169: ck s/is/are/ http://gerrit.cloudera.org:8080/#/c/3952/17/src/kudu/client/batcher.h File src/kudu/client/batcher.h: PS17, Line 122: int64_t Naive doubt here: If Load() is returning a non-atomic value we seem to be accessing the atomic value of buffer_bytes_used_ in write path vs non-atomic in read path ? http://gerrit.cloudera.org:8080/#/c/3952/18/src/kudu/client/client.cc File src/kudu/client/client.cc: Line 692: : data_(new KuduSession::Data(client, client->data_->messenger_)) { I haven't looked at how data_ being set, just wondering if data_ could be NULL since it seems to be a regular ptr. PS18, Line 736: int uint ? PS18, Line 740: int uint ? http://gerrit.cloudera.org:8080/#/c/3952/16/src/kudu/client/client.h File src/kudu/client/client.h: Line 1017: /// Modes of flush operations. Is there a default mode of op here ? if so, can we mention it in comment ? Line 1038: /// batch is sent and the reclamed space is available for new operations. typo: reclamed Line 1040: /// @todo Provide an API for the user to specify a callback to do their own If this is not specific to this mode alone, we could move this todo in the beginning perhaps ? Line 1044: /// (probably the messenger IO threads?). I guess this todo is already addressed ? http://gerrit.cloudera.org:8080/#/c/3952/18/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: PS18, Line 448: Nice !! Liked these ascii diagrams. http://gerrit.cloudera.org:8080/#/c/3952/19/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: PS19, Line 233: weak_session any significance of usage of 'weak' session ? :) PS19, Line 263: ( Isn't this redundant ? We could as well place NextBatcher above ? do { NextBatcher(); CanApplyWriteOp(&need_pre_flush); } while (need_pre_flush) PS19, Line 270: lock_ I thought whole routine is serialized, so why another spinlock here ? http://gerrit.cloudera.org:8080/#/c/3952/18/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: PS18, Line 126: to s/to/for ? PS18, Line 165: cond_mutex_ Trivial nit: I suggest to rename this to flow_control_cond_mutex_ which could be self explanatory(and remove al the comment associated). Just a suggestion, feel free to ignore this. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 19: Build Started http://104.196.14.100/job/kudu-gerrit/3016/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#19). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for pending operations reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flush logic checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * The over-the-watermark criterion: check whether the total size of the freshly submitted (i.e. not-yet-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * The maximum wait time criterion: check whether the current batch of operations has been accumulating for more than the maximum wait time. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterva() method. A KuduSession object uses RPC messenger's thread pool to monitor batches' maximum wait time. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h 11 files changed, 1,184 insertions(+), 171 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/19 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 18: Build Started http://104.196.14.100/job/kudu-gerrit/3015/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 18 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#18). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for pending operations reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flush logic checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * The over-the-watermark criterion: check whether the total size of the freshly submitted (i.e. not-yet-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * The maximum wait time criterion: check whether the current batch of operations has been accumulating for more than the maximum wait time. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterva() method. A KuduSession object uses RPC messenger's thread pool to monitor batches' maximum wait time. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h 11 files changed, 1,184 insertions(+), 171 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/18 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 18 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 17: Build Started http://104.196.14.100/job/kudu-gerrit/3014/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 17 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#17). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for pending operations reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flush logic checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * The over-the-watermark criterion: check whether the total size of the freshly submitted (i.e. not-yet-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * The maximum wait time criterion: check whether the current batch of operations has been accumulating for more than the maximum wait time. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterva() method. A KuduSession object uses RPC messenger's thread pool to monitor batches' maximum wait time. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h 11 files changed, 1,184 insertions(+), 171 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/17 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 17 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 16: Build Started http://104.196.14.100/job/kudu-gerrit/3011/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 16 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#16). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for pending operations reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flush logic checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * The over-the-watermark criterion: check whether the total size of the freshly submitted (i.e. not-yet-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * The maximum wait time criterion: check whether the current batch of operations has been accumulating for more than the maximum wait time. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterva() method. A KuduSession object uses RPC messenger's thread pool to monitor batches' maximum wait time. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h 11 files changed, 1,189 insertions(+), 170 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/16 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 16 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 15: (41 comments) Thank you for the review! Will post updated version soon. http://gerrit.cloudera.org:8080/#/c/3952/15//COMMIT_MSG Commit Message: PS15, Line 20: waterline > Nit: I think elsewhere you changed this to watermark; could you do the same Done http://gerrit.cloudera.org:8080/#/c/3952/15/src/kudu/client/client-internal.h File src/kudu/client/client-internal.h: Line 240: private: > In the PIMPL idiom, the impl class is typically completely public. Not sure I just noticed the DISSALLOW_COPY_AND_ASSIGN() does not do what it's supposed to since the private was missing. I think the idea is to protect against unintentional copying. http://gerrit.cloudera.org:8080/#/c/3952/15/src/kudu/client/client.cc File src/kudu/client/client.cc: Line 53: #include "kudu/gutil/strings/human_readable.h" > Is this used for anything? It turned out that it's not used. Will remove. Line 692: : data_(new KuduSession::Data(client, client->data_->messenger_)) { > Since the messenger is reachable through the client, why not have KuduSessi Exactly. It's not possible to declare an embedded class to be a friend (at least I don't know how to do that). I.e., KuduSession::Data is an embedded/internal class, and it's not possible to add it into the frendship list for the KuduClient class. Line 733: Status KuduSession::SetMutationBufferSpace(size_t size) { > One of the simplifying assumptions made by the Java client is that many (if OK, that's a good idea. I'll think in that direction, thanks! http://gerrit.cloudera.org:8080/#/c/3952/15/src/kudu/client/client.h File src/kudu/client/client.h: PS15, Line 1035: In this mode, the Flush() call can be used to block until a batch : /// is sent and the buffer has available space again. > To be clear, Flush() also causes a background flush that normally would hav Yes, exactly. Thank you for making this clear. Will update the comment. PS15, Line 1127: the > Nit: can drop this. Done PS15, Line 1129: pushing > Nit: sending Done PS15, Line 1130: into > Nit: to the Done PS15, Line 1133: OK > Nit: safe Done PS15, Line 1134: does not come into play > Nit: has no effect Done PS15, Line 1139: > Nit: got two spaces here, can you fix and check the rest? Done PS15, Line 1140: by be > What happened here? Oops, that were aliens, probably. PS15, Line 1141: n > And here? Done Line 1146: Status SetMutationBufferFlushWatermark(double watermark_pct) > It would also be useful to describe (in the comment) what the default water Will do. BTW, do we need getters for those parameters? I think getters might be useful here. However, having no getter for the buffer space hints that there was some deliberate decision not having getters. OK, I'll figure it out. PS15, Line 1161: OK > Nit: safe Done PS15, Line 1162: does not come into play > Nit: has no effect Done PS15, Line 1166: internal > Nit: interval Done http://gerrit.cloudera.org:8080/#/c/3952/15/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 57: flush_interval_(MonoDelta::FromMilliseconds(10)) { > The default flush interval seems really low. Isn't it something like 5s in I just thought it supposed to be that low. This is not due to a mistake in the logic which would block sometimes. I'll put 1 second here as in AsyncKuduSesion, as you mentioned. PS15, Line 64: -1, > PeriodicFlushTask uses 0, not -1. Is there a significance to this differenc This is to force getting a new batcher and flushing the prior, if any, even if the prior hasn't any buffered data. That was the default behavior prior to my modifications here: there is a unit test which verifies that upon a call of FlushAsync() even the empty batcher is 'flushed'. Line 66: PeriodicFlushTask(Status::OK(), messenger_, session); > I don't think this is quite the periodic semantics we want. OK, I'll change it then. Basically, the idea is to run the periodic task as it, making re-scheduling itself, but check against the latest flush time when the task executes. There, schedule the task to run next after (last_flush_time + interval - now) interval, so next periodic flush happens not earlier than the period of the background flush task. Does this sound good? Line 74: bytes_flushed = batcher->buffer_bytes_used(); > Pull this out of the lock, presumably it's only flushed_batchers_ that need Done PS15, Line 81: there might be several data pushers > How? Shouldn't there be at most one outstanding thread in Apply(), since on David mentioned that in the very first review of my changes, at least I understood it like 'make sure there is a test which verifies it's safe to call Apply() from multiple threads'. And I implemented that test, actually a couple of th
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Adar Dembo has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 15: (41 comments) Here's my first pass. http://gerrit.cloudera.org:8080/#/c/3952/15//COMMIT_MSG Commit Message: PS15, Line 20: waterline Nit: I think elsewhere you changed this to watermark; could you do the same here, for consistency? http://gerrit.cloudera.org:8080/#/c/3952/15/src/kudu/client/client-internal.h File src/kudu/client/client-internal.h: Line 240: private: In the PIMPL idiom, the impl class is typically completely public. Not sure what privatizing these deleted constructors/operators buys us. http://gerrit.cloudera.org:8080/#/c/3952/15/src/kudu/client/client.cc File src/kudu/client/client.cc: Line 53: #include "kudu/gutil/strings/human_readable.h" Is this used for anything? Line 692: : data_(new KuduSession::Data(client, client->data_->messenger_)) { Since the messenger is reachable through the client, why not have KuduSession::Data do it in the constructor? Is it to avoid more friendship? Line 733: Status KuduSession::SetMutationBufferSpace(size_t size) { One of the simplifying assumptions made by the Java client is that many (if not all) of these setters may only be called when there are no pending operations. I suggest we do the same here; I think it'll obviate the need to broadcast on flow_control_cond_, for one. http://gerrit.cloudera.org:8080/#/c/3952/15/src/kudu/client/client.h File src/kudu/client/client.h: PS15, Line 1035: In this mode, the Flush() call can be used to block until a batch : /// is sent and the buffer has available space again. To be clear, Flush() also causes a background flush that normally would have happened at some point in the near future to happen _now_, right? PS15, Line 1127: the Nit: can drop this. PS15, Line 1129: pushing Nit: sending PS15, Line 1130: into Nit: to the PS15, Line 1133: OK Nit: safe PS15, Line 1134: does not come into play Nit: has no effect PS15, Line 1139: Nit: got two spaces here, can you fix and check the rest? PS15, Line 1140: by be What happened here? PS15, Line 1141: n And here? Line 1146: Status SetMutationBufferFlushWatermark(double watermark_pct) It would also be useful to describe (in the comment) what the default watermark percentage is. PS15, Line 1161: OK Nit: safe PS15, Line 1162: does not come into play Nit: has no effect PS15, Line 1166: internal Nit: interval http://gerrit.cloudera.org:8080/#/c/3952/15/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 57: flush_interval_(MonoDelta::FromMilliseconds(10)) { The default flush interval seems really low. Isn't it something like 5s in Java? Why so low here? PS15, Line 64: -1, PeriodicFlushTask uses 0, not -1. Is there a significance to this difference? If not, can you make them the same? Line 66: PeriodicFlushTask(Status::OK(), messenger_, session); I don't think this is quite the periodic semantics we want. Take a look at the Java client. It only schedules a background flush when the first op is applied. The advantage of that approach is that if you create a session, wait (flush_interval - 1) ms, and apply an operation, you will have to wait an additional flush_interval ms before your operation is automatically flushed. With this approach, you'll get the background flush happening almost immediately, giving the client no time to accumulate more operations into the buffer. Line 74: bytes_flushed = batcher->buffer_bytes_used(); Pull this out of the lock, presumably it's only flushed_batchers_ that needs to be protected. PS15, Line 81: there might be several data pushers How? Shouldn't there be at most one outstanding thread in Apply(), since one KuduSession is meant for one writing thread? PS15, Line 101: flusher Not clear what a "flusher" is (now that there's no dedicated thread). Line 102: std::lock_guard l(cond_mutex_); Why is this taken here? flush_mode_ is not protected by cond_mutex_. Now that I take another look, it seems flush_mode_ can be accessed by the reactor thread in PeriodicFlushTask(), so it does need some synchronization, but it's not being applied consistently yet (nor is it documented). Line 109: flow_control_cond_.Broadcast(); Why do this? We've guaranteed there are no pending operations, so who would care about this broadcast? PS15, Line 115: // The data flow control logic needs to know if the buffer limit changes. : // There might be several threads calling KuduSession::Apply(). Let's simplify by preventing buffer size changes when there are pending operations. Also, if it wasn't already obvious, remember that KuduSession is inherently single-threaded: clients must use a separate KuduSession for each writing thread. Line 122: CHECK_GE(watermark_pct, 0); This seems harsh for a c
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 30: DEFINE_int32(client_buffer_bytes_limit, 7 * 1024 * 1024, > The symbol hiding logic is only on Linux, which is, amongst other reasons, OK, I see. Thanks! -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Adar Dembo has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 30: DEFINE_int32(client_buffer_bytes_limit, 7 * 1024 * 1024, > Thank you for the heads-up! The symbol hiding logic is only on Linux, which is, amongst other reasons, why we don't actually ship a macOS C++ client. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 30: DEFINE_int32(client_buffer_bytes_limit, 7 * 1024 * 1024, > Chiming in late, but I agree with Dan: gflags are totally unusable to C++ c Thank you for the heads-up! Just to verify that the google flags in kudu_client library is not visible to the application it's linking with, I modified the sample.cc -- basically, added DECLARE_int32 and called ParseCommandLineFlags() and HandleCommandLineHelpFlags(). From that I see those flags are available for the client application. At list I see it on MacOS X. I can check that on Linux, if needed. If it turns out the flags are available via the kudu_client library, do we consider it as an issue? -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#15). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. KuduSession objects use RPC messenger's thread pool to schedule background flush tasks. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for operations not pushed to server reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flusher checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * the over-the-waterline criterion: check whether the total size of the freshly submitted (i.e. not-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * the maximum wait time criterion: check whether the previous flush happened more than the maximum wait time ago. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterva() method. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h 11 files changed, 1,115 insertions(+), 158 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/15 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 15: Build Started http://104.196.14.100/job/kudu-gerrit/2997/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#14). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. KuduSession objects use RPC messenger's thread pool to schedule background flush tasks. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for operations not pushed to server reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flusher checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * the over-the-waterline criterion: check whether the total size of the freshly submitted (i.e. not-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * the maximum wait time criterion: check whether the previous flush happened more than the maximum wait time ago. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterva() method. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h 11 files changed, 1,115 insertions(+), 158 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/14 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 14: Build Started http://104.196.14.100/job/kudu-gerrit/2990/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#13). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. KuduSession objects use RPC messenger's thread pool to schedule background flush tasks. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for operations not pushed to server reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flusher checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * the over-the-waterline criterion: check whether the total size of the freshly submitted (i.e. not-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * the maximum wait time criterion: check whether the previous flush happened more than the maximum wait time ago. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterva() method. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h M src/kudu/util/monotime.cc M src/kudu/util/monotime.h 13 files changed, 1,121 insertions(+), 160 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/13 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 13: Build Started http://104.196.14.100/job/kudu-gerrit/2989/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 12: Build Started http://104.196.14.100/job/kudu-gerrit/2988/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#12). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. KuduSession objects use RPC messenger's thread pool to schedule background flush tasks. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for operations not pushed to server reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flusher checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * the over-the-waterline criterion: check whether the total size of the freshly submitted (i.e. not-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * the maximum wait time criterion: check whether the previous flush happened more than the maximum wait time ago. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterva() method. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h M src/kudu/util/monotime.cc M src/kudu/util/monotime.h 13 files changed, 1,124 insertions(+), 160 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/12 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 11: Build Started http://104.196.14.100/job/kudu-gerrit/2984/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#11). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. KuduSession objects use RPC messenger's thread pool to schedule background flush tasks. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for operations not pushed to server reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method. The background flusher checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * the over-the-waterline criterion: check whether the total size of the freshly submitted (i.e. not-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the KuduSession::SetMutationBufferFlushWatermark() method. * the maximum wait time criterion: check whether the previous flush happened more than the maximum wait time ago. The maximum wait time can be specified in milliseconds using the KuduSession::SetMutationBufferFlushInterva() method. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h M src/kudu/util/monotime.cc M src/kudu/util/monotime.h 13 files changed, 1,124 insertions(+), 160 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/11 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 421: flow_control_cond_.Wait(); > The condition awaits till total size of pending operations drops below the I meant both the periodic- and waterline-triggerd flush will notify on update of total_bytes_used_, if any happened. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 10: Build Started http://104.196.14.100/job/kudu-gerrit/2983/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#10). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. KuduSession objects use RPC messenger's thread pool to schedule background flush tasks. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for operations not pushed to server reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method or via the --client_buffer_bytes_limit command-line flag. The background flusher checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * over-the-waterline criterion: check whether the total size of the freshly submitted (i.e. not-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the --client_buffer_flush_waterline_pct command-line flag. * maximum wait time criterion: check whether the previous flush happened more than the maximum wait time ago. The maximum wait time can be specified in milliseconds using the --client_buffer_flush_timeout_ms command-line flag. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h M src/kudu/util/monotime.cc M src/kudu/util/monotime.h 13 files changed, 1,149 insertions(+), 160 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/10 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 9: (22 comments) http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/batcher.h File src/kudu/client/batcher.h: Line 64: // Takes a reference on error_collector. Creates a weak_ptr to 'session'. > Could you update this comment, especially WRT the error_collector? I can't Done Line 115: static int64_t GetOperationSizeInBuffer(KuduWriteOperation* write_op); > This doesn't seem too useful since it just calls write_op->SizeInBuffer(). I didn't want to introduce an additional KuduSession::Data friend to the Batcher class (and it's not possible because Data is a nested class in the KuduSession class). Probably, there is a more elegant solution for this. Basically, I want to get wire size for KuduWriteOperation. http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: PS9, Line 555: session > the session Done PS9, Line 557: maximum > the maximum Done PS9, Line 557: 10x multiplier > 10x seems like a lot of leeway. Could we get away with 2 or 3x? I would c Done PS9, Line 2221: Status s; > looks like everywhere this is used the call could instead be wrapped in ASS Somehow I overlooked that :) Thanks! PS9, Line 2307: scenrio > 'scenario' here and below Done PS9, Line 2478: previos > previous Done http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 36: DEFINE_int32(client_buffer_bytes_limit, 7 * 1024 * 1024, > Remove this and use SetMutationBufferSpace instead. Done Line 49: DEFINE_int32(client_buffer_flush_watermark_pct, 75, > Instead of a gflag, this should be an option on KuduSession, like SetMutati Done Line 61: DEFINE_int32(client_buffer_flush_timeout_ms, 10, > Same here, this should be an option on KuduSession. https://kudu.apache.or Done PS9, Line 82: : > wrap init list. Done Line 115: messenger->ScheduleOnReactor( > It's not clear to me why this needs to be scheduled on a reactor thread. I Good point! The original design had a separate thread which was busy with all flush-related activity, and I just transferred it on the new scheme with messenger/reactor threads. This should not be blocking, so I can safely call it from the same thread where CheckAndRun is being executed. PS9, Line 154: if (PREDICT_FALSE(!status.ok())) { : return; : } : // Check that the session is still alive to access the data safely. : sp::shared_ptr session(weak_session.lock()); : if (PREDICT_FALSE(!session)) { : return; : } : // Flush mode could change during the operation. If current mode : // is no longer AUTO_FLUSH_BACKGROUND, then stop re-scheduling the task. : if (PREDICT_FALSE(data_->flush_mode_ != AUTO_FLUSH_BACKGROUND)) { : return; : } : // Detach the batcher, schedule its flushing, : // and install a new one to accumulate incoming write operations. : data_->NextBatcher(weak_session, 0, nullptr); > Looks like this might be able to call Run instead of duplicating? Done PS9, Line 205: MutexLock > prefer to use std::lock_guard if possible. Done PS9, Line 392: int64_t* op_raw_size > is this still necessary now that the size is cached? yep -- that's the issue of visibility of a private method. I cannot make an internal class a friend to the Batcher class. Line 421: flow_control_cond_.Wait(); > Is a flush forced somewhere in this situation, or is it just waiting for th The condition awaits till total size of pending operations drops below the limit. This is enforced by the logic of the code. If no updates happen on total_bytes_used_, this will wait forever. http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: PS9, Line 111: BACKGFOUND > BACKGROUND Done Line 126: ~BackgroundFlusher(); > I think this can use the default keyword instead of defining the no-op dtor OK, I just removed it to rely on auto-generated ones. Line 136: void CheckAndRun(const std::shared_ptr& messenger, > Could you doc this? I can't tell exactly what it's for, or why it chooses Renamed and added more comments. PS9, Line 151: KuduSession::Data* const > It looks like all of the methods on the background flusher that do any work It's a reference (a pointer, actually) to the outer/enclosing object. It's possible to keep those weak pointers as members of the BackgroundFlusher, but when a BackgroundFlusher object, as a sub-object of KuduSession::Data, is deallocated on destruction of the enclosing KuduSession::Data object, those members cannot
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Adar Dembo has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 30: DEFINE_int32(client_buffer_bytes_limit, 7 * 1024 * 1024, > Yes, this reasoning makes sense to me. Since David suggested to use glags Chiming in late, but I agree with Dan: gflags are totally unusable to C++ client applications. The client build hides all classes and functions that aren't marked as KUDU_EXPORT. This means DECLARE_foo() gflags will get hidden and won't be accessible to the linking application, preventing that application from using DEFINE_foo() and manipulating the flag value at runtime. Meaning, even if the application went through the trouble of linking its own copy of gflags and running google::ParseCommandLineFlags(), it wouldn't "see" the flags declared in the C++ client and any flag values passed on the command line will be treated as unknown. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Dan Burkert has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 9: (22 comments) http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/batcher.h File src/kudu/client/batcher.h: Line 64: // Takes a reference on error_collector. Creates a weak_ptr to 'session'. Could you update this comment, especially WRT the error_collector? I can't tell if the intent is that this constructor takes ownership of the error collector, or if it's borrowed. Line 115: static int64_t GetOperationSizeInBuffer(KuduWriteOperation* write_op); This doesn't seem too useful since it just calls write_op->SizeInBuffer(). Any reason to prefer having it on the batcher? http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: PS9, Line 555: session the session PS9, Line 557: 10x multiplier 10x seems like a lot of leeway. Could we get away with 2 or 3x? I would consider it a bug if we routinely took 10x longer than the deadline. PS9, Line 557: maximum the maximum PS9, Line 2221: Status s; looks like everywhere this is used the call could instead be wrapped in ASSERT_OK. I think that also tends to give better error messages on failure. PS9, Line 2307: scenrio 'scenario' here and below PS9, Line 2478: previos previous http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 36: DEFINE_int32(client_buffer_bytes_limit, 7 * 1024 * 1024, Remove this and use SetMutationBufferSpace instead. Line 49: DEFINE_int32(client_buffer_flush_watermark_pct, 75, Instead of a gflag, this should be an option on KuduSession, like SetMutationBufferSpace. This also mirrors the equivalent Java options: https://kudu.apache.org/apidocs/org/kududb/client/AsyncKuduSession.html#setMutationBufferLowWatermark-float- Line 61: DEFINE_int32(client_buffer_flush_timeout_ms, 10, Same here, this should be an option on KuduSession. https://kudu.apache.org/apidocs/org/kududb/client/AsyncKuduSession.html#setFlushInterval-int- PS9, Line 82: : wrap init list. Line 115: messenger->ScheduleOnReactor( It's not clear to me why this needs to be scheduled on a reactor thread. I get that you don't want to block the user thread on the apply call, but if BackgroundFlusher::Run blocks, it's not appropriate to be running on the reactor thread anyway. PS9, Line 154: if (PREDICT_FALSE(!status.ok())) { : return; : } : // Check that the session is still alive to access the data safely. : sp::shared_ptr session(weak_session.lock()); : if (PREDICT_FALSE(!session)) { : return; : } : // Flush mode could change during the operation. If current mode : // is no longer AUTO_FLUSH_BACKGROUND, then stop re-scheduling the task. : if (PREDICT_FALSE(data_->flush_mode_ != AUTO_FLUSH_BACKGROUND)) { : return; : } : // Detach the batcher, schedule its flushing, : // and install a new one to accumulate incoming write operations. : data_->NextBatcher(weak_session, 0, nullptr); Looks like this might be able to call Run instead of duplicating? PS9, Line 205: MutexLock prefer to use std::lock_guard if possible. PS9, Line 392: int64_t* op_raw_size is this still necessary now that the size is cached? Line 421: flow_control_cond_.Wait(); Is a flush forced somewhere in this situation, or is it just waiting for the periodic flush? http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: PS9, Line 111: BACKGFOUND BACKGROUND Line 126: ~BackgroundFlusher(); I think this can use the default keyword instead of defining the no-op dtor in session-internal.cc. Line 136: void CheckAndRun(const std::shared_ptr& messenger, Could you doc this? I can't tell exactly what it's for, or why it chooses to run the flush on the reactor thread with no wait. PS9, Line 151: KuduSession::Data* const It looks like all of the methods on the background flusher that do any work take a weak_ptr. I can't quite follow why this raw ptr is held, and the methods take the weak_ptr passed in. Could a weak_ptr instead be passed in as part of the ctor and stored in a field, and then the methods above could use that? I think that would also remove the dtor ordering thing below which is pretty fragile, since weak_ptrs can be checked before use. PS9, Line 159: write operation the write operation -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-P
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 9: Build Started http://104.196.14.100/job/kudu-gerrit/2968/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#9). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. KuduSession objects use RPC messenger's thread pool to schedule background flush tasks. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for operations not pushed to server reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method or via the --client_buffer_bytes_limit command-line flag. The background flusher checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * over-the-waterline criterion: check whether the total size of the freshly submitted (i.e. not-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the --client_buffer_flush_waterline_pct command-line flag. * maximum wait time criterion: check whether the previous flush happened more than the maximum wait time ago. The maximum wait time can be specified in milliseconds using the --client_buffer_flush_timeout_ms command-line flag. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h M src/kudu/util/monotime.cc M src/kudu/util/monotime.h 13 files changed, 1,162 insertions(+), 154 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/9 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/client/client.cc File src/kudu/client/client.cc: Line 400: return shared_ptr(); > I'm suggesting that maybe it's not needed, because the operation is infalli That's true -- the method returned Status since I brought it from the previous revision where I used a separate flusher thread, and Start() could return an error if it could not start the flusher thread. Thank you for pointing at that -- I removed the Status return code, so now it returns void. http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 30: DEFINE_int32(client_buffer_bytes_limit, 7 * 1024 * 1024, > We don't want to impose a configuration framework on clients. Imagine if t Yes, this reasoning makes sense to me. Since David suggested to use glags for those parameters, I though that was a good idea. Probably, David did not mean to expose those to clients. But then the only usage for them is throughout the tests. OK, that sounds good to me. If there is a necessity to expose those to client applications, we can add that later. Line 98: messenger->ScheduleOnReactor( > I'm not quite following. It sounds like the C++ client does not have an eq This code is about determine whether to perform flush. This is not about determine whether to block (that when-to-block code you could find in KuduSession::Data::ApplyWriteOp()). The waterline/watermark criterion is to determine when it's necessary to flush the current batcher. The idea is too check how much of the buffer is taken by the freshly submitted (not yet scheduled for flush) operations: if it's more than the specified percentage, then flush the current batcher. Only the current batcher has freshly submitted (not yet scheduled for flush) operations. In my previous comment, I tried to point to the presence of batchers in the C++ client code. Probably, the confusion is due to the fact that Java implementation does not have those. In C++ client, there might be several batchers accommodating pending operations. However, the limit on the total size of pending operations is enforced -- in total, all currently active batchers cannot consume more space for their pending operations than the limit specified via the KuduSession::SetMutationBufferSpace() method. Does it help to resolve the confusion (if any)? http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/util/monotime.h File src/kudu/util/monotime.h: Line 133: friend MonoTime operator -(const MonoTime& t, const MonoDelta& delta); > Thanks! Actually, thank you for pointing to this. For some reason I overlooked the fact that having small independent patches would benefit everyone. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Dan Burkert has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 6: (5 comments) http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/client/client.cc File src/kudu/client/client.cc: Line 400: return shared_ptr(); > OK, that sounds reasonable. Do I understand correctly that you suggest to I'm suggesting that maybe it's not needed, because the operation is infallible. It looks like you added Status returns instead of void in a bunch of places, but I couldn't find any places which could actually fail. http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 30: DEFINE_int32(client_buffer_bytes_limit, 7 * 1024 * 1024, > Why not? If we exposed them in the documentation, a user could link a bina We don't want to impose a configuration framework on clients. Imagine if the application used multiple libraries, all of which came with their own configuration framework. This is a real issue in Hadoop where the HDFS client libraries ship with a configuration framework, and it's an absolute disaster. PS6, Line 91: waterline > That's fun, because originally I had put 'watermark' and then figured that Yah, I just suggest watermark to keep it consistent with the Java impl. In reality the words are synonymous. Line 98: messenger->ScheduleOnReactor( > Yes, the Java implementation is different -- it does not have batchers and I'm not quite following. It sounds like the C++ client does not have an equivalent to the PleaseThrottle exception, and instead we block when there are no buffers available. That makes sense. But why in that circumstance is there a watermark? Why would you want to block earlier than necessary? http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/util/monotime.h File src/kudu/util/monotime.h: Line 133: friend MonoTime operator -(const MonoTime& t, const MonoDelta& delta); > I added them here because the tests I added rely on those. Thanks! -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#8). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. KuduSession objects use RPC messenger's thread pool to schedule background flush tasks. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for operations not pushed to server reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method or via the --client_buffer_bytes_limit command-line flag. The background flusher checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * over-the-waterline criterion: check whether the total size of the freshly submitted (i.e. not-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the --client_buffer_flush_waterline_pct command-line flag. * maximum wait time criterion: check whether the previous flush happened more than the maximum wait time ago. The maximum wait time can be specified in milliseconds using the --client_buffer_flush_timeout_ms command-line flag. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h M src/kudu/util/monotime.cc M src/kudu/util/monotime.h 13 files changed, 1,209 insertions(+), 154 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/8 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 8: Build Started http://104.196.14.100/job/kudu-gerrit/2940/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 6: (7 comments) Thank you for the review! I posted a new version (patchset 7) because the LINT build failed for patchset 6, so I wanted to fix that. http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/client/client.cc File src/kudu/client/client.cc: Line 400: return shared_ptr(); > I've traced this through, and I can't find any place that this can actually OK, that sounds reasonable. Do I understand correctly that you suggest to remove this check at all? http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/client/client.h File src/kudu/client/client.h: PS6, Line 1029: be > s/be/become Done http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 30: DEFINE_int32(client_buffer_bytes_limit, 7 * 1024 * 1024, > I think gflags don't apply to the client, so these will not really be confi Why not? If we exposed them in the documentation, a user could link a binary with the library having those flags exposed. But I agree: making those accessible via the client API would be easier to use, otherwise those are left hidden. However, I don't like idea of mutators for there -- I would better set them in the constructor and keep then unchanged. Probably, those might be properties of some sort of configuration object. How do you like the following: I'll mark those as hidden flags, which we will use only for testing purposes. Later on, we can expose controls for those up to the client API level. Does this make sense? Line 59: KuduSession::Data::BackgroundFlusher::BackgroundFlusher( > formatting (no wrap on initial argument, and 4 spaces for initializer list) OK, will fix. PS6, Line 91: waterline > We typically use 'watermark' instead of 'waterline' (at least in the Java i That's fun, because originally I had put 'watermark' and then figured that that was not the best term for that: https://en.wikipedia.org/wiki/Watermark_(disambiguation) OK, I'll replace it with watermark. Line 98: messenger->ScheduleOnReactor( > This looks like it's flushing the current buffer if it's over the waterline Yes, the Java implementation is different -- it does not have batchers and other stuff. Also, here the waterline is about amount of freshly added operations, not total size of the pending operations. http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/util/monotime.h File src/kudu/util/monotime.h: Line 133: friend MonoTime operator -(const MonoTime& t, const MonoDelta& delta); > Maybe split the changes to MonoTime into its own review. I added them here because the tests I added rely on those. All right, having that as a separate change will be better from codereview's point. Will separate into a stand-alone change. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Dan Burkert has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 7: (7 comments) Incremental review, since I see a new revision has been put up. I don't think this is going to make 0.10, so I'm going to preempt it in the meantime. http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/client/client.cc File src/kudu/client/client.cc: Line 400: return shared_ptr(); I've traced this through, and I can't find any place that this can actually fail. I think as it's written will be quite problematic, I can see this being a common source of nullptr dereferences (if in fact there are cases where it can fail). http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/client/client.h File src/kudu/client/client.h: PS6, Line 1029: be s/be/become http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 30: I think gflags don't apply to the client, so these will not really be configurable. I think setting the default to be a constant and exposing mutators on the Session instance is the best we can do. Line 59: formatting (no wrap on initial argument, and 4 spaces for initializer list). https://google.github.io/styleguide/cppguide.html#Constructor_Initializer_Lists. A lot of the methods below are wrapped too much as well. PS6, Line 91: We typically use 'watermark' instead of 'waterline' (at least in the Java impl). Line 98: const bool over_waterline = This looks like it's flushing the current buffer if it's over the waterline, but that's not exactly what the Java client does. If a write comes in over the waterline, a PleaseThrottle exception is passed back to the user. Also, this only ever happens if there isn't a free buffer available. http://gerrit.cloudera.org:8080/#/c/3952/6/src/kudu/util/monotime.h File src/kudu/util/monotime.h: Line 133: friend MonoTime operator -(const MonoTime& t, const MonoDelta& delta); Maybe split the changes to MonoTime into its own review. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#7). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. KuduSession objects use RPC messenger's thread pool to schedule background flush tasks. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for operations not pushed to server reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method or via the --client_buffer_bytes_limit command-line flag. The background flusher checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * over-the-waterline criterion: check whether the total size of the freshly submitted (i.e. not-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the --client_buffer_flush_waterline_pct command-line flag. * maximum wait time criterion: check whether the previous flush happened more than the maximum wait time ago. The maximum wait time can be specified in milliseconds using the --client_buffer_flush_timeout_ms command-line flag. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h M src/kudu/util/monotime.cc M src/kudu/util/monotime.h 13 files changed, 1,221 insertions(+), 155 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/7 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 7: Build Started http://104.196.14.100/job/kudu-gerrit/2920/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 6: Build Started http://104.196.14.100/job/kudu-gerrit/2916/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#6). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. KuduSession objects use RPC messenger's thread pool to schedule background flush tasks. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for operations not pushed to server reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method or via the --client_buffer_bytes_limit command-line flag. The background flusher checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * over-the-waterline criterion: check whether the total size of the freshly submitted (i.e. not-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the --client_buffer_flush_waterline_pct command-line flag. * maximum wait time criterion: check whether the previous flush happened more than the maximum wait time ago. The maximum wait time can be specified in milliseconds using the --client_buffer_flush_timeout_ms command-line flag. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/write_op.cc M src/kudu/client/write_op.h M src/kudu/util/env_posix.cc M src/kudu/util/monotime.cc M src/kudu/util/monotime.h 14 files changed, 1,218 insertions(+), 155 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/6 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#5). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. KuduSession objects use RPC messenger's thread pool to schedule background flush tasks. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for operations not pushed to server reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method or via the --client_buffer_bytes_limit command-line flag. The background flusher checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * over-the-waterline criterion: check whether the total size of the freshly submitted (i.e. not-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the --client_buffer_flush_waterline_pct command-line flag. * maximum wait time criterion: check whether the previous flush happened more than the maximum wait time ago. The maximum wait time can be specified in milliseconds using the --client_buffer_flush_timeout_ms command-line flag. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/util/env_posix.cc M src/kudu/util/monotime.cc M src/kudu/util/monotime.h 12 files changed, 1,138 insertions(+), 151 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/5 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 5: Build Started http://104.196.14.100/job/kudu-gerrit/2896/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#4). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. KuduSession objects use RPC messenger's thread pool to schedule background flush tasks. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for operations not pushed to server reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method or via the --client_buffer_bytes_limit command-line flag. The background flusher checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * over-the-waterline criterion: check whether the total size of the freshly submitted (i.e. not-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the --client_buffer_flush_waterline_pct command-line flag. * maximum wait time criterion: check whether the previous flush happened more than the maximum wait time ago. The maximum wait time can be specified in milliseconds using the --client_buffer_flush_timeout_ms command-line flag. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/util/monotime.cc M src/kudu/util/monotime.h M src/kudu/util/stopwatch.h 12 files changed, 990 insertions(+), 151 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/4 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 4: Build Started http://104.196.14.100/job/kudu-gerrit/2861/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#3). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. KuduSession objects use RPC messenger's thread pool to schedule background flush tasks. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for operations not pushed to server reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method or via the --client_buffer_bytes_limit command-line flag. The background flusher checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * over-the-waterline criterion: check whether the total size of the freshly submitted (i.e. not-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the --client_buffer_flush_waterline_pct command-line flag. * maximum wait time criterion: check whether the previous flush happened more than the maximum wait time ago. The maximum wait time can be specified in milliseconds using the --client_buffer_flush_timeout_ms command-line flag. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/util/monotime.cc M src/kudu/util/monotime.h M src/kudu/util/stopwatch.h 12 files changed, 970 insertions(+), 151 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/3 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 3: Build Started http://104.196.14.100/job/kudu-gerrit/2860/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/2837/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3952 to look at the new patch set (#2). Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for the Kudu C++ client library. KuduSession objects use RPC messenger's thread pool to schedule background flush tasks. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for operations not pushed to server reaches the buffer size limit. The limit on the buffer size can be set by the KuduSession::SetMutationBufferSpace() method or via the --client_buffer_bytes_limit command-line flag. The background flusher checks whether at least one of the following two conditions is satisfied to determine whether it's time to flush the accumulated write operations: * over-the-waterline criterion: check whether the total size of the freshly submitted (i.e. not-scheduled-for-flush) write operations is over the threshold. The threshold can be set as a percentage of the total buffer size using the --client_buffer_flush_waterline_pct command-line flag. * maximum wait time criterion: check whether the previous flush happened more than the maximum wait time ago. The maximum wait time can be specified in milliseconds using the --client_buffer_flush_timeout_ms command-line flag. This change also addresses the following JIRA issue: KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/util/monotime.cc M src/kudu/util/monotime.h 11 files changed, 966 insertions(+), 148 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/2 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has abandoned this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Abandoned Abandoned in favor of https://gerrit.cloudera.org/#/c/3952/ The new approach uses RPC messenger thread pool instead of spawning a thread per KuduSession object. -- To view, visit http://gerrit.cloudera.org:8080/3668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Idfde05bef342db24990c6e3da3b0270c3bb37a9d Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/2827/ -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Alexey Serbin has uploaded a new change for review. http://gerrit.cloudera.org:8080/3952 Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode Implemented AUTO_FLUSH_BACKGROUND for C++ client library. KuduSession instance uses RPC messenger's thread pool to schedule background flush tasks. In AUTO_FLUSH_BACKGROUND mode, the KuduSession::Apply() method blocks if total amount of data for operations not pushed to server reaches limit on the buffer size. The limit on the buffer size can be set by KuduSession::SetMutationBufferSpace() or via the --client_buffer_bytes_limit command-line flag. The background flusher uses high-watermark approach for its flush criterion: it counts buffer size used by not-yet-flushed (i.e. freshly submitted) write operations and checks whether the specified threshold is reached. The threshold can be set via the --client_buffer_hw_pct command-line flag as a percentage of the total buffer space. This change also addresses the following JIRA issue(s): KUDU-1376 KuduSession::SetMutationBufferSpace is not defined Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 --- M python/kudu/tests/test_client.py M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.h M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/util/monotime.cc M src/kudu/util/monotime.h 11 files changed, 951 insertions(+), 149 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3952/1 -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin
[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode
Kudu Jenkins has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode .. Patch Set 8: Build Started http://104.196.14.100/job/kudu-gerrit/2811/ -- To view, visit http://gerrit.cloudera.org:8080/3668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idfde05bef342db24990c6e3da3b0270c3bb37a9d Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No