Zoltan Chovan has posted comments on this change. (http://gerrit.cloudera.org:8080/22867)

Change subject: Add option to send no-op heartbeat operations batched PART1
......................................................................


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/22867/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22867/5//COMMIT_MSG@10
PS5, Line 10: CPU and networking
maybe "CPU and network resources"?


http://gerrit.cloudera.org:8080/#/c/22867/5//COMMIT_MSG@34
PS5, Line 34: in
at


http://gerrit.cloudera.org:8080/#/c/22867/5//COMMIT_MSG@41
PS5, Line 41: sigle
single


http://gerrit.cloudera.org:8080/#/c/22867/5//COMMIT_MSG@44
PS5, Line 44: hearthbeat is waiting
heartbeats are waiting


http://gerrit.cloudera.org:8080/#/c/22867/5//COMMIT_MSG@44
PS5, Line 44: write
writes


http://gerrit.cloudera.org:8080/#/c/22867/5//COMMIT_MSG@59
PS5, Line 59: Next 2 parts that are needed:
            :
            : 1) Process response in multiple threads:
            :   If we start to write multiple tablets at the same time that are in the
            :   same buffer, then after the flush, when their responses arrive, the new
            :   heartbeats (with operations) will be sent out on the same thread
            :   (unbatched, so their responses will be multi-threaded, and Kudu will
            :   return back to normal).
            :
            :   This would rarely cause problems on a usual cluster. However, if you
            :   have a 3 tserver setup with a single table having 30 tablets with hash
            :   partitions, it can add multiple seconds of delay to the write
            :   operation (but not increase the overall CPU consumption): If all 30
            :   heartbeats are waiting in the buffer, one of the writes will flush it.
            :   When the response arrives back, we will process it in a single thread.
            :   We will send out 30 updates with actual operations on this single
            :   thread.
            :
            :   Possible solution:
            :   + Keep track if we are called in batch mode and if there was already
            :     1-2 "send_more_immediately" cases, then request a callback instead
            :     of sending the message immediately.
            :
            : 2) If a write request finds a no-op message still in the
            :   buffer, it should discard it, not flush the buffer. It would make the
            :   problem in 1) appear much less frequently (and stabilize the unit
            :   tests that are now flaky with enable_multi_raft_heartbeat_batcher=1),
            :   so this should be done after 1) is implemented (so we do not hide it).
            :
This seems to duplicate the previous part.



--
To view, visit http://gerrit.cloudera.org:8080/22867

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie92ba4de5eae00d56cd513cb644dce8fb6e14538
Gerrit-Change-Number: 22867
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Martonka
Gerrit-Reviewer: Abhishek Chennaka
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Zoltan Chovan
Gerrit-Comment-Date: Wed, 28 May 2025 09:25:09 +0000
Gerrit-HasComments: Yes
