Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/17129 )
Change subject: KUDU-3248: Match C++ replica selection behavior of Java client
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG@15
PS3, Line 15: This is expected to be a good balance
> How do we know? What if a single process does a lot of full table scans?
That single process is itself not distributed in nature at that point, and I would expect that, if it is doing multiple full scans of the same tablet, picking the already-warm tablet would be a good choice. IMPALA-10481 describes why our current choice is suboptimal, and this change aims to address that concern. We do plan to performance test and validate this change on the Impala side as well.

The one case I could see where a single process might want to use multiple tablet replicas is when the tablet splitting feature is used on the same process, though there hasn't been a lot of performance evaluation of that yet. IMPALA-9792 is still open, tracking that evaluation from the Impala side (which is the only known usage today).


http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc
File src/kudu/client/client-internal.cc:

http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc@191
PS3, Line 191: still benefit from spreading the load across replicas
> I meant we already have a recommendation to have just a single client per a
I didn't have a good reason to suspect that per-client would necessarily be a better choice than per-process, especially considering we often see a single client per process anyway. The benefit of per-process is that it matches the current Java behavior, and it means that multiple clients can be used without impacting the selection affinity on a given process. I could see a world where both are beneficial, but I didn't want to overcomplicate the change.

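As a rough illustration of the per-process affinity described above (a sketch under stated assumptions, not the code in this patch or the Kudu client API): the names ProcessSeed, SelectReplicaIndex, and FakeClient are hypothetical, and the hash mixing is just one simple way to turn a seed and a tablet ID into a stable index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <string>

// Hypothetical process-wide seed: generated once, then shared by every
// client object living in this process. Assumption for illustration only.
inline uint64_t ProcessSeed() {
  static const uint64_t seed = std::random_device{}();
  return seed;
}

// Hypothetical helper: map (seed, tablet_id) to a replica index.
// The same seed and tablet ID always produce the same index, so repeated
// scans of a tablet keep hitting the same (warm) replica.
inline size_t SelectReplicaIndex(uint64_t seed,
                                 const std::string& tablet_id,
                                 size_t num_replicas) {
  assert(num_replicas > 0);
  const uint64_t mixed = static_cast<uint64_t>(
      std::hash<std::string>{}(tablet_id)) ^ (seed * 0x9E3779B97F4A7C15ULL);
  return static_cast<size_t>(mixed % num_replicas);
}

// Stand-in for a client object. Every instance reads the process-wide
// seed, so creating more clients does not change which replica is chosen.
struct FakeClient {
  size_t PickReplica(const std::string& tablet_id,
                     size_t num_replicas) const {
    return SelectReplicaIndex(ProcessSeed(), tablet_id, num_replicas);
  }
};
```

With this shape, two FakeClient instances in the same process agree on the replica for a given tablet, while separate processes draw different seeds and so tend to spread across replicas.
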
I think the "ultimate" fix would be to allow the user to provide their own "seed" for replica selection randomness. That could be a useful follow-on change to track with a JIRA.


-- 
To view, visit http://gerrit.cloudera.org:8080/17129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
Gerrit-Change-Number: 17129
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Bankim Bhavsar <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 20:09:00 +0000
Gerrit-HasComments: Yes
