Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17152 )

Change subject: KUDU-3254 fix bug in meta-cache exposed by KUDU-1802
......................................................................


Patch Set 6:

(5 comments)

> (2 comments)
 >
 > Thanks for the fix and test. This test should also be added to
 > master right?
 >
 > It looks like there are test failures due to unrelated python infra
 > issues on the branch.

Yep, I'm planning to back-port the test scenario to the main branch from 
1.13.x: the 'fix' is already there.

http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/meta_cache.cc@932
PS6, Line 932:       if (remote.get() != nullptr) {
> Ah, so this issue only exists if a range partition was dropped and then rep
Yep: in pre-KUDU-1802 case, the crash could only happen if the range partition 
was dropped and then replaced with the same range again.  After KUDU-1802, the 
crash happened even if a range was just dropped, when the client instance was 
fed with 'stale' scan tokens.

In the test scenario, the RANGE_DROPPED case is the essence of the culprit, the 
rest are just variations on top.  I added sub-scenarios for larger and smaller 
ranges in there just to explicitly document how that case works w.r.t. 
discovering the corresponding 'mapped' ranges, so it's clear what rows are read 
given a set of 'stale' scan tokens.


http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/meta_cache.cc@943
PS6, Line 943:         auto* entry = FindOrNull(tablets_by_key, 
tablet_lower_bound);
> I generally disagree that we shouldn't use FindOrDie in production code. It
Yep, I agree: FindOrDie() should be used when there isn't a way to gracefully 
handle the situation otherwise.  For example, in many cases continuing further 
could damage the integrity of the data or even worse -- that's definitely a 
case to use FindOrDie().  In all other cases, it makes sense to use 
FindOrNull() or alike, sure.

FWIW, I think it makes sense to revise the usage of FindOrDie() in the code 
since I suspect that in many cases FindOrDie() is used only because it was 
easier to write it like that and not handle the error otherwise, even if it 
were possible to handle the error condition gracefully.


http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/meta_cache.cc@955
PS6, Line 955:
> nit: replace with spaces
Done


http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/scan_token-test.cc
File src/kudu/client/scan_token-test.cc:

http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/scan_token-test.cc@950
PS6, Line 950:
> nit: spacing
Done


http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/scan_token-test.cc@1052
PS6, Line 1052:       [[fallthrough]];
> nit: maybe use FALLTHROUGH_INTENDED? Or maybe we should deprecate that in f
OK, I guess keeping it uniform makes more sense: replaced with 
FALLTHROUGH_INTENDED.  I think I can post a separate patch replacing 
FALLTHROUGH_INTENDED with [[fallthrough]].



--
To view, visit http://gerrit.cloudera.org:8080/17152
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 07 Mar 2021 07:18:25 +0000
Gerrit-HasComments: Yes

Reply via email to