Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20270 )
Change subject: KUDU-3461 [client] Avoid impala crash by returning error if invalid tablet id found ...................................................................... Patch Set 10: (5 comments) http://gerrit.cloudera.org:8080/#/c/20270/10/src/kudu/client/meta_cache.cc File src/kudu/client/meta_cache.cc: http://gerrit.cloudera.org:8080/#/c/20270/10/src/kudu/client/meta_cache.cc@1291 PS10, Line 1291: Status MetaCache::FastLookupTabletByKey(const KuduTable* table, : PartitionKey partition_key, : MetaCache::LookupType lookup_type, : scoped_refptr<RemoteTablet>* remote_tablet) { : return DoFastPathLookup(table, &partition_key, lookup_type, remote_tablet); : } > The issue at hand here is that PickLeader logic could potentially run into All right: as I understand, the MetaCache isn't aware of the context of tablets' invalidation: it just helps in looking up for a tablet that covers a particular range or contains some particular key. So, the tablet invalidation have to be handled at the higher level then, IIUC. OK, but with that I think the proper way of handling such 'tablet invalidation' would be marking the corresponding RemoteTablet object as invalid in addition to what's being done in MetaCacheServerPicker::PickLeader(). And the code that tries to read from and write into should be aware of that, I guess. Probably, it should be something similar to the 'RemoteTablet::stale_' field, but it has different semantics. http://gerrit.cloudera.org:8080/#/c/20270/12/src/kudu/client/meta_cache.cc File src/kudu/client/meta_cache.cc: http://gerrit.cloudera.org:8080/#/c/20270/12/src/kudu/client/meta_cache.cc@542 PS12, Line 542: tablet nit: the tablet ? http://gerrit.cloudera.org:8080/#/c/20270/12/src/kudu/client/meta_cache.cc@542 PS12, Line 542: local nit: why 'local'? Is there a 'remote' metacache as well? http://gerrit.cloudera.org:8080/#/c/20270/12/src/kudu/client/meta_cache.cc@552 PS12, Line 552: emote_tablet- What if the range is now a non-covered one? The 'remote_tablet' will be a wrapper around nullptr then, so this is going to crash with SIGSEGV, no? BTW, it would be great to add an test for such a scenario to make sure the code handles that situation as expected. http://gerrit.cloudera.org:8080/#/c/20270/12/src/kudu/client/meta_cache.cc@552 PS12, Line 552: ablet_id() != table I'm curious: is it even possible to have tablet_id empty? If yes, what's the semantics of such a RemoteTablet object? Would be great to have a comment about that in the code. -- To view, visit http://gerrit.cloudera.org:8080/20270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia09cf6fb1b1d10f1ad13a62b5c863bcd1e3ab26a Gerrit-Change-Number: 20270 Gerrit-PatchSet: 10 Gerrit-Owner: Ashwani Raina <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Ashwani Raina <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Wang Xixu <[email protected]> Gerrit-Comment-Date: Wed, 06 Sep 2023 00:46:12 +0000 Gerrit-HasComments: Yes
