Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20270 )
Change subject: KUDU-3461 [client] Avoid impala crash by returning error if invalid tablet id found ...................................................................... Patch Set 12: (3 comments) http://gerrit.cloudera.org:8080/#/c/20270/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20270/10//COMMIT_MSG@9 PS10, Line 9: if Java client issued a DDL op : on the partition where data is inserted into the partition that was altered : by java client in between. nit: the fact that the other client is Kudu _Java_ client isn't important and might be even confusing, so maybe drop the notion of 'Java client', and use just 'another client'? http://gerrit.cloudera.org:8080/#/c/20270/10/src/kudu/client/meta_cache.cc File src/kudu/client/meta_cache.cc: http://gerrit.cloudera.org:8080/#/c/20270/10/src/kudu/client/meta_cache.cc@1291 PS10, Line 1291: : Status MetaCache::FastLookupTabletByKey(const KuduTable* table, : PartitionKey partition_key, : MetaCache::LookupType lookup_type, : scoped_refptr<RemoteTablet>* remote_tablet) { : > Tablet invalidation happens as part of lookup via slowpath when client asks Yeah, but invalidation in terms of a stale entries or tablet-dropped-or-moved-somewhere-else is something different, IIUC. With replacing tablets for the same range, that's a different sort of 'invalidation' since it cannot be expressed in attributes of the same RemoteTablet object. Basically, that's what the metacache isn't able to handle since it has no means to operate beyond of MetaCache entries (that contain corresponding RemoteTablet objects). OK, I guess the decision on handling this in some different (and maybe more generic) way should be done after doing some work on KUDU-3506, but at this point it's premature to think of doing anything else once the issue in the write path is addressed by this patch. With that, I don't have any other follow-up requests for the topic discussed in this thread in the scope of this patch. Thanks! http://gerrit.cloudera.org:8080/#/c/20270/12/src/kudu/client/meta_cache.cc File src/kudu/client/meta_cache.cc: http://gerrit.cloudera.org:8080/#/c/20270/12/src/kudu/client/meta_cache.cc@552 PS12, Line 552: remote_tablet > I am not sure if non-covered case will be applicable here. In this specific Thanks for the detailed response and adding a check. As for the scenario I mentioned in prior comment, I meant the following scenario involving two clients, clientA and clientB: * clientA creates are new range and writes a few rows into the newly created tablet backing the range * clientB drops the range that was created by the clientA * clientA isn't aware yet of the clientB's actions, and have its metacache populated with the information on tablet locations as it was when it successfully wrote a few rows into the range it created * clientA tries to write a few more rows into the range I verified that the scenario above didn't trigger a SIGSEGV with this patch, at least as of PS13. As a side note, it seems we already have various test scenarios for a non-covered range situations in ClientTest.TestOpenTableClearsNonCoveringRangePartitions, but that's not exactly the scenario outlined above. FWIW, adding a check for non-null is necessary to handle situations when FastLookupTabletByKey() returns non-OK status other than Status::Incomplete(). The idea was to have a test scenario that would trigger that condition, but I guess it's not quite trivial. However, how do we know that there will not be similar infinite recursion issue if FastLookupTabletByKey() at line 544 returns Status::NotFound()? As of PS13, the callback at line 559 will not be called in such a case (since remote_tablet will be nullptr) and the code will proceed with invoking LookupTabletByKey() at line 565, so I'm a bit concerned about this. Thanks! -- To view, visit http://gerrit.cloudera.org:8080/20270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia09cf6fb1b1d10f1ad13a62b5c863bcd1e3ab26a Gerrit-Change-Number: 20270 Gerrit-PatchSet: 12 Gerrit-Owner: Ashwani Raina <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Ashwani Raina <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Wang Xixu <[email protected]> Gerrit-Comment-Date: Wed, 13 Sep 2023 22:11:30 +0000 Gerrit-HasComments: Yes
