Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20270 )

Change subject: KUDU-3461 [client] Avoid impala crash by returning error if 
invalid tablet id found
......................................................................


Patch Set 12:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20270/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20270/10//COMMIT_MSG@9
PS10, Line 9: if Java client issued a DDL op
            : on the partition where data is inserted into the partition that 
was altered
            : by java client in between.
nit: the fact that the other client is Kudu _Java_ client isn't important and 
might be even confusing, so maybe drop the notion of 'Java client', and use 
just 'another client'?


http://gerrit.cloudera.org:8080/#/c/20270/10/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

http://gerrit.cloudera.org:8080/#/c/20270/10/src/kudu/client/meta_cache.cc@1291
PS10, Line 1291:
               : Status MetaCache::FastLookupTabletByKey(const KuduTable* table,
               :                                         PartitionKey 
partition_key,
               :                                         MetaCache::LookupType 
lookup_type,
               :                                         
scoped_refptr<RemoteTablet>* remote_tablet) {
               :
> Tablet invalidation happens as part of lookup via slowpath when client asks
Yeah, but invalidation in terms of a stale entries or  
tablet-dropped-or-moved-somewhere-else is something different, IIUC. With 
replacing tablets for the same range, that's a different sort of 'invalidation' 
since it cannot be expressed in attributes of the same RemoteTablet object.  
Basically, that's what the metacache isn't able to handle since it has no means 
to operate beyond of MetaCache entries (that contain corresponding RemoteTablet 
objects).

OK, I guess the decision on handling this in some different (and maybe more 
generic) way should be done after doing some work on KUDU-3506, but at this 
point it's premature to think of doing anything else once the issue in the 
write path is addressed by this patch.

With that, I don't have any other follow-up requests for the topic discussed in 
this thread in the scope of this patch.  Thanks!


http://gerrit.cloudera.org:8080/#/c/20270/12/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

http://gerrit.cloudera.org:8080/#/c/20270/12/src/kudu/client/meta_cache.cc@552
PS12, Line 552: remote_tablet
> I am not sure if non-covered case will be applicable here. In this specific
Thanks for the detailed response and adding a check.

As for the scenario I mentioned in prior comment, I meant the following 
scenario involving two clients, clientA and clientB:
* clientA creates are new range and writes a few rows into the newly created 
tablet backing the range
* clientB drops the range that was created by the clientA
* clientA isn't aware yet of the clientB's actions, and have its metacache 
populated with the information on tablet locations as it was when it 
successfully wrote a few rows into the range it created
* clientA tries to write a few more rows into the range

I verified that the scenario above didn't trigger a SIGSEGV with this patch, at 
least as of PS13.

As a side note, it seems we already have various test scenarios for a 
non-covered range situations in 
ClientTest.TestOpenTableClearsNonCoveringRangePartitions, but that's not 
exactly the scenario outlined above.

FWIW, adding a check for non-null is necessary to handle situations when 
FastLookupTabletByKey() returns non-OK status other than Status::Incomplete().  
The idea was to have a test scenario that would trigger that condition, but I 
guess it's not quite trivial.  However, how do we know that there will not be 
similar infinite recursion issue if FastLookupTabletByKey() at line 544 returns 
Status::NotFound()?  As of PS13, the callback at line 559 will not be called in 
such a case (since remote_tablet will be nullptr) and the code will proceed 
with invoking LookupTabletByKey() at line 565, so I'm a bit concerned about 
this.

Thanks!



--
To view, visit http://gerrit.cloudera.org:8080/20270
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia09cf6fb1b1d10f1ad13a62b5c863bcd1e3ab26a
Gerrit-Change-Number: 20270
Gerrit-PatchSet: 12
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <[email protected]>
Gerrit-Comment-Date: Wed, 13 Sep 2023 22:11:30 +0000
Gerrit-HasComments: Yes

Reply via email to