Alexey Serbin has posted comments on this change.

Change subject: KUDU-1753 [tablet_service] continue scan request on deleted tablet
......................................................................

Patch Set 5:

> > > Alexey, just thinking out loud here: Tablet not found in the JIRA
> > > corresponds to tablet tombstoned state in the tablet server, which
> > > means any subsequent scan on that server is not expected to yield
> > > any data, right? This fix aims to continue the scan until it
> > > fetches whatever is cached so far. So, I am curious if this
> > > guarantees to fix the underlying issue though, because the
> > > GetTabletRef can fail anywhere during the scan and not just under
> > > HandleContinueScanRequest, right?
> >
> > The original issue described in JIRA is 'Illegal state: Tablet is
> > not running'. The hypothesis about the 'Tablet not found' -- is
> > just a hypothesis that it's related to that.
>
> I discovered recently that TABLET_NOT_RUNNING error code embodies
> tablets in bootstrapped state, and also the tablets failed to come
> up on the node for whatever reasons. There is also another distinct
> error code TABLET_NOT_FOUND for deleted tablets and this fix is
> aimed to address the latter situation, right? The description in
> the JIRA seemed to indicate this situation may arise due to former
> error code too. I am just curious if the fix and the related test
> have taken former error code into account.

Yep, a tablet might be not running for reasons other than being deleted in the middle of a scan. This fix and the test address the case where the tablet is deleted while a scan operation is in progress. That's what I saw in the logs I collected from an Impala cluster, and the test shows that prior to this fix that situation was not handled properly.

I don't think this addresses problems related to TABLET_NOT_FOUND or other situations where TABLET_NOT_RUNNING is returned, and this fix is not intended to. The scope of this fix and test is limited to issues related to HandleContinueScanRequest().
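
To make the scenario concrete, here is a toy model of a tablet being deleted while a scan is in progress. It is not Kudu code: `Tablet`, `TabletManager`, `Scanner`, `NewScan` and `RowsReadAfterDelete` are hypothetical names invented for this sketch, and it only assumes that a scanner can keep its own shared reference to the tablet, so that continuing the scan does not need a fresh lookup in the registry.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a tablet: just an ordered list of rows.
struct Tablet {
  std::vector<std::string> rows;
};

// Toy stand-in for the server's tablet registry (hypothetical).
class TabletManager {
 public:
  void Add(const std::string& id, std::shared_ptr<Tablet> tablet) {
    tablets_[id] = std::move(tablet);
  }
  // Deleting drops only the registry's reference to the tablet.
  void Delete(const std::string& id) { tablets_.erase(id); }
  // Returns nullptr when the tablet is unknown (the "not found" case).
  std::shared_ptr<Tablet> Lookup(const std::string& id) const {
    auto it = tablets_.find(id);
    return it == tablets_.end() ? nullptr : it->second;
  }

 private:
  std::map<std::string, std::shared_ptr<Tablet>> tablets_;
};

// A scanner pins its tablet for the lifetime of the scan.
class Scanner {
 public:
  explicit Scanner(std::shared_ptr<Tablet> tablet)
      : tablet_(std::move(tablet)) {}
  // Copies the next row into *row; returns false once the scan is exhausted.
  bool Next(std::string* row) {
    if (pos_ >= tablet_->rows.size()) return false;
    *row = tablet_->rows[pos_++];
    return true;
  }

 private:
  std::shared_ptr<Tablet> tablet_;
  std::size_t pos_ = 0;
};

// Opening a scan needs a registry lookup and fails for an unknown tablet.
std::unique_ptr<Scanner> NewScan(const TabletManager& mgr,
                                 const std::string& id) {
  std::shared_ptr<Tablet> tablet = mgr.Lookup(id);
  if (!tablet) return nullptr;
  return std::make_unique<Scanner>(std::move(tablet));
}

// Deletes the tablet after the first row has been read and counts how many
// rows the already-open scanner can still return.
std::size_t RowsReadAfterDelete() {
  TabletManager mgr;
  auto tablet = std::make_shared<Tablet>();
  tablet->rows = {"a", "b", "c"};
  mgr.Add("t1", std::move(tablet));

  std::unique_ptr<Scanner> scanner = NewScan(mgr, "t1");
  std::string row;
  scanner->Next(&row);  // First read: tablet is still registered.
  mgr.Delete("t1");     // Tablet is deleted in the middle of the scan.

  std::size_t count = 0;
  while (scanner->Next(&row)) ++count;  // Continue via the pinned reference.
  return count;
}
```

In this model only new scans fail after the deletion; the continue path never consults the registry. Whether the actual patch takes the same approach is for the change itself to show.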

The short description might be confusing, but the more detailed description explains that. I will change the short description to 'continue scan on tablet being deleted'.

-- 
To view, visit http://gerrit.cloudera.org:8080/5346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ica48c52a81862f47a9245003915d18be411bf8b1
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Dinesh Bhat <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-HasComments: No
