Alex Behm has posted comments on this change. Change subject: [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects ......................................................................
Patch Set 1: (22 comments) Very helpful explanation in the commit msg! Approach looks sane. Some high-level comments. http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: Line 288: BuildTopicUpdates(catalog_objects.updated_objects, false); Is there a requirement/assumption that the deletions must come after the updates? Line 333: << catalog_object.catalog_version; indent http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/catalog/catalog-server.h File be/src/catalog/catalog-server.h: Line 154: /// determine items that have been deleted, it saves the set of topic entry keys sent Update comment Line 164: bool topic_deletions); What does this param do? http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/scheduling/scheduler-test-util.cc File be/src/scheduling/scheduler-test-util.cc: Line 481: item.key = host.ip; Use __set fn like above to be consistent http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: Line 141: if (delta.is_delta && delta.topic_entries.empty()) { single line http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 1323: TCatalogObject catalog_object; Please move this right before DeserializeThriftMsg() so we can see where it's used/populated Line 1343: << item.key << " is " indentation Line 1367: LOG(INFO) << "Deleted item: " << item.key << " version: " << catalog_object.catalog_version; That's a lot of logging, sure we need this at INFO level? Line 1398: if (existing_object.catalog_version <= catalog_object.catalog_version) { Shouldn't this be < and not <=? My understanding is that we should never have a deletion entry with exactly the same version as an existing object, so might even add a DCHECK for the == case Line 1527: if (item.deleted) { Is it possible that within a single list of topic_entries there is both an addition and a deletion for the same thing? Which one wins? http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: Line 485: item.value.empty() ? Statestore::TopicEntry::NULL_VALUE : item.value, Why not just 'item.value'? The value gets copied. Line 527: deleted_key_size_bytes -= itr->first.size(); += ? http://gerrit.cloudera.org:8080/#/c/7731/1/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 97: 5: required bool is_delta; fix member numbering http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java: Line 56: private Map<String, Long> removedCatalogObjectVersionsByKey_ = Maps.newHashMap(); I might be wrong, but I think we can end up having the same object deleted multiple times with different versions. Consider several add/drop of the same object between a statestore topic update. In the current scheme, I believe some objects may never get garbage collected because the object/version relationship is not 1:1. Line 70: * key 'key'. update comment Line 163: return "FUNCTION:" + catalogObject.getFn().getName() + "(" + the function name has db and name, let's print db.fnname similar to what we do for a table Line 173: default: return null; IllegalStateException? http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: Line 152: // Log of deleted catalog objects. What exactly does this contain? When is it garbage collected? Add comment here or in class comment. Line 349: // versions >= 'fromVersion'. The catalog objects that satisfy that condition are added the version condition here is ">=" but the deltaLog_.retrieveObjects() uses ">" Line 358: deltaLog_.deleteRemovedObject(catalogDb); Is the deleteRemovedObject() API really needed? If there exists an object that also has an entry in the delta log, then the version in the delta log is definitely < currentCatalogVersion so the deltaLog_.garbageCollect() in L341 should naturally purge those invalid entries. The deleteRemovedObject() API seems to complicate the delta log and would be great if we did not need it. http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: Line 1084: resp.result.setVersion(dataSource.getCatalogVersion()); Shouldn't we increment the catalog version for a deleted object? Otherwise we may have the same object with the same version be both deleted and not deleted. Seems easier to keep incrementing the version when anything changes. -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-HasComments: Yes
