Dimitris Tsirogiannis has posted comments on this change. Change subject: [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects ......................................................................
Patch Set 1: (28 comments) http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: Line 288: BuildTopicUpdates(catalog_objects.updated_objects, false); > Is there a requirement/assumption that the deletions must come after the up No, it doesn't matter. Line 300: void CatalogServer::BuildTopicUpdates(const vector<TCatalogObject>& catalog_objects, > IMO, it is more readable if the signature of BuildTopicUpdates takes TGetAl These two "do something" blocks are almost identical. I refactored the code a bit to make it a bit less flaky. Line 333: << catalog_object.catalog_version; > indent Done http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/catalog/catalog-server.h File be/src/catalog/catalog-server.h: Line 154: /// determine items that have been deleted, it saves the set of topic entry keys sent > Update comment Done Line 164: bool topic_deletions); > What does this param do? Updated the comment. http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/scheduling/scheduler-test-util.cc File be/src/scheduling/scheduler-test-util.cc: Line 481: item.key = host.ip; > Use __set fn like above to be consistent Done http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: Line 141: if (delta.is_delta && delta.topic_entries.empty()) { > single line Done http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 1323: TCatalogObject catalog_object; > Please move this right before DeserializeThriftMsg() so we can see where it Done Line 1343: << item.key << " is " > indentation Done Line 1367: LOG(INFO) << "Deleted item: " << item.key << " version: " << catalog_object.catalog_version; > That's a lot of logging, sure we need this at INFO level? Sorry, that was for debugging. Line 1398: if (existing_object.catalog_version <= catalog_object.catalog_version) { > Shouldn't this be < and not <=? Done PS1, Line 1406: batch_size_bytes > Looks like we didn't account this earlier for deletions. This logic makes m Done Line 1527: if (item.deleted) { > Is it possible that within a single list of topic_entries there is both an No, it's not possible to have both an addition and a deletion in the same list of topic_entries. http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: Line 485: item.value.empty() ? Statestore::TopicEntry::NULL_VALUE : item.value, > Why not just 'item.value'? The value gets copied. Good point. Done Line 527: deleted_key_size_bytes -= itr->first.size(); > += ? Done http://gerrit.cloudera.org:8080/#/c/7731/1/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 92 > - I kind of like this separation of updates and deletions in different list By switching to TTopicItem for topic deletions there is now significant overlap between operations performed on both added and deleted items. We could try to extract most of the common logic in separate functions and keep the existing logic in place, but for now it seems simple enough to have one loop over one list of items and separate the logic with an if statement when needed. If we had to add many of these if statements in multiple places, I'd agree that splitting the logic would have been better. Line 97: 5: required bool is_delta; > fix member numbering Done http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java: Line 56: private Map<String, Long> removedCatalogObjectVersionsByKey_ = Maps.newHashMap(); > I might be wrong, but I think we can end up having the same object deleted Good catch, that was a bug indeed. Removed the second map and simplified the logic around handling the deleted objects. Line 70: * key 'key'. > update comment No longer applicable. Function is removed. Line 90: public synchronized void garbageCollect(long currentCatalogVersion) { > Although these methods are synchronized, the underlying maps aren't thread That's not correct. The essence of having synchronized functions is exactly that, to ensure proper synchronization even though the underlying collections are not thread safe. Two threads calling garbageCollect and addRemoved cannot both proceed, one will grab the lock on the CatalogDeltaLog object and make progress while the other will simply block. In any case, this class is simplified and it only contains one collection. Line 154: private String toCatalogObjectKey(TCatalogObject catalogObject) { > static? Done Line 163: return "FUNCTION:" + catalogObject.getFn().getName() + "(" + > the function name has db and name, let's print db.fnname similar to what we Db is already included in the function name. Line 173: default: return null; > IllegalStateException? Done http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: Line 152: // Log of deleted catalog objects. > Yea, I'm trying to wrap my head around the lifecycle of delta log on a the Done Line 152: // Log of deleted catalog objects. > What exactly does this contain? When is it garbage collected? Add comment h Done Line 349: // versions >= 'fromVersion'. The catalog objects that satisfy that condition are added > the version condition here is ">=" but the deltaLog_.retrieveObjects() uses Done Line 358: deltaLog_.deleteRemovedObject(catalogDb); > Is the deleteRemovedObject() API really needed? If there exists an object t Done http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: Line 1084: resp.result.setVersion(dataSource.getCatalogVersion()); > Shouldn't we increment the catalog version for a deleted object? Otherwise That already happens in catalog_.removeDataSource(). We're not really consistent in the way we handle version increment and assignment (i.e. it happens in multiple places today) and this is something that we will eventually have to fix. -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-HasComments: Yes
