Alex Behm has posted comments on this change.

Change subject: [PREVIEW] IMPALA-5538: Use explicit catalog versions for 
deleted objects
......................................................................


Patch Set 1:

(22 comments)

Very helpful explanation in the commit msg! Approach looks sane. Some 
high-level comments.

http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

Line 288:         BuildTopicUpdates(catalog_objects.updated_objects, false);
Is there a requirement/assumption that the deletions must come after the 
updates?


Line 333:             << catalog_object.catalog_version;
indent


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

Line 154:   /// determine items that have been deleted, it saves the set of 
topic entry keys sent
Update comment


Line 164:       bool topic_deletions);
What does this param do?


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/scheduling/scheduler-test-util.cc
File be/src/scheduling/scheduler-test-util.cc:

Line 481:   item.key = host.ip;
Use __set fn like above to be consistent


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 141:   if (delta.is_delta && delta.topic_entries.empty()) {
single line


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1323:       TCatalogObject catalog_object;
Please move this right before DeserializeThriftMsg() so we can see where it's 
used/populated


Line 1343:                      << item.key << " is "
indentation


Line 1367:         LOG(INFO) << "Deleted item: " << item.key << " version: " << 
catalog_object.catalog_version;
That's a lot of logging, sure we need this at INFO level?


Line 1398:             if (existing_object.catalog_version <= 
catalog_object.catalog_version) {
Shouldn't this be < and not <=?

My understanding is that we should never have a deletion entry with exactly the 
same version as an existing object, so might even add a DCHECK for the == case


Line 1527:       if (item.deleted) {
Is it possible that within a single list of topic_entries there is both an 
addition and a deletion for the same thing? Which one wins?


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

Line 485:             item.value.empty() ? Statestore::TopicEntry::NULL_VALUE : 
item.value,
Why not just 'item.value'? The value gets copied.


Line 527:           deleted_key_size_bytes -= itr->first.size();
+= ?


http://gerrit.cloudera.org:8080/#/c/7731/1/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 97:   5: required bool is_delta;
fix member numbering


http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java:

Line 56:   private Map<String, Long> removedCatalogObjectVersionsByKey_ = 
Maps.newHashMap();
I might be wrong, but I think we can end up having the same object deleted 
multiple times with different versions. Consider several add/drop of the same 
object between a statestore topic update.

In the current scheme, I believe some objects may never get garbage collected 
because the object/version relationship is not 1:1.


Line 70:    * key 'key'.
update comment


Line 163:         return "FUNCTION:" + catalogObject.getFn().getName() + "(" +
the function name has db and name, let's print db.fnname similar to what we do 
for a table


Line 173:       default: return null;
IllegalStateException?


http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 152:   // Log of deleted catalog objects.
What exactly does this contain? When is it garbage collected? Add comment here 
or in class comment.


Line 349:   // versions >= 'fromVersion'. The catalog objects that satisfy that 
condition are added
the version condition here is ">=" but the deltaLog_.retrieveObjects() uses ">"


Line 358:       deltaLog_.deleteRemovedObject(catalogDb);
Is the deleteRemovedObject() API really needed? If there exists an object that 
also has an entry in the delta log, then the version in the delta log is 
definitely < currentCatalogVersion so the deltaLog_.garbageCollect() in L341 
should naturally purge those invalid entries.

The deleteRemovedObject() API seems to complicate the delta log and would be 
great if we did not need it.


http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 1084:     resp.result.setVersion(dataSource.getCatalogVersion());
Shouldn't we increment the catalog version for a deleted object? Otherwise we 
may have the same object with the same version be both deleted and not deleted. 
Seems easier to keep incrementing the version when anything changes.


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-HasComments: Yes

Reply via email to