Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects ......................................................................
Patch Set 5: (27 comments) http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: PS5, Line 231: // If this is not a delta update, request an update from version 0 from the local : // catalog. There is an optimization that checks if pending_topic_updates_ was just : // reloaded from version 0. If so, then skip this step and use that data. > I found it impossible to understand this comment without reading through mo Done PS5, Line 234: delta.from_version == 0 && delta.to_version == 0 > from the comment above, presumably this condition means "this is not a delt Yeah, you're right. This condition is really confusing. from_version = 0 implies delta = false and can be used here instead. The part that is not clear to me is the to_version check but I couldn't find any case where it would make sense to check the to_version in order to decide whether to send the full catalog update or not. I will remove it and see if it breaks something. PS5, Line 310: if (catalog_object.catalog_version <= last_sent_catalog_version_) continue; > given that last_sent_catalog_version_ was passed to GetAllCatalogObjects(), Right, it's not needed. Replaced it with a DCHECK. http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/catalog/catalog-server.h File be/src/catalog/catalog-server.h: PS5, Line 150: added > nit: May be 'added/updated/removed' to be consistent with other places? Done http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS5, Line 1392: > can we delete this function now? Yes. Done PS5, Line 1366: ; > Could you add 'len' too, given this is trace logging anyway. (might be help Done PS5, Line 1399: // Nothing to do here. > Remove? Kinda seems obvious Done http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: PS5, Line 177: entry_it->second.SetDeleted(true); : entry_it->second.SetVersion(last_version_); > now that some subscribes require deleted entries to have a value, how does The behavior is topic specific and currently only subscribers of catalog-update topic rely on the value for deleted entries. Not sure how to make it less fragile in the sense that "value_" is opaque to the statestore and is interpreted only by subscribers. PS5, Line 490: > would be easier to read if you line break it there to keep the last arg all Done PS5, Line 538: if (topic_entry.value() != Statestore::TopicEntry::NULL_VALUE) > why do we need that guard? why can't we set topic_item.value even in that c Yeap, removed it. PS5, Line 546: int64_t topic_size = topic.total_key_size_bytes() - deleted_key_size_bytes : + topic.total_value_size_bytes(); > that math doesn't look correct anymore (deleted entries can have non-zero v Done http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.h File be/src/statestore/statestore.h: PS5, Line 180: may > may or will? "will". Changed it. http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: PS5, Line 25: GetAllCatalogObjects > see below. I think we should rename this. Done PS5, Line 25: Contains all known Catalog objects : // (databases, tables/views, and functions) from the CatalogService's cache. > I think that should be updated to explain that this is relative to a partic Done PS5, Line 38: empty list if no objects detected in the Catalog > Thanks. As we talked about in person, I think we should rename GetAllCatalo Done http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: PS5, Line 87: If true, this item was deleted > How about: Done PS5, Line 87: If true, this item was deleted > Actually, the parenthesis I suggested doesn't clarify. Maybe that part sho Done PS5, Line 97: List of changes to topic entries, including deletions. > That's only true when is_delta==true. So, how about we say: Done PS5, Line 100: applied to in-memory state, > that's kinda dependent on the subscriber's implementation, right? would it Done http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java: PS5, Line 53: identiy > nit: typo Done PS5, Line 123: if (first.getType() != second.getType()) return false; > This seems to already be handled by the next line (by generating a differen Good point. Removed http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: PS5, Line 318: TGetAllCatalogObjectsResponse resp = new TGetAllCatalogObjectsResponse(); : resp.setUpdated_objects(new ArrayList<TCatalogObject>()); : resp.setDeleted_objects(new ArrayList<TCatalogObject>()); : resp.setMax_catalog_version(Catalog.INITIAL_CATALOG_VERSION); > Move all of this into getCatalogChanges() and make it return resp? Done Line 350: * to 'resp'. > May be worth adding that this method expects that the global read lock is h Done Line 369: if (tbl.getCatalogVersion() > fromVersion) { > Given we hold the global readLock() already (which means that the version c It's racy if you do that. A concurrent table modification may have gotten a new version but not assigned it yet to the table. If you read tbl version without holding the lock you don't know if the table operation has finished or not hence, you don't know if it ok to reject this object or not. PS5, Line 374: trace > This seems like a serious enough error to me, basically blocking updates on Done Line 970: } > I see you made changes in the CatalogOpEx.drop*() to add the corresponding I thought about it but there are two reasons not to do it: a) we need the TCatalogObject which is not created in the remove* functions and b) it's not only the remove* functions that record the deleted objects (see execResetMetadata). http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java: Line 322: * Note that drop operations that come from statestore heartbeats always have a > Update? Done -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-HasComments: Yes
