Dan Hecht has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects ......................................................................
Patch Set 5: (16 comments) http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: PS5, Line 231: // If this is not a delta update, request an update from version 0 from the local : // catalog. There is an optimization that checks if pending_topic_updates_ was just : // reloaded from version 0. If so, then skip this step and use that data. I found it impossible to understand this comment without reading through most of the code in this file, which kinda defeats the point of having a comment :) i think it would be clearer if it said something like: If not generating a delta update and 'pending_topic_updates_' doesn't already contain the full catalog (beginning with version 0), then force GatherCatalogUpdatesThread() to reload the full catalog. PS5, Line 234: delta.from_version == 0 && delta.to_version == 0 from the comment above, presumably this condition means "this is not a delta update"? Why is that the condition and not 'delta.is_delta'? PS5, Line 310: if (catalog_object.catalog_version <= last_sent_catalog_version_) continue; given that last_sent_catalog_version_ was passed to GetAllCatalogObjects(), why is this needed? http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS5, Line 1392: can we delete this function now? http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: PS5, Line 177: entry_it->second.SetDeleted(true); : entry_it->second.SetVersion(last_version_); now that some subscribes require deleted entries to have a value, how does this all work? are we implicitly requiring that only non-transient subscribers depend on values for deletions? If that's the case, that seems fragile (and at the least should be documented). PS5, Line 490: would be easier to read if you line break it there to keep the last arg all on one line. PS5, Line 538: if (topic_entry.value() != Statestore::TopicEntry::NULL_VALUE) why do we need that guard? why can't we set topic_item.value even in that case? assuming we don't need it, it seems like we can remove the NULL_VALUE constant altogether. PS5, Line 546: int64_t topic_size = topic.total_key_size_bytes() - deleted_key_size_bytes : + topic.total_value_size_bytes(); that math doesn't look correct anymore (deleted entries can have non-zero value size). it might be more straight forward to just add up the topic size as we go rather than try to subtract out the deleted portion. http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.h File be/src/statestore/statestore.h: PS5, Line 180: may may or will? http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: PS5, Line 25: GetAllCatalogObjects see below. I think we should rename this. PS5, Line 25: Contains all known Catalog objects : // (databases, tables/views, and functions) from the CatalogService's cache. I think that should be updated to explain that this is relative to a particular catalog version and that it now expresses it as an update and delete. PS5, Line 38: empty list if no objects detected in the Catalog > Yes, there is some context that is missing here, hence it's not easy to und Thanks. As we talked about in person, I think we should rename GetAllCatalogObjects to GetCatalogDelta() or something similar to make it clear that this gets the delta. There's no hint at that until you read the header file comment. http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 84: // is not included in the topic key (e.g. catalog version of deleted catalog objects). > Let me try to explain. Statestore only uses that flag to avoid sending dele Please see my comments elsewhere in this file for suggestions on clarifying the interface. It still seems confusing and weird that 'value' can matter for delete when delete, but I guess the comments I'm suggested elsewhere help clarify at least what 'delete' actually means. PS5, Line 87: If true, this item was deleted How about: If true, this item was deleted. When false, this TTopicItem need not be included in non-delta TTopicDelta's (since the complete set of TTopicItems will be included in that case). or something like that. PS5, Line 97: List of changes to topic entries, including deletions. That's only true when is_delta==true. So, how about we say: When is_delta=true, a list of changes to topic entries, including deletions, within [from_version, to_version]. When is_delta=false, this is the list of all non-delete topic entries for [0, to_version], which can be used to reconstruct the topic from scratch. PS5, Line 100: applied to in-memory state, that's kinda dependent on the subscriber's implementation, right? would it be clearer to say: relative to the topic at versions [0, from_version). or something like that? -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-HasComments: Yes
