Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
......................................................................


Patch Set 5:

(27 comments)

http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

PS5, Line 231:   // If this is not a delta update, request an update from 
version 0 from the local
             :   // catalog. There is an optimization that checks if 
pending_topic_updates_ was just
             :   // reloaded from version 0. If so, then skip this step and use 
that data.
> I found it impossible to understand this comment without reading through mo
Done


PS5, Line 234: delta.from_version == 0 && delta.to_version == 0
> from the comment above, presumably this condition means "this is not a delt
Yeah, you're right. This condition is really confusing. from_version = 0 
implies delta = false and can be used here instead. The part that is not clear 
to me is the to_version check but I couldn't find any case where it would make 
sense to check the to_version in order to decide whether to send the full 
catalog update or not. I will remove it and see if it breaks something.


PS5, Line 310: if (catalog_object.catalog_version <= 
last_sent_catalog_version_) continue;
> given that last_sent_catalog_version_ was passed to GetAllCatalogObjects(),
Right, it's not needed. Replaced it with a DCHECK.


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

PS5, Line 150: added
> nit: May be 'added/updated/removed' to be consistent with other places?
Done


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS5, Line 1392: 
> can we delete this function now?
Yes. Done


PS5, Line 1366: ;
> Could you add 'len' too, given this is trace logging anyway. (might be help
Done


PS5, Line 1399: // Nothing to do here.
> Remove? Kinda seems obvious
Done


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

PS5, Line 177:     entry_it->second.SetDeleted(true);
             :     entry_it->second.SetVersion(last_version_);
> now that some subscribes require deleted entries to have a value, how does 
The behavior is topic specific and currently only subscribers of catalog-update 
topic rely on the value for deleted entries. Not sure how to make it less 
fragile in the sense that "value_" is opaque to the statestore and is 
interpreted only by subscribers.


PS5, Line 490:  
> would be easier to read if you line break it there to keep the last arg all
Done


PS5, Line 538: if (topic_entry.value() != Statestore::TopicEntry::NULL_VALUE)
> why do we need that guard? why can't we set topic_item.value even in that c
Yeap, removed it.


PS5, Line 546:         int64_t topic_size = topic.total_key_size_bytes() - 
deleted_key_size_bytes
             :             + topic.total_value_size_bytes();
> that math doesn't look correct anymore (deleted entries can have non-zero v
Done


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

PS5, Line 180: may
> may or will?
"will". Changed it.


http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

PS5, Line 25: GetAllCatalogObjects
> see below. I think we should rename this.
Done


PS5, Line 25: Contains all known Catalog objects
            : // (databases, tables/views, and functions) from the 
CatalogService's cache.
> I think that should be updated to explain that this is relative to a partic
Done


PS5, Line 38: empty list if no objects detected in the Catalog
> Thanks. As we talked about in person, I think we should rename GetAllCatalo
Done


http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

PS5, Line 87: If true, this item was deleted
> How about:
Done


PS5, Line 87: If true, this item was deleted
> Actually, the parenthesis I suggested doesn't clarify.  Maybe that part sho
Done


PS5, Line 97: List of changes to topic entries, including deletions.
> That's only true when is_delta==true.  So, how about we say:
Done


PS5, Line 100: applied to in-memory state,
> that's kinda dependent on the subscriber's implementation, right? would it 
Done


http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java:

PS5, Line 53: identiy
> nit: typo
Done


PS5, Line 123: if (first.getType() != second.getType()) return false;
> This seems to already be handled by the next line (by generating a differen
Good point. Removed


http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

PS5, Line 318: TGetAllCatalogObjectsResponse resp = new 
TGetAllCatalogObjectsResponse();
             :     resp.setUpdated_objects(new ArrayList<TCatalogObject>());
             :     resp.setDeleted_objects(new ArrayList<TCatalogObject>());
             :     resp.setMax_catalog_version(Catalog.INITIAL_CATALOG_VERSION);
> Move all of this into getCatalogChanges() and make it return resp?
Done


Line 350:    * to 'resp'.
> May be worth adding that this method expects that the global read lock is h
Done


Line 369:           if (tbl.getCatalogVersion() > fromVersion) {
> Given we hold the global readLock() already (which means that the version c
It's racy if you do that. A concurrent table modification may have gotten a new 
version but not assigned it yet to the table. If you read tbl version without 
holding the lock you don't know if the table operation has finished or not 
hence, you don't know if it ok to reject this object or not.


PS5, Line 374: trace
> This seems like a serious enough error to me, basically blocking updates on
Done


Line 970:     }
> I see you made changes in the CatalogOpEx.drop*() to add the corresponding 
I thought about it but there are two reasons not to do it: a) we need the 
TCatalogObject which is not created in the remove* functions and b) it's not 
only the remove* functions that record the deleted objects (see 
execResetMetadata).


http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java:

Line 322:    *  Note that drop operations that come from statestore heartbeats 
always have a
> Update?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-HasComments: Yes

Reply via email to