Bharath Vissapragada has posted comments on this change.

Change subject: [PREVIEW] IMPALA-5538: Use explicit catalog versions for 
deleted objects
......................................................................


Patch Set 1:

(6 comments)

Got some high level comments. I'm still trying to understand the life-cycle of 
the delta log on the Catalog server, might need some from you :)

http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

Line 300: void CatalogServer::BuildTopicUpdates(const vector<TCatalogObject>& 
catalog_objects,
IMO, it is more readable if the signature of BuildTopicUpdates takes 
TGetAllCatalogObjectsResponse as input. Something like

BuildTopicUpdates(TGetAllCatalogObjectsResponse foo) {

  for (object: foo.updated_objects) {
     // do something
  }

  for (object: foo.deleted_objects) {
     // do something
  }

}

Expecting the callers pass the type of objects being sent seems flaky. Thoughts?


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS1, Line 1406: batch_size_bytes
Looks like we didn't account this earlier for deletions. This logic makes more 
sense to me. Ofcourse it doesn't affect the end output, just an edge case when 
there are tons of deletions.


http://gerrit.cloudera.org:8080/#/c/7731/1/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 92
- I kind of like this separation of updates and deletions in different lists as 
the code is more readable and obvious as to how we process each of them 
separately. Without that we have additional branches in every place to check if 
each item is deleted, especially the logic in 
ImpalaServer#CatalogUpdateCallback().

- After going through the full patch, I'm wondering what is the advantage of 
removing topic_deletions totally. Why not just make it a required 
list<TTopicItem> topic_deletions? That adds the version no and you needn't 
change much of the logic. Does that not work for some reason?


http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java:

Line 90:   public synchronized void garbageCollect(long currentCatalogVersion) {
Although these methods are synchronized, the underlying maps aren't thread 
safe. So ideally the class as such isn't thread safe (in the sense that calling 
garbageCollect() and addRemoved() objects in two threads can result in an 
inconsistent state).

However that scenario is not possible since these parallel ops are prevented by 
catalogLock_ in CatalogServiceCatalog?


Line 154:   private String toCatalogObjectKey(TCatalogObject catalogObject) {
static?


http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 152:   // Log of deleted catalog objects.
> What exactly does this contain? When is it garbage collected? Add comment h
Yea, I'm trying to wrap my head around the lifecycle of delta log on a the 
CatalogServer side. It helps to add more details on how it works and why the 
Catalog server also needs to maintain it after your change.


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-HasComments: Yes

Reply via email to