Bharath Vissapragada has posted comments on this change. Change subject: [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects ......................................................................
Patch Set 1: (6 comments) Got some high level comments. I'm still trying to understand the life-cycle of the delta log on the Catalog server, might need some from you :) http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: Line 300: void CatalogServer::BuildTopicUpdates(const vector<TCatalogObject>& catalog_objects, IMO, it is more readable if the signature of BuildTopicUpdates takes TGetAllCatalogObjectsResponse as input. Something like BuildTopicUpdates(TGetAllCatalogObjectsResponse foo) { for (object: foo.updated_objects) { // do something } for (object: foo.deleted_objects) { // do something } } Expecting the callers pass the type of objects being sent seems flaky. Thoughts? http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS1, Line 1406: batch_size_bytes Looks like we didn't account this earlier for deletions. This logic makes more sense to me. Ofcourse it doesn't affect the end output, just an edge case when there are tons of deletions. http://gerrit.cloudera.org:8080/#/c/7731/1/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 92 - I kind of like this separation of updates and deletions in different lists as the code is more readable and obvious as to how we process each of them separately. Without that we have additional branches in every place to check if each item is deleted, especially the logic in ImpalaServer#CatalogUpdateCallback(). - After going through the full patch, I'm wondering what is the advantage of removing topic_deletions totally. Why not just make it a required list<TTopicItem> topic_deletions? That adds the version no and you needn't change much of the logic. Does that not work for some reason? http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java: Line 90: public synchronized void garbageCollect(long currentCatalogVersion) { Although these methods are synchronized, the underlying maps aren't thread safe. So ideally the class as such isn't thread safe (in the sense that calling garbageCollect() and addRemoved() objects in two threads can result in an inconsistent state). However that scenario is not possible since these parallel ops are prevented by catalogLock_ in CatalogServiceCatalog? Line 154: private String toCatalogObjectKey(TCatalogObject catalogObject) { static? http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: Line 152: // Log of deleted catalog objects. > What exactly does this contain? When is it garbage collected? Add comment h Yea, I'm trying to wrap my head around the lifecycle of delta log on a the CatalogServer side. It helps to add more details on how it works and why the Catalog server also needs to maintain it after your change. -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-HasComments: Yes
