Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8545 )
Change subject: [PREVIEW] IMPALA-5058: Improve concurrency of DDL/DML operations ...................................................................... Patch Set 2: (22 comments) The renames make the code a lot easier to read, thank you! http://gerrit.cloudera.org:8080/#/c/8545/2/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/8545/2/be/src/service/client-request-state.cc@187 PS2, Line 187: reset_req.reset_metadata_params.__set_sync_ddl( Why do we need to set sync_ddl twice? http://gerrit.cloudera.org:8080/#/c/8545/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8545/2/be/src/service/impala-server.cc@1430 PS2, Line 1430: LOG(INFO) << "Update applied with version: " << new_catalog_version Catalog update applied... http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionManager.java File fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionManager.java: http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionManager.java@29 PS2, Line 29: public class CatalogObjectVersionManager { Name is not very telling. Some alternatives: CatalogObjectVersionQueue CatalogObjectVersions ActiveCatalogObjectVersions CurrentCatalogObjectVersions http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionManager.java@37 PS2, Line 37: public void updateCatalogObjectVersions(long oldVersion, long newVersion) { Do you think it's worth checking the return codes of the calls to objectVersions_ to make sure we are in a sane state? For example, I assume that in this call we expect 'oldVersion' to exist. Is it ok to add newVersion if the old version does not exist? Do we expect add/remove to actually add/remove something? Does it matter? http://gerrit.cloudera.org:8080/#/c/8545/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/8545/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@606 PS1, Line 606: * Get a snapshot view of all the functions in a database. > There is one writer (thread doing getCatalogDelta) and multiple readers (th Ah right, got it. http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@86 PS2, Line 86: * and processing of DDL requests. For each DDL request, the CatalogServiceCatalog This description of the versioning scheme is not accurate anymore. I think the versioning, topic update, and skipping mechanics deserve to be described in more detail here. http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@96 PS2, Line 96: * - Delete log. The delete log records catalog objects that have been removed from the Let's give an intuition for the purpose of this log, e.g.: Since deleted objects are removed from the cache, the cache itself is not useful for tracking deletions. This log is used for populating the list of deleted objects during a topic update. http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@97 PS2, Line 97: * catalog. An entry is added to this log every time an object is removed (e.g. An entry with a new version is added ... http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@99 PS2, Line 99: * be perfomed atomically. An entry is removed from this log from the topic update removed from this log by the topic update thread (also typo: performed) http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@102 PS2, Line 102: * been processed by the topic update thread. The topic update thread is the only one that have been included in a topic update. (The current sentence makes it sound like there is one single topic update thread, but really it's probably a different thread every time). http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@105 PS2, Line 105: * Information per catalog object includes the number of times a catalog object has Each entry includes the number of times a catalog object has been omitted in a topic update ... http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@111 PS2, Line 111: * version that the issueing impalad must wait for in order to ensure that the effects typo: issuing http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@112 PS2, Line 112: * of this operation have been broadcast to all the coordinators. The time-based cleanup Why did we chose a time-based cleanup then? I think the time-based cleanup needs to be motivated if this serious-sounding issue is a consequence of that choice. Would also be good to mention that the value of TOPIC_UPDATE_LOG_GC_FREQUENC was chosen to make the hang/timeout case unlikely. http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@129 PS2, Line 129: * that should be employed if both the catalog lock and table locks need to be held at catalog lock -> version lock? http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@183 PS2, Line 183: // Version of the last topic update sent to the statestore. sent -> returned (to make the control flow clearer) http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@192 PS2, Line 192: private final static int TOPIC_UPDATE_LOG_GC_FREQUENCY = 50; Mention how this affects the lifetime of an individual entry. My understanding is that an entry can be live for at most (2*TOPIC_UPDATE_LOG_GC_FREQUENCY)-1 topic updates assuming it is not being updated http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@195 PS2, Line 195: p Let's create a TopicUpdateLog class that encapsulates these counters and the GC logic. http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@258 PS2, Line 258: private final Object topicUpdateEventNotifier_ = new Object(); Maybe we should use the topicUpdateLog_ for this purpose once we have a separate TopicUpdateLog class. Seems a little more direct to wait/notify on the log object since changes in the log that trigger a notify. Alternatively, we can add wait/notify methods to the TopicUpdateLog itself to simplify the code in this file. http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1228 PS2, Line 1228: private void removeDbHelper(Db db) { updateDeleteLog(Db db)? http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1938 PS2, Line 1938: topicUpdateLog_.get(CatalogDeltaLog.toCatalogObjectKey(tCatalogObject)); Move toCatalogObjectKey() to Catalog, referring to the delta log here is a little confusing http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1976 PS2, Line 1976: if (topicUpdateLog_.remove(entry.getKey(), entry.getValue())) { Are uou sure it's ok to remove() while iterating over the map? Might be clearer to use an iterator. http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java: http://gerrit.cloudera.org:8080/#/c/8545/2/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java@179 PS2, Line 179: // functions and priviles) and then the top-level objects (databases and roles). typo: privlieges -- To view, visit http://gerrit.cloudera.org:8080/8545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3032437f83d39bcc8cff14d897c7c106a4ab62d3 Gerrit-Change-Number: 8545 Gerrit-PatchSet: 2 Gerrit-Owner: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Comment-Date: Tue, 28 Nov 2017 23:39:20 +0000 Gerrit-HasComments: Yes
