Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 )
Change subject: IMPALA-6671: Skip locked tables from topic updates ...................................................................... Patch Set 12: (23 comments) http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG@35 PS10, Line 35: > Did you loop any of these tests to make sure they're not flaky? Good idea. I will loop them overnight and report back. I also made some changes to the test to speed them up since they were taking very long. http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG@46 PS10, Line 46: c > nit: 3 Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@212 PS10, Line 212: topicUpdateTblLockMaxWaitTimeM > nit: maxSkippedUpdatesLockContention_ Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@214 PS10, Line 214: l long LOCK_RETRY_TIMEOUT_MS = > nit: topicUpdateTblLockMaxWaitTimeMs_ Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@437 PS10, Line 437: * held when the function returns. Returns false otherwise and no lock is held in > Comment needs a minor update to explain that it gets the write lock. Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@447 PS10, Line 447: */ > Comment needs update to mention useWriteLock. Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@453 PS10, Line 453: em.c > It may be useful to know the lock type (read/write) in debugging. Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@460 PS10, Line 460: want to unnecessari > Can we directly use the specified timeout here? I think the answer is no an yeah, this 0 timeout is to make sure that we don't also hold the version lock for the timeout duration. I added a comment. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1395 PS10, Line 1395: when topic update thread inspe > I think the timeout should be topicUpdateTblLockMaxWaitTimeMs / maxAttempts yeah, makes sense. Although it adds a bit more to the complexity. Added a comment to explain. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1399 PS10, Line 1399: * @return t > nit: we can return true directly here. Then we don't need the lockAcquired Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1416 PS10, Line 1416: // table lock was successfully acquired. We can now release the versionLock. : versionLock_.writeLock().unlock(); : return > Should we update tblVersion to be hdfsTable.getCatalogVersion() in this cas I don't think so. The tblVersion is used later down below to identify if we have skipped this table below before. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2215 PS10, Line 2215: Db db = getDb(updatedTbl.getDb().getName()); > If the existingTbl has a pendingVersion and both instances are > HdfsTable, we loss the pendingVersion in the updatedTbl. But looks > like it's ok because incrementAndGetCatalogVersion() will always > get a version larger than the pendingVersion. This is just a corner > case that only happens when a table loading is triggered due to > stale writeIdList. It's worth a comment here. Yes, like you mentioned this is okay since the incrementAndGetCatalogVersion will push the table out of topic update window. Also its worth noting that the the addTableToCatalogDelta() call in addDatabaseToCatalogDelta() is on a table object which is received from getAllTables(). getAllTables holds the versionLock's readlock. So the replaceTableIfUnchanged can either occur before the getAllTables() or after. In case of later we are okay since we don't lose the pendingVersion. In case of former, we are okay too since the table will be pushed out of topic-update window. > BTW, what if we use incrementAndGetCatalogVersion() at the end of > table modifications to set its catalog version? Is it a smipler > solution than the pendingVersion solution? We just need to avoid > deadlocks since incrementAndGetCatalogVersion() requires the > catalog versionLock but we are holding the table lock in table > modifications. It seems ok if we always use tryLock to acquire the > table lock and give up the catalog version lock when it fails. Yes, I had looked into this possibility initially. The root-cause of the topic update thread getting blocked is not the fact that we get the nextTblVersion early, but rather the table lock is held for a longer time. Hence even if we update the version towards the end, we will still be holding the table lock for the same duration as before and it will not really help solve the problem. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3383 PS10, Line 3383: Preconditions.checkState(objectDesc.type.equals(TABLE)); > We can remove this TODO now. Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@299 PS10, Line 299: Hdfs > nit: HdfsTable? Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2709 PS10, Line 2709: > nit: the Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java@28 PS10, Line 28: import java.util.concurrent.locks.ReentrantReadWriteLock; > nit: blank line Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java@29 PS10, Line 29: import java.util.concurrent.locks.ReentrantReadW > nit: unused import Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java@84 PS10, Line 84: // or when topic-update thread serializes the table in the topic update) > Could you briefly describe when the read lock can be safely taken and when Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java@85 PS10, Line 85: refr > nit: it is worth mentioning that this means using a fair ordering policy. Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java@142 PS10, Line 142: > This is no longer needed Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@660 PS10, Line 660: addCatalogServiceIdentifiers(tbl, catalog_.getCatalogServiceId(), newCatalogVersion); > Can you remind me why this is necessary on this code path and not other cod Thanks for this comment. I looked at this closely and I think we don't really need this change in this method. The reasoning being: We use pendingVersion in tbl.setCatalogVersion(version) if it is higher than then provided version. The value we set to pendingVersion to here will never be used because it will either be equal to the newCatalogVersion or would be bumped up to something higher by the topic update thread. I removed this change from the patch. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@5088 PS10, Line 5088: > nit: of Done http://gerrit.cloudera.org:8080/#/c/16549/10/tests/metadata/test_topic_update_frequency.py File tests/metadata/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/10/tests/metadata/test_topic_update_frequency.py@143 PS10, Line 143: > nit: the order of functions in this class is a bit weird, maybe this should yeah, makes sense. Changed the ordering. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Shant Hovsepian <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Comment-Date: Wed, 16 Dec 2020 09:38:43 +0000 Gerrit-HasComments: Yes
