Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16549 )

Change subject: IMPALA-6671: Skip locked tables from topic updates
......................................................................


Patch Set 12:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG@35
PS10, Line 35:
> Did you loop any of these tests to make sure they're not flaky?
Good idea. I will loop them overnight and report back. I also made some changes 
to the test to speed them up since they were taking very long.


http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG@46
PS10, Line 46: c
> nit: 3
Done


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@212
PS10, Line 212:  topicUpdateTblLockMaxWaitTimeM
> nit: maxSkippedUpdatesLockContention_
Done


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@214
PS10, Line 214: l long LOCK_RETRY_TIMEOUT_MS =
> nit: topicUpdateTblLockMaxWaitTimeMs_
Done


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@437
PS10, Line 437:    * held when the function returns. Returns false otherwise 
and no lock is held in
> Comment needs a minor update to explain that it gets the write lock.
Done


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@447
PS10, Line 447:    */
> Comment needs update to mention useWriteLock.
Done


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@453
PS10, Line 453: em.c
> It may be useful to know the lock type (read/write) in debugging.
Done


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@460
PS10, Line 460: want to unnecessari
> Can we directly use the specified timeout here? I think the answer is no an
yeah, this 0 timeout is to make sure that we don't also hold the version lock 
for the timeout duration. I added a comment.


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1395
PS10, Line 1395:  when topic update thread inspe
> I think the timeout should be topicUpdateTblLockMaxWaitTimeMs / maxAttempts
yeah, makes sense. Although it adds a bit more to the complexity. Added a 
comment to explain.


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1399
PS10, Line 1399:    * @return t
> nit: we can return true directly here. Then we don't need the lockAcquired
Done


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1416
PS10, Line 1416:         // table lock was successfully acquired. We can now 
release the versionLock.
               :         versionLock_.writeLock().unlock();
               :         return
> Should we update tblVersion to be hdfsTable.getCatalogVersion() in this cas
I don't think so. The tblVersion is used later down below to identify if we 
have skipped this table below before.


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2215
PS10, Line 2215:       Db db = getDb(updatedTbl.getDb().getName());
> If the existingTbl has a pendingVersion and both instances are
 > HdfsTable, we loss the pendingVersion in the updatedTbl. But looks
 > like it's ok because incrementAndGetCatalogVersion() will always
 > get a version larger than the pendingVersion. This is just a corner
 > case that only happens when a table loading is triggered due to
 > stale writeIdList. It's worth a comment here.

Yes, like you mentioned this is okay since the  incrementAndGetCatalogVersion 
will push the table out of topic update window. Also its worth noting that the 
the addTableToCatalogDelta() call in addDatabaseToCatalogDelta() is on a table 
object which is received from getAllTables(). getAllTables holds the 
versionLock's readlock. So the replaceTableIfUnchanged can either occur before 
the getAllTables() or after. In case of later we are okay since we don't lose 
the pendingVersion. In case of former, we are okay too since the table will be 
pushed out of topic-update window.

 > BTW, what if we use incrementAndGetCatalogVersion() at the end of
 > table modifications to set its catalog version? Is it a smipler
 > solution than the pendingVersion solution? We just need to avoid
 > deadlocks since incrementAndGetCatalogVersion() requires the
 > catalog versionLock but we are holding the table lock in table
 > modifications. It seems ok if we always use tryLock to acquire the
 > table lock and give up the catalog version lock when it fails.

Yes, I had looked into this possibility initially. The root-cause of the topic 
update thread getting blocked is not the fact that we get the nextTblVersion 
early, but rather the table lock is held for a longer time. Hence even if we 
update the version towards the end, we will still be holding the table lock for 
the same duration as before and it will not really help solve the problem.


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3383
PS10, Line 3383:           
Preconditions.checkState(objectDesc.type.equals(TABLE));
> We can remove this TODO now.
Done


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@299
PS10, Line 299: Hdfs
> nit: HdfsTable?
Done


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2709
PS10, Line 2709:
> nit: the
Done


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java@28
PS10, Line 28: import java.util.concurrent.locks.ReentrantReadWriteLock;
> nit: blank line
Done


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java@29
PS10, Line 29: import java.util.concurrent.locks.ReentrantReadW
> nit: unused import
Done


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java@84
PS10, Line 84:   // or when topic-update thread serializes the table in the 
topic update)
> Could you briefly describe when the read lock can be safely taken and when
Done


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java@85
PS10, Line 85: refr
> nit: it is worth mentioning that this means using a fair ordering policy.
Done


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java@142
PS10, Line 142:
> This is no longer needed
Done


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@660
PS10, Line 660:     addCatalogServiceIdentifiers(tbl, 
catalog_.getCatalogServiceId(), newCatalogVersion);
> Can you remind me why this is necessary on this code path and not other cod
Thanks for this comment. I looked at this closely and I think we don't really 
need this change in this method. The reasoning being:

We use pendingVersion in tbl.setCatalogVersion(version) if it is higher than 
then provided version. The value we set to pendingVersion to here will never be 
used because it will either be equal to the newCatalogVersion or would be 
bumped up to something higher by the topic update thread.

I removed this change from the patch.


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@5088
PS10, Line 5088:
> nit: of
Done


http://gerrit.cloudera.org:8080/#/c/16549/10/tests/metadata/test_topic_update_frequency.py
File tests/metadata/test_topic_update_frequency.py:

http://gerrit.cloudera.org:8080/#/c/16549/10/tests/metadata/test_topic_update_frequency.py@143
PS10, Line 143:
> nit: the order of functions in this class is a bit weird, maybe this should
yeah, makes sense. Changed the ordering.



--
To view, visit http://gerrit.cloudera.org:8080/16549
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655
Gerrit-Change-Number: 16549
Gerrit-PatchSet: 12
Gerrit-Owner: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Shant Hovsepian <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Comment-Date: Wed, 16 Dec 2020 09:38:43 +0000
Gerrit-HasComments: Yes

Reply via email to