Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 )
Change subject: IMPALA-6671: Skip locked tables from topic updates ...................................................................... Patch Set 1: (10 comments) http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG@9 PS1, Line 9: This change adds a mechanism for topic-update thread > question, that maybe will be answered as I read the patch - if an update is if an update on a table is missed, the topic-update thread picks up the table in the next scheduled run. However, it is possible that for a table which is updated very frequently, we might get unlucky and topic update thread keeps skipping it every topic-update run. To handle this scenario we keep track of last version of the table when topic update thread skipped it. In the next iteration is the table version is different that the last version we skipped, it means that by the time topci update thread came back, the table was locked again. We count such instances and tolerate upto 2 times by default. Once this happens 2 times, topic update thread blocks until it adds the table to prevent starvation. Note that this is different that naively counting the number of topic update runs when we skip the table. It is possible that a long running table lock spans multiple topic update runs and hence we should only count such instances as 1 to be effective. http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG@23 PS1, Line 23: This solution unblocks topic-updates for a while until a > Maybe this is the best we can do at the moment, but it does seem to have si I believe the solution which I described above is better than IMPALA-5058 since we distinguish between a single lock spanning multiple topic updates v/s multiple lock operations which block the topic update thread. Another improvement over IMPALA-5058 is that this limit is configurable so that we have some options to tune this based on what we see on production instances. http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1294 PS1, Line 1294: true > A freeform boolean is hard to read, best to add an inline comment , true /* Done http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1316 PS1, Line 1316: topicUpdateTblLockMaxWaitTimeMs > I could be misunderstanding the code, but isn't 7200 seconds too long to wa yeah, I kept topicUpdateTblLockMaxWaitTimeMs to 7200 seconds to keep the existing behavior unless it is overridden. This way for users where this is not a real problem, this patch essentially would be a no-op. But if we think there are more users who will run into this problem than the ones who will not, perhaps it is better to set it to lower. I think we can definitely set this value to say to 60 seconds since we have see for really bad cases the lock runs for several minutes. I have set a aggressive default value of 500ms to make sure we hit this code path in the exhaustive runs but I am open to suggestions about the default value. http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1325 PS1, Line 1325: return; > I hope the updateTblPendingVersion() rarely exceeds the number of attempts, I would be really surprised if we this needs more than 10 attempts since there are only 2 threads (one doing alter and other getting catalog delta for topic updates) at any point of time for a given table who will attempt to update the pending version. Let me think of a way to handle this better. http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1328 PS1, Line 1328: "Unable to update the pending version of the table " + tbl.getFullName(), ex); > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1352 PS1, Line 1352: if (attempt >= Table.NUMBER_OF_ATTEMPTS_TO_UPDATE_VERSION) { : throw new CatalogException("Exhausted max number of attempts " : + Table.NUMBER_OF_ATTEMPTS_TO_UPDATE_VERSION : + " to update the pending table version"); : } > Wouldn't we want to try something more graceful in this case. How would the According to me, it would be highly unlikely for this to happen. Other than bumping up the attempts dynamically in the code or putting this whole logic under a flag which was can turned off, I cannot think of a better way currently. If you have any ideas, I will be happy to look into them. http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1392 PS1, Line 1392: if (tblVersion > ctx.toVersion && > I've seen some cases in production workloads where the topic update thread Yeah, I think this makes a lot of sense but this is non-trivial to implement and may be done as a separate patch. The reason is that in CatalogOpExecutor we do the following (eg. see alterTable()) 1. Lock the table. 2. Get a new catalog version (note that this not assigned to the table yet). Say the new version is 100 and current version of table is 99. 3. Do the alter operation 4. Reload the table as required 5. assign the new catalog version 6. release table lock. Now, the topic update thread does the following: Consider the topic update thread is looking for updates from (99,100] 1. lock the table. 2. Check if the table needs to be in the update. if not, release lock and return. 3. add the table to the update. 4. release lock and return. If we modify the code such that topic-update thread's 2 is done before taking the table lock, it is possible that a table which is currently doing operation 3-4 in CatalogOpExecutor, is skipped since it is not in the topic update window. However, the catalog version counter is already updated. Hence the next topic update will run from (100, catalog_version] which would miss the update 100 on table. If the DDL was a sync_ddl the thread will be stuck forever since it will expect a update which never comes. The solution possibly here is to change the catalogOp to do following: 1. Lock the table. 2. Do the alter operation 3. Reload.. 4. increment and assign the new catalog version atomically** 5. release table lock. Going that route will definitely cascade to other stuff. http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2662 PS1, Line 2662: LOG.info(String : .format( : "%s the pending catalog version of table %s from %s to %s%s.", : (success ? "Bumped up" : "Attempt to bump up"), getFullName(), : expectedPendingVersion, : newCatalogVersion, success ? "" : " failed")); > LOG Debug? Isn't this too frequent. Done http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2677 PS1, Line 2677: getAndSet > Why set it to -1? I haven't been able to convince myself that setting it -1 I think resetting it to -1 is not strictly needed. I just thought it would be cleaner. But now that you mention it I think it would probably better to not set it -1 since it doesn't serve any purpose other than confusing readers :) For correctness reasoning, the way I think of this is that updatePendingVersion() sets the pendingVersionNumber using the value returned from incrementAndGetCatalogVersion() which is always monotonically increasing. So every invocation of updatePendingVersion() on a table is guaranteed to always advance the pendingVersionNumber. For concurrency control, since we cannot take a table lock, it makes sure that the value before setting it was the same at the time of setting it and retries in case someone else has bumped it up as well. The setCatalogVersion() here sets the higher of pendingVersionNumber or newVersion both being monotonically increasing numbers. I don't see why the set version will go backwards. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Shant Hovsepian <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Comment-Date: Thu, 15 Oct 2020 21:53:29 +0000 Gerrit-HasComments: Yes
