[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates

Vihang Karajgaonkar (Code Review) Thu, 15 Oct 2020 14:54:28 -0700

Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16549 )

Change subject: IMPALA-6671: Skip locked tables from topic updates
......................................................................

Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG@9
PS1, Line 9: This change adds a mechanism for topic-update thread
> question, that maybe will be answered as I read the patch - if an update is
if an update on a table is missed, the topic-update thread picks up the table
in the next scheduled run. However, it is possible that for a table which is
updated very frequently, we might get unlucky and topic update thread keeps
skipping it every topic-update run. To handle this scenario we keep track of
last version of the table when topic update thread skipped it. In the next
iteration is the table version is different that the last version we skipped,
it means that by the time topci update thread came back, the table was locked
again. We count such instances and tolerate upto 2 times by default. Once this
happens 2 times, topic update thread blocks until it adds the table to prevent
starvation.

Note that this is different that naively counting the number of topic update
runs when we skip the table. It is possible that a long running table lock
spans multiple topic update runs and hence we should only count such instances
as 1 to be effective.

http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG@23
PS1, Line 23: This solution unblocks topic-updates for a while until a
> Maybe this is the best we can do at the moment, but it does seem to have si
I believe the solution which I described above is better than IMPALA-5058 since
we distinguish between a single lock spanning multiple topic updates v/s
multiple lock operations which block the topic update thread. Another
improvement over IMPALA-5058 is that this limit is configurable so that we have
some options to tune this based on what we see on production instances.

http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1294
PS1, Line 1294: true
> A freeform boolean is hard to read, best to add an inline comment , true /*
Done

http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1316
PS1, Line 1316: topicUpdateTblLockMaxWaitTimeMs
> I could be misunderstanding the code, but isn't 7200 seconds too long to wa
yeah, I kept topicUpdateTblLockMaxWaitTimeMs to 7200 seconds to keep the
existing behavior unless it is overridden. This way for users where this is not
a real problem, this patch essentially would be a no-op. But if we think there
are more users who will run into this problem than the ones who will not,
perhaps it is better to set it to lower.

I think we can definitely set this value to say to 60 seconds since we have see
for really bad cases the lock runs for several minutes. I have set a aggressive
default value of 500ms to make sure we hit this code path in the exhaustive
runs but I am open to suggestions about the default value.

http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1325
PS1, Line 1325: return;
> I hope the updateTblPendingVersion() rarely exceeds the number of attempts,
I would be really surprised if we this needs more than 10 attempts since there
are only 2 threads (one doing alter and other getting catalog delta for topic
updates) at any point of time for a given table who will attempt to update the
pending version. Let me think of a way to handle this better.

http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1328
PS1, Line 1328: "Unable to update the pending version of the
table " + tbl.getFullName(), ex);
> line too long (92 > 90)
Done

http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1352
PS1, Line 1352: if (attempt >=
Table.NUMBER_OF_ATTEMPTS_TO_UPDATE_VERSION) {
: throw new CatalogException("Exhausted max number of
attempts "
: + Table.NUMBER_OF_ATTEMPTS_TO_UPDATE_VERSION
: + " to update the pending table version");
: }
> Wouldn't we want to try something more graceful in this case. How would the
According to me, it would be highly unlikely for this to happen. Other than
bumping up the attempts dynamically in the code or putting this whole logic
under a flag which was can turned off, I cannot think of a better way
currently. If you have any ideas, I will be happy to look into them.

http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1392
PS1, Line 1392: if (tblVersion > ctx.toVersion &&
> I've seen some cases in production workloads where the topic update thread
Yeah, I think this makes a lot of sense but this is non-trivial to implement
and may be done as a separate patch. The reason is that in CatalogOpExecutor we
do the following (eg. see alterTable())

1. Lock the table.
2. Get a new catalog version (note that this not assigned to the table yet).
Say the new version is 100 and current version of table is 99.
3. Do the alter operation
4. Reload the table as required
5. assign the new catalog version
6. release table lock.

Now, the topic update thread does the following:
Consider the topic update thread is looking for updates from (99,100]
1. lock the table.
2. Check if the table needs to be in the update. if not, release lock and
return.
3. add the table to the update.
4. release lock and return.

If we modify the code such that topic-update thread's 2 is done before taking
the table lock, it is possible that a table which is currently doing operation
3-4 in CatalogOpExecutor, is skipped since it is not in the topic update
window. However, the catalog version counter is already updated. Hence the next
topic update will run from (100, catalog_version] which would miss the update
100 on table. If the DDL was a sync_ddl the thread will be stuck forever since
it will expect a update which never comes.

The solution possibly here is to change the catalogOp to do following:
1. Lock the table.
2. Do the alter operation
3. Reload..
4. increment and assign the new catalog version atomically**
5. release table lock.

Going that route will definitely cascade to other stuff.

http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2662
PS1, Line 2662: LOG.info(String
: .format(
: "%s the pending catalog version of table %s from %s
to %s%s.",
: (success ? "Bumped up" : "Attempt to bump up"),
getFullName(),
: expectedPendingVersion,
: newCatalogVersion, success ? "" : " failed"));
> LOG Debug? Isn't this too frequent.
Done

http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2677
PS1, Line 2677: getAndSet
> Why set it to -1? I haven't been able to convince myself that setting it -1
I think resetting it to -1 is not strictly needed. I just thought it would be
cleaner. But now that you mention it I think it would probably better to not
set it -1 since it doesn't serve any purpose other than confusing readers :)

For correctness reasoning, the way I think of this is that
updatePendingVersion() sets the pendingVersionNumber using the value returned
from incrementAndGetCatalogVersion() which is always monotonically increasing.
So every invocation of updatePendingVersion() on a table is guaranteed to
always advance the pendingVersionNumber. For concurrency control, since we
cannot take a table lock, it makes sure that the value before setting it was
the same at the time of setting it and retries in case someone else has bumped
it up as well.

The setCatalogVersion() here sets the higher of pendingVersionNumber or
newVersion both being monotonically increasing numbers. I don't see why the set
version will go backwards.

--
To view, visit http://gerrit.cloudera.org:8080/16549
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655
Gerrit-Change-Number: 16549
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Shant Hovsepian <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Comment-Date: Thu, 15 Oct 2020 21:53:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates

Reply via email to