Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16549 )

Change subject: IMPALA-6671: Skip locked tables from topic updates
......................................................................


Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG@9
PS1, Line 9: This change adds a mechanism for topic-update thread
> question, that maybe will be answered as I read the patch - if an update is
if an update on a table is missed, the topic-update thread picks up the table 
in the next scheduled run. However, it is possible that for a table which is 
updated very frequently, we might get unlucky and topic update thread keeps 
skipping it every topic-update run. To handle this scenario we keep track of 
last version of the table when topic update thread skipped it. In the next 
iteration is the table version is different that the last version we skipped, 
it means that by the time topci update thread came back, the table was locked 
again. We count such instances and tolerate upto 2 times by default. Once this 
happens 2 times, topic update thread blocks until it adds the table to prevent 
starvation.

Note that this is different that naively counting the number of topic update 
runs when we skip the table. It is possible that a long running table lock 
spans multiple topic update runs and hence we should only count such instances 
as 1 to be effective.


http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG@23
PS1, Line 23: This solution unblocks topic-updates for a while until a
> Maybe this is the best we can do at the moment, but it does seem to have si
I believe the solution which I described above is better than IMPALA-5058 since 
we distinguish between a single lock spanning multiple topic updates v/s 
multiple lock operations which block the topic update thread. Another 
improvement over IMPALA-5058 is that this limit is configurable so that we have 
some options to tune this based on what we see on production instances.


http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1294
PS1, Line 1294: true
> A freeform boolean is hard to read, best to add an inline comment , true /*
Done


http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1316
PS1, Line 1316: topicUpdateTblLockMaxWaitTimeMs
> I could be misunderstanding the code, but isn't 7200 seconds too long to wa
yeah, I kept topicUpdateTblLockMaxWaitTimeMs to 7200 seconds to keep the 
existing behavior unless it is overridden. This way for users where this is not 
a real problem, this patch essentially would be a no-op. But if we think there 
are more users who will run into this problem than the ones who will not, 
perhaps it is better to set it to lower.

I think we can definitely set this value to say to 60 seconds since we have see 
for really bad cases the lock runs for several minutes. I have set a aggressive 
default value of 500ms to make sure we hit this code path in the exhaustive 
runs but I am open to suggestions about the default value.


http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1325
PS1, Line 1325:           return;
> I hope the updateTblPendingVersion() rarely exceeds the number of attempts,
I would be really surprised if we this needs more than 10 attempts since there 
are only 2 threads (one doing alter and other getting catalog delta for topic 
updates) at any point of time for a given table who will attempt to update the 
pending version. Let me think of a way to handle this better.


http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1328
PS1, Line 1328:               "Unable to update the pending version of the 
table " + tbl.getFullName(), ex);
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1352
PS1, Line 1352:       if (attempt >= 
Table.NUMBER_OF_ATTEMPTS_TO_UPDATE_VERSION) {
              :         throw new CatalogException("Exhausted max number of 
attempts "
              :             + Table.NUMBER_OF_ATTEMPTS_TO_UPDATE_VERSION
              :             + " to update the pending table version");
              :       }
> Wouldn't we want to try something more graceful in this case. How would the
According to me, it would be highly unlikely for this to happen. Other than 
bumping up the attempts dynamically in the code or putting this whole logic 
under a flag which was can turned off, I cannot think of a better way 
currently. If you have any ideas, I will be happy to look into them.


http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1392
PS1, Line 1392:     if (tblVersion > ctx.toVersion &&
> I've seen some cases in production workloads where the topic update thread
Yeah, I think this makes a lot of sense but this is non-trivial to implement 
and may be done as a separate patch. The reason is that in CatalogOpExecutor we 
do the following (eg. see alterTable())

1. Lock the table.
2. Get a new catalog version (note that this not assigned to the table yet). 
Say the new version is 100 and current version of table is 99.
3. Do the alter operation
4. Reload the table as required
5. assign the new catalog version
6. release table lock.

Now, the topic update thread does the following:
Consider the topic update thread is looking for updates from (99,100]
1. lock the table.
2. Check if the table needs to be in the update. if not, release lock and 
return.
3. add the table to the update.
4. release lock and return.

If we modify the code such that topic-update thread's 2 is done before taking 
the table lock, it is possible that a table which is currently doing operation 
3-4 in CatalogOpExecutor, is skipped since it is not in the topic update 
window. However, the catalog version counter is already updated. Hence the next 
topic update will run from (100, catalog_version] which would miss the update 
100 on table. If the DDL was a sync_ddl the thread will be stuck forever since 
it will expect a update which never comes.

The solution possibly here is to change the catalogOp to do following:
1. Lock the table.
2. Do the alter operation
3. Reload..
4. increment and assign the new catalog version atomically**
5. release table lock.

Going that route will definitely cascade to other stuff.


http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2662
PS1, Line 2662: LOG.info(String
              :         .format(
              :             "%s the pending catalog version of table %s from %s 
to %s%s.",
              :             (success ? "Bumped up" : "Attempt to bump up"), 
getFullName(),
              :             expectedPendingVersion,
              :             newCatalogVersion, success ? "" : " failed"));
> LOG Debug? Isn't this too frequent.
Done


http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2677
PS1, Line 2677: getAndSet
> Why set it to -1? I haven't been able to convince myself that setting it -1
I think resetting it to -1 is not strictly needed. I just thought it would be 
cleaner. But now that you mention it I think it would probably better to not 
set it -1 since it doesn't serve any purpose other than confusing readers :)

For correctness reasoning, the way I think of this is that 
updatePendingVersion() sets the pendingVersionNumber using the value returned 
from incrementAndGetCatalogVersion() which is always monotonically increasing. 
So every invocation of updatePendingVersion() on a table is guaranteed to 
always advance the pendingVersionNumber. For concurrency control, since we 
cannot take a table lock, it makes sure that the value before setting it was 
the same at the time of setting it and retries in case someone else has bumped 
it up as well.

The setCatalogVersion() here sets the higher of pendingVersionNumber or 
newVersion both being monotonically increasing numbers. I don't see why the set 
version will go backwards.



--
To view, visit http://gerrit.cloudera.org:8080/16549
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655
Gerrit-Change-Number: 16549
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Shant Hovsepian <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Comment-Date: Thu, 15 Oct 2020 21:53:29 +0000
Gerrit-HasComments: Yes

Reply via email to