[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. IMPALA-6671: Skip locked tables from topic updates This change adds a mechanism for topic-update thread to skip a table which is locked for more than a configurable interval from the topic updates. This is especially useful in scenarios where long running operations on a locked table (refresh, recover partitions, compute stats) block the topic update thread. This causes unrelated queries which are waiting on metadata via topic updates (catalog-v1 mode) to unnecessarily block. The ideal solution of this problem would be to make HdfsTable immutable so that there is no need for table lock. But that is large change and not easily portable to older releases of Impala. It would be taken up as a separate patch. This change introduces 2 new configurations for catalogd: 1. topic_update_tbl_max_wait_time_ms: This defines the maximum time in msecs the topic update thread waits on a locked table before skipping the table from that iteration of topic updates. The default value is 500. If this configuration is set to 0 the lock with timeout for topic update thread is disabled. 2. catalog_max_lock_skipped_topic_updates: This defines the maximum number of distinct lock operations which are skipped by topic update thread due to lock contention. Once this limit is reached, topic update thread will block until it acquires the table lock and adds it to the updates. Testing: 1. Added a test case which introduces a simulated delay in a few potentially long running statements. This causes the table to be locked for a long time. The topic update thread skips that table from updates and unrelated queries are unblocked since they receive the required metadata from updates. 2. Added a test where multiple threads run blocking statements in a loop to stress the table lock. It makes sure that topic update thread is not starved and eventually blocks on table lock by hitting the limit defined by catalog_max_lock_skipped_topic_updates. 3. Ran exhaustive tests with default configurations. Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Reviewed-on: http://gerrit.cloudera.org:8080/16549 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/analysis/CopyTestCaseStmt.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java A tests/custom_cluster/test_topic_update_frequency.py 13 files changed, 698 insertions(+), 149 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 18 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 17: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 17 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 22 Dec 2020 00:19:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 17: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6796/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 17 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 21 Dec 2020 18:44:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 17: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 17 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 21 Dec 2020 18:44:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 16: Code-Review+2 There were minor updates since the last update. Carrying forward the votes from Tim and Quanlong. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 16 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 21 Dec 2020 18:36:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7886/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 16 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 21 Dec 2020 18:27:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 16: (3 comments) http://gerrit.cloudera.org:8080/#/c/16549/16/tests/custom_cluster/test_topic_update_frequency.py File tests/custom_cluster/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/16/tests/custom_cluster/test_topic_update_frequency.py@54 PS16, Line 54: l flake8: E501 line too long (146 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/16/tests/custom_cluster/test_topic_update_frequency.py@120 PS16, Line 120: f flake8: E126 continuation line over-indented for hanging indent http://gerrit.cloudera.org:8080/#/c/16549/16/tests/custom_cluster/test_topic_update_frequency.py@152 PS16, Line 152: l flake8: E501 line too long (146 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 16 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 21 Dec 2020 18:06:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. IMPALA-6671: Skip locked tables from topic updates This change adds a mechanism for topic-update thread to skip a table which is locked for more than a configurable interval from the topic updates. This is especially useful in scenarios where long running operations on a locked table (refresh, recover partitions, compute stats) block the topic update thread. This causes unrelated queries which are waiting on metadata via topic updates (catalog-v1 mode) to unnecessarily block. The ideal solution of this problem would be to make HdfsTable immutable so that there is no need for table lock. But that is large change and not easily portable to older releases of Impala. It would be taken up as a separate patch. This change introduces 2 new configurations for catalogd: 1. topic_update_tbl_max_wait_time_ms: This defines the maximum time in msecs the topic update thread waits on a locked table before skipping the table from that iteration of topic updates. The default value is 500. If this configuration is set to 0 the lock with timeout for topic update thread is disabled. 2. catalog_max_lock_skipped_topic_updates: This defines the maximum number of distinct lock operations which are skipped by topic update thread due to lock contention. Once this limit is reached, topic update thread will block until it acquires the table lock and adds it to the updates. Testing: 1. Added a test case which introduces a simulated delay in a few potentially long running statements. This causes the table to be locked for a long time. The topic update thread skips that table from updates and unrelated queries are unblocked since they receive the required metadata from updates. 2. Added a test where multiple threads run blocking statements in a loop to stress the table lock. It makes sure that topic update thread is not starved and eventually blocks on table lock by hitting the limit defined by catalog_max_lock_skipped_topic_updates. 3. Ran exhaustive tests with default configurations. Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/analysis/CopyTestCaseStmt.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java A tests/custom_cluster/test_topic_update_frequency.py 13 files changed, 698 insertions(+), 149 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16549/16 -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 16 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 15: (7 comments) http://gerrit.cloudera.org:8080/#/c/16549/15/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/15/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1311 PS15, Line 1311: to > nit: to get Done http://gerrit.cloudera.org:8080/#/c/16549/15/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1325 PS15, Line 1325: LOG.warn("Topic update thread blocking until lock is acquired for table {}", > This is too verbose when topicUpdateTblLockMaxWaitTimeMs_ is 0. Almost all makes sense. Done. http://gerrit.cloudera.org:8080/#/c/16549/15/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2235 PS15, Line 2235: the > nit: remove "the" Done http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py File tests/custom_cluster/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py@20 PS15, Line 20: @SkipIfS3.variable_listing_times > flake8: E302 expected 2 blank lines, found 1 Done http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py@66 PS15, Line 66: is > nit: without? Thanks for pointing this out. http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py@70 PS15, Line 70: is > nit: with? Done http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py@106 PS15, Line 106: # there is no other good way other than to wait for some time to make sure that : # the slow query has been submitted and being compiled before we start the : # non_blocking queries : time.sleep(1) > Just for comments, we can use "wait_for_state" to wait for the CREATE state Actually, I had used this before in my earlier attempt but I found that to be flaky. My understanding is the CREATE state will be set when admission controller accepts the query? There is still a time delay between when the query is in create state and the query takes the table lock in catalogd. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 15 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 21 Dec 2020 17:50:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/16549/15/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/15/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1325 PS15, Line 1325: LOG.warn("Topic update thread blocking until lock is acquired for table {}", This is too verbose when topicUpdateTblLockMaxWaitTimeMs_ is 0. Almost all tables will leave one log line here because their version <= ctx.toVersion. Could you change the condition to this? if (topicUpdateTblLockMaxWaitTimeMs_ > 0 && !lockWithTimeout) -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 15 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 21 Dec 2020 08:12:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 15: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 15 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 18 Dec 2020 20:59:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 15: Code-Review+1 (9 comments) Thanks for your patience in updating the patch! LGTM. http://gerrit.cloudera.org:8080/#/c/16549/15/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/16549/15/be/src/catalog/catalog-server.cc@79 PS15, Line 79: the nit: redundant "the" http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1416 PS10, Line 1416: // table lock was successfully acquired. We can now release the versionLock. : versionLock_.writeLock().unlock(); : return > I don't think so. The tblVersion is used later down below to identify if we I see, so the second attemp will always fail to update the pendingVersion but it's ok since its purpose is just to retry acquiring the read lock. http://gerrit.cloudera.org:8080/#/c/16549/15/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/15/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@459 PS15, Line 459: since the timeout nit: should remove? http://gerrit.cloudera.org:8080/#/c/16549/15/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1311 PS15, Line 1311: to nit: to get http://gerrit.cloudera.org:8080/#/c/16549/15/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2235 PS15, Line 2235: the nit: remove "the" http://gerrit.cloudera.org:8080/#/c/16549/15/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2235 PS15, Line 2235: pendingVersion nit: missing period http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py File tests/custom_cluster/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py@66 PS15, Line 66: is nit: without? http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py@70 PS15, Line 70: is nit: with? http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py@106 PS15, Line 106: # there is no other good way other than to wait for some time to make sure that : # the slow query has been submitted and being compiled before we start the : # non_blocking queries : time.sleep(1) Just for comments, we can use "wait_for_state" to wait for the CREATE state. An example: https://github.com/apache/impala/blob/10fe4b6c635601768e70054255b7e50e000e71b5/tests/custom_cluster/test_query_retries.py#L91 But it requires refactoring to expose the query handle. To be simple, I'm ok with sleep 1s here. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 15 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 18 Dec 2020 10:26:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7871/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 15 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 16 Dec 2020 22:21:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 15: (4 comments) http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py File tests/custom_cluster/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py@20 PS15, Line 20: @SkipIfS3.variable_listing_times flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py@53 PS15, Line 53: l flake8: E501 line too long (146 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py@119 PS15, Line 119: f flake8: E126 continuation line over-indented for hanging indent http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py@151 PS15, Line 151: l flake8: E501 line too long (146 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 15 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 16 Dec 2020 22:00:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. IMPALA-6671: Skip locked tables from topic updates This change adds a mechanism for topic-update thread to skip a table which is locked for more than a configurable interval from the topic updates. This is especially useful in scenarios where long running operations on a locked table (refresh, recover partitions, compute stats) block the topic update thread. This causes unrelated queries which are waiting on metadata via topic updates (catalog-v1 mode) to unnecessarily block. The ideal solution of this problem would be to make HdfsTable immutable so that there is no need for table lock. But that is large change and not easily portable to older releases of Impala. It would be taken up as a separate patch. This change introduces 2 new configurations for catalogd: 1. topic_update_tbl_max_wait_time_ms: This defines the maximum time in msecs the topic update thread waits on a locked table before skipping the table from that iteration of topic updates. The default value is 500. If this configuration is set to 0 the lock with timeout for topic update thread is disabled. 2. catalog_max_lock_skipped_topic_updates: This defines the maximum number of distinct lock operations which are skipped by topic update thread due to lock contention. Once this limit is reached, topic update thread will block until it acquires the table lock and adds it to the updates. Testing: 1. Added a test case which introduces a simulated delay in a few potentially long running statements. This causes the table to be locked for a long time. The topic update thread skips that table from updates and unrelated queries are unblocked since they receive the required metadata from updates. 2. Added a test where multiple threads run blocking statements in a loop to stress the table lock. It makes sure that topic update thread is not starved and eventually blocks on table lock by hitting the limit defined by catalog_max_lock_skipped_topic_updates. 3. Ran exhaustive tests with default configurations. Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/analysis/CopyTestCaseStmt.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java A tests/custom_cluster/test_topic_update_frequency.py 13 files changed, 697 insertions(+), 149 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16549/15 -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 15 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7869/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 13 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 16 Dec 2020 19:57:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 13: (3 comments) http://gerrit.cloudera.org:8080/#/c/16549/13/tests/custom_cluster/test_topic_update_frequency.py File tests/custom_cluster/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/13/tests/custom_cluster/test_topic_update_frequency.py@52 PS13, Line 52: l flake8: E501 line too long (146 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/13/tests/custom_cluster/test_topic_update_frequency.py@118 PS13, Line 118: f flake8: E126 continuation line over-indented for hanging indent http://gerrit.cloudera.org:8080/#/c/16549/13/tests/custom_cluster/test_topic_update_frequency.py@150 PS13, Line 150: l flake8: E501 line too long (146 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 13 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 16 Dec 2020 19:35:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 13: PS13 is a rebase and some minor comment updates. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 13 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 16 Dec 2020 19:35:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. IMPALA-6671: Skip locked tables from topic updates This change adds a mechanism for topic-update thread to skip a table which is locked for more than a configurable interval from the topic updates. This is especially useful in scenarios where long running operations on a locked table (refresh, recover partitions, compute stats) block the topic update thread. This causes unrelated queries which are waiting on metadata via topic updates (catalog-v1 mode) to unnecessarily block. The ideal solution of this problem would be to make HdfsTable immutable so that there is no need for table lock. But that is large change and not easily portable to older releases of Impala. It would be taken up as a separate patch. This change introduces 2 new configurations for catalogd: 1. topic_update_tbl_max_wait_time_ms: This defines the maximum time in msecs the topic update thread waits on a locked table before skipping the table from that iteration of topic updates. The default value is 500. If this configuration is set to 0 the lock with timeout for topic update thread is disabled. 2. catalog_max_lock_skipped_topic_updates: This defines the maximum number of distinct lock operations which are skipped by topic update thread due to lock contention. Once this limit is reached, topic update thread will block until it acquires the table lock and adds it to the updates. Testing: 1. Added a test case which introduces a simulated delay in a few potentially long running statements. This causes the table to be locked for a long time. The topic update thread skips that table from updates and unrelated queries are unblocked since they receive the required metadata from updates. 2. Added a test where multiple threads run blocking statements in a loop to stress the table lock. It makes sure that topic update thread is not starved and eventually blocks on table lock by hitting the limit defined by catalog_max_lock_skipped_topic_updates. 2. Ran exhaustive tests with default configurations. Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/analysis/CopyTestCaseStmt.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java A tests/custom_cluster/test_topic_update_frequency.py 13 files changed, 696 insertions(+), 149 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16549/13 -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 13 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG@35 PS10, Line 35: Testing: > Good idea. I will loop them overnight and report back. I also made some cha I looped them overnight for a 100 iterations without any failures. I plan to ignore this test for s3 builds in the next update. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 10 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 16 Dec 2020 18:13:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7863/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 16 Dec 2020 10:01:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 12: (4 comments) http://gerrit.cloudera.org:8080/#/c/16549/12/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/16549/12/fe/src/main/java/org/apache/impala/catalog/Table.java@86 PS12, Line 86: private final ReentrantReadWriteLock tableLock_ = new ReentrantReadWriteLock(true /*fair ordering*/); line too long (103 > 90) http://gerrit.cloudera.org:8080/#/c/16549/12/tests/custom_cluster/test_topic_update_frequency.py File tests/custom_cluster/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/12/tests/custom_cluster/test_topic_update_frequency.py@52 PS12, Line 52: l flake8: E501 line too long (146 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/12/tests/custom_cluster/test_topic_update_frequency.py@118 PS12, Line 118: f flake8: E126 continuation line over-indented for hanging indent http://gerrit.cloudera.org:8080/#/c/16549/12/tests/custom_cluster/test_topic_update_frequency.py@150 PS12, Line 150: l flake8: E501 line too long (146 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 16 Dec 2020 09:39:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 12: (23 comments) http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG@35 PS10, Line 35: > Did you loop any of these tests to make sure they're not flaky? Good idea. I will loop them overnight and report back. I also made some changes to the test to speed them up since they were taking very long. http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG@46 PS10, Line 46: c > nit: 3 Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@212 PS10, Line 212: topicUpdateTblLockMaxWaitTimeM > nit: maxSkippedUpdatesLockContention_ Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@214 PS10, Line 214: l long LOCK_RETRY_TIMEOUT_MS = > nit: topicUpdateTblLockMaxWaitTimeMs_ Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@437 PS10, Line 437:* held when the function returns. Returns false otherwise and no lock is held in > Comment needs a minor update to explain that it gets the write lock. Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@447 PS10, Line 447:*/ > Comment needs update to mention useWriteLock. Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@453 PS10, Line 453: em.c > It may be useful to know the lock type (read/write) in debugging. Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@460 PS10, Line 460: want to unnecessari > Can we directly use the specified timeout here? I think the answer is no an yeah, this 0 timeout is to make sure that we don't also hold the version lock for the timeout duration. I added a comment. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1395 PS10, Line 1395: when topic update thread inspe > I think the timeout should be topicUpdateTblLockMaxWaitTimeMs / maxAttempts yeah, makes sense. Although it adds a bit more to the complexity. Added a comment to explain. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1399 PS10, Line 1399:* @return t > nit: we can return true directly here. Then we don't need the lockAcquired Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1416 PS10, Line 1416: // table lock was successfully acquired. We can now release the versionLock. : versionLock_.writeLock().unlock(); : return > Should we update tblVersion to be hdfsTable.getCatalogVersion() in this cas I don't think so. The tblVersion is used later down below to identify if we have skipped this table below before. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2215 PS10, Line 2215: Db db = getDb(updatedTbl.getDb().getName()); > If the existingTbl has a pendingVersion and both instances are > HdfsTable, we loss the pendingVersion in the updatedTbl. But looks > like it's ok because incrementAndGetCatalogVersion() will always > get a version larger than the pendingVersion. This is just a corner > case that only happens when a table loading is triggered due to > stale writeIdList. It's worth a comment here. Yes, like you mentioned this is okay since the incrementAndGetCatalogVersion will push the table out of topic update window. Also its worth noting that the the addTableToCatalogDelta() call in addDatabaseToCatalogDelta() is on a table object which is received from getAllTables(). getAllTables holds the versionLock's readlock. So the replaceTableIfUnchanged can either occur before the getAllTables() or after. In case of later we are okay since we don't lose the pendingVersion. In case of former, we are okay too since the table will be pushed out of topic-update window. > BTW, what if we use incrementAndGetCatalogVersion() at the end of > table modifications to set its catalog version? Is it a smipler > solution than the pendingVersion solution? We just need to avoid > deadlocks since incrementAndGetCatalogVersion() requires the > catalog versionLock but we are holding the table lock in table
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. IMPALA-6671: Skip locked tables from topic updates This change adds a mechanism for topic-update thread to skip a table which is locked for more than a configurable interval from the topic updates. This is especially useful in scenarios where long running operations on a locked table (refresh, recover partitions, compute stats) block the topic update thread. This causes unrelated queries which are waiting on metadata via topic updates (catalog-v1 mode) to unnecessarily block. The ideal solution of this problem would be to make HdfsTable immutable so that there is no need for table lock. But that is large change and not easily portable to older releases of Impala. It would be taken up as a separate patch. This change introduces 2 new configurations for catalogd: 1. topic_update_tbl_max_wait_time_ms: This defines the maximum time in msecs the topic update thread waits on a locked table before skipping the table from that iteration of topic updates. The default value is 500. If this configuration is set to 0 the lock with timeout for topic update thread is disabled. 2. catalog_max_lock_skipped_topic_updates: This defines the maximum number of distinct lock operations which are skipped by topic update thread due to lock contention. Once this limit is reached, topic update thread will block until it acquires the table lock and adds it to the updates. Testing: 1. Added a test case which introduces a simulated delay in a few potentially long running statements. This causes the table to be locked for a long time. The topic update thread skips that table from updates and unrelated queries are unblocked since they receive the required metadata from updates. 2. Added a test where multiple threads run blocking statements in a loop to stress the table lock. It makes sure that topic update thread is not starved and eventually blocks on table lock by hitting the limit defined by catalog_max_lock_skipped_topic_updates. 2. Ran exhaustive tests with default configurations. Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/analysis/CopyTestCaseStmt.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java A tests/custom_cluster/test_topic_update_frequency.py 13 files changed, 687 insertions(+), 152 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16549/12 -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 10: (19 comments) http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG@46 PS10, Line 46: 2 nit: 3 http://gerrit.cloudera.org:8080/#/c/16549/10/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/16549/10/be/src/catalog/catalog-server.cc@79 PS10, Line 79: the nit: redundant "the" http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@25 PS10, Line 25: import java.sql.Time; nit: unused import http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@40 PS10, Line 40: import java.util.concurrent.locks.ReentrantLock; nit: unused import http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@212 PS10, Line 212: maxSkippedUpdatesLockContention nit: maxSkippedUpdatesLockContention_ http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@214 PS10, Line 214: topicUpdateTblLockMaxWaitTimeMs nit: topicUpdateTblLockMaxWaitTimeMs_ http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@453 PS10, Line 453: lock It may be useful to know the lock type (read/write) in debugging. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@460 PS10, Line 460: 0, TimeUnit.SECONDS Can we directly use the specified timeout here? I think the answer is no and the reason is we want to release the catalog versionLock immediately. If so, I think it's worth a comment here. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1395 PS10, Line 1395: topicUpdateTblLockMaxWaitTimeMs I think the timeout should be topicUpdateTblLockMaxWaitTimeMs / maxAttempts http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1399 PS10, Line 1399: break; nit: we can return true directly here. Then we don't need the lockAcquired var. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1416 PS10, Line 1416: // if pendingVersionUpdated is false it means that tblVersion has been changed : // and hence we didn't update the pendingVersion. We retry once to acquire a read : // lock. Should we update tblVersion to be hdfsTable.getCatalogVersion() in this case? http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2215 PS10, Line 2215: updatedTbl.setCatalogVersion(incrementAndGetCatalogVersion()); If the existingTbl has a pendingVersion and both instances are HdfsTable, we loss the pendingVersion in the updatedTbl. But looks like it's ok because incrementAndGetCatalogVersion() will always get a version larger than the pendingVersion. This is just a corner case that only happens when a table loading is triggered due to stale writeIdList. It's worth a comment here. BTW, what if we use incrementAndGetCatalogVersion() at the end of table modifications to set its catalog version? Is it a smipler solution than the pendingVersion solution? We just need to avoid deadlocks since incrementAndGetCatalogVersion() requires the catalog versionLock but we are holding the table lock in table modifications. It seems ok if we always use tryLock to acquire the table lock and give up the catalog version lock when it fails. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3383 PS10, Line 3383: // TODO(todd): consider a read-write lock here. We can remove this TODO now. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@299 PS10, Line 299: Hdfs nit: HdfsTable? http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2709 PS10, Line 2709: then nit: the http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java File
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 10: (7 comments) Looking pretty good... http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG@35 PS10, Line 35: Testing: Did you loop any of these tests to make sure they're not flaky? http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@437 PS10, Line 437:* Tries to acquire versionLock_ and the lock of 'tbl' in that order. Returns true if it Comment needs a minor update to explain that it gets the write lock. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@447 PS10, Line 447:* Tries to acquire the table similar to described in Comment needs update to mention useWriteLock. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java@84 PS10, Line 84: // Lock protecting this table Could you briefly describe when the read lock can be safely taken and when the write lock is needed. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java@142 PS10, Line 142: public static final int NUMBER_OF_ATTEMPTS_TO_UPDATE_VERSION = 10; This is no longer needed http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@660 PS10, Line 660: if (tbl instanceof HdfsTable) { Can you remind me why this is necessary on this code path and not other code paths that alter the table? http://gerrit.cloudera.org:8080/#/c/16549/10/tests/metadata/test_topic_update_frequency.py File tests/metadata/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/10/tests/metadata/test_topic_update_frequency.py@143 PS10, Line 143: def __run_topic_update_test(self, slow_blocking_query, fast_queries, nit: the order of functions in this class is a bit weird, maybe this should be just below the method that calls it. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 10 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 11 Nov 2020 01:37:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 10: I took a quick look over the changes and I think this makes sense to me, thanks for working on it. I need to make a bit of time to do a detailed review. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 10 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 04 Nov 2020 00:15:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7610/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 10 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 03 Nov 2020 20:57:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 10: (3 comments) http://gerrit.cloudera.org:8080/#/c/16549/10/tests/metadata/test_topic_update_frequency.py File tests/metadata/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/10/tests/metadata/test_topic_update_frequency.py@50 PS10, Line 50: l flake8: E501 line too long (146 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/10/tests/metadata/test_topic_update_frequency.py@109 PS10, Line 109: l flake8: E501 line too long (145 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/10/tests/metadata/test_topic_update_frequency.py@177 PS10, Line 177: f flake8: E126 continuation line over-indented for hanging indent -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 10 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 03 Nov 2020 20:37:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. IMPALA-6671: Skip locked tables from topic updates This change adds a mechanism for topic-update thread to skip a table which is locked for more than a configurable interval from the topic updates. This is especially useful in scenarios where long running operations on a locked table (refresh, recover partitions, compute stats) block the topic update thread. This causes unrelated queries which are waiting on metadata via topic updates (catalog-v1 mode) to unnecessarily block. The ideal solution of this problem would be to make HdfsTable immutable so that there is no need for table lock. But that is large change and not easily portable to older releases of Impala. It would be taken up as a separate patch. This change introduces 2 new configurations for catalogd: 1. topic_update_tbl_max_wait_time_ms: This defines the maximum time in msecs the topic update thread waits on a locked table before skipping the table from that iteration of topic updates. The default value is 500. 2. catalog_max_lock_skipped_topic_updates: This defines the maximum number of distinct lock operations which are skipped by topic update thread due to lock contention. Once this limit is reached, topic update thread will block until it acquires the table lock and adds it to the updates. Testing: 1. Added a test case which introduces a simulated delay in a few potentially long running statements. This causes the table to be locked for a long time. The topic update thread skips that table from updates and unrelated queries are unblocked since they receive the required metadata from updates. 2. Added a test where multiple threads run blocking statements in a loop to stress the table lock. It makes sure that topic update thread is not starved and eventually blocks on table lock by hitting the limit defined by catalog_max_lock_skipped_topic_updates. 2. Ran exhaustive tests with default configurations. Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/analysis/CopyTestCaseStmt.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java A tests/metadata/test_topic_update_frequency.py 13 files changed, 647 insertions(+), 149 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16549/10 -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 10 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/16549/7/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/7/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1366 PS7, Line 1366: = nit: remove spaces around '=' to be consistent with 'version={}' http://gerrit.cloudera.org:8080/#/c/16549/7/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1406 PS7, Line 1406: public long updateTblPendingVersion(HdfsTable tbl) throws CatalogException { I think we don't need this now since now only the catalog update thread will update a table's pending version. There are no contentions. http://gerrit.cloudera.org:8080/#/c/16549/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/16549/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2678 PS7, Line 2678: public boolean updatePendingVersion(long expectedPendingVersion, > I have a different idea below, but could we avoid the retry loop around upd I think PS7 is able to avoid the retry loop around this since now only the catalog update thread will call this. There are no contentions. http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@665 PS6, Line 665: if (params.getAlter_type() == TAlterTableType.RENAME_VIEW > Could you explain what error can happen if we don't set the pending version I see PS7 updates this. Maybe we should comment the invariant somewhere. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 7 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 26 Oct 2020 09:33:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/16549/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/16549/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2678 PS7, Line 2678: public boolean updatePendingVersion(long expectedPendingVersion, I have a different idea below, but could we avoid the retry loop around updatePendingVersion() by considering it successful if catalogVersion_ > newPendingVersion. I think that would guarantee that the table gets picked up in the next topic update. http://gerrit.cloudera.org:8080/#/c/16549/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2706 PS7, Line 2706: if (currentPendingVersion > version) { What if, instead of setCatalogVersion() checking pendingVersionNumber(), this was done lazily by the topic update thread. E.g. before calling getCatalogVersion(), the topic update thread does something like: tblVersion = getCatalogVersion(); // If setCatalogVersion() was called after the last time // the topic update thread checked the table. if (tblVersion > lastVersionSeenByTopicUpdate_) { // Bump catalogVersion_ if pending version is greater than previous update catalogVersion_.atomicMax(pendingVersionNumber_); } That would mean that only the topic thread needed to read/write pendingVersionNumber_, so there isn't the same race involving multiple versions. I don't know if there are any negative consequences from not updating https://stackoverflow.com/a/43210465/13270883 had some tips on doing an atomic max - looks like there are some clean solutions. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 7 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 23 Oct 2020 22:54:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7546/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 7 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 23 Oct 2020 22:20:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/16549/7/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/7/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1366 PS7, Line 1366: "Table {} (version={}, pendingVersion = {}) is skipping topic update ({}, {}] " line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/16549/7/tests/metadata/test_topic_update_frequency.py File tests/metadata/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/7/tests/metadata/test_topic_update_frequency.py@50 PS7, Line 50: l flake8: E501 line too long (146 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/7/tests/metadata/test_topic_update_frequency.py@109 PS7, Line 109: l flake8: E501 line too long (145 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/7/tests/metadata/test_topic_update_frequency.py@177 PS7, Line 177: f flake8: E126 continuation line over-indented for hanging indent -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 7 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 23 Oct 2020 21:59:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. IMPALA-6671: Skip locked tables from topic updates This change adds a mechanism for topic-update thread to skip a table which is locked for more than a configurable interval from the topic updates. This is especially useful in scenarios where long running operations on a locked table (refresh, recover partitions, compute stats) block the topic update thread. This causes unrelated queries which are waiting on metadata via topic updates (catalog-v1 mode) to unnecessarily block. The ideal solution of this problem would be to make HdfsTable immutable so that there is no need for table lock. But that is large change and not easily portable to older releases of Impala. It would be taken up as a separate patch. This change introduces 2 new configurations for catalogd: 1. topic_update_tbl_max_wait_time_ms: This defines the maximum time in msecs the topic update thread waits on a locked table before skipping the table from that iteration of topic updates. The default value is 500. 2. catalog_max_lock_skipped_topic_updates: This defines the maximum number of distinct lock operations which are skipped by topic update thread due to lock contention. Once this limit is reached, topic update thread will block until it acquires the table lock and adds it to the updates. Testing: 1. Added a test case which introduces a simulated delay in a few potentially long running statements. This causes the table to be locked for a long time. The topic update thread skips that table from updates and unrelated queries are unblocked since they receive the required metadata from updates. 2. Added a test where multiple threads run blocking statements in a loop to stress the table lock. It makes sure that topic update thread is not starved and eventually blocks on table lock by hitting the limit defined by catalog_max_lock_skipped_topic_updates. 2. [WIP] Ran exhaustive tests with default configurations. Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A tests/metadata/test_topic_update_frequency.py 9 files changed, 467 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16549/7 -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 7 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 7: (9 comments) http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1304 PS6, Line 1304: ce we don't have the table lock yet and the table could be changed during the : // execution below. > Maybe use a local variable - canSkipUpdate? The relationship between this c Done. Added changed the wording of the comment which hopefully makes it easier to understand. http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1339 PS6, Line 1339: the tabl > nit: hdfsTable Done http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1349 PS6, Line 1349:* @param ctx GetCatalogDeltaContext where the table is add if the lock is acquired. > Just to make sure I understand correctly, in the case when Thanks for the question, while explaining it you I found a bug with logic and updated the the patch. The main idea is here as follows: the hdfstable.getLastVersionSeenByTopicUpdate() returns the version of the table when topic update thread last inspected the table to evaluate if it needs to go in a topic update. If this is not the same as the table version, it means we have skipped table due to lock contention before and the table was updated before we came here again to add to the topic update. The reason we need to do this is to distinguish the case when a table is locked for long time and topic update thread visited it more than once during this time v/s topic thread thread visited the table and the table happened to be locked again by a different CatalogOperation. Without doing this we will hit the limit maxSkippedUpdatesLockContention too soon and the patch will not really be of much help. For example: Case 1: Time t1: table is locked for a refresh Time t2: topic update tried to add the table but skipped it due to lock contention. It increments the counter by 1 and sets this tblVersion in setLastVersionSeenByTopicUpdate() Time t3: topic update tried again. This time though, the table version has not changed yet since refresh is not complete. It skips the table without incrementing the counter. Time t4: topic update tries again. Same as above. Time t5: topic update tries again. Same as above. Time t6: refresh completes. Time t7: topic update tries now. lock is acquired and it adds the table to the update. Case 2: Time t1: table is locked for a refresh Time t2: topic update tried to add the table but skipped it due to lock contention. It increments the counter by 1 and sets this tblVersion in setLastVersionSeenByTopicUpdate() Time t3: topic update tried again. This time though, the table version has not changed yet since refresh is not complete. It skips the table without incrementing the counter. Time t4: topic update tries again. Same as above. Time t5: topic update tries again. Same as above. Time t6: refresh completes. Time t7: a second refresh is issued which gets the table lock. Time t8: topic update tries now. the table is locked and getLastVersionSeenByTopicUpdate() will not be same as tblVersion now and hence we increment the counter by 1 again. Time t9: refresh completes. Time t10: third refresh is issued which gets the table lock. time t11: topic update tries now. Now we detect the table has been skipped twice before and we block until we acquire the lock. Hope this helps. http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1394 PS6, Line 1394: } else { > I'm not currently quite grokking the need to have all the bits of this sync The main objective here is that if the topic update thread decides to skip the table, it must also make sure that a concurrent setCatalogVersion() on this table should set the version outside the current topic update window. Otherwise, if there is a race between the two, it is possible that a table modification gets leaked from the topic updates. I will try to clean this up more. If you have any suggestions let me know as well. I don't think we really need to update the pending version in CatalogOpExecutor but let me think some more about it. http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1397 PS6, Line 1397: > nit: at most Done http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1398 PS6, Line 1398: > nit: extra e
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 6: (12 comments) This is really what we need in catalog-v1! I still need to look into it deeper. http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@a456 PS6, Line 456: nit: keep the blank line http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@209 PS6, Line 209: maxSkippedUpdatesLockContention nit: maxSkippedUpdatesLockContention_ http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@210 PS6, Line 210: //v nit: needs a space and capitalizes 'v'. http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@211 PS6, Line 211: topicUpdateTblLockMaxWaitTimeMs nit: topicUpdateTblLockMaxWaitTimeMs_ http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@212 PS6, Line 212: //D nit: needs a space http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1309 PS6, Line 1309: we nit: "We". I think we always capitalize the first char of comments. It'd be neat to keep this convention. http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1315 PS6, Line 1315: else { : if nit: we can merge this else-if http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1372 PS6, Line 1372: tryLock is false. nit: The other reason is non hdfs tables are updated quickly so won't suffer the problem http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2682 PS6, Line 2682: LOG.debug(String : .format("%s the pending catalog version of table %s from %s to %s%s.", : (success ? "Bumped up" : "Attempt to bump up"), getFullName(), : expectedPendingVersion, newCatalogVersion, success ? "" : " failed")); nit: Don't need String.format LOG.debug("{} the pending catalog version of table {} from {} to {}{}.", (success ? "Bumped up" : "Attempt to bump up"), getFullName(), expectedPendingVersion, newCatalogVersion, success ? "" : " failed"); http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java File fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java: http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java@67 PS6, Line 67: number ni: Number http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@665 PS6, Line 665: newCatalogVersion = catalog_.updateTblPendingVersion((HdfsTable) tbl); Could you explain what error can happen if we don't set the pending version here? Can we just call catalog_.incrementAndGetCatalogVersion() to get the newCatalogVersion? Let's say the last or currently on-going catalog topic update is in version range (fromVersion, toVersion]. IIUC, the invariant we need is max(newCatalogVersion, pendingVersion) > toVersion When we finish updaing the table, we set its catalog version to max(newCatalogVersion, pendingVersion). It will fit into the version range of the next catalog update since it's larger than toVersion. Here we get a large enough 'newCatalogVersion' to keep this invariant. If the catalog update thread is collecting a version range with toVersion > our newCatalogVersion, it's its duty to maintain this invariant. So only the catalog update thread is required to update the pendingVersion. Then we won't have contentions on updaing the pendingVersion. If I'm wrong, then I think we should do the same (i.e. call catalog_.updateTblPendingVersion()) for the REFRESH code path in CatalogServiceCatalog#reloadTable() as well: https://github.com/apache/impala/blob/227e43f48147c8725100ddc05521bae07ee9becd/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2304
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 6: (6 comments) I was able to do a pass over it, but didn't quite understand the stuff with pending versions yet and my brain is running out of steam tonight. Maybe you can help me understand it and I can take another look in the morning? http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1304 PS6, Line 1304: topicUpdateEntry.getNumSkippedUpdatesLockContention() : == maxSkippedUpdatesLockContention) Maybe use a local variable - canSkipUpdate? The relationship between this condition and topicUpdateEntry.getNumSkippedUpdatesLockContention() < maxSkippedUpdatesLockContention is a little hard to pick up otherwise. http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1339 PS6, Line 1339: hdfstable nit: hdfsTable http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1349 PS6, Line 1349: if (hdfstable.getLastVersionSeenByTopicUpdate() != topicUpdateEntry Just to make sure I understand correctly, in the case when hdfstable.getLastVersionSeenByTopicUpdate() == topicUpdateEntry .getLastSentVersion() then we don't need to update topicUpdateLog() because we don't need to update the counter of skipped updates and the previous entry still has all the correct data? http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1394 PS6, Line 1394: // Currently, there could be only 2 threads which can concurrently update the I'm not currently quite grokking the need to have all the bits of this synchronization scheme and it would be nice to avoid the retry loop so there's no failure mode to think about (I also need to check that the CatalogException is handled cleanly). My understanding is that we're bumping the pending version when it skips an update so that when the long-running operation finishes and setCatalogVersion is called, the version will be high enough to be picked up by a later topic update. I didn't understand why CatalogOpExecutor needs to update the pending version when it's already allocating and updating a catalog version. http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1397 PS6, Line 1397: atmost nit: at most http://gerrit.cloudera.org:8080/#/c/16549/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1398 PS6, Line 1398: e nit: extra e -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 6 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 22 Oct 2020 05:57:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7513/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 22 Oct 2020 00:44:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/16549/5/tests/metadata/test_topic_update_frequency.py File tests/metadata/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/5/tests/metadata/test_topic_update_frequency.py@50 PS5, Line 50: l flake8: E501 line too long (146 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/5/tests/metadata/test_topic_update_frequency.py@109 PS5, Line 109: l flake8: E501 line too long (145 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/5/tests/metadata/test_topic_update_frequency.py@177 PS5, Line 177: f flake8: E126 continuation line over-indented for hanging indent -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 22 Oct 2020 00:24:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. IMPALA-6671: Skip locked tables from topic updates This change adds a mechanism for topic-update thread to skip a table which is locked for more than a configurable interval from the topic updates. This is especially useful in scenarios where long running operations on a locked table (refresh, recover partitions, compute stats) block the topic update thread. This causes unrelated queries which are waiting on metadata via topic updates (catalog-v1 mode) to unnecessarily block. The ideal solution of this problem would be to make HdfsTable immutable so that there is no need for table lock. But that is large change and not easily portable to older releases of Impala. It would be taken up as a separate patch. This change introduces 2 new configurations for catalogd: 1. topic_update_tbl_max_wait_time_ms: This defines the maximum time in msecs the topic update thread waits on a locked table before skipping the table from that iteration of topic updates. The default value is 500. 2. catalog_max_lock_skipped_topic_updates: This defines the maximum number of distinct lock operations which are skipped by topic update thread due to lock contention. Once this limit is reached, topic update thread will block until it acquires the table lock and adds it to the updates. Testing: 1. Added a test case which introduces a simulated delay in a few potentially long running statements. This causes the table to be locked for a long time. The topic update thread skips that table from updates and unrelated queries are unblocked since they receive the required metadata from updates. 2. Added a test where multiple threads run blocking statements in a loop to stress the table lock. It makes sure that topic update thread is not starved and eventually blocks on table lock by hitting the limit defined by catalog_max_lock_skipped_topic_updates. 2.Ran exhaustive tests with default configurations. Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java A tests/metadata/test_topic_update_frequency.py 10 files changed, 458 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16549/5 -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 4: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/7501/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 20 Oct 2020 21:55:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 4: (7 comments) http://gerrit.cloudera.org:8080/#/c/16549/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1358 PS4, Line 1358: "Unable to update the pending version of the table " + tbl.getFullName(), ex); line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/16549/4/tests/metadata/test_topic_update_frequency.py File tests/metadata/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/4/tests/metadata/test_topic_update_frequency.py@15 PS4, Line 15: import threading flake8: F401 'threading' imported but unused http://gerrit.cloudera.org:8080/#/c/16549/4/tests/metadata/test_topic_update_frequency.py@18 PS4, Line 18: from time import sleep flake8: F401 'time.sleep' imported but unused http://gerrit.cloudera.org:8080/#/c/16549/4/tests/metadata/test_topic_update_frequency.py@20 PS4, Line 20: from tests.beeswax.impala_beeswax import ImpalaBeeswaxException flake8: F401 'tests.beeswax.impala_beeswax.ImpalaBeeswaxException' imported but unused http://gerrit.cloudera.org:8080/#/c/16549/4/tests/metadata/test_topic_update_frequency.py@54 PS4, Line 54: l flake8: E501 line too long (146 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/4/tests/metadata/test_topic_update_frequency.py@113 PS4, Line 113: l flake8: E501 line too long (145 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/4/tests/metadata/test_topic_update_frequency.py@181 PS4, Line 181: f flake8: E126 continuation line over-indented for hanging indent -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 20 Oct 2020 21:41:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. IMPALA-6671: Skip locked tables from topic updates This change adds a mechanism for topic-update thread to skip a table which is locked for more than a configurable interval from the topic updates. This is especially useful in scenarios where long running operations on a locked table (refresh, recover partitions, compute stats) block the topic update thread. This causes unrelated queries which are waiting on metadata via topic updates (catalog-v1 mode) to unnecessarily block. The ideal solution of this problem would be to make HdfsTable immutable so that there is no need for table lock. But that is large change and not easily portable to older releases of Impala. It would be taken up as a separate patch. This change introduces 2 new configurations for catalogd: 1. topic_update_tbl_max_wait_time_ms: This defines the maximum time in msecs the topic update thread waits on a locked table before skipping the table from that iteration of topic updates. The default value is 500. 2. catalog_max_lock_skipped_topic_updates: This defines the maximum number of distinct lock operations which are skipped by topic update thread due to lock contention. Once this limit is reached, topic update thread will block until it acquires the table lock and adds it to the updates. Testing: 1. Added a test case which introduces a simulated delay in a few potentially long running statements. This causes the table to be locked for a long time. The topic update thread skips that table from updates and unrelated queries are unblocked since they receive the required metadata from updates. 2. Added a test where multiple threads run blocking statements in a loop to stress the table lock. It makes sure that topic update thread is not starved and eventually blocks on table lock by hitting the limit defined by catalog_max_lock_skipped_topic_updates. 2.Ran exhaustive tests with default configurations. Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java A tests/metadata/test_topic_update_frequency.py 10 files changed, 461 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16549/4 -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 3: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/7499/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 20 Oct 2020 21:37:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/16549/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1358 PS3, Line 1358: "Unable to update the pending version of the table " + tbl.getFullName(), ex); line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/16549/3/tests/metadata/test_topic_update_frequency.py File tests/metadata/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/3/tests/metadata/test_topic_update_frequency.py@15 PS3, Line 15: import threading flake8: F401 'threading' imported but unused http://gerrit.cloudera.org:8080/#/c/16549/3/tests/metadata/test_topic_update_frequency.py@18 PS3, Line 18: from time import sleep flake8: F401 'time.sleep' imported but unused http://gerrit.cloudera.org:8080/#/c/16549/3/tests/metadata/test_topic_update_frequency.py@20 PS3, Line 20: from tests.beeswax.impala_beeswax import ImpalaBeeswaxException flake8: F401 'tests.beeswax.impala_beeswax.ImpalaBeeswaxException' imported but unused http://gerrit.cloudera.org:8080/#/c/16549/3/tests/metadata/test_topic_update_frequency.py@54 PS3, Line 54: l flake8: E501 line too long (146 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/3/tests/metadata/test_topic_update_frequency.py@113 PS3, Line 113: l flake8: E501 line too long (145 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/3/tests/metadata/test_topic_update_frequency.py@181 PS3, Line 181: f flake8: E126 continuation line over-indented for hanging indent -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 20 Oct 2020 21:22:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. IMPALA-6671: Skip locked tables from topic updates This change adds a mechanism for topic-update thread to skip a table which is locked for more than a configurable interval from the topic updates. This is especially useful in scenarios where long running operations on a locked table (refresh, recover partitions, compute stats) block the topic update thread. This causes unrelated queries which are waiting on metadata via topic updates (catalog-v1 mode) to unnecessarily block. The ideal solution of this problem would be to make HdfsTable immutable so that there is no need for table lock. But that is large change and not easily portable to older releases of Impala. It would be taken up as a separate patch. This change introduces 2 new configurations for catalogd: 1. topic_update_tbl_max_wait_time_ms: This defines the maximum time in msecs the topic update thread waits on a locked table before skipping the table from that iteration of topic updates. The default value is 500. 2. catalog_max_lock_skipped_topic_updates: This defines the maximum number of distinct lock operations which are skipped by topic update thread due to lock contention. Once this limit is reached, topic update thread will block until it acquires the table lock and adds it to the updates. Testing: 1. Added a test case which introduces a simulated delay in a few potentially long running statements. This causes the table to be locked for a long time. The topic update thread skips that table from updates and unrelated queries are unblocked since they receive the required metadata from updates. 2. Added a test where multiple threads run blocking statements in a loop to stress the table lock. It makes sure that topic update thread is not starved and eventually blocks on table lock by hitting the limit defined by catalog_max_lock_skipped_topic_updates. 2.Ran exhaustive tests with default configurations. Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java A tests/metadata/test_topic_update_frequency.py 10 files changed, 461 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16549/3 -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 2: > Build Failed > > https://jenkins.impala.io/job/gerrit-code-review-checks/7466/ : > Initial code review checks failed. See linked job for details on > the failure. The build failure is due to : 22:08:13 [INFO] BUILD FAILURE 22:08:13 [ERROR] Plugin net.sourceforge.czt.dev:cup-maven-plugin:1.6-cdh or one of its dependencies could not be resolved: Could not find artifact net.sourceforge.czt.dev:cup-maven-plugin:jar:1.6-cdh in cloudera.thirdparty.repo (https://repository.cloudera.com/content/repositories/third-party) -> [Help 1] 22:08:13 [ERROR] 22:08:13 [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. 22:08:13 [ERROR] Re-run Maven using the -X switch to enable full debug logging. 22:08:13 [ERROR] 22:08:13 [ERROR] For more information about the errors and possible solutions, please read the following articles: 22:08:13 [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException which seems unrelated. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 19 Oct 2020 18:36:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 2: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/7466/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 15 Oct 2020 22:08:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/16549/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1354 PS2, Line 1354: "Unable to update the pending version of the table " + tbl.getFullName(), ex); line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/16549/2/tests/metadata/test_topic_update_frequency.py File tests/metadata/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/2/tests/metadata/test_topic_update_frequency.py@15 PS2, Line 15: import threading flake8: F401 'threading' imported but unused http://gerrit.cloudera.org:8080/#/c/16549/2/tests/metadata/test_topic_update_frequency.py@18 PS2, Line 18: from time import sleep flake8: F401 'time.sleep' imported but unused http://gerrit.cloudera.org:8080/#/c/16549/2/tests/metadata/test_topic_update_frequency.py@20 PS2, Line 20: from tests.beeswax.impala_beeswax import ImpalaBeeswaxException flake8: F401 'tests.beeswax.impala_beeswax.ImpalaBeeswaxException' imported but unused http://gerrit.cloudera.org:8080/#/c/16549/2/tests/metadata/test_topic_update_frequency.py@53 PS2, Line 53: l flake8: E501 line too long (145 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/2/tests/metadata/test_topic_update_frequency.py@112 PS2, Line 112: l flake8: E501 line too long (145 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/2/tests/metadata/test_topic_update_frequency.py@177 PS2, Line 177: f flake8: E126 continuation line over-indented for hanging indent -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 15 Oct 2020 21:55:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 1: (10 comments) http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG@9 PS1, Line 9: This change adds a mechanism for topic-update thread > question, that maybe will be answered as I read the patch - if an update is if an update on a table is missed, the topic-update thread picks up the table in the next scheduled run. However, it is possible that for a table which is updated very frequently, we might get unlucky and topic update thread keeps skipping it every topic-update run. To handle this scenario we keep track of last version of the table when topic update thread skipped it. In the next iteration is the table version is different that the last version we skipped, it means that by the time topci update thread came back, the table was locked again. We count such instances and tolerate upto 2 times by default. Once this happens 2 times, topic update thread blocks until it adds the table to prevent starvation. Note that this is different that naively counting the number of topic update runs when we skip the table. It is possible that a long running table lock spans multiple topic update runs and hence we should only count such instances as 1 to be effective. http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG@23 PS1, Line 23: This solution unblocks topic-updates for a while until a > Maybe this is the best we can do at the moment, but it does seem to have si I believe the solution which I described above is better than IMPALA-5058 since we distinguish between a single lock spanning multiple topic updates v/s multiple lock operations which block the topic update thread. Another improvement over IMPALA-5058 is that this limit is configurable so that we have some options to tune this based on what we see on production instances. http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1294 PS1, Line 1294: true > A freeform boolean is hard to read, best to add an inline comment , true /* Done http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1316 PS1, Line 1316: topicUpdateTblLockMaxWaitTimeMs > I could be misunderstanding the code, but isn't 7200 seconds too long to wa yeah, I kept topicUpdateTblLockMaxWaitTimeMs to 7200 seconds to keep the existing behavior unless it is overridden. This way for users where this is not a real problem, this patch essentially would be a no-op. But if we think there are more users who will run into this problem than the ones who will not, perhaps it is better to set it to lower. I think we can definitely set this value to say to 60 seconds since we have see for really bad cases the lock runs for several minutes. I have set a aggressive default value of 500ms to make sure we hit this code path in the exhaustive runs but I am open to suggestions about the default value. http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1325 PS1, Line 1325: return; > I hope the updateTblPendingVersion() rarely exceeds the number of attempts, I would be really surprised if we this needs more than 10 attempts since there are only 2 threads (one doing alter and other getting catalog delta for topic updates) at any point of time for a given table who will attempt to update the pending version. Let me think of a way to handle this better. http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1328 PS1, Line 1328: "Unable to update the pending version of the table " + tbl.getFullName(), ex); > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1352 PS1, Line 1352: if (attempt >= Table.NUMBER_OF_ATTEMPTS_TO_UPDATE_VERSION) { : throw new CatalogException("Exhausted max number of attempts " : + Table.NUMBER_OF_ATTEMPTS_TO_UPDATE_VERSION : + " to update the pending table version"); : } > Wouldn't we want to try something more graceful in this case. How would the According to me, it would be highly unlikely for this to happen. Other than bumping up the attempts dynamically in the code or putting this whole logic under a flag which was can turned off, I cannot think of a better way currently. If you
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. IMPALA-6671: Skip locked tables from topic updates This change adds a mechanism for topic-update thread to skip a table which is locked for more than a configurable interval from the topic updates. This is especially useful in scenarios where long running operations on a locked table (refresh, recover partitions, compute stats) block the topic update thread. This causes unrelated queries which are waiting on metadata via topic updates (catalog-v1 mode) to unnecessarily block. The ideal solution of this problem would be to make HdfsTable immutable so that there is no need for table lock. But that is large change and not easily portable to older releases of Impala. It would be taken up as a separate patch. This change introduces 2 new configurations for catalogd: 1. topic_update_tbl_max_wait_time_ms: This defines the maximum time in msecs the topic update thread waits on a locked table before skipping the table from that iteration of topic updates. The default value is 500. 2. catalog_max_lock_skipped_topic_updates: This defines the maximum number of distinct lock operations which are skipped by topic update thread due to lock contention. Once this limit is reached, topic update thread will block until it acquires the table lock and adds it to the updates. Testing: 1. Added a test case which introduces a simulated delay in a few potentially long running statements. This causes the table to be locked for a long time. The topic update thread skips that table from updates and unrelated queries are unblocked since they receive the required metadata from updates. 2. Added a test where multiple threads run blocking statements in a loop to stress the table lock. It makes sure that topic update thread is not starved and eventually blocks on table lock by hitting the limit defined by catalog_max_lock_skipped_topic_updates. 2. [WIP] Running exhaustive tests with default configurations. Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java A tests/metadata/test_topic_update_frequency.py 10 files changed, 453 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16549/2 -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar