[Impala-ASF-CR] IMPALA-5058: Improve the concurrency of DDL/DML operations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8752 ) Change subject: IMPALA-5058: Improve the concurrency of DDL/DML operations .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8752 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If12467a83acaeca6a127491d89291dedba91a35a Gerrit-Change-Number: 8752 Gerrit-PatchSet: 4 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Tue, 16 Jan 2018 23:01:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5058: Improve the concurrency of DDL/DML operations
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8752 ) Change subject: IMPALA-5058: Improve the concurrency of DDL/DML operations .. IMPALA-5058: Improve the concurrency of DDL/DML operations Problem: A long running table metadata operation (e.g. refresh) could prevent any other metadata operation from making progress if it coincided with the catalog topic creation operations. The problem was due to the conservative locking scheme used when catalog topics were created. In particular, in order to collect a consistent snapshot of metadata changes, the global catalog lock was held for the entire duration of that operation. Solution: To improve the concurrency of catalog operations the following changes are performed: * A range of catalog versions determines the catalog changes to be included in a catalog update. Any catalog changes that do not fall in the specified range are ignored (to be processed in subsequent catalog topic updates). * The catalog allows metadata operations to make progress while collecting catalog updates. * To prevent starvation of catalog updates (i.e. frequently updated catalog objects skipping catalog updates indefinitely), we keep track of the number of times a catalog object has skipped an update and if that number exceeds a threshold it is included in the next catalog topic update even if its version is not in the specified topic update version range. Hence, the same catalog object may be sent in two consecutive catalog topic updates. This commit also changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. In particular: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core and exhaustive tests. Change-Id: If12467a83acaeca6a127491d89291dedba91a35a Reviewed-on: http://gerrit.cloudera.org:8080/7731 Reviewed-by: Dimitris TsirogiannisTested-by: Impala Public Jenkins Reviewed-on: http://gerrit.cloudera.org:8080/8752 --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalog-util.cc M be/src/catalog/catalog-util.h M be/src/catalog/catalog.cc M be/src/catalog/catalog.h M be/src/exec/catalog-op-executor.cc M be/src/exec/catalog-op-executor.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/client-request-state.cc M be/src/service/frontend.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/CatalogService.thrift M common/thrift/Frontend.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java A fe/src/main/java/org/apache/impala/catalog/CatalogObjectImpl.java A fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionQueue.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSource.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/catalog/HdfsCachePool.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/Role.java M fe/src/main/java/org/apache/impala/catalog/RolePrivilege.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M
[Impala-ASF-CR] IMPALA-5058: Improve the concurrency of DDL/DML operations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8752 ) Change subject: IMPALA-5058: Improve the concurrency of DDL/DML operations .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1730/ -- To view, visit http://gerrit.cloudera.org:8080/8752 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If12467a83acaeca6a127491d89291dedba91a35a Gerrit-Change-Number: 8752 Gerrit-PatchSet: 4 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Tue, 16 Jan 2018 18:57:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5058: Improve the concurrency of DDL/DML operations
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8752 ) Change subject: IMPALA-5058: Improve the concurrency of DDL/DML operations .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8752 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If12467a83acaeca6a127491d89291dedba91a35a Gerrit-Change-Number: 8752 Gerrit-PatchSet: 4 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Tue, 16 Jan 2018 18:56:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5058: Improve the concurrency of DDL/DML operations
Hello Bharath Vissapragada, Tianyi Wang, Alex Behm, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8752 to look at the new patch set (#3). Change subject: IMPALA-5058: Improve the concurrency of DDL/DML operations .. IMPALA-5058: Improve the concurrency of DDL/DML operations Problem: A long running table metadata operation (e.g. refresh) could prevent any other metadata operation from making progress if it coincided with the catalog topic creation operations. The problem was due to the conservative locking scheme used when catalog topics were created. In particular, in order to collect a consistent snapshot of metadata changes, the global catalog lock was held for the entire duration of that operation. Solution: To improve the concurrency of catalog operations the following changes are performed: * A range of catalog versions determines the catalog changes to be included in a catalog update. Any catalog changes that do not fall in the specified range are ignored (to be processed in subsequent catalog topic updates). * The catalog allows metadata operations to make progress while collecting catalog updates. * To prevent starvation of catalog updates (i.e. frequently updated catalog objects skipping catalog updates indefinitely), we keep track of the number of times a catalog object has skipped an update and if that number exceeds a threshold it is included in the next catalog topic update even if its version is not in the specified topic update version range. Hence, the same catalog object may be sent in two consecutive catalog topic updates. This commit also changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. In particular: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core and exhaustive tests. Change-Id: If12467a83acaeca6a127491d89291dedba91a35a Reviewed-on: http://gerrit.cloudera.org:8080/7731 Reviewed-by: Dimitris TsirogiannisTested-by: Impala Public Jenkins --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalog-util.cc M be/src/catalog/catalog-util.h M be/src/catalog/catalog.cc M be/src/catalog/catalog.h M be/src/exec/catalog-op-executor.cc M be/src/exec/catalog-op-executor.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/client-request-state.cc M be/src/service/frontend.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/CatalogService.thrift M common/thrift/Frontend.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java A fe/src/main/java/org/apache/impala/catalog/CatalogObjectImpl.java A fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionQueue.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSource.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/catalog/HdfsCachePool.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/Role.java M fe/src/main/java/org/apache/impala/catalog/RolePrivilege.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M
[Impala-ASF-CR] IMPALA-5058: Improve the concurrency of DDL/DML operations
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8752 ) Change subject: IMPALA-5058: Improve the concurrency of DDL/DML operations .. Patch Set 2: Code-Review+2 (14 comments) Thanks for the reviews. Waiting on the results from the next concurrency testing from Mostafa and if everything looks good, I'll merge the change. http://gerrit.cloudera.org:8080/#/c/8752/2/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/8752/2/be/src/catalog/catalog-server.cc@331 PS2, Line 331: VLOG(1) << "Publishing " << (topic_deletions ? "deletion " : "update ") > This could probably be verbose if we print for every catalog_object? I agr Yes that was intentional. The rational is the following. We already log the catalog objects sent in the catalog so it makes sense to me to log the received items at the impalads. I believe it can help debug some nasty issues (missing updates, etc). For now I suggest we leave it as is. http://gerrit.cloudera.org:8080/#/c/8752/2/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8752/2/be/src/service/impala-server.h@339 PS2, Line 339: // Wait until the minimum catalog object version in the local cache exceeds > /// style Done http://gerrit.cloudera.org:8080/#/c/8752/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8752/2/be/src/service/impala-server.cc@1474 PS2, Line 1474: VLOG_QUERY << "Done waiting for catalog version: " << catalog_update_version; > nit: We should probably not log this if we detect a catalog_service_id chan Good point. Done http://gerrit.cloudera.org:8080/#/c/8752/2/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/8752/2/common/thrift/CatalogService.thrift@57 PS2, Line 57: / True if this is a result of an INVALIDATE METADATA operation. : 4: required bool is_invalidate > Just curious, wouldn't we know this information from the ClientRequestState Unfortunately, that's not always the case so for these cases, the simplest thing to do was to add that information as part of the catalog response. http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java File fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java: http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java@89 PS2, Line 89:CatalogObjectVersionQueue.INSTANCE.removeVersion( : existingRole.getCatalogVersion()); > Wouldn't removeRole() -> roleCache_.remove(roleName) already do this? You're right. It's not needed here. Removed. http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionQueue.java File fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionQueue.java: http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionQueue.java@30 PS2, Line 30: * versions. Not thread-safe. > nit: I think it helps to define the use case of this class here. (like desc Done http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@93 PS2, Line 93: * Periodically, the CatalogServiceCatalog collects a delta of catalog updates (based on a > Thanks! This is really clear to me now. Done http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@95 PS2, Line 95: * Each catalog topic update is defined by a range of catalog versions [from, to) and the > (from, to] Done http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@96 PS2, Line 96: * CatalogServiceCatalog guarantees that very catalog object that has a version in the > typo: every Done http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@585 PS2, Line 585: = > nit: maybe make >= ? just to be on the safer side. Done http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@628 PS2, Line 628: if (LOG.isDebugEnabled()) { : LOG.debug(String.format("Error calling toThrift() on table %s: %s", : tbl.getFullName(), e.getMessage()), e); : } : return; > I think that is a valid LOG.error(), given we are missing an update of a ve Done
[Impala-ASF-CR] IMPALA-5058: Improve the concurrency of DDL/DML operations
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8752 ) Change subject: IMPALA-5058: Improve the concurrency of DDL/DML operations .. Patch Set 2: Code-Review+1 (10 comments) I've spent some time on this patch. The logic and code structure LGTM. I agree with Alex that we should get this committed and then keep making improvements. Given the number of edge cases around SYNC_DDL / invalidate/ statestore broadcast failures we should probably run a stress test too with some injected failures in those paths. http://gerrit.cloudera.org:8080/#/c/8752/2/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/8752/2/be/src/catalog/catalog-server.cc@331 PS2, Line 331: VLOG(1) << "Publishing " << (topic_deletions ? "deletion " : "update ") This could probably be verbose if we print for every catalog_object? I agree it helps to debug in builds, but just wanted to confirm if that was intentional. (Since we have dynamic log levels, we can move it to VLOG(3) as well if you think that is ok). http://gerrit.cloudera.org:8080/#/c/8752/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8752/2/be/src/service/impala-server.cc@1474 PS2, Line 1474: VLOG_QUERY << "Done waiting for catalog version: " << catalog_update_version; nit: We should probably not log this if we detect a catalog_service_id change and indicate in the logs that it had been restarted or something? (same in other cases too) http://gerrit.cloudera.org:8080/#/c/8752/2/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/8752/2/common/thrift/CatalogService.thrift@57 PS2, Line 57: / True if this is a result of an INVALIDATE METADATA operation. : 4: required bool is_invalidate Just curious, wouldn't we know this information from the ClientRequestState calling ProcessCatalogUpdateResult()? Wondering why the Catalog server has to send this back to the coordinator. http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java File fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java: http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java@89 PS2, Line 89:CatalogObjectVersionQueue.INSTANCE.removeVersion( : existingRole.getCatalogVersion()); Wouldn't removeRole() -> roleCache_.remove(roleName) already do this? Also curious, why this is being done for a very specific case? http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionQueue.java File fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionQueue.java: http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionQueue.java@30 PS2, Line 30: * versions. Not thread-safe. nit: I think it helps to define the use case of this class here. (like described in CatalogDeltaLog) http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@585 PS2, Line 585: = nit: maybe make >= ? just to be on the safer side. http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@628 PS2, Line 628: if (LOG.isDebugEnabled()) { : LOG.debug(String.format("Error calling toThrift() on table %s: %s", : tbl.getFullName(), e.getMessage()), e); : } : return; I think that is a valid LOG.error(), given we are missing an update of a version. http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1304 PS2, Line 1304: versionLock_.writeLock().lock(); qq: I see incrementAndGetCatalogVersion() already takes versionLock_. Whats the reason behind wrapping these calls (at multiple places) with this lock again? Is that maintain some additional consistency across the duration of the calls? http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1868 PS2, Line 1868: LOG.info("Operation using SYNC_DDL is waiting for catalog topic version: " + Wondering if we can make it easy someway to map this log to the corresponding DDL (supportability) http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java:
[Impala-ASF-CR] IMPALA-5058: Improve the concurrency of DDL/DML operations
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8752 ) Change subject: IMPALA-5058: Improve the concurrency of DDL/DML operations .. Patch Set 2: Code-Review+1 (5 comments) I'm pretty happy with this change. I'm sure we can keep making improvements but I think it would be good to get this change committed to let it bake. That should not stop us from considering additional improvements to the structure/comments. I encourage reviewers to keep posting improvement ideas. http://gerrit.cloudera.org:8080/#/c/8752/2/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8752/2/be/src/service/impala-server.h@339 PS2, Line 339: // Wait until the minimum catalog object version in the local cache exceeds /// style in the local cache is greater than or equal to 'min_catalog_update_version' http://gerrit.cloudera.org:8080/#/c/8752/1/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: http://gerrit.cloudera.org:8080/#/c/8752/1/common/thrift/StatestoreService.thrift@88 PS1, Line 88: // non-delta TTopicDelta's since the latest version of every still-present topic item > No, an optimization was added at some point to avoid sending deletions for So then should the second sentence start with "When true..."? http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@93 PS2, Line 93: * Periodically, the CatalogServiceCatalog collects a delta of catalog updates (based on a Thanks! This is really clear to me now. http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@95 PS2, Line 95: * Each catalog topic update is defined by a range of catalog versions [from, to) and the (from, to] http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@96 PS2, Line 96: * CatalogServiceCatalog guarantees that very catalog object that has a version in the typo: every -- To view, visit http://gerrit.cloudera.org:8080/8752 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If12467a83acaeca6a127491d89291dedba91a35a Gerrit-Change-Number: 8752 Gerrit-PatchSet: 2 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Fri, 15 Dec 2017 23:56:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5058: Improve the concurrency of DDL/DML operations
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8752 ) Change subject: IMPALA-5058: Improve the concurrency of DDL/DML operations .. Patch Set 1: (27 comments) http://gerrit.cloudera.org:8080/#/c/8752/1/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8752/1/be/src/service/impala-server.h@328 PS1, Line 328: /// waits all other catalog topic subscribers to process this update by checking the > waits for Done http://gerrit.cloudera.org:8080/#/c/8752/1/be/src/service/impala-server.h@339 PS1, Line 339: void WaitForMinCatalogUpdate(const int64_t min_catalog_update_version, > Add comment Done http://gerrit.cloudera.org:8080/#/c/8752/1/be/src/service/impala-server.h@344 PS1, Line 344: void WaitForCatalogUpdateTopicPropagation(const TUniqueId& catalog_service_id); > Let's pass in the version to keep these Wait() APIs consistent. The version comes from the catalog_update_info_ which should be protected under the catalog_version_lock_. If we pass it as a param we'll have to move the lock outside that function which is not ideal. http://gerrit.cloudera.org:8080/#/c/8752/1/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/8752/1/common/thrift/CatalogInternalService.thrift@39 PS1, Line 39: // List of updated (new and modified) catalog objects for which the catalog verion is > catalog objects whose version is larger than ... Done http://gerrit.cloudera.org:8080/#/c/8752/1/common/thrift/CatalogInternalService.thrift@43 PS1, Line 43: // List of deleted catalog objects for which the catalog version is larger than > catalog objects whose version is larger than ... Done http://gerrit.cloudera.org:8080/#/c/8752/1/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: http://gerrit.cloudera.org:8080/#/c/8752/1/common/thrift/StatestoreService.thrift@83 PS1, Line 83: // subscribers need additional information in order to process the deleted topics that > This is needed when subscribers require additional information not captured Done http://gerrit.cloudera.org:8080/#/c/8752/1/common/thrift/StatestoreService.thrift@88 PS1, Line 88: // non-delta TTopicDelta's (since the latest version of every still-present topic will > * Remove parenthesis since the explanation is important No, an optimization was added at some point to avoid sending deletions for the case non-delta updates. It's "still-present topic item". http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java File fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java: http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java@91 PS1, Line 91: for (RolePrivilege priv: existingRole.getPrivileges()) { > might simplify some places to have bulk versions of the add/remove calls li Done http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java File fe/src/main/java/org/apache/impala/catalog/Catalog.java: http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@544 PS1, Line 544: throws IllegalStateException { > remove throws since it's a RuntimeException Done http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@577 PS1, Line 577:* TODO: Use global object IDs everywhere instead of tracking catalog objects by > Seems like a pretty big TODO unlikely to get addressed. I suggest removing Done http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@580 PS1, Line 580: public static boolean objectNamesMatch(TCatalogObject first, TCatalogObject second) { > keyEquals() or similar since we are actually checking for key equality and Done http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java: http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java@37 PS1, Line 37: * direct ("fast") update that applies the result of a catalog operation to the cache > remove ("fast") Done http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java@53 PS1, Line 53: * The catalogd uses this log to identify deleted catalog objects. Deleted > ... to identify objects that have been deleted since the last catalog topic Done http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java@55 PS1, Line 55: * them (e.g. dropTable()). While constructing a catalog update topic, we use the log to > Sentence in the
[Impala-ASF-CR] IMPALA-5058: Improve the concurrency of DDL/DML operations
Hello Tianyi Wang, Alex Behm, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8752 to look at the new patch set (#2). Change subject: IMPALA-5058: Improve the concurrency of DDL/DML operations .. IMPALA-5058: Improve the concurrency of DDL/DML operations Problem: A long running table metadata operation (e.g. refresh) could prevent any other metadata operation from making progress if it coincided with the catalog topic creation operations. The problem was due to the conservative locking scheme used when catalog topics were created. In particular, in order to collect a consistent snapshot of metadata changes, the global catalog lock was held for the entire duration of that operation. Solution: To improve the concurrency of catalog operations the following changes are performed: * A range of catalog versions determines the catalog changes to be included in a catalog update. Any catalog changes that do not fall in the specified range are ignored (to be processed in subsequent catalog topic updates). * The catalog allows metadata operations to make progress while collecting catalog updates. * To prevent starvation of catalog updates (i.e. frequently updated catalog objects skipping catalog updates indefinitely), we keep track of the number of times a catalog object has skipped an update and if that number exceeds a threshold it is included in the next catalog topic update even if its version is not in the specified topic update version range. Hence, the same catalog object may be sent in two consecutive catalog topic updates. This commit also changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. In particular: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core and exhaustive tests. Change-Id: If12467a83acaeca6a127491d89291dedba91a35a Reviewed-on: http://gerrit.cloudera.org:8080/7731 Reviewed-by: Dimitris TsirogiannisTested-by: Impala Public Jenkins --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalog-util.cc M be/src/catalog/catalog-util.h M be/src/catalog/catalog.cc M be/src/catalog/catalog.h M be/src/exec/catalog-op-executor.cc M be/src/exec/catalog-op-executor.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/client-request-state.cc M be/src/service/frontend.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/CatalogService.thrift M common/thrift/Frontend.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java A fe/src/main/java/org/apache/impala/catalog/CatalogObjectImpl.java A fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionQueue.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSource.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/catalog/HdfsCachePool.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/Role.java M fe/src/main/java/org/apache/impala/catalog/RolePrivilege.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M
[Impala-ASF-CR] IMPALA-5058: Improve the concurrency of DDL/DML operations
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8752 ) Change subject: IMPALA-5058: Improve the concurrency of DDL/DML operations .. Patch Set 1: (2 comments) metadata/test_metadata_query_statements.py::TestMetadataQueryStatements::test_show_data_sources doesn't pass on my machine. See the inline comments: http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java File fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java: http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java@75 PS1, Line 75: CatalogObjectVersionQueue.INSTANCE.addCatalogObjectVersion( catalogObject.getCatalogVersion() is used here http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1467 PS1, Line 1467: dataSource.setCatalogVersion(incrementAndGetCatalogVersion()); Here add() is called before setCatalogVersion(), so the version could be uninitialized in add(). Is this correct behavior? -- To view, visit http://gerrit.cloudera.org:8080/8752 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If12467a83acaeca6a127491d89291dedba91a35a Gerrit-Change-Number: 8752 Gerrit-PatchSet: 1 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Fri, 08 Dec 2017 01:48:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5058: Improve the concurrency of DDL/DML operations
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8752 ) Change subject: IMPALA-5058: Improve the concurrency of DDL/DML operations .. Patch Set 1: (25 comments) Here's a first wave of comments. Still thinking about some places in more detail. http://gerrit.cloudera.org:8080/#/c/8752/1/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8752/1/be/src/service/impala-server.h@328 PS1, Line 328: /// waits all other catalog topic subscribers to process this update by checking the waits for http://gerrit.cloudera.org:8080/#/c/8752/1/be/src/service/impala-server.h@339 PS1, Line 339: void WaitForMinCatalogUpdate(const int64_t min_catalog_update_version, Add comment http://gerrit.cloudera.org:8080/#/c/8752/1/be/src/service/impala-server.h@344 PS1, Line 344: void WaitForCatalogUpdateTopicPropagation(const TUniqueId& catalog_service_id); Let's pass in the version to keep these Wait() APIs consistent. http://gerrit.cloudera.org:8080/#/c/8752/1/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/8752/1/common/thrift/CatalogInternalService.thrift@39 PS1, Line 39: // List of updated (new and modified) catalog objects for which the catalog verion is catalog objects whose version is larger than ... http://gerrit.cloudera.org:8080/#/c/8752/1/common/thrift/CatalogInternalService.thrift@43 PS1, Line 43: // List of deleted catalog objects for which the catalog version is larger than catalog objects whose version is larger than ... http://gerrit.cloudera.org:8080/#/c/8752/1/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: http://gerrit.cloudera.org:8080/#/c/8752/1/common/thrift/StatestoreService.thrift@83 PS1, Line 83: // subscribers need additional information in order to process the deleted topics that This is needed when subscribers require additional information not captured in the item key to process the deleted item (e.g., catalog version of deleted catalog object). http://gerrit.cloudera.org:8080/#/c/8752/1/common/thrift/StatestoreService.thrift@88 PS1, Line 88: // non-delta TTopicDelta's (since the latest version of every still-present topic will * Remove parenthesis since the explanation is important * Do you mean "still-present item" instead of "still-present topic"? The sentence is somewhat confusing to me, maybe I'm misunderstanding. A non-delta update must include all items so it should also include this one, right? http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java File fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java: http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java@91 PS1, Line 91: for (RolePrivilege priv: existingRole.getPrivileges()) { might simplify some places to have bulk versions of the add/remove calls like addAll and removeAll that take a collection of CatalogObjects http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java File fe/src/main/java/org/apache/impala/catalog/Catalog.java: http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@544 PS1, Line 544: throws IllegalStateException { remove throws since it's a RuntimeException http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@577 PS1, Line 577:* TODO: Use global object IDs everywhere instead of tracking catalog objects by Seems like a pretty big TODO unlikely to get addressed. I suggest removing it. http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@580 PS1, Line 580: public static boolean objectNamesMatch(TCatalogObject first, TCatalogObject second) { keyEquals() or similar since we are actually checking for key equality and not "object name" equality http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java: http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java@37 PS1, Line 37: * direct ("fast") update that applies the result of a catalog operation to the cache remove ("fast") http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java@53 PS1, Line 53: * The catalogd uses this log to identify deleted catalog objects. Deleted ... to identify objects that have been deleted since the last catalog topic update. http://gerrit.cloudera.org:8080/#/c/8752/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java@55 PS1, Line 55: * them (e.g. dropTable()).
[Impala-ASF-CR] IMPALA-5058: Improve the concurrency of DDL/DML operations
Hello Impala Public Jenkins, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/8752 to review the following change. Change subject: IMPALA-5058: Improve the concurrency of DDL/DML operations .. IMPALA-5058: Improve the concurrency of DDL/DML operations Problem: A long running table metadata operation (e.g. refresh) could prevent any other metadata operation from making progress if it coincided with the catalog topic creation operations. The problem was due to the conservative locking scheme used when catalog topics were created. In particular, in order to collect a consistent snapshot of metadata changes, the global catalog lock was held for the entire duration of that operation. Solution: To improve the concurrency of catalog operations the following changes are performed: * A range of catalog versions determines the catalog changes to be included in a catalog update. Any catalog changes that do not fall in the specified range are ignored (to be processed in subsequent catalog topic updates). * The catalog allows metadata operations to make progress while collecting catalog updates. * To prevent starvation of catalog updates (i.e. frequently updated catalog objects skipping catalog updates indefinitely), we keep track of the number of times a catalog object has skipped an update and if that number exceeds a threshold it is included in the next catalog topic update even if its version is not in the specified topic update version range. Hence, the same catalog object may be sent in two consecutive catalog topic updates. This commit also changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. In particular: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core and exhaustive tests. Change-Id: If12467a83acaeca6a127491d89291dedba91a35a Reviewed-on: http://gerrit.cloudera.org:8080/7731 Reviewed-by: Dimitris TsirogiannisTested-by: Impala Public Jenkins --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalog-util.cc M be/src/catalog/catalog-util.h M be/src/catalog/catalog.cc M be/src/catalog/catalog.h M be/src/exec/catalog-op-executor.cc M be/src/exec/catalog-op-executor.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/client-request-state.cc M be/src/service/frontend.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/CatalogService.thrift M common/thrift/Frontend.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java A fe/src/main/java/org/apache/impala/catalog/CatalogObjectImpl.java A fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionQueue.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSource.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/catalog/HdfsCachePool.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/Role.java M fe/src/main/java/org/apache/impala/catalog/RolePrivilege.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M