[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. IMPALA-7972 Detect self-events to avoid unnecessary invalidates This patch adds support to detect self-generated events from catalog. This is used to avoid unnecessary invalidates to the tables from such self-events. Currently, alter_table, alter_partition, add_partition and drop_partition event types can invalidate the table metadata. Originally, we planned to have a global version number support from metastore (see HIVE-21115). But since that is still not complete, we rely on a combination of other identifiers to determine if a event is self-generated or not. These self-event identifiers consists of values from the table/partition parameters. A catalog service uuid and the catalog version number. The uuid is generated for each catalogservice when it comes up and it adds it to the table/partition parameters with the key "impala.CatalogServiceId". The catalog version number is added with the key "impala.CatalogVersion". When catalog executes a DDL operation it appends the current catalog version to the list of version numbers for the in-flight events for the table. Events processor clears this version when the corresponding version number identified by serviceId is received in the event. This is needed since it is possible that a external non-Impala system which generates the event presents the same serviceId and version number later on. The algorithm to detect a self-event is as below. 1. Add the service id and expected catalog version to table/partition parameters when executing the DDL operation. When the HMS operation is successful, add the version number to the list of version for in-flight events at table level. 2. When the event is received, the first time you see the combination of serviceId and version number, event processor clears the version number from table's list and determines the event as self-generated (and hence ignored) 3. If the event data presents a unknown serviceId or if the version number is not present in the list of in-flight versions, event is not a self-event and needs to be processed. In order to limit the total memory footprint, only 10 version numbers are stored at the table. Since the event processor is expected to poll every few seconds this should be a reasonable bound which satisfies most use-cases. Otherwise, event processor may wrongly process a self-event to invalidate the table. In such a case, its a performance penalty not a correctness issue. In case of drop_partition event, the partition object is not available in the event. Hence we cannot determine if its a self-event. In such cases currently we always issue a invalidate command. This is a known limitation and will be improved in IMPALA-7973 Patch adds new tests to trigger alter table/partition DDLs from impala and makes sure that the table is not invalidated. Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Reviewed-on: http://gerrit.cloudera.org:8080/12591 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 6 files changed, 1,103 insertions(+), 148 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 19 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 18: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 18 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 12 Mar 2019 02:11:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 17: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2408/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 17 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 11 Mar 2019 22:34:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2407/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 16 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 11 Mar 2019 22:24:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 18: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 18 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 11 Mar 2019 21:53:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 18: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3902/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 18 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 11 Mar 2019 21:53:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 17: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 17 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 11 Mar 2019 21:53:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 17: (1 comment) http://gerrit.cloudera.org:8080/#/c/12591/17/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/12591/17/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@420 PS17, Line 420: DISABLE_EVENT_HMS_SYNC_KEY, tblProperty.toString(), getFullyQualifiedTblName()); line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 17 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 11 Mar 2019 21:50:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Vihang Karajgaonkar has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. IMPALA-7972 Detect self-events to avoid unnecessary invalidates This patch adds support to detect self-generated events from catalog. This is used to avoid unnecessary invalidates to the tables from such self-events. Currently, alter_table, alter_partition, add_partition and drop_partition event types can invalidate the table metadata. Originally, we planned to have a global version number support from metastore (see HIVE-21115). But since that is still not complete, we rely on a combination of other identifiers to determine if a event is self-generated or not. These self-event identifiers consists of values from the table/partition parameters. A catalog service uuid and the catalog version number. The uuid is generated for each catalogservice when it comes up and it adds it to the table/partition parameters with the key "impala.CatalogServiceId". The catalog version number is added with the key "impala.CatalogVersion". When catalog executes a DDL operation it appends the current catalog version to the list of version numbers for the in-flight events for the table. Events processor clears this version when the corresponding version number identified by serviceId is received in the event. This is needed since it is possible that a external non-Impala system which generates the event presents the same serviceId and version number later on. The algorithm to detect a self-event is as below. 1. Add the service id and expected catalog version to table/partition parameters when executing the DDL operation. When the HMS operation is successful, add the version number to the list of version for in-flight events at table level. 2. When the event is received, the first time you see the combination of serviceId and version number, event processor clears the version number from table's list and determines the event as self-generated (and hence ignored) 3. If the event data presents a unknown serviceId or if the version number is not present in the list of in-flight versions, event is not a self-event and needs to be processed. In order to limit the total memory footprint, only 10 version numbers are stored at the table. Since the event processor is expected to poll every few seconds this should be a reasonable bound which satisfies most use-cases. Otherwise, event processor may wrongly process a self-event to invalidate the table. In such a case, its a performance penalty not a correctness issue. In case of drop_partition event, the partition object is not available in the event. Hence we cannot determine if its a self-event. In such cases currently we always issue a invalidate command. This is a known limitation and will be improved in IMPALA-7973 Patch adds new tests to trigger alter table/partition DDLs from impala and makes sure that the table is not invalidated. Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 6 files changed, 1,103 insertions(+), 148 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12591/17 -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 17 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 16: (1 comment) http://gerrit.cloudera.org:8080/#/c/12591/16/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/12591/16/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@420 PS16, Line 420: DISABLE_EVENT_HMS_SYNC_KEY, tblProperty.toString(), getFullyQualifiedTblName()); line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 16 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 11 Mar 2019 21:42:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Vihang Karajgaonkar has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. IMPALA-7972 Detect self-events to avoid unnecessary invalidates This patch adds support to detect self-generated events from catalog. This is used to avoid unnecessary invalidates to the tables from such self-events. Currently, alter_table, alter_partition, add_partition and drop_partition event types can invalidate the table metadata. Originally, we planned to have a global version number support from metastore (see HIVE-21115). But since that is still not complete, we rely on a combination of other identifiers to determine if a event is self-generated or not. These self-event identifiers consists of values from the table/partition parameters. A catalog service uuid and the catalog version number. The uuid is generated for each catalogservice when it comes up and it adds it to the table/partition parameters with the key "impala.CatalogServiceId". The catalog version number is added with the key "impala.CatalogVersion". When catalog executes a DDL operation it appends the current catalog version to the list of version numbers for the in-flight events for the table. Events processor clears this version when the corresponding version number identified by serviceId is received in the event. This is needed since it is possible that a external non-Impala system which generates the event presents the same serviceId and version number later on. The algorithm to detect a self-event is as below. 1. Add the service id and expected catalog version to table/partition parameters when executing the DDL operation. When the HMS operation is successful, add the version number to the list of version for in-flight events at table level. 2. When the event is received, the first time you see the combination of serviceId and version number, event processor clears the version number from table's list and determines the event as self-generated (and hence ignored) 3. If the event data presents a unknown serviceId or if the version number is not present in the list of in-flight versions, event is not a self-event and needs to be processed. In order to limit the total memory footprint, only 10 version numbers are stored at the table. Since the event processor is expected to poll every few seconds this should be a reasonable bound which satisfies most use-cases. Otherwise, event processor may wrongly process a self-event to invalidate the table. In such a case, its a performance penalty not a correctness issue. In case of drop_partition event, the partition object is not available in the event. Hence we cannot determine if its a self-event. In such cases currently we always issue a invalidate command. This is a known limitation and will be improved in IMPALA-7973 Patch adds new tests to trigger alter table/partition DDLs from impala and makes sure that the table is not invalidated. Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 6 files changed, 1,103 insertions(+), 148 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12591/16 -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 16 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2406/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 15 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 11 Mar 2019 20:23:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Vihang Karajgaonkar has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. IMPALA-7972 Detect self-events to avoid unnecessary invalidates This patch adds support to detect self-generated events from catalog. This is used to avoid unnecessary invalidates to the tables from such self-events. Currently, alter_table, alter_partition, add_partition and drop_partition event types can invalidate the table metadata. Originally, we planned to have a global version number support from metastore (see HIVE-21115). But since that is still not complete, we rely on a combination of other identifiers to determine if a event is self-generated or not. These self-event identifiers consists of values from the table/partition parameters. A catalog service uuid and the catalog version number. The uuid is generated for each catalogservice when it comes up and it adds it to the table/partition parameters with the key "impala.CatalogServiceId". The catalog version number is added with the key "impala.CatalogVersion". When catalog executes a DDL operation it appends the current catalog version to the list of version numbers for the in-flight events for the table. Events processor clears this version when the corresponding version number identified by serviceId is received in the event. This is needed since it is possible that a external non-Impala system which generates the event presents the same serviceId and version number later on. The algorithm to detect a self-event is as below. 1. Add the service id and expected catalog version to table/partition parameters when executing the DDL operation. When the HMS operation is successful, add the version number to the list of version for in-flight events at table level. 2. When the event is received, the first time you see the combination of serviceId and version number, event processor clears the version number from table's list and determines the event as self-generated (and hence ignored) 3. If the event data presents a unknown serviceId or if the version number is not present in the list of in-flight versions, event is not a self-event and needs to be processed. In order to limit the total memory footprint, only 10 version numbers are stored at the table. Since the event processor is expected to poll every few seconds this should be a reasonable bound which satisfies most use-cases. Otherwise, event processor may wrongly process a self-event to invalidate the table. In such a case, its a performance penalty not a correctness issue. In case of drop_partition event, the partition object is not available in the event. Hence we cannot determine if its a self-event. In such cases currently we always issue a invalidate command. This is a known limitation and will be improved in IMPALA-7973 Patch adds new tests to trigger alter table/partition DDLs from impala and makes sure that the table is not invalidated. Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 6 files changed, 1,099 insertions(+), 144 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12591/15 -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 15 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 12: (9 comments) I think the new refactor makes sense. Just a few more nits and I think we are good to go. http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@743 PS12, Line 743: return result; throw DbNotFound...? http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@750 PS12, Line 750: result = tbl.getVersionsForInflightEvents(); : return result; merge? http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/catalog/Table.java@655 PS12, Line 655: No-op if event processing is disabled doesn't look like it? http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@613 PS12, Line 613: pendingVersionNumbersFromCatalog_ = catalog_ One pending question. Since we are storing these version numbers in the table state, I think we are inherently relying on in-place table modifications (which is kinda true for most cases). However if someone issues an Invalidate, I think we replace the whole table object which essentially wipes of this state. I think the worst that could happen is probably some extra (and unnecessary) invalidates. Is my understanding right? Following the sequence of actions that * i think * could create this. t1. alter table. generates event e t2. invalidate table t3. load table. t4. event 'e' from HMS After t4, pendingVersionNumbersFromCatalog_ is an EMPTY_LIST and hence it kicks off another round of invalidation. Please correct me if I'm wrong. http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@521 PS12, Line 521: catalog_.getLock().writeLock().unlock(); this looks buggy, what is the corresponding lock for this? http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@527 PS12, Line 527:catalog_.getLock().writeLock().unlock() Move this too? http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@829 PS12, Line 829: catalog_.isExternalEventProcessingEnabled() Move this inside the addCatalogServiceidentifies()...? http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3253 PS12, Line 3253: setLastDdlTime update http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3259 PS12, Line 3259: updateLastDdlTimeLocally nit: overrideDdlTime or something like that? -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 07 Mar 2019 21:58:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2379/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 07 Mar 2019 02:17:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3179 PS12, Line 3179: "Table parameters must contain catalog version before adding it to partition parameters"); line too long (102 > 90) -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 07 Mar 2019 01:34:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Vihang Karajgaonkar has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. IMPALA-7972 Detect self-events to avoid unnecessary invalidates This patch adds support to detect self-generated events from catalog. This is used to avoid unnecessary invalidates to the tables from such self-events. Currently, alter_table, alter_partition, add_partition and drop_partition event types can invalidate the table metadata. Originally, we planned to have a global version number support from metastore (see HIVE-21115). But since that is still not complete, we rely on a combination of other identifiers to determine if a event is self-generated or not. These self-event identifiers consists of values from the table/partition parameters. A catalog service uuid and the catalog version number. The uuid is generated for each catalogservice when it comes up and it adds it to the table/partition parameters with the key "impala.CatalogServiceId". The catalog version number is added with the key "impala.CatalogVersion". When catalog executes a DDL operation it appends the current catalog version to the list of version numbers for the in-flight events for the table. Events processor clears this version when the corresponding version number identified by serviceId is received in the event. This is needed since it is possible that a external non-Impala system which generates the event presents the same serviceId and version number later on. The algorithm to detect a self-event is as below. 1. Add the service id and expected catalog version to table/partition parameters when executing the DDL operation. When the HMS operation is successful, add the version number to the list of version for in-flight events at table level. 2. When the event is received, the first time you see the combination of serviceId and version number, event processor clears the version number from table's list and determines the event as self-generated (and hence ignored) 3. If the event data presents a unknown serviceId or if the version number is not present in the list of in-flight versions, event is not a self-event and needs to be processed. In order to limit the total memory footprint, only 10 version numbers are stored at the table. Since the event processor is expected to poll every few seconds this should be a reasonable bound which satisfies most use-cases. Otherwise, event processor may wrongly process a self-event to invalidate the table. In such a case, its a performance penalty not a correctness issue. In case of drop_partition event, the partition object is not available in the event. Hence we cannot determine if its a self-event. In such cases currently we always issue a invalidate command. This is a known limitation and will be improved in IMPALA-7973 Patch adds new tests to trigger alter table/partition DDLs from impala and makes sure that the table is not invalidated. Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 6 files changed, 1,102 insertions(+), 145 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12591/12 -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Tim Armstrong has removed a vote on this change. Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Removed Code-Review-2 by Tim Armstrong -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 10 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 10: (3 comments) http://gerrit.cloudera.org:8080/#/c/12591/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/12591/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@500 PS10, Line 500: // Get a new catalog version to assign to the table being altered. : long newCatalogVersion = catalog_.incrementAndGetCatalogVersion(); Can you move this below the tryLock(tbl)? It takes the versionLock_ for you. Also move L523 (to unlock) closer to this, so that you don't hold the version lock for longer. (See the pattern elsewhere in this file). http://gerrit.cloudera.org:8080/#/c/12591/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974 PS10, Line 1974: catalog_.getCatalogVersion()) This looks incorrect? The new table create below has a different version. http://gerrit.cloudera.org:8080/#/c/12591/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3312 PS10, Line 3312: newCatalogVersion Now that I think about this, I think newCatalogVersion variable doesn't make much sense here. Can we refactor the code better without passing around these newCatalogVersions? I think it is super confusing to those who read the code. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 10 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 01 Mar 2019 21:57:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 9: Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3855/ -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 9 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 01 Mar 2019 21:38:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2321/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 10 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 01 Mar 2019 20:18:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 10: Code-Review-2 I'm pretty sure the failures in https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/4718/ *are* related to the patch, many look metadata related, e.g. https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/4718/testReport/junit/authorization.test_authorization/TestAuthorization/test_invalidate_metadata_sentry_unavailable/ Hypothetically if tests were so flaky that we did have 38 test failures on a typical run then we should stop merging large changes and start fixing the flaky tests. In either case we should not merge this until we understand what happened with the tests. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 10 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 01 Mar 2019 20:11:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Vihang Karajgaonkar has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. IMPALA-7972 Detect self-events to avoid unnecessary invalidates This patch adds support to detect self-generated events from catalog. This is used to avoid unnecessary invalidates to the tables from such self-events. Currently, alter_table, alter_partition, add_partition and drop_partition event types can invalidate the table metadata. Originally, we planned to have a global version number support from metastore (see HIVE-21115). But since that is still not complete, we rely on a combination of other identifiers to determine if a event is self-generated or not. These self-event identifiers consists of values from the table/partition parameters. A catalog service uuid and the catalog version number. The uuid is generated for each catalogservice when it comes up and it adds it to the table/partition parameters with the key "impala.CatalogServiceId". The catalog version number is added with the key "impala.CatalogVersion". When catalog executes a DDL operation it appends the current catalog version to the list of version numbers for the in-flight events for the table. Events processor clears this version when the corresponding version number identified by serviceId is received in the event. This is needed since it is possible that a external non-Impala system which generates the event presents the same serviceId and version number later on. The algorithm to detect a self-event is as below. 1. Add the service id and expected catalog version to table/partition parameters when executing the DDL operation. When the HMS operation is successful, add the version number to the list of version for in-flight events at table level. 2. When the event is received, the first time you see the combination of serviceId and version number, event processor clears the version number from table's list and determines the event as self-generated (and hence ignored) 3. If the event data presents a unknown serviceId or if the version number is not present in the list of in-flight versions, event is not a self-event and needs to be processed. In order to limit the total memory footprint, only 10 version numbers are stored at the table. Since the event processor is expected to poll every few seconds this should be a reasonable bound which satisfies most use-cases. Otherwise, event processor may wrongly process a self-event to invalidate the table. In such a case, its a performance penalty not a correctness issue. In case of drop_partition event, the partition object is not available in the event. Hence we cannot determine if its a self-event. In such cases currently we always issue a invalidate command. This is a known limitation and will be improved in IMPALA-7973 Patch adds new tests to trigger alter table/partition DDLs from impala and makes sure that the table is not invalidated. Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 6 files changed, 1,191 insertions(+), 211 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12591/10 -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 10 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3855/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 9 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 01 Mar 2019 17:30:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 9: > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3854/ I looked at the failures and it seems like these test failures are unrelated to this patch. Hi Bharath, can you please help me triage this or resubmit the job if this is known flakiness? -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 9 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 01 Mar 2019 07:44:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 9: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3854/ -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 9 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 01 Mar 2019 04:59:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2310/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 8 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 01 Mar 2019 00:53:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3854/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 9 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 01 Mar 2019 00:46:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 8: Code-Review+2 Thanks for addressing the comments. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 8 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 01 Mar 2019 00:46:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 9 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 01 Mar 2019 00:46:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Vihang Karajgaonkar has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. IMPALA-7972 Detect self-events to avoid unnecessary invalidates This patch adds support to detect self-generated events from catalog. This is used to avoid unnecessary invalidates to the tables from such self-events. Currently, alter_table, alter_partition, add_partition and drop_partition event types can invalidate the table metadata. Originally, we planned to have a global version number support from metastore (see HIVE-21115). But since that is still not complete, we rely on a combination of other identifiers to determine if a event is self-generated or not. These self-event identifiers consists of values from the table/partition parameters. A catalog service uuid and the catalog version number. The uuid is generated for each catalogservice when it comes up and it adds it to the table/partition parameters with the key "impala.CatalogServiceId". The catalog version number is added with the key "impala.CatalogVersion". When catalog executes a DDL operation it appends the current catalog version to the list of version numbers for the in-flight events for the table. Events processor clears this version when the corresponding version number identified by serviceId is received in the event. This is needed since it is possible that a external non-Impala system which generates the event presents the same serviceId and version number later on. The algorithm to detect a self-event is as below. 1. Add the service id and expected catalog version to table/partition parameters when executing the DDL operation. When the HMS operation is successful, add the version number to the list of version for in-flight events at table level. 2. When the event is received, the first time you see the combination of serviceId and version number, event processor clears the version number from table's list and determines the event as self-generated (and hence ignored) 3. If the event data presents a unknown serviceId or if the version number is not present in the list of in-flight versions, event is not a self-event and needs to be processed. In order to limit the total memory footprint, only 10 version numbers are stored at the table. Since the event processor is expected to poll every few seconds this should be a reasonable bound which satisfies most use-cases. Otherwise, event processor may wrongly process a self-event to invalidate the table. In such a case, its a performance penalty not a correctness issue. In case of drop_partition event, the partition object is not available in the event. Hence we cannot determine if its a self-event. In such cases currently we always issue a invalidate command. This is a known limitation and will be improved in IMPALA-7973 Patch adds new tests to trigger alter table/partition DDLs from impala and makes sure that the table is not invalidated. Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 6 files changed, 1,193 insertions(+), 211 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12591/8 -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 8 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 8: (4 comments) latest patch addresses Bharath's comments http://gerrit.cloudera.org:8080/#/c/12591/7/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/12591/7/fe/src/main/java/org/apache/impala/catalog/Table.java@629 PS7, Line 629: getFullName( > getFullName() Done http://gerrit.cloudera.org:8080/#/c/12591/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/12591/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@659 PS7, Line 659: debugLog("Table {} does not need to be " > Can we move this to invalidateCatalogTable() instead of incrementing at al makes sense. done. Also, did it for the self-event-counter http://gerrit.cloudera.org:8080/#/c/12591/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/12591/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2289 PS7, Line 2289: operation generated event can be skipped > as a self-event? Done http://gerrit.cloudera.org:8080/#/c/12591/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3279 PS7, Line 3279: to skip the event generated by :* this operation > same as above. Done -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 8 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 01 Mar 2019 00:31:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2306/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 6 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 28 Feb 2019 23:41:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2307/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 7 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 28 Feb 2019 23:41:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 7: Code-Review+2 (4 comments) Thanks for addressing the comments. A few minor comments and then I can submit it for GVO. http://gerrit.cloudera.org:8080/#/c/12591/7/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/12591/7/fe/src/main/java/org/apache/impala/catalog/Table.java@629 PS7, Line 629: getTableName getFullName() http://gerrit.cloudera.org:8080/#/c/12591/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/12591/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@659 PS7, Line 659: metrics_.getCounter(MetastoreEventsProcessor.NUMBER_OF_TABLE_INVALIDATES).inc(); Can we move this to invalidateCatalogTable() instead of incrementing at all the callers? http://gerrit.cloudera.org:8080/#/c/12591/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/12591/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2289 PS7, Line 2289: operation generated event can be skipped as a self-event? http://gerrit.cloudera.org:8080/#/c/12591/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3279 PS7, Line 3279: to skip the event generated by :* this operation same as above. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 7 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 28 Feb 2019 23:34:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Vihang Karajgaonkar has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. IMPALA-7972 Detect self-events to avoid unnecessary invalidates This patch adds support to detect self-generated events from catalog. This is used to avoid unnecessary invalidates to the tables from such self-events. Currently, alter_table, alter_partition, add_partition and drop_partition event types can invalidate the table metadata. Originally, we planned to have a global version number support from metastore (see HIVE-21115). But since that is still not complete, we rely on a combination of other identifiers to determine if a event is self-generated or not. These self-event identifiers consists of values from the table/partition parameters. A catalog service uuid and the catalog version number. The uuid is generated for each catalogservice when it comes up and it adds it to the table/partition parameters with the key "impala.CatalogServiceId". The catalog version number is added with the key "impala.CatalogVersion". When catalog executes a DDL operation it appends the current catalog version to the list of version numbers for the in-flight events for the table. Events processor clears this version when the corresponding version number identified by serviceId is received in the event. This is needed since it is possible that a external non-Impala system which generates the event presents the same serviceId and version number later on. The algorithm to detect a self-event is as below. 1. Add the service id and expected catalog version to table/partition parameters when executing the DDL operation. When the HMS operation is successful, add the version number to the list of version for in-flight events at table level. 2. When the event is received, the first time you see the combination of serviceId and version number, event processor clears the version number from table's list and determines the event as self-generated (and hence ignored) 3. If the event data presents a unknown serviceId or if the version number is not present in the list of in-flight versions, event is not a self-event and needs to be processed. In order to limit the total memory footprint, only 10 version numbers are stored at the table. Since the event processor is expected to poll every few seconds this should be a reasonable bound which satisfies most use-cases. Otherwise, event processor may wrongly process a self-event to invalidate the table. In such a case, its a performance penalty not a correctness issue. In case of drop_partition event, the partition object is not available in the event. Hence we cannot determine if its a self-event. In such cases currently we always issue a invalidate command. This is a known limitation and will be improved in IMPALA-7973 Patch adds new tests to trigger alter table/partition DDLs from impala and makes sure that the table is not invalidated. Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 6 files changed, 1,190 insertions(+), 210 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12591/7 -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 7 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Vihang Karajgaonkar has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. IMPALA-7972 Detect self-events to avoid unnecessary invalidates This patch adds support to detect self-generated events from catalog. This is used to avoid unnecessary invalidates to the tables from such self-events. Currently, alter_table, alter_partition, add_partition and drop_partition event types can invalidate the table metadata. Originally, we planned to have a global version number support from metastore (see HIVE-21115). But since that is still not complete, we rely on a combination of other identifiers to determine if a event is self-generated or not. These self-event identifiers consists of values from the table/partition parameters. A catalog service uuid and the catalog version number. The uuid is generated for each catalogservice when it comes up and it adds it to the table/partition parameters with the key "impala.CatalogServiceId". The catalog version number is added with the key "impala.CatalogVersion". When catalog executes a DDL operation it appends the current catalog version to the list of version numbers for the in-flight events for the table. Events processor clears this version when the corresponding version number identified by serviceId is received in the event. This is needed since it is possible that a external non-Impala system which generates the event presents the same serviceId and version number later on. The algorithm to detect a self-event is as below. 1. Add the service id and expected catalog version to table/partition parameters when executing the DDL operation. When the HMS operation is successful, add the version number to the list of version for in-flight events at table level. 2. When the event is received, the first time you see the combination of serviceId and version number, event processor clears the version number from table's list and determines the event as self-generated (and hence ignored) 3. If the event data presents a unknown serviceId or if the version number is not present in the list of in-flight versions, event is not a self-event and needs to be processed. In order to limit the total memory footprint, only 10 version numbers are stored at the table. Since the event processor is expected to poll every few seconds this should be a reasonable bound which satisfies most use-cases. Otherwise, event processor may wrongly process a self-event to invalidate the table. In such a case, its a performance penalty not a correctness issue. In case of drop_partition event, the partition object is not available in the event. Hence we cannot determine if its a self-event. In such cases currently we always issue a invalidate command. This is a known limitation and will be improved in IMPALA-7973 Patch adds new tests to trigger alter table/partition DDLs from impala and makes sure that the table is not invalidated. Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 6 files changed, 1,194 insertions(+), 210 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12591/6 -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 6 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 6: (15 comments) Added suggested changes by Bharath. Also, added metrics for number of self-events and number of times event processor successfully invalidated a table. http://gerrit.cloudera.org:8080/#/c/12591/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12591/5//COMMIT_MSG@19 PS5, Line 19: and the catalog version number. The uuid is generated for each > nit: line overflow Done http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@189 PS5, Line 189: : // unique identifier of this catalog service > Why not use the catalogServiceId_ below? Use TUniqueIdUtil#printId() to con I didn't realize that we already have a service id. Removed this and using it instead. http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@741 PS5, Line 741: Preconditions.checkState(isExternalEventProcessingEnabl > Isn't this a preconditions check? Done http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@785 PS5, Line 785: > version Done http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@805 PS5, Line 805: String.format("Table %s not found", new Tabl > Looks like this can silently fail. How about logging something in that case Added a log.warn in addToVersionsForInflightEvents method. http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/Table.java@20 PS5, Line 20: import java.util.ArrayList; > unused? Done http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/Table.java@24 PS5, Line 24: import java.util.List; > unused? Done http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/Table.java@117 PS5, Line 117: // FIFO list of versions for all the in-flight metastore events in this table > doc Done http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/Table.java@120 PS5, Line 120: new LinkedList<>(); > ? Had changed the implementation and forgot to update the document later on. Fixed it now. http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@889 PS5, Line 889: protected long versionNumberFromEvent_ = -1; > Add a doc? I think this crucial to the logic here. Done http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@900 PS5, Line 900: */ > May be add a pointer to the MetastoreEvents class where you defined what a Done. Also added more detail regarding the case when subsequent event with the same serviceid and version can show up http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1147 PS5, Line 1147:*/ : private IgnoredEvent( : CatalogServiceCatalog catalog, Metrics metrics, NotificationEvent event) { : super(catalog, metrics, event); : } : : @Override : public void process() { : debugLog("Ignored"); : } : : @Override : protected boolean isEventProcessingDisabled() { : return false; : } : } : } : : > Should these be no-ops if the event processing is not enabled? These methods were moved to CatalogOpExecutor since we wanted to get the serviceId from CatalogServiceCatalog's serviceId now. Added a check to make them a no-op when event processing is disabled. http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@397 PS5,
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 5: (16 comments) Mostly general comments. I think the code in CatalogOpExecutor is in dire need of some refactoring (outside the scope of this patch). We should probably factor out all the common stuff into CatalogOpCtx and then pass it around instead of adding new parameters for every method. http://gerrit.cloudera.org:8080/#/c/12591/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12591/5//COMMIT_MSG@19 PS5, Line 19: and the catalog version number. The uuid is generated for each catalogservice when nit: line overflow http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@189 PS5, Line 189: // unique identifier of this catalog service : private static final String SERVICE_UUID = UUID.randomUUID().toString(); Why not use the catalogServiceId_ below? Use TUniqueIdUtil#printId() to convert it to a string. http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@741 PS5, Line 741: if (!isExternalEventProcessingEnabled()) return result; Isn't this a preconditions check? http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@785 PS5, Line 785: ersion version http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@805 PS5, Line 805: tbl.addToVersionsForInflightEvents(versionNumber); Looks like this can silently fail. How about logging something in that case? http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/Table.java@20 PS5, Line 20: import java.util.ArrayDeque; unused? http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/Table.java@24 PS5, Line 24: import java.util.Deque; unused? http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/Table.java@117 PS5, Line 117: private static final int MAX_NUMBER_OF_INFLIGHT_EVENTS = 10; doc http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/Table.java@120 PS5, Line 120: in seconds to initial ? http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@889 PS5, Line 889: protected List pendingVersionNumbersFromCatalog_ = Collections.EMPTY_LIST; Add a doc? I think this crucial to the logic here. http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@900 PS5, Line 900: * This method detects if this event is self-generated or not. In order to May be add a pointer to the MetastoreEvents class where you defined what a self-event is? http://gerrit.cloudera.org:8080/#/c/12591/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1147 PS5, Line 1147: /** :* Adds self-event identifiers in the table parameters :*/ : public static void addCatalogServiceIdentifiers( : org.apache.hadoop.hive.metastore.api.Table msTbl, long catalogVersionId) { : msTbl.putToParameters( : CATALOG_SERVICE_ID_PROP_KEY, CatalogServiceCatalog.getServiceUUID()); : msTbl.putToParameters(CATALOG_VERSION_PROP_KEY, String.valueOf(catalogVersionId)); : } : : /** :* Adds self-event identifiers in the partition parameters :*/ : public static void addCatalogServiceIdentifiers( : org.apache.hadoop.hive.metastore.api.Partition partition, long catalogVersionId) { : partition.putToParameters( : CATALOG_SERVICE_ID_PROP_KEY, CatalogServiceCatalog.getServiceUUID()); : partition.putToParameters(CATALOG_VERSION_PROP_KEY, String.valueOf(catalogVersionId)); : } Should these be no-ops if the event processing is not enabled?
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2290/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 28 Feb 2019 01:52:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Vihang Karajgaonkar has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. IMPALA-7972 Detect self-events to avoid unnecessary invalidates This patch adds support to detect self-generated events from catalog. This is used to avoid unnecessary invalidates to the tables from such self-events. Currently, alter_table, alter_partition, add_partition and drop_partition event types can invalidate the table metadata. Originally, we planned to have a global version number support from metastore (see HIVE-21115). But since that is still not complete, we rely on a combination of other identifiers to determine if a event is self-generated or not. These self-event identifiers consists of values from the table/partition parameters. A catalog service uuid and the catalog version number. The uuid is generated for each catalogservice when it comes up and it adds it to the table/partition parameters with the key "impala.CatalogServiceId". The catalog version number is added with the key "impala.CatalogVersion". When catalog executes a DDL operation it appends the current catalog version to the list of version numbers for the in-flight events for the table. Events processor clears this version when the corresponding version number identified by serviceId is received in the event. This is needed since it is possible that a external non-Impala system which generates the event presents the same serviceId and version number later on. The algorithm to detect a self-event is as below. 1. Add the service id and expected catalog version to table/partition parameters when executing the DDL operation. When the HMS operation is successful, add the version number to the list of version for in-flight events at table level. 2. When the event is received, the first time you see the combination of serviceId and version number, event processor clears the version number from table's list and determines the event as self-generated (and hence ignored) 3. If the event data presents a unknown serviceId or if the version number is not present in the list of in-flight versions, event is not a self-event and needs to be processed. In order to limit the total memory footprint, only 10 version numbers are stored at the table. Since the event processor is expected to poll every few seconds this should be a reasonable bound which satisfies most use-cases. Otherwise, event processor may wrongly process a self-event to invalidate the table. In such a case, its a performance penalty not a correctness issue. In case of drop_partition event, the partition object is not available in the event. Hence we cannot determine if its a self-event. In such cases currently we always issue a invalidate command. This is a known limitation and will be improved in IMPALA-7973 Patch adds new tests to trigger alter table/partition DDLs from impala and makes sure that the table is not invalidated. Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 6 files changed, 1,127 insertions(+), 202 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12591/5 -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/12591/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/12591/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@738 PS4, Line 738: public Collection getInFlightVersionsForEvents(String dbName, String tblName) > Should be a list: evens are ordered and must be ignored in the order that t Done http://gerrit.cloudera.org:8080/#/c/12591/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@765 PS4, Line 765: public void removeFromInFlightVersionsForEvents(String dbName, String tblName, > This form introduces a race condition. Can you do something like: Discussed this with Paul and agreed that this race condition may not be a problem as of now, since any other thread which can possible modify this list will either add a new version or invalidate the table. In both the cases, the code works as expected. I have added a TODO in MetastoreEvents class to improve this as a followup item later. http://gerrit.cloudera.org:8080/#/c/12591/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/12591/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@889 PS4, Line 889: protected Collection pendingVersionNumbersFromCatalog_ = Collections.EMPTY_LIST; > Again, this introduces a race condition if some other thread changes the ta Will be done as a followup since it is not critical as of now. This is related to invalidateOrIgnore(versionNumber) approach which was suggested. http://gerrit.cloudera.org:8080/#/c/12591/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@919 PS4, Line 919: protected boolean isSelfEvent() throws CatalogException { > I see. You want to identify outside of a lock if this is a self event. The Thanks for suggestion, As discussed this can be done as a followup in a separate patch. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 28 Feb 2019 00:49:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 4: (7 comments) Identified one more potential race condition and suggested a solution. http://gerrit.cloudera.org:8080/#/c/12591/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/12591/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@738 PS4, Line 738: public Collection getInFlightVersionsForEvents(String dbName, String tblName) Should be a list: evens are ordered and must be ignored in the order that they were added to the list. http://gerrit.cloudera.org:8080/#/c/12591/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@765 PS4, Line 765: public void removeFromInFlightVersionsForEvents(String dbName, String tblName, This form introduces a race condition. Can you do something like: void invaliateOrIgnore(int versionNo) - lock - if list is non-empty, and first entry is versionNo, remove that first entry - Else, invalidate the table - unlock http://gerrit.cloudera.org:8080/#/c/12591/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/12591/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@889 PS4, Line 889: protected Collection pendingVersionNumbersFromCatalog_ = Collections.EMPTY_LIST; Again, this introduces a race condition if some other thread changes the table since we obtained the list. http://gerrit.cloudera.org:8080/#/c/12591/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@919 PS4, Line 919: protected boolean isSelfEvent() throws CatalogException { I see. You want to identify outside of a lock if this is a self event. The proposed non-racy solution makes self-event detection an integral part of the invalidate, avoiding race conditions. The logic would be: if server id not set, or does not match - invalidate else - invalidOrIgnore(event version) Can even be simplified to: eventVersion = -1 if serverId is set and matches then evenVersion = version from event invalidateOrIgnore(eventVersion) I think this completely eliminates all race condition paths http://gerrit.cloudera.org:8080/#/c/12591/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@957 PS4, Line 957: if (isSelfEvent()) { To enable this logging, have invalidOrIgnore return true if ignored, false if invalidated (or visa-versa, you choose) http://gerrit.cloudera.org:8080/#/c/12591/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1100 PS4, Line 1100: protected boolean isSelfEvent() { With the new system, we don't know if it is a self event until we lock and look. All we can tell is that, when we checked, it WAS a self event. http://gerrit.cloudera.org:8080/#/c/12591/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/12591/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@148 PS4, Line 148: * its a performance penalty not a correctness issue. Thanks for the explanation! -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 27 Feb 2019 23:13:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2284/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 27 Feb 2019 23:12:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Vihang Karajgaonkar has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. IMPALA-7972 Detect self-events to avoid unnecessary invalidates This patch adds support to detect self-generated events from catalog. This is used to avoid unnecessary invalidates to the tables from such self-events. Currently, alter_table, alter_partition, add_partition and drop_partition event types can invalidate the table metadata. Originally, we planned to have a global version number support from metastore (see HIVE-21115). But since that is still not complete, we rely on a combination of other identifiers to determine if a event is self-generated or not. These self-event identifiers consists of values from the table/partition parameters. A catalog service uuid and the catalog version number. The uuid is generated for each catalogservice when it comes up and it adds it to the table/partition parameters with the key "impala.CatalogServiceId". The catalog version number is added with the key "impala.CatalogVersion". When catalog executes a DDL operation it appends the current catalog version to the list of version numbers for the in-flight events for the table. Events processor clears this version when the corresponding version number identified by serviceId is received in the event. This is needed since it is possible that a external non-Impala system which generates the event presents the same serviceId and version number later on. The algorithm to detect a self-event is as below. 1. Add the service id and expected catalog version to table/partition parameters when executing the DDL operation. When the HMS operation is successful, add the version number to the list of version for in-flight events at table level. 2. When the event is received, the first time you see the combination of serviceId and version number, event processor clears the version number from table's list and determines the event as self-generated (and hence ignored) 3. If the event data presents a unknown serviceId or if the version number is not present in the list of in-flight versions, event is not a self-event and needs to be processed. In order to limit the total memory footprint, only 10 version numbers are stored at the table. Since the event processor is expected to poll every few seconds this should be a reasonable bound which satisfies most use-cases. Otherwise, event processor may wrongly process a self-event to invalidate the table. In such a case, its a performance penalty not a correctness issue. In case of drop_partition event, the partition object is not available in the event. Hence we cannot determine if its a self-event. In such cases currently we always issue a invalidate command. This is a known limitation and will be improved in IMPALA-7973 Patch adds new tests to trigger alter table/partition DDLs from impala and makes sure that the table is not invalidated. Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 6 files changed, 1,142 insertions(+), 203 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12591/4 -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 2: (17 comments) Quick initial review. More substantive comments will follow our in-person discussion. http://gerrit.cloudera.org:8080/#/c/12591/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/12591/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@891 PS1, Line 891: propertyValues.add(tbl.getMetaStoreTable().getParameters().get(propertyKey)); This will add nulls to the list if the property is not set. Is this the intention? Actually, looks like the semantics here is that you have to correlated lists: keys in one list, values in another. Is a map a better choice here? http://gerrit.cloudera.org:8080/#/c/12591/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@920 PS1, Line 920: propVals.add(hdfsPartition.getParameters().get(key)); Same issue as above. http://gerrit.cloudera.org:8080/#/c/12591/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/12591/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@61 PS1, Line 61: public static final String CATALOG_SERVICE_ID_PROP_KEY = "impala.CatalogServiceId"; Nit: the above props use TitleCase. The one below uses camelCase. Should we be consistent? http://gerrit.cloudera.org:8080/#/c/12591/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@359 PS1, Line 359:* not. Used by TableInvalidating Events to avoid unnecessary invalidatesß Future readers will greatly benefit form an explanation of the logic used to detect self-events. I'm having to reverse engineer it from the implementation: a few words of explanation would be greatly appreciated! http://gerrit.cloudera.org:8080/#/c/12591/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@360 PS1, Line 360:*/ The #1 item that needs explanation is concurrency. Change 1: sends to HMS Change 2: sends to HMS Change 2: returns from HMS, changes catalog Change 1: returns from HMS, changes catalog Notice that the above is deliberately done out of order due to, say, network delays. What are the semantics in each case? Suppose that change 1 is a rename, change 2 is a request to add a column to the old table. If change 2 is processed first, things work. If not, things don't work. How is this then reflected in the events and how are these cases synchronized? http://gerrit.cloudera.org:8080/#/c/12591/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@363 PS1, Line 363: private final Long lastDDLTime_; Does this need the overhead of an object? Or, can we use -1 as the "not set" indicator and use a primitive long? http://gerrit.cloudera.org:8080/#/c/12591/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@368 PS1, Line 368: private final Long catalogVersion_; As above. http://gerrit.cloudera.org:8080/#/c/12591/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@408 PS1, Line 408: * instance, two SIDs which have different serviceIds cannot be compared. This They can: just compare based on the service IDs. Also, put all null service IDs before all events with a service ID. May not mean much, but does impose an ordering. http://gerrit.cloudera.org:8080/#/c/12591/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@415 PS1, Line 415: * 1 means we want to process the event represented by this SID Rather than explaining all this, and doing a mapping from partial ordering to semantics, why not simply have a method that answers the direct question: canIgnore(something)? http://gerrit.cloudera.org:8080/#/c/12591/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@420 PS1, Line 420: if (this.equals(0)) return 0; Why would this ever equal the number 0? Or did you mean lower case "o"? The usual practice is to implement equals in terms of compareTo, not the other way around... http://gerrit.cloudera.org:8080/#/c/12591/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@424 PS1, Line 424: if (this.lastDDLTime_ < o.lastDDLTime_) return -1; The above assumes both values are non-null, else will crash with NPE. This is a good reason to use a primitive long instead of an object Long for the field. Consider: int result = Long.compare(lastDDLTime, o.lastDDLTime_); if (result != 0) return result;
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2259/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 26 Feb 2019 21:51:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Vihang Karajgaonkar has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. IMPALA-7972 Detect self-events to avoid unnecessary invalidates This patch adds support to detect self-generated events from catalog. This is used to avoid unnecessary invalidates to the tables from such self-events. Currently, alter_table, alter_partition, add_partition and drop_partition event types can invalidate the table metadata. Originally, we planned to have a global version number support from metastore (see HIVE-21115). But since that is still not complete, we rely on a combination of other identifiers to determine if a event is self-generated or not. These self-event identifiers consists of values from the table/partition parameters for transient_lastDDLTime, a uuid and version number. The uuid is generated for each catalogservice when it comes up and it adds it to the table/partition parameters with the key "impala.CatalogServiceId". The catalog version number is added with the key "impala.CatalogVersion". Since we want the metastore to update the transient_lastDDLTime we remove this parameter before catalog issues a alterTable or alterPartition DDL operation to metastore. When a event is generated we fetch the values of these parameters from event and catalog and compare them as folows: 1. If the transient_lastDDLTime of the table (partition in case of partition events) from the event is strictly less than or greater than value of transient_lastDDLTime in the parameters of corresponding catalog object we can ignore the event or process the event respectively. 2. In case of transient_lastDDLTime is equal to value in catalog, we rely on the serviceId and catalog version to resolve the conflict. if the serviceId matches with the serviceId of catalog, the version number is used to compare. If it doesn't match, the event is generated from another catalog and event should be processed. In case of drop_partition event, the partition object is not available in the event. Hence we cannot determine if its a self-event. In such cases currently we always issue a invalidate command. This is a known limitation and will be improved in IMPALA-7973 Patch adds new tests to trigger alter table/partition DDLs from impala and makes sure that the table is not invalidated. Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 6 files changed, 1,028 insertions(+), 200 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12591/2 -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12591 ) Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/2252/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/12591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353 Gerrit-Change-Number: 12591 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 26 Feb 2019 16:36:11 + Gerrit-HasComments: No