[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/7731 ) Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. IMPALA-5538: Use explicit catalog versions for deleted objects This commit changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. Previously, deletions of catalog objects were not explicitly annotated with the catalog version in which the deletion occured and the impalads were using the max catalog version in a catalog update in order to decide whether a deletion should be applied to the local catalog cache or not. This works correctly under the assumption that all the changes that occurred in the catalog between an update's min and max catalog version are included in that update, i.e. no gaps or missing updates. With the upcoming fix for IMPALA-5058, that constraint will be relaxed, thus allowing for gaps in the catalog updates. To avoid breaking the existing behavior, this patch introduced the following changes: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core tests. Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Reviewed-on: http://gerrit.cloudera.org:8080/7731 Reviewed-by: Dimitris TsirogiannisTested-by: Impala Public Jenkins --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalog-util.cc M be/src/catalog/catalog-util.h M be/src/catalog/catalog.cc M be/src/catalog/catalog.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M tests/statestore/test_statestore.py 21 files changed, 480 insertions(+), 441 deletions(-) Approvals: Dimitris Tsirogiannis: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 9 Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7731 ) Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 8 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 26 Sep 2017 20:20:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7731 ) Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1267/ -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 8 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 26 Sep 2017 16:20:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/7731 ) Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 8: Code-Review+2 Rebase and keep Dan's +2 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 8 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 26 Sep 2017 16:20:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/7731 ) Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 7: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc@177 PS5, Line 177: Statestore::Subscriber::Subscriber(const SubscriberId& subscriber_id, : const TUniqueId& registration_id, const TNe > Hm, it's still not clear to me why a transient topic cannot get values in d okay. it wasn't clear to me if we can just take the value that was produced for an add/update and use it for delete, and expect that value to be the one that the subscriber expects for delete. But i guess assuming that is okay. -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 7 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 25 Sep 2017 18:50:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Hello Bharath Vissapragada, Alex Behm, Vuk Ercegovac, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7731 to look at the new patch set (#7). Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. IMPALA-5538: Use explicit catalog versions for deleted objects This commit changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. Previously, deletions of catalog objects were not explicitly annotated with the catalog version in which the deletion occured and the impalads were using the max catalog version in a catalog update in order to decide whether a deletion should be applied to the local catalog cache or not. This works correctly under the assumption that all the changes that occurred in the catalog between an update's min and max catalog version are included in that update, i.e. no gaps or missing updates. With the upcoming fix for IMPALA-5058, that constraint will be relaxed, thus allowing for gaps in the catalog updates. To avoid breaking the existing behavior, this patch introduced the following changes: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core tests. Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalog-util.cc M be/src/catalog/catalog-util.h M be/src/catalog/catalog.cc M be/src/catalog/catalog.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M tests/statestore/test_statestore.py 21 files changed, 480 insertions(+), 441 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/7 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 7 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/7731 ) Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.h File be/src/catalog/catalog-server.h: http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.h@131 PS6, Line 131: Also determines any : /// deletions of catalog objects by looking at the : /// difference of the last set of topic entry keys that were sent and the current set : /// of topic entry keys > Remove? Done http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.cc@308 PS6, Line 308: bool topi > should that be DCHECK_GT? The old conditional to continue on was <= ... Correct. Done http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog.h File be/src/catalog/catalog.h: http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog.h@59 PS6, Line 59: Gets all Catalog objects and the metadata that is applicable for the given request. : /// Always returns all object names that exist in the Catalog, but allows for extended : /// metadata for objects that were modified after the specified version. > is that accurate? from our discussion, my understanding was that only obje Yeah, wording is not clear. Changed the comment. http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc@177 PS5, Line 177: entry_it->second.SetDeleted(true); : entry_it->second.SetVersion(last_version_); > What I mean is that topics that are transient do not get values in deleted Hm, it's still not clear to me why a transient topic cannot get values in deleted topics, in the sense that I see the topic being transient and how to process deletions two orthogonal issues and I don't understand why one should influence the other. http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/statestore/statestore.cc@170 PS6, Line 170: topic_update_log_.erase(version); > also, does that still make sense (now that we no longer do the SetValue(NUL Done http://gerrit.cloudera.org:8080/#/c/7731/6/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/7731/6/common/thrift/CatalogInternalService.thrift@32 PS6, Line 32: > does it include objects with changes in version > or >= the from_version? Updated the comment (>). Done http://gerrit.cloudera.org:8080/#/c/7731/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/7731/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@310 PS6, Line 310: Views, Databases, and :* Functions) > datasources, cachepools, sentry stuff ... I think the list is too big, may Done -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 5 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 25 Sep 2017 17:55:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/7731 ) Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.h File be/src/catalog/catalog-server.h: http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.h@131 PS6, Line 131: Also determines any : /// deletions of catalog objects by looking at the : /// difference of the last set of topic entry keys that were sent and the current set : /// of topic entry keys Remove? http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@369 PS5, Line 369: catalogTbl.setTable(tbl.toThrift()); > It's racy if you do that. A concurrent table modification may have gotten a Yea, got it, thanks. http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@970 PS5, Line 970: return removedTable; > I thought about it but there are two reasons not to do it: a) we need the T I see. http://gerrit.cloudera.org:8080/#/c/7731/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/7731/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@310 PS6, Line 310: Tables, Views, Databases, and :* Functions) datasources, cachepools, sentry stuff ... I think the list is too big, may be remove this, rather than keeping it stale? -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 6 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 22 Sep 2017 16:20:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/7731 ) Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/statestore/statestore.cc@170 PS6, Line 170: total_value_size_bytes_ -= entry_it->second.value().size() also, does that still make sense (now that we no longer do the SetValue(NULL) at line 176 in the old code? -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 6 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 21 Sep 2017 23:17:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/7731 ) Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.cc@308 PS6, Line 308: DCHECK_GE should that be DCHECK_GT? The old conditional to continue on was <= ... http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog.h File be/src/catalog/catalog.h: http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog.h@59 PS6, Line 59: Gets all Catalog objects and the metadata that is applicable for the given request. : /// Always returns all object names that exist in the Catalog, but allows for extended : /// metadata for objects that were modified after the specified version. is that accurate? from our discussion, my understanding was that only objects with version > from_version would be included. http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc@177 PS5, Line 177: } : } > The behavior is topic specific and currently only subscribers of catalog-up What I mean is that topics that are transient do not get values in deleted topics. So, if a subscriber relies on value to process deletion, it better not register as transient, right? Shouldn't that be made explicit? http://gerrit.cloudera.org:8080/#/c/7731/6/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/7731/6/common/thrift/CatalogInternalService.thrift@32 PS6, Line 32: relative to does it include objects with changes in version > or >= the from_version? -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 6 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 21 Sep 2017 23:11:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7731 to look at the new patch set (#6). Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. IMPALA-5538: Use explicit catalog versions for deleted objects This commit changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. Previously, deletions of catalog objects were not explicitly annotated with the catalog version in which the deletion occured and the impalads were using the max catalog version in a catalog update in order to decide whether a deletion should be applied to the local catalog cache or not. This works correctly under the assumption that all the changes that occurred in the catalog between an update's min and max catalog version are included in that update, i.e. no gaps or missing updates. With the upcoming fix for IMPALA-5058, that constraint will be relaxed, thus allowing for gaps in the catalog updates. To avoid breaking the existing behavior, this patch introduced the following changes: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core tests. Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalog-util.cc M be/src/catalog/catalog-util.h M be/src/catalog/catalog.cc M be/src/catalog/catalog.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M tests/statestore/test_statestore.py 21 files changed, 473 insertions(+), 425 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/6 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 5: (27 comments) http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: PS5, Line 231: // If this is not a delta update, request an update from version 0 from the local : // catalog. There is an optimization that checks if pending_topic_updates_ was just : // reloaded from version 0. If so, then skip this step and use that data. > I found it impossible to understand this comment without reading through mo Done PS5, Line 234: delta.from_version == 0 && delta.to_version == 0 > from the comment above, presumably this condition means "this is not a delt Yeah, you're right. This condition is really confusing. from_version = 0 implies delta = false and can be used here instead. The part that is not clear to me is the to_version check but I couldn't find any case where it would make sense to check the to_version in order to decide whether to send the full catalog update or not. I will remove it and see if it breaks something. PS5, Line 310: if (catalog_object.catalog_version <= last_sent_catalog_version_) continue; > given that last_sent_catalog_version_ was passed to GetAllCatalogObjects(), Right, it's not needed. Replaced it with a DCHECK. http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/catalog/catalog-server.h File be/src/catalog/catalog-server.h: PS5, Line 150: added > nit: May be 'added/updated/removed' to be consistent with other places? Done http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS5, Line 1392: > can we delete this function now? Yes. Done PS5, Line 1366: ; > Could you add 'len' too, given this is trace logging anyway. (might be help Done PS5, Line 1399: // Nothing to do here. > Remove? Kinda seems obvious Done http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: PS5, Line 177: entry_it->second.SetDeleted(true); : entry_it->second.SetVersion(last_version_); > now that some subscribes require deleted entries to have a value, how does The behavior is topic specific and currently only subscribers of catalog-update topic rely on the value for deleted entries. Not sure how to make it less fragile in the sense that "value_" is opaque to the statestore and is interpreted only by subscribers. PS5, Line 490: > would be easier to read if you line break it there to keep the last arg all Done PS5, Line 538: if (topic_entry.value() != Statestore::TopicEntry::NULL_VALUE) > why do we need that guard? why can't we set topic_item.value even in that c Yeap, removed it. PS5, Line 546: int64_t topic_size = topic.total_key_size_bytes() - deleted_key_size_bytes : + topic.total_value_size_bytes(); > that math doesn't look correct anymore (deleted entries can have non-zero v Done http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.h File be/src/statestore/statestore.h: PS5, Line 180: may > may or will? "will". Changed it. http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: PS5, Line 25: GetAllCatalogObjects > see below. I think we should rename this. Done PS5, Line 25: Contains all known Catalog objects : // (databases, tables/views, and functions) from the CatalogService's cache. > I think that should be updated to explain that this is relative to a partic Done PS5, Line 38: empty list if no objects detected in the Catalog > Thanks. As we talked about in person, I think we should rename GetAllCatalo Done http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: PS5, Line 87: If true, this item was deleted > How about: Done PS5, Line 87: If true, this item was deleted > Actually, the parenthesis I suggested doesn't clarify. Maybe that part sho Done PS5, Line 97: List of changes to topic entries, including deletions. > That's only true when is_delta==true. So, how about we say: Done PS5, Line 100: applied to in-memory state, > that's kinda dependent on the subscriber's implementation, right? would it Done http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java: PS5, Line 53: identiy > nit: typo Done PS5, Line 123: if (first.getType() != second.getType()) return false; > This seems to already be handled by the next line (by generating a differen Good point. Removed
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dan Hecht has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: PS5, Line 87: If true, this item was deleted > How about: Actually, the parenthesis I suggested doesn't clarify. Maybe that part should say: (since the latest version of every still-present topic will be included) or something like that. -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dan Hecht has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 5: (16 comments) http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: PS5, Line 231: // If this is not a delta update, request an update from version 0 from the local : // catalog. There is an optimization that checks if pending_topic_updates_ was just : // reloaded from version 0. If so, then skip this step and use that data. I found it impossible to understand this comment without reading through most of the code in this file, which kinda defeats the point of having a comment :) i think it would be clearer if it said something like: If not generating a delta update and 'pending_topic_updates_' doesn't already contain the full catalog (beginning with version 0), then force GatherCatalogUpdatesThread() to reload the full catalog. PS5, Line 234: delta.from_version == 0 && delta.to_version == 0 from the comment above, presumably this condition means "this is not a delta update"? Why is that the condition and not 'delta.is_delta'? PS5, Line 310: if (catalog_object.catalog_version <= last_sent_catalog_version_) continue; given that last_sent_catalog_version_ was passed to GetAllCatalogObjects(), why is this needed? http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS5, Line 1392: can we delete this function now? http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: PS5, Line 177: entry_it->second.SetDeleted(true); : entry_it->second.SetVersion(last_version_); now that some subscribes require deleted entries to have a value, how does this all work? are we implicitly requiring that only non-transient subscribers depend on values for deletions? If that's the case, that seems fragile (and at the least should be documented). PS5, Line 490: would be easier to read if you line break it there to keep the last arg all on one line. PS5, Line 538: if (topic_entry.value() != Statestore::TopicEntry::NULL_VALUE) why do we need that guard? why can't we set topic_item.value even in that case? assuming we don't need it, it seems like we can remove the NULL_VALUE constant altogether. PS5, Line 546: int64_t topic_size = topic.total_key_size_bytes() - deleted_key_size_bytes : + topic.total_value_size_bytes(); that math doesn't look correct anymore (deleted entries can have non-zero value size). it might be more straight forward to just add up the topic size as we go rather than try to subtract out the deleted portion. http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.h File be/src/statestore/statestore.h: PS5, Line 180: may may or will? http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: PS5, Line 25: GetAllCatalogObjects see below. I think we should rename this. PS5, Line 25: Contains all known Catalog objects : // (databases, tables/views, and functions) from the CatalogService's cache. I think that should be updated to explain that this is relative to a particular catalog version and that it now expresses it as an update and delete. PS5, Line 38: empty list if no objects detected in the Catalog > Yes, there is some context that is missing here, hence it's not easy to und Thanks. As we talked about in person, I think we should rename GetAllCatalogObjects to GetCatalogDelta() or something similar to make it clear that this gets the delta. There's no hint at that until you read the header file comment. http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 84: // is not included in the topic key (e.g. catalog version of deleted catalog objects). > Let me try to explain. Statestore only uses that flag to avoid sending dele Please see my comments elsewhere in this file for suggestions on clarifying the interface. It still seems confusing and weird that 'value' can matter for delete when delete, but I guess the comments I'm suggested elsewhere help clarify at least what 'delete' actually means. PS5, Line 87: If true, this item was deleted How about: If true, this item was deleted. When false, this TTopicItem need not be included in non-delta TTopicDelta's (since the complete set of TTopicItems will be included in that case). or something like that. PS5, Line 97: List of changes to topic entries, including deletions. That's only true when is_delta==true. So, how about we say: When is_delta=true, a list of changes to topic entries, including deletions, within [from_version, to_version]. When
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 5: (2 comments) Addressing some questions by Dan. http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: PS5, Line 38: empty list if no objects detected in the Catalog > s/through/trouble/g Yes, there is some context that is missing here, hence it's not easy to understand what these comments mean. Let me try to explain first here and I will expand the comments to clarify some things. 1. What I think that comment should say is "no deleted objects were detected since the last GetAllCatalogObjects request". 2. Similarly to the previous comment. Deleted and updated/new objects are with respect to the previous GetAllCatalogObjects call. The base from which these deltas are computed is the "fromVersion" param of the TGetAllCatalogObjects struct. I'll update the comments. http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 84: // is not included in the topic key (e.g. catalog version of deleted catalog objects). > Sorry, opaque was the wrong word. I meant to say agnostic. Let me try to explain. Statestore only uses that flag to avoid sending deleted topic entries when the topic update is not a delta one (this is an optimization). The statestore subscribers (e.g. impalad) are the ones that really need what's in the 'value' field in order to determine how to process a deleted topic entry. For the case of catalog update, the value contains among other things the catalog version when the deletion occurred and the impalad uses that information to determine if the deletion should be applied at the local catalog or not. We can definitely add the deleted flag inside whatever object we're putting in 'value' (e.g. TCatalogObject) and I am happy to try this out if you think it's better. -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dan Hecht has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: PS5, Line 38: empty list if no objects detected in the Catalog > I wasn't very clear. I was making two separate points: s/through/trouble/g -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dan Hecht has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: PS5, Line 38: empty list if no objects detected in the Catalog > Sorry not following. This says detected not deleted. Maybe I misunderstood I wasn't very clear. I was making two separate points: 1) The parenthesis statement makes it sound like the (only) condition where this list is empty is when no objects are detected in the catalog, but it seems that this list can also be empty if nothing is deleted. 2) Separately (and not related to the highlighted code so sorry for the confusion), I'm having through understanding what 'update_objects' and 'deleted_objects' really mean since those are ways to express things as a delta from some list, but I don't know what that list is. In the old structure, 'objects' was just an absolute thing (not relative to a previous list). http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 84: // is not included in the topic key (e.g. catalog version of deleted catalog objects). > Hm, I don't see why the value should be opaque to the subscriber. On the co Sorry, opaque was the wrong word. I meant to say agnostic. I agree value should be opaque to statestore. What I don't understand is how 'delete' can have meaning at the statestore level if a subscriber needs to look inside 'value' to know how to apply it. So, is 'deleted' opaque to the statestore, and if so, why not just put that in the 'value'? If statestore does use 'deleted' for something, how can that be given that 'value' contains information needed in order to apply the delete? -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: PS5, Line 38: empty list if no objects detected in the Catalog > that seems kinda misleading. won't the list also be empty if nothing was de Sorry not following. This says detected not deleted. Maybe I misunderstood you comment. http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 84: // is not included in the topic key (e.g. catalog version of deleted catalog objects). > hmm, without reading through how the catalog topic works, this is still pre Hm, I don't see why the value should be opaque to the subscriber. On the contrary, it should be opaque to the statestore but each topic subscriber could have different logic implemented around deletions. If we want to be able to handle deletions with only the information from the key then we need to expand the key and for the case of catalog updates, it should also include the catalog version. Unfortunately, that won't work well with the current implementation of the statestore. -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dan Hecht has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: PS5, Line 38: empty list if no objects detected in the Catalog that seems kinda misleading. won't the list also be empty if nothing was deleted? And what do we mean by deleted (and for that matter updated)? Deleted (and updated) since when? Doesn't this need a "base version" to make sense? http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 84: // is not included in the topic key (e.g. catalog version of deleted catalog objects). hmm, without reading through how the catalog topic works, this is still pretty confusing. it seems weird that a value is needed to perform a deletion of a key (and it means that semantics of delete are now not opaque to the particular subscriber, right?) Maybe easiest if we talk in person. -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: PS3, Line 86: deleted > What I meant is that we should explain in the comment here what 'value' mea Fair enough. Done PS3, Line 105: The from_version will always be 0 for non-delta updates > Yeah. what I meant was we should document it in this comment, right? Done -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7731 to look at the new patch set (#5). Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. IMPALA-5538: Use explicit catalog versions for deleted objects This commit changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. Previously, deletions of catalog objects were not explicitly annotated with the catalog version in which the deletion occured and the impalads were using the max catalog version in a catalog update in order to decide whether a deletion should be applied to the local catalog cache or not. This works correctly under the assumption that all the changes that occurred in the catalog between an update's min and max catalog version are included in that update, i.e. no gaps or missing updates. With the upcoming fix for IMPALA-5058, that constraint will be relaxed, thus allowing for gaps in the catalog updates. To avoid breaking the existing behavior, this patch introduced the following changes: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core tests. Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/statestore/test_statestore.py 16 files changed, 427 insertions(+), 338 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/5 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dan Hecht has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: PS3, Line 86: deleted > What I tried to explain in the commit msg (let me know if it still unclear) What I meant is that we should explain in the comment here what 'value' means when 'deleted'=true. it doesn't seem intuitive what 'value' would be in that case (in most systems, "delete" operations deal only with the 'key' not the 'value'). PS3, Line 105: The from_version will always be 0 for non-delta updates > The to_version is 0 only when there are no changes in the topic. It's value Yeah. what I meant was we should document it in this comment, right? -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7731 to look at the new patch set (#4). Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. IMPALA-5538: Use explicit catalog versions for deleted objects This commit changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. Previously, deletions of catalog objects were not explicitly annotated with the catalog version in which the deletion occured and the impalads were using the max catalog version in a catalog update in order to decide whether a deletion should be applied to the local catalog cache or not. This works correctly under the assumption that all the changes that occurred in the catalog between an update's min and max catalog version are included in that update, i.e. no gaps or missing updates. With the upcoming fix for IMPALA-5058, that constraint will be relaxed, thus allowing for gaps in the catalog updates. To avoid breaking the existing behavior, this patch introduced the following changes: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core tests. Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/statestore/test_statestore.py 16 files changed, 422 insertions(+), 336 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/4 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: PS3, Line 34: updated > does "updated" also include new objects? or is it really just objects that It includes both added and changed objects. I updated the comment but let me know if you have a better name option. http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: PS3, Line 86: deleted > in this case, what is in 'value'? Is it defined to be set or is only the ke What I tried to explain in the commit msg (let me know if it still unclear) was that we should not use the contents of "value" to indicate if a topic item is deleted or not. It doesn't mean that a topic item with deleted set to true should have a specific value, i.e. each topic can have a separate behavior. PS3, Line 103: all changes > your commit message seems to contradict that. Oh, maybe i'm confusing cata Hm, sorry for the confusion. What the commit is describing is related to catalog versions and the follow up changes for fixing IMPALA-5058. PS3, Line 105: The from_version will always be 0 for non-delta updates > what about to_version in this case? The to_version is 0 only when there are no changes in the topic. It's value doesn't depend on whether the update is delta or not. -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dan Hecht has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: PS3, Line 34: updated does "updated" also include new objects? or is it really just objects that have been updated, i.e. changed? http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: PS3, Line 86: deleted in this case, what is in 'value'? Is it defined to be set or is only the key relevant? (Your commit message talks about it but it's not clear from the interface). PS3, Line 103: all changes your commit message seems to contradict that. Oh, maybe i'm confusing catalog versions and statestore topic versions? PS3, Line 105: The from_version will always be 0 for non-delta updates what about to_version in this case? -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Alex Behm has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has uploaded a new patch set (#3). Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. IMPALA-5538: Use explicit catalog versions for deleted objects This commit changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. Previously, deletions of catalog objects were not explicitly annotated with the catalog version in which the deletion occured and the impalads were using the max catalog version in a catalog update in order to decide whether a deletion should be applied to the local catalog cache or not. This works correctly under the assumption that all the changes that occurred in the catalog between an update's min and max catalog version are included in that update, i.e. no gaps or missing updates. With the upcoming fix for IMPALA-5058, that constraint will be relaxed, thus allowing for gaps in the catalog updates. To avoid breaking the existing behavior, this patch introduced the following changes: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core tests. Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/statestore/test_statestore.py 16 files changed, 421 insertions(+), 336 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/3 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 2: (16 comments) http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: PS2, Line 233: 0, if they have then skip this step and use that data > nit: perhaps easier to read? "...version 0. If so, then skip ..." Done Line 305: BuildTopicUpdatesHelper(catalog_objects.updated_objects, false); > I like the previous version more where this is inlined into L292. A quick c Done Line 317: if (catalog_object.catalog_version <= last_sent_catalog_version_) continue; > Move this above L312? Done PS2, Line 336: if (topic_deletions) { : VLOG(1) << "Publishing deletion: " << entry_key << "@" : << catalog_object.catalog_version; : } else { : VLOG(1) << "Publishing update: " << entry_key << "@" : << catalog_object.catalog_version; : } > perhaps remove some redundancy this way: Done http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 1340: LOG(INFO) << "Received large catalog update(>100mb): " > Received a large catalog item? Done PS2, Line 1366: if (item.deleted) { : VLOG(3) << "Deleted item: " << item.key << " version: " : << catalog_object.catalog_version; : } else { : VLOG(3) << "Added item: " << item.key << " version: " : << catalog_object.catalog_version; : } > same suggestion as in catalog_server to reduce redundancy. Done http://gerrit.cloudera.org:8080/#/c/7731/2/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 86: 3: required bool deleted = false; > Isn't it better to not have a default value to make sure this is always set Done http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java: Line 56: * determine which catalog objects were deleted since the last catalog update topic. > catalog topic update Done Line 57: * Once the catalog update topic is constructed, the old deleted catalog objects are > catalog topic update Done Line 74: // Retrieve all the removed catalog objects with version > 'fromVersion'. > /** style comment Done http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: PS2, Line 152: . > the word "delta" implies that we're dealing with inserts, deletes, and poss Done Line 347: // Identify the catalog objects that were updated (added/dropped) in the catalog with > /** style comment Done Line 351: Set addedCatalogObjectKeys = Sets.newHashSet(); > updatedCatalogObjects? Done Line 363: TCatalogObject catalogTbl = new TCatalogObject(TCatalogObjectType.TABLE, > Why not just iterate over db.getTables()? Done Line 441: addedCatalogObjectKeys.add(CatalogDeltaLog.toCatalogObjectKey(privilege)); > Instead of having a duplicate add() in all places, how about creating the a Done Line 448: for (TCatalogObject removedObject: deltaLog_.retrieveObjects(fromVersion)) { > Nice! Much clearer. Done -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Vuk Ercegovac has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: PS2, Line 233: 0, if they have then skip this step and use that data nit: perhaps easier to read? "...version 0. If so, then skip ..." PS2, Line 336: if (topic_deletions) { : VLOG(1) << "Publishing deletion: " << entry_key << "@" : << catalog_object.catalog_version; : } else { : VLOG(1) << "Publishing update: " << entry_key << "@" : << catalog_object.catalog_version; : } perhaps remove some redundancy this way: VLOG(1) << "Publishing " << (topic_deletions ? "deletion" : "update") << ": " << entry_key << "@" << catalog_object.catalog_version; http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS2, Line 1366: if (item.deleted) { : VLOG(3) << "Deleted item: " << item.key << " version: " : << catalog_object.catalog_version; : } else { : VLOG(3) << "Added item: " << item.key << " version: " : << catalog_object.catalog_version; : } same suggestion as in catalog_server to reduce redundancy. http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: PS2, Line 152: . the word "delta" implies that we're dealing with inserts, deletes, and possibly updates. any reason why this member is named more generally? why not call it deleteLog_? -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Alex Behm has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 2: (12 comments) Change is looking pretty good to me, only minor comments left. http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: Line 305: BuildTopicUpdatesHelper(catalog_objects.updated_objects, false); I like the previous version more where this is inlined into L292. A quick comment about the requirements/structure of a topic update would be nice, e.g. stating that the set of deletions and updates are disjoint and therefore the order in which updates/deletions are added is irrelevant. Line 317: if (catalog_object.catalog_version <= last_sent_catalog_version_) continue; Move this above L312? http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 1340: LOG(INFO) << "Received large catalog update(>100mb): " Received a large catalog item? http://gerrit.cloudera.org:8080/#/c/7731/2/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 86: 3: required bool deleted = false; Isn't it better to not have a default value to make sure this is always set? http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java: Line 56: * determine which catalog objects were deleted since the last catalog update topic. catalog topic update Line 57: * Once the catalog update topic is constructed, the old deleted catalog objects are catalog topic update Line 74: // Retrieve all the removed catalog objects with version > 'fromVersion'. /** style comment http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: Line 347: // Identify the catalog objects that were updated (added/dropped) in the catalog with /** style comment Line 351: Set addedCatalogObjectKeys = Sets.newHashSet(); updatedCatalogObjects? Line 363: TCatalogObject catalogTbl = new TCatalogObject(TCatalogObjectType.TABLE, Why not just iterate over db.getTables()? Line 441: addedCatalogObjectKeys.add(CatalogDeltaLog.toCatalogObjectKey(privilege)); Instead of having a duplicate add() in all places, how about creating the addedCatalogObjets set right before L448 by looping over the addedCatalogObjects? Seems only marginally less efficient, but less prone to forgetting an add() somewhere. Line 448: for (TCatalogObject removedObject: deltaLog_.retrieveObjects(fromVersion)) { Nice! Much clearer. -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has uploaded a new patch set (#2). Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. IMPALA-5538: Use explicit catalog versions for deleted objects This commit changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. Previously, deletions of catalog objects were not explicitly annotated with the catalog version in which the deletion occured and the impalads were using the max catalog version in a catalog update in order to decide whether a deletion should be applied to the local catalog cache or not. This works correctly under the assumption that all the changes that occurred in the catalog between an update's min and max catalog version are included in that update, i.e. no gaps or missing updates. With the upcoming fix for IMPALA-5058, that constraint will be relaxed, thus allowing for gaps in the catalog updates. To avoid breaking the existing behavior, this patch introduced the following changes: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core tests. Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/statestore/test_statestore.py 16 files changed, 438 insertions(+), 331 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/2 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis