[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 16: This change did not cherrypick successfully into branch 2.x. To resolve this, please do the cherry-pick manually and submit it to Gerrit at refs/for/2.x or add an exception to the branch 2.x copy of bin/ignored_commits.json. Thanks, your friendly bot at https://jenkins.impala.io/job/cherrypick-2.x-and-test/524/ . -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 16 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 21 May 2018 23:52:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 15: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 15 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 21 May 2018 18:25:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Reviewed-on: http://gerrit.cloudera.org:8080/10116 Reviewed-by: Lars VolkerTested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M tests/metadata/test_last_ddl_time_update.py 8 files changed, 228 insertions(+), 173 deletions(-) Approvals: Lars Volker: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 16 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 15: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2515/ -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 15 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 21 May 2018 14:58:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 15: Code-Review+2 Carrying Alex's +2 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 15 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 21 May 2018 14:58:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 14: >From the failing build: 21:55:34 ] Creating an egg for /home/ubuntu/Impala/shell/ext-py/sqlparse-0.1.14 ... 21:55:34 ] Creating an egg for /home/ubuntu/Impala/shell/ext-py/sqlparse-0.1.19 21:55:34 ] python: can't open file 'setup.py': [Errno 2] No such file or directory 21:55:34 ] Error in shell/make_shell_tarball.sh at line 96: python setup.py -q bdist_egg clean There is a known issue with these log lines which should be resolved by doing a clean build. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 14 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sun, 20 May 2018 12:46:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 13: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2507/ -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 18 May 2018 21:55:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 13: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2507/ -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 18 May 2018 18:03:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 18 May 2018 18:02:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/10116/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2868 PS12, Line 2868: // this would make it necessary to reload the table after alter_table in order to > I agree that today we will always reload the table metadata from the HMS an Thanks for the explanation! I have rewritten the comment, I hope that it is better now. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 18 May 2018 17:04:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#13). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M tests/metadata/test_last_ddl_time_update.py 8 files changed, 228 insertions(+), 173 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/13 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/10116/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2868 PS12, Line 2868: // catalogd would read it back after alter_table, but as this would not work for > Do you have an example in mind when the table should not be reloaded? I loo I agree that today we will always reload the table metadata from the HMS and set a new msTbl. That behavior is not by design though - I think it's unintentional and probably a bug (see for example IMPALA-6994). Personally, I think this comment is difficult to understand because it's explanation relies on (1) understanding what callers will do after calling this function, and (2) understanding what is done for Kudu tables, so there's some loss of abstraction. Imo, it would be clearer to declare a general policy of setting the lastDdlTime in Impala to not rely on the HMS setting it and to avoid having to fetch it from HMS if possible. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 12 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 18 May 2018 16:39:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/10116/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2868 PS12, Line 2868: // catalogd would read it back after alter_table, but as this would not work for > I don't think it's true that we'd always read it back after calling this fu Do you have an example in mind when the table should not be reloaded? I looked at the callers of applyAlterTable(): loadTableMetadata() is called after alter*() + dropTableStats(), and createTable() only creates an "incomplete table", so the table will be reloaded before it is actually used. Because of this, my impression was that tables are always reloaded by design. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 12 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 18 May 2018 15:43:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 12: Code-Review+2 (1 comment) lgtm after addressing my last comment http://gerrit.cloudera.org:8080/#/c/10116/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2868 PS12, Line 2868: // catalogd would read it back after alter_table, but as this would not work for I don't think it's true that we'd always read it back after calling this function. I think the bigger issue is that if we let the HMS set lastDdlTime, then we'd have to fetch the HMS table metadata again only to get that updated lastDdlTime. So with it's more efficient for us to set it directly. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 12 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 18 May 2018 00:05:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 11: (4 comments) http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@244 PS10, Line 244: msTable_.putToParameters(StatsSetupConst.DO_NOT_UPDATE_STATS, > Makes sense, please add a comment somewhere stating why we prefer to set th Done http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@785 PS10, Line 785: Table.updateTimestampProperty(msTbl, HdfsTable.TBL_PROP_LAST_COMPUTE_STATS_TIME); > The Kudu properties are generaly in KuduTable. Which property in HdfsTable Done - I thought that some of the properties defined in HdfsTable are used by Kudu tables too, but I looked through them and they are only used in HDFS tables. http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@74 PS10, Line 74: def run_test(self, queries, expect_changed_ddl_time, expect_changed_stats_time): > The vast majority of cases only runs one query at once, and there is no fun Done http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@93 PS10, Line 93: time.sleep (1.1) > I agree it's an HMS convention. Let's change the comment to state that then I have already rewritten Hive to HMS in patch set 11 - does the comment still miss something? -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 15 May 2018 13:57:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#12). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M tests/metadata/test_last_ddl_time_update.py 8 files changed, 228 insertions(+), 173 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/12 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 12 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 10: (5 comments) http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@244 PS10, Line 244: Long.toString(System.currentTimeMillis() / 1000)); > It sets it if the property does not exist, but this would not work well for Makes sense, please add a comment somewhere stating why we prefer to set the lastDdlTime in Impala. http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@785 PS10, Line 785: msTbl.putToParameters("impala.lastComputeStatsTime", > I have put the constant to HdfsTable, because all the other property keys r The Kudu properties are generaly in KuduTable. Which property in HdfsTable also applies to Kudu tables? This new last compute stats time property should be in Table. http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@74 PS10, Line 74: def run_test(self, query, expect_changed_ddl_time, expect_changed_stats_time): > The "invalidate metadata %(TBL)s; describe %(TBL)s" combo was used the chec The vast majority of cases only runs one query at once, and there is no fundamental reason to provide a multi-query test interface - all testing can be done just as well without it. The multi-query behavior is subtly different than running the single-query interface multiple times, so I think overall it's simpler to think about a single-query interface. http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@93 PS10, Line 93: # Hive uses a seconds granularity on the last ddl time. > Isn't it an HMS convention in this case? Or is there a reason behind not us I agree it's an HMS convention. Let's change the comment to state that then. http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@155 PS10, Line 155: h.expect_no_time_change("drop incremental stats %(TBL)s partition (j=1, s='2012')") > Are you sure about this? I use DROP INCREMENTAL STATS on purpose to check + ok wfm -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 14 May 2018 20:58:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 10: (8 comments) http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1235 PS10, Line 1235:* TODO: the following paragraph seems at least partially obsolate > Why not clean it up then? Point (1) is obsolete but point (2) is still vali Done http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@244 PS10, Line 244: Long.toString(System.currentTimeMillis() / 1000)); > Doesn't the HMS set this in alter_table()? It sets it if the property does not exist, but this would not work well for Kudu tables: after calling alter_table here, the table will not be loaded from HMS again (unlike in HDFS tables, where this is done to ensure HMS-Impala consistency). This means that if I would remove "transient_lastDdlTime" here, then HMS would calculate a valid value, but Impala catalogd would not know about this value. http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@785 PS10, Line 785: msTbl.putToParameters("impala.lastComputeStatsTime", > Create a constant for the table property in Table similar to what we have i I have put the constant to HdfsTable, because all the other property keys reside their, even those that are used by Kudu tables too. http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2868 PS10, Line 2868: // HMS updates transient_lastDdlTime if it is removed. > I understand what this comment says, but I don't understand it's relevance This comment remained here by mistake. http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2870 PS10, Line 2870: Long.toString(System.currentTimeMillis() / 1000)); > We're doing "System.currentTimeMillis() / 1000" in quite a few places, I su I have created a bit different helper function. http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@74 PS10, Line 74: def run_test(self, query, expect_changed_ddl_time, expect_changed_stats_time): > Passing a ";" delimited string is really weird. Why not pass a list of quer The "invalidate metadata %(TBL)s; describe %(TBL)s" combo was used the check that in case of Kudu tables, lastDdlTime is not increased by reloading the table (it used to increase it), so an extra query is needed after INVALIDATE METADATA to load the table. The two can be separated (like for other init queries) but I felt that they belong together. I replaced the string splitting with variable argument list - I can replace this with two separate calls if needed. http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@93 PS10, Line 93: # Hive uses a seconds granularity on the last ddl time. > Not just Hive - we do too. Just say "All times are stored in seconds" or so Isn't it an HMS convention in this case? Or is there a reason behind not using higher precision timestamp? http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@155 PS10, Line 155: h.expect_no_time_change("drop incremental stats %(TBL)s partition (j=1, s='2012')") > use "drop stats" instead to wipe everything (including incremental stats) Are you sure about this? I use DROP INCREMENTAL STATS on purpose to check +1 statement's effect in the test suite. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri,
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#11). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M tests/metadata/test_last_ddl_time_update.py 8 files changed, 220 insertions(+), 173 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/11 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 10: Code-Review+1 Sorry, some of the authorization test changes were missing from patch 9. Carry +1 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 10 May 2018 16:10:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#10). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M tests/metadata/test_last_ddl_time_update.py 7 files changed, 203 insertions(+), 161 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/10 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 9: The change broke AuthorizationTest#TestDescribeTableResults. I fixed this in patch set 9. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 09 May 2018 19:11:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#9). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M tests/metadata/test_last_ddl_time_update.py 7 files changed, 202 insertions(+), 161 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/9 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 8: Code-Review+1 Alex, do you have time to look at this? -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 8 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 07 May 2018 15:42:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 8: (4 comments) http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@52 PS7, Line 52: > Mention the supported substitutions in the methods below (and add comments Done http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@56 PS7, Line 56: substituted > Is there a reason we don't test changes to both? Do they never change at th There is no statement that should change both. http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@76 PS7, Line 76: ns the given queries > Do you need to specify the class to access TestHelper? Yes, adding the outer classes is necessary in this case, h = TestHelper(...) would lead to "NameError: global name 'TestHelper' is not defined". It is also valid to use self.TestHelper(...) but I think that it is weird, so I prefer adding the class name. http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@179 PS7, Line 179: # dynamic partition insert > Can you move this method into the Helper class now? Done - this change resulted in a bit less code but more indentation, so I am not sure that it is better this way. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 8 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 04 May 2018 17:50:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#8). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 6 files changed, 201 insertions(+), 161 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/8 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 8 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@52 PS7, Line 52: self.substitutions = {'TBL': "%s.%s" % (db_name, tbl_name), 'WAREHOUSE': WAREHOUSE} Mention the supported substitutions in the methods below (and add comments there). http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@56 PS7, Line 56: False, False Is there a reason we don't test changes to both? Do they never change at the same time? http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@76 PS7, Line 76: TestLastDdlTimeUpdate Do you need to specify the class to access TestHelper? http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@179 PS7, Line 179: def run_test(self, query, db_name, table_name, Can you move this method into the Helper class now? -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 02 May 2018 18:14:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#7). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 6 files changed, 182 insertions(+), 152 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/7 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#6). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 6 files changed, 182 insertions(+), 152 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/6 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#5). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 6 files changed, 182 insertions(+), 152 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/5 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 4: (6 comments) > - At this point every kind of COMPUTE STATS update > impala.lastComputeStatsTime, including INCREMENTAL, TABLESAMPLE, > and calls that only COMPUTE STATS for specific columns. I can > exclude some of these, but I think that the current behavior should > cause the "least surprise". That seems reasonable to me, but please check with Alex. > - I think that the tests should be reorganized, but I would like to > finalize the behavior before doing that. Yes. :) http://gerrit.cloudera.org:8080/#/c/10116/4/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/10116/4/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@247 PS4, Line 247: msClient.alter_table(msTable_.getDbName(), nit: single line? http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2857 PS3, Line 2857:* command if the metadata is not completely in-sync. This affects both Hive and > Is the missing field explanation really necessary? I think that the current lgtm http://gerrit.cloudera.org:8080/#/c/10116/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@782 PS4, Line 782: if (params.isSetTable_stats()) updateTableStats(params, msTbl); : : if (params.isSetTable_stats()) { This could be one check now http://gerrit.cloudera.org:8080/#/c/10116/4/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/10116/4/tests/metadata/test_last_ddl_time_update.py@24 PS4, Line 24: # Checks different statements effect the last DDL time and last compute stats time. nit: Checks that... http://gerrit.cloudera.org:8080/#/c/10116/4/tests/metadata/test_last_ddl_time_update.py@50 PS4, Line 50: # compute statistics the fill table property impala.lastComputeStatsTime nit: "to fill the.."? http://gerrit.cloudera.org:8080/#/c/10116/4/tests/metadata/test_last_ddl_time_update.py@167 PS4, Line 167: False, False Can you think of ways to make these calls more readable? We could have to constants, e.g. DDL=1 and STATS=2 and then pass a list of changed values. These would then read self.run_test("bla", unique_database, TBL_NAME, [DDL, STATS]) There might be better ways I cannot think of right now. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Apr 2018 19:46:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 4: (11 comments) I do not consider this change to be complete, as there are some open questions: - At this point every kind of COMPUTE STATS update impala.lastComputeStatsTime, including INCREMENTAL, TABLESAMPLE, and calls that only COMPUTE STATS for specific columns. I can exclude some of these, but I think that the current behavior should cause the "least surprise". - I think that the tests should be reorganized, but I would like to finalize the behavior before doing that. http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@240 PS3, Line 240: > Can you explain in a comment why we need to clone the table instead of remo I have changed the logic to upload data to HMS only if it was actually changed. The cloning was a misunderstanding on my part: CatalogOpExecutor clones HMS tables before calling alter_table, but the reason is that it does not want to update its own representation till the update is actually successful, so it alters the clone and calls alter_table with it. After alter_table, it downloads the table from HMS and overwrites its representation with this new version, so Catalog always contains a version which was loaded from HMS. Kudu tables are different, as Impala tries to be in sync with Kudu instead of HMS, and this final alter_table's goal is to ensure that HMS is in sync with Kudu. http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@728 PS3, Line 728:* but not for ALTER TABLE SET COLUMN STATS. > This comment should point out that it updates impala.lastComputeStatsTime, Done http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@782 PS3, Line 782: u > Why do we only update this if both are set? Why not if we're just updating I tried to be as strict as possible to ensure that only COMPUTE STATS update impala.lastComputeStatsTime, but it turned out to be unnecessary. http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2857 PS3, Line 2857:* command if the metadata is not completely in-sync. This affects both Hive and > Explain the bool parameter in the comment including what value the lastDdlT Is the missing field explanation really necessary? I think that the current comment states quite clearly that the alter_table overwrites the whole table. http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2862 PS3, Line 2862:*/ > unused? Done http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3221 PS3, Line 3221:* parent. > Assuming this line is still correct, can you add a comment where this happe It is not correct (since a long time). http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@50 PS3, Line 50: # compute statistics the fill table property impala.lastComputeStatsTime > This should be more verbose, explain why it is necessary to initialize it. I would like to add more comments after the tests are reorganized to their final shape. http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@75 PS3, Line 75: self.execute_query("drop stats %s" % FQ_TBL_NAME) > We should also have a test for computing incremental stats, for sampled sta Done http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@145 PS3, Line 145: # change table property > I think we should have defaults for both expect_* parameters here, or for n Done http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@161 PS3, Line 161: # drop statistics > nit: beforeStatsTime Done http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@180 PS3, Line 180: LAST_COMPUTE_STATS_TIME_KEY = "impala.lastComputeStatsTime" > stats_time, given the key is lastComputeStatsTime, too. Done -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#4). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 6 files changed, 152 insertions(+), 131 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/4 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 3: (11 comments) http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@240 PS3, Line 240: // HMS updates transient_lastDdlTime if it is removed. Can you explain in a comment why we need to clone the table instead of removing the key from msTable_? http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@728 PS3, Line 728:* and 'numUpdatedColumns', respectively. This comment should point out that it updates impala.lastComputeStatsTime, and that it only does so when running compute stats. http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@782 PS3, Line 782: && Why do we only update this if both are set? Why not if we're just updating one? http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2857 PS3, Line 2857:* longer period of time. Explain the bool parameter in the comment including what value the lastDdlTime is set to if it's set to true. Since we already describe the HMS API we could also mention what happens with missing fields (they get removed). http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2862 PS3, Line 2862: long lastDdlTime = -1; unused? http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3221 PS3, Line 3221:* Updates the lastDdlTime of the table if new partitions were created. Assuming this line is still correct, can you add a comment where this happens? e.g. "This call also updates lastDdlTime" http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@50 PS3, Line 50: # initialize last compute stats time This should be more verbose, explain why it is necessary to initialize it. http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@75 PS3, Line 75: self.run_test("compute stats %s" % FQ_TBL_NAME, We should also have a test for computing incremental stats, for sampled stats, and for manually updating both table and column stats. http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@145 PS3, Line 145:expect_changed_ddl_time, expect_changed_stat_time = False): I think we should have defaults for both expect_* parameters here, or for none. http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@161 PS3, Line 161: beforeStatTime = table.parameters[LAST_COMPUTE_STATS_TIME_KEY] nit: beforeStatsTime http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@180 PS3, Line 180: if expect_changed_stat_time: stats_time, given the key is lastComputeStatsTime, too. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 19 Apr 2018 21:08:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/10116/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10116/2//COMMIT_MSG@28 PS2, Line 28: ablo > typo, and missing 'to' Done http://gerrit.cloudera.org:8080/#/c/10116/2//COMMIT_MSG@30 PS2, Line 30: > + 'be'? Done http://gerrit.cloudera.org:8080/#/c/10116/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/10116/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@239 PS2, Line 239: msTable_.deepCopy(); > Why do we deep copy it now? Do be honest, I am not sure about this part - without deep copy, DESCRIBE TABLE did not work as intended for Kudu tables, transient_lastDdlTime was missing. I consider the Kudu part to be "work in progress". Note that CatalogOpExecutor always deep copies the tables before calling applyAlterTable(), which calls HmsClient's alter_table(), so this seams to be the normal way to do it in Impala. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 19 Apr 2018 15:59:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#3). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - handling of "transient_lastDdlTime" is simplified to use HMS's default mechanism by removing this property before calling alter_table() - computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime" - for Kudu behavior, see notes Notes: - Kudu tables work a bit differently, because they overwrite their HMS representation whenever they are reloaded from Kudu. This means that "transient_lastDdlTime" is increased during several operations, for example any operation that loads the table after INVALIDATE METADATA. I did not change this behavior. Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 5 files changed, 105 insertions(+), 116 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/3 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 2: (3 comments) Looks good, only had some nit comments and questions. http://gerrit.cloudera.org:8080/#/c/10116/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10116/2//COMMIT_MSG@28 PS2, Line 28: ablo typo, and missing 'to' http://gerrit.cloudera.org:8080/#/c/10116/2//COMMIT_MSG@30 PS2, Line 30: + 'be'? http://gerrit.cloudera.org:8080/#/c/10116/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/10116/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@239 PS2, Line 239: msTable_.deepCopy(); Why do we deep copy it now? Don't we need to set msTable_ if there were no exceptions? -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 19 Apr 2018 15:16:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - handling of "transient_lastDdlTime" is simplified to use HMS's default mechanism by removing this property before calling alter_table() - computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime" - for Kudu behavior, see notes Notes: - Kudu tables work a bit differently, because they overwrite their HMS representation whenever they are reloaded from Kudu. This means that "transient_lastDdlTime" is increased during several operations, for example any operation that loads the table after INVALIDATE METADATA. I did not change this behavior. Kudu will be ablo sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 5 files changed, 105 insertions(+), 116 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/2 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10116 Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - handling of "transient_lastDdlTime" is simplified to use HMS's default mechanism by removing this property before calling alter_table() - computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime" - for Kudu behavior, see notes Notes: - Kudu tables work a bit differently, because they overwrite their HMS representation whenever they are reloaded from Kudu. This means that "transient_lastDdlTime" is increased during several operations, for example any operation that loads the table after INVALIDATE METADATA. I did not change this behavior. Kudu will be ablo sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 5 files changed, 108 insertions(+), 120 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/1 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer