Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 )
Change subject: IMPALA-6131: Track time of last statistics update in metadata ...................................................................... Patch Set 4: (11 comments) I do not consider this change to be complete, as there are some open questions: - At this point every kind of COMPUTE STATS update impala.lastComputeStatsTime, including INCREMENTAL, TABLESAMPLE, and calls that only COMPUTE STATS for specific columns. I can exclude some of these, but I think that the current behavior should cause the "least surprise". - I think that the tests should be reorganized, but I would like to finalize the behavior before doing that. http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@240 PS3, Line 240: > Can you explain in a comment why we need to clone the table instead of remo I have changed the logic to upload data to HMS only if it was actually changed. The cloning was a misunderstanding on my part: CatalogOpExecutor clones HMS tables before calling alter_table, but the reason is that it does not want to update its own representation till the update is actually successful, so it alters the clone and calls alter_table with it. After alter_table, it downloads the table from HMS and overwrites its representation with this new version, so Catalog always contains a version which was loaded from HMS. Kudu tables are different, as Impala tries to be in sync with Kudu instead of HMS, and this final alter_table's goal is to ensure that HMS is in sync with Kudu. http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@728 PS3, Line 728: * but not for ALTER TABLE SET COLUMN STATS. > This comment should point out that it updates impala.lastComputeStatsTime, Done http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@782 PS3, Line 782: u > Why do we only update this if both are set? Why not if we're just updating I tried to be as strict as possible to ensure that only COMPUTE STATS update impala.lastComputeStatsTime, but it turned out to be unnecessary. http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2857 PS3, Line 2857: * command if the metadata is not completely in-sync. This affects both Hive and > Explain the bool parameter in the comment including what value the lastDdlT Is the missing field explanation really necessary? I think that the current comment states quite clearly that the alter_table overwrites the whole table. http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2862 PS3, Line 2862: */ > unused? Done http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3221 PS3, Line 3221: * parent. > Assuming this line is still correct, can you add a comment where this happe It is not correct (since a long time). http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@50 PS3, Line 50: # compute statistics the fill table property impala.lastComputeStatsTime > This should be more verbose, explain why it is necessary to initialize it. I would like to add more comments after the tests are reorganized to their final shape. http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@75 PS3, Line 75: self.execute_query("drop stats %s" % FQ_TBL_NAME) > We should also have a test for computing incremental stats, for sampled sta Done http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@145 PS3, Line 145: # change table property > I think we should have defaults for both expect_* parameters here, or for n Done http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@161 PS3, Line 161: # drop statistics > nit: beforeStatsTime Done http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@180 PS3, Line 180: LAST_COMPUTE_STATS_TIME_KEY = "impala.lastComputeStatsTime" > stats_time, given the key is lastComputeStatsTime, too. Done -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Thu, 26 Apr 2018 18:30:13 +0000 Gerrit-HasComments: Yes