Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10116 )

Change subject: IMPALA-6131: Track time of last statistics update in metadata
......................................................................


Patch Set 4:

(11 comments)

I do not consider this change to be complete, as there are some open questions:
- At this point every kind of COMPUTE STATS update impala.lastComputeStatsTime, 
including INCREMENTAL, TABLESAMPLE, and calls that only COMPUTE STATS for 
specific columns. I can exclude some of these, but I think that the current 
behavior should cause the "least surprise".
- I think that the tests should be reorganized, but I would like to finalize 
the behavior before doing that.

http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@240
PS3, Line 240:
> Can you explain in a comment why we need to clone the table instead of remo
I have changed the logic to upload data to HMS only if it was actually changed.

The cloning was a misunderstanding on my part: CatalogOpExecutor clones HMS 
tables before calling alter_table, but the reason is that it does not want to 
update its own representation till the update is actually successful, so it 
alters the clone and calls alter_table with it. After alter_table, it downloads 
the table from HMS and overwrites its representation with this new version, so 
Catalog always contains a version which was loaded from HMS. Kudu tables are 
different, as Impala tries to be in sync with Kudu instead of HMS, and this 
final alter_table's goal is to ensure that HMS is in sync with Kudu.


http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@728
PS3, Line 728:    * but not for ALTER TABLE SET COLUMN STATS.
> This comment should point out that it updates impala.lastComputeStatsTime,
Done


http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@782
PS3, Line 782:  u
> Why do we only update this if both are set? Why not if we're just updating
I tried to be as strict as possible to ensure that only COMPUTE STATS update 
impala.lastComputeStatsTime, but it turned out to be unnecessary.


http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2857
PS3, Line 2857:    * command if the metadata is not completely in-sync. This 
affects both Hive and
> Explain the bool parameter in the comment including what value the lastDdlT
Is the missing field explanation really necessary? I think that the current 
comment states quite clearly that the alter_table overwrites the whole table.


http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2862
PS3, Line 2862:    */
> unused?
Done


http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3221
PS3, Line 3221:    * parent.
> Assuming this line is still correct, can you add a comment where this happe
It is not correct (since a long time).


http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py
File tests/metadata/test_last_ddl_time_update.py:

http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@50
PS3, Line 50:     # compute statistics the fill table property 
impala.lastComputeStatsTime
> This should be more verbose, explain why it is necessary to initialize it.
I would like to add more comments after the tests are reorganized to their 
final shape.


http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@75
PS3, Line 75:     self.execute_query("drop stats %s" % FQ_TBL_NAME)
> We should also have a test for computing incremental stats, for sampled sta
Done


http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@145
PS3, Line 145:     # change table property
> I think we should have defaults for both expect_* parameters here, or for n
Done


http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@161
PS3, Line 161:     # drop statistics
> nit: beforeStatsTime
Done


http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@180
PS3, Line 180:     LAST_COMPUTE_STATS_TIME_KEY = "impala.lastComputeStatsTime"
> stats_time, given the key is lastComputeStatsTime, too.
Done



--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Apr 2018 18:30:13 +0000
Gerrit-HasComments: Yes

Reply via email to