Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21665 )

Change subject: IMPALA-12865: enable_reload_events breaks 
enable_skipping_older_events by pushing lastRefreshEventId too high
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21665/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21665/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7176
PS3, Line 7176: l.getFullName());
> Yeah, you are right. We need to prioritize correctness over performance.
Instead of setting 'lastRefreshEventId_' back to -1 in this case, I think we 
can keep the original value if it's not -1. In 
HdfsTable#setLastRefreshEventId(), we only update 'lastRefreshEventId_' if the 
given 'eventId' is larger:
https://github.com/apache/impala/blob/7369ebb8ba02edfedcef071029b7bcd62157f452/fe/src/main/java/org/apache/impala/catalog/Table.java#L1115-L1117

However, in HdfsPartition$Builder#setLastRefreshEventId(), we are missing the 
same check and we can add it:
https://github.com/apache/impala/blob/7369ebb8ba02edfedcef071029b7bcd62157f452/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L1382
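
To illustrate the check I mean, here is a minimal sketch in Python (the real 
code is Java). The class and method names are hypothetical stand-ins, not the 
actual HdfsPartition.Builder API, and it assumes -1 is the "unset" value:

```python
class PartitionBuilderSketch:
    """Hypothetical stand-in for HdfsPartition.Builder; not Impala code."""

    def __init__(self):
        # Assumption: -1 means no refresh event has been recorded yet.
        self.last_refresh_event_id = -1

    def set_last_refresh_event_id(self, event_id):
        # Only move forward: a smaller or equal id leaves the value untouched,
        # so a later call with -1 cannot reset an already recorded id.
        if event_id > self.last_refresh_event_id:
            self.last_refresh_event_id = event_id
        return self


builder = PartitionBuilderSketch()
builder.set_last_refresh_event_id(100)
builder.set_last_refresh_event_id(-1)  # ignored, the recorded id stays 100
builder.set_last_refresh_event_id(42)  # ignored as well, 42 is older than 100
```

With a guard like this, the caller would not need to special-case -1; the 
original value is kept whenever the given id is not larger.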


http://gerrit.cloudera.org:8080/#/c/21665/6/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21665/6/tests/custom_cluster/test_events_custom_configs.py@1201
PS6, Line 1201: 2000
Can we set this to 4000? In my local env, 2000 is not enough to make the test 
fail before the fix, probably because run_stmt_in_hive() is slow.


http://gerrit.cloudera.org:8080/#/c/21665/6/tests/custom_cluster/test_events_custom_configs.py@1212
PS6, Line 1212:     self.client.execute_async("refresh {} partition(year=2024)".format(tbl))
Can we get the handle and make sure the statement actually finishes? E.g.

    handle = self.client.execute_async(...)

and after run_stmt_in_hive(), add

    self.wait_for_state(handle, self.client.QUERY_STATES['FINISHED'], timeout=4)
    assert self.client.get_state(handle) == self.client.QUERY_STATES['FINISHED']



--
To view, visit http://gerrit.cloudera.org:8080/21665
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I90039da77ec561c5aede44456f88c6650582815b
Gerrit-Change-Number: 21665
Gerrit-PatchSet: 6
Gerrit-Owner: Sai Hemanth Gantasala <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Comment-Date: Fri, 08 Nov 2024 08:53:45 +0000
Gerrit-HasComments: Yes
