Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20681 )
Change subject: [WIP] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time ...................................................................... Patch Set 6: (3 comments) Thanks for working on the DST issues! I generally think that doing this for compile time predicate push down is critical while runtime filters are secondary, as they only matter if the timestamp column is a join key, which seems atypical for me. For testing: I created a test table for Parquet, int64_timestamps_at_dst_changes. There are only a few values/tests, but it could be a good starting point. http://gerrit.cloudera.org:8080/#/c/20681/5/be/src/exec/kudu/kudu-scanner.cc File be/src/exec/kudu/kudu-scanner.cc: http://gerrit.cloudera.org:8080/#/c/20681/5/be/src/exec/kudu/kudu-scanner.cc@288 PS5, Line 288: TimestampValue ts_min; : TimestampValue ts_max; : if (state_->query_options().convert_kudu_utc_timestamps && : col_type.type == TYPE_TIMESTAMP) { : ts_min = *reinterpret_cast<const TimestampValue*>(min); : ts_max = *reinterpret_cast<const TimestampValue*>(max); : ConvertLocalTimeMinStatToUTC(&ts_min); : ConvertLocalTimeMaxStatToUTC(&ts_max); : min = &ts_min; : max = &ts_max; : } : > > This can filter out values incorrectly during dst changes. The Looks great! http://gerrit.cloudera.org:8080/#/c/20681/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: http://gerrit.cloudera.org:8080/#/c/20681/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@573 PS5, Line 573: kuduPredicate = KuduPredicate.newComparisonPredicate(column, op, unixMicros); > > This may not work correctly during DST change, see my other I agree with the solution. EQ predicates could be turned to an in-list if it is ambiguous. ExprUtil.localTimestampToUnixTimeMicros could get an extra argument to define what kind of value is expected. There is an explanation of pre and post values at https://github.com/google/cctz/blob/2acd9f2b463eee05c354901a9fd5235fe06c017b/include/cctz/time_zone.h#L98 http://gerrit.cloudera.org:8080/#/c/20681/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/20681/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@485 PS5, Line 485: // The filter is targeted for Kudu scan node with source timestamp truncation. : String functionName = analyzer.getQueryOptions().isConvert_kudu_utc_timestamps() ? : "to_unix_micros" : "utc_to_unix_micros"; : Expr toUnixTimeExpr = > > Bloom filters with timezone conversion are tricky - the problem is I think that it is ok to disable bloom filters in case isConvert_kudu_utc_timestamps is true and the timezone is not UTC. I don't know how frequently people use timestamp columns as join keys on Kudu tables - this may be niche use case not worth optimizing. If someone needs this to be fast, they can still rewrite their query to do the join on UTC timestamps, which should be the fastest and most precise way if all tables involved store the timestamp as UTC. I am also ok with adding a query option. An alternative way would be to not push down the bloom filter to Kudu but evaluate in the scanner post timezone conversion. I think that this would work correctly, but would be less optimal than pushing down to Kudu and would still need some work. -- To view, visit http://gerrit.cloudera.org:8080/20681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3 Gerrit-Change-Number: 20681 Gerrit-PatchSet: 6 Gerrit-Owner: Zihao Ye <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Reviewer: Zihao Ye <[email protected]> Gerrit-Comment-Date: Wed, 15 Nov 2023 14:30:02 +0000 Gerrit-HasComments: Yes
