Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16094 )
Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter ...................................................................... Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/16094/3/be/src/exprs/timestamp-truncation-expr.h File be/src/exprs/timestamp-truncation-expr.h: http://gerrit.cloudera.org:8080/#/c/16094/3/be/src/exprs/timestamp-truncation-expr.h@18 PS3, Line 18: #ifndef IMPALA_EXPRS_TIMESTAMP_TRUNCATION_EXPR_H_ We prefer to use '#pragma once' for new code http://gerrit.cloudera.org:8080/#/c/16094/3/be/src/exprs/timestamp-truncation-expr.h@30 PS3, Line 30: The children of this Expr Probably worth specifying that this should have exactly one child. http://gerrit.cloudera.org:8080/#/c/16094/3/common/thrift/Exprs.thrift File common/thrift/Exprs.thrift: http://gerrit.cloudera.org:8080/#/c/16094/3/common/thrift/Exprs.thrift@151 PS3, Line 151: struct TTimestampTruncationExpr { Since this doesn't contain anything, its fine to just not include it. It's already for case for example for NULL_LITERAL that there isn't a corresponding TNullLiteral. http://gerrit.cloudera.org:8080/#/c/16094/3/fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java File fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java: http://gerrit.cloudera.org:8080/#/c/16094/3/fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java@30 PS3, Line 30: The children Probably worth specifying that this will have exactly one child. http://gerrit.cloudera.org:8080/#/c/16094/3/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/16094/3/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@347 PS3, Line 347: // Don't create bloom filter targeting for HDFS scan node if the source : // scan slot is Kudu timestamp slot since the timestamp values stored : // in Kudu table are truncated and maybe different from the values stored : // in the HDFS tables. I think this needs more thought. There's two things we need to worry about when deciding to generate/assign a runtime filter: that it isn't going to change the final results of the query, and that its likely to produce a perf improvement. The perf improvement part is already handled by the logic around cardinality and limits on the number of bloom filters and stuff, so the only reason to do this is if generating these filters would change the output of queries. My understanding would be that prior to this patch we would generate filters like this (since we were already generating bloom filters targeting HDFS timestamp columns, and we don't currently consider the source those timestamps are coming from). So is the implication of this that there's a bug currently, i.e. are the Kudu timestamp -> HDFS timestamp bloom filters that would get generated right now resulting in queries returning incorrect values? If so, then of course we should fix it (and probably also file a JIRA for it so that users that run into it can find info about it). If not, then we should leave it as is. As you say, its might not be a good idea to do a join of a Kudu table and an HDFS table on a timestamp column because the timestamps are probably not going to match, but that's not our problem its the users problem, and its not necessarily always going to be the case eg. it might be that the HDFS table was generated by doing a "select *" from a Kudu table, in which case the timestamps will match up (this is a common pattern since Kudu is better for collecting a lot of small inserts, then the data gets moved in bulk to an HDFS table for archiving) http://gerrit.cloudera.org:8080/#/c/16094/3/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@658 PS3, Line 658: true We sometimes specify the names of parameters in a comment when the value is a constant like this, eg. "..., /* isTimestampTruncation */ true);" -- To view, visit http://gerrit.cloudera.org:8080/16094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48 Gerrit-Change-Number: 16094 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Comment-Date: Tue, 23 Jun 2020 15:48:03 +0000 Gerrit-HasComments: Yes
