Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16094 )

Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/16094/3/be/src/exprs/timestamp-truncation-expr.h
File be/src/exprs/timestamp-truncation-expr.h:

http://gerrit.cloudera.org:8080/#/c/16094/3/be/src/exprs/timestamp-truncation-expr.h@18
PS3, Line 18: #ifndef IMPALA_EXPRS_TIMESTAMP_TRUNCATION_EXPR_H_
We prefer to use '#pragma once' for new code


http://gerrit.cloudera.org:8080/#/c/16094/3/be/src/exprs/timestamp-truncation-expr.h@30
PS3, Line 30: The children of this Expr
Probably worth specifying that this should have exactly one child.


http://gerrit.cloudera.org:8080/#/c/16094/3/common/thrift/Exprs.thrift
File common/thrift/Exprs.thrift:

http://gerrit.cloudera.org:8080/#/c/16094/3/common/thrift/Exprs.thrift@151
PS3, Line 151: struct TTimestampTruncationExpr {
Since this doesn't contain anything, its fine to just not include it.

It's already for case for example for NULL_LITERAL that there isn't a 
corresponding TNullLiteral.


http://gerrit.cloudera.org:8080/#/c/16094/3/fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java
File fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java:

http://gerrit.cloudera.org:8080/#/c/16094/3/fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java@30
PS3, Line 30: The children
Probably worth specifying that this will have exactly one child.


http://gerrit.cloudera.org:8080/#/c/16094/3/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/16094/3/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@347
PS3, Line 347: // Don't create bloom filter targeting for HDFS scan node if the 
source
             :         // scan slot is Kudu timestamp slot since the timestamp 
values stored
             :         // in Kudu table are truncated and maybe different from 
the values stored
             :         // in the HDFS tables.
I think this needs more thought. There's two things we need to worry about when 
deciding to generate/assign a runtime filter: that it isn't going to change the 
final results of the query, and that its likely to produce a perf improvement.

The perf improvement part is already handled by the logic around cardinality 
and limits on the number of bloom filters and stuff, so the only reason to do 
this is if generating these filters would change the output of queries.

My understanding would be that prior to this patch we would generate filters 
like this (since we were already generating bloom filters targeting HDFS 
timestamp columns, and we don't currently consider the source those timestamps 
are coming from). So is the implication of this that there's a bug currently, 
i.e. are the Kudu timestamp -> HDFS timestamp bloom filters that would get 
generated right now resulting in queries returning incorrect values?

If so, then of course we should fix it (and probably also file a JIRA for it so 
that users that run into it can find info about it). If not, then we should 
leave it as is.

As you say, its might not be a good idea to do a join of a Kudu table and an 
HDFS table on a timestamp column because the timestamps are probably not going 
to match, but that's not our problem its the users problem, and its not 
necessarily always going to be the case eg. it might be that the HDFS table was 
generated by doing a "select *" from a Kudu table, in which case the timestamps 
will match up (this is a common pattern since Kudu is better for collecting a 
lot of small inserts, then the data gets moved in bulk to an HDFS table for 
archiving)


http://gerrit.cloudera.org:8080/#/c/16094/3/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@658
PS3, Line 658: true
We sometimes specify the names of parameters in a comment when the value is a 
constant like this, eg. "..., /* isTimestampTruncation */ true);"



--
To view, visit http://gerrit.cloudera.org:8080/16094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Gerrit-Change-Number: 16094
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Comment-Date: Tue, 23 Jun 2020 15:48:03 +0000
Gerrit-HasComments: Yes

Reply via email to