[Impala-ASF-CR] [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16910 ) Change subject: [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8036/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7fa20b62b5cb06169048b0785b70e85a9f21bf07 Gerrit-Change-Number: 16910 Gerrit-PatchSet: 7 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 21 Jan 2021 09:08:13 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16910 ) Change subject: [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior .. Patch Set 7: Added more tests and more fixes for the revealed problems. The implementation become more complex now. Still need more test coverage and there may be other issues. I'm concerning whether it's worth the complexity to support the query hints... Maybe we can provide the query option first, and implement the hints when users are asking for it. I'll update the first two patches (utf8-func, utf8-char-varchar) to not consider this patch. So their implementation can be simpler hopefully. -- To view, visit http://gerrit.cloudera.org:8080/16910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7fa20b62b5cb06169048b0785b70e85a9f21bf07 Gerrit-Change-Number: 16910 Gerrit-PatchSet: 7 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 21 Jan 2021 08:52:33 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16910 ) Change subject: [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/16910/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/16910/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1275 PS7, Line 1275: if (result.getType().isStringType() && !((ScalarType) result.getType()).isSetUtf8Mode()) { line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/16910/7/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java File fe/src/main/java/org/apache/impala/catalog/ColumnStats.java: http://gerrit.cloudera.org:8080/#/c/16910/7/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java@498 PS7, Line 498: if (colType != null && colType.isFixedLengthType() && !((ScalarType) colType).isUtf8Mode()) { line too long (97 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7fa20b62b5cb06169048b0785b70e85a9f21bf07 Gerrit-Change-Number: 16910 Gerrit-PatchSet: 7 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 21 Jan 2021 08:47:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16910 to look at the new patch set (#7). Change subject: [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior .. [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior This patch introduces a pair of query hints, UTF8_MODE_ON and UTF8_MODE_OFF, to turn on/off the UTF-8 aware behavior. The query hint should be put in front of the select list. It will affect the query block and all subquery blocks until there is another hint in the subquery. See more examples in the test file. Implementation: Each Analyzer instance is corresponding to a query block. We introduces a flag, isUtf8Mode_, in Analyzer. When analyzing a SelectStmt, we change the order to analyze the hints of the select list first. Then each Analyzer will get the correct utf-8 hint. When detecting whether the current query block is in UTF-8 mode, check the flag first. If it's not set, inherits the ancessor Analyzer's state. There are some gotchas in the current code base: - Type instances are shared across FE, e.g. metadata, slot descriptors, exprs, etc. Changing the utf8 marker of a Type instance should make sure it's not shared in any other places. Otherwise, we could accidentially change the utf8 mode of other parts. - In planning phase, some exprs are substituted and re-analyzed. The utf8 markers can lost due to using an analyzer of other scope. - The arg type of a ScalarFnCall expr is assumed to be the identical with the return type of the child expr. This is not true after this patch, since they could be in different utf8 mode. The first problem can be resolved by cloning Type instances whenever we do an assignment. To limit the scope of the code changes, we just do this for creating slot descriptors. For exprs, we add the utf8 marker to it as well and mark it there (instead of marking utf8 mode for each child types and return type). This simplify the work since we Exprs instances won't be shared with the metadata codes. For the second problem, we change the utf8 marker from boolean to Boolean, and initialize it as null. Precondition checks are added if the utf8 marker is flipped. In the planning phase, analyzers will be marked as in planning (by a new state field). Expr#analyze will ignore the utf8 marker of the analyzer in this state, which helps to keep its original utf8 marker (if has). For the third problem, we just need to take care of FunctionContextImpl creation and the related codegen code paths. When generating arg types, use the utf8 marker of the ScalarFnCall expr instead. Tests: - Add tests for using query hints with string functions. - TODO: Add more tests for reading from tables. Change-Id: I7fa20b62b5cb06169048b0785b70e85a9f21bf07 --- M be/src/exprs/anyval-util.h M be/src/exprs/expr.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M common/thrift/Types.thrift M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/CastExpr.java M fe/src/main/java/org/apache/impala/analysis/CollectionStructType.java M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TypeDef.java M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java M fe/src/main/java/org/apache/impala/catalog/ArrayType.java M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/catalog/IcebergStructField.java M fe/src/main/java/org/apache/impala/catalog/MapType.java M fe/src/main/java/org/apache/impala/catalog/ScalarFunction.java M fe/src/main/java/org/apache/impala/catalog/ScalarType.java M fe/src/main/java/org/apache/impala/catalog/StructField.java M fe/src/main/java/org/apache/impala/catalog/StructType.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java A testdata/workloads/functional-query/queries/QueryTest/utf8-hints.test M tests/query_test/test_utf8_strings.py 34 files changed, 361 insertions(+), 97 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/16910/7 -- To view, visit http://gerrit.cl
[Impala-ASF-CR] [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16910 ) Change subject: [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/16910/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/16910/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2815 PS5, Line 2815: public void setIsUtf8Mode(boolean isUtf8Mode) throws AnalysisException{ > I mentioned this to you out-of-band but thought I should mention that I thi Yeah, I found some issues when adding more complex tests. I plan to make two refactors: 1) Change the isUtf8 marker from boolean to Boolean object. So it can have null as the initial state, which helps us detect whether the marker is reset. 2) In FE, move the isUtf8 marker from ScalarType to Expr. There are lots of Type instances shared in the metadata. We need explicitly cloning the Type instances in many places, which is error prone. Actually, CastExpr and FunctionCallExpr are the only places where the UTF8 semantic take place. Save the isUtf8 marker in them makes more sense. Still need to examine the substition codes to see whether we'll lose these markers. -- To view, visit http://gerrit.cloudera.org:8080/16910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7fa20b62b5cb06169048b0785b70e85a9f21bf07 Gerrit-Change-Number: 16910 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 13 Jan 2021 09:17:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16910 ) Change subject: [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/16910/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/16910/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2815 PS5, Line 2815: public void setIsUtf8Mode(boolean isUtf8Mode) throws AnalysisException{ I mentioned this to you out-of-band but thought I should mention that I think this will get lost on some expressions when they are substituted and re-analyzed in a different query block or with the root analyzer during planning. -- To view, visit http://gerrit.cloudera.org:8080/16910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7fa20b62b5cb06169048b0785b70e85a9f21bf07 Gerrit-Change-Number: 16910 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 08 Jan 2021 17:45:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16910 ) Change subject: [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior .. Patch Set 2: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/7936/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7fa20b62b5cb06169048b0785b70e85a9f21bf07 Gerrit-Change-Number: 16910 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 04 Jan 2021 09:21:32 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior
Quanlong Huang has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/16910 ) Change subject: [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior .. [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior This patch introduces a pair of query hints, UTF8_MODE_ON and UTF8_MODE_OFF, to turn on/off the UTF-8 aware behavior. The query hint should be put in front of the select list. It will affect the query block and all subquery blocks until there is another hint in the subquery. See more examples in the test file. Implementation: Each Analyzer instance is corresponding to a query block. We introduces a flag, isUtf8Mode_, in Analyzer. When analyzing a SelectStmt, we change the order to analyze the hints of the select list first. Then each Analyzer will get the correct utf-8 hint. When detecting whether the current query block is in UTF-8 mode, check the flag first. If it's not set, inherits the ancessor Analyzer's state. Tests: - Add tests for using query hints with string functions. - TODO: Add more tests for reading from tables. - TODO: Add tests for that consequent tuple descriptors both have char(N) slots but one is utf8 and the other is not. Change-Id: I7fa20b62b5cb06169048b0785b70e85a9f21bf07 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A testdata/workloads/functional-query/queries/QueryTest/utf8-hints.test M tests/query_test/test_utf8_strings.py 5 files changed, 83 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/16910/2 -- To view, visit http://gerrit.cloudera.org:8080/16910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7fa20b62b5cb06169048b0785b70e85a9f21bf07 Gerrit-Change-Number: 16910 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang