Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16910

to look at the new patch set (#7).

Change subject: [WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior
......................................................................

[WIP] IMPALA-10410: Query hints for turn on/off UTF-8 behavior

This patch introduces a pair of query hints, UTF8_MODE_ON and
UTF8_MODE_OFF, to turn on/off the UTF-8 aware behavior. The query hint
should be put in front of the select list. It will affect the query
block and all subquery blocks until there is another hint in the
subquery. See more examples in the test file.

Implementation:
Each Analyzer instance corresponds to a query block. We introduce a
flag, isUtf8Mode_, in Analyzer. When analyzing a SelectStmt, we change
the order to analyze the hints of the select list first, so that each
Analyzer gets the correct utf-8 hint. When detecting whether the
current query block is in UTF-8 mode, check the flag first. If it's not
set, inherit the ancestor Analyzer's state.
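
The lookup described above could be sketched roughly as follows. This is
only an illustration: the class ScopedAnalyzer and the methods
setUtf8Mode/isUtf8Mode are hypothetical stand-ins rather than the actual
Analyzer API, and the sketch assumes the mode is off when no block in the
chain carries a hint.

```java
// Hypothetical sketch; not the actual Impala Analyzer class.
class ScopedAnalyzer {
  private final ScopedAnalyzer parent_;  // null for the top-level query block
  private Boolean isUtf8Mode_ = null;    // null means no hint in this block

  ScopedAnalyzer(ScopedAnalyzer parent) { parent_ = parent; }

  // Called when a hint is found in front of this block's select list.
  void setUtf8Mode(boolean on) { isUtf8Mode_ = on; }

  boolean isUtf8Mode() {
    // A hint on this block wins over anything inherited.
    if (isUtf8Mode_ != null) return isUtf8Mode_;
    // Assumption: default to off when no ancestor carries a hint.
    return parent_ != null && parent_.isUtf8Mode();
  }
}
```

With this shape, a hint on an outer block applies to nested blocks until
one of them sets its own flag.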

There are some gotchas in the current code base:
 - Type instances are shared across FE, e.g. metadata, slot descriptors,
   exprs, etc. Before changing the utf8 marker of a Type instance, we
   must make sure it's not shared anywhere else. Otherwise, we could
   accidentally change the utf8 mode of other parts.
 - In the planning phase, some exprs are substituted and re-analyzed.
   The utf8 markers can be lost when an analyzer of another scope is
   used.
 - The arg type of a ScalarFnCall expr is assumed to be identical to
   the return type of the child expr. This is not true after this
   patch, since they could be in different utf8 modes.
The first problem can be resolved by cloning Type instances whenever we
do an assignment. To limit the scope of the code changes, we just do
this for creating slot descriptors. For exprs, we add the utf8 marker to
the expr itself and mark it there (instead of marking utf8 mode for each
child type and the return type). This simplifies the work since Expr
instances are not shared with the metadata code.
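
A minimal sketch of the clone-on-assignment idea for slot descriptors is
shown below. StringType and SlotDesc are hypothetical stand-ins, not the
actual Type or SlotDescriptor classes, and the utf8 field is a simplified
placeholder for the marker.

```java
// Hypothetical stand-ins; not the actual Impala Type or SlotDescriptor classes.
class StringType implements Cloneable {
  boolean utf8 = false;  // simplified placeholder for the utf8 marker

  @Override
  public StringType clone() {
    try {
      return (StringType) super.clone();
    } catch (CloneNotSupportedException e) {
      throw new AssertionError(e);  // cannot happen: the class is Cloneable
    }
  }
}

class SlotDesc {
  private final StringType type_;

  // Clone on assignment so that marking this slot never touches the
  // instance shared with metadata or other descriptors.
  SlotDesc(StringType sharedType, boolean utf8Mode) {
    type_ = sharedType.clone();
    type_.utf8 = utf8Mode;
  }

  StringType getType() { return type_; }
}
```

The shared instance keeps its original marker, while the slot's private
copy carries the mode of the query block that created it.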

For the second problem, we change the utf8 marker from boolean to
Boolean, and initialize it as null. Precondition checks are added if the
utf8 marker is flipped. In the planning phase, analyzers will be marked
as in planning (by a new state field). Expr#analyze will ignore the utf8
marker of the analyzer in this state, which helps the expr keep its
original utf8 marker (if it has one).
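
One possible reading of the tri-state marker and the in-planning state is
sketched below. MarkedExpr and its method signatures are hypothetical, a
plain IllegalStateException stands in for the precondition check, and the
sketch assumes the check fires when an already-set marker would be changed
to the opposite value.

```java
// Hypothetical sketch; names and signatures do not come from the patch.
class MarkedExpr {
  private Boolean utf8Mode_ = null;  // null until the expr is first analyzed

  void analyze(boolean analyzerUtf8Mode, boolean analyzerInPlanning) {
    // During planning the analyzer may belong to another scope, so its
    // mode is ignored and any existing marker is kept.
    if (analyzerInPlanning) return;
    // Assumption: re-analysis must not flip a marker that is already set.
    if (utf8Mode_ != null && utf8Mode_ != analyzerUtf8Mode) {
      throw new IllegalStateException("utf8 marker flipped");
    }
    utf8Mode_ = analyzerUtf8Mode;
  }

  Boolean getUtf8Mode() { return utf8Mode_; }
}
```

Using Boolean rather than boolean is what lets the check tell an unset
marker apart from one that was explicitly set to false.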

For the third problem, we just need to take care of FunctionContextImpl
creation and the related codegen code paths. When generating arg types,
use the utf8 marker of the ScalarFnCall expr instead.

Tests:
 - Add tests for using query hints with string functions.
 - TODO: Add more tests for reading from tables.

Change-Id: I7fa20b62b5cb06169048b0785b70e85a9f21bf07
---
M be/src/exprs/anyval-util.h
M be/src/exprs/expr.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/CollectionStructType.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TypeDef.java
M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java
M fe/src/main/java/org/apache/impala/catalog/ArrayType.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Function.java
M fe/src/main/java/org/apache/impala/catalog/IcebergStructField.java
M fe/src/main/java/org/apache/impala/catalog/MapType.java
M fe/src/main/java/org/apache/impala/catalog/ScalarFunction.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/catalog/StructField.java
M fe/src/main/java/org/apache/impala/catalog/StructType.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A testdata/workloads/functional-query/queries/QueryTest/utf8-hints.test
M tests/query_test/test_utf8_strings.py
34 files changed, 361 insertions(+), 97 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/16910/7
--
To view, visit http://gerrit.cloudera.org:8080/16910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7fa20b62b5cb06169048b0785b70e85a9f21bf07
Gerrit-Change-Number: 16910
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
