Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12267 )

Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST()
......................................................................


Patch Set 3:

(13 comments)

Hey Paul,
Thanks for taking a look!
The format used here is exactly the same as used with from_timestamp() and 
to_timestamp(). In fact I re-used the formatting BE logic here so I would say 
that a CAST(.. FORMAT .. ) is kind of an alias to from_timestamp() and 
to_timestamp() functions with a bit more flexibility.
I haven't found a to_char() function in Impala either but in my opinion 
from_timestamp(timestamp, format) kind of serves that purpose. If you feel that 
this is a gap we should cover then let's open a Jira for this.

http://gerrit.cloudera.org:8080/#/c/12267/2/be/src/exprs/scalar-expr.cc
File be/src/exprs/scalar-expr.cc:

http://gerrit.cloudera.org:8080/#/c/12267/2/be/src/exprs/scalar-expr.cc@181
PS2, Line 181:       } else if (texpr_node.__isset.cast_expr &&
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/12267/2/common/thrift/Exprs.thrift
File common/thrift/Exprs.thrift:

http://gerrit.cloudera.org:8080/#/c/12267/2/common/thrift/Exprs.thrift@143
PS2, Line 143: }
> Again, should format be an argument to the cast function for DateTime/strin
See my comment on CastExpr.java


http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java
File fe/src/main/java/org/apache/impala/analysis/CastExpr.java:

http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@50
PS2, Line 50:   private final String castFormat_;
> final
Done


http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@220
PS2, Line 220:     }
> If cast is implemented as a function call (which it seems to be), should th
It would also be doable. I know because I went for that approach initially but 
I ran into a few problems with adding another argument:
When Impala starts, FE loads cast function information from the BE and in case 
it doesn't find the cast functions it expects then Impala fails to start. So in 
order to avoid this I had to extend CastExpr.initBuiltins() function with extra 
if's like "if this is a string vs timestamp conversion then please search for 
cast functions with an additional parameter". This made Impala start but still 
not the desired parameter was passed to the BE cast functions as 'format'. I 
figured out that another magic had to be applied for string vs timestamp 
conversions (another set of "if this is that kind of cast") to add another 
child as the format parameter to the thrift tree that is sent to the BE so that 
it can gather this parameter well.
Long story short I didn't find this approach elegant enough (frankly found it 
rather ugly) to stick to it.

Just a sidenote, that not all TExprNodes are extended with the new cast_expr 
because it is optional so it is set only in case a CAST(..FORMAT..) was 
executed.


http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@256
PS2, Line 256:       if (!(type_.isDateType() && 
getChild(0).getType().isStringType()) &&
> Here we are saying that CAST(col AS DATETIME FORMAT '') is the same as CAST
Good point, those shouldn't be treated the same here. Empty format currently 
results an error when parsing on the BE side, and for the users' perspective a 
NULL as a result. By catching it here I can return additional hints to the user.

Note, checking emptiness is easy to handle here. However, the full 
validation/parsing of the format happens on the BE side and I don't think that 
the same parsing algorithm should be implemented here on FE side to catch 
malformed patterns provided by the user.


http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@272
PS2, Line 272:       // for every type since it is redundant with STRING. Casts 
to go through 2 casts:
> Pass into constructor so that castFormat_ can be final?
Good point! Done.


http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@2957
PS2, Line 2957:     AnalysisError("select cast("+to_timestamp_cast+" AS STRING 
FORMAT '')",
> Checks for an empty or invalid format string?
Added emptiness checks. The format checks happen on the backend so I test 
invalid format there.


http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@2964
PS2, Line 2964:   @Test
> Order in JUint assertions is expected, actual. It is reversed in the line a
Thx, for spotting. Done


http://gerrit.cloudera.org:8080/#/c/12267/2/testdata/workloads/functional-query/queries/QueryTest/cast_format.test
File testdata/workloads/functional-query/queries/QueryTest/cast_format.test:

http://gerrit.cloudera.org:8080/#/c/12267/2/testdata/workloads/functional-query/queries/QueryTest/cast_format.test@20
PS2, Line 20: ---- QUERY
> Add error cases? Bad format, etc? Formats seem to be checked at runtime so
There is one invalid format scenario covered in test_cast_format_without_row 
(test_cast_with_format.py). That is basically the same test as this one but it 
runs quicker because it doesn't require a table to do the cast on.

Another note on bad format is that internally the very same logic is re-used as 
for from_timestamp() and to_timestamp() functions and the format is tested 
there.


http://gerrit.cloudera.org:8080/#/c/12267/2/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/12267/2/tests/query_test/test_cast_with_format.py@18
PS2, Line 18: from tests.common.impala_test_suite import ImpalaTestSuite
> flake8: F401 'pytest' imported but unused
Done


http://gerrit.cloudera.org:8080/#/c/12267/2/tests/query_test/test_cast_with_format.py@23
PS2, Line 23:   @classmethod
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/12267/2/tests/query_test/test_cast_with_format.py@33
PS2, Line 33: =
> flake8: E502 the backslash is redundant between brackets
Done


http://gerrit.cloudera.org:8080/#/c/12267/2/tests/query_test/test_cast_with_format.py@81
PS2, Line 81:
> flake8: W391 blank line at end of file
Done



--
To view, visit http://gerrit.cloudera.org:8080/12267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697
Gerrit-Change-Number: 12267
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Paul Rogers <[email protected]>
Gerrit-Comment-Date: Thu, 31 Jan 2019 12:21:35 +0000
Gerrit-HasComments: Yes

Reply via email to