Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/12267 )
Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST() ...................................................................... Patch Set 3: (13 comments) Hey Paul, Thanks for taking a look! The format used here is exactly the same as used with from_timestamp() and to_timestamp(). In fact I re-used the formatting BE logic here so I would say that a CAST(.. FORMAT .. ) is kind of an alias to from_timestamp() and to_timestamp() functions with a bit more flexibility. I haven't found a to_char() function in Impala either but in my opinion from_timestamp(timestamp, format) kind of serves that purpose. If you feel that this is a gap we should cover then let's open a Jira for this. http://gerrit.cloudera.org:8080/#/c/12267/2/be/src/exprs/scalar-expr.cc File be/src/exprs/scalar-expr.cc: http://gerrit.cloudera.org:8080/#/c/12267/2/be/src/exprs/scalar-expr.cc@181 PS2, Line 181: } else if (texpr_node.__isset.cast_expr && > line too long (93 > 90) Done http://gerrit.cloudera.org:8080/#/c/12267/2/common/thrift/Exprs.thrift File common/thrift/Exprs.thrift: http://gerrit.cloudera.org:8080/#/c/12267/2/common/thrift/Exprs.thrift@143 PS2, Line 143: } > Again, should format be an argument to the cast function for DateTime/strin See my comment on CastExpr.java http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java File fe/src/main/java/org/apache/impala/analysis/CastExpr.java: http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@50 PS2, Line 50: private final String castFormat_; > final Done http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@220 PS2, Line 220: } > If cast is implemented as a function call (which it seems to be), should th It would also be doable. I know because I went for that approach initially but I ran into a few problems with adding another argument: When Impala starts, FE loads cast function information from the BE and in case it doesn't find the cast functions it expects then Impala fails to start. So in order to avoid this I had to extend CastExpr.initBuiltins() function with extra if's like "if this is a string vs timestamp conversion then please search for cast functions with an additional parameter". This made Impala start but still not the desired parameter was passed to the BE cast functions as 'format'. I figured out that another magic had to be applied for string vs timestamp conversions (another set of "if this is that kind of cast") to add another child as the format parameter to the thrift tree that is sent to the BE so that it can gather this parameter well. Long story short I didn't find this approach elegant enough (frankly found it rather ugly) to stick to it. Just a sidenote, that not all TExprNodes are extended with the new cast_expr because it is optional so it is set only in case a CAST(..FORMAT..) was executed. http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@256 PS2, Line 256: if (!(type_.isDateType() && getChild(0).getType().isStringType()) && > Here we are saying that CAST(col AS DATETIME FORMAT '') is the same as CAST Good point, those shouldn't be treated the same here. Empty format currently results an error when parsing on the BE side, and for the users' perspective a NULL as a result. By catching it here I can return additional hints to the user. Note, checking emptiness is easy to handle here. However, the full validation/parsing of the format happens on the BE side and I don't think that the same parsing algorithm should be implemented here on FE side to catch malformed patterns provided by the user. http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@272 PS2, Line 272: // for every type since it is redundant with STRING. Casts to go through 2 casts: > Pass into constructor so that castFormat_ can be final? Good point! Done. http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java: http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@2957 PS2, Line 2957: AnalysisError("select cast("+to_timestamp_cast+" AS STRING FORMAT '')", > Checks for an empty or invalid format string? Added emptiness checks. The format checks happen on the backend so I test invalid format there. http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@2964 PS2, Line 2964: @Test > Order in JUint assertions is expected, actual. It is reversed in the line a Thx, for spotting. Done http://gerrit.cloudera.org:8080/#/c/12267/2/testdata/workloads/functional-query/queries/QueryTest/cast_format.test File testdata/workloads/functional-query/queries/QueryTest/cast_format.test: http://gerrit.cloudera.org:8080/#/c/12267/2/testdata/workloads/functional-query/queries/QueryTest/cast_format.test@20 PS2, Line 20: ---- QUERY > Add error cases? Bad format, etc? Formats seem to be checked at runtime so There is one invalid format scenario covered in test_cast_format_without_row (test_cast_with_format.py). That is basically the same test as this one but it runs quicker because it doesn't require a table to do the cast on. Another note on bad format is that internally the very same logic is re-used as for from_timestamp() and to_timestamp() functions and the format is tested there. http://gerrit.cloudera.org:8080/#/c/12267/2/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/12267/2/tests/query_test/test_cast_with_format.py@18 PS2, Line 18: from tests.common.impala_test_suite import ImpalaTestSuite > flake8: F401 'pytest' imported but unused Done http://gerrit.cloudera.org:8080/#/c/12267/2/tests/query_test/test_cast_with_format.py@23 PS2, Line 23: @classmethod > flake8: E302 expected 2 blank lines, found 1 Done http://gerrit.cloudera.org:8080/#/c/12267/2/tests/query_test/test_cast_with_format.py@33 PS2, Line 33: = > flake8: E502 the backslash is redundant between brackets Done http://gerrit.cloudera.org:8080/#/c/12267/2/tests/query_test/test_cast_with_format.py@81 PS2, Line 81: > flake8: W391 blank line at end of file Done -- To view, visit http://gerrit.cloudera.org:8080/12267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697 Gerrit-Change-Number: 12267 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Paul Rogers <[email protected]> Gerrit-Comment-Date: Thu, 31 Jan 2019 12:21:35 +0000 Gerrit-HasComments: Yes
