Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12267 )
Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST() ...................................................................... Patch Set 3: (1 comment) Key question here is the syntax of the format string. There are two questions. First, does our syntax here have to be identical to that of Hive? As I understand it, Impala can write view definitions to HMS that Hive should be able to read and visa/versa. If we used a different format string than Hive, would we know which format to use for a given view? The other consideration is that we'd like to be ISO SQL compliant. The SQL formats are similar to, but distinct from, the Java8 date/time formats. If we go the ISO SQL route, we'd have to parse the SQL format into the Impala (that is, Java 8) format. Finally, if we use ISO SQL formats in CAST, we should also create new functions to do the same tasks using the SQL format strings. (These would be in addition to the existing Hive-compatible functions.) Unfortunately, unless Hive adds the same functions, these will break view compatibility if used in views. http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java File fe/src/main/java/org/apache/impala/analysis/CastExpr.java: http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@220 PS2, Line 220: } > It would also be doable. I know because I went for that approach initially Probably doable. As you noted, we load functions on startup. Cast functions are declared in CastExpr.initBuiltins(). You would have to specially handle the String --> Timestamp and Timestamp --> String functions. The code already has a number of type-specific rules. Then, when analyzing the cast, the FORMAT string would be moved into the third argument position in the AST. From here on, the normal function mechanism should take you the rest of the way. You would need corresponding changes on the BE. Basically, if the format is null, you'd treat the call as a two-arg cast. In fact, you may even be able to register both two- and three- arg overloads of the function and choose between them depending on whether a FORMAT string exists. -- To view, visit http://gerrit.cloudera.org:8080/12267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697 Gerrit-Change-Number: 12267 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Greg Rahn <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Paul Rogers <[email protected]> Gerrit-Comment-Date: Fri, 01 Feb 2019 06:10:44 +0000 Gerrit-HasComments: Yes
