Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12267 )
Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST() ...................................................................... Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/cast-functions-ir.cc@172 PS3, Line 172: DateTimeFormatContext format_ctx(cast_format.c_str(), cast_format.length()); : if (!ParseFormatTokens(&format_ctx)) return StringVal::null(); I am concerned about the performance impact of doing this for every row - TimestampFunctions::StringValFromTimestamp() and ToTimestamp() in timestamp-functions-ir.cc does the format parsing only once if the format argument is constant. The parsing uses the heap which is also something we try to avoid in expressions. In this case the format is always constant, and the parsing TimestampValue.Format() does not modify the DateTimeFormatContext, so it does not seem too complex to do the initialization only once for a FunctionContext. http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/scalar-expr-evaluator.cc File be/src/exprs/scalar-expr-evaluator.cc: http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/scalar-expr-evaluator.cc@129 PS3, Line 129: string cast_format = root_.GetCastFormat(); I am not too familiar with ScalarExprEvaluator, but this functions seems strange: as I understand, it uses the cast format of the root expression to set the cast format in the function context of every expression. What happens if the root is not a CAST function or if there are multiple CASTs in the tree with different formats? Please add tests for these cases. Some examples (sorry if I made mistakes in the queries/results, I didn't verify them): select (cast(ch as timestamp format 'yyyy/dd/MM HH:mm:ss') > cast(ch as timestamp format 'yyyy/MM/dd HH:mm:ss')) from $DATABASE.tbl_cast_format -- should return True (as dec 11 > nov 12) select cast(cast(ch as timestamp format 'yyyy/dd/MM HH:mm:ss'), as string format 'HH:mm:ss MM-dd-yyyy')); from $DATABASE.tbl_cast_format -- should return '13:14:15 12-11-2019' -- To view, visit http://gerrit.cloudera.org:8080/12267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697 Gerrit-Change-Number: 12267 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Greg Rahn <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Paul Rogers <[email protected]> Gerrit-Comment-Date: Thu, 31 Jan 2019 18:09:38 +0000 Gerrit-HasComments: Yes
