Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12267 )

Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST()
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/cast-functions-ir.cc@172
PS3, Line 172:     DateTimeFormatContext format_ctx(cast_format.c_str(), 
cast_format.length());
             :     if (!ParseFormatTokens(&format_ctx)) return 
StringVal::null();
I am concerned about the performance impact of doing this for every row - 
TimestampFunctions::StringValFromTimestamp() and ToTimestamp() in 
timestamp-functions-ir.cc does the format parsing only once if the format 
argument is constant. The parsing uses the heap which is also something we try 
to avoid in expressions.

In this case the format is always constant, and the parsing 
TimestampValue.Format() does not modify the DateTimeFormatContext, so it does 
not seem too complex to do the initialization only once for a FunctionContext.


http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/scalar-expr-evaluator.cc
File be/src/exprs/scalar-expr-evaluator.cc:

http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/scalar-expr-evaluator.cc@129
PS3, Line 129:   string cast_format = root_.GetCastFormat();
I am not too familiar with ScalarExprEvaluator, but this functions seems 
strange: as I understand, it uses the cast format of the root expression to set 
the cast format in the function context of every expression. What happens if 
the root is not a CAST function or if there are multiple CASTs in the tree with 
different formats?

Please add tests for these cases. Some examples (sorry if I made mistakes in 
the queries/results, I didn't verify them):

select
 (cast(ch as timestamp format 'yyyy/dd/MM HH:mm:ss') >
 cast(ch as timestamp format 'yyyy/MM/dd HH:mm:ss'))
from $DATABASE.tbl_cast_format
-- should return True (as dec 11 > nov 12)

select
 cast(cast(ch as timestamp format 'yyyy/dd/MM HH:mm:ss'),
  as string format 'HH:mm:ss MM-dd-yyyy'));
from $DATABASE.tbl_cast_format
-- should return '13:14:15 12-11-2019'



--
To view, visit http://gerrit.cloudera.org:8080/12267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697
Gerrit-Change-Number: 12267
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Greg Rahn <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Paul Rogers <[email protected]>
Gerrit-Comment-Date: Thu, 31 Jan 2019 18:09:38 +0000
Gerrit-HasComments: Yes

Reply via email to