[Impala-ASF-CR] IMPALA-8705: ISO:SQL:2016 datetime patterns - Milestone 3

Attila Jeges (Code Review) Fri, 15 Nov 2019 06:40:17 -0800

Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14714 )


Change subject: IMPALA-8705: ISO:SQL:2016 datetime patterns - Milestone 3
......................................................................


Patch Set 1:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/14714/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14714/1//COMMIT_MSG@12
PS1, Line 12: The tokens introduced:
            : - Full month name: In a string to datetime conversion this token 
can
            :   parse textual month name into a datetime type. In a datetime to
            :   string conversion this token gives the textual representation 
of a
            :   month.
            : - Short month name: Similar to the full month name token but this
            :   works for 3-character month names like 'JAN'.
            : - Full day name: In a datetime to string conversion this token 
gives
            :   the textual representation of a day like 'Tuesday.' Not 
suppported
            :   in a string to datetime conversion.
            : - Short day name: Similar to full day name token but this works 
for
            :   3-character day names like 'TUE'.
            : - Day of week: In a datetime to string conversion this gives a 
number
            :   in [1-7] where 1 represents Sunday. Not supported in a string to
            :   datetime conversion.
            : - Quarter of year: In a datetime to string conversion this gives a
            :   number in [1-4] representing a quarter of the year. Not 
supported
            :   in a string to datetime conversion.
            : - Week of year: In a datetime to string conversion this gives a
            :   number in [1-53] to represent the week of year where the first 
week
            :   starts from 1st of January. Not supported in a string to 
datetime
            :   conversion.
            : - Week of month: In a datetime to string conversion this gives a
            :   number in [1-5] to represent the week of month where the first 
week
            :   starts from the first day of the month. Not supported in a 
string
            :   to datetime conversion.
Please add the actual format tokens to the commit message as well.


http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/exprs/timestamp-functions.h
File be/src/exprs/timestamp-functions.h:

http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/exprs/timestamp-functions.h@71
PS1, Line 71: CAMELCASE
CAMELCASE is a bit misleading as strings like "eBay", "iPhone" are also called 
camel case (https://en.wikipedia.org/wiki/Camel_case).

CAPITALIZED or something similar would be a better name.


http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/exprs/timestamp-functions.cc@59
PS1, Line 59: const string TimestampFunctions::DAY_ARRAY[7] = {"Sun", "Mon", 
"Tue", "Wed", "Thu",
            :     "Fri", "Sat"};
Is this still necessary? We can just use SHORT_DAY_NAMES[CAMELCASE] instead.


http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/date-parse-util.cc
File be/src/runtime/date-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/date-parse-util.cc@212
PS1, Line 212: int DateParser::GetDayOfWeek(const DateValue& date) {
DCHECK(date.IsValid());


http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@77
PS1, Line 77: {"FF3", 3}
Does "FF3" have to be listed here? The length of the token is already 3.


http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@77
PS1, Line 77: {"HH", 2},
Does "HH" have to be listed here? The length of the token is already 2.


http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.h
File be/src/runtime/datetime-parser-common.h:

http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.h@69
PS1, Line 69: MAX_SHORT_MONTH_NAME_LENGTH
Probably this should be just called 'SHORT_MONTH_NAME_LENGTH' as all short 
month names are of the same length.


http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.h@74
PS1, Line 74: // Length of short day names like 'JAN', 'FEB, etc.
Comment should be about short day names and not about short month names.


http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.h@75
PS1, Line 75: MAX_SHORT_DAY_NAME_LENGTH
Same as L74: should be called 'SHORT_DAY_NAME_LENGTH'.


http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.h@296
PS1, Line 296: ased on
             : /// how the month token is given in 'tok'
nit: Based on the casing of the month format token in 'tok'.


http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.h@302
PS1, Line 302: Based on how the day token is given in 'tok'
nit: Same as L297.


http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.cc
File be/src/runtime/datetime-parser-common.cc:

http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.cc@197
PS1, Line 197:   const auto* search_map = &MONTH_NAME_TO_INDEX;
             :   if (type == MONTH_NAME_SHORT) search_map = 
&SHORT_MONTH_NAME_TO_INDEX;
nit: Using the ternary operator and a constant reference instead of a pointer 
would be more straightforward:

const auto& search_map = (type == MONTH_NAME_SHORT) ?
MONTH_NAME_TO_INDEX : SHORT_MONTH_NAME_TO_INDEX;


http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.cc@276
PS1, Line 276: tok.len == MAX_SHORT_MONTH_NAME_LENGTH
Using (tok.type == MONTH_NAME_SHORT) condition would be more straightforward.


http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.cc@282
PS1, Line 282: TimestampFunctions::MONTH_NAMES_PADDED[text_case][num_of_month - 
1];
Is this behavior documented in the ANSI standard?


http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.cc@295
PS1, Line 295: tok.len == MAX_SHORT_DAY_NAME_LENGTH
Using (tok.type == DAY_NAME_SHORT) condition would be more straightforward.


http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.cc@299
PS1, Line 299:  TimestampFunctions::DAY_NAMES_PADDED[text_case][day - 1];
Is this behavior defined in the ANSI standard?


http://gerrit.cloudera.org:8080/#/c/14714/1/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14714/1/tests/query_test/test_cast_with_format.py@301
PS1, Line 301: # Test different lowercase vs uppercase scenarios with the 
string to datetime path.
In all the string -> datetime conversions, month-name is surrounded by 
separators. Is that a deliberate limitation? I assume technically it would be 
possible to parse a month name even if it is not followed by a separator. We 
only have a limited set of month names and they are not each others prefixes.

Anyway, if this is a limitation please add a comment about it to the commit 
message.



--
To view, visit http://gerrit.cloudera.org:8080/14714
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic797f19a1311b54e5d00d01d0a7afe1f0f21fb8f
Gerrit-Change-Number: 14714
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Fri, 15 Nov 2019 14:39:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8705: ISO:SQL:2016 datetime patterns - Milestone 3

Reply via email to