Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14714 )
Change subject: IMPALA-8705: ISO:SQL:2016 datetime patterns - Milestone 3 ...................................................................... Patch Set 1: (17 comments) http://gerrit.cloudera.org:8080/#/c/14714/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14714/1//COMMIT_MSG@12 PS1, Line 12: The tokens introduced: : - Full month name: In a string to datetime conversion this token can : parse textual month name into a datetime type. In a datetime to : string conversion this token gives the textual representation of a : month. : - Short month name: Similar to the full month name token but this : works for 3-character month names like 'JAN'. : - Full day name: In a datetime to string conversion this token gives : the textual representation of a day like 'Tuesday.' Not suppported : in a string to datetime conversion. : - Short day name: Similar to full day name token but this works for : 3-character day names like 'TUE'. : - Day of week: In a datetime to string conversion this gives a number : in [1-7] where 1 represents Sunday. Not supported in a string to : datetime conversion. : - Quarter of year: In a datetime to string conversion this gives a : number in [1-4] representing a quarter of the year. Not supported : in a string to datetime conversion. : - Week of year: In a datetime to string conversion this gives a : number in [1-53] to represent the week of year where the first week : starts from 1st of January. Not supported in a string to datetime : conversion. : - Week of month: In a datetime to string conversion this gives a : number in [1-5] to represent the week of month where the first week : starts from the first day of the month. Not supported in a string : to datetime conversion. Please add the actual format tokens to the commit message as well. http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/exprs/timestamp-functions.h File be/src/exprs/timestamp-functions.h: http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/exprs/timestamp-functions.h@71 PS1, Line 71: CAMELCASE CAMELCASE is a bit misleading as strings like "eBay", "iPhone" are also called camel case (https://en.wikipedia.org/wiki/Camel_case). CAPITALIZED or something similar would be a better name. http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/exprs/timestamp-functions.cc@59 PS1, Line 59: const string TimestampFunctions::DAY_ARRAY[7] = {"Sun", "Mon", "Tue", "Wed", "Thu", : "Fri", "Sat"}; Is this still necessary? We can just use SHORT_DAY_NAMES[CAMELCASE] instead. http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/date-parse-util.cc File be/src/runtime/date-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/date-parse-util.cc@212 PS1, Line 212: int DateParser::GetDayOfWeek(const DateValue& date) { DCHECK(date.IsValid()); http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@77 PS1, Line 77: {"FF3", 3} Does "FF3" have to be listed here? The length of the token is already 3. http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@77 PS1, Line 77: {"HH", 2}, Does "HH" have to be listed here? The length of the token is already 2. http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.h File be/src/runtime/datetime-parser-common.h: http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.h@69 PS1, Line 69: MAX_SHORT_MONTH_NAME_LENGTH Probably this should be just called 'SHORT_MONTH_NAME_LENGTH' as all short month names are of the same length. http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.h@74 PS1, Line 74: // Length of short day names like 'JAN', 'FEB, etc. Comment should be about short day names and not about short month names. http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.h@75 PS1, Line 75: MAX_SHORT_DAY_NAME_LENGTH Same as L74: should be called 'SHORT_DAY_NAME_LENGTH'. http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.h@296 PS1, Line 296: ased on : /// how the month token is given in 'tok' nit: Based on the casing of the month format token in 'tok'. http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.h@302 PS1, Line 302: Based on how the day token is given in 'tok' nit: Same as L297. http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.cc File be/src/runtime/datetime-parser-common.cc: http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.cc@197 PS1, Line 197: const auto* search_map = &MONTH_NAME_TO_INDEX; : if (type == MONTH_NAME_SHORT) search_map = &SHORT_MONTH_NAME_TO_INDEX; nit: Using the ternary operator and a constant reference instead of a pointer would be more straightforward: const auto& search_map = (type == MONTH_NAME_SHORT) ? MONTH_NAME_TO_INDEX : SHORT_MONTH_NAME_TO_INDEX; http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.cc@276 PS1, Line 276: tok.len == MAX_SHORT_MONTH_NAME_LENGTH Using (tok.type == MONTH_NAME_SHORT) condition would be more straightforward. http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.cc@282 PS1, Line 282: TimestampFunctions::MONTH_NAMES_PADDED[text_case][num_of_month - 1]; Is this behavior documented in the ANSI standard? http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.cc@295 PS1, Line 295: tok.len == MAX_SHORT_DAY_NAME_LENGTH Using (tok.type == DAY_NAME_SHORT) condition would be more straightforward. http://gerrit.cloudera.org:8080/#/c/14714/1/be/src/runtime/datetime-parser-common.cc@299 PS1, Line 299: TimestampFunctions::DAY_NAMES_PADDED[text_case][day - 1]; Is this behavior defined in the ANSI standard? http://gerrit.cloudera.org:8080/#/c/14714/1/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14714/1/tests/query_test/test_cast_with_format.py@301 PS1, Line 301: # Test different lowercase vs uppercase scenarios with the string to datetime path. In all the string -> datetime conversions, month-name is surrounded by separators. Is that a deliberate limitation? I assume technically it would be possible to parse a month name even if it is not followed by a separator. We only have a limited set of month names and they are not each others prefixes. Anyway, if this is a limitation please add a comment about it to the commit message. -- To view, visit http://gerrit.cloudera.org:8080/14714 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic797f19a1311b54e5d00d01d0a7afe1f0f21fb8f Gerrit-Change-Number: 14714 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Fri, 15 Nov 2019 14:39:30 +0000 Gerrit-HasComments: Yes
