[Impala-ASF-CR] IMPALA-3894: Changed the behavior parsing 2-digit year values
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3894: Changed the behavior parsing 2-digit year values

Patch Set 4:

(8 comments)

Looks pretty good. No major concerns, we just need some tests and a bit of cleanup to make it easier to follow.

http://gerrit.cloudera.org:8080/#/c/7530/4//COMMIT_MSG
Commit Message:

Line 16: > select from_unixtime(unix_timestamp('31-AUG-94', 'dd-MMM-yy'),'MMdd');
Can you add some tests in expr-test for some different cases to cover all of the code paths? I think you can make the value of "now()" deterministic if you set up a RuntimeState(), and pass in a TQueryCtx with now_string set.

http://gerrit.cloudera.org:8080/#/c/7530/4/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

Line 159: dt_ctx->now = context->impl()->state()->now();
We should be able to do this during the Prepare() phase of this function. It looks like that's implemented in UnixAndFromUnixPrepare() according to the function registry: https://github.com/apache/incubator-impala/blob/master/common/function-registry/impala_functions.py#L246

http://gerrit.cloudera.org:8080/#/c/7530/4/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

Line 449: if (tok_len <= 2) {
You can do this on one line (see https://google.github.io/styleguide/cppguide.html#Conditionals)

    if (tok_len <= 2) realign_year = true;

PS4, Line 549: dt_ctx.now->date()
Let's factor this common subexpression out into a separate variable for readability.

Line 551: dt_result->year += (century_start_year / 100) * 100;
Could you add a comment giving an example of what the value might be at this point. It's subtle.

Line 555: if (TimestampValue(parsed_date, parsed_time) <
Could you add a comment giving an example of when we need to increment by 100?
http://gerrit.cloudera.org:8080/#/c/7530/4/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

Line 139: /// Current time to determine the actual year when parsing 2-digit year token
nit: add a "." for consistency with other comments.

Line 140: const TimestampValue* now;
Let's just store a copy of the TimestampValue inline instead of the pointer. Makes it easier to understand who owns the memory.

--
To view, visit http://gerrit.cloudera.org:8080/7530
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5da761255915dc741f1dcc488fd4ef6ecc385896
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang
Gerrit-Reviewer: Greg Rahn
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: Yes
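
The ownership point in the Line 140 comment can be illustrated with a small sketch. Everything here is hypothetical: FakeTimestamp stands in for Impala's TimestampValue, and the two context structs are not the real parse context type. It only shows why an inline copy is easier to reason about than a borrowed pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Impala's TimestampValue (assumption, not the real type).
struct FakeTimestamp {
  std::int64_t seconds_since_epoch;
};

// Pointer member: the context only borrows the timestamp, so the reader has
// to work out who owns the pointee and how long it stays valid.
struct ContextWithPointer {
  const FakeTimestamp* now;
};

// Inline copy: the context owns its own value, so its lifetime is simply the
// lifetime of the context itself.
struct ContextWithCopy {
  FakeTimestamp now;
};

// Builds a context that snapshots the query's "now" by value.
ContextWithCopy MakeContext(const FakeTimestamp& query_now) {
  return ContextWithCopy{query_now};
}
```

With the copy, later changes to (or destruction of) the original timestamp cannot affect the context, which is the property the comment is after.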
[Impala-ASF-CR] IMPALA-3894: Changed the behavior parsing 2-digit year values
Greg Rahn has posted comments on this change.

Change subject: IMPALA-3894: Changed the behavior parsing 2-digit year values

Patch Set 4:

IIRC that's the calculation SimpleDateFormat uses for two-digit years, and likely where Hive gets it from. If tests prove that true, LGTM.

--
Gerrit-MessageType: comment
Gerrit-Change-Id: I5da761255915dc741f1dcc488fd4ef6ecc385896
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang
Gerrit-Reviewer: Greg Rahn
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3894: Changed the behavior parsing 2-digit year values
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3894: Changed the behavior parsing 2-digit year values

Patch Set 4:

Greg, you've got a lot of experience looking at date/time functions. What do you think about emulating Hive's behaviour here? It looks like it factors in the current year when determining how to interpret a 2-digit year.

--
Gerrit-MessageType: comment
Gerrit-Change-Id: I5da761255915dc741f1dcc488fd4ef6ecc385896
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang
Gerrit-Reviewer: Greg Rahn
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3894: Changed the behavior parsing 2-digit year values
Tianyi Wang has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7530

Change subject: IMPALA-3894: Changed the behavior parsing 2-digit year values

IMPALA-3894: Changed the behavior parsing 2-digit year values

This patch changes the behavior of the unix_timestamp(string, string) function when it parses a 2-digit year. Before the change, Impala added 2000 directly to the parsed year. After the change, the behavior matches Hive's: the parsed date is shifted into the interval [current time - 80 years, current time + 20 years).

Given query
> select from_unixtime(unix_timestamp('31-AUG-94', 'dd-MMM-yy'),'MMdd');
Impala would output 20940831 before the change and 19940831 with this patch applied.

The unix_timestamp function with other forms of parameters is not affected.

Change-Id: I5da761255915dc741f1dcc488fd4ef6ecc385896
---
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 27 insertions(+), 1 deletion(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7530/4

--
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5da761255915dc741f1dcc488fd4ef6ecc385896
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang
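
The window rule in the commit message can be sketched as follows. This is a minimal illustration, not the code in the patch: SimpleDate and ResolveTwoDigitYear are hypothetical names, time of day is ignored, and the "now" value is passed in explicitly rather than taken from the query context. The arithmetic follows the two lines quoted in the review (round the window start down to its century, then add 100 if the result falls before the window).

```cpp
#include <cassert>
#include <tuple>

// Hypothetical stand-in for a calendar date; not Impala's TimestampValue.
struct SimpleDate {
  int year;
  int month;
  int day;
};

// Orders dates chronologically by (year, month, day).
bool operator<(const SimpleDate& a, const SimpleDate& b) {
  return std::tie(a.year, a.month, a.day) < std::tie(b.year, b.month, b.day);
}

// Resolves a 2-digit year (0..99) so that the resulting date lies in
// [now - 80 years, now + 20 years).
SimpleDate ResolveTwoDigitYear(int yy, int month, int day, const SimpleDate& now) {
  // Start of the 100-year window, e.g. 1937-07-27 when now is 2017-07-27.
  const SimpleDate window_start{now.year - 80, now.month, now.day};
  // Place the year in the century of the window start,
  // e.g. yy = 94 becomes 1900 + 94 = 1994.
  SimpleDate result{yy + (window_start.year / 100) * 100, month, day};
  // If that lands before the window (e.g. 1917-01-01 < 1937-07-27),
  // the date belongs to the next century: 2017-01-01.
  if (result < window_start) result.year += 100;
  return result;
}
```

Assuming now is 2017-07-27, the input '31-AUG-94' resolves to 1994-08-31, while a 2-digit year of 17 with any month and day resolves to 2017, since 1917 is before the window start.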