Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3894: Change the behavior parsing date "YY" ......................................................................
Patch Set 9: (9 comments) A few code comments that I think you can address now, plus a few behavior questions for Greg - just to double check he's OK with this new behavior. Lets get his input on them before proceeding. http://gerrit.cloudera.org:8080/#/c/7530/9//COMMIT_MSG Commit Message: PS9, Line 11: same as Hive's How did you determine exactly what Hive's behavior is - did you find a reference in documentation, or code? Can you add a link? PS9, Line 12: into the interval [current time - 80 years, current time + 20 years). looks like postgresql's behavior is different. I think they just break at the unix epoch. Lets see if Greg is OK with us following Hive's behavior if that differs from Postgres. We may wanna look at a few more databases to see if there is consensus. mj=# select to_date('05 Dec 69', 'DD Mon YY'); to_date ------------ 2069-12-05 (1 row) mj=# select to_date('05 Dec 70', 'DD Mon YY'); to_date ------------ 1970-12-05 (1 row) http://gerrit.cloudera.org:8080/#/c/7530/9/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: PS9, Line 26: # not needed here if it's also included in the header PS9, Line 448: // Taking tok_len > 3 literally is Hive behavior. this comment isn't clear to me PS9, Line 547: unsigned short century_start_year = now_date.year() - 80; : date century_start_date(century_start_year, now_date.month(), now_date.day()); you can use boost operator-(time_duration) this can also be precomputed as part of the DateTimeContext to avoid some cycles on every row PS9, Line 557: TimestampValue just use ptime directly. we're trying to avoid instantiating TimestampValue except for real data (i.e. as part of a tuple) PS9, Line 558: TimestampValue same PS9, Line 558: TimestampValue(century_start_date, dt_ctx.now.time()) it might be worth pre-computing this as part of the DateTimeFormatContext to cut down on the overhead when parsing a lot of rows http://gerrit.cloudera.org:8080/#/c/7530/9/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: PS9, Line 481: // Test 1-digit year format with time to show the exact boundary : (TimestampTC("y-MM-dd HH:mm:ss", "37-07-28 16:14:23", false, true, false, : 2037, 7, 28, 16, 14, 23)) : (TimestampTC("y-MM-dd HH:mm:ss", "37-07-28 16:14:24", false, true, false, : 1937, 7, 28, 16, 14, 24)) hm, postgres does something different with 'y', let's see what Greg wants us to do. It always adds 2000: mj=# select to_date('05 Dec 0', 'DD Mon y'); to_date ------------ 2000-12-05 (1 row) mj=# select to_date('05 Dec 9', 'DD Mon y'); to_date ------------ 2009-12-05 (1 row) mj=# select to_date('05 Dec 99', 'DD Mon y'); to_date ------------ 2099-12-05 (1 row) mj=# select to_date('05 Dec 999', 'DD Mon y'); to_date ------------ 2999-12-05 (1 row) mj=# select to_date('05 Dec 9999', 'DD Mon y'); to_date ------------- 11999-12-05 (1 row) -- To view, visit http://gerrit.cloudera.org:8080/7530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5da761255915dc741f1dcc488fd4ef6ecc385896 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang <tw...@cloudera.com> Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes