[Impala-ASF-CR] IMPALA-3894: Changed the behavior parsing 2-digit year values

2017-07-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3894: Changed the behavior parsing 2-digit year values
..


Patch Set 4:

(8 comments)

Looks pretty good. No major concerns, we just need some tests and a bit of 
cleanup to make it easier to follow.

http://gerrit.cloudera.org:8080/#/c/7530/4//COMMIT_MSG
Commit Message:

Line 16: > select from_unixtime(unix_timestamp('31-AUG-94', 
'dd-MMM-yy'),'MMdd');
Can you add some tests in expr-test for some different cases to cover all of 
the code paths?

I think you can make the value of "now()" deterministic if you set up a 
RuntimeState(), and pass in a TQueryCtx with now_string set.


http://gerrit.cloudera.org:8080/#/c/7530/4/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

Line 159:   dt_ctx->now = context->impl()->state()->now();
We should be able to do this during the Prepare() phase of this function. It 
looks like that's implemented in UnixAndFromUnixPrepare() according to the 
function registry: 
https://github.com/apache/incubator-impala/blob/master/common/function-registry/impala_functions.py#L246


http://gerrit.cloudera.org:8080/#/c/7530/4/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

Line 449: if (tok_len <= 2) {
You can do this on one line (see 
https://google.github.io/styleguide/cppguide.html#Conditionals)

  if (tok_len <= 2) realign_year = true;


PS4, Line 549: dt_ctx.now->date()
Let's factor this common subexpression out into a separate variable for 
readability.


Line 551: dt_result->year += (century_start_year / 100) * 100;
Could you add a comment giving an example of what the value might be at this 
point? It's subtle.


Line 555: if (TimestampValue(parsed_date, parsed_time) <
Could you add a comment giving an example of when we need to increment by 100?
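
As an illustration of the two comments requested above, here is a minimal 
standalone sketch of the realignment arithmetic. It is not the patch's code: 
SimpleDate and RealignTwoDigitYear are hypothetical names, time of day is 
ignored, and the window [now - 80 years, now + 20 years) is taken from the 
commit message.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical stand-in for a calendar date; not Impala's TimestampValue.
struct SimpleDate {
  int year;
  int month;
  int day;
};

// Lexicographic (year, month, day) comparison.
inline bool operator<(const SimpleDate& a, const SimpleDate& b) {
  return std::tie(a.year, a.month, a.day) < std::tie(b.year, b.month, b.day);
}

// Maps a 2-digit year (0..99) into [now - 80 years, now + 20 years).
// Assumption: only the date part matters; the time of day is ignored here.
inline int RealignTwoDigitYear(int two_digit_year, int month, int day,
                               const SimpleDate& now) {
  // Start of the 100-year window, e.g. 1937-07-27 when now is 2017-07-27.
  const SimpleDate century_start = {now.year - 80, now.month, now.day};
  // First guess: the century of the window start, e.g. 94 + 1900 = 1994.
  int year = two_digit_year + (century_start.year / 100) * 100;
  // A guess before the window start belongs to the next century,
  // e.g. 17-01-01 -> 1917-01-01 < 1937-07-27, so it becomes 2017-01-01.
  if (SimpleDate{year, month, day} < century_start) year += 100;
  return year;
}
```

With now fixed at 2017-07-27, '31-AUG-94' stays in 1994, which agrees with 
the 19940831 result quoted in the commit message, while a date such as 
17-01-01 is first read as 1917 and then moved forward to 2017.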


http://gerrit.cloudera.org:8080/#/c/7530/4/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

Line 139:   /// Current time to determine the actual year when parsing 2-digit 
year token
nit: add a "." for consistency with other comments.


Line 140:   const TimestampValue* now;
Let's just store a copy of the TimestampValue inline instead of the pointer. 
Makes it easier to understand who owns the memory.
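
A small sketch of the difference, using hypothetical stand-in types rather 
than the real TimestampValue and the real parse context struct (FakeTimestamp, 
ContextWithPointer, ContextWithCopy and MakeContext are all made-up names):

```cpp
#include <cassert>

// Hypothetical stand-in for Impala's TimestampValue.
struct FakeTimestamp {
  long long unix_seconds;
};

// Pointer member: the context only borrows the value, so whoever owns the
// pointee must keep it alive for as long as the context is used.
struct ContextWithPointer {
  const FakeTimestamp* now;
};

// Inline member: the context holds its own copy, so there is no separate
// owner and no lifetime to reason about.
struct ContextWithCopy {
  FakeTimestamp now;
};

// Builds a context that snapshots the query's "now" value by copying it.
inline ContextWithCopy MakeContext(const FakeTimestamp& query_now) {
  ContextWithCopy ctx;
  ctx.now = query_now;
  return ctx;
}
```

The copy is unaffected if the original value later changes or goes away, 
whereas the pointer version sees every change and dangles if the original is 
destroyed first.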


-- 
To view, visit http://gerrit.cloudera.org:8080/7530
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5da761255915dc741f1dcc488fd4ef6ecc385896
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3894: Changed the behavior parsing 2-digit year values

2017-07-27 Thread Greg Rahn (Code Review)
Greg Rahn has posted comments on this change.

Change subject: IMPALA-3894: Changed the behavior parsing 2-digit year values
..


Patch Set 4:

IIRC that's the calculation SimpleDateFormat uses for two digit years and 
likely where Hive gets it from.  If tests prove that true, LGTM.

Gerrit-MessageType: comment
Gerrit-Change-Id: I5da761255915dc741f1dcc488fd4ef6ecc385896
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3894: Changed the behavior parsing 2-digit year values

2017-07-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3894: Changed the behavior parsing 2-digit year values
..


Patch Set 4:

Greg, you've got a lot of experience looking at date/time functions.

What do you think about emulating Hive's behaviour here? It looks like it 
factors in the current year when determining how to interpret a 2-digit year.

Gerrit-MessageType: comment
Gerrit-Change-Id: I5da761255915dc741f1dcc488fd4ef6ecc385896
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3894: Changed the behavior parsing 2-digit year values

2017-07-27 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7530

Change subject: IMPALA-3894: Changed the behavior parsing 2-digit year values
..

IMPALA-3894: Changed the behavior parsing 2-digit year values

This patch changes how the unix_timestamp(string, string)
function parses 2-digit years.
Before the change, Impala always added 2000 to the parsed year.
After the change, the behavior is the same as Hive's:
the parsed date is shifted into the interval
[current time - 80 years, current time + 20 years).
Given the query
> select from_unixtime(unix_timestamp('31-AUG-94', 'dd-MMM-yy'),'MMdd');
Impala would output 20940831 before the change
and 19940831 with this patch applied.
Other forms of the unix_timestamp function are not affected.

Change-Id: I5da761255915dc741f1dcc488fd4ef6ecc385896
---
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 27 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7530/4

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5da761255915dc741f1dcc488fd4ef6ecc385896
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang