Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
......................................................................


Patch Set 12:

(10 comments)

Sorry for the delay here. This is surprisingly tricky given the way that the 
existing code worked.

http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/exprs/expr-test.cc@3010
PS12, Line 3010:               TimestampValue::Parse("2001-01-21 
01:05:1.11211"));
There's some attempt in the code to handle trailing timezone offsets. Can we 
add a positive or negative test for that case?


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.h@179
PS12, Line 179:   /// Parse a default date/time string. The default timestamp 
format is:
This comment doesn't seem to match the function.

I think this also needs a high-level explanation of what the function does. My 
current understanding is that it returns a set of format tokens that *may* 
allow us to parse the string and leaves the final validation to the parsing 
step.


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.h@191
PS12, Line 191:   static bool ParseFormatTokensByStr(DateTimeFormatContext* 
dt_ctx);
Nit: add a space before the next function for consistency.


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@218
PS12, Line 218: bool 
TimestampParser::ParseFormatTokensByStr(DateTimeFormatContext* dt_ctx){
I'm really struggling to understand this function, particularly in what cases 
it accepts or rejects the tokens.

The function this was based off of was a lot more complex because it needed to 
handle tokens in varying order.

Given that the order of tokens we expect to see is fixed, I think this function 
could be made vastly easier to understand by getting rid of the loop and 
checking the tokens in order. I.e. it should be something like:

  // Parse a 4-digit year.
  tok_end = ParseDigitToken(str, str_end)
  if (tok_end - str != 4) return ...;
  dt_ctx->toks.push_back(DateTimeFormatToken(YEAR, ...));
  str = tok_end;
  ParseSeparator(str, str_end, "-", ...);

  // Parse a 1 or 2 digit month.
  tok_end = ParseDigitToken(str, str_end)
  ...


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@229
PS12, Line 229:     if ((*str == 'T') || (*str == 'Z') || (!isalnum(*str))) {
Do we really need to handle arbitrary separators? It seems like this will allow 
a lot of weird input. It seems bad to be overly permissive.

I asked a question in the JIRA about this.


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@232
PS12, Line 232:         dt_ctx->toks.push_back(DateTimeFormatToken(TZ_OFFSET, 
str - str_begin,
Do we need to support timezone offsets here?


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@282
PS12, Line 282:     if (tok_len == 1) ++dt_ctx->fmt_out_len;
Do we need to compute fmt_out_len for this case?


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@292
PS12, Line 292:   // lazy_ctx is made local so that we do not run into races 
where
We can remove this comment (it's interesting but won't have much future value).


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@395
PS12, Line 395:
nit: trailing whitespace


http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@461
PS12, Line 461: date TimestampParser::RealignYear(const DateTimeParseResult& 
dt_result,
nit: can you move this back to the previous location. It's annoying that the 
order doesn't match the header but moving code around makes maintaining the 
code a bit harder because you end up with false changes in the history.



--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 12
Gerrit-Owner: Vincent Tran <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vincent Tran <[email protected]>
Gerrit-Comment-Date: Fri, 26 Jan 2018 00:03:26 +0000
Gerrit-HasComments: Yes

Reply via email to