Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 )
Change subject: IMPALA-5315: Cast to timestamp fails for YYYY-M-D format ...................................................................... Patch Set 12: (10 comments) Sorry for the delay here. This is surprisingly tricky given the way that the existing code worked. http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/exprs/expr-test.cc@3010 PS12, Line 3010: TimestampValue::Parse("2001-01-21 01:05:1.11211")); There's some attempt in the code to handle trailing timezone offsets. Can we add a positive or negative test for that case? http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.h@179 PS12, Line 179: /// Parse a default date/time string. The default timestamp format is: This comment doesn't seem to match the function. I think this also needs a high-level explanation of what the function does. My current understanding is that it returns a set of format tokens that *may* allow us to parse the string and leaves the final validation to the parsing step. http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.h@191 PS12, Line 191: static bool ParseFormatTokensByStr(DateTimeFormatContext* dt_ctx); Nit: add a space before the next function for consistency. http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@218 PS12, Line 218: bool TimestampParser::ParseFormatTokensByStr(DateTimeFormatContext* dt_ctx){ I'm really struggling to understand this function, particularly in what cases it accepts or rejects the tokens. The function this was based off of was a lot more complex because it needed to handle tokens in varying order. Given that the order of tokens we expect to see is fixed, I think this function could be made vastly easier to understand by getting rid of the loop and checking the tokens in order. I.e. it should be something like: // Parse a 4-digit year. tok_end = ParseDigitToken(str, str_end) if (tok_end - str != 4) return ...; dt_ctx->toks.push_back(DateTimeFormatToken(YEAR, ...)); str = tok_end; ParseSeparator(str, str_end, "-", ...); // Parse a 1 or 2 digit month. tok_end = ParseDigitToken(str, str_end) ... http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@229 PS12, Line 229: if ((*str == 'T') || (*str == 'Z') || (!isalnum(*str))) { Do we really need to handle arbitrary separators? It seems like this will allow a lot of weird input. It seems bad to be overly permissive. I asked a question in the JIRA about this. http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@232 PS12, Line 232: dt_ctx->toks.push_back(DateTimeFormatToken(TZ_OFFSET, str - str_begin, Do we need to support timezone offsets here? http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@282 PS12, Line 282: if (tok_len == 1) ++dt_ctx->fmt_out_len; Do we need to compute fmt_out_len for this case? http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@292 PS12, Line 292: // lazy_ctx is made local so that we do not run into races where We can remove this comment (it's interesting but won't have much future value). http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@395 PS12, Line 395: nit: trailing whitespace http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@461 PS12, Line 461: date TimestampParser::RealignYear(const DateTimeParseResult& dt_result, nit: can you move this back to the previous location. It's annoying that the order doesn't match the header but moving code around makes maintaining the code a bit harder because you end up with false changes in the history. -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 12 Gerrit-Owner: Vincent Tran <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vincent Tran <[email protected]> Gerrit-Comment-Date: Fri, 26 Jan 2018 00:03:26 +0000 Gerrit-HasComments: Yes
