Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 )
Change subject: IMPALA-5315: Cast to timestamp fails for YYYY-M-D format ...................................................................... Patch Set 14: (17 comments) It looks like we're getting close. http://gerrit.cloudera.org:8080/#/c/7009/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7009/14//COMMIT_MSG@25 PS14, Line 25: in expr-test.cc to be inline with existing tests for CAST() function. I think we still need some end-to-end tests that exercise the new function in the context of a query, and probably also one that scans a text table with this format. Let me know if I can help with setting those up. http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.h@182 PS14, Line 182: parse nit: "to be parsed" here and below. http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@217 PS14, Line 217: const char* TimestampParser::ParseDigitToken(const char* str, const char* str_end){ nit: needs space http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@229 PS14, Line 229: while(tok_end < str_end) { nit: needs space http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@236 PS14, Line 236: bool TimestampParser::ParseFormatTokensByStr(DateTimeFormatContext* dt_ctx){ nit: needs space http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@239 PS14, Line 239: DCHECK(dt_ctx->fmt_len > 0); DCHECK_GT(dt_ctx->fmt_len, 0); http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@240 PS14, Line 240: size DCHECK_EQ(dt_ctx->toks.size(), 0); http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@249 PS14, Line 249: dt_ctx->toks.push_back(DateTimeFormatToken(YEAR,str - str_begin, tok_end - str, nit: needs space http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@250 PS14, Line 250: str)); nit: replace tab with spaces http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@263 PS14, Line 263: dt_ctx->toks.push_back(DateTimeFormatToken(MONTH_IN_YEAR, str - str_begin, tok_end - str, nit: > 90 chars http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@283 PS14, Line 283: dt_ctx->has_date_toks We can just return true here, right? http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@340 PS14, Line 340: factional typo: fractional http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@346 PS14, Line 346: if (tok_end - str > 9) { This is really equivalent to the following, right? int num_digits = min(9, tok_end - str); dt_ctx->toks.push_back(DateTimeFormatToken(FRACTION, str - str_begin, num_digits, str)); http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@357 PS14, Line 357: dt_ctx->has_time_toks this can just be 'true', right? http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@457 PS14, Line 457: NULL nit: nullptr http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@458 PS14, Line 458: lazy_ctx.Reset(str, lazy_len); It's annoying that we have to pay the cost of initializing lazy_ctx when we don't use it. It's probably not that bad, but I'd rather that the cost of this extra functionality was as close to zero as possible. Can we avoid that by restructuring this as: if (dt_ctx != nullptr) { return Parse(str, len, *dt_ctx, d, t); } else { DateTimeFormatContext lazy_ctx; if (ParseFormatTokensByStr(&lazy_ctx)) { // Use lazy_ctx } else { *d = boost::gregorian::date(); *t = boost::posix_time::time_duration(boost::posix_time::not_a_date_time); return false; } } http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@460 PS14, Line 460: dt_ctx = &lazy_ctx; nit: indentation -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 14 Gerrit-Owner: Vincent Tran <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vincent Tran <[email protected]> Gerrit-Comment-Date: Thu, 08 Mar 2018 01:42:29 +0000 Gerrit-HasComments: Yes
