[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 18: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 18 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Tue, 13 Mar 2018 22:10:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. IMPALA-5315: Cast to timestamp fails for -M-D format This change allows casting of a string in 'lazy' date/time format to timestamp. The supported lazy date formats are: -[M]M-[d]d -[M]M-[d]d [H]H:[m]m:[s]s[.S] [H]H:[m]m:[s]s[.S] We will incur a SCAN performance penalty (approximately 1/2 TotalReadThroughput) when the string is in one of these lazy date/time format. Testing: Benchmarked the performance consequence by executing this SQL on a private build over 3.8 billion rows: select min(cast (time_string as timestamp)) from private.impala_5315 Added tests for valid and invalid date/time format strings in expr-test.cc to be inline with existing tests for CAST() function. Added end-to-end tests into exprs.test and select-lazy-timestamp.test to exercise the new function within the context of a query. Added tests to exercise the leading and trailing white space trimming behaviour in default and lazy date/time string format (IMPALA-6630). Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Reviewed-on: http://gerrit.cloudera.org:8080/7009 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h A testdata/data/lazy_timestamp.csv M testdata/workloads/functional-query/queries/QueryTest/exprs.test A testdata/workloads/functional-query/queries/QueryTest/select-lazy-timestamp.test M tests/query_test/test_scanners.py 7 files changed, 398 insertions(+), 7 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 19 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 18: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2092/ -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 18 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Tue, 13 Mar 2018 18:22:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 18: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 18 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Tue, 13 Mar 2018 18:22:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 17: Code-Review+1 Did anyone else want to take a look? I'm happy with the code and I think the test coverage is good. -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 17 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Tue, 13 Mar 2018 00:48:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Hello Lars Volker, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7009 to look at the new patch set (#17). Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. IMPALA-5315: Cast to timestamp fails for -M-D format This change allows casting of a string in 'lazy' date/time format to timestamp. The supported lazy date formats are: -[M]M-[d]d -[M]M-[d]d [H]H:[m]m:[s]s[.S] [H]H:[m]m:[s]s[.S] We will incur a SCAN performance penalty (approximately 1/2 TotalReadThroughput) when the string is in one of these lazy date/time format. Testing: Benchmarked the performance consequence by executing this SQL on a private build over 3.8 billion rows: select min(cast (time_string as timestamp)) from private.impala_5315 Added tests for valid and invalid date/time format strings in expr-test.cc to be inline with existing tests for CAST() function. Added end-to-end tests into exprs.test and select-lazy-timestamp.test to exercise the new function within the context of a query. Added tests to exercise the leading and trailing white space trimming behaviour in default and lazy date/time string format (IMPALA-6630). Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h A testdata/data/lazy_timestamp.csv M testdata/workloads/functional-query/queries/QueryTest/exprs.test A testdata/workloads/functional-query/queries/QueryTest/select-lazy-timestamp.test M tests/query_test/test_scanners.py 7 files changed, 398 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/17 -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 17 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Vincent Tran has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 16: (4 comments) Changed the name of the e2e test to be more appropriate for its purpose. http://gerrit.cloudera.org:8080/#/c/7009/16/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/7009/16/be/src/runtime/timestamp-parse-util.cc@28 PS16, Line 28: #include > Standard C++ headers go before the 3rd-party library headers. I.e. line 19 Done http://gerrit.cloudera.org:8080/#/c/7009/16/testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test File testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test: http://gerrit.cloudera.org:8080/#/c/7009/16/testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test@3 PS16, Line 3: select cast(dt_str as timestamp) from lazy_dt_str > should not need a cast here, we want to test the text scanner parsing the d Done http://gerrit.cloudera.org:8080/#/c/7009/16/testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test@4 PS16, Line 4: RESULTS > Let's make this RESULTS: VERIFY_IS_EQUAL_SORTED Done http://gerrit.cloudera.org:8080/#/c/7009/16/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/7009/16/tests/query_test/test_scanners.py@874 PS16, Line 874: STRING > This should be a timestamp column, so we can make sure that the text scanne Done -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 16 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Tue, 13 Mar 2018 00:23:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 16: (1 comment) http://gerrit.cloudera.org:8080/#/c/7009/16/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/7009/16/be/src/exprs/expr-test.cc@6176 PS16, Line 6176: TestStringValue("monthname(cast('2011-02-18 09:10:11.00' as timestamp))", "February"); > nit: long lines. It might be easier to run the patch through clang-format t NM, these were pulled in with a rebase -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 16 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Mon, 12 Mar 2018 23:57:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 16: I started a dry run of the merge just in case there were any test failures or clang-tidy warnings: https://jenkins.impala.io/job/gerrit-verify-dryrun/2086/ -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 16 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Mon, 12 Mar 2018 21:16:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 16: (5 comments) I think this should be the last round of comments. http://gerrit.cloudera.org:8080/#/c/7009/16/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/7009/16/be/src/exprs/expr-test.cc@6176 PS16, Line 6176: TestStringValue("monthname(cast('2011-02-18 09:10:11.00' as timestamp))", "February"); nit: long lines. It might be easier to run the patch through clang-format to fix some of these whitespace and include ordering issues: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536 http://gerrit.cloudera.org:8080/#/c/7009/16/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/7009/16/be/src/runtime/timestamp-parse-util.cc@28 PS16, Line 28: #include Standard C++ headers go before the 3rd-party library headers. I.e. line 19 above. http://gerrit.cloudera.org:8080/#/c/7009/16/testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test File testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test: http://gerrit.cloudera.org:8080/#/c/7009/16/testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test@3 PS16, Line 3: select cast(dt_str as timestamp) from lazy_dt_str should not need a cast here, we want to test the text scanner parsing the datetime column directly http://gerrit.cloudera.org:8080/#/c/7009/16/testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test@4 PS16, Line 4: RESULTS Let's make this RESULTS: VERIFY_IS_EQUAL_SORTED We don't guarantee the order of returned rows without an order by clause (although in practice it will be in the file order for this query plan). http://gerrit.cloudera.org:8080/#/c/7009/16/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/7009/16/tests/query_test/test_scanners.py@874 PS16, Line 874: STRING This should be a timestamp column, so we can make sure that the text scanner does the conversion correctly. -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 16 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Mon, 12 Mar 2018 21:15:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Hello Lars Volker, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7009 to look at the new patch set (#16). Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. IMPALA-5315: Cast to timestamp fails for -M-D format This change allows casting of a string in 'lazy' date/time format to timestamp. The supported lazy date formats are: -[M]M-[d]d -[M]M-[d]d [H]H:[m]m:[s]s[.S] [H]H:[m]m:[s]s[.S] We will incur a SCAN performance penalty (approximately 1/2 TotalReadThroughput) when the string is in one of these lazy date/time format. Testing: Benchmarked the performance consequence by executing this SQL on a private build over 3.8 billion rows: select min(cast (time_string as timestamp)) from private.impala_5315 Added tests for valid and invalid date/time format strings in expr-test.cc to be inline with existing tests for CAST() function. Added end-to-end tests into exprs.test and cast-lazy-datetime-string.test to exercise the new function within the context of a query. Added tests to exercise the leading and trailing white space trimming behaviour in default and lazy date/time string format (IMPALA-6630). Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h A testdata/data/lazy_timestamp.csv A testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test M testdata/workloads/functional-query/queries/QueryTest/exprs.test M tests/query_test/test_scanners.py 7 files changed, 400 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/16 -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 16 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Hello Lars Volker, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7009 to look at the new patch set (#15). Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. IMPALA-5315: Cast to timestamp fails for -M-D format This change allows casting of a string in 'lazy' date/time format to timestamp. The supported lazy date formats are: -[M]M-[d]d -[M]M-[d]d [H]H:[m]m:[s]s[.S] [H]H:[m]m:[s]s[.S] We will incur a SCAN performance penalty (approximately 1/2 TotalReadThroughput) when the string is in one of these lazy date/time format. Testing: Benchmarked the performance consequence by executing this SQL on a private build over 3.8 billion rows: select min(cast (time_string as timestamp)) from private.impala_5315 Added tests for valid and invalid date/time format strings in expr-test.cc to be inline with existing tests for CAST() function. Added end-to-end tests into exprs.test and cast-lazy-datetime-string.test to exercise the new function within the context of a query. Added tests to exercise the leading and trailing white space trimming behaviour in default and lazy date/time string format (IMPALA-6630). Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h A testdata/data/lazy_timestamp.csv A testdata/workloads/functional-query/queries/QueryTest/cast-lazy-datetime-string.test M testdata/workloads/functional-query/queries/QueryTest/exprs.test M tests/query_test/test_scanners.py 7 files changed, 399 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/15 -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 15 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Vincent Tran has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 14: (18 comments) http://gerrit.cloudera.org:8080/#/c/7009/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7009/14//COMMIT_MSG@25 PS14, Line 25: in expr-test.cc to be inline with existing tests for CAST() function. > I think we still need some end-to-end tests that exercise the new function Done http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/exprs/expr-test.cc@3086 PS14, Line 3086: TestIsNull("cast('1909-10-2 12:32:1.111.111.2' as timestamp)", TYPE_TIMESTAMP); > I spent some time trying to break the code. No luck yet. I realised we shou Done http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.h@182 PS14, Line 182: parse > nit: "to be parsed" here and below. Done http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@217 PS14, Line 217: const char* TimestampParser::ParseDigitToken(const char* str, const char* str_end){ > nit: needs space Done http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@229 PS14, Line 229: while(tok_end < str_end) { > nit: needs space Done http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@236 PS14, Line 236: bool TimestampParser::ParseFormatTokensByStr(DateTimeFormatContext* dt_ctx){ > nit: needs space Done http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@239 PS14, Line 239: DCHECK(dt_ctx->fmt_len > 0); > DCHECK_GT(dt_ctx->fmt_len, 0); Done http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@240 PS14, Line 240: size > DCHECK_EQ(dt_ctx->toks.size(), 0); Done http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@249 PS14, Line 249: dt_ctx->toks.push_back(DateTimeFormatToken(YEAR,str - str_begin, tok_end - str, > nit: needs space Done http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@250 PS14, Line 250: str)); > nit: replace tab with spaces Done http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@263 PS14, Line 263: dt_ctx->toks.push_back(DateTimeFormatToken(MONTH_IN_YEAR, str - str_begin, tok_end - str, > nit: > 90 chars Done http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@283 PS14, Line 283: dt_ctx->has_date_toks > We can just return true here, right? Indeed. Fixed. http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@340 PS14, Line 340: factional > typo: fractional Done http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@346 PS14, Line 346: if (tok_end - str > 9) { > This is really equivalent to the following, right? That is definitely more succinct. Fixed. http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@357 PS14, Line 357: dt_ctx->has_time_toks > this can just be 'true', right? Done http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@457 PS14, Line 457: NULL > nit: nullptr Done http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@458 PS14, Line 458: lazy_ctx.Reset(str, lazy_len); > It's annoying that we have to pay the cost of initializing lazy_ctx when we Done http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@460 PS14, Line 460: dt_ctx = &lazy_ctx; > nit: indentation Done -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 14 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Sat, 10 Mar 2018 23:18:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 14: (1 comment) http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/exprs/expr-test.cc@3086 PS14, Line 3086: TestIsNull("cast('1909-10-2 12:32:1.111.111.2' as timestamp)", TYPE_TIMESTAMP); I spent some time trying to break the code. No luck yet. I realised we should systematically test invalid formats, to try and exercise all of the exit paths out of your function. Anyway, I stamped out some tests to convince myself of the correctness. I think we should add them to this patchset: https://github.com/timarmstrong/impala/tree/timestamp-parse-tests I did find some interesting behaviour though with leading and trailing whitespace: IMPALA-6630. Can you add tests for that behaviour with the lazy and non-lazy formats to make sure we cover those code paths and don't accidentally break the current behaviour. -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 14 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Fri, 09 Mar 2018 01:51:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 14: (17 comments) It looks like we're getting close. http://gerrit.cloudera.org:8080/#/c/7009/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7009/14//COMMIT_MSG@25 PS14, Line 25: in expr-test.cc to be inline with existing tests for CAST() function. I think we still need some end-to-end tests that exercise the new function in the context of a query, and probably also one that scans a text table with this format. Let me know if I can help with setting those up. http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.h@182 PS14, Line 182: parse nit: "to be parsed" here and below. http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@217 PS14, Line 217: const char* TimestampParser::ParseDigitToken(const char* str, const char* str_end){ nit: needs space http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@229 PS14, Line 229: while(tok_end < str_end) { nit: needs space http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@236 PS14, Line 236: bool TimestampParser::ParseFormatTokensByStr(DateTimeFormatContext* dt_ctx){ nit: needs space http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@239 PS14, Line 239: DCHECK(dt_ctx->fmt_len > 0); DCHECK_GT(dt_ctx->fmt_len, 0); http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@240 PS14, Line 240: size DCHECK_EQ(dt_ctx->toks.size(), 0); http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@249 PS14, Line 249: dt_ctx->toks.push_back(DateTimeFormatToken(YEAR,str - str_begin, tok_end - str, nit: needs space http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@250 PS14, Line 250: str)); nit: replace tab with spaces http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@263 PS14, Line 263: dt_ctx->toks.push_back(DateTimeFormatToken(MONTH_IN_YEAR, str - str_begin, tok_end - str, nit: > 90 chars http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@283 PS14, Line 283: dt_ctx->has_date_toks We can just return true here, right? http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@340 PS14, Line 340: factional typo: fractional http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@346 PS14, Line 346: if (tok_end - str > 9) { This is really equivalent to the following, right? int num_digits = min(9, tok_end - str); dt_ctx->toks.push_back(DateTimeFormatToken(FRACTION, str - str_begin, num_digits, str)); http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@357 PS14, Line 357: dt_ctx->has_time_toks this can just be 'true', right? http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@457 PS14, Line 457: NULL nit: nullptr http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@458 PS14, Line 458: lazy_ctx.Reset(str, lazy_len); It's annoying that we have to pay the cost of initializing lazy_ctx when we don't use it. It's probably not that bad, but I'd rather that the cost of this extra functionality was as close to zero as possible. Can we avoid that by restructuring this as: if (dt_ctx != nullptr) { return Parse(str, len, *dt_ctx, d, t); } else { DateTimeFormatContext lazy_ctx; if (ParseFormatTokensByStr(&lazy_ctx)) { // Use lazy_ctx } else { *d = boost::gregorian::date(); *t = boost::posix_time::time_duration(boost::posix_time::not_a_date_time); return false; } } http://gerrit.cloudera.org:8080/#/c/7009/14/be/src/runtime/timestamp-parse-util.cc@460 PS14, Line 460: dt_ctx = &lazy_ctx; nit: indentation -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 14 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Thu, 08 Mar 2018 01:42:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 14: I've been neglecting reviews - need to get back to this one. -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 14 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Sat, 03 Mar 2018 01:20:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Hello Lars Volker, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7009 to look at the new patch set (#14). Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. IMPALA-5315: Cast to timestamp fails for -M-D format This change allows casting of a string in 'lazy' date/time format to timestamp. The supported lazy date formats are: -[M]M-[d]d -[M]M-[d]d [H]H:[m]m:[s]s[.S] [H]H:[m]m:[s]s[.S] We will incur a SCAN performance penalty (approximately 1/2 TotalReadThroughput) when the string is in one of these lazy date/time format. Testing: Benchmarked the performance consequence by executing this SQL on a private build over 3.8 billion rows: select min(cast (time_string as timestamp)) from private.impala_5315 Added tests for valid and invalid date/time format strings in expr-test.cc to be inline with existing tests for CAST() function. Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 224 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/14 -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 14 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Vincent Tran has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 12: (10 comments) http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/exprs/expr-test.cc@3010 PS12, Line 3010: TimestampValue::Parse("2001-01-21 01:05:1.11211")); > There's some attempt in the code to handle trailing timezone offsets. Can w Removed timezone offset support as it was not part of the request from the Jira. http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.h@179 PS12, Line 179: /// Parse a default date/time string. The default timestamp format is: > This comment doesn't seem to match the function. Looks like I inserted my function header between a comment and its function. Fixed http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.h@191 PS12, Line 191: static bool ParseFormatTokensByStr(DateTimeFormatContext* dt_ctx); > Nit: add a space before the next function for consistency. Done http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@218 PS12, Line 218: bool TimestampParser::ParseFormatTokensByStr(DateTimeFormatContext* dt_ctx){ > I'm really struggling to understand this function, particularly in what cas Done. Changed to serial parsing. http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@229 PS12, Line 229: if ((*str == 'T') || (*str == 'Z') || (!isalnum(*str))) { > Do we really need to handle arbitrary separators? It seems like this will a No. Changed to only accept '-' for date separator, ':' for time separator and '.' for the fractional second separator. http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@232 PS12, Line 232: dt_ctx->toks.push_back(DateTimeFormatToken(TZ_OFFSET, str - str_begin, > Do we need to support timezone offsets here? I guess we don't have to - based on the Jira's description. Removed. http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@282 PS12, Line 282: if (tok_len == 1) ++dt_ctx->fmt_out_len; > Do we need to compute fmt_out_len for this case? No. Removed http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@292 PS12, Line 292: // lazy_ctx is made local so that we do not run into races where > We can remove this comment (it's interesting but won't have much future val Done http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@395 PS12, Line 395: > nit: trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@461 PS12, Line 461: date TimestampParser::RealignYear(const DateTimeParseResult& dt_result, > nit: can you move this back to the previous location. It's annoying that th Done -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 12 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Sat, 10 Feb 2018 23:59:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Hello Lars Volker, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7009 to look at the new patch set (#13). Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. IMPALA-5315: Cast to timestamp fails for -M-D format This change allows casting of a string in 'lazy' date/time format to timestamp. The supported lazy date formats are: -[M]M-[d]d -[M]M-[d]d [H]H:[m]m:[s]s[.S] [H]H:[m]m:[s]s[.S] We will incur a SCAN performance penalty (approximately 1/2 TotalReadThroughput) when the string is in one of these lazy date/time format. Testing: Benchmarked the performance consequence by executing this SQL on a private build over 3.8 billion rows: select min(cast (time_string as timestamp)) from private.impala_5315 Added tests for valid and invalid date/time format strings in expr-test.cc to be inline with existing tests for CAST() function. Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 223 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/13 -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 13 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 12: (10 comments) Sorry for the delay here. This is surprisingly tricky given the way that the existing code worked. http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/exprs/expr-test.cc@3010 PS12, Line 3010: TimestampValue::Parse("2001-01-21 01:05:1.11211")); There's some attempt in the code to handle trailing timezone offsets. Can we add a positive or negative test for that case? http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.h@179 PS12, Line 179: /// Parse a default date/time string. The default timestamp format is: This comment doesn't seem to match the function. I think this also needs a high-level explanation of what the function does. My current understanding is that it returns a set of format tokens that *may* allow us to parse the string and leaves the final validation to the parsing step. http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.h@191 PS12, Line 191: static bool ParseFormatTokensByStr(DateTimeFormatContext* dt_ctx); Nit: add a space before the next function for consistency. http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@218 PS12, Line 218: bool TimestampParser::ParseFormatTokensByStr(DateTimeFormatContext* dt_ctx){ I'm really struggling to understand this function, particularly in what cases it accepts or rejects the tokens. The function this was based off of was a lot more complex because it needed to handle tokens in varying order. Given that the order of tokens we expect to see is fixed, I think this function could be made vastly easier to understand by getting rid of the loop and checking the tokens in order. I.e. it should be something like: // Parse a 4-digit year. tok_end = ParseDigitToken(str, str_end) if (tok_end - str != 4) return ...; dt_ctx->toks.push_back(DateTimeFormatToken(YEAR, ...)); str = tok_end; ParseSeparator(str, str_end, "-", ...); // Parse a 1 or 2 digit month. tok_end = ParseDigitToken(str, str_end) ... http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@229 PS12, Line 229: if ((*str == 'T') || (*str == 'Z') || (!isalnum(*str))) { Do we really need to handle arbitrary separators? It seems like this will allow a lot of weird input. It seems bad to be overly permissive. I asked a question in the JIRA about this. http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@232 PS12, Line 232: dt_ctx->toks.push_back(DateTimeFormatToken(TZ_OFFSET, str - str_begin, Do we need to support timezone offsets here? http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@282 PS12, Line 282: if (tok_len == 1) ++dt_ctx->fmt_out_len; Do we need to compute fmt_out_len for this case? http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@292 PS12, Line 292: // lazy_ctx is made local so that we do not run into races where We can remove this comment (it's interesting but won't have much future value). http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@395 PS12, Line 395: nit: trailing whitespace http://gerrit.cloudera.org:8080/#/c/7009/12/be/src/runtime/timestamp-parse-util.cc@461 PS12, Line 461: date TimestampParser::RealignYear(const DateTimeParseResult& dt_result, nit: can you move this back to the previous location. It's annoying that the order doesn't match the header but moving code around makes maintaining the code a bit harder because you end up with false changes in the history. -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 12 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Fri, 26 Jan 2018 00:03:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Hello Lars Volker, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7009 to look at the new patch set (#12). Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. IMPALA-5315: Cast to timestamp fails for -M-D format This change allows casting of a string in 'lazy' date/time format to timestamp. The supported lazy date formats are: -[M]M-[d]d -[M]M-[d]d [H]H:[m]m:[s]s[.S] -[M]M-[d]dT[H]H:[m]m:[s]s[.S] We will also take a variety of separators (including "-", "/", "T", "Z", ":") We will incur a SCAN performance penalty (approximately 1/2 TotalReadThroughput) when the string is in one of these lazy date/time format. Testing: Benchmarked the performance consequence by executing this SQL on a private build over 3.8 billion rows: select min(cast (time_string as timestamp)) from private.impala_5315 Added tests for valid and invalid date/time format strings in expr-test.cc to be inline with existing tests for CAST() function. Finally, this change also moves the following two functions to be adjacent to each other for readability: Parse(const char* str, int len, boost::gregorian::date* d, boost::posix_time::time_duration* t); Parse(const char* str, int len, const DateTimeFormatContext& dt_ctx, boost::gregorian::date* d, boost::posix_time::time_duration* t); Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 156 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/12 -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 12 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Vincent Tran has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 10: (5 comments) http://gerrit.cloudera.org:8080/#/c/7009/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7009/10//COMMIT_MSG@39 PS10, Line 39: Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f > I think we need some additional tests for bad data and some query tests tha Sorry about that. Fixed obvious crashes due to perpetual retry and wrong result bugs (in the fractional component) due to a persistent CACHED_CTX after previous failed parse(). Added bad data tests. When you say "query tests that scan tables with mixed-format" do you mean a private test on my console, or actually creating a mixed-format table in one of the test databases as part of the patch? http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.h@267 PS10, Line 267: static DateTimeFormatContext LAZY_CTX; > Having this as a static variable doesn't seem right, since it will be share This is an interesting challenge that I did not take into account. I have made lazy_ctx local and CACHED_CTX global. We accept that CACHED_CTX can be modified by multiple threads. In which case thread-local lazy_ctx will need to be re-parsed. 1) In most scenarios, all values in a data/time string column should share the same format, CACHED_CTX will unlikely to be modified. 2) In scenarios where values in date/time string columns are of mixed format, CAHCED_CTX will be a best-effort optimization. In the worst case, CACHED_CTX will always fail to match the current format so we will take the same perf. hit as the path without this optimization. http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc@378 PS10, Line 378: LIKELY > LIKELY doesn't seem right here - I think we can just remove it. Done http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc@381 PS10, Line 381: if (LAZY_CTX.fmt != NULL) dt_ctx = &LAZY_CTX; > nit: we use braces on both branches if there is an else. Done http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc@383 PS10, Line 383: LAZY_CTX.Reset(str, len); > nit: missing indentation. Done -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 10 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Mon, 18 Dec 2017 01:51:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Hello Lars Volker, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7009 to look at the new patch set (#11). Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. IMPALA-5315: Cast to timestamp fails for -M-D format This change allows casting of a string in 'lazy' date format to timestamp. The supported lazy date formats are: -[M]M-[d]d -[M]M-[d]d [H]H:[m]m:[s]s[.S] -[M]M-[d]dT[H]H:[m]m:[s]s[.S] We will also take a variety of separators (including "-", "/", "T", "Z", ":") The parse logic will: 1) Try to use the default template timestamp formats first since it is more performant. 2) Try to reuse the previously generated lazy timestamp format (if exists) to maintain comparable SCAN performance if all values in column have the same timestamp format. 3) Discover the new template format if both prior paths fail. In the worst case, where we have to generate a new timestamp format for each value in a column (columns with variable timestamp format), we will incur a SCAN performance penalty (approximately 1/2 TotalReadThroughput). Finally, this change also moves the following two functions to be adjacent to each other for readability: Parse(const char* str, int len, boost::gregorian::date* d, boost::posix_time::time_duration* t); Parse(const char* str, int len, const DateTimeFormatContext& dt_ctx, boost::gregorian::date* d, boost::posix_time::time_duration* t); Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 205 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/11 -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 11 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 10: (5 comments) http://gerrit.cloudera.org:8080/#/c/7009/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7009/10//COMMIT_MSG@39 PS10, Line 39: Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f I think we need some additional tests for bad data and some query tests that scan tables with mixed-format timestamps to test that it correctly switches to a different context. I was able to crash things with the current patch fairly easily. Repro: drop table if exists timestamp_strs; create table timestamp_strs as select '2001-1-2' as str; insert into timestamp_strs select '2001-10-2'; insert into timestamp_strs select * from timestamp_strs; insert into timestamp_strs select * from timestamp_strs; insert into timestamp_strs select * from timestamp_strs; insert into timestamp_strs select * from timestamp_strs; insert into timestamp_strs select '2001-foo-bar'; select str, cast(str as timestamp) from timestamp_strs; http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.h@267 PS10, Line 267: static DateTimeFormatContext LAZY_CTX; Having this as a static variable doesn't seem right, since it will be shared between multiple threads, which will updated it. I don't see a particularly easy place to put the last context so that it's local to the thread/column that's executing Parse(). I'm wondering if it's just easiest to have lazy_ctx be a local variable that we create on the fly - we'd take a 50% perf hit for the lazy formats according to your benchmark but it's not clear to me that the optimisation is worth the extra complexity of having to manage LAZY_CTX. http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc@378 PS10, Line 378: LIKELY LIKELY doesn't seem right here - I think we can just remove it. http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc@381 PS10, Line 381: if (LAZY_CTX.fmt != NULL) dt_ctx = &LAZY_CTX; nit: we use braces on both branches if there is an else. http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc@383 PS10, Line 383: LAZY_CTX.Reset(str, len); nit: missing indentation. -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 10 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Thu, 14 Dec 2017 23:21:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Hello Lars Volker, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7009 to look at the new patch set (#10). Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. IMPALA-5315: Cast to timestamp fails for -M-D format This change allows casting of a string in 'lazy' date format to timestamp. The supported lazy date formats are: -[M]M-[d]d -[M]M-[d]d [H]H:[m]m:[s]s[.S] -[M]M-[d]dT[H]H:[m]m:[s]s[.S] We will also take a variety of separators (including "-", "/", "T", "Z", ":") The parse logic will: 1) Try to use the default template timestamp formats first since it is more performant. 2) Try to reuse the previously generated lazy timestamp format (if exists) to maintain comparable SCAN performance if all values in column have the same timestamp format. 3) Discover the new template format if both prior paths fail. In the worst case, where we have to generate a new timestamp format for each value in a column (columns with variable timestamp format), we will incur a SCAN performance penalty (approximately 1/2 TotalReadThroughput). Finally, this change also moves the following two functions to be adjacent to each other for readability: Parse(const char* str, int len, boost::gregorian::date* d, boost::posix_time::time_duration* t); Parse(const char* str, int len, const DateTimeFormatContext& dt_ctx, boost::gregorian::date* d, boost::posix_time::time_duration* t); Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 152 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/10 -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 10 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Hello Lars Volker, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7009 to look at the new patch set (#9). Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. IMPALA-5315: Cast to timestamp fails for -M-D format This change allows casting of a string in 'lazy' date format to timestamp. The supported lazy date formats are: -[M]M-[d]d -[M]M-[d]d [H]H:[m]m:[s]s[.S] -[M]M-[d]dT[H]H:[m]m:[s]s[.S] We will also take a variety of separators (including "-", "/", "T", "Z", ":") The parse logic will: 1) Try to use the default template timestamp formats first since it is more performant. 2) Try to reuse the previously generated lazy timestamp format (if exists) to maintain comparable SCAN performance if all values in column have the same timestamp format. 3) Discover the new template format if both prior paths fail. In the worst case, where we have to generate a new timestamp format for each value in a column (columns with variable timestamp format), we will incur a SCAN performance penalty (approximately 1/2 TotalReadThroughput). Finally, this change also moves the following two functions to be adjacent to each other for readability: Parse(const char* str, int len, boost::gregorian::date* d, boost::posix_time::time_duration* t); Parse(const char* str, int len, const DateTimeFormatContext& dt_ctx, boost::gregorian::date* d, boost::posix_time::time_duration* t); Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 199 insertions(+), 71 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/9 -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 9 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Vincent Tran has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/7009/5/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/7009/5/be/src/runtime/timestamp-parse-util.cc@294 PS5, Line 294: *t = boost::posix_time::time_duration(boost::posix_time::not_a_date_time); > This should use a stringstream, otherwise each append may result in a copy Refactored this. -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 8 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Thu, 14 Dec 2017 00:00:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Hello Lars Volker, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7009 to look at the new patch set (#8). Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. IMPALA-5315: Cast to timestamp fails for -M-D format This change allows casting of a string in 'lazy' date format to timestamp. The supported lazy date formats are: -[M]M-[d]d -[M]M-[d]d [H]H:[m]m:[s]s[.S] -[M]M-[d]dT[H]H:[m]m:[s]s[.S] We will also take a variety of separators (including "-", "/", "T", "Z", ":") The parse logic will: 1) Try to use the default template timestamp formats first since it is more performant. 2) Try to reuse the previously generated lazy timestamp format (if exists) to maintain comparable SCAN performance if all values in column have the same timestamp format. 3) Discover the new template format if both prior paths fail. In the worst case, where we have to generate a new timestamp format for each value in a column (collumns withvariable timestamp format), we will incur a SCAN performance decrease (approximately 1/2 TotalReadThroughput). Finally, this change also moves the following two functions to be adjacent to each other for readability: Parse(const char* str, int len, boost::gregorian::date* d, boost::posix_time::time_duration* t); Parse(const char* str, int len, const DateTimeFormatContext& dt_ctx, boost::gregorian::date* d, boost::posix_time::time_duration* t); Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 199 insertions(+), 71 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/8 -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 8 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Hello Lars Volker, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7009 to look at the new patch set (#7). Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. IMPALA-5315: Cast to timestamp fails for -M-D format This change allows casting of a string in 'lazy' date format to timestamp. The supported lazy date formats are: -[M]M-[d]d -[M]M-[d]d [H]H:[m]m:[s]s[.S] -[M]M-[d]dT[H]H:[m]m:[s]s[.S] We will also take a variety of separators (including "-", "/", "T", "Z", ":") The parse logic will: 1) Try to use the default template timestamp formats first since it is more performant. 2) Try to reuse the previously generated lazy timestamp format (if exists) to maintain comparable SCAN performance if all values in column have the same timestamp format. 3) Discover the new template format if both prior paths fail. In the worst case, where we have to generate a new timestamp format for each value in a column (collumns with variable timestamp format), we will incur a SCAN performance decrease (apprimately 1/2 TotalReadThroughput). Finally, this change also moves the following two functions to be adjacent to each other: Parse(const char* str, int len, boost::gregorian::date* d, boost::posix_time::time_duration* t); Parse(const char* str, int len, const DateTimeFormatContext& dt_ctx, boost::gregorian::date* d, boost::posix_time::time_duration* t); Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 199 insertions(+), 71 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/7 -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 7 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Vincent Tran has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 6: The alternative here is that we only take this perf hit when non-default format is used and still maintain the previous default templates. -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 6 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Wed, 13 Dec 2017 16:50:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Vincent Tran has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 6: I incorporated the same heuristic into ParseFormatTokens() to eliminate the usage of default format strings. We observe some marginal gain of 16% in TotalReadThroughput: - TotalReadThroughput: 11.69 MB/sec @Tim, since you wrote the original ParseFormatTokens(), do you see any additional places where we can achieve more marginal gain? Also, do you prefer the rewrite of the existing function or creating a new one such as in the current patch set? -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 6 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Wed, 13 Dec 2017 16:40:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Hello Lars Volker, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7009 to look at the new patch set (#6). Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. IMPALA-5315: Cast to timestamp fails for -M-D format This change allows casting of a string in 'lazy' date format to timestamp. The supported lazy date formats are: -[M]M-[d]d -[M]M-[d]d [H]H:[m]m:[s]s[.S] -[M]M-[d]dT[H]H:[m]m:[s]s[.S] Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 137 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/6 -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 6 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 5: Thanks for the update Vincent - agree that that's a bad perf hit. -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 5 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Tue, 12 Dec 2017 23:35:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Vincent Tran has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 5: Looks like the slow down is due to the 2x slower TotalReadThroughput from the SCAN node due to LAZY_CTX.Reset() being called for each value vs. called once for each DateTimeFormatContext in TimestampParser::Init() when the format string is stamped out. New path - TotalReadThroughput: 10.03 MB/sec Old path - TotalReadThroughput: 20.75 MB/sec Will look for alternatives. -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 5 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Tue, 12 Dec 2017 02:20:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Vincent Tran has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 5: I ran 3 trials on both the old path and the new path (on-the-flight template generation for all cases) select min(cast (time_string as timestamp)) from private.impala_5315 Where impala_5315 contains 3.87B rows. The result is pretty consistent between the three trials: The AGGR node takes 2x the time to run when the template needs to be generated on the flight. The time difference is in AGGR NODE BuildTime. I haven't figured out why this is yet. Looking for input and ideas. Old path: http://github.mtv.cloudera.com/gist/vttran/32f18cecd5b277dade0da72fa561084d New path: http://github.mtv.cloudera.com/gist/vttran/91de8682521e405f230f565a00bcd22a -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 5 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Fri, 08 Dec 2017 21:14:00 + Gerrit-HasComments: No