[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Gabor Kaszab has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 This patch adds additional datetime format tokens on top of what was introduced with Milestone 1 (IMPALA-8703). The tokens introduced: - Free text token: Surrounded by double quotes, a free text section can be given in the format where the same text is expected in the input without the surrounding double quotes. - FX modifier: This modifier has to be given at the beginning of the format and is valid for the whole format. In a string to datetime conversion this forces strict separator matching and expects all the tokens in the input to have the same length as their maximum length. E.g. A month has to be of length 2 prefixed by zero if needed. This is the default in a datetime to string conversion. - FM modifier: This modifier affects only the following token and overrides the FX modifier. In a string to datetime conversion when using this modifier the length of the token in the input can be shorter than the max length of that token type if followed by a separator. E.g. 1-digit month, less than 4-digit year, etc. This is the default behaviour in a string to datetime conversion. In a datetime to string conversion tokens with this modifier aren't padded to the maximum length. Example output: "2010-1-9". Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Reviewed-on: http://gerrit.cloudera.org:8080/14291 Tested-by: Impala Public Jenkins Reviewed-by: Attila Jeges --- M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-iso-sql-format-parser.h M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/timestamp-parse-util.cc M tests/query_test/test_cast_with_format.py 9 files changed, 699 insertions(+), 48 deletions(-) Approvals: Impala Public Jenkins: Verified Attila Jeges: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 16 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 15: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 15 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 07 Nov 2019 16:02:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 15: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 15 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 06 Nov 2019 16:11:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@237 PS14, Line 237: DCHECK(*current_pos == dt_ctx_->fmt); > I meant it like this: Hmm. This might make the function itself a bit more readable, however, on the callsite it would become a bit strange. const char* current_pos = dt_ctx_->fmt; current_pos = ProcessFXModifier(_pos); Here L1 would loose its purpose so we might want to do something like this: const char* current_pos = ProcessFXModifier(_pos); But I'm not that confident about initialising current_pos from the return value of ProcessFXModifier(). At first glance it's not really straightforward what value to expect in current_pos after. >From these options still the current one feels the most comfortable for me. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 15 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 06 Nov 2019 13:24:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 14: (1 comment) http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@237 PS14, Line 237: DCHECK(*current_pos == dt_ctx_->fmt); > I have to pass **current_pos into this function as a param because the func I meant it like this: char* IsoSqlFormatTokenizer::ProcessFXModifier() const { DCHECK(dt_ctx_->fmt != nullptr); const char* current_pos = dt_ctx_->fmt; if (strncmp(current_pos, "FX", 2) == 0) { dt_ctx_->fx_modifier = true; current_pos += 2; } return current_pos; } and then in L86 you can call it like this: current_pos = ProcessFXModifier() -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 14 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 06 Nov 2019 12:45:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4961/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 15 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 06 Nov 2019 12:32:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 15: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5179/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 15 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 06 Nov 2019 11:49:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 15: (4 comments) http://gerrit.cloudera.org:8080/#/c/14291/15/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/15/tests/query_test/test_cast_with_format.py@751 PS15, Line 751: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/15/tests/query_test/test_cast_with_format.py@755 PS15, Line 755: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/15/tests/query_test/test_cast_with_format.py@759 PS15, Line 759: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/15/tests/query_test/test_cast_with_format.py@763 PS15, Line 763: assert result.data == [r'''1985 text tab used for whitespace -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 15 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 06 Nov 2019 11:49:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 15: (3 comments) http://gerrit.cloudera.org:8080/#/c/14291/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14291/14//COMMIT_MSG@28 PS14, Line 28: This is the default behaviour > What does "default behavior" mean in this context? Is it when no FX or FM m Default is when neither FX or FM are given. http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@115 PS14, Line 115: fm_modifier_active_ = fa > nit: not necessary, you can just set the flag to false no matter what. Done http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@237 PS14, Line 237: DCHECK(*current_pos == dt_ctx_->fmt); > It would be more straightforward if ProcessFXModifier() did not take any pa I have to pass **current_pos into this function as a param because the function is expected to advance the pointer after the FX modifier. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 15 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 06 Nov 2019 11:48:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14291 to look at the new patch set (#15). Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 This patch adds additional datetime format tokens on top of what was introduced with Milestone 1 (IMPALA-8703). The tokens introduced: - Free text token: Surrounded by double quotes, a free text section can be given in the format where the same text is expected in the input without the surrounding double quotes. - FX modifier: This modifier has to be given at the beginning of the format and is valid for the whole format. In a string to datetime conversion this forces strict separator matching and expects all the tokens in the input to have the same length as their maximum length. E.g. A month has to be of length 2 prefixed by zero if needed. This is the default in a datetime to string conversion. - FM modifier: This modifier affects only the following token and overrides the FX modifier. In a string to datetime conversion when using this modifier the length of the token in the input can be shorter than the max length of that token type if followed by a separator. E.g. 1-digit month, less than 4-digit year, etc. This is the default behaviour in a string to datetime conversion. In a datetime to string conversion tokens with this modifier aren't padded to the maximum length. Example output: "2010-1-9". Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 --- M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-iso-sql-format-parser.h M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/timestamp-parse-util.cc M tests/query_test/test_cast_with_format.py 9 files changed, 699 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/15 -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 15 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 14: Code-Review+1 (3 comments) One more round. I'll +2 it afterwards. http://gerrit.cloudera.org:8080/#/c/14291/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14291/14//COMMIT_MSG@28 PS14, Line 28: This is the default behaviour What does "default behavior" mean in this context? Is it when no FX or FM modifier is used? Or FX is used but FM isn't? http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@115 PS14, Line 115: if (fm_modifier_active_) nit: not necessary, you can just set the flag to false no matter what. http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@237 PS14, Line 237: DCHECK(*current_pos == dt_ctx_->fmt); It would be more straightforward if ProcessFXModifier() did not take any parameters and returned a char* instead of void. No need to pass current_pos as a parameter if *current_pos is expected to be set to dt_ctx_->fmt. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 14 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 04 Nov 2019 17:12:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4918/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 14 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 31 Oct 2019 14:12:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 14: (4 comments) http://gerrit.cloudera.org:8080/#/c/14291/14/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/14/tests/query_test/test_cast_with_format.py@751 PS14, Line 751: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/14/tests/query_test/test_cast_with_format.py@755 PS14, Line 755: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/14/tests/query_test/test_cast_with_format.py@759 PS14, Line 759: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/14/tests/query_test/test_cast_with_format.py@763 PS14, Line 763: assert result.data == [r'''1985 text tab used for whitespace -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 14 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 31 Oct 2019 13:28:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14291 to look at the new patch set (#14). Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 This patch adds additional datetime format tokens on top of what was introduced with Milestone 1 (IMPALA-8703). The tokens introduced: - Free text token: Surrounded by double quotes, a free text section can be given in the format where the same text is expected in the input without the surrounding double quotes. - FX modifier: This modifier has to be given at the beginning of the format and is valid for the whole format. In a string to datetime conversion this forces strict separator matching and expects all the tokens in the input to have the same length as their maximum length. E.g. A month has to be of length 2 prefixed by zero if needed. This is the default in a datetime to string conversion. - FM modifier: This modifier affects only the following token and overrides the FX modifier. In a string to datetime conversion when using this modifier the length of the token in the input can be shorter than the max length of that token type if followed by a separator. E.g. 1-digit month, less than 4-digit year, etc. This is the default behaviour in a string to datetime conversion. In a datetime to string conversion tokens with this modifier aren't padded to the maximum length. Example output: "2010-1-9". Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 --- M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-iso-sql-format-parser.h M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/timestamp-parse-util.cc M tests/query_test/test_cast_with_format.py 9 files changed, 699 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/14 -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 14 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@124 PS12, Line 124: if (token->second.type == FX_MODIFIER) { : if (used_tokens_.size() > 0 || dt_ctx_->fx_modifier) { : return MISPLACED_FX_MODIFIER_ERROR; : } : dt_ctx_->fx_modifier = true; : *current_pos += curr_token_size; : return SUCCESS; : } 'FMFX-MM-DD' is another format string that should be rejected but currently it is not. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 12 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 30 Oct 2019 14:22:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 12: (5 comments) http://gerrit.cloudera.org:8080/#/c/14291/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14291/12//COMMIT_MSG@24 PS12, Line 24: Using this the value of a token can be : shorter than the max length if followed by a separator. Please clarify that this is about datetime to string conversion. http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-parser.h File be/src/runtime/datetime-iso-sql-format-parser.h: http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-parser.h@81 PS12, Line 81: a nit: the http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-parser.cc File be/src/runtime/datetime-iso-sql-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-parser.cc@238 PS12, Line 238: DCHECK(current_tok_idx != nullptr && *current_tok_idx < dt_ctx.toks.size()); Add && dt_ctx.toks[*current_tok_idx].type == SEPARATOR http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@113 PS12, Line 113: if (IsStartOfTextToken(*current_pos)) { : return ProcessTextToken(current_pos, str_begin, str_end); : } What if text token is preceded by an FM modifier? I did some testing and it looks like in a format string like 'FX-MM-FM"text"DD' the FM-modifier applies to DD token, instead of the "text" token. I think we need to do here something like L141-144. ProcessSeaprators() has a similar problem. In 'FX-MM-FM-DD' FM applies to DD token instead of the '-' separator. http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@124 PS12, Line 124: if (token->second.type == FX_MODIFIER) { : if (used_tokens_.size() > 0 || dt_ctx_->fx_modifier) { : return MISPLACED_FX_MODIFIER_ERROR; : } : dt_ctx_->fx_modifier = true; : *current_pos += curr_token_size; : return SUCCESS; : } This still allows format strings like: '--FX-MM-DD' or '"text"FX-MM-DD' Maybe it would be easier to parse the optional FX modifier before calling ProcessNextToken() in a loop. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 12 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 30 Oct 2019 14:06:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4898/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 12 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 29 Oct 2019 18:22:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 12: (7 comments) http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-parser.cc File be/src/runtime/datetime-iso-sql-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-parser.cc@214 PS11, Line 214: DCHECK(tok->val <= *format && *format < tok->val + tok->len); > DCHECK(tok->val <= *format < tok->val + tok->len); Done http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-parser.cc@217 PS11, Line 217: ped quo > Is it possible that strncmp (here or in the next line) overruns [tok->val, Indeed, the text token part is not null-terminated. This wouldn't cause any issues in my opinion es the format as a whole is null-terminated. Adding an extra check to be on the safe side. http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-parser.cc@255 PS11, Line 255: if (*current_pos >= end_pos && *current_tok_idx < dt_ctx.toks.size()) { > Add *current_tok_idx < dt_ctx.toks.size() condition to the while condition Thx, done. http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@84 PS11, Line 84: const char* str_end = dt_ctx_->fmt + dt_ctx_->fmt_len; > As explained in L271, 'str_end' is not needed. See answer in L271. http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@111 PS11, Line 111: const char* str_end = dt_ctx_->fmt + dt_ctx_->fmt_len; > As explained in L271, 'str_end' is not needed. See answer in L271 http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@271 PS11, Line 271: while (current_pos < str_end) { > If we assume that 'str_start' points to a null-terminated string, we don't Yeah, technically I could drop 'str_end' and compare 'current_pos' to '\0'. However, I feel that having 'str_end' improves code readability a bit. http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py@1101 PS11, Line 1101: FXFMFX-MM-DD > 'FXFX-MM-DD' should fail too right? Done -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 12 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 29 Oct 2019 17:39:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14291 to look at the new patch set (#12). Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 This patch adds additional datetime format tokens on top of what was introduced with Milestone 1 (IMPALA-8703). The tokens introduced: - Free text token: Surrounded by double quotes, a free text section can be given in the format where the same text is expected in the input without the surrounding double quotes. - FX modifier: This modifier has to be given at the beginning of the format and is valid for the whole format. In a string to datetime conversion this forces strict separator matching and expects all the tokens in the input to have the same length as their maximum length. E.g. A month has to be of length 2 prefixed by zero if needed. This is the default in a datetime to string conversion. - FM modifier: This modifier affects only the following token and overrides the FX modifier. Using this the value of a token can be shorter than the max length if followed by a separator. E.g. 1-digit month, less than 4-digit year, etc. This is the default behaviour in a string to datetime conversion. Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 --- M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-iso-sql-format-parser.h M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/timestamp-parse-util.cc M tests/query_test/test_cast_with_format.py 9 files changed, 659 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/12 -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 12 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 12: (4 comments) http://gerrit.cloudera.org:8080/#/c/14291/12/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/12/tests/query_test/test_cast_with_format.py@751 PS12, Line 751: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/12/tests/query_test/test_cast_with_format.py@755 PS12, Line 755: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/12/tests/query_test/test_cast_with_format.py@759 PS12, Line 759: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/12/tests/query_test/test_cast_with_format.py@763 PS12, Line 763: assert result.data == [r'''1985 text tab used for whitespace -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 12 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 29 Oct 2019 17:38:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 11: (7 comments) http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-parser.cc File be/src/runtime/datetime-iso-sql-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-parser.cc@214 PS11, Line 214: DCHECK(tok != nullptr); DCHECK(tok->val <= *format < tok->val + tok->len); http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-parser.cc@217 PS11, Line 217: strncmp Is it possible that strncmp (here or in the next line) overruns [tok->val, tok->val + tok->len) char-range? http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-parser.cc@255 PS11, Line 255: while (dt_ctx.toks[*current_tok_idx].type == TEXT && Add *current_tok_idx < dt_ctx.toks.size() condition to the while condition otherwise we might overrun the dt_ctx.toks buffer. http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@84 PS11, Line 84: const char* str_end = dt_ctx_->fmt + dt_ctx_->fmt_len; As explained in L271, 'str_end' is not needed. You can also add DCHECK(dt_ctx_->fmt[dt_ctx_->fmt_len] == '\0') here. http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@111 PS11, Line 111: const char* str_end = dt_ctx_->fmt + dt_ctx_->fmt_len; As explained in L271, 'str_end' is not needed. http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@271 PS11, Line 271: while (current_pos < str_end) { If we assume that 'str_start' points to a null-terminated string, we don't really need to use 'str_end' here and elsewhere. You can just do: while (*current_pos != '\0') {..} http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py@1101 PS11, Line 1101: FXFMFX-MM-DD 'FXFX-MM-DD' should fail too right? If yes, please add a test for that format string too. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 11 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 29 Oct 2019 13:58:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4894/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 11 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 28 Oct 2019 14:00:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 11: (10 comments) http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.h File be/src/runtime/datetime-iso-sql-format-parser.h: http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.h@81 PS10, Line 81: If '*format' points at a beginning of an escape sequence, : // '*forma > It doesn't move *format to the next character, it moves it to the last char Indeed this reflect better what happens for 'format'. Done http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.cc File be/src/runtime/datetime-iso-sql-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.cc@57 PS10, Line 57: >= > '>=' might be safer to use here Done http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.cc@251 PS10, Line 251: // If we reached the end of input or the end of token sequence, we can return. : if (*current_pos >= end_pos || *current_tok_idx >= dt_ctx.toks.size()) { : // Skip trailing empty text tokens in format. : > What if we reached the end of input but dt_ctx.toks still contains some emp Thanks for spotting this. Done http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h File be/src/runtime/datetime-iso-sql-format-tokenizer.h: http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@91 PS10, Line 91: string > string functions Done http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@128 PS10, Line 128: static bool IsStartOfTextToken(const char* current_pos) > This should probably be a static function instead of const. Done http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@137 PS10, Line 137: str_start > str_start, here and elsewhere in the comment. Done http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@141 PS10, Line 141: static const char* FindEndOfTextToken(const char* str_start, const char* str_end, : bool is_escaped); > This should be a static function too. Done http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@124 PS10, Line 124: if (token->second.type == FX_MODIFIER) { : if (used_tokens_.size() > 0 || dt_ctx_->fx_modifier) { : return MISPLACED_FX_MODIFIER_ERROR; : } : dt_ctx_->fx_modifier = true; : *current_pos += curr_token_size; : return SUCCESS; : } : if (token->second.type == FM_MODIFIER) { : fm_modifier_active_ = true; : > This allows weird format strings too, e.g.: 'FXFMFMFX-MM-DD' Initially I thought that giving this freedom to the user wouldn't hurt but giving it a second look I feel that this would cause a bit ambiguity. Let me restrict that FX can only be given once and at the very beginning. I could restrict FM to be given only once for a particular token but that wouldn't seem that important. "FXFMFM-FMFMDD-MM" is not ambiguous at all as it seems that which token is M modified and which isn't. http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@251 PS10, Line 251: DCHECK(str_begin != nullptr) > nit: DCHECK(str_begin <= *current_pos && *current_pos < str_end); Done http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@254 PS10, Line 254: = (**current > nit: no need to put is_ecaped inside parentheses. Done -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 11 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 28 Oct 2019 13:43:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 11: (4 comments) http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py@751 PS11, Line 751: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py@755 PS11, Line 755: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py@759 PS11, Line 759: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py@763 PS11, Line 763: assert result.data == [r'''1985 text tab used for whitespace -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 11 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 28 Oct 2019 13:17:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14291 to look at the new patch set (#11). Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 This patch adds additional datetime format tokens on top of what was introduced with Milestone 1 (IMPALA-8703). The tokens introduced: - Free text token: Surrounded by double quotes, a free text section can be given in the format where the same text is expected in the input without the surrounding double quotes. - FX modifier: This modifier has to be given at the beginning of the format and is valid for the whole format. In a string to datetime conversion this forces strict separator matching and expects all the tokens in the input to have the same length as their maximum length. E.g. A month has to be of length 2 prefixed by zero if needed. This is the default in a datetime to string conversion. - FM modifier: This modifier affects only the following token and overrides the FX modifier. Using this the value of a token can be shorter than the max length if followed by a separator. E.g. 1-digit month, less than 4-digit year, etc. This is the default behaviour in a string to datetime conversion. Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 --- M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-iso-sql-format-parser.h M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/timestamp-parse-util.cc M tests/query_test/test_cast_with_format.py 9 files changed, 653 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/11 -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 11 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 10: (10 comments) Thanks for making the changes. Few more comments: http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.h File be/src/runtime/datetime-iso-sql-format-parser.h: http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.h@81 PS10, Line 81: As a side effect moves '*format' to the next character in the : // format. It doesn't move *format to the next character, it moves it to the last character of the escape sequence. If *format doesn't point at an escape sequence, *format is not changed. Maybe something like this: " If '*format' points at a beginning of an escape sequence, '*format' is moved to the last character of the escape sequence. Otherwise, '*format' is not changed. " http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.cc File be/src/runtime/datetime-iso-sql-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.cc@57 PS10, Line 57: == '>=' might be safer to use here http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.cc@251 PS10, Line 251: // If we reached the end of input or the end of token sequence, we can return. : if (*current_pos >= end_pos || *current_tok_idx >= dt_ctx.toks.size()) { : return (*current_pos >= end_pos && *current_tok_idx >= dt_ctx.toks.size()); : } What if we reached the end of input but dt_ctx.toks still contains some empty TEXT tokens? select cast('1985-12-09-' as date format '-MM-DD-""'); I think this corner-case should be handled here, instead of just returning false. http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h File be/src/runtime/datetime-iso-sql-format-tokenizer.h: http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@91 PS10, Line 91: function string functions http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@128 PS10, Line 128: bool IsStartOfTextToken(const char* current_pos) const; This should probably be a static function instead of const. http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@137 PS10, Line 137: start_str str_start, here and elsewhere in the comment. http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@141 PS10, Line 141: const char* FindEndOfTextToken(const char* str_start, const char* str_end, : bool is_escaped); This should be a static function too. http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@124 PS10, Line 124: if (token->second.type == FX_MODIFIER) { : if (used_tokens_.size() > 0) return MISPLACED_FX_MODIFIER_ERROR; : dt_ctx_->fx_modifier = true; : *current_pos += curr_token_size; : return SUCCESS; : } : if (token->second.type == FM_MODIFIER) { : fm_modifier_active_ = true; : *current_pos += curr_token_size; : return SUCCESS; : } This allows weird format strings too, e.g.: 'FXFMFMFX-MM-DD' Probably these should return an error. http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@251 PS10, Line 251: DCHECK(str_begin < str_end); nit: DCHECK(str_begin <= *current_pos && *current_pos < str_end); http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@254 PS10, Line 254: (is_escaped) nit: no need to put is_ecaped inside parentheses. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 10 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 18 Oct 2019 12:05:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@782 PS7, Line 782: # Strict separator matching. : result = self.client.execute("select cast('2001-03-02 03:10:15' as timestamp format" : "'FX MM-DD HH12:MI:SS')") : assert result.data == ["NULL"] : : result = self.client.execute("select cast('2001-03-03 03:10:15' as timestamp format" : "'FX-MM-DD HH12::MI:SS')") : assert result.data == ["NULL"] : : result = self.client.execute("select cast('2001-03-04' as timestamp format" : "'FX-MM-DD ')") : assert result.data == ["NULL"] : : # Strict token length matching. : result = self.client.execute("select cast('2001-3-05' as timestamp format " : "'FX-MM-DD')") : assert result.data == ["NULL"] > L778 is a positive test for FX modifier. Does that cover what you ask for? Ok, thanks. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 10 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 18 Oct 2019 12:10:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4819/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 10 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 18 Oct 2019 07:06:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14291 to look at the new patch set (#10). Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 This patch adds additional datetime format tokens on top of what was introduced with Milestone 1 (IMPALA-8703). The tokens introduced: - Free text token: Surrounded by double quotes, a free text section can be given in the format where the same text is expected in the input without the surrounding double quotes. - FX modifier: This modifier has to be given at the beginning of the format and is valid for the whole format. In a string to datetime conversion this forces strict separator matching and expects all the tokens in the input to have the same length as their maximum length. E.g. A month has to be of length 2 prefixed by zero if needed. This is the default in a datetime to string conversion. - FM modifier: This modifier affects only the following token and overrides the FX modifier. Using this the value of a token can be shorter than the max length if followed by a separator. E.g. 1-digit month, less than 4-digit year, etc. This is the default behaviour in a string to datetime conversion. Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 --- M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-iso-sql-format-parser.h M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/timestamp-parse-util.cc M tests/query_test/test_cast_with_format.py 9 files changed, 631 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/10 -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 10 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 10: (4 comments) http://gerrit.cloudera.org:8080/#/c/14291/10/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/10/tests/query_test/test_cast_with_format.py@743 PS10, Line 743: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/10/tests/query_test/test_cast_with_format.py@747 PS10, Line 747: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/10/tests/query_test/test_cast_with_format.py@751 PS10, Line 751: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/10/tests/query_test/test_cast_with_format.py@755 PS10, Line 755: assert result.data == [r'''1985 text tab used for whitespace -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 10 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 18 Oct 2019 06:24:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4815/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 9 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 17 Oct 2019 18:48:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14291 to look at the new patch set (#9). Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 This patch adds additional datetime format tokens on top of what was introduced with Milestone 1 (IMPALA-8703). The tokens introduced: - Free text token: Surrounded by double quotes, a free text section can be given in the format where the same text is expected in the input without the surrounding double quotes. - FX modifier: This modifier has to be given at the beginning of the format and is valid for the whole format. In a string to datetime conversion this forces strict separator matching and expects all the tokens in the input to have the same length as their maximum length. E.g. A month has to be of length 2 prefixed by zero if needed. This is the default in a datetime to string conversion. - FM modifier: This modifier affects only the following token and overrides the FX modifier. Using this the value of a token can be shorter than the max length if followed by a separator. E.g. 1-digit month, less than 4-digit year, etc. This is the default behaviour in a string to datetime conversion. Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 --- M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-iso-sql-format-parser.h M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/timestamp-parse-util.cc M tests/query_test/test_cast_with_format.py 9 files changed, 637 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/9 -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 9 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 6: (14 comments) Thanks for taking another look! http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h File be/src/runtime/datetime-iso-sql-format-parser.h: http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h@78 PS7, Line 78: format > text token inside the format Done http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h@81 PS7, Line 81: format > text token Done http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h@81 PS7, Line 81: and moves : // '*format' to 'a'. > I don't think part this is true. Ideed, thx. Done http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc File be/src/runtime/datetime-iso-sql-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc@48 PS7, Line 48: if (tok->type == SEPARATOR && !dt_ctx.fx_modifier) { : bool res = ProcessSeparatorSequence(_pos, end_pos, dt_ctx, ); : if (!res || current_pos == end_pos) return res; : DCHECK(i < dt_ctx.toks.size()); : // Next token, following the separator sequence. : tok = _ctx.toks[i]; : } > I'd prefer to handle strict separator matching (tok->type == SEPARATOR && d Done http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc@56 PS7, Line 56: const char* token_end_pos = FindEndOfToken(current_pos, end_pos - current_pos, *tok); : if (token_end_pos == nullptr) return false; : int token_len = token_end_pos - current_pos; : : if (dt_ctx.fx_modifier && !tok->fm_modifier && tok->type != TEXT && : token_len != tok->len) { : return false; : } > for TEXT tokens, 'token_end_pos' and 'token_len' aren't set here. 'token_en Done http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc@185 PS7, Line 185: token_len == 1 > dt_ctx.fx_modifier && token_len == 1 Merged this with the loose separator handling. This case is no longer needed here. http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@240 PS6, Line 240: strncmp(current_pos, "\\\"", 2) == 0) > You're correct, it is null-terminated for iso-sql format strings, because t I see. I wasn't aware that the format is not null-terminated for the SingleDateFormat path. Thanks for letting me know! I'll add comments about this assumption on the IsoSqlFormat branch for both IsoSqlFormatTokenizer::dt_ctx_ and IsoSqlFormatParser::ParseDateTime(). Thanks again for spotting this! http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-parser-common.h File be/src/runtime/datetime-parser-common.h: http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-parser-common.h@134 PS7, Line 134: then > that Done http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@738 PS6, Line 738: select cast("1985-\a\b\f\n\r\t\v\'12-09" as ''' : r'''date format '-"\a\b\f\n\r\t\v\'"MM-DD') > This test is about special escape sequences but non-special chars in the te Isn't intentional. Remained from a previous approach I had implemented to handle special char. I'm removing the non-special chars from this test. Done http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@782 PS7, Line 782: # Strict separator matching. : result = self.client.execute("select cast('2001-03-02 03:10:15' as timestamp format" : "'FX MM-DD HH12:MI:SS')") : assert result.data == ["NULL"] : : result = self.client.execute("select cast('2001-03-03 03:10:15' as timestamp format" : "'FX-MM-DD HH12::MI:SS')") : assert result.data == ["NULL"] : : result = self.client.execute("select cast('2001-03-04' as timestamp format" : "'FX-MM-DD ')") : assert result.data == ["NULL"]
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 9: (4 comments) http://gerrit.cloudera.org:8080/#/c/14291/9/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/9/tests/query_test/test_cast_with_format.py@743 PS9, Line 743: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/9/tests/query_test/test_cast_with_format.py@747 PS9, Line 747: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/9/tests/query_test/test_cast_with_format.py@751 PS9, Line 751: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/9/tests/query_test/test_cast_with_format.py@755 PS9, Line 755: assert result.data == [r'''1985 text tab used for whitespace -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 9 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 17 Oct 2019 18:05:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@240 PS6, Line 240: strncmp(current_pos, "\\\"", 2) == 0) > It is zero-terminated You're correct, it is null-terminated for iso-sql format strings, because the DateTimeFormatContext instance is always created using a null-terminated string (in CastFormatExpr::OpenEvaluator()). On the other hand, DateTimeFormatContext class does not guarantee that DateTimeFormatContext::fmt is null-terminated. E.g. TimestampFunctions::UnixAndFromUnixPrepare() creates a DateTimeFormatContext instance where fmt is not null-terminated. For this reason, we are very careful in SimpleDateFormatTokenizer::Tokenize() not to assume that format strings will end with a null-byte. Maybe add a comment at the beginning of this file to explain that iso-sql format strings are always null-terminated? http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@738 PS6, Line 738: select cast("1985-\a\b\f\n\r\t\v\'12-09" as ''' : r'''date format '-"\a\b\f\n\r\t\v\'"MM-DD') This test is about special escape sequences but non-special chars in the text token are escaped too. Is that intentional? -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 16 Oct 2019 18:02:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@768 PS6, Line 768: covered > Don't see the difference. This seems nit of the nits :D "covered" and "surrounded" mean very different things: https://www.thefreedictionary.com/covered https://www.thefreedictionary.com/surrounded :) -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 16 Oct 2019 16:08:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4807/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 16 Oct 2019 14:47:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 6: (11 comments) http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-parser.cc File be/src/runtime/datetime-iso-sql-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-parser.cc@225 PS6, Line 225: switch (**format) { : case 'b': return '\b'; : case 'n': return '\n'; : case 'r': return '\r'; : case 't': return '\t'; : } > Please check what escape sequences are supported by ANSI SQL standard. Well, I'm not convinced that ANSI SQL is relevant here as according to this page it doesn't allow escaping quotes and double quotes. https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0/com.ibm.esqlc.doc/ids_esqlc_0015.htm This doc about MySql is kind of in line with my implementation: https://www.oreilly.com/library/view/mysql-cookbook/0596001452/ch04s02.html It also says that 2 sequential double quotes works as an escaped double quote same as with a backslash. However, I don't feel the need for this. Additionally MySql has \0 as a special char (ANSII NULL) but again, don't find it important. http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@109 PS6, Line 109: *current_pos != nullptr > nit: current_pos != nullptr && *current_pos != nullptr Done http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@112 PS6, Line 112: *current_pos < str_end > nit: str_begin <= *current_pos && *current_pos < str_end Done http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@240 PS6, Line 240: strncmp(current_pos, "\\\"", 2) == 0) > This is safe only if 'current_pos' is zero-terminated, but I don't think it It is zero-terminated http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@273 PS6, Line 273: strncmp(*current_pos, "\\\"", 2) > same as L240. see above http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@277 PS6, Line 277: strncmp(*current_pos, "\\\"", 4) > same as L240. see above http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@282 PS6, Line 282: strncmp(*current_pos, "", 2) > same as L240. see above http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc File be/src/runtime/datetime-parser-common.cc: http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc@201 PS6, Line 201: continue > nit: can be removed, here and below. Done http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc@202 PS6, Line 202:} else if (!tok.is_double_escaped && strncmp(text_it, "\\\"", 2) == 0) { : result.append("\""); : ++text_it; : continue; : } else if (strncmp(text_it, "", 2) == 0) { : result.append("\\"); : ++text_it; : continue; : } else if (strncmp(text_it, "\\b", 2) == 0) { : result.append("\b"); : ++text_it; : continue; : } else if (strncmp(text_it, "\\n", 2) == 0) { : result.append("\n"); : ++text_it; : continue; : } else if (strncmp(text_it, "\\r", 2) == 0) { : result.append("\r"); : ++text_it; : continue; : } else if (strncmp(text_it, "\\t", 2) == 0) { : result.append("\t"); : ++text_it; : continue; : } > Are these the only escape sequences supported by ANSI SQL? See the answer in datetime-iso-sql-format-parser.cc http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@663 PS6, Line 663:# Format part is surrounded by double quotes so the quotes indicating the start and : # end of the text token has to be escaped. > I did some testing around when the format string is surrounded by ' and the If the format is surrounded by single quotes but the text double quotes surrounding the text token is escaped is a valid scenario. From the code it's not possible to decide whether a
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 8: (4 comments) http://gerrit.cloudera.org:8080/#/c/14291/8/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/8/tests/query_test/test_cast_with_format.py@743 PS8, Line 743: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/8/tests/query_test/test_cast_with_format.py@747 PS8, Line 747: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/8/tests/query_test/test_cast_with_format.py@751 PS8, Line 751: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/8/tests/query_test/test_cast_with_format.py@755 PS8, Line 755: assert result.data == [r'''1985 text tab used for whitespace -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 16 Oct 2019 14:02:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14291 to look at the new patch set (#8). Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 This patch adds additional datetime format tokens on top of what was introduced with Milestone 1 (IMPALA-8703). The tokens introduced: - Free text token: Surrounded by double quotes, a free text section can be given in the format where the same text is expected in the input without the surrounding double quotes. - FX modifier: This modifier has to be given at the beginning of the format and is valid for the whole format. In a string to datetime conversion this forces strict separator matching and expects all the tokens in the input to have the same length as their maximum length. E.g. A month has to be of length 2 prefixed by zero if needed. This is the default in a datetime to string conversion. - FM modifier: This modifier affects only the following token and overrides the FX modifier. Using this the value of a token can be shorter than the max length if followed by a separator. E.g. 1-digit month, less than 4-digit year, etc. This is the default behaviour in a string to datetime conversion. Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 --- M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-iso-sql-format-parser.h M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/timestamp-parse-util.cc M tests/query_test/test_cast_with_format.py 9 files changed, 632 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/8 -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 7: (7 comments) http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h File be/src/runtime/datetime-iso-sql-format-parser.h: http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h@78 PS7, Line 78: format text token inside the format http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h@81 PS7, Line 81: format text token http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h@81 PS7, Line 81: and moves : // '*format' to 'a'. I don't think part this is true. In GetNextCharFromTextToken() *format is moved to the last character of the escape sequence (to '"' in the example given above). http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc File be/src/runtime/datetime-iso-sql-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc@48 PS7, Line 48: if (tok->type == SEPARATOR && !dt_ctx.fx_modifier) { : bool res = ProcessSeparatorSequence(_pos, end_pos, dt_ctx, ); : if (!res || current_pos == end_pos) return res; : DCHECK(i < dt_ctx.toks.size()); : // Next token, following the separator sequence. : tok = _ctx.toks[i]; : } I'd prefer to handle strict separator matching (tok->type == SEPARATOR && dt_ctx.fx_modifier) right after this, instead of L183. http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc@56 PS7, Line 56: const char* token_end_pos = FindEndOfToken(current_pos, end_pos - current_pos, *tok); : if (token_end_pos == nullptr) return false; : int token_len = token_end_pos - current_pos; : : if (dt_ctx.fx_modifier && !tok->fm_modifier && tok->type != TEXT && : token_len != tok->len) { : return false; : } for TEXT tokens, 'token_end_pos' and 'token_len' aren't set here. 'token_end_pos' is set in L170. For clarity I'd move the code in L170-182 right before L56, and execute L56-193 only if tok->type != TEXT. http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc@185 PS7, Line 185: token_len == 1 dt_ctx.fx_modifier && token_len == 1 http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-parser-common.h File be/src/runtime/datetime-parser-common.h: http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-parser-common.h@134 PS7, Line 134: then that -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 16 Oct 2019 12:11:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 7: (5 comments) http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@782 PS7, Line 782: # Strict separator matching. : result = self.client.execute("select cast('2001-03-02 03:10:15' as timestamp format" : "'FX MM-DD HH12:MI:SS')") : assert result.data == ["NULL"] : : result = self.client.execute("select cast('2001-03-03 03:10:15' as timestamp format" : "'FX-MM-DD HH12::MI:SS')") : assert result.data == ["NULL"] : : result = self.client.execute("select cast('2001-03-04' as timestamp format" : "'FX-MM-DD ')") : assert result.data == ["NULL"] : : # Strict token length matching. : result = self.client.execute("select cast('2001-3-05' as timestamp format " : "'FX-MM-DD')") : assert result.data == ["NULL"] Maybe add positive tests too for these. e.g after L783: select cast('2001 03-02 03:10:15' as timestamp format 'FX MM-DD HH12:MI:SS') http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@866 PS7, Line 866: doesn't don't http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@872 PS7, Line 872: result = self.client.execute("select cast(cast('2001-03-09' as date) " : "as string format '-FMMM-FMDD')") : assert result.data == ["2001-3-9"] Maybe add a test were year token is FM-modified e.g.: select cast(date'0001-03-09' as string format 'FM-FMMM-FMDD') select cast(date'0001-03-09' as string format 'FMYY-FMMM-FMDD') http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@882 PS7, Line 882:result = self.client.execute("select cast(cast('2001-04-09' as date) " : "as string format 'FX-FMMM-FMDD')") Test FM-modified year token here too. http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@1045 PS7, Line 1045: "select cast('1985-11-21text' as timestamp format '-MM-DD\"text')" Again, please use raw python strings, here and below. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 16 Oct 2019 09:47:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 6: (11 comments) http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-parser.cc File be/src/runtime/datetime-iso-sql-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-parser.cc@225 PS6, Line 225: switch (**format) { : case 'b': return '\b'; : case 'n': return '\n'; : case 'r': return '\r'; : case 't': return '\t'; : } Please check what escape sequences are supported by ANSI SQL standard. (same as datetime-parser-common.cc:226) http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@109 PS6, Line 109: *current_pos != nullptr nit: current_pos != nullptr && *current_pos != nullptr http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@112 PS6, Line 112: *current_pos < str_end nit: str_begin <= *current_pos && *current_pos < str_end http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@240 PS6, Line 240: strncmp(current_pos, "\\\"", 2) == 0) This is safe only if 'current_pos' is zero-terminated, but I don't think it is. Probably you should check first that there are at least 2 characters between current_pos and end_pos. http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@273 PS6, Line 273: strncmp(*current_pos, "\\\"", 2) same as L240. http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@277 PS6, Line 277: strncmp(*current_pos, "\\\"", 4) same as L240. http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@282 PS6, Line 282: strncmp(*current_pos, "", 2) same as L240. http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc File be/src/runtime/datetime-parser-common.cc: http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc@201 PS6, Line 201: continue nit: can be removed, here and below. http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc@202 PS6, Line 202:} else if (!tok.is_double_escaped && strncmp(text_it, "\\\"", 2) == 0) { : result.append("\""); : ++text_it; : continue; : } else if (strncmp(text_it, "", 2) == 0) { : result.append("\\"); : ++text_it; : continue; : } else if (strncmp(text_it, "\\b", 2) == 0) { : result.append("\b"); : ++text_it; : continue; : } else if (strncmp(text_it, "\\n", 2) == 0) { : result.append("\n"); : ++text_it; : continue; : } else if (strncmp(text_it, "\\r", 2) == 0) { : result.append("\r"); : ++text_it; : continue; : } else if (strncmp(text_it, "\\t", 2) == 0) { : result.append("\t"); : ++text_it; : continue; : } Are these the only escape sequences supported by ANSI SQL? C, for instance supports a bunch of others too: https://en.wikipedia.org/wiki/Escape_sequences_in_C http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@663 PS6, Line 663:# Format part is surrounded by double quotes so the quotes indicating the start and : # end of the text token has to be escaped. I did some testing around when the format string is surrounded by ' and the text token is surrounded by \" Date to string conversion: >> select cast(date"1985-12-08" as string format '-MM-DD \"X\"'); returns: 1985-12-08 X I'm not sure if this should work or return an error instead. also if only the opening " to the text token is escaped: >> select cast(date"1985-12-08" as string format '-MM-DD \"X"'); returns 1985-12-08 This should probably return an error. On the other hand when doing string to date conversion: >> select cast("1985-12-08 X" as date format '-MM-DD \"X\"'); returns: 1985-12-08 >> select cast("1985-12-08 X" as date format '-MM-DD \"X"'); returns an error. >> select cast("1985-12-08" as date format '-MM-DD \"X"');
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4763/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 09 Oct 2019 11:19:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4762/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 09 Oct 2019 11:10:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14291 to look at the new patch set (#6). Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 This patch adds additional datetime format tokens on top of what was introduced with Milestone 1 (IMPALA-8703). The tokens introduced: - Free text token: Surrounded by double quotes, a free text section can be given in the format where the same text is expected in the input without the surrounding double quotes. - FX modifier: This modifier has to be given at the beginning of the format and is valid for the whole format. In a string to datetime conversion this forces strict separator matching and expects all the tokens in the input to have the same length as their maximum length. E.g. A month has to be of length 2 prefixed by zero if needed. This is the default in a datetime to string conversion. - FM modifier: This modifier affects only the following token and overrides the FX modifier. Using this the value of a token can be shorter than the max length if followed by a separator. E.g. 1-digit month, less than 4-digit year, etc. This is the default behaviour in a string to datetime conversion. Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 --- M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-iso-sql-format-parser.h M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/timestamp-parse-util.cc M tests/query_test/test_cast_with_format.py 9 files changed, 616 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/6 -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@743 PS6, Line 743: assert result.data == [r'''1985 text > tab used for whitespace This is intentional to check the output contains a tab. http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@747 PS6, Line 747: assert result.data == [r'''1985 text > tab used for whitespace This is intentional to check the output contains a tab. http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@751 PS6, Line 751: assert result.data == [r'''1985 text > tab used for whitespace This is intentional to check the output contains a tab. http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@755 PS6, Line 755: assert result.data == [r'''1985 text > tab used for whitespace This is intentional to check the output contains a tab. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 09 Oct 2019 10:40:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@743 PS6, Line 743: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@747 PS6, Line 747: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@751 PS6, Line 751: assert result.data == [r'''1985 text tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@755 PS6, Line 755: assert result.data == [r'''1985 text tab used for whitespace -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 09 Oct 2019 10:38:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 5: (12 comments) http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@743 PS5, Line 743: r flake8: W291 trailing whitespace http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@743 PS5, Line 743: assert result.data == [r'''1985 line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@743 PS5, Line 743: assert result.data == [r'''1985 tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@747 PS5, Line 747: r flake8: W291 trailing whitespace http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@747 PS5, Line 747: assert result.data == [r'''1985 line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@747 PS5, Line 747: assert result.data == [r'''1985 tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@751 PS5, Line 751: r flake8: W291 trailing whitespace http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@751 PS5, Line 751: assert result.data == [r'''1985 line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@751 PS5, Line 751: assert result.data == [r'''1985 tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@755 PS5, Line 755: r flake8: W291 trailing whitespace http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@755 PS5, Line 755: assert result.data == [r'''1985 line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@755 PS5, Line 755: assert result.data == [r'''1985 tab used for whitespace -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 09 Oct 2019 10:29:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14291 to look at the new patch set (#5). Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 This patch adds additional datetime format tokens on top of what was introduced with Milestone 1 (IMPALA-8703). The tokens introduced: - Free text token: Surrounded by double quotes, a free text section can be given in the format where the same text is expected in the input without the surrounding double quotes. - FX modifier: This modifier has to be given at the beginning of the format and is valid for the whole format. In a string to datetime conversion this forces strict separator matching and expects all the tokens in the input to have the same length as their maximum length. E.g. A month has to be of length 2 prefixed by zero if needed. This is the default in a datetime to string conversion. - FM modifier: This modifier affects only the following token and overrides the FX modifier. Using this the value of a token can be shorter than the max length if followed by a separator. E.g. 1-digit month, less than 4-digit year, etc. This is the default behaviour in a string to datetime conversion. Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 --- M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-iso-sql-format-parser.h M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/timestamp-parse-util.cc M tests/query_test/test_cast_with_format.py 9 files changed, 616 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/5 -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4761/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 09 Oct 2019 10:25:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 4: (16 comments) http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@744 PS4, Line 744: 12-10'''] flake8: E101 indentation contains mixed spaces and tabs http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@744 PS4, Line 744: 12-10'''] flake8: W191 indentation contains tabs http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@744 PS4, Line 744: 12-10'''] tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@745 PS4, Line 745: result = self.client.execute(r'''select cast(cast("1985-12-11" as date) as string ''' flake8: E101 indentation contains mixed spaces and tabs http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@748 PS4, Line 748: 12-11'''] flake8: E101 indentation contains mixed spaces and tabs http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@748 PS4, Line 748: 12-11'''] flake8: W191 indentation contains tabs http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@748 PS4, Line 748: 12-11'''] tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@749 PS4, Line 749: result = self.client.execute(r'''select cast(cast("1985-12-12" as timestamp) as ''' flake8: E101 indentation contains mixed spaces and tabs http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@752 PS4, Line 752: 12-12'''] flake8: E101 indentation contains mixed spaces and tabs http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@752 PS4, Line 752: 12-12'''] flake8: W191 indentation contains tabs http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@752 PS4, Line 752: 12-12'''] tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@753 PS4, Line 753: result = self.client.execute(r'''select cast(cast("1985-12-13" as timestamp) as ''' flake8: E101 indentation contains mixed spaces and tabs http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@756 PS4, Line 756: 12-13'''] flake8: E101 indentation contains mixed spaces and tabs http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@756 PS4, Line 756: 12-13'''] flake8: W191 indentation contains tabs http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@756 PS4, Line 756: 12-13'''] tab used for whitespace http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@758 PS4, Line 758: # Escaped backslash in text token. flake8: E101 indentation contains mixed spaces and tabs -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 09 Oct 2019 09:45:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py@724 PS3, Line 724: 'format '-"some \ text"MM-DD' > I'm confused about the usage of backslash in this format string and the for Well, being honest I got also confused by all this escaping and such :) Anyway the first bullet point seems correct for me and in the second the output shouldn't contain a backslash. Will do the change to work accordingly. http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py@730 PS3, Line 730: s > Probably the apostrophe can be removed for clarity. I assume the - separato Done http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py@731 PS3, Line 731: '-"\some text"MM-DD' > Again, the backslash is used here to escape the s character right? See above http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py@735 PS3, Line 735: '1985-some text12-08' > Please use a raw string here too, to make it explicit if the backslash mean Done -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 09 Oct 2019 09:45:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14291 to look at the new patch set (#4). Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 This patch adds additional datetime format tokens on top of what was introduced with Milestone 1 (IMPALA-8703). The tokens introduced: - Free text token: Surrounded by double quotes, a free text section can be given in the format where the same text is expected in the input without the surrounding double quotes. - FX modifier: This modifier has to be given at the beginning of the format and is valid for the whole format. In a string to datetime conversion this forces strict separator matching and expects all the tokens in the input to have the same length as their maximum length. E.g. A month has to be of length 2 prefixed by zero if needed. This is the default in a datetime to string conversion. - FM modifier: This modifier affects only the following token and overrides the FX modifier. Using this the value of a token can be shorter than the max length if followed by a separator. E.g. 1-digit month, less than 4-digit year, etc. This is the default behaviour in a string to datetime conversion. Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 --- M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-iso-sql-format-parser.h M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/timestamp-parse-util.cc M tests/query_test/test_cast_with_format.py 9 files changed, 616 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/4 -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py@735 PS3, Line 735: '1985-\some text12-08 Please use a raw string here too, to make it explicit if the backslash means a literal backslash or an escape character. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 03 Oct 2019 15:32:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py@724 PS3, Line 724: format '-"some \ text"MM-DD') I'm confused about the usage of backslash in this format string and the format string in L727. Is it used for a literal backslash character or for escaping the space character that follows it? 1. For instance a slightly modified select: select cast("1985- some text12-05" as date format '-"some \ text"MM-DD'); Returns : 1985-12-05 Which makes me think that the backslash in the format string escapes the space. 2. On the other hand the select in L726 select cast(cast("1985-12-06" as date) as string format '-"some \ text"MM-DD') Returns: 1985-some \ text12-06 Which makes me think that the backslash in the format string is supposed to be taken literally. http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py@730 PS3, Line 730: ' Probably the apostrophe can be removed for clarity. I assume the - separator in the format string matches -' in the input string. http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py@731 PS3, Line 731: -"\some text"MM-DD') Again, the backslash is used here to escape the s character right? However the backslash in the format string in L714 is interpreted literally. It could be interesting to try running these with Hive and check how it handles backslashes. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 03 Oct 2019 15:15:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4686/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 01 Oct 2019 13:32:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/14291/2/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/2/tests/query_test/test_cast_with_format.py@578 PS2, Line 578: test_text_token > Thanks for spotting this. The code is not really prepared to handle backsla Update: I did some adjustments to include backslash in the format and input. You can now do so and even include some special chars like '\n' or '\t'. Note, that it's still not feasible to go super wild with the backslashes and include multiple ones after each other. I'm not sure if this is worth putting more efforts into because I don't see any real life scenarios where that would be needed. What do you think? -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 01 Oct 2019 12:58:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14291 to look at the new patch set (#3). Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 This patch adds additional datetime format tokens on top of what was introduced with Milestone 1 (IMPALA-8703). The tokens introduced: - Free text token: Surrounded by double quotes, a free text section can be given in the format where the same text is expected in the input without the surrounding double quotes. - FX modifier: This modifier has to be given at the beginning of the format and is valid for the whole format. In a string to datetime conversion this forces strict separator matching and expects all the tokens in the input to have the same length as their maximum length. E.g. A month has to be of length 2 prefixed by zero if needed. This is the default in a datetime to string conversion. - FM modifier: This modifier affects only the following token and overrides the FX modifier. Using this the value of a token can be shorter than the max length if followed by a separator. E.g. 1-digit month, less than 4-digit year, etc. This is the default behaviour in a string to datetime conversion. Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 --- M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-iso-sql-format-parser.h M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/timestamp-parse-util.cc M tests/query_test/test_cast_with_format.py 9 files changed, 534 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/3 -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/14291/2/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/2/tests/query_test/test_cast_with_format.py@578 PS2, Line 578: test_text_token > I'm confused about how a backslash character should be represented inside t Thanks for spotting this. The code is not really prepared to handle backslashes inside the text token. What makes this complicated is that when parsing we don't know if the content of the text token is escaped or double escaped (latter can happen when the surrounding double quotes of the text token are escaped themselves.) so we won't know how many backslashes to skip. I'll give this a second though, or if I can't figure out anything we might want to emphasize that backslashes are not supported only for escaping double quotes. http://gerrit.cloudera.org:8080/#/c/14291/2/tests/query_test/test_cast_with_format.py@580 PS2, Line 580: ''' : r''' > You can probably remove these from the end of the L580 and the beginning of It might work, but I wanted to not include a new line char in the query string. I don't see much benefit of removing these. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 27 Sep 2019 11:36:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/14291/2/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/2/tests/query_test/test_cast_with_format.py@578 PS2, Line 578: test_text_token I'm confused about how a backslash character should be represented inside the text format token: This works: select cast('1985 \ 11-28' as date format '" \ "MM-DD'); and this works too: select cast('1985 \ 11-28' as date format '" \\ "MM-DD'); Is this the expected behavior or a bug? http://gerrit.cloudera.org:8080/#/c/14291/2/tests/query_test/test_cast_with_format.py@580 PS2, Line 580: ''' : r''' You can probably remove these from the end of the L580 and the beginning of L581, here and elsewhere in the method. These kind of strings in python may span multiple lines. r'''select cast('1985-11-19T01:02:03Z' as timestamp format '-MM-DD"T"HH24:MI:SS"Z"')''' The \n character will become part of the string, but in this case it probably won't be a problem. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 26 Sep 2019 17:41:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4648/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 26 Sep 2019 14:29:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14291 to look at the new patch set (#2). Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 This patch adds additional datetime format tokens on top of what was introduced with Milestone 1 (IMPALA-8703). The tokens introduced: - Free text token: Surrounded by double quotes, a free text section can be given in the format where the same text is expected in the input without the surrounding double quotes. - FX modifier: This modifier has to be given at the beginning of the format and is valid for the whole format. In a string to datetime conversion this forces strict separator matching and expects all the tokens in the input to have the same length as their maximum length. E.g. A month has to be of length 2 prefixed by zero if needed. This is the default in a datetime to string conversion. - FM modifier: This modifier affects only the following token and overrides the FX modifier. Using this the value of a token can be shorter than the max length if followed by a separator. E.g. 1-digit month, less than 4-digit year, etc. This is the default behaviour in a string to datetime conversion. Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 --- M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-iso-sql-format-parser.h M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/timestamp-parse-util.cc M tests/query_test/test_cast_with_format.py 9 files changed, 482 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/2 -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/14291/1/be/src/runtime/date-parse-util.cc File be/src/runtime/date-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/14291/1/be/src/runtime/date-parse-util.cc@158 PS1, Line 158: > Last parameter 3 can be removed, MONTH_ARRAY[] elements are already 3 char Done http://gerrit.cloudera.org:8080/#/c/14291/1/be/src/runtime/date-parse-util.cc@172 PS1, Line 172: // Remove the escaping from the double quotes ins > I think, this works only if the format string is between double quotes. Wh The first replace is used when the format is between double quotes, the text token is surrounded by escaped double quotes and there is a double escaped double quote inside. E.g.: ... FORMAT "\"text1\\\"text2\"") The replace below is used when the there is an escaped double quote inside the text token (not double-escaped). Note that these replace calls are running only the content of the text token and not for the surrounding double or single quotes. If the format is covered with single quotes than the forst replace doesn't have to run because the quote inside the text token won't be double-escaped. http://gerrit.cloudera.org:8080/#/c/14291/1/be/src/runtime/date-parse-util.cc@172 PS1, Line 172:// Remove the escaping from the double quotes inside the text token. : // If the format is surrounded by double quot > Please add some comments about these calls. Done http://gerrit.cloudera.org:8080/#/c/14291/1/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/1/tests/query_test/test_cast_with_format.py@595 PS1, Line 595: r'''select cast('1985-11-21' as timestamp ''' : r'''format '""""-""MM""-""DD""')''') > This and other strings would be easier to read if you used python's raw-str Thanks for the comment, this is in fact way more readable with raw-strings. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 26 Sep 2019 13:49:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 1: (4 comments) I took a quick look. I'll dive into it more tomorrow. http://gerrit.cloudera.org:8080/#/c/14291/1/be/src/runtime/date-parse-util.cc File be/src/runtime/date-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/14291/1/be/src/runtime/date-parse-util.cc@158 PS1, Line 158: , 3) Last parameter 3 can be removed, MONTH_ARRAY[] elements are already 3 char length. http://gerrit.cloudera.org:8080/#/c/14291/1/be/src/runtime/date-parse-util.cc@172 PS1, Line 172: boost::replace_all(text_token, "\\\"", "\""); I think, this works only if the format string is between double quotes. What happens if format string is between apostrophe characters? http://gerrit.cloudera.org:8080/#/c/14291/1/be/src/runtime/date-parse-util.cc@172 PS1, Line 172:boost::replace_all(text_token, "\\\"", "\""); : boost::replace_all(text_token, "\\\"", "\""); Please add some comments about these calls. http://gerrit.cloudera.org:8080/#/c/14291/1/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/1/tests/query_test/test_cast_with_format.py@595 PS1, Line 595: "select cast('1985-11-21' as timestamp " : "format '\"\"\"\"-\"\"MM\"\"-\"\"DD\"\"')" This and other strings would be easier to read if you used python's raw-string notation. For instance, instead of: "select cast('1985-11-21' as timestamp format '\"\"\"\"-\"\"MM\"\"-\"\"DD\"\"')" you could write: r'''select cast('1985-11-21' as timestamp format '""""-""MM""-""DD""')''' no need to escape " ' \ characters inside raw strings. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 24 Sep 2019 16:08:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 ) Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4625/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 24 Sep 2019 13:43:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14291 Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 .. IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 This patch adds additional datetime format tokens on top of what was introduced with Milestone 1 (IMPALA-8703). The tokens introduced: - Free text token: Surrounded by double quotes, a free text section can be given in the format where the same text is expected in the input without the surrounding double quotes. - FX modifier: This modifier has to be given at the beginning of the format and is valid for the whole format. In a string to datetime conversion this forces strict separator matching and expects all the tokens in the input to have the same length as their maximum length. E.g. A month has to be of length 2 prefixed by zero if needed. This is the default in a datetime to string conversion. - FM modifier: This modifier affects only the following token and overrides the FX modifier. Using this the value of a token can be shorter than the max length if followed by a separator. E.g. 1-digit month, less than 4-digit year, etc. This is the default behaviour in a string to datetime conversion. Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 --- M be/src/runtime/date-parse-util.cc M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-iso-sql-format-parser.h M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/timestamp-parse-util.cc M tests/query_test/test_cast_with_format.py 9 files changed, 473 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/1 -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab