[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-11-07 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..

IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

This patch adds additional datetime format tokens on top of what was
introduced with Milestone 1 (IMPALA-8703).

The tokens introduced:
- Free text token: Surrounded by double quotes, a free text section
  can be given in the format where the same text is expected in the
  input without the surrounding double quotes.
- FX modifier: This modifier has to be given at the beginning of the
  format and is valid for the whole format.
  In a string to datetime conversion this forces strict separator
  matching and expects all the tokens in the input to have the same
  length as their maximum length. E.g. A month has to be of length 2
  prefixed by zero if needed.
  This is the default in a datetime to string conversion.
- FM modifier: This modifier affects only the following token and
  overrides the FX modifier.
  In a string to datetime conversion when using this modifier the
  length of the token in the input can be shorter than the max length
  of that token type if followed by a separator. E.g. 1-digit month,
  less than 4-digit year, etc. This is the default behaviour in a
  string to datetime conversion.
  In a datetime to string conversion tokens with this modifier aren't
  padded to the maximum length. Example output: "2010-1-9".

Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Reviewed-on: http://gerrit.cloudera.org:8080/14291
Tested-by: Impala Public Jenkins 
Reviewed-by: Attila Jeges 
---
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-iso-sql-format-parser.h
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/timestamp-parse-util.cc
M tests/query_test/test_cast_with_format.py
9 files changed, 699 insertions(+), 48 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Attila Jeges: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 16
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-11-07 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 15: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 07 Nov 2019 16:02:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-11-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 15: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 06 Nov 2019 16:11:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-11-06 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@237
PS14, Line 237: DCHECK(*current_pos == dt_ctx_->fmt);
> I meant it like this:
Hmm. This might make the function itself a bit more readable, however, on the 
callsite it would become a bit strange.

const char* current_pos = dt_ctx_->fmt;
current_pos = ProcessFXModifier(_pos);
Here L1 would loose its purpose so we might want to do something like this:
const char* current_pos = ProcessFXModifier(_pos);

But I'm not that confident about initialising current_pos from the return value 
of ProcessFXModifier(). At first glance it's not really straightforward what 
value to expect in current_pos after.

>From these options still the current one feels the most comfortable for me.



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 06 Nov 2019 13:24:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-11-06 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@237
PS14, Line 237: DCHECK(*current_pos == dt_ctx_->fmt);
> I have to pass **current_pos into this function as a param because the func
I meant it like this:

char* IsoSqlFormatTokenizer::ProcessFXModifier() const {
  DCHECK(dt_ctx_->fmt != nullptr);
  const char* current_pos = dt_ctx_->fmt;
  if (strncmp(current_pos, "FX", 2) == 0) {
dt_ctx_->fx_modifier = true;
current_pos += 2;
  }
  return current_pos;
}

and then in L86 you can call it like this:
current_pos = ProcessFXModifier()



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 14
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 06 Nov 2019 12:45:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-11-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4961/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 06 Nov 2019 12:32:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-11-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 15:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5179/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 06 Nov 2019 11:49:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-11-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 15:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14291/15/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/15/tests/query_test/test_cast_with_format.py@751
PS15, Line 751: assert result.data == [r'''1985 text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/15/tests/query_test/test_cast_with_format.py@755
PS15, Line 755: assert result.data == [r'''1985 text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/15/tests/query_test/test_cast_with_format.py@759
PS15, Line 759: assert result.data == [r'''1985 text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/15/tests/query_test/test_cast_with_format.py@763
PS15, Line 763: assert result.data == [r'''1985 text
tab used for whitespace



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 06 Nov 2019 11:49:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-11-06 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 15:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14291/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14291/14//COMMIT_MSG@28
PS14, Line 28: This is the default behaviour
> What does "default behavior" mean in this context? Is it when no FX or FM m
Default is when neither FX or FM are given.


http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@115
PS14, Line 115: fm_modifier_active_ = fa
> nit: not necessary, you can just set the flag to false no matter what.
Done


http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@237
PS14, Line 237: DCHECK(*current_pos == dt_ctx_->fmt);
> It would be more straightforward if ProcessFXModifier() did not take any pa
I have to pass **current_pos into this function as a param because the function 
is expected to advance the pointer after the FX modifier.



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 06 Nov 2019 11:48:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-11-06 Thread Gabor Kaszab (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14291

to look at the new patch set (#15).

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..

IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

This patch adds additional datetime format tokens on top of what was
introduced with Milestone 1 (IMPALA-8703).

The tokens introduced:
- Free text token: Surrounded by double quotes, a free text section
  can be given in the format where the same text is expected in the
  input without the surrounding double quotes.
- FX modifier: This modifier has to be given at the beginning of the
  format and is valid for the whole format.
  In a string to datetime conversion this forces strict separator
  matching and expects all the tokens in the input to have the same
  length as their maximum length. E.g. A month has to be of length 2
  prefixed by zero if needed.
  This is the default in a datetime to string conversion.
- FM modifier: This modifier affects only the following token and
  overrides the FX modifier.
  In a string to datetime conversion when using this modifier the
  length of the token in the input can be shorter than the max length
  of that token type if followed by a separator. E.g. 1-digit month,
  less than 4-digit year, etc. This is the default behaviour in a
  string to datetime conversion.
  In a datetime to string conversion tokens with this modifier aren't
  padded to the maximum length. Example output: "2010-1-9".

Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
---
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-iso-sql-format-parser.h
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/timestamp-parse-util.cc
M tests/query_test/test_cast_with_format.py
9 files changed, 699 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/15
--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-11-04 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 14: Code-Review+1

(3 comments)

One more round. I'll +2 it afterwards.

http://gerrit.cloudera.org:8080/#/c/14291/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14291/14//COMMIT_MSG@28
PS14, Line 28: This is the default behaviour
What does "default behavior" mean in this context? Is it when no FX or FM 
modifier is used? Or FX is used but FM isn't?


http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@115
PS14, Line 115: if (fm_modifier_active_)
nit: not necessary, you can just set the flag to false no matter what.


http://gerrit.cloudera.org:8080/#/c/14291/14/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@237
PS14, Line 237: DCHECK(*current_pos == dt_ctx_->fmt);
It would be more straightforward if ProcessFXModifier() did not take any 
parameters and returned a char* instead of void.

No need to pass current_pos as a parameter if *current_pos is expected to be 
set to dt_ctx_->fmt.



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 14
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 04 Nov 2019 17:12:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4918/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 14
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 31 Oct 2019 14:12:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 14:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14291/14/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/14/tests/query_test/test_cast_with_format.py@751
PS14, Line 751: assert result.data == [r'''1985 text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/14/tests/query_test/test_cast_with_format.py@755
PS14, Line 755: assert result.data == [r'''1985 text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/14/tests/query_test/test_cast_with_format.py@759
PS14, Line 759: assert result.data == [r'''1985 text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/14/tests/query_test/test_cast_with_format.py@763
PS14, Line 763: assert result.data == [r'''1985 text
tab used for whitespace



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 14
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 31 Oct 2019 13:28:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-31 Thread Gabor Kaszab (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14291

to look at the new patch set (#14).

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..

IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

This patch adds additional datetime format tokens on top of what was
introduced with Milestone 1 (IMPALA-8703).

The tokens introduced:
- Free text token: Surrounded by double quotes, a free text section
  can be given in the format where the same text is expected in the
  input without the surrounding double quotes.
- FX modifier: This modifier has to be given at the beginning of the
  format and is valid for the whole format.
  In a string to datetime conversion this forces strict separator
  matching and expects all the tokens in the input to have the same
  length as their maximum length. E.g. A month has to be of length 2
  prefixed by zero if needed.
  This is the default in a datetime to string conversion.
- FM modifier: This modifier affects only the following token and
  overrides the FX modifier.
  In a string to datetime conversion when using this modifier the
  length of the token in the input can be shorter than the max length
  of that token type if followed by a separator. E.g. 1-digit month,
  less than 4-digit year, etc. This is the default behaviour in a
  string to datetime conversion.
  In a datetime to string conversion tokens with this modifier aren't
  padded to the maximum length. Example output: "2010-1-9".

Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
---
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-iso-sql-format-parser.h
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/timestamp-parse-util.cc
M tests/query_test/test_cast_with_format.py
9 files changed, 699 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/14
--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 14
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-30 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@124
PS12, Line 124:  if (token->second.type == FX_MODIFIER) {
  : if (used_tokens_.size() > 0 || dt_ctx_->fx_modifier) {
  :   return MISPLACED_FX_MODIFIER_ERROR;
  : }
  : dt_ctx_->fx_modifier = true;
  : *current_pos += curr_token_size;
  : return SUCCESS;
  :   }
'FMFX-MM-DD' is another format string that should be rejected but currently 
it is not.



-- 
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 12
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 30 Oct 2019 14:22:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-30 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 12:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14291/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14291/12//COMMIT_MSG@24
PS12, Line 24: Using this the value of a token can be
 :   shorter than the max length if followed by a separator.
Please clarify that this is about datetime to string conversion.


http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-parser.h
File be/src/runtime/datetime-iso-sql-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-parser.h@81
PS12, Line 81: a
nit: the


http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-parser.cc@238
PS12, Line 238:   DCHECK(current_tok_idx != nullptr && *current_tok_idx < 
dt_ctx.toks.size());
Add
&& dt_ctx.toks[*current_tok_idx].type == SEPARATOR


http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@113
PS12, Line 113:   if (IsStartOfTextToken(*current_pos)) {
  : return ProcessTextToken(current_pos, str_begin, str_end);
  :   }
What if text token is preceded by an FM modifier?

I did some testing and it looks like in a format string like 
'FX-MM-FM"text"DD' the FM-modifier applies to DD token, instead of the 
"text" token.

I think we need to do here something like L141-144.

ProcessSeaprators() has a similar problem. In 'FX-MM-FM-DD' FM applies to 
DD token instead of the '-' separator.


http://gerrit.cloudera.org:8080/#/c/14291/12/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@124
PS12, Line 124:  if (token->second.type == FX_MODIFIER) {
  : if (used_tokens_.size() > 0 || dt_ctx_->fx_modifier) {
  :   return MISPLACED_FX_MODIFIER_ERROR;
  : }
  : dt_ctx_->fx_modifier = true;
  : *current_pos += curr_token_size;
  : return SUCCESS;
  :   }
This still allows format strings like:
'--FX-MM-DD'
or
'"text"FX-MM-DD'

Maybe it would be easier to parse the optional FX modifier before calling 
ProcessNextToken() in a loop.



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 12
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 30 Oct 2019 14:06:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4898/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 12
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 29 Oct 2019 18:22:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-29 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 12:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-parser.cc@214
PS11, Line 214:   DCHECK(tok->val <= *format && *format < tok->val + tok->len);
> DCHECK(tok->val <= *format < tok->val + tok->len);
Done


http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-parser.cc@217
PS11, Line 217: ped quo
> Is it possible that strncmp (here or in the next line) overruns [tok->val,
Indeed, the text token part is not null-terminated. This wouldn't cause any 
issues in my opinion es the format as a whole is null-terminated. Adding an 
extra check to be on the safe side.


http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-parser.cc@255
PS11, Line 255: if (*current_pos >= end_pos && *current_tok_idx < 
dt_ctx.toks.size()) {
> Add *current_tok_idx < dt_ctx.toks.size() condition to the while condition
Thx, done.


http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@84
PS11, Line 84: const char* str_end = dt_ctx_->fmt + dt_ctx_->fmt_len;
> As explained in L271, 'str_end' is not needed.
See answer in L271.


http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@111
PS11, Line 111: const char* str_end = dt_ctx_->fmt + dt_ctx_->fmt_len;
> As explained in L271, 'str_end' is not needed.
See answer in L271


http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@271
PS11, Line 271: while (current_pos < str_end) {
> If we assume that 'str_start' points to a null-terminated string, we don't
Yeah, technically I could drop 'str_end' and compare 'current_pos' to '\0'. 
However, I feel that having 'str_end' improves code readability a bit.


http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py@1101
PS11, Line 1101: FXFMFX-MM-DD
> 'FXFX-MM-DD' should fail too right?
Done



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 12
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 29 Oct 2019 17:39:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-29 Thread Gabor Kaszab (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14291

to look at the new patch set (#12).

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..

IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

This patch adds additional datetime format tokens on top of what was
introduced with Milestone 1 (IMPALA-8703).

The tokens introduced:
- Free text token: Surrounded by double quotes, a free text section
  can be given in the format where the same text is expected in the
  input without the surrounding double quotes.
- FX modifier: This modifier has to be given at the beginning of the
  format and is valid for the whole format.
  In a string to datetime conversion this forces strict separator
  matching and expects all the tokens in the input to have the same
  length as their maximum length. E.g. A month has to be of length 2
  prefixed by zero if needed.
  This is the default in a datetime to string conversion.
- FM modifier: This modifier affects only the following token and
  overrides the FX modifier. Using this the value of a token can be
  shorter than the max length if followed by a separator.
  E.g. 1-digit month, less than 4-digit year, etc. This is the
  default behaviour in a string to datetime conversion.

Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
---
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-iso-sql-format-parser.h
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/timestamp-parse-util.cc
M tests/query_test/test_cast_with_format.py
9 files changed, 659 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/12
--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 12
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 12:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14291/12/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/12/tests/query_test/test_cast_with_format.py@751
PS12, Line 751: assert result.data == [r'''1985 text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/12/tests/query_test/test_cast_with_format.py@755
PS12, Line 755: assert result.data == [r'''1985 text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/12/tests/query_test/test_cast_with_format.py@759
PS12, Line 759: assert result.data == [r'''1985 text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/12/tests/query_test/test_cast_with_format.py@763
PS12, Line 763: assert result.data == [r'''1985 text
tab used for whitespace



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 12
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 29 Oct 2019 17:38:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-29 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 11:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-parser.cc@214
PS11, Line 214:   DCHECK(tok != nullptr);
DCHECK(tok->val <= *format < tok->val + tok->len);


http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-parser.cc@217
PS11, Line 217: strncmp
Is it possible that strncmp (here or in the next line) overruns [tok->val, 
tok->val + tok->len) char-range?


http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-parser.cc@255
PS11, Line 255:   while (dt_ctx.toks[*current_tok_idx].type == TEXT &&
Add *current_tok_idx < dt_ctx.toks.size() condition to the while condition 
otherwise we might overrun the dt_ctx.toks buffer.


http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@84
PS11, Line 84: const char* str_end = dt_ctx_->fmt + dt_ctx_->fmt_len;
As explained in L271, 'str_end' is not needed.

You can also add DCHECK(dt_ctx_->fmt[dt_ctx_->fmt_len] == '\0') here.


http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@111
PS11, Line 111: const char* str_end = dt_ctx_->fmt + dt_ctx_->fmt_len;
As explained in L271, 'str_end' is not needed.


http://gerrit.cloudera.org:8080/#/c/14291/11/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@271
PS11, Line 271: while (current_pos < str_end) {
If we assume that 'str_start' points to a null-terminated string, we don't 
really need to use 'str_end' here and elsewhere. You can just do:

while (*current_pos != '\0') {..}


http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py@1101
PS11, Line 1101: FXFMFX-MM-DD
'FXFX-MM-DD' should fail too right?

If yes, please add a test for that format string too.



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 29 Oct 2019 13:58:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4894/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 28 Oct 2019 14:00:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-28 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 11:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.h
File be/src/runtime/datetime-iso-sql-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.h@81
PS10, Line 81:  If '*format' points at a beginning of an escape sequence,
 :   // '*forma
> It doesn't move *format to the next character, it moves it to the last char
Indeed this reflect better what happens for 'format'. Done


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.cc@57
PS10, Line 57: >=
> '>=' might be safer to use here
Done


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.cc@251
PS10, Line 251:  // If we reached the end of input or the end of token 
sequence, we can return.
  :   if (*current_pos >= end_pos || *current_tok_idx >= 
dt_ctx.toks.size()) {
  : // Skip trailing empty text tokens in format.
  :
> What if we reached the end of input but dt_ctx.toks still contains some emp
Thanks for spotting this. Done


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@91
PS10, Line 91: string
> string functions
Done


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@128
PS10, Line 128:  static bool IsStartOfTextToken(const char* current_pos)
> This should probably be a static function instead of const.
Done


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@137
PS10, Line 137: str_start
> str_start, here and elsewhere in the comment.
Done


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@141
PS10, Line 141:   static const char* FindEndOfTextToken(const char* str_start, 
const char* str_end,
  :   bool is_escaped);
> This should be a static function too.
Done


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@124
PS10, Line 124:   if (token->second.type == FX_MODIFIER) {
  : if (used_tokens_.size() > 0 || dt_ctx_->fx_modifier) {
  :   return MISPLACED_FX_MODIFIER_ERROR;
  : }
  : dt_ctx_->fx_modifier = true;
  : *current_pos += curr_token_size;
  : return SUCCESS;
  :   }
  :   if (token->second.type == FM_MODIFIER) {
  : fm_modifier_active_ = true;
  :
> This allows weird format strings too, e.g.: 'FXFMFMFX-MM-DD'
Initially I thought that giving this freedom to the user wouldn't hurt but 
giving it a second look I feel that this would cause a bit ambiguity. Let me 
restrict that FX can only be given once and at the very beginning.

I could restrict FM to be given only once for a particular token but that 
wouldn't seem that important. "FXFMFM-FMFMDD-MM" is not ambiguous at all as 
it seems that which token is M modified and which isn't.


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@251
PS10, Line 251: DCHECK(str_begin != nullptr)
> nit: DCHECK(str_begin <= *current_pos && *current_pos < str_end);
Done


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@254
PS10, Line 254: = (**current
> nit: no need to put is_ecaped inside parentheses.
Done



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 28 Oct 2019 13:43:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 11:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py@751
PS11, Line 751: assert result.data == [r'''1985 text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py@755
PS11, Line 755: assert result.data == [r'''1985 text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py@759
PS11, Line 759: assert result.data == [r'''1985 text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/11/tests/query_test/test_cast_with_format.py@763
PS11, Line 763: assert result.data == [r'''1985 text
tab used for whitespace



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 28 Oct 2019 13:17:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-28 Thread Gabor Kaszab (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14291

to look at the new patch set (#11).

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..

IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

This patch adds additional datetime format tokens on top of what was
introduced with Milestone 1 (IMPALA-8703).

The tokens introduced:
- Free text token: Surrounded by double quotes, a free text section
  can be given in the format where the same text is expected in the
  input without the surrounding double quotes.
- FX modifier: This modifier has to be given at the beginning of the
  format and is valid for the whole format.
  In a string to datetime conversion this forces strict separator
  matching and expects all the tokens in the input to have the same
  length as their maximum length. E.g. A month has to be of length 2
  prefixed by zero if needed.
  This is the default in a datetime to string conversion.
- FM modifier: This modifier affects only the following token and
  overrides the FX modifier. Using this the value of a token can be
  shorter than the max length if followed by a separator.
  E.g. 1-digit month, less than 4-digit year, etc. This is the
  default behaviour in a string to datetime conversion.

Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
---
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-iso-sql-format-parser.h
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/timestamp-parse-util.cc
M tests/query_test/test_cast_with_format.py
9 files changed, 653 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/11
--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-18 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 10:

(10 comments)

Thanks for making the changes. Few more comments:

http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.h
File be/src/runtime/datetime-iso-sql-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.h@81
PS10, Line 81:  As a side effect moves '*format' to the next character in the
 :   // format.
It doesn't move *format to the next character, it moves it to the last 
character of the escape sequence. If *format doesn't point at an escape 
sequence, *format is not changed.

Maybe something like this:
"
If '*format' points at a beginning of an escape sequence, '*format' is moved to 
the last character of the escape sequence. Otherwise, '*format' is not changed.
"


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.cc@57
PS10, Line 57: ==
'>=' might be safer to use here


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-parser.cc@251
PS10, Line 251:  // If we reached the end of input or the end of token 
sequence, we can return.
  :   if (*current_pos >= end_pos || *current_tok_idx >= 
dt_ctx.toks.size()) {
  : return (*current_pos >= end_pos && *current_tok_idx >= 
dt_ctx.toks.size());
  :   }
What if we reached the end of input but dt_ctx.toks still contains some empty 
TEXT tokens?

select cast('1985-12-09-' as date format '-MM-DD-""');

I think this corner-case should be handled here, instead of just returning 
false.


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@91
PS10, Line 91: function
string functions


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@128
PS10, Line 128:  bool IsStartOfTextToken(const char* current_pos) const;
This should probably be a static function instead of const.


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@137
PS10, Line 137: start_str
str_start, here and elsewhere in the comment.


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.h@141
PS10, Line 141:   const char* FindEndOfTextToken(const char* str_start, const 
char* str_end,
  :   bool is_escaped);
This should be a static function too.


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@124
PS10, Line 124:   if (token->second.type == FX_MODIFIER) {
  : if (used_tokens_.size() > 0) return 
MISPLACED_FX_MODIFIER_ERROR;
  : dt_ctx_->fx_modifier = true;
  : *current_pos += curr_token_size;
  : return SUCCESS;
  :   }
  :   if (token->second.type == FM_MODIFIER) {
  : fm_modifier_active_ = true;
  : *current_pos += curr_token_size;
  : return SUCCESS;
  :   }
This allows weird format strings too, e.g.: 'FXFMFMFX-MM-DD'
Probably these should return an error.


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@251
PS10, Line 251: DCHECK(str_begin < str_end);
nit: DCHECK(str_begin <= *current_pos && *current_pos < str_end);


http://gerrit.cloudera.org:8080/#/c/14291/10/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@254
PS10, Line 254: (is_escaped)
nit: no need to put is_ecaped inside parentheses.



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 18 Oct 2019 12:05:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-18 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@782
PS7, Line 782: # Strict separator matching.
 : result = self.client.execute("select cast('2001-03-02 
03:10:15' as timestamp format"
 : "'FX MM-DD HH12:MI:SS')")
 : assert result.data == ["NULL"]
 :
 : result = self.client.execute("select cast('2001-03-03 
03:10:15' as timestamp format"
 : "'FX-MM-DD HH12::MI:SS')")
 : assert result.data == ["NULL"]
 :
 : result = self.client.execute("select cast('2001-03-04' 
as timestamp format"
 : "'FX-MM-DD ')")
 : assert result.data == ["NULL"]
 :
 : # Strict token length matching.
 : result = self.client.execute("select cast('2001-3-05' as 
timestamp format "
 : "'FX-MM-DD')")
 : assert result.data == ["NULL"]
> L778 is a positive test for FX modifier. Does that cover what you ask for?
Ok, thanks.



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 18 Oct 2019 12:10:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4819/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 18 Oct 2019 07:06:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-18 Thread Gabor Kaszab (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14291

to look at the new patch set (#10).

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..

IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

This patch adds additional datetime format tokens on top of what was
introduced with Milestone 1 (IMPALA-8703).

The tokens introduced:
- Free text token: Surrounded by double quotes, a free text section
  can be given in the format where the same text is expected in the
  input without the surrounding double quotes.
- FX modifier: This modifier has to be given at the beginning of the
  format and is valid for the whole format.
  In a string to datetime conversion this forces strict separator
  matching and expects all the tokens in the input to have the same
  length as their maximum length. E.g. A month has to be of length 2
  prefixed by zero if needed.
  This is the default in a datetime to string conversion.
- FM modifier: This modifier affects only the following token and
  overrides the FX modifier. Using this the value of a token can be
  shorter than the max length if followed by a separator.
  E.g. 1-digit month, less than 4-digit year, etc. This is the
  default behaviour in a string to datetime conversion.

Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
---
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-iso-sql-format-parser.h
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/timestamp-parse-util.cc
M tests/query_test/test_cast_with_format.py
9 files changed, 631 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/10
--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14291/10/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/10/tests/query_test/test_cast_with_format.py@743
PS10, Line 743: assert result.data == [r'''1985 text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/10/tests/query_test/test_cast_with_format.py@747
PS10, Line 747: assert result.data == [r'''1985 text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/10/tests/query_test/test_cast_with_format.py@751
PS10, Line 751: assert result.data == [r'''1985 text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/10/tests/query_test/test_cast_with_format.py@755
PS10, Line 755: assert result.data == [r'''1985 text
tab used for whitespace



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 18 Oct 2019 06:24:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4815/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 17 Oct 2019 18:48:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-17 Thread Gabor Kaszab (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14291

to look at the new patch set (#9).

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..

IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

This patch adds additional datetime format tokens on top of what was
introduced with Milestone 1 (IMPALA-8703).

The tokens introduced:
- Free text token: Surrounded by double quotes, a free text section
  can be given in the format where the same text is expected in the
  input without the surrounding double quotes.
- FX modifier: This modifier has to be given at the beginning of the
  format and is valid for the whole format.
  In a string to datetime conversion this forces strict separator
  matching and expects all the tokens in the input to have the same
  length as their maximum length. E.g. A month has to be of length 2
  prefixed by zero if needed.
  This is the default in a datetime to string conversion.
- FM modifier: This modifier affects only the following token and
  overrides the FX modifier. Using this the value of a token can be
  shorter than the max length if followed by a separator.
  E.g. 1-digit month, less than 4-digit year, etc. This is the
  default behaviour in a string to datetime conversion.

Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
---
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-iso-sql-format-parser.h
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/timestamp-parse-util.cc
M tests/query_test/test_cast_with_format.py
9 files changed, 637 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/9
--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-17 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 6:

(14 comments)

Thanks for taking another look!

http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h
File be/src/runtime/datetime-iso-sql-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h@78
PS7, Line 78: format
> text token inside the format
Done


http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h@81
PS7, Line 81: format
> text token
Done


http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h@81
PS7, Line 81: and moves
:   // '*format' to 'a'.
> I don't think part this is true.
Ideed, thx.
Done


http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc@48
PS7, Line 48: if (tok->type == SEPARATOR && !dt_ctx.fx_modifier) {
:   bool res = ProcessSeparatorSequence(_pos, end_pos, 
dt_ctx, );
:   if (!res || current_pos == end_pos) return res;
:   DCHECK(i < dt_ctx.toks.size());
:   // Next token, following the separator sequence.
:   tok = _ctx.toks[i];
: }
> I'd prefer to handle strict separator matching (tok->type == SEPARATOR && d
Done


http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc@56
PS7, Line 56: const char* token_end_pos = FindEndOfToken(current_pos, 
end_pos - current_pos, *tok);
: if (token_end_pos == nullptr) return false;
: int token_len = token_end_pos - current_pos;
:
: if (dt_ctx.fx_modifier && !tok->fm_modifier && tok->type != 
TEXT &&
: token_len != tok->len) {
:   return false;
: }
> for TEXT tokens, 'token_end_pos' and 'token_len' aren't set here. 'token_en
Done


http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc@185
PS7, Line 185: token_len == 1
> dt_ctx.fx_modifier && token_len == 1
Merged this with the loose separator handling. This case is no longer needed 
here.


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@240
PS6, Line 240: strncmp(current_pos, "\\\"", 2) == 0)
> You're correct, it is null-terminated for iso-sql format strings, because t
I see. I wasn't aware that the format is not null-terminated for the 
SingleDateFormat path. Thanks for letting me know!
I'll add comments about this assumption on the IsoSqlFormat branch for both 
IsoSqlFormatTokenizer::dt_ctx_ and IsoSqlFormatParser::ParseDateTime().

Thanks again for spotting this!


http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-parser-common.h
File be/src/runtime/datetime-parser-common.h:

http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-parser-common.h@134
PS7, Line 134: then
> that
Done


http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@738
PS6, Line 738: select cast("1985-\a\b\f\n\r\t\v\'12-09" as '''
 : r'''date format '-"\a\b\f\n\r\t\v\'"MM-DD')
> This test is about special escape sequences but non-special chars in the te
Isn't intentional. Remained from a previous approach I had implemented to 
handle special char. I'm removing the non-special chars from this test.
Done


http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@782
PS7, Line 782: # Strict separator matching.
 : result = self.client.execute("select cast('2001-03-02 
03:10:15' as timestamp format"
 : "'FX MM-DD HH12:MI:SS')")
 : assert result.data == ["NULL"]
 :
 : result = self.client.execute("select cast('2001-03-03 
03:10:15' as timestamp format"
 : "'FX-MM-DD HH12::MI:SS')")
 : assert result.data == ["NULL"]
 :
 : result = self.client.execute("select cast('2001-03-04' 
as timestamp format"
 : "'FX-MM-DD ')")
 : assert result.data == ["NULL"]

[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 9:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14291/9/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/9/tests/query_test/test_cast_with_format.py@743
PS9, Line 743: assert result.data == [r'''1985  text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/9/tests/query_test/test_cast_with_format.py@747
PS9, Line 747: assert result.data == [r'''1985  text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/9/tests/query_test/test_cast_with_format.py@751
PS9, Line 751: assert result.data == [r'''1985  text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/9/tests/query_test/test_cast_with_format.py@755
PS9, Line 755: assert result.data == [r'''1985  text
tab used for whitespace



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 17 Oct 2019 18:05:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-16 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@240
PS6, Line 240: strncmp(current_pos, "\\\"", 2) == 0)
> It is zero-terminated
You're correct, it is null-terminated for iso-sql format strings, because the 
DateTimeFormatContext instance is always created using a null-terminated string 
(in CastFormatExpr::OpenEvaluator()).

On the other hand, DateTimeFormatContext class does not guarantee that 
DateTimeFormatContext::fmt is null-terminated.
E.g. TimestampFunctions::UnixAndFromUnixPrepare() creates a 
DateTimeFormatContext instance where fmt is not null-terminated.
For this reason, we are very careful in SimpleDateFormatTokenizer::Tokenize() 
not to assume that format strings will end with a null-byte.

Maybe add a comment at the beginning of this file to explain that iso-sql 
format strings are always null-terminated?


http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@738
PS6, Line 738: select cast("1985-\a\b\f\n\r\t\v\'12-09" as '''
 : r'''date format '-"\a\b\f\n\r\t\v\'"MM-DD')
This test is about special escape sequences but non-special chars in the text 
token are escaped too. Is that intentional?



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 16 Oct 2019 18:02:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-16 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@768
PS6, Line 768: covered
> Don't see the difference. This seems nit of the nits :D
"covered" and "surrounded" mean very different things:

https://www.thefreedictionary.com/covered
https://www.thefreedictionary.com/surrounded

:)



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 16 Oct 2019 16:08:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4807/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 16 Oct 2019 14:47:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-16 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 6:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-parser.cc@225
PS6, Line 225:  switch (**format) {
 : case 'b': return '\b';
 : case 'n': return '\n';
 : case 'r': return '\r';
 : case 't': return '\t';
 :   }
> Please check what escape sequences are supported by ANSI SQL standard.
Well, I'm not convinced that ANSI SQL is relevant here as according to this 
page it doesn't allow escaping quotes and double quotes.
https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0/com.ibm.esqlc.doc/ids_esqlc_0015.htm

This doc about MySql is kind of in line with my implementation:
https://www.oreilly.com/library/view/mysql-cookbook/0596001452/ch04s02.html
It also says that 2 sequential double quotes works as an escaped double quote 
same as with a backslash. However, I don't feel the need for this.
Additionally MySql has \0 as a special char (ANSII NULL) but again, don't find 
it important.


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@109
PS6, Line 109: *current_pos != nullptr
> nit: current_pos != nullptr && *current_pos != nullptr
Done


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@112
PS6, Line 112: *current_pos < str_end
> nit: str_begin <= *current_pos && *current_pos < str_end
Done


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@240
PS6, Line 240: strncmp(current_pos, "\\\"", 2) == 0)
> This is safe only if 'current_pos' is zero-terminated, but I don't think it
It is zero-terminated


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@273
PS6, Line 273:  strncmp(*current_pos, "\\\"", 2)
> same as L240.
see above


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@277
PS6, Line 277: strncmp(*current_pos, "\\\"", 4)
> same as L240.
see above


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@282
PS6, Line 282: strncmp(*current_pos, "", 2)
> same as L240.
see above


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc
File be/src/runtime/datetime-parser-common.cc:

http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc@201
PS6, Line 201: continue
> nit: can be removed, here and below.
Done


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc@202
PS6, Line 202:} else if (!tok.is_double_escaped && strncmp(text_it, "\\\"", 
2) == 0) {
 :   result.append("\"");
 :   ++text_it;
 :   continue;
 : } else if (strncmp(text_it, "", 2) == 0) {
 :   result.append("\\");
 :   ++text_it;
 :   continue;
 : } else if (strncmp(text_it, "\\b", 2) == 0) {
 :   result.append("\b");
 :   ++text_it;
 :   continue;
 : } else if (strncmp(text_it, "\\n", 2) == 0) {
 :   result.append("\n");
 :   ++text_it;
 :   continue;
 : } else if (strncmp(text_it, "\\r", 2) == 0) {
 :   result.append("\r");
 :   ++text_it;
 :   continue;
 : } else if (strncmp(text_it, "\\t", 2) == 0) {
 :   result.append("\t");
 :   ++text_it;
 :   continue;
 : }
> Are these the only escape sequences supported by ANSI SQL?
See the answer in datetime-iso-sql-format-parser.cc


http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@663
PS6, Line 663:# Format part is surrounded by double quotes so the quotes 
indicating the start and
 : # end of the text token has to be escaped.
> I did some testing around when the format string is surrounded by ' and the
If the format is surrounded by single quotes but the text double quotes 
surrounding the text token is escaped is a valid scenario. From the code it's 
not possible to decide whether a 

[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14291/8/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/8/tests/query_test/test_cast_with_format.py@743
PS8, Line 743: assert result.data == [r'''1985  text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/8/tests/query_test/test_cast_with_format.py@747
PS8, Line 747: assert result.data == [r'''1985  text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/8/tests/query_test/test_cast_with_format.py@751
PS8, Line 751: assert result.data == [r'''1985  text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/8/tests/query_test/test_cast_with_format.py@755
PS8, Line 755: assert result.data == [r'''1985  text
tab used for whitespace



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 16 Oct 2019 14:02:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-16 Thread Gabor Kaszab (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14291

to look at the new patch set (#8).

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..

IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

This patch adds additional datetime format tokens on top of what was
introduced with Milestone 1 (IMPALA-8703).

The tokens introduced:
- Free text token: Surrounded by double quotes, a free text section
  can be given in the format where the same text is expected in the
  input without the surrounding double quotes.
- FX modifier: This modifier has to be given at the beginning of the
  format and is valid for the whole format.
  In a string to datetime conversion this forces strict separator
  matching and expects all the tokens in the input to have the same
  length as their maximum length. E.g. A month has to be of length 2
  prefixed by zero if needed.
  This is the default in a datetime to string conversion.
- FM modifier: This modifier affects only the following token and
  overrides the FX modifier. Using this the value of a token can be
  shorter than the max length if followed by a separator.
  E.g. 1-digit month, less than 4-digit year, etc. This is the
  default behaviour in a string to datetime conversion.

Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
---
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-iso-sql-format-parser.h
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/timestamp-parse-util.cc
M tests/query_test/test_cast_with_format.py
9 files changed, 632 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/8
--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-16 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h
File be/src/runtime/datetime-iso-sql-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h@78
PS7, Line 78: format
text token inside the format


http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h@81
PS7, Line 81: format
text token


http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h@81
PS7, Line 81: and moves
:   // '*format' to 'a'.
I don't think part this is true.

In GetNextCharFromTextToken() *format is moved to the last character of the 
escape sequence (to '"' in the example given above).


http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc@48
PS7, Line 48: if (tok->type == SEPARATOR && !dt_ctx.fx_modifier) {
:   bool res = ProcessSeparatorSequence(_pos, end_pos, 
dt_ctx, );
:   if (!res || current_pos == end_pos) return res;
:   DCHECK(i < dt_ctx.toks.size());
:   // Next token, following the separator sequence.
:   tok = _ctx.toks[i];
: }
I'd prefer to handle strict separator matching (tok->type == SEPARATOR && 
dt_ctx.fx_modifier) right after this, instead of L183.


http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc@56
PS7, Line 56: const char* token_end_pos = FindEndOfToken(current_pos, 
end_pos - current_pos, *tok);
: if (token_end_pos == nullptr) return false;
: int token_len = token_end_pos - current_pos;
:
: if (dt_ctx.fx_modifier && !tok->fm_modifier && tok->type != 
TEXT &&
: token_len != tok->len) {
:   return false;
: }
for TEXT tokens, 'token_end_pos' and 'token_len' aren't set here. 
'token_end_pos' is set in L170.

For clarity I'd move the code in L170-182 right before L56, and execute L56-193 
only if tok->type != TEXT.


http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc@185
PS7, Line 185: token_len == 1
dt_ctx.fx_modifier && token_len == 1


http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-parser-common.h
File be/src/runtime/datetime-parser-common.h:

http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-parser-common.h@134
PS7, Line 134: then
that



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 16 Oct 2019 12:11:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-16 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@782
PS7, Line 782: # Strict separator matching.
 : result = self.client.execute("select cast('2001-03-02 
03:10:15' as timestamp format"
 : "'FX MM-DD HH12:MI:SS')")
 : assert result.data == ["NULL"]
 :
 : result = self.client.execute("select cast('2001-03-03 
03:10:15' as timestamp format"
 : "'FX-MM-DD HH12::MI:SS')")
 : assert result.data == ["NULL"]
 :
 : result = self.client.execute("select cast('2001-03-04' 
as timestamp format"
 : "'FX-MM-DD ')")
 : assert result.data == ["NULL"]
 :
 : # Strict token length matching.
 : result = self.client.execute("select cast('2001-3-05' as 
timestamp format "
 : "'FX-MM-DD')")
 : assert result.data == ["NULL"]
Maybe add positive tests too for these.

e.g after L783:
select cast('2001 03-02 03:10:15' as timestamp format 'FX MM-DD HH12:MI:SS')


http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@866
PS7, Line 866: doesn't
don't


http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@872
PS7, Line 872: result = self.client.execute("select cast(cast('2001-03-09' 
as date) "
 : "as string format '-FMMM-FMDD')")
 : assert result.data == ["2001-3-9"]
Maybe add a test were year token is FM-modified

e.g.:
select cast(date'0001-03-09' as string format 'FM-FMMM-FMDD')
select cast(date'0001-03-09' as string format 'FMYY-FMMM-FMDD')


http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@882
PS7, Line 882:result = self.client.execute("select cast(cast('2001-04-09' 
as date) "
 : "as string format 'FX-FMMM-FMDD')")
Test FM-modified year token here too.


http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@1045
PS7, Line 1045: "select cast('1985-11-21text' as timestamp format 
'-MM-DD\"text')"
Again, please use raw python strings, here and below.



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 16 Oct 2019 09:47:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-15 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 6:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-parser.cc@225
PS6, Line 225:  switch (**format) {
 : case 'b': return '\b';
 : case 'n': return '\n';
 : case 'r': return '\r';
 : case 't': return '\t';
 :   }
Please check what escape sequences are supported by ANSI SQL standard.
(same as datetime-parser-common.cc:226)


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@109
PS6, Line 109: *current_pos != nullptr
nit: current_pos != nullptr && *current_pos != nullptr


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@112
PS6, Line 112: *current_pos < str_end
nit: str_begin <= *current_pos && *current_pos < str_end


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@240
PS6, Line 240: strncmp(current_pos, "\\\"", 2) == 0)
This is safe only if 'current_pos' is zero-terminated, but I don't think it is. 
Probably you should check first that there are at least 2 characters between 
current_pos and end_pos.


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@273
PS6, Line 273:  strncmp(*current_pos, "\\\"", 2)
same as L240.


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@277
PS6, Line 277: strncmp(*current_pos, "\\\"", 4)
same as L240.


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@282
PS6, Line 282: strncmp(*current_pos, "", 2)
same as L240.


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc
File be/src/runtime/datetime-parser-common.cc:

http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc@201
PS6, Line 201: continue
nit: can be removed, here and below.


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc@202
PS6, Line 202:} else if (!tok.is_double_escaped && strncmp(text_it, "\\\"", 
2) == 0) {
 :   result.append("\"");
 :   ++text_it;
 :   continue;
 : } else if (strncmp(text_it, "", 2) == 0) {
 :   result.append("\\");
 :   ++text_it;
 :   continue;
 : } else if (strncmp(text_it, "\\b", 2) == 0) {
 :   result.append("\b");
 :   ++text_it;
 :   continue;
 : } else if (strncmp(text_it, "\\n", 2) == 0) {
 :   result.append("\n");
 :   ++text_it;
 :   continue;
 : } else if (strncmp(text_it, "\\r", 2) == 0) {
 :   result.append("\r");
 :   ++text_it;
 :   continue;
 : } else if (strncmp(text_it, "\\t", 2) == 0) {
 :   result.append("\t");
 :   ++text_it;
 :   continue;
 : }
Are these the only escape sequences supported by ANSI SQL?

C, for instance supports a bunch of others too:
https://en.wikipedia.org/wiki/Escape_sequences_in_C


http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@663
PS6, Line 663:# Format part is surrounded by double quotes so the quotes 
indicating the start and
 : # end of the text token has to be escaped.
I did some testing around when the format string is surrounded by ' and the 
text token is surrounded by \"

Date to string conversion:

>> select cast(date"1985-12-08" as string format '-MM-DD \"X\"');
returns: 1985-12-08 X
I'm not sure if this should work or return an error instead.

also if only the opening " to the text token is escaped:

>> select cast(date"1985-12-08" as string format '-MM-DD \"X"');
returns 1985-12-08
This should probably return an error.

On the other hand when doing string to date conversion:

>> select cast("1985-12-08 X" as date format '-MM-DD \"X\"');
returns: 1985-12-08

>> select cast("1985-12-08 X" as date format '-MM-DD \"X"');
returns an error.

>> select cast("1985-12-08" as date format '-MM-DD \"X"');

[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4763/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 09 Oct 2019 11:19:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4762/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 09 Oct 2019 11:10:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-09 Thread Gabor Kaszab (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14291

to look at the new patch set (#6).

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..

IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

This patch adds additional datetime format tokens on top of what was
introduced with Milestone 1 (IMPALA-8703).

The tokens introduced:
- Free text token: Surrounded by double quotes, a free text section
  can be given in the format where the same text is expected in the
  input without the surrounding double quotes.
- FX modifier: This modifier has to be given at the beginning of the
  format and is valid for the whole format.
  In a string to datetime conversion this forces strict separator
  matching and expects all the tokens in the input to have the same
  length as their maximum length. E.g. A month has to be of length 2
  prefixed by zero if needed.
  This is the default in a datetime to string conversion.
- FM modifier: This modifier affects only the following token and
  overrides the FX modifier. Using this the value of a token can be
  shorter than the max length if followed by a separator.
  E.g. 1-digit month, less than 4-digit year, etc. This is the
  default behaviour in a string to datetime conversion.

Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
---
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-iso-sql-format-parser.h
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/timestamp-parse-util.cc
M tests/query_test/test_cast_with_format.py
9 files changed, 616 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/6
--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-09 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@743
PS6, Line 743: assert result.data == [r'''1985  text
> tab used for whitespace
This is intentional to check the output contains a tab.


http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@747
PS6, Line 747: assert result.data == [r'''1985  text
> tab used for whitespace
This is intentional to check the output contains a tab.


http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@751
PS6, Line 751: assert result.data == [r'''1985  text
> tab used for whitespace
This is intentional to check the output contains a tab.


http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@755
PS6, Line 755: assert result.data == [r'''1985  text
> tab used for whitespace
This is intentional to check the output contains a tab.



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 09 Oct 2019 10:40:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@743
PS6, Line 743: assert result.data == [r'''1985  text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@747
PS6, Line 747: assert result.data == [r'''1985  text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@751
PS6, Line 751: assert result.data == [r'''1985  text
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@755
PS6, Line 755: assert result.data == [r'''1985  text
tab used for whitespace



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 09 Oct 2019 10:38:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 5:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@743
PS5, Line 743: r
flake8: W291 trailing whitespace


http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@743
PS5, Line 743: assert result.data == [r'''1985
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@743
PS5, Line 743: assert result.data == [r'''1985
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@747
PS5, Line 747: r
flake8: W291 trailing whitespace


http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@747
PS5, Line 747: assert result.data == [r'''1985
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@747
PS5, Line 747: assert result.data == [r'''1985
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@751
PS5, Line 751: r
flake8: W291 trailing whitespace


http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@751
PS5, Line 751: assert result.data == [r'''1985
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@751
PS5, Line 751: assert result.data == [r'''1985
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@755
PS5, Line 755: r
flake8: W291 trailing whitespace


http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@755
PS5, Line 755: assert result.data == [r'''1985
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/14291/5/tests/query_test/test_cast_with_format.py@755
PS5, Line 755: assert result.data == [r'''1985
tab used for whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 09 Oct 2019 10:29:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-09 Thread Gabor Kaszab (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14291

to look at the new patch set (#5).

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..

IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

This patch adds additional datetime format tokens on top of what was
introduced with Milestone 1 (IMPALA-8703).

The tokens introduced:
- Free text token: Surrounded by double quotes, a free text section
  can be given in the format where the same text is expected in the
  input without the surrounding double quotes.
- FX modifier: This modifier has to be given at the beginning of the
  format and is valid for the whole format.
  In a string to datetime conversion this forces strict separator
  matching and expects all the tokens in the input to have the same
  length as their maximum length. E.g. A month has to be of length 2
  prefixed by zero if needed.
  This is the default in a datetime to string conversion.
- FM modifier: This modifier affects only the following token and
  overrides the FX modifier. Using this the value of a token can be
  shorter than the max length if followed by a separator.
  E.g. 1-digit month, less than 4-digit year, etc. This is the
  default behaviour in a string to datetime conversion.

Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
---
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-iso-sql-format-parser.h
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/timestamp-parse-util.cc
M tests/query_test/test_cast_with_format.py
9 files changed, 616 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/5
--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4761/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 09 Oct 2019 10:25:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 4:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@744
PS4, Line 744:  12-10''']
flake8: E101 indentation contains mixed spaces and tabs


http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@744
PS4, Line 744:  12-10''']
flake8: W191 indentation contains tabs


http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@744
PS4, Line 744:  12-10''']
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@745
PS4, Line 745: result = self.client.execute(r'''select 
cast(cast("1985-12-11" as date) as string '''
flake8: E101 indentation contains mixed spaces and tabs


http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@748
PS4, Line 748:  12-11''']
flake8: E101 indentation contains mixed spaces and tabs


http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@748
PS4, Line 748:  12-11''']
flake8: W191 indentation contains tabs


http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@748
PS4, Line 748:  12-11''']
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@749
PS4, Line 749: result = self.client.execute(r'''select 
cast(cast("1985-12-12" as timestamp) as '''
flake8: E101 indentation contains mixed spaces and tabs


http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@752
PS4, Line 752:  12-12''']
flake8: E101 indentation contains mixed spaces and tabs


http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@752
PS4, Line 752:  12-12''']
flake8: W191 indentation contains tabs


http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@752
PS4, Line 752:  12-12''']
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@753
PS4, Line 753: result = self.client.execute(r'''select 
cast(cast("1985-12-13" as timestamp) as '''
flake8: E101 indentation contains mixed spaces and tabs


http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@756
PS4, Line 756:  12-13''']
flake8: E101 indentation contains mixed spaces and tabs


http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@756
PS4, Line 756:  12-13''']
flake8: W191 indentation contains tabs


http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@756
PS4, Line 756:  12-13''']
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/14291/4/tests/query_test/test_cast_with_format.py@758
PS4, Line 758: # Escaped backslash in text token.
flake8: E101 indentation contains mixed spaces and tabs



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 09 Oct 2019 09:45:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-09 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py@724
PS3, Line 724: 'format '-"some \ text"MM-DD'
> I'm confused about the usage of backslash in this format string and the for
Well, being honest I got also confused by all this escaping and such :)
Anyway the first bullet point seems correct for me and in the second the output 
shouldn't contain a backslash.
Will do the change to work accordingly.


http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py@730
PS3, Line 730: s
> Probably the apostrophe can be removed for clarity. I assume the - separato
Done


http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py@731
PS3, Line 731: '-"\some text"MM-DD'
> Again, the backslash is used here to escape the s character right?
See above


http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py@735
PS3, Line 735: '1985-some text12-08'
> Please use a raw string here too, to make it explicit if the backslash mean
Done



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 09 Oct 2019 09:45:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-09 Thread Gabor Kaszab (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14291

to look at the new patch set (#4).

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..

IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

This patch adds additional datetime format tokens on top of what was
introduced with Milestone 1 (IMPALA-8703).

The tokens introduced:
- Free text token: Surrounded by double quotes, a free text section
  can be given in the format where the same text is expected in the
  input without the surrounding double quotes.
- FX modifier: This modifier has to be given at the beginning of the
  format and is valid for the whole format.
  In a string to datetime conversion this forces strict separator
  matching and expects all the tokens in the input to have the same
  length as their maximum length. E.g. A month has to be of length 2
  prefixed by zero if needed.
  This is the default in a datetime to string conversion.
- FM modifier: This modifier affects only the following token and
  overrides the FX modifier. Using this the value of a token can be
  shorter than the max length if followed by a separator.
  E.g. 1-digit month, less than 4-digit year, etc. This is the
  default behaviour in a string to datetime conversion.

Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
---
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-iso-sql-format-parser.h
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/timestamp-parse-util.cc
M tests/query_test/test_cast_with_format.py
9 files changed, 616 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/4
--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-03 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py@735
PS3, Line 735: '1985-\some text12-08
Please use a raw string here too, to make it explicit if the backslash means a 
literal backslash or an escape character.



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 03 Oct 2019 15:32:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-03 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py@724
PS3, Line 724: format '-"some \ text"MM-DD')
I'm confused about the usage of backslash in this format string and the format 
string in L727.

Is it used for a literal backslash character or for escaping the space 
character that follows it?

1. For instance a slightly modified select:

select cast("1985- some  text12-05" as date format '-"some \ text"MM-DD');

Returns : 1985-12-05
Which makes me think that the backslash in the format string escapes the space.

2. On the other hand the select in L726

select cast(cast("1985-12-06" as date) as string format '-"some \ 
text"MM-DD')

Returns: 1985-some \ text12-06
Which makes me think that the backslash in the format string is supposed to be 
taken literally.


http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py@730
PS3, Line 730: '
Probably the apostrophe can be removed for clarity. I assume the - separator in 
the format string matches -' in the input string.


http://gerrit.cloudera.org:8080/#/c/14291/3/tests/query_test/test_cast_with_format.py@731
PS3, Line 731: -"\some text"MM-DD')
Again, the backslash is used here to escape the s character right?

However the backslash in the format string in L714 is interpreted literally.

It could be interesting to try running these with Hive and check how it handles 
backslashes.



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 03 Oct 2019 15:15:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4686/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 01 Oct 2019 13:32:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-01 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14291/2/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/2/tests/query_test/test_cast_with_format.py@578
PS2, Line 578: test_text_token
> Thanks for spotting this. The code is not really prepared to handle backsla
Update: I did some adjustments to include backslash in the format and input. 
You can now do so and even include some special chars like '\n' or '\t'. Note, 
that it's still not feasible to go super wild with the backslashes and include 
multiple ones after each other.
I'm not sure if this is worth putting more efforts into because I don't see any 
real life scenarios where that would be needed. What do you think?



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 01 Oct 2019 12:58:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-10-01 Thread Gabor Kaszab (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14291

to look at the new patch set (#3).

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..

IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

This patch adds additional datetime format tokens on top of what was
introduced with Milestone 1 (IMPALA-8703).

The tokens introduced:
- Free text token: Surrounded by double quotes, a free text section
  can be given in the format where the same text is expected in the
  input without the surrounding double quotes.
- FX modifier: This modifier has to be given at the beginning of the
  format and is valid for the whole format.
  In a string to datetime conversion this forces strict separator
  matching and expects all the tokens in the input to have the same
  length as their maximum length. E.g. A month has to be of length 2
  prefixed by zero if needed.
  This is the default in a datetime to string conversion.
- FM modifier: This modifier affects only the following token and
  overrides the FX modifier. Using this the value of a token can be
  shorter than the max length if followed by a separator.
  E.g. 1-digit month, less than 4-digit year, etc. This is the
  default behaviour in a string to datetime conversion.

Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
---
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-iso-sql-format-parser.h
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/timestamp-parse-util.cc
M tests/query_test/test_cast_with_format.py
9 files changed, 534 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/3
--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-09-27 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14291/2/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/2/tests/query_test/test_cast_with_format.py@578
PS2, Line 578: test_text_token
> I'm confused about how a backslash character should be represented inside t
Thanks for spotting this. The code is not really prepared to handle backslashes 
inside the text token. What makes this complicated is that when parsing we 
don't know if the content of the text token is escaped or double escaped 
(latter can happen when the surrounding double quotes of the text token are 
escaped themselves.) so we won't know how many backslashes to skip.
I'll give this a second though, or if I can't figure out anything we might want 
to emphasize that backslashes are not supported only for escaping double quotes.


http://gerrit.cloudera.org:8080/#/c/14291/2/tests/query_test/test_cast_with_format.py@580
PS2, Line 580: '''
 : r'''
> You can probably remove these from the end of the L580 and the beginning of
It might work, but I wanted to not include a new line char in the query string. 
I don't see much benefit of removing these.



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 27 Sep 2019 11:36:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-09-26 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14291/2/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/2/tests/query_test/test_cast_with_format.py@578
PS2, Line 578: test_text_token
I'm confused about how a backslash character should be represented inside the 
text format token:

This works:

select cast('1985 \ 11-28' as date format '" \ "MM-DD');

and this works too:

select cast('1985 \ 11-28' as date format '" \\ "MM-DD');

Is this the expected behavior or a bug?


http://gerrit.cloudera.org:8080/#/c/14291/2/tests/query_test/test_cast_with_format.py@580
PS2, Line 580: '''
 : r'''
You can probably remove these from the end of the L580 and the beginning of 
L581, here and elsewhere in the method. These kind of strings in python may 
span multiple lines.

r'''select cast('1985-11-19T01:02:03Z' as timestamp
format '-MM-DD"T"HH24:MI:SS"Z"')'''

The \n character will become part of the string, but in this case it probably 
won't be a problem.



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 26 Sep 2019 17:41:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4648/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 26 Sep 2019 14:29:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-09-26 Thread Gabor Kaszab (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14291

to look at the new patch set (#2).

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..

IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

This patch adds additional datetime format tokens on top of what was
introduced with Milestone 1 (IMPALA-8703).

The tokens introduced:
- Free text token: Surrounded by double quotes, a free text section
  can be given in the format where the same text is expected in the
  input without the surrounding double quotes.
- FX modifier: This modifier has to be given at the beginning of the
  format and is valid for the whole format.
  In a string to datetime conversion this forces strict separator
  matching and expects all the tokens in the input to have the same
  length as their maximum length. E.g. A month has to be of length 2
  prefixed by zero if needed.
  This is the default in a datetime to string conversion.
- FM modifier: This modifier affects only the following token and
  overrides the FX modifier. Using this the value of a token can be
  shorter than the max length if followed by a separator.
  E.g. 1-digit month, less than 4-digit year, etc. This is the
  default behaviour in a string to datetime conversion.

Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
---
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-iso-sql-format-parser.h
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/timestamp-parse-util.cc
M tests/query_test/test_cast_with_format.py
9 files changed, 482 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/2
--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-09-26 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14291/1/be/src/runtime/date-parse-util.cc
File be/src/runtime/date-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/14291/1/be/src/runtime/date-parse-util.cc@158
PS1, Line 158:
> Last parameter 3 can be removed, MONTH_ARRAY[] elements are already 3 char
Done


http://gerrit.cloudera.org:8080/#/c/14291/1/be/src/runtime/date-parse-util.cc@172
PS1, Line 172: // Remove the escaping from the double quotes ins
> I think, this works only if the format string is between double quotes.  Wh
The first replace is used when the format is between double quotes, the text 
token is surrounded by escaped double quotes and there is a double escaped 
double quote inside.  E.g.: ... FORMAT "\"text1\\\"text2\"")

The replace below is used when the there is an escaped double quote inside the 
text token (not double-escaped).

Note that these replace calls are running only the content of the text token 
and not for the surrounding double or single quotes.

If the format is covered with single quotes than the forst replace doesn't have 
to run because the quote inside the text token won't be double-escaped.


http://gerrit.cloudera.org:8080/#/c/14291/1/be/src/runtime/date-parse-util.cc@172
PS1, Line 172:// Remove the escaping from the double quotes inside the 
text token.
 : // If the format is surrounded by double quot
> Please add some comments about these calls.
Done


http://gerrit.cloudera.org:8080/#/c/14291/1/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/1/tests/query_test/test_cast_with_format.py@595
PS1, Line 595: r'''select cast('1985-11-21' as timestamp '''
 : r'''format '""""-""MM""-""DD""')''')
> This and other strings would be easier to read if you used python's raw-str
Thanks for the comment, this is in fact way more readable with raw-strings.



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 26 Sep 2019 13:49:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-09-24 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 1:

(4 comments)

I took a quick look. I'll dive into it more tomorrow.

http://gerrit.cloudera.org:8080/#/c/14291/1/be/src/runtime/date-parse-util.cc
File be/src/runtime/date-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/14291/1/be/src/runtime/date-parse-util.cc@158
PS1, Line 158: , 3)
Last parameter 3 can be removed, MONTH_ARRAY[] elements are already 3 char 
length.


http://gerrit.cloudera.org:8080/#/c/14291/1/be/src/runtime/date-parse-util.cc@172
PS1, Line 172: boost::replace_all(text_token, "\\\"", "\"");
I think, this works only if the format string is between double quotes.  What 
happens if format string is between apostrophe characters?


http://gerrit.cloudera.org:8080/#/c/14291/1/be/src/runtime/date-parse-util.cc@172
PS1, Line 172:boost::replace_all(text_token, "\\\"", "\"");
 : boost::replace_all(text_token, "\\\"", "\"");
Please add some comments about these calls.


http://gerrit.cloudera.org:8080/#/c/14291/1/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/1/tests/query_test/test_cast_with_format.py@595
PS1, Line 595: "select cast('1985-11-21' as timestamp "
 : "format '\"\"\"\"-\"\"MM\"\"-\"\"DD\"\"')"
This and other strings would be easier to read if you used python's raw-string 
notation.

For instance, instead of:
"select cast('1985-11-21' as timestamp format 
'\"\"\"\"-\"\"MM\"\"-\"\"DD\"\"')"

you could write:
r'''select cast('1985-11-21' as timestamp format '""""-""MM""-""DD""')'''

no need to escape " ' \ characters inside raw strings.



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 24 Sep 2019 16:08:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-09-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4625/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 24 Sep 2019 13:43:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

2019-09-24 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/14291


Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
..

IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2

This patch adds additional datetime format tokens on top of what was
introduced with Milestone 1 (IMPALA-8703).

The tokens introduced:
- Free text token: Surrounded by double quotes, a free text section
  can be given in the format where the same text is expected in the
  input without the surrounding double quotes.
- FX modifier: This modifier has to be given at the beginning of the
  format and is valid for the whole format.
  In a string to datetime conversion this forces strict separator
  matching and expects all the tokens in the input to have the same
  length as their maximum length. E.g. A month has to be of length 2
  prefixed by zero if needed.
  This is the default in a datetime to string conversion.
- FM modifier: This modifier affects only the following token and
  overrides the FX modifier. Using this the value of a token can be
  shorter than the max length if followed by a separator.
  E.g. 1-digit month, less than 4-digit year, etc. This is the
  default behaviour in a string to datetime conversion.

Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
---
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-iso-sql-format-parser.h
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/timestamp-parse-util.cc
M tests/query_test/test_cast_with_format.py
9 files changed, 473 insertions(+), 40 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/14291/1
--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab