[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-10-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has abandoned this change. ( http://gerrit.cloudera.org:8080/8508 
)

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Abandoned

I'm going to mark this as abandoned for now so that it doesn't show up in open 
CR searches. Feel free to revive it if needed.
--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 10
Gerrit-Owner: Jinchul Kim 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Jinchul Kim 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-03-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@175
PS9, Line 175:   // quote due to the compatibility with Hive's behavior.
> Why do we display a single quote in the middle of the string if we have to 
> split "'foo''bar'" into 'foo' and 'bar'?

The reason this PS and the previous produced the correct result is that they 
split foo''bar into two tokens foo' and bar.

> Could you please explain why we cannot convert into a single token without 
> creating a new string and copying the data? If we need a pair of alloc and 
> memcpy for new tokens, this is not related to the number of broken tokens.

Suppose you want to take the input string foo''bar and convert the '' escape 
sequence to '. If you want to produce a single output token, you need it to be 
foo'bar, which which you can't represent as a pointer into the original string 
+ a length.



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 9
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 06 Mar 2018 01:43:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-03-04 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 9:

(1 comment)

Thanks for your comment.

http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@175
PS9, Line 175:   // quote due to the compatibility with Hive's behavior.
> I think we still need a comment like my above one, since we are still treat
I had thought "'foo''bar'" should be broken into two tokens as you mentioned 
until I realized foo'bar should be expected string(not foobar). Now I believe 
the escaped single quotes should be in a token for single quoted string.

Why do we display a single quote in the middle of the string if we have to 
split "'foo''bar'" into 'foo' and 'bar'?

Could you please explain why we cannot convert into a single token without 
creating a new string and copying the data? If we need a pair of alloc and 
memcpy for new tokens, this is not related to the number of broken tokens.



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 9
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 05 Mar 2018 00:45:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-03-02 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 10:

(1 comment)

Sorry for the slow turnaround time here - things got crazy for a while.

http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@175
PS9, Line 175:   // e.g. '''foo''bar''' => 'foo'bar'
> Let me provide a more intuitive algorithm to find escaped single quotes bet
I think we still need a comment like my above one, since we are still treating 
the last single quote found as the start of a new string.

To be honest I think the new approach less intuitive, because we still split 
the literal token up into multiple tokens if there are escaped quotes in the 
middle of the string. It seems like the approach only changes when dealing with 
quotes at the end of the string.

E.g. my understanding is that "'foo''bar'" is still broken into two tokens, 
"foo'" and "bar", whereas "foobar'''" is handled as a single token "foo'".

I think fundamentally we need to do this token-splitting because of the memory 
management - we can't convert "'foo''bar'" into a single token 'foo'bar' 
without creating a new string and copying the data. So my preference is to use 
the previous approach and explain it clearly.



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 10
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 03 Mar 2018 01:09:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-02-21 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 10:

(5 comments)

Thanks for your comments. I've revised the algorithm to find escaped single 
quotes. I think this is more intuitive and readable than previous version.

http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/exprs/expr-test.cc@6631
PS9, Line 6631:   // Test that pairs of single quotes are treated as an escape 
sequence.
> Can you add a comment, e.g.
Done


http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/exprs/expr-test.cc@6632
PS9, Line 6632:   TestStringValue(R"(from_unixtime(0, 'H\'foo\'\'bar\'H'))", 
"0foo'bar0");
> Can you also test the case when the escaped quote is at the end of the stri
Done


http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@86
PS9, Line 86:
> nit: remove extra line
Done


http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@175
PS9, Line 175:   // e.g. '''foo''bar''' => 'foo'bar'
> This is a little tricky and needs some explanation. I'd suggest something l
Let me provide a more intuitive algorithm to find escaped single quotes between 
single quotes.


http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@177
PS9, Line 177:  q
> nit: spaces around +, i.e. should be " + 1"
Done



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 10
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 22 Feb 2018 02:31:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-02-21 Thread Kim Jin Chul (Code Review)
Hello Gabor Kaszab, Attila Jeges, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8508

to look at the new patch set (#10).

Change subject: IMPALA-5237: Support a quoted string in date/time format
..

IMPALA-5237: Support a quoted string in date/time format

Impala does not represent a quoted string at date/time format.
For example, addtional quoted string between the patterns
(e.g. HH\'H\' => 11H). Hive supports this feature, so user wants
to get a same result from Impala. By the way, Impala returns
a different result as below.

* Hive
> select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\'');
08H 39M 24S

* Impala
> select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\'');
08'8' 39'4' 24'0'

The change affects the format pattern for
from_unixtime/from_timestamp/unix_timestamp.

In unix_timestamp, user can also specify a quoted string like this.

> select unix_timestamp('\'Epoch time: \'19700101',
> '\'Epoch time: \'MMdd');
0

By the way, the quoted strings between input and format should be
exactly same and internally string comparison is case-sensitive.

There is one intentional difference between Hive and Impala.
In Impala, the format string should have any date or time patten
as below. Throwing the error/warning is better if Impala cannot
optimize the expression. User must rewrite the query and don't pay
the function call.

* Hive
> select from_unixtime(0, '\'Hello world!\'');
Hello world!

* Impala
> select from_unixtime(0, '\'Hello world!\'');
Bad date/time conversion format: 'Hello world!'

Testing:
Add unit tests: ExprTest.QuotedStringForDateTimeFormat

Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 144 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8508/10
--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 10
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-02-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 9:

(5 comments)

Thank you for the fix. Just had a final round of comments to make it a bit 
easier to understand.

http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/exprs/expr-test.cc@6631
PS9, Line 6631:   TestStringValue(R"(from_unixtime(0, 'H\'foo\'\'bar\'H'))", 
"0foo'bar0");
Can you add a comment, e.g.

 // Test that pairs of single quotes are treated as an escape sequence.


http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/exprs/expr-test.cc@6632
PS9, Line 6632:   TestStringValue(R"(from_unixtime(0, 'H\'foo\'H\'bar\'H'))", 
"0foo0bar0");
Can you also test the case when the escaped quote is at the end of the string, 
e.g. 'foobar'''


http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@86
PS9, Line 86:
nit: remove extra line


http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@175
PS9, Line 175:   // quote due to the compatibility with Hive's behavior.
This is a little tricky and needs some explanation. I'd suggest something like:

"We treat the first of the pair of quotes as the last character of the current 
string literal and continue parsing from the second quote, treating it as the 
start of a new string literal. This avoids the need to copy the string literal."


http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@177
PS9, Line 177: +1
nit: spaces around +, i.e. should be " + 1"



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 9
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 20 Feb 2018 23:32:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-02-18 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 1:

(10 comments)

Thanks for the comments. I've applied the update.

http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/exprs/expr-test.cc@6622
PS8, Line 6622: int expected_var_begin, const map& 
expected_offsets) {
> I found a case where your implementation is inconsistent with hive. It look
Done. Thanks for finding the corner case. The case is added.


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.h@187
PS8, Line 187:   /// len -- length of the string to parse (must be > 0)
> This might change if you fix the inconsistency I mentioned.
As you mentioned in the corner case, I've fixed the inconsistency on the last 
patch set. See 
https://gerrit.cloudera.org/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc 
(#line 177)


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.h@188
PS8, Line 188:   /// dt_ctx -- date/time format context (must contain valid 
tokens)
> nit: use /// for function comments.
Done


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@150
PS8, Line 150: ParseFormatTokens
> I think this should be a method of DateTimeFormatContext instead of Timesta
Done


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@150
PS8, Line 150: ParseFormatTokens
> Maybe DateTimeFormatContext::AppendToken()
Done


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@155
PS8, Line 155: const
> Use DCHECK_GE macro to get better info on failure.
Done


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@156
PS8, Line 156: tr_end =
> You can use emplace_back(token_type, position_of_token, ...) instead.
Done


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@213
PS8, Line 213: while (curr_tok_chr < str_end) {
> Can you convert these other locations in this function to use your new help
Done


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@518
PS8, Line 518: boost::unordered_map::const_iterator 
iter =
> Use DCHECK_LE instead.
Done


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@519
PS8, Line 519: TH_INDE
> I think using memcmp() would be better, since strncmp() is meant for null-t
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 1
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 19 Feb 2018 04:38:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-02-18 Thread Kim Jin Chul (Code Review)
Hello Gabor Kaszab, Attila Jeges, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8508

to look at the new patch set (#9).

Change subject: IMPALA-5237: Support a quoted string in date/time format
..

IMPALA-5237: Support a quoted string in date/time format

Impala does not represent a quoted string at date/time format.
For example, addtional quoted string between the patterns
(e.g. HH\'H\' => 11H). Hive supports this feature, so user wants
to get a same result from Impala. By the way, Impala returns
a different result as below.

* Hive
> select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\'');
08H 39M 24S

* Impala
> select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\'');
08'8' 39'4' 24'0'

The change affects the format pattern for
from_unixtime/from_timestamp/unix_timestamp.

In unix_timestamp, user can also specify a quoted string like this.

> select unix_timestamp('\'Epoch time: \'19700101',
> '\'Epoch time: \'MMdd');
0

By the way, the quoted strings between input and format should be
exactly same and internally string comparison is case-sensitive.

There is one intentional difference between Hive and Impala.
In Impala, the format string should have any date or time patten
as below. Throwing the error/warning is better if Impala cannot
optimize the expression. User must rewrite the query and don't pay
the function call.

* Hive
> select from_unixtime(0, '\'Hello world!\'');
Hello world!

* Impala
> select from_unixtime(0, '\'Hello world!\'');
Bad date/time conversion format: 'Hello world!'

Testing:
Add unit tests: ExprTest.QuotedStringForDateTimeFormat

Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 135 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8508/9
--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 9
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-02-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@519
PS8, Line 519: strncmp
I think using memcmp() would be better, since strncmp() is meant for 
null-terminated strings, which ours arent.



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 8
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 08 Feb 2018 01:03:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-02-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.h@187
PS8, Line 187:   /// Finding the closing quote and getting the string between 
the quotes.
> Can you comment that there is no way to escape single quotes?
This might change if you fix the inconsistency I mentioned.



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 8
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 08 Feb 2018 01:01:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-02-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 8:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/exprs/expr-test.cc@6622
PS8, Line 6622: TEST_F(ExprTest, QuotedStringForDateTimeFormat) {
I found a case where your implementation is inconsistent with hive. It looks 
like hive treats '' inside quotes as an escape sequence for '.

  hive> select from_unixtime(1492677561, "HH'foo\'\'bar\'");
  OK
  01foo'bar


  [localhost:21000] > select from_unixtime(1492677561, "HH'foo\'\'bar\'");
  Query: select from_unixtime(1492677561, "HH'foo\'\'bar\'")
  Query submitted at: 2018-02-07 16:58:04 (Coordinator: 
http://tarmstrong-box:25000)
  Query progress can be monitored at: 
http://tarmstrong-box:25000/query_plan?query_id=aa4422fefbd02bcc:1f7de5ac
  +---+
  | from_unixtime(1492677561, 'hh\'foo\'\'bar\'') |
  +---+
  | 08foobar  |
  +---+


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.h@187
PS8, Line 187:   /// Finding the closing quote and getting the string between 
the quotes.
Can you comment that there is no way to escape single quotes?


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.h@188
PS8, Line 188:   //  str -- start pointer of the opening quote when the 
function is called. Once the
nit: use /// for function comments.


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@150
PS8, Line 150: SaveDateTimeToken
Maybe DateTimeFormatContext::AppendToken()


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@150
PS8, Line 150: SaveDateTimeToken
I think this should be a method of DateTimeFormatContext instead of 
TimestampParser.


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@155
PS8, Line 155: DCHECK
Use DCHECK_GE macro to get better info on failure.


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@156
PS8, Line 156: push_back
You can use emplace_back(token_type, position_of_token, ...) instead.


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@213
PS8, Line 213: dt_ctx->toks.push_back(DateTimeFormatToken(TZ_OFFSET, 
str - str_begin,
Can you convert these other locations in this function to use your new helper 
function? We should be consistent...


http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@518
PS8, Line 518:   DCHECK((tok.pos + shift_len + tok.len) <= str_len);
Use DCHECK_LE instead.



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 8
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 08 Feb 2018 00:59:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-02-02 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 8: Code-Review+1

Thanks for taking care of this. For me it's fine.


--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 8
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 02 Feb 2018 10:09:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-01-30 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 8:

(5 comments)

Applied the update.

http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.h@189
PS7, Line 189:   // function finishes str pointer is changed to point 
to the closing quote.
> This sentence is not needed in my opinion.
Done


http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.h@190
PS7, Line 190:   //  position_of_string_literal -- start position of the string 
literal.
> str is assumed to point to...
Done


http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.h@192
PS7, Line 192:   //  string_literal_begin -- start pointer of the string 
literal if there is a pair of
> some separator between the param name and it's description would be great.
Done


http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.cc@161
PS7, Line 161: const char** str, int* position_of_string_literal, int* 
length_of_string_literal,
> One thing I learned recently, that in Impala the out parameters are preferr
You're right. Actually, Tim already answered the similar question on this 
change: https://gerrit.cloudera.org/#/c/8770/2/be/src/exec/data-sink.cc


http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.cc@203
PS7, Line 203:   // str is to point to the closing quote and the 
incrementing points to the first
> It might worth a comment here that incrementing this would result in the st
Done



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 8
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 30 Jan 2018 09:19:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-01-30 Thread Kim Jin Chul (Code Review)
Hello Gabor Kaszab, Attila Jeges, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8508

to look at the new patch set (#8).

Change subject: IMPALA-5237: Support a quoted string in date/time format
..

IMPALA-5237: Support a quoted string in date/time format

Impala does not represent a quoted string at date/time format.
For example, addtional quoted string between the patterns
(e.g. HH\'H\' => 11H). Hive supports this feature, so user wants
to get a same result from Impala. By the way, Impala returns
a different result as below.

* Hive
> select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\'');
08H 39M 24S

* Impala
> select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\'');
08'8' 39'4' 24'0'

The change affects the format pattern for
from_unixtime/from_timestamp/unix_timestamp.

In unix_timestamp, user can also specify a quoted string like this.

> select unix_timestamp('\'Epoch time: \'19700101',
> '\'Epoch time: \'MMdd');
0

By the way, the quoted strings between input and format should be
exactly same and internally string comparison is case-sensitive.

There is one intentional difference between Hive and Impala.
In Impala, the format string should have any date or time patten
as below. Throwing the error/warning is better if Impala cannot
optimize the expression. User must rewrite the query and don't pay
the function call.

* Hive
> select from_unixtime(0, '\'Hello world!\'');
Hello world!

* Impala
> select from_unixtime(0, '\'Hello world!\'');
Bad date/time conversion format: 'Hello world!'

Testing:
Add unit tests: ExprTest.QuotedStringForDateTimeFormat

Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 128 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8508/8
--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 8
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-01-26 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 7:

(5 comments)

Thanks for getting back to this review! :)

One general comment: Try to avoid sending code rebase and new changes in the 
same patch set as it's makes reviewing the changes between the current and the 
previous patch set more difficult.

http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.h@189
PS7, Line 189:   //  This function returns three kinds of values via the output 
parameters.
This sentence is not needed in my opinion.


http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.h@190
PS7, Line 190:   //  str is to point to the opening quote when the function is 
called. Once the function
str is assumed to point to...
This seems more natural to me. What do you think?


http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.h@192
PS7, Line 192:   //  position_of_string_literal is start position of the string 
literal.
some separator between the param name and it's description would be great. e.g. 
check the function comments below.


http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.cc@161
PS7, Line 161: const char*& str, int& position_of_string_literal, int& 
length_of_string_literal,
One thing I learned recently, that in Impala the out parameters are preferred 
as a pointer instead of references.

Tim, can you confirm?


http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.cc@203
PS7, Line 203:   ++str;
It might worth a comment here that incrementing this would result in the str 
pointing to the first char after the closing quote. (I know it's mentioned in 
the comment for GetStringLiteralBetweenQuotes but it might increase readability 
repeating here.)



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 7
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jan 2018 14:22:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-01-25 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 7:

(5 comments)

Applied the update.

@Gabor, I am sorry to make you review it again after a long time. Now we can 
deliver some test cases for the mix of single and double quotes as you advised!

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h@190
PS6, Line 190:   //  str is to point to the opening quote when the function is 
called. Once the function
> Let me reflect it on the next patch set.
Done


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@174
PS6, Line 174:   length_of_string_literal = str - quoted_string_begin - 1;
> Right, it should be a redundant. Let me remove this on the next patch set.
Done


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@176
PS6, Line 176:   // e.g. from_unixtime(0, '\'\'') => '
> Duplicate. Let me remove this on the next patch set.
Done


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@177
PS6, Line 177:   if (length_of_string_literal == 0) length_of_string_literal = 
1;
> Okay, let me reflect this on the next patch set.
Done


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@178
PS6, Line 178:   position_of_string_literal = str - str_begin - 
length_of_string_literal;
> Okay, I accept your approach.
Done



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 7
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jan 2018 06:09:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 6:

Sorry for the slow response. If I understand correctly, you can't add the test 
case because of IMPALA-3942. It looks like you have a fix for IMPALA-3942 so I 
think we should finish the review on that and merge it before this patch.


--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 6
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 21:38:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2017-12-12 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 6:

AFAIK with Impala we try to keep one code change in one patch. It happens to 
split it to different parts but usually it is to reduce the size of review, and 
the split happens when there are clear boundary between the sub-functionalities 
within the change.
My opinion is to hold on with this until the blocker is submitted.

Tim, what do you think?


--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 6
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 12 Dec 2017 10:30:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2017-12-11 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 6: Code-Review-1

(2 comments)

Currently I count not add the test case as you mentioned at the following 
change due to the issue at IMPALA-3942:
https://gerrit.cloudera.org/#/c/8508/6/be/src/exprs/expr-test.cc

We have two options. Which is better for you? Or any other option? Thanks.
1. Deliver a complete fix. Resolve IMPALA-3942 and then deliver this fix with a 
test case for mixing single and double quotes.
2. Deliver a first part of the fix which does not include the test case. 
Resolve IMPALA-3942 and then deliver a second part of the fix which includes 
the test case.

I prefer the second one because we can deliver fixes incrementally. There will 
be some time for discussion and providing a solution to resolve blocker.

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h@190
PS6, Line 190:   //  nullptr if input does not have unclosed quote
> My suggestion would be:
Let me reflect it on the next patch set.


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@178
PS6, Line 178:   return quoted_string_begin + 1;
> Yes I do.
Okay, I accept your approach.



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 6
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 12 Dec 2017 03:31:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2017-12-11 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h@190
PS6, Line 190:   //  nullptr if input does not have unclosed quote
> I see. Your point is that the pointer can be changeable in the function. Wh
My suggestion would be:
"The expectation for str is to point to the opening quote when the function is 
called. Once the function finishes str pointer is changed to point to the 
closing quote."


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@178
PS6, Line 178:   return quoted_string_begin + 1;
> Do you think the function name(i.e. Get...) is still valid even though retu
Yes I do.



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 6
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 11 Dec 2017 13:29:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2017-12-11 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 6:

(6 comments)

Thanks for your answers.

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h@190
PS6, Line 190:   //  nullptr if input does not have unclosed quote
> Before you call this function the str parameter points to the opening quote
I see. Your point is that the pointer can be changeable in the function. What 
do you think about if I will add a comment like this? Please feel free to 
modify the comment or suggest one.

The pointer of the str can be changeable because of advancing the str parameter 
to the closing quote.


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@161
PS6, Line 161: const char*& str, int& position_of_string_literal, int& 
length_of_string_literal) {
> Sure, thx for the explanation. I'm not sure we should prepare a function fo
Thanks for your understanding.


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@174
PS6, Line 174:   DCHECK(length_of_string_literal >= 0);
> Still I feel that making a CHECK on the same thing twice is an overkill. We
Right, it should be a redundant. Let me remove this on the next patch set.


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@176
PS6, Line 176:   DCHECK(position_of_string_literal >= 0);
Duplicate. Let me remove this on the next patch set.


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@178
PS6, Line 178:   return quoted_string_begin + 1;
> In fact you have 3 different values returned from this function. 2 of them
Do you think the function name(i.e. Get...) is still valid even though return 
value will move to an output parameter? I mean logically the function returns 
three kinds of outputs, but I guess a reader may think the function name looks 
weird because it should be void function.


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@198
PS6, Line 198:   if (UNLIKELY(!string_literal_begin)) return false;
> Could you explain me why we have DCHECKs for position_of_string_literal, an
As I mentioned at line#174 and line#176, they are not necessary. The checks at 
SaveDateTimeToken are enough. Let me remove them.



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 6
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 11 Dec 2017 10:45:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2017-12-11 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 6:

(5 comments)

Thanks for the explanation!

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h@190
PS6, Line 190:   //  nullptr if input does not have unclosed quote
> Sorry, I could not catch what you meant. Would you please explain it with a
Before you call this function the str parameter points to the opening quote. 
When the function returns it points to the closing quote. I think you should 
mention this side effect (of moving the str pointer to the closing quote) in 
the function's comment.


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@161
PS6, Line 161: const char*& str, int& position_of_string_literal, int& 
length_of_string_literal) {
> I think SaveDateTimeToken is not suitable to calculate the place of the pos
Sure, thx for the explanation. I'm not sure we should prepare a function for 
general use even though currently we want to use to only for this purpose. But 
I accept your point.


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@174
PS6, Line 174:   DCHECK(length_of_string_literal >= 0);
Still I feel that making a CHECK on the same thing twice is an overkill. We can 
be sure, that SaveDateTimeToken is called right after this function so I think 
that making the DCHECKs there is sufficient.


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@178
PS6, Line 178:   return quoted_string_begin + 1;
In fact you have 3 different values returned from this function. 2 of them are 
output parameters and one is a return value. I think in this case making all 3 
of them as an output parameter would make sense.


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@198
PS6, Line 198:   if (UNLIKELY(!string_literal_begin)) return false;
Could you explain me why we have DCHECKs for position_of_string_literal, and 
length_of_string_literal while we make an if-check on string_literal_begin?

I think that could also be a DCHECK and can be done inside SaveSateTimeToken.



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 6
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 11 Dec 2017 10:19:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2017-12-07 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 6:

(7 comments)

Thanks, for answering/incorporating my comments!

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/exprs/expr-test.cc@5980
PS6, Line 5980:   TestStringValue("from_unixtime(1492677561, 'HH\\'H\\'')", 
"08H");
I checked in Hive and it kind of handles nested quotes like:

select from_unixtime(1492677561, "HH'H\"H\"' mm'm' ss's'");
01H"H" 39m 21s

Would it be possible to add a testcase to cover this?


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@151
PS6, Line 151: const DateTimeFormatTokenType token_type, const int 
position_of_token,
SEPARATOR as token_type is not necessary as a param.

dt_ctx, token_begin, token_length are sufficient as parameters as 
position_of_token could be calculated inside the function based on these, right?


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@153
PS6, Line 153:   DCHECK(token_begin != nullptr);
2 of these DCHECKs are already done in the other function. I think it's enough 
to do the only once, e.g. here.


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@161
PS6, Line 161: const char*& str, int& position_of_string_literal, int& 
length_of_string_literal) {
Thanks for making these adjustments!
Few more comments:

I think this function should provide a pointer to the start of the string 
literal, and the length of the str (and as a side effect advancing the str 
pointer to the closing quote, that fact should be documented in the header 
file).

SaveDateTimeToken seems a better place to calculate the 
position_of_string_literal.

What do you think?


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@163
PS6, Line 163:   const char* str_end = str_begin + dt_ctx->fmt_len;
Can't we pass the str_end pointer to this function? Then we could get rid of 
dt_ctx param (I think str_begin is not needed in this function).


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@177
PS6, Line 177:   ++str;
At the end of the excution of this function I feel that it's a better approach 
to have the str pointing to the finishing quote.
Before making a 'continue' it can be increased to the next char.

My reasoning would be, that a function to get a string literal between quotes 
shouldn't be aware whether there are more characters after the finishing quote 
or not. It should only care about the chars between the quotes.


http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@199
PS6, Line 199:   SaveDateTimeToken(dt_ctx, SEPARATOR, 
position_of_string_literal,
No need to pass the separator as a parameter, it is accessible inside the 
function too.



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 6
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 07 Dec 2017 09:42:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2017-12-06 Thread Kim Jin Chul (Code Review)
Hello Gabor Kaszab, Attila Jeges, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8508

to look at the new patch set (#6).

Change subject: IMPALA-5237: Support a quoted string in date/time format
..

IMPALA-5237: Support a quoted string in date/time format

Impala does not represent a quoted string at date/time format.
For example, addtional quoted string between the patterns
(e.g. HH\'H\' => 11H). Hive supports this feature, so user wants
to get a same result from Impala. By the way, Impala returns
a different result as below.

* Hive
> select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\'');
08H 39M 24S

* Impala
> select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\'');
08'8' 39'4' 24'0'

The change affects the format pattern for
from_unixtime/from_timestamp/unix_timestamp.

In unix_timestamp, user can also specify a quoted string like this.

> select unix_timestamp('\'Epoch time: \'19700101',
> '\'Epoch time: \'MMdd');
0

By the way, the quoted strings between input and format should be
exactly same and internally string comparison is case-sensitive.

There is one intentional difference between Hive and Impala.
In Impala, the format string should have any date or time patten
as below. Throwing the error/warning is better if Impala cannot
optimize the expression. User must rewrite the query and don't pay
the function call.

* Hive
> select from_unixtime(0, '\'Hello world!\'');
Hello world!

* Impala
> select from_unixtime(0, '\'Hello world!\'');
Bad date/time conversion format: 'Hello world!'

Testing:
Add unit tests for working and nonworking cases

Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 98 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8508/6
--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 6
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2017-12-06 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h@183
PS4, Line 183:   /// dt_ctx -- date/time format context
> Actually I don't want leave meaningless comments. I think the parameter nam
I meant "the parameter names" are in new code in the recent patch set. Please 
feel free to tell me if you have a better name or more informative comment.



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 4
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Dec 2017 11:44:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2017-12-06 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5746
PS3, Line 5746:   TestValue("unix_timestamp('\\'Epoch time: \\'1970|01|01 
00|00|00',\
> Thanks for the explanation!
I guess your question is that the result seems to be wrong even though the 
format is find and everything is okey to run.  Timezone can affect epoch 
timestamp. Impala interprets date and time with UTC timezone by default due to 
avoidance of undesired results by local timezone.
- Regarding default timezone: 
https://github.com/apache/impala/blob/master/docs/topics/impala_timestamp.xml#L89
- Regarding data and time interpretation at unix_timestamp: 
https://github.com/apache/impala/blob/master/docs/topics/impala_datetime_functions.xml#L2463
- Test case for epoch: 
https://github.com/apache/impala/blob/master/be/src/exprs/expr-test.cc#L5617

Therefore, I think the result should be zero.


http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h@182
PS4, Line 182: Reset(fmt, fmt_len);
> 1: This comment doesn't add  any further info as it is basically the functi
Thanks for your kind comment.
1,2: I agree. Actually I thought I should have to add the meaningless comment, 
but I didn't imagine more meaningful comment and I think the function name 
looks self-descriptive.
3. More detailed explanation is added to return description instead of the word 
"parse".
4. I agree. Reader expects a string literal due to "Get", so the revised 
function will return a pointer of string literal. The function is split into 
two parts: GetStringLiteralBetweenQuotes and SaveDateTimeToken. I think 
SaveDateTimeToken will be used generally when refactoring ParseFormatTokens or 
adding a new code.

I fully agree an unit function should have a minimal purpose and responsibility.


http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h@183
PS4, Line 183:   }
> Again, this is simply writing the type of the parameter with spaces.
Actually I don't want leave meaningless comments. I think the parameter names 
are self-explainable, but please leave a comment as a new reader if you think 
any additional information should be helpful.



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 3
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Dec 2017 11:41:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2017-12-06 Thread Kim Jin Chul (Code Review)
Hello Gabor Kaszab, Attila Jeges, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8508

to look at the new patch set (#5).

Change subject: IMPALA-5237: Support a quoted string in date/time format
..

IMPALA-5237: Support a quoted string in date/time format

Impala does not represent a quoted string at date/time format.
For example, addtional quoted string between the patterns
(e.g. HH\'H\' => 11H). Hive supports this feature, so user wants
to get a same result from Impala. By the way, Impala returns
a different result as below.

* Hive
> select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\'');
08H 39M 24S

* Impala
> select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\'');
08'8' 39'4' 24'0'

The change affects the format pattern for
from_unixtime/from_timestamp/unix_timestamp.

In unix_timestamp, user can also specify a quoted string like this.

> select unix_timestamp('\'Epoch time: \'19700101',
> '\'Epoch time: \'MMdd');
0

By the way, the quoted strings between input and format should be
exactly same and internally string comparison is case-sensitive.

There is one intentional difference between Hive and Impala.
In Impala, the format string should have any date or time patten
as below. Throwing the error/warning is better if Impala cannot
optimize the expression. User must rewrite the query and don't pay
the function call.

* Hive
> select from_unixtime(0, '\'Hello world!\'');
Hello world!

* Impala
> select from_unixtime(0, '\'Hello world!\'');
Bad date/time conversion format: 'Hello world!'

Testing:
Add unit tests for working and nonworking cases

Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 98 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8508/5
--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 5
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2017-12-06 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5746
PS3, Line 5746:   TestIsNull("months_between('23:33:45','15:33:45')", 
TYPE_DOUBLE);
> I guess you may expect "Epoch time: 1970|01|01 00|00|00" as a result, unix_
Thanks for the explanation!

One more question: If we decide not to return Null here, then wouldn't it make 
sense to return the value as if that "Epoch time: " something wasn't there?

I have a feeling that we should either return Null, or return the value 
indicated by the given parameters, so in this case I think this query should 
return the same as the following:
unix_timestamp("1970|01|01 00|00|00","|MM|dd HH|mm|ss");
28800

Could you please share your thoughts?


http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5764
PS3, Line 5764:   TestIsNull("dayofweek(cast('09:10:11.00' as timestamp))", 
TYPE_INT);
> As I explained above, I think returning NULL for the mismatched case is eno
Thanks again for the explanation!

Using TestIsNull here sounds fine.


http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h@182
PS4, Line 182:   /// Get string literal between quotes
1: This comment doesn't add  any further info as it is basically the function 
name with spaces.

2: The comment should tell the reader if there is some precondition for this 
function. In this case I think it would be beneficial to mention that we assume 
that str is currently pointing to the opening quote.

3: For me it would also make sense to say explicitly what successful and 
unsuccessful parse mean. e.g. parsing is unsuccessful if there is no closing 
quote.

4: The 'Get' in the function name indicates that when we call this function it 
either returns the found string literal as a return value or through an out 
parameter. I think this is a bit misleading as in fact this function is for 2 
different things (but not for returning a string literal):
  - Finding the closing quote and getting the string between the 2 quotes.
  - Adding a DateTimeFormatToken to dt_ctx.

My preference here would be to introduce functions with a single area of 
responsibility: one for finding the string between the quotes and one for 
inserting the FormatToken to dt_ctx.

What do you think?


http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h@183
PS4, Line 183:   /// dt_ctx -- date/time format context
Again, this is simply writing the type of the parameter with spaces.



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 4
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Dec 2017 09:42:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2017-12-05 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5746
PS3, Line 5746:   TestValue("unix_timestamp('\\'Epoch time: \\'1970|01|01 
00|00|00',\
> I checked and Hive in fact returns NULL for this input.
I guess you may expect "Epoch time: 1970|01|01 00|00|00" as a result, 
unix_timestamp is not suitable because it returns bigint. I think two kinds of 
results are only available: zero or NULL

As you mentioned, Hive returns NULL because the quoted string might be 
recognized as an unexpected format string.

The string literal in the quotes is meaningless in unix_timestamp function, but 
I think an user wants to add some description and it should not affect 
returning a result. The flexibility might be helpful to an user even though a 
difference happens between Impala and Hive. Therefore, the result should be 
zero in Impala.


http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5756
PS3, Line 5756:   // 2) Not allowd unclosed quoted string
> nit: typo: allowed
Done


http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5764
PS3, Line 5764:   TestValue("unix_timestamp('\\'Epoch time: \\'1970|01|01 
00|00|00',\
> There is one more thing I don't understand: Regardless of giving this forma
As I explained above, I think returning NULL for the mismatched case is enough. 
I would like to prefer NULL instead of throwing an error.

There are two issues.
1) The expected result is NULL because the input does not fit to the 
format(e.g. Epoch and Enoch). I think returning NULL is enough for happening a 
mismatch internally.

2) I should have to use TestIsNull instead of TestValue. It's my mistake when I 
copied from the above. Let me fix this. By the way, there is a different 
problem in our test framework. I guess the following test should fail: 
TestValue(cast(null as int), TYPE_INT, 0)
As you know, NULL cannot be compared with any value because it is unknown, so 
it should not be accept. I filed the issue on IMPALA-6283.


http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.h@77
PS3, Line 77: /// IMPALA-5237: Support a quoted string in date/time format
> I have the feeling that copying the whole commit message here is a bit of a
Done


http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.cc@163
PS3, Line 163: if (*str == '\'') {
> I would find this part of code a bit more self-descriptive if it was in the
Done


http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.cc@173
PS3, Line 173:   if (len == 0) len = 1;
> The naming of len, val and pos aren't that self-explanatory for me. Would i
Done



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 3
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Dec 2017 07:56:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2017-12-05 Thread Kim Jin Chul (Code Review)
Hello Gabor Kaszab, Attila Jeges, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8508

to look at the new patch set (#4).

Change subject: IMPALA-5237: Support a quoted string in date/time format
..

IMPALA-5237: Support a quoted string in date/time format

Impala does not represent a quoted string at date/time format.
For example, addtional quoted string between the patterns
(e.g. HH\'H\' => 11H). Hive supports this feature, so user wants
to get a same result from Impala. By the way, Impala returns
a different result as below.

* Hive
> select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\'');
08H 39M 24S

* Impala
> select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\'');
08'8' 39'4' 24'0'

The change affects the format pattern for
from_unixtime/from_timestamp/unix_timestamp.

In unix_timestamp, user can also specify a quoted string like this.

> select unix_timestamp('\'Epoch time: \'19700101',
> '\'Epoch time: \'MMdd');
0

By the way, the quoted strings between input and format should be
exactly same and internally string comparison is case-sensitive.

There is one intentional difference between Hive and Impala.
In Impala, the format string should have any date or time patten
as below. Throwing the error/warning is better if Impala cannot
optimize the expression. User must rewrite the query and don't pay
the function call.

* Hive
> select from_unixtime(0, '\'Hello world!\'');
Hello world!

* Impala
> select from_unixtime(0, '\'Hello world!\'');
Bad date/time conversion format: 'Hello world!'

Testing:
Add unit tests for working and nonworking cases

Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 81 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8508/4
--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 4
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2017-12-05 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5746
PS3, Line 5746:   TestValue("unix_timestamp('\\'Epoch time: \\'1970|01|01 
00|00|00',\
I checked and Hive in fact returns NULL for this input.

To be honest I was expecting a different result containing the given string 
plus the time in the desired format. (instead of NULL or zero)
e.g. in this case:
"Epoch time: 1970|01|01 00|00|00"

Am I missing something? Shouldn't this be the expected result?


http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5756
PS3, Line 5756:   // 2) Not allowd unclosed quoted string
nit: typo: allowed


http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5764
PS3, Line 5764:   TestValue("unix_timestamp('\\'Epoch time: \\'1970|01|01 
00|00|00',\
There is one more thing I don't understand: Regardless of giving this format 
correctly or with a mismatch with the string literals we still expect the query 
to return the same result (bigint zero) without any of them returning an error.

Shouldn't this one return an error?


http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.h@77
PS3, Line 77: /// IMPALA-5237: Support a quoted string in date/time format
I have the feeling that copying the whole commit message here is a bit of an 
overkill.

Would it be possible to wrap this up and have a few lines of description about 
the new functionality.
In addition I think it's not necessary to mention here how Hive handles these 
cases. We should focus here on how Impala handles them.


http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.cc@163
PS3, Line 163: if (*str == '\'') {
I would find this part of code a bit more self-descriptive if it was in the 
form of a series of function calls. It might be just myself but for me figuring 
out the intent behind these pointer manipulations is something that takes some 
time.

This is what I mean:
if(*str == '\'') {
  GetClosingQuote(str);
  /* +some error handling if closing not found */
  SaveTokenWhaterver(string_literal, str);
  /* Or whatever parameter is essential to this function in order to calculate 
len, pos, and val*/
}

What is your opinion?


http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.cc@173
PS3, Line 173:   if (len == 0) len = 1;
The naming of len, val and pos aren't that self-explanatory for me. Would it be 
possible to rename them for something more informative?



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 3
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 05 Dec 2017 14:52:43 +
Gerrit-HasComments: Yes