[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Tim Armstrong has abandoned this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Abandoned I'm going to mark this as abandoned for now so that it doesn't show up in open CR searches. Feel free to revive it if needed. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 10 Gerrit-Owner: Jinchul Kim Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Jinchul Kim Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@175 PS9, Line 175: // quote due to the compatibility with Hive's behavior. > Why do we display a single quote in the middle of the string if we have to > split "'foo''bar'" into 'foo' and 'bar'? The reason this PS and the previous produced the correct result is that they split foo''bar into two tokens foo' and bar. > Could you please explain why we cannot convert into a single token without > creating a new string and copying the data? If we need a pair of alloc and > memcpy for new tokens, this is not related to the number of broken tokens. Suppose you want to take the input string foo''bar and convert the '' escape sequence to '. If you want to produce a single output token, you need it to be foo'bar, which which you can't represent as a pointer into the original string + a length. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 9 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 06 Mar 2018 01:43:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 9: (1 comment) Thanks for your comment. http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@175 PS9, Line 175: // quote due to the compatibility with Hive's behavior. > I think we still need a comment like my above one, since we are still treat I had thought "'foo''bar'" should be broken into two tokens as you mentioned until I realized foo'bar should be expected string(not foobar). Now I believe the escaped single quotes should be in a token for single quoted string. Why do we display a single quote in the middle of the string if we have to split "'foo''bar'" into 'foo' and 'bar'? Could you please explain why we cannot convert into a single token without creating a new string and copying the data? If we need a pair of alloc and memcpy for new tokens, this is not related to the number of broken tokens. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 9 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 05 Mar 2018 00:45:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 10: (1 comment) Sorry for the slow turnaround time here - things got crazy for a while. http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@175 PS9, Line 175: // e.g. '''foo''bar''' => 'foo'bar' > Let me provide a more intuitive algorithm to find escaped single quotes bet I think we still need a comment like my above one, since we are still treating the last single quote found as the start of a new string. To be honest I think the new approach less intuitive, because we still split the literal token up into multiple tokens if there are escaped quotes in the middle of the string. It seems like the approach only changes when dealing with quotes at the end of the string. E.g. my understanding is that "'foo''bar'" is still broken into two tokens, "foo'" and "bar", whereas "foobar'''" is handled as a single token "foo'". I think fundamentally we need to do this token-splitting because of the memory management - we can't convert "'foo''bar'" into a single token 'foo'bar' without creating a new string and copying the data. So my preference is to use the previous approach and explain it clearly. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 10 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 03 Mar 2018 01:09:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 10: (5 comments) Thanks for your comments. I've revised the algorithm to find escaped single quotes. I think this is more intuitive and readable than previous version. http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/exprs/expr-test.cc@6631 PS9, Line 6631: // Test that pairs of single quotes are treated as an escape sequence. > Can you add a comment, e.g. Done http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/exprs/expr-test.cc@6632 PS9, Line 6632: TestStringValue(R"(from_unixtime(0, 'H\'foo\'\'bar\'H'))", "0foo'bar0"); > Can you also test the case when the escaped quote is at the end of the stri Done http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@86 PS9, Line 86: > nit: remove extra line Done http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@175 PS9, Line 175: // e.g. '''foo''bar''' => 'foo'bar' > This is a little tricky and needs some explanation. I'd suggest something l Let me provide a more intuitive algorithm to find escaped single quotes between single quotes. http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@177 PS9, Line 177: q > nit: spaces around +, i.e. should be " + 1" Done -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 10 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 22 Feb 2018 02:31:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Hello Gabor Kaszab, Attila Jeges, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8508 to look at the new patch set (#10). Change subject: IMPALA-5237: Support a quoted string in date/time format .. IMPALA-5237: Support a quoted string in date/time format Impala does not represent a quoted string at date/time format. For example, addtional quoted string between the patterns (e.g. HH\'H\' => 11H). Hive supports this feature, so user wants to get a same result from Impala. By the way, Impala returns a different result as below. * Hive > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08H 39M 24S * Impala > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08'8' 39'4' 24'0' The change affects the format pattern for from_unixtime/from_timestamp/unix_timestamp. In unix_timestamp, user can also specify a quoted string like this. > select unix_timestamp('\'Epoch time: \'19700101', > '\'Epoch time: \'MMdd'); 0 By the way, the quoted strings between input and format should be exactly same and internally string comparison is case-sensitive. There is one intentional difference between Hive and Impala. In Impala, the format string should have any date or time patten as below. Throwing the error/warning is better if Impala cannot optimize the expression. User must rewrite the query and don't pay the function call. * Hive > select from_unixtime(0, '\'Hello world!\''); Hello world! * Impala > select from_unixtime(0, '\'Hello world!\''); Bad date/time conversion format: 'Hello world!' Testing: Add unit tests: ExprTest.QuotedStringForDateTimeFormat Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 144 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8508/10 -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 10 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 9: (5 comments) Thank you for the fix. Just had a final round of comments to make it a bit easier to understand. http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/exprs/expr-test.cc@6631 PS9, Line 6631: TestStringValue(R"(from_unixtime(0, 'H\'foo\'\'bar\'H'))", "0foo'bar0"); Can you add a comment, e.g. // Test that pairs of single quotes are treated as an escape sequence. http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/exprs/expr-test.cc@6632 PS9, Line 6632: TestStringValue(R"(from_unixtime(0, 'H\'foo\'H\'bar\'H'))", "0foo0bar0"); Can you also test the case when the escaped quote is at the end of the string, e.g. 'foobar''' http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@86 PS9, Line 86: nit: remove extra line http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@175 PS9, Line 175: // quote due to the compatibility with Hive's behavior. This is a little tricky and needs some explanation. I'd suggest something like: "We treat the first of the pair of quotes as the last character of the current string literal and continue parsing from the second quote, treating it as the start of a new string literal. This avoids the need to copy the string literal." http://gerrit.cloudera.org:8080/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc@177 PS9, Line 177: +1 nit: spaces around +, i.e. should be " + 1" -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 9 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 20 Feb 2018 23:32:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 1: (10 comments) Thanks for the comments. I've applied the update. http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/exprs/expr-test.cc@6622 PS8, Line 6622: int expected_var_begin, const map>& expected_offsets) { > I found a case where your implementation is inconsistent with hive. It look Done. Thanks for finding the corner case. The case is added. http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.h@187 PS8, Line 187: /// len -- length of the string to parse (must be > 0) > This might change if you fix the inconsistency I mentioned. As you mentioned in the corner case, I've fixed the inconsistency on the last patch set. See https://gerrit.cloudera.org/#/c/8508/9/be/src/runtime/timestamp-parse-util.cc (#line 177) http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.h@188 PS8, Line 188: /// dt_ctx -- date/time format context (must contain valid tokens) > nit: use /// for function comments. Done http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@150 PS8, Line 150: ParseFormatTokens > I think this should be a method of DateTimeFormatContext instead of Timesta Done http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@150 PS8, Line 150: ParseFormatTokens > Maybe DateTimeFormatContext::AppendToken() Done http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@155 PS8, Line 155: const > Use DCHECK_GE macro to get better info on failure. Done http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@156 PS8, Line 156: tr_end = > You can use emplace_back(token_type, position_of_token, ...) instead. Done http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@213 PS8, Line 213: while (curr_tok_chr < str_end) { > Can you convert these other locations in this function to use your new help Done http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@518 PS8, Line 518: boost::unordered_map::const_iterator iter = > Use DCHECK_LE instead. Done http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@519 PS8, Line 519: TH_INDE > I think using memcmp() would be better, since strncmp() is meant for null-t Done -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 1 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 19 Feb 2018 04:38:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Hello Gabor Kaszab, Attila Jeges, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8508 to look at the new patch set (#9). Change subject: IMPALA-5237: Support a quoted string in date/time format .. IMPALA-5237: Support a quoted string in date/time format Impala does not represent a quoted string at date/time format. For example, addtional quoted string between the patterns (e.g. HH\'H\' => 11H). Hive supports this feature, so user wants to get a same result from Impala. By the way, Impala returns a different result as below. * Hive > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08H 39M 24S * Impala > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08'8' 39'4' 24'0' The change affects the format pattern for from_unixtime/from_timestamp/unix_timestamp. In unix_timestamp, user can also specify a quoted string like this. > select unix_timestamp('\'Epoch time: \'19700101', > '\'Epoch time: \'MMdd'); 0 By the way, the quoted strings between input and format should be exactly same and internally string comparison is case-sensitive. There is one intentional difference between Hive and Impala. In Impala, the format string should have any date or time patten as below. Throwing the error/warning is better if Impala cannot optimize the expression. User must rewrite the query and don't pay the function call. * Hive > select from_unixtime(0, '\'Hello world!\''); Hello world! * Impala > select from_unixtime(0, '\'Hello world!\''); Bad date/time conversion format: 'Hello world!' Testing: Add unit tests: ExprTest.QuotedStringForDateTimeFormat Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 135 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8508/9 -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 9 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@519 PS8, Line 519: strncmp I think using memcmp() would be better, since strncmp() is meant for null-terminated strings, which ours arent. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 8 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 08 Feb 2018 01:03:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.h@187 PS8, Line 187: /// Finding the closing quote and getting the string between the quotes. > Can you comment that there is no way to escape single quotes? This might change if you fix the inconsistency I mentioned. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 8 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 08 Feb 2018 01:01:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 8: (9 comments) http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/exprs/expr-test.cc@6622 PS8, Line 6622: TEST_F(ExprTest, QuotedStringForDateTimeFormat) { I found a case where your implementation is inconsistent with hive. It looks like hive treats '' inside quotes as an escape sequence for '. hive> select from_unixtime(1492677561, "HH'foo\'\'bar\'"); OK 01foo'bar [localhost:21000] > select from_unixtime(1492677561, "HH'foo\'\'bar\'"); Query: select from_unixtime(1492677561, "HH'foo\'\'bar\'") Query submitted at: 2018-02-07 16:58:04 (Coordinator: http://tarmstrong-box:25000) Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=aa4422fefbd02bcc:1f7de5ac +---+ | from_unixtime(1492677561, 'hh\'foo\'\'bar\'') | +---+ | 08foobar | +---+ http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.h@187 PS8, Line 187: /// Finding the closing quote and getting the string between the quotes. Can you comment that there is no way to escape single quotes? http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.h@188 PS8, Line 188: // str -- start pointer of the opening quote when the function is called. Once the nit: use /// for function comments. http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@150 PS8, Line 150: SaveDateTimeToken Maybe DateTimeFormatContext::AppendToken() http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@150 PS8, Line 150: SaveDateTimeToken I think this should be a method of DateTimeFormatContext instead of TimestampParser. http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@155 PS8, Line 155: DCHECK Use DCHECK_GE macro to get better info on failure. http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@156 PS8, Line 156: push_back You can use emplace_back(token_type, position_of_token, ...) instead. http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@213 PS8, Line 213: dt_ctx->toks.push_back(DateTimeFormatToken(TZ_OFFSET, str - str_begin, Can you convert these other locations in this function to use your new helper function? We should be consistent... http://gerrit.cloudera.org:8080/#/c/8508/8/be/src/runtime/timestamp-parse-util.cc@518 PS8, Line 518: DCHECK((tok.pos + shift_len + tok.len) <= str_len); Use DCHECK_LE instead. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 8 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 08 Feb 2018 00:59:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 8: Code-Review+1 Thanks for taking care of this. For me it's fine. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 8 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 02 Feb 2018 10:09:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 8: (5 comments) Applied the update. http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.h@189 PS7, Line 189: // function finishes str pointer is changed to point to the closing quote. > This sentence is not needed in my opinion. Done http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.h@190 PS7, Line 190: // position_of_string_literal -- start position of the string literal. > str is assumed to point to... Done http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.h@192 PS7, Line 192: // string_literal_begin -- start pointer of the string literal if there is a pair of > some separator between the param name and it's description would be great. Done http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.cc@161 PS7, Line 161: const char** str, int* position_of_string_literal, int* length_of_string_literal, > One thing I learned recently, that in Impala the out parameters are preferr You're right. Actually, Tim already answered the similar question on this change: https://gerrit.cloudera.org/#/c/8770/2/be/src/exec/data-sink.cc http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.cc@203 PS7, Line 203: // str is to point to the closing quote and the incrementing points to the first > It might worth a comment here that incrementing this would result in the st Done -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 8 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Jan 2018 09:19:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Hello Gabor Kaszab, Attila Jeges, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8508 to look at the new patch set (#8). Change subject: IMPALA-5237: Support a quoted string in date/time format .. IMPALA-5237: Support a quoted string in date/time format Impala does not represent a quoted string at date/time format. For example, addtional quoted string between the patterns (e.g. HH\'H\' => 11H). Hive supports this feature, so user wants to get a same result from Impala. By the way, Impala returns a different result as below. * Hive > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08H 39M 24S * Impala > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08'8' 39'4' 24'0' The change affects the format pattern for from_unixtime/from_timestamp/unix_timestamp. In unix_timestamp, user can also specify a quoted string like this. > select unix_timestamp('\'Epoch time: \'19700101', > '\'Epoch time: \'MMdd'); 0 By the way, the quoted strings between input and format should be exactly same and internally string comparison is case-sensitive. There is one intentional difference between Hive and Impala. In Impala, the format string should have any date or time patten as below. Throwing the error/warning is better if Impala cannot optimize the expression. User must rewrite the query and don't pay the function call. * Hive > select from_unixtime(0, '\'Hello world!\''); Hello world! * Impala > select from_unixtime(0, '\'Hello world!\''); Bad date/time conversion format: 'Hello world!' Testing: Add unit tests: ExprTest.QuotedStringForDateTimeFormat Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 128 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8508/8 -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 8 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 7: (5 comments) Thanks for getting back to this review! :) One general comment: Try to avoid sending code rebase and new changes in the same patch set as it's makes reviewing the changes between the current and the previous patch set more difficult. http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.h@189 PS7, Line 189: // This function returns three kinds of values via the output parameters. This sentence is not needed in my opinion. http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.h@190 PS7, Line 190: // str is to point to the opening quote when the function is called. Once the function str is assumed to point to... This seems more natural to me. What do you think? http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.h@192 PS7, Line 192: // position_of_string_literal is start position of the string literal. some separator between the param name and it's description would be great. e.g. check the function comments below. http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.cc@161 PS7, Line 161: const char*& str, int& position_of_string_literal, int& length_of_string_literal, One thing I learned recently, that in Impala the out parameters are preferred as a pointer instead of references. Tim, can you confirm? http://gerrit.cloudera.org:8080/#/c/8508/7/be/src/runtime/timestamp-parse-util.cc@203 PS7, Line 203: ++str; It might worth a comment here that incrementing this would result in the str pointing to the first char after the closing quote. (I know it's mentioned in the comment for GetStringLiteralBetweenQuotes but it might increase readability repeating here.) -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 7 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jan 2018 14:22:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 7: (5 comments) Applied the update. @Gabor, I am sorry to make you review it again after a long time. Now we can deliver some test cases for the mix of single and double quotes as you advised! http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h@190 PS6, Line 190: // str is to point to the opening quote when the function is called. Once the function > Let me reflect it on the next patch set. Done http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@174 PS6, Line 174: length_of_string_literal = str - quoted_string_begin - 1; > Right, it should be a redundant. Let me remove this on the next patch set. Done http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@176 PS6, Line 176: // e.g. from_unixtime(0, '\'\'') => ' > Duplicate. Let me remove this on the next patch set. Done http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@177 PS6, Line 177: if (length_of_string_literal == 0) length_of_string_literal = 1; > Okay, let me reflect this on the next patch set. Done http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@178 PS6, Line 178: position_of_string_literal = str - str_begin - length_of_string_literal; > Okay, I accept your approach. Done -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 7 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jan 2018 06:09:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Hello Gabor Kaszab, Attila Jeges, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8508 to look at the new patch set (#7). Change subject: IMPALA-5237: Support a quoted string in date/time format .. IMPALA-5237: Support a quoted string in date/time format Impala does not represent a quoted string at date/time format. For example, addtional quoted string between the patterns (e.g. HH\'H\' => 11H). Hive supports this feature, so user wants to get a same result from Impala. By the way, Impala returns a different result as below. * Hive > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08H 39M 24S * Impala > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08'8' 39'4' 24'0' The change affects the format pattern for from_unixtime/from_timestamp/unix_timestamp. In unix_timestamp, user can also specify a quoted string like this. > select unix_timestamp('\'Epoch time: \'19700101', > '\'Epoch time: \'MMdd'); 0 By the way, the quoted strings between input and format should be exactly same and internally string comparison is case-sensitive. There is one intentional difference between Hive and Impala. In Impala, the format string should have any date or time patten as below. Throwing the error/warning is better if Impala cannot optimize the expression. User must rewrite the query and don't pay the function call. * Hive > select from_unixtime(0, '\'Hello world!\''); Hello world! * Impala > select from_unixtime(0, '\'Hello world!\''); Bad date/time conversion format: 'Hello world!' Testing: Add unit tests: ExprTest.QuotedStringForDateTimeFormat Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 128 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8508/7 -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 7 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 6: The problem at IMPALA-3942 has been resolved. Hence, let me continue this task and provide a new patch set soon. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 6 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 24 Jan 2018 03:02:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 6: Sorry for the slow response. If I understand correctly, you can't add the test case because of IMPALA-3942. It looks like you have a fix for IMPALA-3942 so I think we should finish the review on that and merge it before this patch. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 6 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 05 Jan 2018 21:38:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 6: AFAIK with Impala we try to keep one code change in one patch. It happens to split it to different parts but usually it is to reduce the size of review, and the split happens when there are clear boundary between the sub-functionalities within the change. My opinion is to hold on with this until the blocker is submitted. Tim, what do you think? -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 6 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 12 Dec 2017 10:30:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 6: Code-Review-1 (2 comments) Currently I count not add the test case as you mentioned at the following change due to the issue at IMPALA-3942: https://gerrit.cloudera.org/#/c/8508/6/be/src/exprs/expr-test.cc We have two options. Which is better for you? Or any other option? Thanks. 1. Deliver a complete fix. Resolve IMPALA-3942 and then deliver this fix with a test case for mixing single and double quotes. 2. Deliver a first part of the fix which does not include the test case. Resolve IMPALA-3942 and then deliver a second part of the fix which includes the test case. I prefer the second one because we can deliver fixes incrementally. There will be some time for discussion and providing a solution to resolve blocker. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h@190 PS6, Line 190: // nullptr if input does not have unclosed quote > My suggestion would be: Let me reflect it on the next patch set. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@178 PS6, Line 178: return quoted_string_begin + 1; > Yes I do. Okay, I accept your approach. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 6 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 12 Dec 2017 03:31:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h@190 PS6, Line 190: // nullptr if input does not have unclosed quote > I see. Your point is that the pointer can be changeable in the function. Wh My suggestion would be: "The expectation for str is to point to the opening quote when the function is called. Once the function finishes str pointer is changed to point to the closing quote." http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@178 PS6, Line 178: return quoted_string_begin + 1; > Do you think the function name(i.e. Get...) is still valid even though retu Yes I do. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 6 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 11 Dec 2017 13:29:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 6: (6 comments) Thanks for your answers. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h@190 PS6, Line 190: // nullptr if input does not have unclosed quote > Before you call this function the str parameter points to the opening quote I see. Your point is that the pointer can be changeable in the function. What do you think about if I will add a comment like this? Please feel free to modify the comment or suggest one. The pointer of the str can be changeable because of advancing the str parameter to the closing quote. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@161 PS6, Line 161: const char*& str, int& position_of_string_literal, int& length_of_string_literal) { > Sure, thx for the explanation. I'm not sure we should prepare a function fo Thanks for your understanding. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@174 PS6, Line 174: DCHECK(length_of_string_literal >= 0); > Still I feel that making a CHECK on the same thing twice is an overkill. We Right, it should be a redundant. Let me remove this on the next patch set. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@176 PS6, Line 176: DCHECK(position_of_string_literal >= 0); Duplicate. Let me remove this on the next patch set. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@178 PS6, Line 178: return quoted_string_begin + 1; > In fact you have 3 different values returned from this function. 2 of them Do you think the function name(i.e. Get...) is still valid even though return value will move to an output parameter? I mean logically the function returns three kinds of outputs, but I guess a reader may think the function name looks weird because it should be void function. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@198 PS6, Line 198: if (UNLIKELY(!string_literal_begin)) return false; > Could you explain me why we have DCHECKs for position_of_string_literal, an As I mentioned at line#174 and line#176, they are not necessary. The checks at SaveDateTimeToken are enough. Let me remove them. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 6 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 11 Dec 2017 10:45:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 6: (5 comments) Thanks for the explanation! http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h@190 PS6, Line 190: // nullptr if input does not have unclosed quote > Sorry, I could not catch what you meant. Would you please explain it with a Before you call this function the str parameter points to the opening quote. When the function returns it points to the closing quote. I think you should mention this side effect (of moving the str pointer to the closing quote) in the function's comment. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@161 PS6, Line 161: const char*& str, int& position_of_string_literal, int& length_of_string_literal) { > I think SaveDateTimeToken is not suitable to calculate the place of the pos Sure, thx for the explanation. I'm not sure we should prepare a function for general use even though currently we want to use to only for this purpose. But I accept your point. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@174 PS6, Line 174: DCHECK(length_of_string_literal >= 0); Still I feel that making a CHECK on the same thing twice is an overkill. We can be sure, that SaveDateTimeToken is called right after this function so I think that making the DCHECKs there is sufficient. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@178 PS6, Line 178: return quoted_string_begin + 1; In fact you have 3 different values returned from this function. 2 of them are output parameters and one is a return value. I think in this case making all 3 of them as an output parameter would make sense. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@198 PS6, Line 198: if (UNLIKELY(!string_literal_begin)) return false; Could you explain me why we have DCHECKs for position_of_string_literal, and length_of_string_literal while we make an if-check on string_literal_begin? I think that could also be a DCHECK and can be done inside SaveSateTimeToken. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 6 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 11 Dec 2017 10:19:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 6: (8 comments) Thanks for the potential problem for double quotes. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/exprs/expr-test.cc@5980 PS6, Line 5980: TestStringValue("from_unixtime(1492677561, 'HH\\'H\\'')", "08H"); > I checked in Hive and it kind of handles nested quotes like: You found a potential problem in my change. I didn't consider a double quote. Let me provide a solution with including the case in the next patch set. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h@190 PS6, Line 190: // nullptr if input does not have unclosed quote > Also mention please the side-effect of advancing the str param to the closi Sorry, I could not catch what you meant. Would you please explain it with an example? http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@151 PS6, Line 151: const DateTimeFormatTokenType token_type, const int position_of_token, > SEPARATOR as token_type is not necessary as a param. I think it should be passed to this function because the function is used more in general. There are different token types. When we refactor several use cases, the function will be used. e.g. https://github.com/apache/impala/blob/master/be/src/runtime/timestamp-parse-util.cc#L165 http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@153 PS6, Line 153: DCHECK(token_begin != nullptr); > 2 of these DCHECKs are already done in the other function. I think it's eno As I mentioned in the comment of line#151, the function will be used broadly, so the essential checks are required. The function should be self-contained. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@161 PS6, Line 161: const char*& str, int& position_of_string_literal, int& length_of_string_literal) { > Thanks for making these adjustments! I think SaveDateTimeToken is not suitable to calculate the place of the position_of_string_literal. As I mentioned, the function will be used in general. The formula of the calculation is fit to here, so I don't think the formula can cover all the cases because there are a few different calculation logics. e.g. https://github.com/apache/impala/blob/master/be/src/runtime/timestamp-parse-util.cc#L169 http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@163 PS6, Line 163: const char* str_end = str_begin + dt_ctx->fmt_len; > Can't we pass the str_end pointer to this function? Then we could get rid o Not acceptable due to the reason above. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@177 PS6, Line 177: ++str; > At the end of the excution of this function I feel that it's a better appro Okay, let me reflect this on the next patch set. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@199 PS6, Line 199: SaveDateTimeToken(dt_ctx, SEPARATOR, position_of_string_literal, > No need to pass the separator as a parameter, it is accessible inside the f Not acceptable due to the reason above. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 6 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 10 Dec 2017 12:00:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.h@190 PS6, Line 190: // nullptr if input does not have unclosed quote Also mention please the side-effect of advancing the str param to the closing quote. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 6 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 07 Dec 2017 09:54:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 6: (7 comments) Thanks, for answering/incorporating my comments! http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/exprs/expr-test.cc@5980 PS6, Line 5980: TestStringValue("from_unixtime(1492677561, 'HH\\'H\\'')", "08H"); I checked in Hive and it kind of handles nested quotes like: select from_unixtime(1492677561, "HH'H\"H\"' mm'm' ss's'"); 01H"H" 39m 21s Would it be possible to add a testcase to cover this? http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@151 PS6, Line 151: const DateTimeFormatTokenType token_type, const int position_of_token, SEPARATOR as token_type is not necessary as a param. dt_ctx, token_begin, token_length are sufficient as parameters as position_of_token could be calculated inside the function based on these, right? http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@153 PS6, Line 153: DCHECK(token_begin != nullptr); 2 of these DCHECKs are already done in the other function. I think it's enough to do the only once, e.g. here. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@161 PS6, Line 161: const char*& str, int& position_of_string_literal, int& length_of_string_literal) { Thanks for making these adjustments! Few more comments: I think this function should provide a pointer to the start of the string literal, and the length of the str (and as a side effect advancing the str pointer to the closing quote, that fact should be documented in the header file). SaveDateTimeToken seems a better place to calculate the position_of_string_literal. What do you think? http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@163 PS6, Line 163: const char* str_end = str_begin + dt_ctx->fmt_len; Can't we pass the str_end pointer to this function? Then we could get rid of dt_ctx param (I think str_begin is not needed in this function). http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@177 PS6, Line 177: ++str; At the end of the excution of this function I feel that it's a better approach to have the str pointing to the finishing quote. Before making a 'continue' it can be increased to the next char. My reasoning would be, that a function to get a string literal between quotes shouldn't be aware whether there are more characters after the finishing quote or not. It should only care about the chars between the quotes. http://gerrit.cloudera.org:8080/#/c/8508/6/be/src/runtime/timestamp-parse-util.cc@199 PS6, Line 199: SaveDateTimeToken(dt_ctx, SEPARATOR, position_of_string_literal, No need to pass the separator as a parameter, it is accessible inside the function too. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 6 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 07 Dec 2017 09:42:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Hello Gabor Kaszab, Attila Jeges, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8508 to look at the new patch set (#6). Change subject: IMPALA-5237: Support a quoted string in date/time format .. IMPALA-5237: Support a quoted string in date/time format Impala does not represent a quoted string at date/time format. For example, addtional quoted string between the patterns (e.g. HH\'H\' => 11H). Hive supports this feature, so user wants to get a same result from Impala. By the way, Impala returns a different result as below. * Hive > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08H 39M 24S * Impala > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08'8' 39'4' 24'0' The change affects the format pattern for from_unixtime/from_timestamp/unix_timestamp. In unix_timestamp, user can also specify a quoted string like this. > select unix_timestamp('\'Epoch time: \'19700101', > '\'Epoch time: \'MMdd'); 0 By the way, the quoted strings between input and format should be exactly same and internally string comparison is case-sensitive. There is one intentional difference between Hive and Impala. In Impala, the format string should have any date or time patten as below. Throwing the error/warning is better if Impala cannot optimize the expression. User must rewrite the query and don't pay the function call. * Hive > select from_unixtime(0, '\'Hello world!\''); Hello world! * Impala > select from_unixtime(0, '\'Hello world!\''); Bad date/time conversion format: 'Hello world!' Testing: Add unit tests for working and nonworking cases Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 98 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8508/6 -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 6 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h@183 PS4, Line 183: /// dt_ctx -- date/time format context > Actually I don't want leave meaningless comments. I think the parameter nam I meant "the parameter names" are in new code in the recent patch set. Please feel free to tell me if you have a better name or more informative comment. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 4 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Dec 2017 11:44:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5746 PS3, Line 5746: TestValue("unix_timestamp('\\'Epoch time: \\'1970|01|01 00|00|00',\ > Thanks for the explanation! I guess your question is that the result seems to be wrong even though the format is find and everything is okey to run. Timezone can affect epoch timestamp. Impala interprets date and time with UTC timezone by default due to avoidance of undesired results by local timezone. - Regarding default timezone: https://github.com/apache/impala/blob/master/docs/topics/impala_timestamp.xml#L89 - Regarding data and time interpretation at unix_timestamp: https://github.com/apache/impala/blob/master/docs/topics/impala_datetime_functions.xml#L2463 - Test case for epoch: https://github.com/apache/impala/blob/master/be/src/exprs/expr-test.cc#L5617 Therefore, I think the result should be zero. http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h@182 PS4, Line 182: Reset(fmt, fmt_len); > 1: This comment doesn't add any further info as it is basically the functi Thanks for your kind comment. 1,2: I agree. Actually I thought I should have to add the meaningless comment, but I didn't imagine more meaningful comment and I think the function name looks self-descriptive. 3. More detailed explanation is added to return description instead of the word "parse". 4. I agree. Reader expects a string literal due to "Get", so the revised function will return a pointer of string literal. The function is split into two parts: GetStringLiteralBetweenQuotes and SaveDateTimeToken. I think SaveDateTimeToken will be used generally when refactoring ParseFormatTokens or adding a new code. I fully agree an unit function should have a minimal purpose and responsibility. http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h@183 PS4, Line 183: } > Again, this is simply writing the type of the parameter with spaces. Actually I don't want leave meaningless comments. I think the parameter names are self-explainable, but please leave a comment as a new reader if you think any additional information should be helpful. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 3 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Dec 2017 11:41:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Hello Gabor Kaszab, Attila Jeges, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8508 to look at the new patch set (#5). Change subject: IMPALA-5237: Support a quoted string in date/time format .. IMPALA-5237: Support a quoted string in date/time format Impala does not represent a quoted string at date/time format. For example, addtional quoted string between the patterns (e.g. HH\'H\' => 11H). Hive supports this feature, so user wants to get a same result from Impala. By the way, Impala returns a different result as below. * Hive > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08H 39M 24S * Impala > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08'8' 39'4' 24'0' The change affects the format pattern for from_unixtime/from_timestamp/unix_timestamp. In unix_timestamp, user can also specify a quoted string like this. > select unix_timestamp('\'Epoch time: \'19700101', > '\'Epoch time: \'MMdd'); 0 By the way, the quoted strings between input and format should be exactly same and internally string comparison is case-sensitive. There is one intentional difference between Hive and Impala. In Impala, the format string should have any date or time patten as below. Throwing the error/warning is better if Impala cannot optimize the expression. User must rewrite the query and don't pay the function call. * Hive > select from_unixtime(0, '\'Hello world!\''); Hello world! * Impala > select from_unixtime(0, '\'Hello world!\''); Bad date/time conversion format: 'Hello world!' Testing: Add unit tests for working and nonworking cases Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 98 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8508/5 -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 5 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5746 PS3, Line 5746: TestIsNull("months_between('23:33:45','15:33:45')", TYPE_DOUBLE); > I guess you may expect "Epoch time: 1970|01|01 00|00|00" as a result, unix_ Thanks for the explanation! One more question: If we decide not to return Null here, then wouldn't it make sense to return the value as if that "Epoch time: " something wasn't there? I have a feeling that we should either return Null, or return the value indicated by the given parameters, so in this case I think this query should return the same as the following: unix_timestamp("1970|01|01 00|00|00","|MM|dd HH|mm|ss"); 28800 Could you please share your thoughts? http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5764 PS3, Line 5764: TestIsNull("dayofweek(cast('09:10:11.00' as timestamp))", TYPE_INT); > As I explained above, I think returning NULL for the mismatched case is eno Thanks again for the explanation! Using TestIsNull here sounds fine. http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h@182 PS4, Line 182: /// Get string literal between quotes 1: This comment doesn't add any further info as it is basically the function name with spaces. 2: The comment should tell the reader if there is some precondition for this function. In this case I think it would be beneficial to mention that we assume that str is currently pointing to the opening quote. 3: For me it would also make sense to say explicitly what successful and unsuccessful parse mean. e.g. parsing is unsuccessful if there is no closing quote. 4: The 'Get' in the function name indicates that when we call this function it either returns the found string literal as a return value or through an out parameter. I think this is a bit misleading as in fact this function is for 2 different things (but not for returning a string literal): - Finding the closing quote and getting the string between the 2 quotes. - Adding a DateTimeFormatToken to dt_ctx. My preference here would be to introduce functions with a single area of responsibility: one for finding the string between the quotes and one for inserting the FormatToken to dt_ctx. What do you think? http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h@183 PS4, Line 183: /// dt_ctx -- date/time format context Again, this is simply writing the type of the parameter with spaces. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 4 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Dec 2017 09:42:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5746 PS3, Line 5746: TestValue("unix_timestamp('\\'Epoch time: \\'1970|01|01 00|00|00',\ > I checked and Hive in fact returns NULL for this input. I guess you may expect "Epoch time: 1970|01|01 00|00|00" as a result, unix_timestamp is not suitable because it returns bigint. I think two kinds of results are only available: zero or NULL As you mentioned, Hive returns NULL because the quoted string might be recognized as an unexpected format string. The string literal in the quotes is meaningless in unix_timestamp function, but I think an user wants to add some description and it should not affect returning a result. The flexibility might be helpful to an user even though a difference happens between Impala and Hive. Therefore, the result should be zero in Impala. http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5756 PS3, Line 5756: // 2) Not allowd unclosed quoted string > nit: typo: allowed Done http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5764 PS3, Line 5764: TestValue("unix_timestamp('\\'Epoch time: \\'1970|01|01 00|00|00',\ > There is one more thing I don't understand: Regardless of giving this forma As I explained above, I think returning NULL for the mismatched case is enough. I would like to prefer NULL instead of throwing an error. There are two issues. 1) The expected result is NULL because the input does not fit to the format(e.g. Epoch and Enoch). I think returning NULL is enough for happening a mismatch internally. 2) I should have to use TestIsNull instead of TestValue. It's my mistake when I copied from the above. Let me fix this. By the way, there is a different problem in our test framework. I guess the following test should fail: TestValue(cast(null as int), TYPE_INT, 0) As you know, NULL cannot be compared with any value because it is unknown, so it should not be accept. I filed the issue on IMPALA-6283. http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.h@77 PS3, Line 77: /// IMPALA-5237: Support a quoted string in date/time format > I have the feeling that copying the whole commit message here is a bit of a Done http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.cc@163 PS3, Line 163: if (*str == '\'') { > I would find this part of code a bit more self-descriptive if it was in the Done http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.cc@173 PS3, Line 173: if (len == 0) len = 1; > The naming of len, val and pos aren't that self-explanatory for me. Would i Done -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 3 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Dec 2017 07:56:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Hello Gabor Kaszab, Attila Jeges, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8508 to look at the new patch set (#4). Change subject: IMPALA-5237: Support a quoted string in date/time format .. IMPALA-5237: Support a quoted string in date/time format Impala does not represent a quoted string at date/time format. For example, addtional quoted string between the patterns (e.g. HH\'H\' => 11H). Hive supports this feature, so user wants to get a same result from Impala. By the way, Impala returns a different result as below. * Hive > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08H 39M 24S * Impala > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08'8' 39'4' 24'0' The change affects the format pattern for from_unixtime/from_timestamp/unix_timestamp. In unix_timestamp, user can also specify a quoted string like this. > select unix_timestamp('\'Epoch time: \'19700101', > '\'Epoch time: \'MMdd'); 0 By the way, the quoted strings between input and format should be exactly same and internally string comparison is case-sensitive. There is one intentional difference between Hive and Impala. In Impala, the format string should have any date or time patten as below. Throwing the error/warning is better if Impala cannot optimize the expression. User must rewrite the query and don't pay the function call. * Hive > select from_unixtime(0, '\'Hello world!\''); Hello world! * Impala > select from_unixtime(0, '\'Hello world!\''); Bad date/time conversion format: 'Hello world!' Testing: Add unit tests for working and nonworking cases Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 81 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8508/4 -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 4 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5746 PS3, Line 5746: TestValue("unix_timestamp('\\'Epoch time: \\'1970|01|01 00|00|00',\ I checked and Hive in fact returns NULL for this input. To be honest I was expecting a different result containing the given string plus the time in the desired format. (instead of NULL or zero) e.g. in this case: "Epoch time: 1970|01|01 00|00|00" Am I missing something? Shouldn't this be the expected result? http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5756 PS3, Line 5756: // 2) Not allowd unclosed quoted string nit: typo: allowed http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5764 PS3, Line 5764: TestValue("unix_timestamp('\\'Epoch time: \\'1970|01|01 00|00|00',\ There is one more thing I don't understand: Regardless of giving this format correctly or with a mismatch with the string literals we still expect the query to return the same result (bigint zero) without any of them returning an error. Shouldn't this one return an error? http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.h@77 PS3, Line 77: /// IMPALA-5237: Support a quoted string in date/time format I have the feeling that copying the whole commit message here is a bit of an overkill. Would it be possible to wrap this up and have a few lines of description about the new functionality. In addition I think it's not necessary to mention here how Hive handles these cases. We should focus here on how Impala handles them. http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.cc@163 PS3, Line 163: if (*str == '\'') { I would find this part of code a bit more self-descriptive if it was in the form of a series of function calls. It might be just myself but for me figuring out the intent behind these pointer manipulations is something that takes some time. This is what I mean: if(*str == '\'') { GetClosingQuote(str); /* +some error handling if closing not found */ SaveTokenWhaterver(string_literal, str); /* Or whatever parameter is essential to this function in order to calculate len, pos, and val*/ } What is your opinion? http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/runtime/timestamp-parse-util.cc@173 PS3, Line 173: if (len == 0) len = 1; The naming of len, val and pos aren't that self-explanatory for me. Would it be possible to rename them for something more informative? -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 3 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 05 Dec 2017 14:52:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 3: Gabor, maybe you could take a look at this one? -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 3 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 04 Dec 2017 19:57:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 3: I need to get back to this one after the thanksgiving break - it's on my radar. @Attila does the latest iteration look good to you? -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 3 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Nov 2017 00:31:33 + Gerrit-HasComments: No