Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 )
Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 ...................................................................... Patch Set 6: (14 comments) Thanks for taking another look! http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h File be/src/runtime/datetime-iso-sql-format-parser.h: http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h@78 PS7, Line 78: format > text token inside the format Done http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h@81 PS7, Line 81: format > text token Done http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.h@81 PS7, Line 81: and moves : // '*format' to 'a'. > I don't think part this is true. Ideed, thx. Done http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc File be/src/runtime/datetime-iso-sql-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc@48 PS7, Line 48: if (tok->type == SEPARATOR && !dt_ctx.fx_modifier) { : bool res = ProcessSeparatorSequence(¤t_pos, end_pos, dt_ctx, &i); : if (!res || current_pos == end_pos) return res; : DCHECK(i < dt_ctx.toks.size()); : // Next token, following the separator sequence. : tok = &dt_ctx.toks[i]; : } > I'd prefer to handle strict separator matching (tok->type == SEPARATOR && d Done http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc@56 PS7, Line 56: const char* token_end_pos = FindEndOfToken(current_pos, end_pos - current_pos, *tok); : if (token_end_pos == nullptr) return false; : int token_len = token_end_pos - current_pos; : : if (dt_ctx.fx_modifier && !tok->fm_modifier && tok->type != TEXT && : token_len != tok->len) { : return false; : } > for TEXT tokens, 'token_end_pos' and 'token_len' aren't set here. 'token_en Done http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-iso-sql-format-parser.cc@185 PS7, Line 185: token_len == 1 > dt_ctx.fx_modifier && token_len == 1 Merged this with the loose separator handling. This case is no longer needed here. http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@240 PS6, Line 240: strncmp(current_pos, "\\\"", 2) == 0) > You're correct, it is null-terminated for iso-sql format strings, because t I see. I wasn't aware that the format is not null-terminated for the SingleDateFormat path. Thanks for letting me know! I'll add comments about this assumption on the IsoSqlFormat branch for both IsoSqlFormatTokenizer::dt_ctx_ and IsoSqlFormatParser::ParseDateTime(). Thanks again for spotting this! http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-parser-common.h File be/src/runtime/datetime-parser-common.h: http://gerrit.cloudera.org:8080/#/c/14291/7/be/src/runtime/datetime-parser-common.h@134 PS7, Line 134: then > that Done http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@738 PS6, Line 738: select cast("1985-\a\b\f\n\r\t\v\'12-09" as ''' : r'''date format 'YYYY-"\a\b\f\n\r\t\v\'"MM-DD') > This test is about special escape sequences but non-special chars in the te Isn't intentional. Remained from a previous approach I had implemented to handle special char. I'm removing the non-special chars from this test. Done http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@782 PS7, Line 782: # Strict separator matching. : result = self.client.execute("select cast('2001-03-02 03:10:15' as timestamp format" : "'FXYYYY MM-DD HH12:MI:SS')") : assert result.data == ["NULL"] : : result = self.client.execute("select cast('2001-03-03 03:10:15' as timestamp format" : "'FXYYYY-MM-DD HH12::MI:SS')") : assert result.data == ["NULL"] : : result = self.client.execute("select cast('2001-03-04 ' as timestamp format" : "'FXYYYY-MM-DD ')") : assert result.data == ["NULL"] : : # Strict token length matching. : result = self.client.execute("select cast('2001-3-05' as timestamp format " : "'FXYYYY-MM-DD')") : assert result.data == ["NULL"] > Maybe add positive tests too for these. L778 is a positive test for FX modifier. Does that cover what you ask for? http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@866 PS7, Line 866: doesn't > don't Done http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@872 PS7, Line 872: result = self.client.execute("select cast(cast('2001-03-09' as date) " : "as string format 'YYYY-FMMM-FMDD')") : assert result.data == ["2001-3-9"] > Maybe add a test were year token is FM-modified Done http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@882 PS7, Line 882: result = self.client.execute("select cast(cast('2001-04-09' as date) " : "as string format 'FXYYYY-FMMM-FMDD')") > Test FM-modified year token here too. Done http://gerrit.cloudera.org:8080/#/c/14291/7/tests/query_test/test_cast_with_format.py@1045 PS7, Line 1045: "select cast('1985-11-21text' as timestamp format 'YYYY-MM-DD\"text')" > Again, please use raw python strings, here and below. Done -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Thu, 17 Oct 2019 18:11:33 +0000 Gerrit-HasComments: Yes
