Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14291 )
Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2 ...................................................................... Patch Set 6: (11 comments) http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-parser.cc File be/src/runtime/datetime-iso-sql-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-parser.cc@225 PS6, Line 225: switch (**format) { : case 'b': return '\b'; : case 'n': return '\n'; : case 'r': return '\r'; : case 't': return '\t'; : } > Please check what escape sequences are supported by ANSI SQL standard. Well, I'm not convinced that ANSI SQL is relevant here as according to this page it doesn't allow escaping quotes and double quotes. https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0/com.ibm.esqlc.doc/ids_esqlc_0015.htm This doc about MySql is kind of in line with my implementation: https://www.oreilly.com/library/view/mysql-cookbook/0596001452/ch04s02.html It also says that 2 sequential double quotes works as an escaped double quote same as with a backslash. However, I don't feel the need for this. Additionally MySql has \0 as a special char (ANSII NULL) but again, don't find it important. http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@109 PS6, Line 109: *current_pos != nullptr > nit: current_pos != nullptr && *current_pos != nullptr Done http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@112 PS6, Line 112: *current_pos < str_end > nit: str_begin <= *current_pos && *current_pos < str_end Done http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@240 PS6, Line 240: strncmp(current_pos, "\\\"", 2) == 0) > This is safe only if 'current_pos' is zero-terminated, but I don't think it It is zero-terminated http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@273 PS6, Line 273: strncmp(*current_pos, "\\\"", 2) > same as L240. see above http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@277 PS6, Line 277: strncmp(*current_pos, "\\\\\\\"", 4) > same as L240. see above http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@282 PS6, Line 282: strncmp(*current_pos, "\\\\", 2) > same as L240. see above http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc File be/src/runtime/datetime-parser-common.cc: http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc@201 PS6, Line 201: continue > nit: can be removed, here and below. Done http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc@202 PS6, Line 202: } else if (!tok.is_double_escaped && strncmp(text_it, "\\\"", 2) == 0) { : result.append("\""); : ++text_it; : continue; : } else if (strncmp(text_it, "\\\\", 2) == 0) { : result.append("\\"); : ++text_it; : continue; : } else if (strncmp(text_it, "\\b", 2) == 0) { : result.append("\b"); : ++text_it; : continue; : } else if (strncmp(text_it, "\\n", 2) == 0) { : result.append("\n"); : ++text_it; : continue; : } else if (strncmp(text_it, "\\r", 2) == 0) { : result.append("\r"); : ++text_it; : continue; : } else if (strncmp(text_it, "\\t", 2) == 0) { : result.append("\t"); : ++text_it; : continue; : } > Are these the only escape sequences supported by ANSI SQL? See the answer in datetime-iso-sql-format-parser.cc http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@663 PS6, Line 663: # Format part is surrounded by double quotes so the quotes indicating the start and : # end of the text token has to be escaped. > I did some testing around when the format string is surrounded by ' and the If the format is surrounded by single quotes but the text double quotes surrounding the text token is escaped is a valid scenario. From the code it's not possible to decide whether a string is surrounded by single or double quotes I so won't be able to prevent that scenario. The ones where the opening double quote is escaped but the closing isn't should result an error. This is a bug, I'll take care of it. Done http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@768 PS6, Line 768: covered > surrounded Don't see the difference. This seems nit of the nits :D Done -- To view, visit http://gerrit.cloudera.org:8080/14291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855 Gerrit-Change-Number: 14291 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Comment-Date: Wed, 16 Oct 2019 14:03:09 +0000 Gerrit-HasComments: Yes