Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14291 )

Change subject: IMPALA-8704: ISO:SQL:2016 datetime patterns - Milestone 2
......................................................................


Patch Set 6:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-parser.cc@225
PS6, Line 225:  switch (**format) {
             :     case 'b': return '\b';
             :     case 'n': return '\n';
             :     case 'r': return '\r';
             :     case 't': return '\t';
             :   }
Please check what escape sequences are supported by ANSI SQL standard.
(same as datetime-parser-common.cc:226)


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@109
PS6, Line 109: *current_pos != nullptr
nit: current_pos != nullptr && *current_pos != nullptr


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@112
PS6, Line 112: *current_pos < str_end
nit: str_begin <= *current_pos && *current_pos < str_end


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@240
PS6, Line 240: strncmp(current_pos, "\\\"", 2) == 0)
This is safe only if 'current_pos' is zero-terminated, but I don't think it is. 
Probably you should check first that there are at least 2 characters between 
current_pos and end_pos.


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@273
PS6, Line 273:  strncmp(*current_pos, "\\\"", 2)
same as L240.


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@277
PS6, Line 277: strncmp(*current_pos, "\\\\\\\"", 4)
same as L240.


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@282
PS6, Line 282: strncmp(*current_pos, "\\\\", 2)
same as L240.


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc
File be/src/runtime/datetime-parser-common.cc:

http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc@201
PS6, Line 201: continue
nit: can be removed, here and below.


http://gerrit.cloudera.org:8080/#/c/14291/6/be/src/runtime/datetime-parser-common.cc@202
PS6, Line 202:    } else if (!tok.is_double_escaped && strncmp(text_it, "\\\"", 
2) == 0) {
             :       result.append("\"");
             :       ++text_it;
             :       continue;
             :     } else if (strncmp(text_it, "\\\\", 2) == 0) {
             :       result.append("\\");
             :       ++text_it;
             :       continue;
             :     } else if (strncmp(text_it, "\\b", 2) == 0) {
             :       result.append("\b");
             :       ++text_it;
             :       continue;
             :     } else if (strncmp(text_it, "\\n", 2) == 0) {
             :       result.append("\n");
             :       ++text_it;
             :       continue;
             :     } else if (strncmp(text_it, "\\r", 2) == 0) {
             :       result.append("\r");
             :       ++text_it;
             :       continue;
             :     } else if (strncmp(text_it, "\\t", 2) == 0) {
             :       result.append("\t");
             :       ++text_it;
             :       continue;
             :     }
Are these the only escape sequences supported by ANSI SQL?

C, for instance supports a bunch of others too:
https://en.wikipedia.org/wiki/Escape_sequences_in_C


http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@663
PS6, Line 663:    # Format part is surrounded by double quotes so the quotes 
indicating the start and
             :     # end of the text token has to be escaped.
I did some testing around when the format string is surrounded by ' and the 
text token is surrounded by \"

Date to string conversion:

>> select cast(date"1985-12-08" as string format 'YYYY-MM-DD \"X\"');
returns: 1985-12-08 X
I'm not sure if this should work or return an error instead.

also if only the opening " to the text token is escaped:

>> select cast(date"1985-12-08" as string format 'YYYY-MM-DD \"X"');
returns 1985-12-08
This should probably return an error.

On the other hand when doing string to date conversion:

>> select cast("1985-12-08 X" as date format 'YYYY-MM-DD \"X\"');
returns: 1985-12-08

>> select cast("1985-12-08 X" as date format 'YYYY-MM-DD \"X"');
returns an error.

>> select cast("1985-12-08" as date format 'YYYY-MM-DD \"X"');
returns an error too.


http://gerrit.cloudera.org:8080/#/c/14291/6/tests/query_test/test_cast_with_format.py@768
PS6, Line 768: covered
surrounded



--
To view, visit http://gerrit.cloudera.org:8080/14291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30d2f6656054371476aaa8bd0d51f572b9369855
Gerrit-Change-Number: 14291
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Oct 2019 15:42:17 +0000
Gerrit-HasComments: Yes

Reply via email to