Thank you for your comments, 2016-11-04 20:36 GMT+02:00 Tom Lane <t...@sss.pgh.pa.us>: > > Artur Zakirov <a.zaki...@postgrespro.ru> writes: > > I attached new version of the patch, which fix is_char_separator() > > declaration too. > > I did some experimenting using > http://rextester.com/l/oracle_online_compiler >
> > which makes it look a lot like Oracle treats separator characters in the > pattern the same as spaces (but I haven't checked their documentation to > confirm that). > > The proposed patch doesn't seem to me to be trying to follow > these Oracle behaviors, but I think there is very little reason for > changing any of this stuff unless we move it closer to Oracle. Previous versions of the patch doesn't try to follow all Oracle behaviors. It tries to fix Amul Sul's behaviors. Because I was confused by dealing with spaces and separators by Oracle to_timestamp() and there is not information about it in the Oracle documentation: https://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements004.htm#g195443 I've thought better about it now and fixed the patch. Now parser removes spaces after and before fields and insists that count of separators in the input string should match count of spaces or separators in the formatting string (but in formatting string we can have more separators than in input string). > > Some other nitpicking: > > * I think the is-separator function would be better coded like > > static bool > is_separator_char(const char *str) > { > /* ASCII printable character, but not letter or digit */ > return (*str > 0x20 && *str < 0x7F && > !(*str >= 'A' && *str <= 'Z') && > !(*str >= 'a' && *str <= 'z') && > !(*str >= '0' && *str <= '9')); > } > > The previous way is neither readable nor remarkably efficient, and it > knows much more about the ASCII character set than it needs to. Fixed. > > * Don't forget the cast to unsigned char when using isspace() or other > <ctype.h> functions. Fixed. > > * I do not see the reason for throwing an error here: > > + /* Previous character was a backslash */ > + if (in_backslash) > + { > + /* After backslash should go non-space character */ > + if (isspace(*str)) > + ereport(ERROR, > + > (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("invalid escape > sequence"))); > + in_backslash = false; > > Why shouldn't backslash-space be a valid quoting sequence? > Hm, truly. Fixed it. I've attached the fixed patch. -- Sincerely, Artur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 2e64cc4..5a4e248 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -6159,7 +6159,8 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); <function>to_timestamp</function> and <function>to_date</function> skip multiple blank spaces in the input string unless the <literal>FX</literal> option is used. For example, - <literal>to_timestamp('2000 JUN', 'YYYY MON')</literal> works, but + <literal>to_timestamp('2000 JUN', 'YYYY MON')</literal> and + <literal>to_timestamp('2000 JUN', 'YYYY MON')</literal> work, but <literal>to_timestamp('2000 JUN', 'FXYYYY MON')</literal> returns an error because <function>to_timestamp</function> expects one space only. <literal>FX</literal> must be specified as the first item in @@ -6169,6 +6170,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); <listitem> <para> + <function>to_timestamp</function> and <function>to_date</function> don't + skip multiple printable non letter and non digit characters in the input + string, but skip them in the formatting string. For example, + <literal>to_timestamp('2000-JUN', 'YYYY/MON')</literal> and + <literal>to_timestamp('2000/JUN', 'YYYY//MON')</literal> work, but + <literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal> + returns an error because count of the "/" character in the input string + doesn't match count of it in the formatting string. + </para> + </listitem> + + <listitem> + <para> Ordinary text is allowed in <function>to_char</function> templates and will be output literally. You can put a substring in double quotes to force it to be interpreted as literal text diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c index d4eaa50..d28ceec 100644 --- a/src/backend/utils/adt/formatting.c +++ b/src/backend/utils/adt/formatting.c @@ -169,6 +169,8 @@ struct FormatNode #define NODE_TYPE_END 1 #define NODE_TYPE_ACTION 2 #define NODE_TYPE_CHAR 3 +#define NODE_TYPE_SEPARATOR 4 +#define NODE_TYPE_SPACE 5 #define SUFFTYPE_PREFIX 1 #define SUFFTYPE_POSTFIX 2 @@ -951,6 +953,7 @@ typedef struct NUMProc static const KeyWord *index_seq_search(const char *str, const KeyWord *kw, const int *index); static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int type); +static bool is_separator_char(const char *str); static void NUMDesc_prepare(NUMDesc *num, FormatNode *n); static void parse_format(FormatNode *node, const char *str, const KeyWord *kw, const KeySuffix *suf, const int *index, int ver, NUMDesc *Num); @@ -967,7 +970,6 @@ static void dump_node(FormatNode *node, int max); static const char *get_th(char *num, int type); static char *str_numth(char *dest, char *num, int type); static int adjust_partial_year_to_2020(int year); -static int strspace_len(char *str); static void from_char_set_mode(TmFromChar *tmfc, const FromCharDateMode mode); static void from_char_set_int(int *dest, const int value, const FormatNode *node); static int from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node); @@ -1040,6 +1042,16 @@ suff_search(const char *str, const KeySuffix *suf, int type) return NULL; } +static bool +is_separator_char(const char *str) +{ + /* ASCII printable character, but not letter or digit */ + return (*str > 0x20 && *str < 0x7F && + !(*str >= 'A' && *str <= 'Z') && + !(*str >= 'a' && *str <= 'z') && + !(*str >= '0' && *str <= '9')); +} + /* ---------- * Prepare NUMDesc (number description struct) via FormatNode struct * ---------- @@ -1225,9 +1237,10 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw, { const KeySuffix *s; FormatNode *n; + bool in_text = false, + in_backslash = false; int node_set = 0, - suffix, - last = 0; + suffix; #ifdef DEBUG_TO_FROM_CHAR elog(DEBUG_elog_output, "to_char/number(): run parser"); @@ -1239,6 +1252,50 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw, { suffix = 0; + /* Previous character was a backslash */ + if (in_backslash) + { + in_backslash = false; + + n->type = NODE_TYPE_CHAR; + n->character = *str; + n->key = NULL; + n->suffix = 0; + n++; + str++; + continue; + } + /* Previous character was a quote */ + else if (in_text) + { + if (*str == '"') + { + str++; + in_text = false; + } + else if (*str == '\\') + { + str++; + in_backslash = true; + } + else + { + if (ver == DCH_TYPE && is_separator_char(str)) + n->type = NODE_TYPE_SEPARATOR; + else if (isspace((unsigned char) *str)) + n->type = NODE_TYPE_SPACE; + else + n->type = NODE_TYPE_CHAR; + + n->character = *str; + n->key = NULL; + n->suffix = 0; + n++; + str++; + } + continue; + } + /* * Prefix */ @@ -1278,48 +1335,30 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw, } else if (*str) { - /* - * Special characters '\' and '"' - */ - if (*str == '"' && last != '\\') + if (*str == '"') { - int x = 0; - - while (*(++str)) - { - if (*str == '"' && x != '\\') - { - str++; - break; - } - else if (*str == '\\' && x != '\\') - { - x = '\\'; - continue; - } - n->type = NODE_TYPE_CHAR; - n->character = *str; - n->key = NULL; - n->suffix = 0; - ++n; - x = *str; - } + in_text = true; node_set = 0; - suffix = 0; - last = 0; + str++; } - else if (*str && *str == '\\' && last != '\\' && *(str + 1) == '"') + else if (*str == '\\') { - last = *str; + in_backslash = true; + node_set = 0; str++; } - else if (*str) + else { - n->type = NODE_TYPE_CHAR; + if (ver == DCH_TYPE && is_separator_char(str)) + n->type = NODE_TYPE_SEPARATOR; + else if (isspace((unsigned char) *str)) + n->type = NODE_TYPE_SPACE; + else + n->type = NODE_TYPE_CHAR; + n->character = *str; n->key = NULL; node_set = 1; - last = 0; str++; } } @@ -1336,6 +1375,17 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw, } } + if (in_backslash) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("invalid escape sequence"))); + + /* If we didn't meet closing quotes */ + if (in_text) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("unexpected end of format string, expected '\"' character"))); + n->type = NODE_TYPE_END; n->suffix = 0; } @@ -2068,20 +2118,6 @@ adjust_partial_year_to_2020(int year) return year; } - -static int -strspace_len(char *str) -{ - int len = 0; - - while (*str && isspace((unsigned char) *str)) - { - str++; - len++; - } - return len; -} - /* * Set the date mode of a from-char conversion. * @@ -2151,11 +2187,6 @@ from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node) char *init = *src; int used; - /* - * Skip any whitespace before parsing the integer. - */ - *src += strspace_len(*src); - Assert(len <= DCH_MAX_ITEM_SIZ); used = (int) strlcpy(copy, *src, len + 1); @@ -2934,19 +2965,62 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out) FormatNode *n; char *s; int len, - value; + value, + prev_type = NODE_TYPE_SPACE; bool fx_mode = false; for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++) { - if (n->type != NODE_TYPE_ACTION) + /* Ignore all spaces after previous field */ + if (prev_type == NODE_TYPE_ACTION) + while (*s != '\0' && isspace((unsigned char) *s)) + s++; + + if (n->type == NODE_TYPE_SPACE || n->type == NODE_TYPE_SEPARATOR) + { + /* + * In non FX (fixed format) mode we skip spaces and separator + * characters. + */ + if (!fx_mode) + { + if (isspace((unsigned char) *s) || is_separator_char(s)) + s++; + + prev_type = n->type; + continue; + } + + /* Checks for FX mode */ + if (n->type == NODE_TYPE_SPACE && !isspace((unsigned char) *s)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_DATETIME_FORMAT), + errmsg("unexpected character \"%c\", expected \"%c\"", + *s, n->character), + errdetail("The given value did not match any of the allowed " + "values for this field."))); + else if (n->type == NODE_TYPE_SEPARATOR && n->character != *s) + ereport(ERROR, + (errcode(ERRCODE_INVALID_DATETIME_FORMAT), + errmsg("unexpected character \"%c\", expected \"%c\"", + *s, n->character), + errdetail("The given value did not match any of the allowed " + "values for this field."))); + + s++; + prev_type = n->type; + continue; + } + else if (n->type == NODE_TYPE_CHAR) { /* - * Separator, so consume one character from input string. Notice - * we don't insist that the consumed character match the format's - * character. + * Text character, so consume one character from input string. + * Notice we don't insist that the consumed character match the + * format's character. + * Text field ignores FX mode. */ s++; + prev_type = n->type; continue; } @@ -2955,7 +3029,25 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out) { while (*s != '\0' && isspace((unsigned char) *s)) s++; + + /* + * Lost separator character from the input string. Separator + * character should match at least one space or separator character + * from the format string. + */ + if (is_separator_char(s)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_DATETIME_FORMAT), + errmsg("unexpected character \"%c\"", *s), + errdetail("The given value did not match any of the allowed " + "values for this field."))); } + else if (isspace((unsigned char) *s) || is_separator_char(s)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_DATETIME_FORMAT), + errmsg("unexpected character \"%c\"", *s), + errdetail("The given value did not match any of the allowed " + "values for this field."))); from_char_set_mode(out, n->key->date_mode); @@ -3194,6 +3286,7 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out) SKIP_THth(s, n->suffix); break; } + prev_type = n->type; } } diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out index f9d12e0..96773fa 100644 --- a/src/test/regress/expected/horology.out +++ b/src/test/regress/expected/horology.out @@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS'); Sat Feb 16 08:14:30 0097 PST (1 row) +SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS'); + to_timestamp +------------------------------ + Fri Mar 18 23:38:15 2011 PDT +(1 row) + SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD'); to_timestamp ------------------------------ Sat Jan 12 00:00:00 1985 PST (1 row) +SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD'); + to_timestamp +------------------------------ + Sat Jan 12 00:00:00 1985 PST +(1 row) + +SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD'); + to_timestamp +------------------------------ + Sat Jan 12 00:00:00 1985 PST +(1 row) + SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16', - '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD'); + '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD'); to_timestamp ------------------------------ Sun May 16 00:00:00 1976 PDT @@ -2789,12 +2807,18 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD'); (1 row) SELECT to_timestamp('15 "text between quote marks" 98 54 45', - E'HH24 "\\text between quote marks\\"" YY MI SS'); + E'HH24 "\\"text between quote marks\\"" YY MI SS'); to_timestamp ------------------------------ Thu Jan 01 15:54:45 1998 PST (1 row) +SELECT to_timestamp('Year: 1976, Month: May, Day: 16', '" Year:" YYYY, "Month:" FMMonth, "Day:" DD'); + to_timestamp +------------------------------ + Sun May 16 00:00:00 1976 PDT +(1 row) + SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY'); to_timestamp ------------------------------ @@ -2808,8 +2832,23 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay'); (1 row) SELECT to_timestamp('97/Feb/16', 'YYMonDD'); -ERROR: invalid value "/Fe" for "Mon" +ERROR: unexpected character "/" DETAIL: The given value did not match any of the allowed values for this field. +SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD'); + to_timestamp +------------------------------ + Sun Feb 16 00:00:00 1997 PST +(1 row) + +SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD'); +ERROR: unexpected character "/", expected ":" +DETAIL: The given value did not match any of the allowed values for this field. +SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD'); + to_timestamp +------------------------------ + Sun Feb 16 00:00:00 1997 PST +(1 row) + SELECT to_timestamp('19971116', 'YYYYMMDD'); to_timestamp ------------------------------ @@ -2936,7 +2975,7 @@ SELECT to_timestamp('2011-12-18 11:38 PM', 'YYYY-MM-DD HH12:MI PM'); SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD HH24:MI:SS'); to_timestamp ------------------------------ - Sun Dec 18 03:38:15 2011 PST + Sun Dec 18 23:38:15 2011 PST (1 row) SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD HH24:MI:SS'); @@ -2966,9 +3005,30 @@ SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD HH24:MI:SS'); SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD HH24:MI:SS'); to_timestamp ------------------------------ - Sun Dec 18 03:38:15 2011 PST + Sun Dec 18 23:38:15 2011 PST (1 row) +SELECT to_timestamp('2000----JUN', 'YYYY MON'); +ERROR: unexpected character "-" +DETAIL: The given value did not match any of the allowed values for this field. +SELECT to_timestamp('2000 + JUN', 'YYYY MON'); + to_timestamp +------------------------------ + Thu Jun 01 00:00:00 2000 PDT +(1 row) + +SELECT to_timestamp('2000 + + JUN', 'YYYY MON'); +ERROR: unexpected character "+" +DETAIL: The given value did not match any of the allowed values for this field. +SELECT to_timestamp('2000 + + JUN', 'YYYY MON'); + to_timestamp +------------------------------ + Thu Jun 01 00:00:00 2000 PDT +(1 row) + +SELECT to_timestamp('2000--JUN', 'YYYY/MON'); +ERROR: unexpected character "-" +DETAIL: The given value did not match any of the allowed values for this field. SELECT to_date('2011 12 18', 'YYYY MM DD'); to_date ------------ @@ -2984,13 +3044,13 @@ SELECT to_date('2011 12 18', 'YYYY MM DD'); SELECT to_date('2011 12 18', 'YYYY MM DD'); to_date ------------ - 12-08-2011 + 12-18-2011 (1 row) SELECT to_date('2011 12 18', 'YYYY MM DD'); to_date ------------ - 02-18-2011 + 12-18-2011 (1 row) SELECT to_date('2011 12 18', 'YYYY MM DD'); diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql index a7bc9dc..2563980 100644 --- a/src/test/regress/sql/horology.sql +++ b/src/test/regress/sql/horology.sql @@ -392,15 +392,23 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS'); SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS'); +SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS'); + SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD'); +SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD'); + +SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD'); + SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16', - '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD'); + '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD'); SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD'); SELECT to_timestamp('15 "text between quote marks" 98 54 45', - E'HH24 "\\text between quote marks\\"" YY MI SS'); + E'HH24 "\\"text between quote marks\\"" YY MI SS'); + +SELECT to_timestamp('Year: 1976, Month: May, Day: 16', '" Year:" YYYY, "Month:" FMMonth, "Day:" DD'); SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY'); @@ -408,6 +416,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay'); SELECT to_timestamp('97/Feb/16', 'YYMonDD'); +SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD'); + +SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD'); + +SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD'); + SELECT to_timestamp('19971116', 'YYYYMMDD'); SELECT to_timestamp('20000-1116', 'YYYY-MMDD'); @@ -458,6 +472,12 @@ SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD HH24:MI:SS'); SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD HH24:MI:SS'); SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD HH24:MI:SS'); +SELECT to_timestamp('2000----JUN', 'YYYY MON'); +SELECT to_timestamp('2000 + JUN', 'YYYY MON'); +SELECT to_timestamp('2000 + + JUN', 'YYYY MON'); +SELECT to_timestamp('2000 + + JUN', 'YYYY MON'); +SELECT to_timestamp('2000--JUN', 'YYYY/MON'); + SELECT to_date('2011 12 18', 'YYYY MM DD'); SELECT to_date('2011 12 18', 'YYYY MM DD'); SELECT to_date('2011 12 18', 'YYYY MM DD');
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers