On Mon, Jun 20, 2016 at 8:19 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Mon, Jun 13, 2016 at 12:25 PM, Robert Haas <robertmh...@gmail.com>
> wrote:
> > On Mon, Jun 13, 2016 at 12:12 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> amul sul <sul_a...@yahoo.co.in> writes:
> >>> It's look like bug in to_timestamp() function when format string has
> more whitespaces compare to input string, see below:
> >>
> >> No, I think this is a case of "input doesn't match the format string".
> >>
> >> As a rule of thumb, using to_timestamp() for input that could be parsed
> >> just fine by the standard timestamp input function is not a particularly
> >> good idea.  to_timestamp() is meant to deal with input that is in a
> >> well-defined format that happens to not be parsable by timestamp_in.
> >> This example doesn't meet either of those preconditions.
> >
> > I think a space in the format string should skip a whitespace
> > character in the input string, but not a non-whitespace character.
> > It's my understanding that these functions exist in no small part for
> > compatibility with Oracle, and Oracle declines to skip the digit '1'
> > on the basis of an extra space in the format string, which IMHO is the
> > behavior any reasonable user would expect.
>
> So Amul and I are of one opinion and Tom is of another.  Anyone else
> have an opinion?
>
>
​At least Tom's position has the benefit of being consistent with current
behavior.  The current implementation doesn't actually care what literal
value you specify - any non-special character consumes a single token from
the input, period.

SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD--HH24:MI:SS');
SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD-HH24:MI:SS');

Both the above exhibit the same behavior as if you used a space instead of
the hyphen as the group separator.

The documentation should be updated to make this particular dynamic more
clear.

I don't see changing the general behavior of these "date formatting"
functions a worthwhile endeavor.​  Adding a less-liberal "parse_timestamp"
function I could get behind.

IOW, the user seems to be happy with the fact that the "/" in the date
matches his "-" but them complains that the space matches the number "1".
You don't get to have it both ways.

[re-reads the third usage note]

Or maybe you do.  We already define space as a being a greedy operator
(when not used in conjunction with FX).  A greedy space-space sequence
makes little sense on its face and if we are going to be helpful here we
should treat it as a single greedy space matcher.

Note that "returns an error because to_timestamp expects one space only" is
wrong - it errors because only a single space is captured and then the
attempt to parse '    JUN' using "MON" fails.  The following query doesn't
fail though it exhibits the same space discrepancy (it just gives the same
"wrong" result).

SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'FXYYYY/MM/DD  HH24:MI:SS');

Given that we already partially special-case the space expression I'd be
inclined to consider Robert's and Amul's position on the matter.  I think
I'd redefine our treatment of space to be "zero or more" instead of "one or
more" and require that it only match a literal space in the input.

Having considered that, I'm not convinced its worth a compatibility break.
I'd much rather deprecate these <to_*> versions and write
slightly-less-liberal versions named <parse_*>.

In any case I'd called the present wording a bug. Instead:

A single space consumes a single token of input and then, unless the FX
modifier is present, consumes zero or more subsequent literal spaces.
Thus, using two spaces in a row without the FX modifier, while allowed, is
unlikely to give you a satisfactory result.  The first space will consume
all available consecutive spaces so that the second space will be
guaranteed to consume a non-space token from the input.

David J.

Reply via email to