Re: [HACKERS] Bug in to_timestamp().

2018-09-26 Thread Liudmila Mantrova


On 09/22/2018 10:00 PM, Alexander Korotkov wrote:

On Thu, Sep 20, 2018 at 3:52 PM Alexander Korotkov
 wrote:

On Thu, Sep 20, 2018 at 6:09 AM amul sul  wrote:

Agreed, thanks for working on this.

Pushed, thanks.

Please, find attached patch improving documentation about
letters/digits in to_date()/to_timestamp() template string.  I think
it needs review from native English speaker.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Hi Alexander,

I'm not a native speaker, but let me try to help. A new doc version is 
attached.



--

Liudmila Mantrova
Technical writer at Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9a7f683..1532bcc 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6286,13 +6286,46 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
exceeds the number of separators in the template.
   
   
-   If FX is specified, separator in template string
-   matches to exactly one character in input string.  Notice we don't insist
-   input string character to be the same as template string separator.
+   If FX is specified, a separator in the template
+   string matches exactly one character in the input string.  The
+   input string character does not need to be the same as the template
+   string separator.
For example, to_timestamp('2000/JUN', 'FX MON')
works, but to_timestamp('2000/JUN', 'FXMON')
-   returns an error because a space second template string space consumed
-   letter J from the input string.
+   returns an error because the second space in the template string
+   consumes letter J of the input string.
+  
+ 
+
+ 
+  
+Template strings of to_timestamp and
+to_date functions can also contain arbitrary
+letters/digits between patterns.  Such letters/digits can match
+any characters in the input string.  For example,
+to_timestamp('2000yJUN', 'xMON') works.
+  
+  
+Letters/digits consume an input string character only if the number
+of extra characters at the beginning of the input string or between
+the identified date/time values is less than or equal to the number
+of the corresponding characters in the template string. This ensures
+that the template string does not consume any characters of date/time
+values when used without the FX option, even if
+a letter/digit separator in the input string appears after a space.
+For example, to_timestamp('2000y JUN', 'xMON')
+works, but to_timestamp('2000 yJUN', 'xMON')
+returns an error.
+  
+  
+Note that if the template string contains an arbitrary letter,
+the pattern that precedes this letter becomes greedy and tries
+to match as many characters as possible. For example,
+to_timestamp('2000906901', 'xMMxDD')
+fails because the  pattern matches
+the whole input string instead of the first four characters.
+Patterns separated by digits are non-greedy, so
+to_timestamp('2000906901', '0MM0DD') works fine.
   
  
 


Re: [HACKERS] Bug in to_timestamp().

2018-09-22 Thread Alexander Korotkov
On Thu, Sep 20, 2018 at 3:52 PM Alexander Korotkov
 wrote:
>
> On Thu, Sep 20, 2018 at 6:09 AM amul sul  wrote:
> > Agreed, thanks for working on this.
>
> Pushed, thanks.

Please, find attached patch improving documentation about
letters/digits in to_date()/to_timestamp() template string.  I think
it needs review from native English speaker.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


to_timestamp_letters_digits.patch
Description: Binary data


Re: [HACKERS] Bug in to_timestamp().

2018-09-20 Thread Alexander Korotkov
On Thu, Sep 20, 2018 at 6:09 AM amul sul  wrote:
> On Thu, Sep 20, 2018, 3:22 AM Alexander Korotkov  
> wrote:
>> It's hard to understand whether it was expected, because it wasn't
>> properly documented.  More important that it's the same behavior we
>> have before cf984672, and purpose of cf984672 was not to change this.
>>
>> But from the code comments, it's intentional. If you put digits or
>> text characters into format string, you can skip non-separator input
>> string characters.  For instance you may do.
>>
>> # SELECT to_date('y18y12y2011', 'xDDxMMx');
>>   to_date
>> 
>>  2011-12-18
>> (1 row)
>>
>> It's even more interesting that letters and digits are handled in
>> different manner.
>>
>> # SELECT to_date('01801202011', 'xDDxMMx');
>> ERROR:  date/time field value out of range: "01801202011"
>> Time: 0,453 ms
>>
>> # SELECT to_date('01801202011', '9DD9MM9');
>>   to_date
>> 
>>  2011-12-18
>> (1 row)
>>
>> So, letters in format string doesn't allow you to extract fields at
>> particular positions of digit sequence, but digits in format string
>> allows you to.  That's rather undocumented, but from the code you can
>> get that it's intentional.  Thus, I think it would be nice to improve
>> the documentation here.  But I still propose to commit the patch I
>> propose to bring back unintentional behavior change in cf984672.
>
> Agreed, thanks for working on this.

Pushed, thanks.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-09-19 Thread amul sul
On Thu, Sep 20, 2018, 3:22 AM Alexander Korotkov 
wrote:

> On Wed, Sep 19, 2018 at 1:38 PM amul sul  wrote:
> > On Wed, Sep 19, 2018 at 3:51 PM amul sul  wrote:
> > >
> > > On Wed, Sep 19, 2018 at 2:57 PM Alexander Korotkov
> > [...]
> > >
> > > With this patch, to_date and to_timestamp behaving differently, see
> this:
> > >
> > > edb=# SELECT to_date('18 12 2011', 'xDDxMMx');
> > >   to_date
> > > 
> > >  18-DEC-11 00:00:00
> > > (1 row)
> > >
> > > edb=# SELECT to_timestamp('18 12 2011', 'xDDxMMx');
> > >to_timestamp
> > > ---
> > >  08-DEC-11 00:00:00 -05:00  <=== Incorrect output.
> > > (1 row)
> > >
> > Sorry, this was wrong info -- with this patch, I had some mine trial
> changes.
> >
> > Both to_date and to_timestamp behaving same with your patch -- the
> > wrong output, we are expecting that?
> >
> > postgres =# SELECT to_date('18 12 2011', 'xDDxMMx');
> >   to_date
> > 
> >  2011-12-08
> > (1 row)
> >ma
> > postgres =# SELECT to_timestamp('18 12 2011', 'xDDxMMx');
> >   to_timestamp
> > 
> >  2011-12-08 00:00:00-05
> > (1 row)
>
> It's hard to understand whether it was expected, because it wasn't
> properly documented.  More important that it's the same behavior we
> have before cf984672, and purpose of cf984672 was not to change this.
>
> But from the code comments, it's intentional. If you put digits or
> text characters into format string, you can skip non-separator input
> string characters.  For instance you may do.
>
> # SELECT to_date('y18y12y2011', 'xDDxMMx');
>   to_date
> 
>  2011-12-18
> (1 row)
>
> It's even more interesting that letters and digits are handled in
> different manner.
>
> # SELECT to_date('01801202011', 'xDDxMMx');
> ERROR:  date/time field value out of range: "01801202011"
> Time: 0,453 ms
>
> # SELECT to_date('01801202011', '9DD9MM9');
>   to_date
> 
>  2011-12-18
> (1 row)
>
> So, letters in format string doesn't allow you to extract fields at
> particular positions of digit sequence, but digits in format string
> allows you to.  That's rather undocumented, but from the code you can
> get that it's intentional.  Thus, I think it would be nice to improve
> the documentation here.  But I still propose to commit the patch I
> propose to bring back unintentional behavior change in cf984672.
>

Agreed, thanks for working on this.

Regards,
Amul

>


Re: [HACKERS] Bug in to_timestamp().

2018-09-19 Thread Alexander Korotkov
On Wed, Sep 19, 2018 at 1:38 PM amul sul  wrote:
> On Wed, Sep 19, 2018 at 3:51 PM amul sul  wrote:
> >
> > On Wed, Sep 19, 2018 at 2:57 PM Alexander Korotkov
> [...]
> >
> > With this patch, to_date and to_timestamp behaving differently, see this:
> >
> > edb=# SELECT to_date('18 12 2011', 'xDDxMMx');
> >   to_date
> > 
> >  18-DEC-11 00:00:00
> > (1 row)
> >
> > edb=# SELECT to_timestamp('18 12 2011', 'xDDxMMx');
> >to_timestamp
> > ---
> >  08-DEC-11 00:00:00 -05:00  <=== Incorrect output.
> > (1 row)
> >
> Sorry, this was wrong info -- with this patch, I had some mine trial changes.
>
> Both to_date and to_timestamp behaving same with your patch -- the
> wrong output, we are expecting that?
>
> postgres =# SELECT to_date('18 12 2011', 'xDDxMMx');
>   to_date
> 
>  2011-12-08
> (1 row)
>ma
> postgres =# SELECT to_timestamp('18 12 2011', 'xDDxMMx');
>   to_timestamp
> 
>  2011-12-08 00:00:00-05
> (1 row)

It's hard to understand whether it was expected, because it wasn't
properly documented.  More important that it's the same behavior we
have before cf984672, and purpose of cf984672 was not to change this.

But from the code comments, it's intentional. If you put digits or
text characters into format string, you can skip non-separator input
string characters.  For instance you may do.

# SELECT to_date('y18y12y2011', 'xDDxMMx');
  to_date

 2011-12-18
(1 row)

It's even more interesting that letters and digits are handled in
different manner.

# SELECT to_date('01801202011', 'xDDxMMx');
ERROR:  date/time field value out of range: "01801202011"
Time: 0,453 ms

# SELECT to_date('01801202011', '9DD9MM9');
  to_date

 2011-12-18
(1 row)

So, letters in format string doesn't allow you to extract fields at
particular positions of digit sequence, but digits in format string
allows you to.  That's rather undocumented, but from the code you can
get that it's intentional.  Thus, I think it would be nice to improve
the documentation here.  But I still propose to commit the patch I
propose to bring back unintentional behavior change in cf984672.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-09-19 Thread amul sul
On Wed, Sep 19, 2018 at 3:51 PM amul sul  wrote:
>
> On Wed, Sep 19, 2018 at 2:57 PM Alexander Korotkov
[...]
>
> With this patch, to_date and to_timestamp behaving differently, see this:
>
> edb=# SELECT to_date('18 12 2011', 'xDDxMMx');
>   to_date
> 
>  18-DEC-11 00:00:00
> (1 row)
>
> edb=# SELECT to_timestamp('18 12 2011', 'xDDxMMx');
>to_timestamp
> ---
>  08-DEC-11 00:00:00 -05:00  <=== Incorrect output.
> (1 row)
>
Sorry, this was wrong info -- with this patch, I had some mine trial changes.

Both to_date and to_timestamp behaving same with your patch -- the
wrong output, we are expecting that?

postgres =# SELECT to_date('18 12 2011', 'xDDxMMx');
  to_date

 2011-12-08
(1 row)

postgres =# SELECT to_timestamp('18 12 2011', 'xDDxMMx');
  to_timestamp

 2011-12-08 00:00:00-05
(1 row)

Regards,
Amul



Re: [HACKERS] Bug in to_timestamp().

2018-09-19 Thread amul sul
On Wed, Sep 19, 2018 at 2:57 PM Alexander Korotkov
 wrote:
>
> On Tue, Sep 18, 2018 at 4:42 PM Alexander Korotkov
>  wrote:
> > But, I found related issue in cf9846724.  Before it was:
> >
> > # select to_timestamp('2018 01 01', '9MM9DD');
> >   to_timestamp
> > 
> >  2018-01-01 00:00:00+03
> > (1 row)
> >
> > But after it becomes so.
> >
> > # select to_timestamp('2018 01 01', '9MM9DD');
> > ERROR:  invalid value "1 " for "MM"
> > DETAIL:  Field requires 2 characters, but only 1 could be parsed.
> > HINT:  If your source string is not fixed-width, try using the "FM" 
> > modifier.
> >
> > That happens because we've already skipped space "for free", and then
> > NODE_TYPE_CHAR eats digit.  I've checked that Oracle doesn't allow
> > random charaters/digits to appear in format string.
> >
> > select to_timestamp('2018 01 01', '9MM9DD') from dual
> > ORA-01821: date format not recognized
> >
> > So, Oracle compatibility isn't argument here. Therefore I'm going to
> > propose following fix for that: let NODE_TYPE_CHAR eat characters only
> > if we didn't skip input string characters more than it was in format
> > string.  I'm sorry for vague explanation.  I'll come up with patch
> > later, and it should be clear then.
>
> Please find attached patch for fixing this issue.  It makes handling
> of format string text characters be similar to pre cf984672 behavior.
> See the examples in regression tests and explanation in the commit
> message.  I'm going to commit this if no objections.
>

With this patch, to_date and to_timestamp behaving differently, see this:

edb=# SELECT to_date('18 12 2011', 'xDDxMMx');
  to_date

 18-DEC-11 00:00:00
(1 row)

edb=# SELECT to_timestamp('18 12 2011', 'xDDxMMx');
   to_timestamp
---
 08-DEC-11 00:00:00 -05:00  <=== Incorrect output.
(1 row)

Regards,
Amul



Re: [HACKERS] Bug in to_timestamp().

2018-09-19 Thread Alexander Korotkov
On Tue, Sep 18, 2018 at 4:42 PM Alexander Korotkov
 wrote:
> But, I found related issue in cf9846724.  Before it was:
>
> # select to_timestamp('2018 01 01', '9MM9DD');
>   to_timestamp
> 
>  2018-01-01 00:00:00+03
> (1 row)
>
> But after it becomes so.
>
> # select to_timestamp('2018 01 01', '9MM9DD');
> ERROR:  invalid value "1 " for "MM"
> DETAIL:  Field requires 2 characters, but only 1 could be parsed.
> HINT:  If your source string is not fixed-width, try using the "FM" modifier.
>
> That happens because we've already skipped space "for free", and then
> NODE_TYPE_CHAR eats digit.  I've checked that Oracle doesn't allow
> random charaters/digits to appear in format string.
>
> select to_timestamp('2018 01 01', '9MM9DD') from dual
> ORA-01821: date format not recognized
>
> So, Oracle compatibility isn't argument here. Therefore I'm going to
> propose following fix for that: let NODE_TYPE_CHAR eat characters only
> if we didn't skip input string characters more than it was in format
> string.  I'm sorry for vague explanation.  I'll come up with patch
> later, and it should be clear then.

Please find attached patch for fixing this issue.  It makes handling
of format string text characters be similar to pre cf984672 behavior.
See the examples in regression tests and explanation in the commit
message.  I'm going to commit this if no objections.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


to_timestamp_fix.patch
Description: Binary data


Re: [HACKERS] Bug in to_timestamp().

2018-09-18 Thread Alexander Korotkov
On Tue, Sep 18, 2018 at 2:08 PM Prabhat Sahu
 wrote:
>
> Few more findings on to_timestamp() test with HEAD.
>
> postgres[3493]=# select to_timestamp('15-07-1984 23:30:32',' dd- mm-    
> hh24: mi: ss');
>to_timestamp
> ---
>  1984-07-15 23:30:32+05:30
> (1 row)
>
> postgres[3493]=# select to_timestamp('15-07-1984 23:30:32','9dd-9mm-99 
> 9hh24:9mi:9ss');
>  to_timestamp
> --
>  0084-07-05 03:00:02+05:53:28
> (1 row)
>
> If there are spaces before any formate then output is fine(1st output) but 
> instead of spaces if we have digit then we are getting wrong output.

This behavior might look strange, but it wasn't introduced by
cf9846724.  to_timestamp() behaves so, because it takes digit have
NODE_TYPE_CHAR type.  And for NODE_TYPE_CHAR we just "eat" single
character of input string regardless what is it.

But, I found related issue in cf9846724.  Before it was:

# select to_timestamp('2018 01 01', '9MM9DD');
  to_timestamp

 2018-01-01 00:00:00+03
(1 row)

But after it becomes so.

# select to_timestamp('2018 01 01', '9MM9DD');
ERROR:  invalid value "1 " for "MM"
DETAIL:  Field requires 2 characters, but only 1 could be parsed.
HINT:  If your source string is not fixed-width, try using the "FM" modifier.

That happens because we've already skipped space "for free", and then
NODE_TYPE_CHAR eats digit.  I've checked that Oracle doesn't allow
random charaters/digits to appear in format string.

select to_timestamp('2018 01 01', '9MM9DD') from dual
ORA-01821: date format not recognized

So, Oracle compatibility isn't argument here. Therefore I'm going to
propose following fix for that: let NODE_TYPE_CHAR eat characters only
if we didn't skip input string characters more than it was in format
string.  I'm sorry for vague explanation.  I'll come up with patch
later, and it should be clear then.


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-09-18 Thread Prabhat Sahu
Hi All,

Few more findings on to_timestamp() test with HEAD.

postgres[3493]=# select to_timestamp('15-07-1984 23:30:32',' dd- mm-  
hh24: mi: ss');
   to_timestamp
---
 1984-07-15 23:30:32+05:30
(1 row)

postgres[3493]=# select to_timestamp('15-07-*1984* 23:30:32','*9*dd-*9*mm-
*99* *9*hh24:*9*mi:*9*ss');
 to_timestamp
--
 *0084*-07-05 03:00:02+05:53:28
(1 row)

If there are spaces before any formate then output is fine(1st output) but
instead of spaces if we have *digit* then we are getting wrong output.


On Mon, Sep 10, 2018 at 12:23 AM Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Thu, Sep 6, 2018 at 3:58 PM Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> On Thu, Sep 6, 2018 at 2:40 PM Alexander Korotkov
>>  wrote:
>> >
>> > On Wed, Sep 5, 2018 at 7:28 PM amul sul  wrote:
>> > > On Wed, Sep 5, 2018, 6:35 PM Alexander Korotkov <
>> a.korot...@postgrespro.ru> wrote:
>> > >> On Wed, Sep 5, 2018 at 3:10 PM amul sul  wrote:
>> > >> > On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
>> > >> >  wrote:
>> > >> > > On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
>> > >> > >  wrote:
>> > >> > > > From those results the question is how important is it to
>> force the following breakage on our users (i.e., introduce FX exact symbol
>> matching):
>> > >> > > >
>> > >> > > > SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
>> > >> > > > - to_timestamp
>> > >> > > > ---
>> > >> > > > - Sun Feb 16 00:00:00 1997 PST
>> > >> > > > -(1 row)
>> > >> > > > -
>> > >> > > > +ERROR:  unexpected character "/", expected character ":"
>> > >> > > > +HINT:  In FX mode, punctuation in the input string must
>> exactly match the format string.
>> > >> > > >
>> > >> > > > There seemed to be some implicit approvals of this breakage
>> some 30 emails and 10 months ago but given that this is the only change
>> from a correct result to a failure I'd like to officially put it out there
>> for opinion/vote gathering.   Mine is a -1; though keeping the distinction
>> between space and non-alphanumeric characters is expected.
>> > >> > >
>> > >> > > Do I understand correctly that you're -1 to changes to FX mode,
>> but no
>> > >> > > objection to changes in non-FX mode?
>> > >> > >
>> > >> > Ditto.
>> > >>
>> > >> So, if no objections for non-FX mode changes, then I'll extract that
>> > >> part and commit it separately.
>> > >
>> > >
>> > > Yeah, that make sense to me, thank you.
>> >
>> > OK!  I've removed FX changes from the patch.  The result is attached.
>> > I'm going to commit this if no objections.
>>
>> Attached revision fixes usage of two subsequent spaces in the
>> documentation.
>>
>
> So, pushed!  Thanks to every thread participant for review and feedback.
>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


-- 


With Regards,

Prabhat Kumar Sahu
Skype ID: prabhat.sahu1984
EnterpriseDB Corporation

The Postgres Database Company


Re: [HACKERS] Bug in to_timestamp().

2018-09-13 Thread Tom Lane
Alexander Korotkov  writes:
> I've closed commitfest entry.  I think we can add new commitfest entry if
> required.  Regarding FX part, it easy to extract it as separate patch, but
> it's hard to find consensus.  I think there are at least three possible
> decisions.

> 1) Change FX mode to require separators to be the same.
> 2) Leave FX mode "as is".
> 3) Introduce GUC variable controlling behavior of FX mode.

> Any thoughts?

A GUC variable is a horrid solution.  Years ago we thought it'd be OK
to have GUCs that change query behavior, but we've regretted it every
time we did that, and often removed them again later (e.g. regex_flavor,
sql_inheritance).  Applications that want to be portable have to contend
with all possible values of the GUC, and that's no fun for anybody.

Given the lack of consensus, it's hard to make a case for breaking
backwards compatibility, so I'd have to vote for option 2.

regards, tom lane



Re: [HACKERS] Bug in to_timestamp().

2018-09-13 Thread Alexander Korotkov
On Tue, Sep 11, 2018 at 6:18 PM Arthur Zakirov 
wrote:

> On Sun, Sep 09, 2018 at 09:52:46PM +0300, Alexander Korotkov wrote:
> > So, pushed!  Thanks to every thread participant for review and feedback.
>
> Great! Should we close the commitfest entry? There is FX part of the
> patch though. But it seems that nobody is happy with it. It could be
> done with a separate patch anyway.
>

I've closed commitfest entry.  I think we can add new commitfest entry if
required.  Regarding FX part, it easy to extract it as separate patch, but
it's hard to find consensus.  I think there are at least three possible
decisions.

1) Change FX mode to require separators to be the same.
2) Leave FX mode "as is".
3) Introduce GUC variable controlling behavior of FX mode.

Any thoughts?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Bug in to_timestamp().

2018-09-11 Thread Arthur Zakirov
On Sun, Sep 09, 2018 at 09:52:46PM +0300, Alexander Korotkov wrote:
> So, pushed!  Thanks to every thread participant for review and feedback.

Great! Should we close the commitfest entry? There is FX part of the
patch though. But it seems that nobody is happy with it. It could be
done with a separate patch anyway.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-09-09 Thread Alexander Korotkov
On Thu, Sep 6, 2018 at 3:58 PM Alexander Korotkov 
wrote:

> On Thu, Sep 6, 2018 at 2:40 PM Alexander Korotkov
>  wrote:
> >
> > On Wed, Sep 5, 2018 at 7:28 PM amul sul  wrote:
> > > On Wed, Sep 5, 2018, 6:35 PM Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
> > >> On Wed, Sep 5, 2018 at 3:10 PM amul sul  wrote:
> > >> > On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
> > >> >  wrote:
> > >> > > On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
> > >> > >  wrote:
> > >> > > > From those results the question is how important is it to force
> the following breakage on our users (i.e., introduce FX exact symbol
> matching):
> > >> > > >
> > >> > > > SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
> > >> > > > - to_timestamp
> > >> > > > ---
> > >> > > > - Sun Feb 16 00:00:00 1997 PST
> > >> > > > -(1 row)
> > >> > > > -
> > >> > > > +ERROR:  unexpected character "/", expected character ":"
> > >> > > > +HINT:  In FX mode, punctuation in the input string must
> exactly match the format string.
> > >> > > >
> > >> > > > There seemed to be some implicit approvals of this breakage
> some 30 emails and 10 months ago but given that this is the only change
> from a correct result to a failure I'd like to officially put it out there
> for opinion/vote gathering.   Mine is a -1; though keeping the distinction
> between space and non-alphanumeric characters is expected.
> > >> > >
> > >> > > Do I understand correctly that you're -1 to changes to FX mode,
> but no
> > >> > > objection to changes in non-FX mode?
> > >> > >
> > >> > Ditto.
> > >>
> > >> So, if no objections for non-FX mode changes, then I'll extract that
> > >> part and commit it separately.
> > >
> > >
> > > Yeah, that make sense to me, thank you.
> >
> > OK!  I've removed FX changes from the patch.  The result is attached.
> > I'm going to commit this if no objections.
>
> Attached revision fixes usage of two subsequent spaces in the
> documentation.
>

So, pushed!  Thanks to every thread participant for review and feedback.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Bug in to_timestamp().

2018-09-06 Thread Alexander Korotkov
On Thu, Sep 6, 2018 at 2:40 PM Alexander Korotkov
 wrote:
>
> On Wed, Sep 5, 2018 at 7:28 PM amul sul  wrote:
> > On Wed, Sep 5, 2018, 6:35 PM Alexander Korotkov  
> > wrote:
> >> On Wed, Sep 5, 2018 at 3:10 PM amul sul  wrote:
> >> > On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
> >> >  wrote:
> >> > > On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
> >> > >  wrote:
> >> > > > From those results the question is how important is it to force the 
> >> > > > following breakage on our users (i.e., introduce FX exact symbol 
> >> > > > matching):
> >> > > >
> >> > > > SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
> >> > > > - to_timestamp
> >> > > > ---
> >> > > > - Sun Feb 16 00:00:00 1997 PST
> >> > > > -(1 row)
> >> > > > -
> >> > > > +ERROR:  unexpected character "/", expected character ":"
> >> > > > +HINT:  In FX mode, punctuation in the input string must exactly 
> >> > > > match the format string.
> >> > > >
> >> > > > There seemed to be some implicit approvals of this breakage some 30 
> >> > > > emails and 10 months ago but given that this is the only change from 
> >> > > > a correct result to a failure I'd like to officially put it out 
> >> > > > there for opinion/vote gathering.   Mine is a -1; though keeping the 
> >> > > > distinction between space and non-alphanumeric characters is 
> >> > > > expected.
> >> > >
> >> > > Do I understand correctly that you're -1 to changes to FX mode, but no
> >> > > objection to changes in non-FX mode?
> >> > >
> >> > Ditto.
> >>
> >> So, if no objections for non-FX mode changes, then I'll extract that
> >> part and commit it separately.
> >
> >
> > Yeah, that make sense to me, thank you.
>
> OK!  I've removed FX changes from the patch.  The result is attached.
> I'm going to commit this if no objections.

Attached revision fixes usage of two subsequent spaces in the documentation.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-to-timestamp-format-checking-v19.patch
Description: Binary data


Re: [HACKERS] Bug in to_timestamp().

2018-09-06 Thread Alexander Korotkov
On Wed, Sep 5, 2018 at 7:28 PM amul sul  wrote:
> On Wed, Sep 5, 2018, 6:35 PM Alexander Korotkov  
> wrote:
>> On Wed, Sep 5, 2018 at 3:10 PM amul sul  wrote:
>> > On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
>> >  wrote:
>> > > On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
>> > >  wrote:
>> > > > From those results the question is how important is it to force the 
>> > > > following breakage on our users (i.e., introduce FX exact symbol 
>> > > > matching):
>> > > >
>> > > > SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
>> > > > - to_timestamp
>> > > > ---
>> > > > - Sun Feb 16 00:00:00 1997 PST
>> > > > -(1 row)
>> > > > -
>> > > > +ERROR:  unexpected character "/", expected character ":"
>> > > > +HINT:  In FX mode, punctuation in the input string must exactly match 
>> > > > the format string.
>> > > >
>> > > > There seemed to be some implicit approvals of this breakage some 30 
>> > > > emails and 10 months ago but given that this is the only change from a 
>> > > > correct result to a failure I'd like to officially put it out there 
>> > > > for opinion/vote gathering.   Mine is a -1; though keeping the 
>> > > > distinction between space and non-alphanumeric characters is expected.
>> > >
>> > > Do I understand correctly that you're -1 to changes to FX mode, but no
>> > > objection to changes in non-FX mode?
>> > >
>> > Ditto.
>>
>> So, if no objections for non-FX mode changes, then I'll extract that
>> part and commit it separately.
>
>
> Yeah, that make sense to me, thank you.

OK!  I've removed FX changes from the patch.  The result is attached.
I'm going to commit this if no objections.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-to-timestamp-format-checking-v18.patch
Description: Binary data


Re: [HACKERS] Bug in to_timestamp().

2018-09-05 Thread amul sul
On Wed, Sep 5, 2018, 6:35 PM Alexander Korotkov 
wrote:

> On Wed, Sep 5, 2018 at 3:10 PM amul sul  wrote:
> > On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
> >  wrote:
> > > On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
> > >  wrote:
> > > > From those results the question is how important is it to force the
> following breakage on our users (i.e., introduce FX exact symbol matching):
> > > >
> > > > SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
> > > > - to_timestamp
> > > > ---
> > > > - Sun Feb 16 00:00:00 1997 PST
> > > > -(1 row)
> > > > -
> > > > +ERROR:  unexpected character "/", expected character ":"
> > > > +HINT:  In FX mode, punctuation in the input string must exactly
> match the format string.
> > > >
> > > > There seemed to be some implicit approvals of this breakage some 30
> emails and 10 months ago but given that this is the only change from a
> correct result to a failure I'd like to officially put it out there for
> opinion/vote gathering.   Mine is a -1; though keeping the distinction
> between space and non-alphanumeric characters is expected.
> > >
> > > Do I understand correctly that you're -1 to changes to FX mode, but no
> > > objection to changes in non-FX mode?
> > >
> > Ditto.
>
> So, if no objections for non-FX mode changes, then I'll extract that
> part and commit it separately.
>

Yeah, that make sense to me, thank you.

Regards,
Amul

>


Re: [HACKERS] Bug in to_timestamp().

2018-09-05 Thread amul sul
On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
 wrote:
>
> On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
>  wrote:
> > On Mon, Sep 3, 2018 at 2:30 PM, Alexander Korotkov 
> >  wrote:
> >>
> >> The current version of patch doesn't really distinguish spaces and
> >> delimiters in format string in non-FX mode.  So, spaces and delimiters
> >> are forming single group.  For me Oracle behavior is ridiculous at
> >> least because it doesn't allow cases when input string exactly matches
> >> format string.
> >>
> >> This one fails:
> >> SELECT to_timestamp('2018- -01 02', '- -MM DD') FROM dual
> >
> > Related to below - since this works today it should continue to work.  I 
> > was under the wrong impression that we followed their behavior today and 
> > were pondering deviating from it because of its ridiculousness.
>
> Right, that's just incompatibility with Oracle behavior, not with our
> previous behavior.
>
> >> > The couple of regression tests that change do so for the better.  It 
> >> > would be illuminating to set this up as two patches though, one 
> >> > introducing all of the new regression tests against the current code and 
> >> > then a second patch with the changed behavior with only the affected 
> >> > tests.
> >>
> >> OK, here you go.  0001-to_timestamp-regression-test-v17.patch
> >> introduces changes in regression tests and their output for current
> >> master, while 0002-to_timestamp-format-checking-v17.patch contain
> >> changes to to_timestamp itself.
> >>
> >
> > From those results the question is how important is it to force the 
> > following breakage on our users (i.e., introduce FX exact symbol matching):
> >
> > SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
> > - to_timestamp
> > ---
> > - Sun Feb 16 00:00:00 1997 PST
> > -(1 row)
> > -
> > +ERROR:  unexpected character "/", expected character ":"
> > +HINT:  In FX mode, punctuation in the input string must exactly match the 
> > format string.
> >
> > There seemed to be some implicit approvals of this breakage some 30 emails 
> > and 10 months ago but given that this is the only change from a correct 
> > result to a failure I'd like to officially put it out there for 
> > opinion/vote gathering.   Mine is a -1; though keeping the distinction 
> > between space and non-alphanumeric characters is expected.
>
> Do I understand correctly that you're -1 to changes to FX mode, but no
> objection to changes in non-FX mode?
>
Ditto.

Regards,
Amul Sul.



Re: [HACKERS] Bug in to_timestamp().

2018-09-05 Thread Alexander Korotkov
On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
 wrote:
> On Mon, Sep 3, 2018 at 2:30 PM, Alexander Korotkov 
>  wrote:
>>
>> The current version of patch doesn't really distinguish spaces and
>> delimiters in format string in non-FX mode.  So, spaces and delimiters
>> are forming single group.  For me Oracle behavior is ridiculous at
>> least because it doesn't allow cases when input string exactly matches
>> format string.
>>
>> This one fails:
>> SELECT to_timestamp('2018- -01 02', '- -MM DD') FROM dual
>
> Related to below - since this works today it should continue to work.  I was 
> under the wrong impression that we followed their behavior today and were 
> pondering deviating from it because of its ridiculousness.

Right, that's just incompatibility with Oracle behavior, not with our
previous behavior.

>> > The couple of regression tests that change do so for the better.  It would 
>> > be illuminating to set this up as two patches though, one introducing all 
>> > of the new regression tests against the current code and then a second 
>> > patch with the changed behavior with only the affected tests.
>>
>> OK, here you go.  0001-to_timestamp-regression-test-v17.patch
>> introduces changes in regression tests and their output for current
>> master, while 0002-to_timestamp-format-checking-v17.patch contain
>> changes to to_timestamp itself.
>>
>
> From those results the question is how important is it to force the following 
> breakage on our users (i.e., introduce FX exact symbol matching):
>
> SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
> - to_timestamp
> ---
> - Sun Feb 16 00:00:00 1997 PST
> -(1 row)
> -
> +ERROR:  unexpected character "/", expected character ":"
> +HINT:  In FX mode, punctuation in the input string must exactly match the 
> format string.
>
> There seemed to be some implicit approvals of this breakage some 30 emails 
> and 10 months ago but given that this is the only change from a correct 
> result to a failure I'd like to officially put it out there for opinion/vote 
> gathering.   Mine is a -1; though keeping the distinction between space and 
> non-alphanumeric characters is expected.

Do I understand correctly that you're -1 to changes to FX mode, but no
objection to changes in non-FX mode?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-09-04 Thread David G. Johnston
On Mon, Sep 3, 2018 at 2:30 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> The current version of patch doesn't really distinguish spaces and
> delimiters in format string in non-FX mode.  So, spaces and delimiters
> are forming single group.  For me Oracle behavior is ridiculous at
> least because it doesn't allow cases when input string exactly matches
> format string.
>
> This one fails:
> SELECT to_timestamp('2018- -01 02', '- -MM DD') FROM dual
>

Related to below - since this works today it should continue to work.  I
was under the wrong impression that we followed their behavior today and
were pondering deviating from it because of its ridiculousness.


> > The couple of regression tests that change do so for the better.  It
> would be illuminating to set this up as two patches though, one introducing
> all of the new regression tests against the current code and then a second
> patch with the changed behavior with only the affected tests.
>
> OK, here you go.  0001-to_timestamp-regression-test-v17.patch
> introduces changes in regression tests and their output for current
> master, while 0002-to_timestamp-format-checking-v17.patch contain
> changes to to_timestamp itself.
>
>
>From those results the question is how important is it to force the
following breakage on our users (i.e., introduce FX exact symbol matching):

SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
- to_timestamp
---
- Sun Feb 16 00:00:00 1997 PST
-(1 row)
-
+ERROR:  unexpected character "/", expected character ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the
format string.

There seemed to be some implicit approvals of this breakage some 30 emails
and 10 months ago but given that this is the only change from a correct
result to a failure I'd like to officially put it out there for
opinion/vote gathering.   Mine is a -1; though keeping the distinction
between space and non-alphanumeric characters is expected.

David J.


Re: [HACKERS] Bug in to_timestamp().

2018-09-03 Thread Alexander Korotkov
Hi!

Sorry for very long reply.

On Thu, Aug 16, 2018 at 11:44 PM David G. Johnston
 wrote:
> If the new behavior is an error I don't really have a problem since the need 
> to fix one's queries will be obvious.
>
> "So length of last group of spaces/separators in the pattern should be
> greater or equal to length of spaces/separators in the input string.
> Other previous groups are ignored in Oracle.  And that seems
> ridiculous for me."
>
> What do you believe should (or does) happen?  Multiple groups always fail or 
> something else?  How does this interplay with the detection of the negative 
> timezone offset?  I'm not finding the behavior ridiculous at first blush; not 
> to the extent to avoid emulating it in a function whose purpose is emulation. 
>  Being more lenient than Oracle seems undesirable.  Regardless of the choice 
> made here it should be memorialized in the regression tests.

The current version of patch doesn't really distinguish spaces and
delimiters in format string in non-FX mode.  So, spaces and delimiters
are forming single group.  For me Oracle behavior is ridiculous at
least because it doesn't allow cases when input string exactly matches
format string.

This one fails:
SELECT to_timestamp('2018- -01 02', '- -MM DD') FROM dual

But both this two are working:
SELECT to_timestamp('2018- -01 02', '---MM DD') FROM dual
SELECT to_timestamp('2018- -01 02', '   MM DD') FROM dual

Regarding TZH, Oracle takes into account total number of characters
between placeholders as we do.  So, there is no change in this aspect.

> The couple of regression tests that change do so for the better.  It would be 
> illuminating to set this up as two patches though, one introducing all of the 
> new regression tests against the current code and then a second patch with 
> the changed behavior with only the affected tests.

OK, here you go.  0001-to_timestamp-regression-test-v17.patch
introduces changes in regression tests and their output for current
master, while 0002-to_timestamp-format-checking-v17.patch contain
changes to to_timestamp itself.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 63e39198e68..e585b07c18c 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FM/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT to_timestamp('2011$03!18 23_38_15', '-MM-DD HH24:MI:SS');
+ to_timestamp 
+--
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', ' FMMonth DD');
  to_timestamp 
 --
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', ' "FMMonth" DD');
+ to_timestamp 
+--
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', ' \\ DD');
+ to_timestamp 
+--
+ Wed Jan 02 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-'"My birthday-> Year" , "Month:" FMMonth, "Day:" DD');
+'"My birthday-> Year:" , "Month:" FMMonth, "Day:" DD');
  to_timestamp 
 --
  Sun May 16 00:00:00 1976 PDT
@@ -2789,7 +2807,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-E'HH24 "\\text between quote marks\\"" YY MI SS');
+E'HH24 "\\"text between quote marks\\"" YY MI SS');
  to_timestamp 
 --
  Thu Jan 01 15:54:45 1998 PST
@@ -2810,6 +2828,24 @@ SELECT to_timestamp('2000January09Sunday', 'FMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+ to_timestamp 
+--
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ to_timestamp 
+--
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+ to_timestamp 
+--
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'MMDD');
  to_timestamp 
 --
@@ -2999,6 +3035,48 @@ SELECT to_timestamp('2011-12-18  23:38:15', '-MM-DD   HH24:MI:SS');
  Sun Dec 18 03:38:15 2011 PST
 (1 row)
 
+SELECT to_timestamp('2000+   JUN', '/MON');
+ to_timestamp  

Re: [HACKERS] Bug in to_timestamp().

2018-08-16 Thread David G. Johnston
On Thu, Aug 16, 2018 at 3:53 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Hi, David!
>
> You can notice that:
>
> 1) We identified downside of changes in to_timestamp() function and
> documented them [1].
> 2) We found 4 more differences between between patches behavior and
> Oracle behavior [2][3].  One of them was assumed to be ridiculous and
> wasn't implemented.  So, I think this answers your original concern
> that we shouldn't copy Oracle behavior bug to bug.  So, it's depends
> on particular case.  For me, if function was introduced for Oracle
> compatibility, then it's OK to copy aspects of Oracle behavior that
> look reasonable.  But if aspect doesn't look reasonable, then we don't
> copy it.
>
> Do you have any feedback on current state of patch?
>
> 1. https://www.postgresql.org/message-id/20180723141254.
> GA10168%40zakirov.localdomain
> 2. https://www.postgresql.org/message-id/CAPpHfdtqOSniGJRvJ2zaaE8%
> 3DeMB8XDnzvVS-9c3Xufaw%3DiPA%2BQ%40mail.gmail.com
> 3. https://www.postgresql.org/message-id/CAPpHfdso_Yvbo-
> EXKD8t3cuAeR7wszPyuWNBdjQLi1NrMt3O5w%40mail.gmail.com


If the new behavior is an error I don't really have a problem since the
need to fix one's queries will be obvious.

"So length of last group of spaces/separators in the pattern should be
greater or equal to length of spaces/separators in the input string.
Other previous groups are ignored in Oracle.  And that seems
ridiculous for me."

What do you believe should (or does) happen?  Multiple groups always fail
or something else?  How does this interplay with the detection of the
negative timezone offset?  I'm not finding the behavior ridiculous at first
blush; not to the extent to avoid emulating it in a function whose purpose
is emulation.  Being more lenient than Oracle seems undesirable.
Regardless of the choice made here it should be memorialized in the
regression tests.

The couple of regression tests that change do so for the better.  It would
be illuminating to set this up as two patches though, one introducing all
of the new regression tests against the current code and then a second
patch with the changed behavior with only the affected tests.

David J.


Re: [HACKERS] Bug in to_timestamp().

2018-08-16 Thread Alexander Korotkov
On Thu, Aug 16, 2018 at 4:57 PM Liudmila Mantrova
 wrote:
> On 08/14/2018 06:38 PM, Alexander Korotkov wrote:
> > On Thu, Aug 2, 2018 at 9:06 PM Alexander Korotkov
> >  wrote:
> > BTW, I've also revised documentation and regression tests.  Patch is 
> > attached.
> >
> Please consider some further documentation improvements in the attached
> patch.

Thank you very much.  Your editing looks good for me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-08-16 Thread Liudmila Mantrova


On 08/14/2018 06:38 PM, Alexander Korotkov wrote:

On Thu, Aug 2, 2018 at 9:06 PM Alexander Korotkov
 wrote:

On Thu, Aug 2, 2018 at 6:17 PM Alexander Korotkov
 wrote:

After some experiments I found that when you mix spaces and separators
between two fields, then Oracle takes into account only length of last
group of spaces/separators.

# SELECT to_timestamp('2018- -01 02', '   --- --MM-DD') FROM
dual2018-01-01 00:00:00 -10:00
(length of last spaces/separators group is 2)

# SELECT to_timestamp('2018- -01 02', '   --- --MM-DD') FROM dual
2018-01-01 00:00:00 -10:00
(length of last spaces/separators group is 3)

# SELECT to_timestamp('2018- -01 02', '   -- ---MM-DD') FROM dual
02.01.2018 00:00:00
(length of last spaces/separators group is 2)

Ooops... I'm sorry, but I've posted wrong results here.  Correct
version is here.

# SELECT to_timestamp('2018- -01 02', '   --- --MM-DD') FROM dual
ORA-01843: not a valid month
(length of last spaces/separators group is 2)

# SELECT to_timestamp('2018- -01 02', '   -- ---MM-DD') FROM dual
02.01.2018 00:00:00
(length of last spaces/separators group is 3)

So length of last group of spaces/separators in the pattern should be
greater or equal to length of spaces/separators in the input string.
Other previous groups are ignored in Oracle.  And that seems
ridiculous for me.

BTW, I've also revised documentation and regression tests.  Patch is attached.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Hi,
Please consider some further documentation improvements in the attached 
patch.


--
Liudmila Mantrova
Technical writer at Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index edc9be9..a8bbafc 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6262,11 +6262,12 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
  
   
to_timestamp and to_date
-   skip multiple blank spaces in the input string unless the
-   FX option is used. For example,
-   to_timestamp('2000JUN', ' MON') works, but
+   skip multiple blank spaces at the beginning of the input string and
+   around date and time values unless the FX option is used.  For example,
+   to_timestamp('2000JUN', ' MON') and
+   to_timestamp('2000 - JUN', '-MON') work, but
to_timestamp('2000JUN', 'FX MON') returns an error
-   because to_timestamp expects one space only.
+   because to_timestamp expects a single space only.
FX must be specified as the first item in
the template.
   
@@ -6274,6 +6275,43 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
 
  
   
+   A separator (a space or a non-letter/non-digit character) in the template string of
+   to_timestamp and to_date
+   matches any single separator in the input string or is skipped,
+   unless the FX option is used.
+   For example, to_timestamp('2000JUN', '///MON') and
+   to_timestamp('2000/JUN', ' MON') work, but
+   to_timestamp('2000//JUN', '/MON')
+   returns an error because the number of separators in the input string
+   exceeds the number of separators in the template.
+  
+  
+   If FX is specified, separators in the
+   input and template strings must match exactly. For example,
+   to_timestamp('2000/JUN', 'FX MON')
+   returns an error because a space is expected in the input string.
+  
+ 
+
+ 
+  
+   TZH template pattern can match a signed number.
+   Without the FX option, it may lead to ambiguity in
+   interpretation of the minus sign, which can also be interpreted as a separator.
+   This ambiguity is resolved as follows.  If the number of separators before
+   TZH in the template string is less than the number of
+   separators before the minus sign in the input string, the minus sign
+   is interpreted as part of TZH.
+   Otherwise, the minus sign is considered to be a separator between values.
+   For example, to_timestamp('2000 -10', ' TZH') matches
+   -10 to TZH, but
+   to_timestamp('2000 -10', 'TZH')
+   matches 10 to TZH.
+  
+ 
+
+ 
+  
Ordinary text is allowed in to_char
templates and will be output literally.  You can put a substring
in double quotes to force it to be interpreted as literal text
@@ -6287,6 +6325,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
string; for example "XX" skips two input characters
(whether or not they are XX).
   
+  
+
+  Prior to PostgreSQL 12, it was possible to
+  skip arbitrary text in the input string using non-letter or non-digit
+  characters. For example,
+  to_timestamp('2000y6m1d', 

Re: [HACKERS] Bug in to_timestamp().

2018-08-16 Thread Alexander Korotkov
Hi, David!

You can notice that:

1) We identified downside of changes in to_timestamp() function and
documented them [1].
2) We found 4 more differences between between patches behavior and
Oracle behavior [2][3].  One of them was assumed to be ridiculous and
wasn't implemented.  So, I think this answers your original concern
that we shouldn't copy Oracle behavior bug to bug.  So, it's depends
on particular case.  For me, if function was introduced for Oracle
compatibility, then it's OK to copy aspects of Oracle behavior that
look reasonable.  But if aspect doesn't look reasonable, then we don't
copy it.

Do you have any feedback on current state of patch?

1. 
https://www.postgresql.org/message-id/20180723141254.GA10168%40zakirov.localdomain
2. 
https://www.postgresql.org/message-id/CAPpHfdtqOSniGJRvJ2zaaE8%3DeMB8XDnzvVS-9c3Xufaw%3DiPA%2BQ%40mail.gmail.com
3. 
https://www.postgresql.org/message-id/CAPpHfdso_Yvbo-EXKD8t3cuAeR7wszPyuWNBdjQLi1NrMt3O5w%40mail.gmail.com


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-08-15 Thread Arthur Zakirov
On Tue, Aug 14, 2018 at 06:38:56PM +0300, Alexander Korotkov wrote:
> BTW, I've also revised documentation and regression tests.  Patch is attached.

I looked at the patch. It applies without errors.

The document looks good. It compiles.
The code looks good too. It compiles and tests are passed.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-08-14 Thread Alexander Korotkov
On Thu, Aug 2, 2018 at 9:06 PM Alexander Korotkov
 wrote:
> On Thu, Aug 2, 2018 at 6:17 PM Alexander Korotkov
>  wrote:
> > After some experiments I found that when you mix spaces and separators
> > between two fields, then Oracle takes into account only length of last
> > group of spaces/separators.
> >
> > # SELECT to_timestamp('2018- -01 02', '   --- --MM-DD') FROM
> > dual2018-01-01 00:00:00 -10:00
> > (length of last spaces/separators group is 2)
> >
> > # SELECT to_timestamp('2018- -01 02', '   --- --MM-DD') FROM dual
> > 2018-01-01 00:00:00 -10:00
> > (length of last spaces/separators group is 3)
> >
> > # SELECT to_timestamp('2018- -01 02', '   -- ---MM-DD') FROM dual
> > 02.01.2018 00:00:00
> > (length of last spaces/separators group is 2)
>
> Ooops... I'm sorry, but I've posted wrong results here.  Correct
> version is here.
>
> # SELECT to_timestamp('2018- -01 02', '   --- --MM-DD') FROM dual
> ORA-01843: not a valid month
> (length of last spaces/separators group is 2)
>
> # SELECT to_timestamp('2018- -01 02', '   -- ---MM-DD') FROM dual
> 02.01.2018 00:00:00
> (length of last spaces/separators group is 3)
>
> So length of last group of spaces/separators in the pattern should be
> greater or equal to length of spaces/separators in the input string.
> Other previous groups are ignored in Oracle.  And that seems
> ridiculous for me.

BTW, I've also revised documentation and regression tests.  Patch is attached.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-to-timestamp-format-checking-v15.patch
Description: Binary data


Re: [HACKERS] Bug in to_timestamp().

2018-08-02 Thread Arthur Zakirov
On Thu, Aug 02, 2018 at 06:17:01PM +0300, Alexander Korotkov wrote:
> So, if number of spaces/separators between fields in input string is
> more than in format string and list space/separator skipped is minus
> sign, then it decides to glue that minus sign to TZH.  I think this
> behavior is nice to follow, because it allows us to skip spaces after
> fields.

Good approach! With this change the following queries work now:

=# SELECT to_timestamp('2000 + JUN', ' MON');
=# SELECT to_timestamp('2000 +JUN', ' MON');

I think it is worth to add them to regression tests.

> 4) Spaces and separators in format string seem to be handled in the
> same way in non-FX mode.  But strange things happen when you mix
> spaces and separators there.
> ...
> 1+2+3 are implemented in the attached patch, but not 4.  I think that
> it's only worth to follow Oracle when it does reasonable things.

I agree with you. I think it isn't worth to copy this behaviour.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-08-02 Thread Alexander Korotkov
On Thu, Aug 2, 2018 at 6:17 PM Alexander Korotkov
 wrote:
> After some experiments I found that when you mix spaces and separators
> between two fields, then Oracle takes into account only length of last
> group of spaces/separators.
>
> # SELECT to_timestamp('2018- -01 02', '   --- --MM-DD') FROM
> dual2018-01-01 00:00:00 -10:00
> (length of last spaces/separators group is 2)
>
> # SELECT to_timestamp('2018- -01 02', '   --- --MM-DD') FROM dual
> 2018-01-01 00:00:00 -10:00
> (length of last spaces/separators group is 3)
>
> # SELECT to_timestamp('2018- -01 02', '   -- ---MM-DD') FROM dual
> 02.01.2018 00:00:00
> (length of last spaces/separators group is 2)

Ooops... I'm sorry, but I've posted wrong results here.  Correct
version is here.

# SELECT to_timestamp('2018- -01 02', '   --- --MM-DD') FROM dual
ORA-01843: not a valid month
(length of last spaces/separators group is 2)

# SELECT to_timestamp('2018- -01 02', '   -- ---MM-DD') FROM dual
02.01.2018 00:00:00
(length of last spaces/separators group is 3)

So length of last group of spaces/separators in the pattern should be
greater or equal to length of spaces/separators in the input string.
Other previous groups are ignored in Oracle.  And that seems
ridiculous for me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-08-02 Thread Alexander Korotkov
On Wed, Aug 1, 2018 at 3:17 PM Arthur Zakirov  wrote:
> On Mon, Jul 23, 2018 at 05:21:43PM +0300, Alexander Korotkov wrote:
> > Thank you, Arthur.  These examples shows downside of this patch, where
> > users may be faced with incompatibility.  But it's good that this
> > situation can be handled by altering format string.  I think these
> > examples should be added to the documentation and highlighted in
> > release notes.
>
> I updated the documentation. I added a tip text which explains
> how to_timestamp() and to_date() handled ordinary text prior to
> PostgreSQL 11 and how they should handle ordinary text now.
>
> There is now changes in the code and tests.

Thank you, Arthur!

I made following observations on Oracle behavior.
1) Oracle also supports parsing TZH in to_timestamp_tz() function.
Previously, in order to handle TZH (which could be negative) we
decided to skip spaces before fields, but not after fields [1][2].
That leads to behavior, which is both incompatible with Oracle and
unlogical (at least in my opinion).  But after exploration of
to_timestamp_tz() behavior I found that Oracle can distinguish minus
sign used as separator from minus sign in TZH field.

# select to_char(to_timestamp_tz('2018-01-01 -10', ' MM DD  TZH'),
'-MM-DD HH24:MI:SS TZH:TZM') from dual
2018-01-01 00:00:00 +10:00

# select to_char(to_timestamp_tz('2018-01-01--10', ' MM DD  TZH'),
'-MM-DD HH24:MI:SS TZH:TZM') from dual
2018-01-01 00:00:00 +10:00

# select to_char(to_timestamp_tz('2018-01-01  -10', ' MM DD
TZH'), '-MM-DD HH24:MI:SS TZH:TZM') from dual
2018-01-01 00:00:00 -10:00

# select to_char(to_timestamp_tz('2018-01-01---10', ' MM DD
TZH'), '-MM-DD HH24:MI:SS TZH:TZM') from dual
2018-01-01 00:00:00 -10:00

So, if number of spaces/separators between fields in input string is
more than in format string and list space/separator skipped is minus
sign, then it decides to glue that minus sign to TZH.  I think this
behavior is nice to follow, because it allows us to skip spaces after
fields.

2) It appears that Oracle skips spaces not only before fields, but
also in the beginning of the input string.

SELECT to_timestamp(' -2018', ' ') FROM dual
# 01.08.2018 00:00:00

In the example given it seems that first space is skipped, while minus
sign is matched to space.

3) When multiple separators are specified in the format string, Oracle
doesn't allow skipping arbitrary number of spaces before each
separator as we did.

# SELECT to_timestamp('2018- -01 02', '--MM-DD') FROM dual
ORA-01843: not a valid month

4) Spaces and separators in format string seem to be handled in the
same way in non-FX mode.  But strange things happen when you mix
spaces and separators there.

# SELECT to_timestamp('2018- -01 02', '---MM-DD') FROM dual
02.01.2018 00:00:00

# SELECT to_timestamp('2018- -01 02', '   MM-DD') FROM dual
02.01.2018 00:00:00

# SELECT to_timestamp('2018- -01 02', '- -MM-DD') FROM dual
ORA-01843: not a valid month

After some experiments I found that when you mix spaces and separators
between two fields, then Oracle takes into account only length of last
group of spaces/separators.

# SELECT to_timestamp('2018- -01 02', '   --- --MM-DD') FROM
dual2018-01-01 00:00:00 -10:00
(length of last spaces/separators group is 2)

# SELECT to_timestamp('2018- -01 02', '   --- --MM-DD') FROM dual
2018-01-01 00:00:00 -10:00
(length of last spaces/separators group is 3)

# SELECT to_timestamp('2018- -01 02', '   -- ---MM-DD') FROM dual
02.01.2018 00:00:00
(length of last spaces/separators group is 2)

1+2+3 are implemented in the attached patch, but not 4.  I think that
it's only worth to follow Oracle when it does reasonable things.

I also plan to revise documentation and regression tests in the patch.
But I wanted to share my results so that everybody can give a feedback
if any.

1. 
https://www.postgresql.org/message-id/CAEepm%3D1Y7z1VSisBKxa6x3E-jP-%2B%3DrWfw25q_PH2nGjqVcX-rw%40mail.gmail.com
2. 
https://www.postgresql.org/message-id/20180112125848.GA32559%40zakirov.localdomain--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-to-timestamp-format-checking-v14.patch
Description: Binary data


Re: [HACKERS] Bug in to_timestamp().

2018-08-01 Thread Arthur Zakirov
Hello, Alexander,

On Mon, Jul 23, 2018 at 05:21:43PM +0300, Alexander Korotkov wrote:
> Thank you, Arthur.  These examples shows downside of this patch, where
> users may be faced with incompatibility.  But it's good that this
> situation can be handled by altering format string.  I think these
> examples should be added to the documentation and highlighted in
> release notes.

I updated the documentation. I added a tip text which explains
how to_timestamp() and to_date() handled ordinary text prior to
PostgreSQL 11 and how they should handle ordinary text now.

There is now changes in the code and tests.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index edc9be92a6..db1da7bee7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6264,7 +6264,8 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');
to_timestamp and to_date
skip multiple blank spaces in the input string unless the
FX option is used. For example,
-   to_timestamp('2000JUN', ' 
MON') works, but
+   to_timestamp('2000JUN', ' 
MON') and
+   to_timestamp('2000JUN', 
'MON') work, but
to_timestamp('2000JUN', 'FX 
MON') returns an error
because to_timestamp expects one space only.
FX must be specified as the first item in
@@ -6272,6 +6273,19 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');
   
  
 
+ 
+  
+   to_timestamp and to_date don't
+   skip multiple non letter and non digit characters in the input string,
+   but skip them in the formatting string. For example,
+   to_timestamp('2000-JUN', '/MON') and
+   to_timestamp('2000/JUN', '//MON') work, but
+   to_timestamp('2000//JUN', '/MON')
+   returns an error because count of the "/" character in the input string
+   doesn't match count of it in the formatting string.
+  
+ 
+
  
   
Ordinary text is allowed in to_char
@@ -6287,6 +6301,20 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');
string; for example "XX" skips two input characters
(whether or not they are XX).
   
+  
+
+  Prior to PostgreSQL 11 it was possible to
+  skip input ordinary text using non letter and non digit characters.
+  For example,
+  to_timestamp('2000y6m1d', '-MM-DD') worked
+  before.  But currently to skip ordinary text in the input string it
+  is necessary to use only letter characters.  For example,
+  to_timestamp('2000y6m1d', 'tMMtDDt') or
+  to_timestamp('2000y6m1d', '"y"MM"m"DD"d"') 
will
+  skip y, m and
+  d now.
+
+  
  
 
  
diff --git a/src/backend/utils/adt/formatting.c 
b/src/backend/utils/adt/formatting.c
index 30696e3575..e7547091cd 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -165,6 +165,8 @@ typedef struct
 #define NODE_TYPE_END  1
 #define NODE_TYPE_ACTION   2
 #define NODE_TYPE_CHAR 3
+#define NODE_TYPE_SEPARATOR4
+#define NODE_TYPE_SPACE5
 
 #define SUFFTYPE_PREFIX1
 #define SUFFTYPE_POSTFIX   2
@@ -955,6 +957,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int 
type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 const KeySuffix *suf, const int *index, int ver, 
NUMDesc *Num);
@@ -1044,6 +1047,16 @@ suff_search(const char *str, const KeySuffix *suf, int 
type)
return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+   /* ASCII printable character, but not letter or digit */
+   return (*str > 0x20 && *str < 0x7F &&
+   !(*str >= 'A' && *str <= 'Z') &&
+   !(*str >= 'a' && *str <= 'z') &&
+   !(*str >= '0' && *str <= '9'));
+}
+
 /* --
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * --
@@ -1319,7 +1332,14 @@ parse_format(FormatNode *node, const char *str, const 
KeyWord *kw,
if (*str == '\\' && *(str + 1) == '"')
str++;
chlen = pg_mblen(str);
-   n->type = NODE_TYPE_CHAR;
+
+   if (ver == DCH_TYPE && is_separator_char(str))
+   n->type = NODE_TYPE_SEPARATOR;
+   else if (isspace((unsigned char) *str))
+  

Re: [HACKERS] Bug in to_timestamp().

2018-07-23 Thread Alexander Korotkov
On Mon, Jul 23, 2018 at 5:12 PM Arthur Zakirov  wrote:
> On Mon, Jul 23, 2018 at 04:30:43PM +0300, Arthur Zakirov wrote:
> > I looked for some tradeoffs of the patch. I think it could be parsing
> > strings like the following input strings:
> >
> > SELECT TO_TIMESTAMP('2011年5月1日', '-MM-DD');
> > SELECT TO_TIMESTAMP('2011y5m1d', '-MM-DD');
> >
> > HEAD extracts year, month and day from the string. But patched
> > to_timestamp() raises an error. Someone could rely on such behaviour.
> > The patch divides separator characters from letters and digits. And
> > '年' or 'y' are letters here. And so the format string doesn't match the
> > input string.
>
> Sorry, I forgot to mention that the patch can handle this by using
> different format string. You can execute:
>
> =# SELECT TO_TIMESTAMP('2011年5月1日', '年MM月DD日');
>   to_timestamp
> 
>  2011-05-01 00:00:00+04
>
> =# SELECT TO_TIMESTAMP('2011y5m1d', '"y"MM"m"DD"d"');
>   to_timestamp
> 
>  2011-05-01 00:00:00+04
>
> or:
>
> =# SELECT TO_TIMESTAMP('2011y5m1d', 'tMMtDDt');
>   to_timestamp
> 
>  2011-05-01 00:00:00+04

Thank you, Arthur.  These examples shows downside of this patch, where
users may be faced with incompatibility.  But it's good that this
situation can be handled by altering format string.  I think these
examples should be added to the documentation and highlighted in
release notes.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-07-23 Thread Arthur Zakirov
On Mon, Jul 23, 2018 at 04:30:43PM +0300, Arthur Zakirov wrote:
> I looked for some tradeoffs of the patch. I think it could be parsing
> strings like the following input strings:
> 
> SELECT TO_TIMESTAMP('2011年5月1日', '-MM-DD');
> SELECT TO_TIMESTAMP('2011y5m1d', '-MM-DD');
> 
> HEAD extracts year, month and day from the string. But patched
> to_timestamp() raises an error. Someone could rely on such behaviour.
> The patch divides separator characters from letters and digits. And
> '年' or 'y' are letters here. And so the format string doesn't match the
> input string.

Sorry, I forgot to mention that the patch can handle this by using
different format string. You can execute:

=# SELECT TO_TIMESTAMP('2011年5月1日', '年MM月DD日');
  to_timestamp  

 2011-05-01 00:00:00+04

=# SELECT TO_TIMESTAMP('2011y5m1d', '"y"MM"m"DD"d"');
  to_timestamp  

 2011-05-01 00:00:00+04

or:

=# SELECT TO_TIMESTAMP('2011y5m1d', 'tMMtDDt');
  to_timestamp  

 2011-05-01 00:00:00+04

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-07-23 Thread Arthur Zakirov
On Sun, Jul 22, 2018 at 08:22:23AM -0700, David G. Johnston wrote:
> My re-read of the thread the other day left me with a feeling of
> contentment that this was an acceptable change but I also get the feeling
> like I'm missing the downside trade-off too...I was hoping your review
> would help in that regard but as it did not speak to specific
> incompatibilities it has not.

I like more behaviour of the function with the patch. It gives less
unexpected results. For example, the query mentioned above:

SELECT to_timestamp('2011-12-18 23:38:15', '-MM-DD  HH24:MI:SS')

I looked for some tradeoffs of the patch. I think it could be parsing
strings like the following input strings:

SELECT TO_TIMESTAMP('2011年5月1日', '-MM-DD');
SELECT TO_TIMESTAMP('2011y5m1d', '-MM-DD');

HEAD extracts year, month and day from the string. But patched
to_timestamp() raises an error. Someone could rely on such behaviour.
The patch divides separator characters from letters and digits. And
'年' or 'y' are letters here. And so the format string doesn't match the
input string.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-07-22 Thread Alexander Korotkov
On Sun, Jul 22, 2018 at 6:22 PM David G. Johnston
 wrote:
> On Sat, Jul 21, 2018 at 2:34 PM, Alexander Korotkov 
>  wrote:
>>
>> So, as I get from this thread, current patch brings these function very 
>> close to Oracle behavior.  The only divergence found yet is related to 
>> handling of unmatched tails of input strings (Oracle throws an error while 
>> we swallow that silently) [1].  But this issue can be be addressed by a 
>> separate patch.
>>
>> Patch seems to be in a pretty good shape.  So, the question is whether there 
>> is a consensus that we want a backward-incompatible behavior change, which 
>> this patch does.
>>
>> My personal opinion is +1 for committing this, because I see that this patch 
>> takes a lot of community efforts on discussion, coding, review etc.  All 
>> these efforts give a substantial result in a patch, which makes behavior 
>> more Oracle-compatible and (IMHO) more clear.
>>
>> However, in this thread other opinions were expressed.  In particular, David 
>> G. Johnston expressed opinion that we shouldn't change behavior of existing 
>> functions, alternatively we could introduce new functions with new behavior. 
>>  However, I see David doesn't participate this thread for a quite long time.
>
> Been lurking about...I'm still of the opinion that this capability should 
> exist in PostgreSQL as "our" function and not just as an Oracle compatibility.

For sure, we're not going to copy Oracle behavior "bug to bug".  But
when users find our behavior to be confusing, there is nothing wrong
to look for Oracle behavior and accept it if it looks good.

> That said the thing I'm most curious to read is the release note entry that 
> would go along with this change that informs users what will be changing: 
> silently, visibly, or otherwise.

I'm sure that release note entry should directly inform users about
backward-incompatible changes in to_timestamp()/to_date() functions.
Users need to be advised to test their applications before porting
them to new major release of PostgreSQL.

> Knowing how much we (don't) diverge from our chosen "standard" after making 
> this change is important but the change in behavior from current is, IMO, 
> more important in deciding whether this particular change is worth making.
> My re-read of the thread the other day left me with a feeling of contentment 
> that this was an acceptable change but I also get the feeling like I'm 
> missing the downside trade-off too...I was hoping your review would help in 
> that regard but as it did not speak to specific incompatibilities it has not.

So, if I understand you correctly, downside of trade-off are use-cases
when current behavior looks correct, while patched behavior looks
incorrect.  Yes, while looking at this thread we can't find such
use-cases.  Intuitively, they should be present.  But I thought about
that and didn't find it yet...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-07-22 Thread David G. Johnston
On Sat, Jul 21, 2018 at 2:34 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> So, as I get from this thread, current patch brings these function very
> close to Oracle behavior.  The only divergence found yet is related to
> handling of unmatched tails of input strings (Oracle throws an error while
> we swallow that silently) [1].  But this issue can be be addressed by a
> separate patch.
>
> Patch seems to be in a pretty good shape.  So, the question is whether
> there is a consensus that we want a backward-incompatible behavior change,
> which this patch does.
>
> My personal opinion is +1 for committing this, because I see that this
> patch takes a lot of community efforts on discussion, coding, review etc.
> All these efforts give a substantial result in a patch, which makes
> behavior more Oracle-compatible and (IMHO) more clear.
>
> However, in this thread other opinions were expressed.  In
> particular, David G. Johnston expressed opinion that we shouldn't change
> behavior of existing functions, alternatively we could introduce new
> functions with new behavior.  However, I see David doesn't participate this
> thread for a quite long time.
>

​Been lurking about...I'm still of the opinion that this capability should
exist in PostgreSQL as "our" function and not just as an Oracle
compatibility.

That said the thing I'm most curious to read is the release note entry that
would go along with this change that informs users what will be changing:
silently, visibly, or otherwise.  Knowing how much we (don't) diverge from
our chosen "standard" after making this change is important but the change
in behavior from current is, IMO, more important in deciding whether this
particular change is worth making.

My re-read of the thread the other day left me with a feeling of
contentment that this was an acceptable change but I also get the feeling
like I'm missing the downside trade-off too...I was hoping your review
would help in that regard but as it did not speak to specific
incompatibilities it has not.

David J.


Re: [HACKERS] Bug in to_timestamp().

2018-07-21 Thread Alexander Korotkov
On Sun, Jul 8, 2018 at 12:15 AM Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Sat, Jul 7, 2018 at 12:31 AM Tom Lane  wrote:
> > Alexander Korotkov  writes:
> > > I tool a look at this patch.  It looks good for me.  It applies
> > > cleanly on last master, builds without warnings, passes tests.
> > > Functionality seems to be well-covered by documentation and regression
> > > tests.  So, I'm going to commit it if no objections.
> >
> > AFAIR, the argument was mainly over whether we agreed with the proposed
> > behavioral changes.  It seems a bit premature to me to commit given that
> > there's not consensus on that.
>
> Ok.  I've briefly looked to the thread, and I thought there is
> consensus already.  Given your concern, I'll investigate this thread
> in detail and come up with summary.
>

So, I've studies this thread in more details.  Let me share my short
summary.

This thread was started from complaint about confusing behavior of
to_timestamp() function in some cases.  Basically confusing behavior is
related to two issues: interpretation of spaces and separators in both
input string and format string, tolerance to invalid dates and times
(i.e. 2009-06-40 becomes 2009-07-10).  Fix for the second issue was already
committed by Tom Lane (d3cd36a1).  So, the improvement under consideration
is interpretation of spaces and separators.

to_timestamp()/to_date() functions are not part of SQL standard.  So,
unfortunately we can't use standard as a guideline and have to search for
other criteria.  On the one hand to_timestamp()/to_date() is here for quite
long time and there might be existing applications relying on its
behavior.  Changing the behavior of these functions might appear to be a
pain for users.  On the other hand these functions were introduced
basically for Oracle compatibility.  So it would be nice to keep their
behavior as close to Oracle as possible.  On the third hand, if current
behavior of these functions is agreed to be confusing, and new behavior is
agreed to be less confusing and more close to Oracle, then we can accept
it.  If even it would discourage small fraction of users, who are relying
on the behavior which we assume to be confusing, way larger fraction of
users would be encouraged by this change.

So, as I get from this thread, current patch brings these function very
close to Oracle behavior.  The only divergence found yet is related to
handling of unmatched tails of input strings (Oracle throws an error while
we swallow that silently) [1].  But this issue can be be addressed by a
separate patch.

Patch seems to be in a pretty good shape.  So, the question is whether
there is a consensus that we want a backward-incompatible behavior change,
which this patch does.

My personal opinion is +1 for committing this, because I see that this
patch takes a lot of community efforts on discussion, coding, review etc.
All these efforts give a substantial result in a patch, which makes
behavior more Oracle-compatible and (IMHO) more clear.

However, in this thread other opinions were expressed.  In
particular, David G. Johnston expressed opinion that we shouldn't change
behavior of existing functions, alternatively we could introduce new
functions with new behavior.  However, I see David doesn't participate this
thread for a quite long time.

Thus, in the current situation I can propose following.  Let's establish
some period when people can express objections against committing this
patch (reasonable short period, say 1 week).  If no objections were
expressed then commit.  Otherwise, discuss the objections expressed.

1.
https://www.postgresql.org/message-id/CA%2Bq6zcUS3fQGLoeLcuJLtz-gRD6vHqpn1fBe0cnORdx93QtO4w%40mail.gmail.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Bug in to_timestamp().

2018-07-07 Thread Alexander Korotkov
On Sat, Jul 7, 2018 at 12:31 AM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > I tool a look at this patch.  It looks good for me.  It applies
> > cleanly on last master, builds without warnings, passes tests.
> > Functionality seems to be well-covered by documentation and regression
> > tests.  So, I'm going to commit it if no objections.
>
> AFAIR, the argument was mainly over whether we agreed with the proposed
> behavioral changes.  It seems a bit premature to me to commit given that
> there's not consensus on that.

Ok.  I've briefly looked to the thread, and I thought there is
consensus already.  Given your concern, I'll investigate this thread
in detail and come up with summary.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-07-06 Thread Tom Lane
Alexander Korotkov  writes:
> I tool a look at this patch.  It looks good for me.  It applies
> cleanly on last master, builds without warnings, passes tests.
> Functionality seems to be well-covered by documentation and regression
> tests.  So, I'm going to commit it if no objections.

AFAIR, the argument was mainly over whether we agreed with the proposed
behavioral changes.  It seems a bit premature to me to commit given that
there's not consensus on that.

regards, tom lane



Re: [HACKERS] Bug in to_timestamp().

2018-07-05 Thread Alexander Korotkov
On Fri, Mar 2, 2018 at 3:52 PM Arthur Zakirov  wrote:
>
> On Fri, Mar 02, 2018 at 01:44:53PM +0100, Dmitry Dolgov wrote:
> > Well, I'm not sure that it's completely unrelated, but I see your point - it
> > makes sense to discuss this and move forward in small steps. So, if there 
> > are
> > no objections I'll mark this patch as ready.
>
> Thank you!

I tool a look at this patch.  It looks good for me.  It applies
cleanly on last master, builds without warnings, passes tests.
Functionality seems to be well-covered by documentation and regression
tests.  So, I'm going to commit it if no objections.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-03-02 Thread Arthur Zakirov
On Fri, Mar 02, 2018 at 01:44:53PM +0100, Dmitry Dolgov wrote:
> Well, I'm not sure that it's completely unrelated, but I see your point - it
> makes sense to discuss this and move forward in small steps. So, if there are
> no objections I'll mark this patch as ready.

Thank you!

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-03-02 Thread Dmitry Dolgov
> On 2 March 2018 at 10:33, Arthur Zakirov  wrote:
>
> The fix you proposed isn't related with the initial report [1] by Amul
> Sul, IMHO. The patch tries to fix that behaviour and additionally tries
> to be more smart in handling and matching separator characters within
> input and format strings. And unfortunately, it partly breaks backward
> compatibility.
>
> Your propose could break backward compatibility too, I think. And in
> different manner than the patch I think. And that's why it needs another
> patch I think and needs an additional decision about breaking backward
> compatibility. All this is IMHO.
>
> 1 - 
> https://www.postgresql.org/message-id/1873520224.1784572.1465833145330.JavaMail.yahoo%40mail.yahoo.com

Well, I'm not sure that it's completely unrelated, but I see your point - it
makes sense to discuss this and move forward in small steps. So, if there are
no objections I'll mark this patch as ready.



Re: [HACKERS] Bug in to_timestamp().

2018-03-02 Thread Arthur Zakirov
On Fri, Mar 02, 2018 at 12:58:57AM +0100, Dmitry Dolgov wrote:
> > On 18 February 2018 at 18:49, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > > On 17 February 2018 at 10:02, Arthur Zakirov 
> wrote:
> > > ...
> > > I think it could be fixed by another patch. But I'm not sure that it
> > > will be accepted as well as this patch :).
> >
> > Why do you think there should be another patch for it?
> 
> Just to make myself clear. I think it's fair enough to mark this patch as
> ready
> for committer - the implementation is neat and sufficient as for me, and it
> provides a visible improvement for user experience. The question above is
> probably the only thing that prevents me from proposing this.

The fix you proposed isn't related with the initial report [1] by Amul
Sul, IMHO. The patch tries to fix that behaviour and additionally tries
to be more smart in handling and matching separator characters within
input and format strings. And unfortunately, it partly breaks backward
compatibility.

Your propose could break backward compatibility too, I think. And in
different manner than the patch I think. And that's why it needs another
patch I think and needs an additional decision about breaking backward
compatibility. All this is IMHO.


1 - 
https://www.postgresql.org/message-id/1873520224.1784572.1465833145330.JavaMail.yahoo%40mail.yahoo.com

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-03-01 Thread Dmitry Dolgov
> On 18 February 2018 at 18:49, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > On 17 February 2018 at 10:02, Arthur Zakirov 
wrote:
> > > On Wed, Feb 14, 2018 at 05:23:53PM +0100, Dmitry Dolgov wrote:
> > >
> > > SELECT to_timestamp('2000 + JUN', ' /') FROM dual
> > > ORA-01830: date format picture ends before converting entire input
string
> > >
> > > SELECT to_timestamp('2000 + JUN', ' ') FROM dual
> > > ORA-01830: date format picture ends before converting entire input
string
> > >
> > > It's sort of corner case, but anyway maybe you would be interested to
handle
> > > it.
> >
> > I think it could be fixed by another patch. But I'm not sure that it
> > will be accepted as well as this patch :).
>
> Why do you think there should be another patch for it?

Just to make myself clear. I think it's fair enough to mark this patch as
ready
for committer - the implementation is neat and sufficient as for me, and it
provides a visible improvement for user experience. The question above is
probably the only thing that prevents me from proposing this.


Re: [HACKERS] Bug in to_timestamp().

2018-02-18 Thread Dmitry Dolgov
> On 17 February 2018 at 10:02, Arthur Zakirov 
wrote:
>
> I think it could be fixed by another patch. But I'm not sure that it
> will be accepted as well as this patch :).

Why do you think there should be another patch for it?


Re: [HACKERS] Bug in to_timestamp().

2018-02-17 Thread Arthur Zakirov
On Wed, Feb 14, 2018 at 05:23:53PM +0100, Dmitry Dolgov wrote:
> * On line 52 there is a removed empty line, without any other changes around

Agree, it is unnecessary change. There was the macro there before.

> * On line 177 there is a new commented out line of code. I assume it's not
> an
>   explanation or something and we don't need it, am I right?

It explains a node type we deal with. But maybe it is not very useful,
so I removed it.

> And Oracle complains about this:
> 
> SELECT to_timestamp('2000 + JUN', ' /') FROM dual
> ORA-01830: date format picture ends before converting entire input string
> 
> SELECT to_timestamp('2000 + JUN', ' ') FROM dual
> ORA-01830: date format picture ends before converting entire input string
> 
> It's sort of corner case, but anyway maybe you would be interested to handle
> it.

I think it could be fixed by another patch. But I'm not sure that it
will be accepted as well as this patch :).

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1e535cf215..1cc8b078f7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6174,7 +6174,8 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');
to_timestamp and to_date
skip multiple blank spaces in the input string unless the
FX option is used. For example,
-   to_timestamp('2000JUN', ' 
MON') works, but
+   to_timestamp('2000JUN', ' 
MON') and
+   to_timestamp('2000JUN', 
'MON') work, but
to_timestamp('2000JUN', 'FX 
MON') returns an error
because to_timestamp expects one space only.
FX must be specified as the first item in
@@ -6182,6 +6183,19 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');
   
  
 
+ 
+  
+   to_timestamp and to_date don't
+   skip multiple printable non letter and non digit characters in the input
+   string, but skip them in the formatting string. For example,
+   to_timestamp('2000-JUN', '/MON') and
+   to_timestamp('2000/JUN', '//MON') work, but
+   to_timestamp('2000//JUN', '/MON')
+   returns an error because count of the "/" character in the input string
+   doesn't match count of it in the formatting string.
+  
+ 
+
  
   
Ordinary text is allowed in to_char
diff --git a/src/backend/utils/adt/formatting.c 
b/src/backend/utils/adt/formatting.c
index b8bd4caa3e..a23122d3de 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -171,6 +171,8 @@ typedef struct
 #define NODE_TYPE_END  1
 #define NODE_TYPE_ACTION   2
 #define NODE_TYPE_CHAR 3
+#define NODE_TYPE_SEPARATOR4
+#define NODE_TYPE_SPACE5
 
 #define SUFFTYPE_PREFIX1
 #define SUFFTYPE_POSTFIX   2
@@ -961,6 +963,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int 
type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 const KeySuffix *suf, const int *index, int ver, 
NUMDesc *Num);
@@ -1050,6 +1053,16 @@ suff_search(const char *str, const KeySuffix *suf, int 
type)
return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+   /* ASCII printable character, but not letter or digit */
+   return (*str > 0x20 && *str < 0x7F &&
+   !(*str >= 'A' && *str <= 'Z') &&
+   !(*str >= 'a' && *str <= 'z') &&
+   !(*str >= '0' && *str <= '9'));
+}
+
 /* --
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * --
@@ -1325,7 +1338,14 @@ parse_format(FormatNode *node, const char *str, const 
KeyWord *kw,
if (*str == '\\' && *(str + 1) == '"')
str++;
chlen = pg_mblen(str);
-   n->type = NODE_TYPE_CHAR;
+
+   if (ver == DCH_TYPE && is_separator_char(str))
+   n->type = NODE_TYPE_SEPARATOR;
+   else if (isspace((unsigned char) *str))
+   n->type = NODE_TYPE_SPACE;
+   else
+   n->type = NODE_TYPE_CHAR;
+
memcpy(n->character, str, chlen);
n->character[chlen] = '\0';
n->key = NULL;
@@ -2996,12 +3016,73 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar 
*out)
 
for (n = 

Re: [HACKERS] Bug in to_timestamp().

2018-02-14 Thread Dmitry Dolgov
> On 12 February 2018 at 12:49, Arthur Zakirov 
wrote:
>
> Yes, I somehow missed it. I changed the behaviour of separator
> characters in format string. They are greedy now and eat whitespaces
> before a separator character in an input string. I suppose that there
> should be no such problems as in [1].

Thanks. I have a few minor complains about some noise:

* On line 52 there is a removed empty line, without any other changes around

* On line 177 there is a new commented out line of code. I assume it's not
an
  explanation or something and we don't need it, am I right?

Also, I spotted one more difference between this patch and Oracle. In the
situation, when a format string doesn't have anything meaningful, with the
patch we've got:

SELECT to_timestamp('2000 + JUN', '/');
  to_timestamp
-
0001-01-01 00:00:00+00:53:28 BC
(1 row)

SELECT to_timestamp('2000 + JUN', ' ');
  to_timestamp
-
0001-01-01 00:00:00+00:53:28 BC
(1 row)

And Oracle complains about this:

SELECT to_timestamp('2000 + JUN', ' /') FROM dual
ORA-01830: date format picture ends before converting entire input string

SELECT to_timestamp('2000 + JUN', ' ') FROM dual
ORA-01830: date format picture ends before converting entire input string

It's sort of corner case, but anyway maybe you would be interested to handle
it.

>> About usage of locale dependent functions e.g. `isalpha`. Yes, looks like
>> it's better to have locale-agnostic implementation, but then I'm
confused - all
>> functions except `isdigit`/`isxdigit` are locale-dependent, including
>> `isspace`, which is also in use.
>
> I returned is_separator_char() function for now.

Thanks. Answering my own question about `isspace`, I finally noticed, that
this
function was used before the patch, and there were no complains - so
probably
it's fine.


Re: [HACKERS] Bug in to_timestamp().

2018-02-12 Thread Arthur Zakirov
Hello,

On Fri, Feb 09, 2018 at 04:31:14PM +0100, Dmitry Dolgov wrote:
> I went through this thread, and want to summarize a bit:
> 
> From what I see this patch addresses most important concerns that were
> mentioned in the thread, i.e. to make `to_timestamp` less confusing and be
> close to Oracles behavior. The code itself looks clear and sufficient, with 
> the
> required documentation and green tests.

Thank you for reviewing!

> Looks like there are just two questions left so far:
> 
> * I've noticed what I think a difference between current that was introduced 
> in
> this patch and Oracle. In the following case we can have any number of spaces
> after a separator `+` in the input string
> ...
> But no spaces before it (it actually depends on how many separators do we have
> in the format string)
> ... 
> Judging from this http://rextester.com/l/oracle_online_compiler in Oracle it's
> possible to have any number of spaces before or after `+` independently from
> the number of separators in an input string. Is it intended?

Yes, I somehow missed it. I changed the behaviour of separator
characters in format string. They are greedy now and eat whitespaces
before a separator character in an input string. I suppose that there
should be no such problems as in [1].

> * About usage of locale dependent functions e.g. `isalpha`. Yes, looks like
> it's better to have locale-agnostic implementation, but then I'm confused - 
> all
> functions except `isdigit`/`isxdigit` are locale-dependent, including
> `isspace`, which is also in use.

I returned is_separator_char() function for now.

1 - 
https://www.postgresql.org/message-id/20180112125848.GA32559%40zakirov.localdomain

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 640ff09a7b..a34781378e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6174,7 +6174,8 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');
to_timestamp and to_date
skip multiple blank spaces in the input string unless the
FX option is used. For example,
-   to_timestamp('2000JUN', ' 
MON') works, but
+   to_timestamp('2000JUN', ' 
MON') and
+   to_timestamp('2000JUN', 
'MON') work, but
to_timestamp('2000JUN', 'FX 
MON') returns an error
because to_timestamp expects one space only.
FX must be specified as the first item in
@@ -6182,6 +6183,19 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');
   
  
 
+ 
+  
+   to_timestamp and to_date don't
+   skip multiple printable non letter and non digit characters in the input
+   string, but skip them in the formatting string. For example,
+   to_timestamp('2000-JUN', '/MON') and
+   to_timestamp('2000/JUN', '//MON') work, but
+   to_timestamp('2000//JUN', '/MON')
+   returns an error because count of the "/" character in the input string
+   doesn't match count of it in the formatting string.
+  
+ 
+
  
   
Ordinary text is allowed in to_char
diff --git a/src/backend/utils/adt/formatting.c 
b/src/backend/utils/adt/formatting.c
index b8bd4caa3e..470cd4e187 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -171,6 +171,8 @@ typedef struct
 #define NODE_TYPE_END  1
 #define NODE_TYPE_ACTION   2
 #define NODE_TYPE_CHAR 3
+#define NODE_TYPE_SEPARATOR4
+#define NODE_TYPE_SPACE5
 
 #define SUFFTYPE_PREFIX1
 #define SUFFTYPE_POSTFIX   2
@@ -542,7 +544,6 @@ static const KeySuffix DCH_suff[] = {
{NULL, 0, 0, 0}
 };
 
-
 /* --
  * Format-pictures (KeyWord).
  *
@@ -961,6 +962,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int 
type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 const KeySuffix *suf, const int *index, int ver, 
NUMDesc *Num);
@@ -1050,6 +1052,16 @@ suff_search(const char *str, const KeySuffix *suf, int 
type)
return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+   /* ASCII printable character, but not letter or digit */
+   return (*str > 0x20 && *str < 0x7F &&
+   !(*str >= 'A' && *str <= 'Z') &&
+   !(*str >= 'a' && *str <= 'z') &&
+   !(*str >= '0' && *str <= '9'));
+}
+
 /* --
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * --
@@ -1325,7 +1337,14 @@ parse_format(FormatNode *node, const char *str, const 
KeyWord *kw,
   

Re: [HACKERS] Bug in to_timestamp().

2018-02-09 Thread Dmitry Dolgov
> On 7 February 2018 at 22:51, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>> On 6 February 2018 at 10:17, Arthur Zakirov  wrote:
>> It is strange. I still can apply both v9 [1] and v10 [2] via 'git
>> apply'. And Patch Tester [3] says that it is applied. But maybe
>> it is because of my git (git version 2.16.1).
>>
>> You can try also 'patch -p1':
>
> Yes, looks like the problem is on my side, sorry.

I went through this thread, and want to summarize a bit:

>From what I see this patch addresses most important concerns that were
mentioned in the thread, i.e. to make `to_timestamp` less confusing and be
close to Oracles behavior. The code itself looks clear and sufficient, with the
required documentation and green tests.

Looks like there are just two questions left so far:

* I've noticed what I think a difference between current that was introduced in
this patch and Oracle. In the following case we can have any number of spaces
after a separator `+` in the input string

SELECT to_timestamp('2000+JUN', '/MON');
  to_timestamp

 2000-06-01 00:00:00+02
(1 row)

SELECT to_timestamp('2000+   JUN', '/MON');
  to_timestamp

 2000-06-01 00:00:00+02
(1 row)

But no spaces before it (it actually depends on how many separators do we have
in the format string)

SELECT to_timestamp('2000 +JUN', '/MON');
ERROR:  22007: invalid value "+JU" for "MON"
DETAIL:  The given value did not match any of the allowed values
for this field.
LOCATION:  from_char_seq_search, formatting.c:2410

SELECT to_timestamp('2000 +JUN', '//MON');
  to_timestamp

 2000-06-01 00:00:00+02
(1 row)

SELECT to_timestamp('2000  +JUN', '//MON');
ERROR:  22007: invalid value "+JU" for "MON"
DETAIL:  The given value did not match any of the allowed values
for this field.
LOCATION:  from_char_seq_search, formatting.c:2410

Judging from this http://rextester.com/l/oracle_online_compiler in Oracle it's
possible to have any number of spaces before or after `+` independently from
the number of separators in an input string. Is it intended?

SELECT to_timestamp('2000  +  JUN', '/MON') FROM dual
01.06.2000 00:00:00

* About usage of locale dependent functions e.g. `isalpha`. Yes, looks like
it's better to have locale-agnostic implementation, but then I'm confused - all
functions except `isdigit`/`isxdigit` are locale-dependent, including
`isspace`, which is also in use.



Re: [HACKERS] Bug in to_timestamp().

2018-02-07 Thread Dmitry Dolgov
> On 6 February 2018 at 10:17, Arthur Zakirov  wrote:
> It is strange. I still can apply both v9 [1] and v10 [2] via 'git
> apply'. And Patch Tester [3] says that it is applied. But maybe
> it is because of my git (git version 2.16.1).
>
> You can try also 'patch -p1':

Yes, looks like the problem is on my side, sorry.



Re: [HACKERS] Bug in to_timestamp().

2018-02-06 Thread Arthur Zakirov
On Fri, Feb 02, 2018 at 09:54:45PM +0100, Dmitry Dolgov wrote:
> For some reason I can't apply it clean to the latest master:
> 
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file doc/src/sgml/func.sgml
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file src/backend/utils/adt/formatting.c
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file src/test/regress/expected/horology.out
> Hunk #1 FAILED at 2769.
> Hunk #2 FAILED at 2789.
> Hunk #3 succeeded at 2810 with fuzz 2.
> Hunk #4 succeeded at 2981 with fuzz 2.
> Hunk #5 succeeded at 3011 with fuzz 2.
> Hunk #6 FAILED at 3029.
> 3 out of 6 hunks FAILED -- saving rejects to file
> src/test/regress/expected/horology.out.rej
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file src/test/regress/sql/horology.sql

It is strange. I still can apply both v9 [1] and v10 [2] via 'git
apply'. And Patch Tester [3] says that it is applied. But maybe
it is because of my git (git version 2.16.1).

You can try also 'patch -p1':

$ patch -p1 < 0001-to-timestamp-format-checking-v10.patch

1 - 
https://www.postgresql.org/message-id/20180112125848.GA32559@zakirov.localdomain
2 - 
https://www.postgresql.org/message-id/20180201102449.GA29082@zakirov.localdomain
3 - http://commitfest.cputube.org/

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-02-02 Thread Dmitry Dolgov
> On 1 February 2018 at 11:24, Arthur Zakirov  wrote:
>
> I fixed those messages, but in a different manner. I think that an
> unexpected character is unknown and neither space nor separator. And
> better to say that was expected space/separator character.

Sounds good, thanks.

> Attached fixed patch.

For some reason I can't apply it clean to the latest master:

(Stripping trailing CRs from patch; use --binary to disable.)
patching file doc/src/sgml/func.sgml
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/backend/utils/adt/formatting.c
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/test/regress/expected/horology.out
Hunk #1 FAILED at 2769.
Hunk #2 FAILED at 2789.
Hunk #3 succeeded at 2810 with fuzz 2.
Hunk #4 succeeded at 2981 with fuzz 2.
Hunk #5 succeeded at 3011 with fuzz 2.
Hunk #6 FAILED at 3029.
3 out of 6 hunks FAILED -- saving rejects to file
src/test/regress/expected/horology.out.rej
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/test/regress/sql/horology.sql

> On 2 February 2018 at 16:40, Robert Haas  wrote:
>
> I'm not quite sure whether it's relevant here, but I think some of the
> ctype.h functions have locale-dependent behavior.  By implementing our
> own test we make sure that we don't accidentally inherit any such
> behavior.

Yes, you're right, `isalpha` is actually locale dependent function.

In some locales, there may be additional characters for which isalpha() is
true—letters which are neither uppercase nor lowercase.

So, if I understand the patch correctly, with `isalpha` the function
`is_separator_char` will return false for some locale-specific characters, and
without - those characters will be treated as separators. Is it desire
behavior?



Re: [HACKERS] Bug in to_timestamp().

2018-02-02 Thread Robert Haas
On Wed, Jan 31, 2018 at 11:53 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> Why not write `is_separator_char` using `isprint`, `isalpha`, `isdigit` from
> ctype.h? Something like:
>
> return isprint(*str) && !isalpha(*str) && !isdigit(*str)
>
> From what I see in the source code they do exactly the same and tests are
> successfully passing with this change.

I'm not quite sure whether it's relevant here, but I think some of the
ctype.h functions have locale-dependent behavior.  By implementing our
own test we make sure that we don't accidentally inherit any such
behavior.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Bug in to_timestamp().

2018-02-01 Thread Arthur Zakirov
On Wed, Jan 31, 2018 at 05:53:29PM +0100, Dmitry Dolgov wrote:
> Thanks for working on that! I haven't followed this thread before, and after a
> quick review I have few side questions.

Thank you for your comments!

> Why not write `is_separator_char` using `isprint`, `isalpha`, `isdigit` from
> ctype.h? Something like:
> 
> return isprint(*str) && !isalpha(*str) && !isdigit(*str)
> 
> From what I see in the source code they do exactly the same and tests are
> successfully passing with this change.

Fixed. The patch uses those functions now. I made is_separator_char() as a
IS_SEPARATOR_CHAR() macro.

> What do you think about providing two slightly different messages for these 
> two
> situations:
> 
> if (n->type == NODE_TYPE_SPACE && !isspace((unsigned char) *s))
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_DATETIME_FORMAT),
>  errmsg("unexpected character \"%.*s\", expected \"%s\"",
> pg_mblen(s), s, n->character),
>  errhint("In FX mode, punctuation in the input string "
>  "must exactly match the format string.")));
> /*
>  * In FX mode we insist that separator character from the format
>  * string matches separator character from the input string.
>  */
> else if (n->type == NODE_TYPE_SEPARATOR && *n->character != *s)
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_DATETIME_FORMAT),
>  errmsg("unexpected character \"%.*s\", expected \"%s\"",
> pg_mblen(s), s, n->character),
>  errhint("In FX mode, punctuation in the input string "
>  "must exactly match the format string.")));
> 
> E.g. "unexpected space character" and "unexpected separator character". The
> difference is quite subtle, but I think a bit of context would never hurt.

I fixed those messages, but in a different manner. I think that an
unexpected character is unknown and neither space nor separator. And
better to say that was expected space/separator character.

Attached fixed patch.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 487c7ff750..053d153d35 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6174,7 +6174,8 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');
to_timestamp and to_date
skip multiple blank spaces in the input string unless the
FX option is used. For example,
-   to_timestamp('2000JUN', ' 
MON') works, but
+   to_timestamp('2000JUN', ' 
MON') and
+   to_timestamp('2000JUN', 
'MON') work, but
to_timestamp('2000JUN', 'FX 
MON') returns an error
because to_timestamp expects one space only.
FX must be specified as the first item in
@@ -6182,6 +6183,19 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');
   
  
 
+ 
+  
+   to_timestamp and to_date don't
+   skip multiple printable non letter and non digit characters in the input
+   string, but skip them in the formatting string. For example,
+   to_timestamp('2000-JUN', '/MON') and
+   to_timestamp('2000/JUN', '//MON') work, but
+   to_timestamp('2000//JUN', '/MON')
+   returns an error because count of the "/" character in the input string
+   doesn't match count of it in the formatting string.
+  
+ 
+
  
   
Ordinary text is allowed in to_char
diff --git a/src/backend/utils/adt/formatting.c 
b/src/backend/utils/adt/formatting.c
index b8bd4caa3e..35007bec5c 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -171,6 +171,8 @@ typedef struct
 #define NODE_TYPE_END  1
 #define NODE_TYPE_ACTION   2
 #define NODE_TYPE_CHAR 3
+#define NODE_TYPE_SEPARATOR4
+#define NODE_TYPE_SPACE5
 
 #define SUFFTYPE_PREFIX1
 #define SUFFTYPE_POSTFIX   2
@@ -542,6 +544,10 @@ static const KeySuffix DCH_suff[] = {
{NULL, 0, 0, 0}
 };
 
+#define IS_SEPARATOR_CHAR(s) ( \
+   isprint(*(unsigned char*)(s)) && \
+   !isalpha(*(unsigned char*)(s)) && \
+   !isdigit(*(unsigned char*)(s)))
 
 /* --
  * Format-pictures (KeyWord).
@@ -1325,7 +1331,14 @@ parse_format(FormatNode *node, const char *str, const 
KeyWord *kw,
if (*str == '\\' && *(str + 1) == '"')
str++;
chlen = pg_mblen(str);
-   n->type = NODE_TYPE_CHAR;
+
+   if (ver == DCH_TYPE && IS_SEPARATOR_CHAR(str))
+   n->type = NODE_TYPE_SEPARATOR;
+   else if (isspace((unsigned char) *str))
+   n->type = NODE_TYPE_SPACE;

Re: [HACKERS] Bug in to_timestamp().

2018-01-31 Thread Dmitry Dolgov
> On 12 January 2018 at 13:58, Arthur Zakirov  wrote:
>
> I attached new version of the patch. Now (expected output):
>

Thanks for working on that! I haven't followed this thread before, and after a
quick review I have few side questions.

Why not write `is_separator_char` using `isprint`, `isalpha`, `isdigit` from
ctype.h? Something like:

return isprint(*str) && !isalpha(*str) && !isdigit(*str)

>From what I see in the source code they do exactly the same and tests are
successfully passing with this change.

What do you think about providing two slightly different messages for these two
situations:

if (n->type == NODE_TYPE_SPACE && !isspace((unsigned char) *s))
ereport(ERROR,
(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
 errmsg("unexpected character \"%.*s\", expected \"%s\"",
pg_mblen(s), s, n->character),
 errhint("In FX mode, punctuation in the input string "
 "must exactly match the format string.")));
/*
 * In FX mode we insist that separator character from the format
 * string matches separator character from the input string.
 */
else if (n->type == NODE_TYPE_SEPARATOR && *n->character != *s)
ereport(ERROR,
(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
 errmsg("unexpected character \"%.*s\", expected \"%s\"",
pg_mblen(s), s, n->character),
 errhint("In FX mode, punctuation in the input string "
 "must exactly match the format string.")));

E.g. "unexpected space character" and "unexpected separator character". The
difference is quite subtle, but I think a bit of context would never hurt.



Re: [HACKERS] Bug in to_timestamp().

2018-01-12 Thread Arthur Zakirov
Hello,

On Fri, Jan 12, 2018 at 02:48:40PM +1300, Thomas Munro wrote:
> 
> I'm guessing that commit 11b623dd0a2c385719ebbbdd42dd4ec395dcdc9d had
> something to do with the following failure, when your patch is
> applied:
> 
>  horology ... FAILED
> 

Thank you a lot for pointing on that.

It seems to me that it happens because the patch eats minus sign "-" before 
"05". And it is wrong to do that.

I attached new version of the patch. Now (expected output):

=# SELECT to_timestamp('2011-12-18 11:38 -05', '-MM-DD HH12:MI TZH');
  to_timestamp  

 2011-12-18 20:38:00+04

But these queries may confuse:

=# SELECT to_timestamp('2011-12-18 11:38 -05', '-MM-DD HH12:MI  TZH');
  to_timestamp  

 2011-12-18 10:38:00+04

=# SELECT to_timestamp('2011-12-18 11:38 -05', '-MM-DD HH12:MI -TZH');
  to_timestamp  

 2011-12-18 10:38:00+04

And these queries don't work anymore using new version of the patch:

=# SELECT to_timestamp('2000 + JUN', ' MON');
ERROR:  invalid value "+ J" for "MON"
DETAIL:  The given value did not match any of the allowed values for this field.

=# SELECT to_timestamp('2000 +   JUN', ' MON');
ERROR:  invalid value "+  " for "MON"
DETAIL:  The given value did not match any of the allowed values for this field.

Other queries mentioned in the thread work as before.

Any thoughts? If someone has better approach, feel free to share it.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2428434030..545129334c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6174,7 +6174,8 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');
to_timestamp and to_date
skip multiple blank spaces in the input string unless the
FX option is used. For example,
-   to_timestamp('2000JUN', ' 
MON') works, but
+   to_timestamp('2000JUN', ' 
MON') and
+   to_timestamp('2000JUN', 
'MON') work, but
to_timestamp('2000JUN', 'FX 
MON') returns an error
because to_timestamp expects one space only.
FX must be specified as the first item in
@@ -6182,6 +6183,19 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');
   
  
 
+ 
+  
+   to_timestamp and to_date don't
+   skip multiple printable non letter and non digit characters in the input
+   string, but skip them in the formatting string. For example,
+   to_timestamp('2000-JUN', '/MON') and
+   to_timestamp('2000/JUN', '//MON') work, but
+   to_timestamp('2000//JUN', '/MON')
+   returns an error because count of the "/" character in the input string
+   doesn't match count of it in the formatting string.
+  
+ 
+
  
   
Ordinary text is allowed in to_char
diff --git a/src/backend/utils/adt/formatting.c 
b/src/backend/utils/adt/formatting.c
index b8bd4caa3e..dddfc4b1bf 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -171,6 +171,8 @@ typedef struct
 #define NODE_TYPE_END  1
 #define NODE_TYPE_ACTION   2
 #define NODE_TYPE_CHAR 3
+#define NODE_TYPE_SEPARATOR4
+#define NODE_TYPE_SPACE5
 
 #define SUFFTYPE_PREFIX1
 #define SUFFTYPE_POSTFIX   2
@@ -961,6 +963,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int 
type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 const KeySuffix *suf, const int *index, int ver, 
NUMDesc *Num);
@@ -1050,6 +1053,16 @@ suff_search(const char *str, const KeySuffix *suf, int 
type)
return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+   /* ASCII printable character, but not letter or digit */
+   return (*str > 0x20 && *str < 0x7F &&
+   !(*str >= 'A' && *str <= 'Z') &&
+   !(*str >= 'a' && *str <= 'z') &&
+   !(*str >= '0' && *str <= '9'));
+}
+
 /* --
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * --
@@ -1325,7 +1338,14 @@ parse_format(FormatNode *node, const char *str, const 
KeyWord *kw,
if (*str == '\\' && *(str + 1) == '"')
str++;
chlen = pg_mblen(str);
-   n->type = NODE_TYPE_CHAR;
+
+   if (ver == DCH_TYPE && is_separator_char(str))
+  

Re: [HACKERS] Bug in to_timestamp().

2018-01-11 Thread Thomas Munro
On Thu, Nov 23, 2017 at 4:23 AM, Arthur Zakirov
 wrote:
> I've attached new version of the patch. It is a little bit simpler now than 
> the previous one.
> The patch doesn't handle backslashes now, since there was a commit which 
> fixes quoted-substring handling recently.
> Anyway I'm not sure that this handling was necessary.
>
> I've checked queries from this thread. It seems that they give right 
> timestamps and work like in Oracle (except different messages).
>
> The patch contains documentation and regression test fixes.

I'm guessing that commit 11b623dd0a2c385719ebbbdd42dd4ec395dcdc9d had
something to do with the following failure, when your patch is
applied:

 horology ... FAILED

= Contents of ./src/test/regress/regression.diffs
*** 
/home/travis/build/postgresql-cfbot/postgresql/src/test/regress/expected/horology.out
2018-01-11 17:11:49.744128698 +
--- 
/home/travis/build/postgresql-cfbot/postgresql/src/test/regress/results/horology.out
2018-01-11 17:18:53.988815652 +
***
*** 2972,2978 
  SELECT to_timestamp('2011-12-18 11:38 -05','-MM-DD HH12:MI TZH');
   to_timestamp
  --
!  Sun Dec 18 08:38:00 2011 PST
  (1 row)

  SELECT to_timestamp('2011-12-18 11:38 +05:20', '-MM-DD HH12:MI TZH:TZM');
--- 2972,2978 
  SELECT to_timestamp('2011-12-18 11:38 -05','-MM-DD HH12:MI TZH');
   to_timestamp
  --
!  Sat Dec 17 22:38:00 2011 PST
  (1 row)

  SELECT to_timestamp('2011-12-18 11:38 +05:20', '-MM-DD HH12:MI TZH:TZM');
***
*** 2984,2990 
  SELECT to_timestamp('2011-12-18 11:38 -05:20', '-MM-DD HH12:MI TZH:TZM');
   to_timestamp
  --
!  Sun Dec 18 08:58:00 2011 PST
  (1 row)

  SELECT to_timestamp('2011-12-18 11:38 20', '-MM-DD HH12:MI TZM');
--- 2984,2990 
  SELECT to_timestamp('2011-12-18 11:38 -05:20', '-MM-DD HH12:MI TZH:TZM');
   to_timestamp
  --
!  Sat Dec 17 22:18:00 2011 PST
  (1 row)

  SELECT to_timestamp('2011-12-18 11:38 20', '-MM-DD HH12:MI TZM');
==

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] Bug in to_timestamp().

2017-11-30 Thread Arthur Zakirov
On Thu, Nov 30, 2017 at 10:39:56AM -0500, Robert Haas wrote:
> 
> So is it now the case that all of the regression test cases in this
> patch produce the same results as they would on Oracle?
> 

Yes, exactly. All new cases give the same result as in Oracle, except
text of error messages.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2017-11-19 Thread Arthur Zakirov
On Sat, Nov 18, 2017 at 05:48:26PM -0500, Tom Lane wrote:
> This patch needs a rebase over the formatting.c fixes that have gone
> in over the last couple of days.
> 
> Looking at the rejects, I notice that in your changes to parse_format(),
> you seem to be making it rely even more heavily on remembering state about
> the previous input.  I recommend against that --- it was broken before,
> and it's a pretty fragile approach.  Backslashes are not that hard to
> deal with in-line.

I can continue to work on a better approach. Though, the patch was made a long 
time
ago I'll refresh my memory. The main intent was to fix the following
behaviour:

=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36', '/MM/DD  HH24:MI:SS');
to_timestamp 

2016-06-13 05:43:36-07   <— incorrect time

> select to_timestamp('97/Feb/16', 'YY:Mon:DD')
> select to_timestamp('97/Feb/16', 'YY Mon DD')
> select to_timestamp('97 Feb 16', 'YY/Mon/DD')
> 
> (Well, Oracle thinks these mean 2097 where we think 1997, but the point is
> you don't get an error.)  I see from your regression test additions that
> you want to make some of these throw an error, and I think that any such
> proposal is dead in the water.  Nobody is going to consider it an
> improvement if it both breaks working PG apps and disagrees with Oracle.
> 
>   regards, tom lane

If I understand correctly, these queries don't throw an error and the patch 
tried to follow Oracle's rules.
Only queries with FX field throw an error. For example:

=# SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
ERROR:  unexpected character "/", expected ":"

>From Oracle's documentation [1]:

> FX - Requires exact matching between the character data and the format model.

I agree that compatibility breaking is not good and a fu
ure patch may only try to fix wrong output date and time as in Amul's first 
email.

1 - 
https://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements004.htm#i34924

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2017-11-18 Thread Tom Lane
Artur Zakirov  writes:
> [ 0001-to-timestamp-format-checking-v7.patch ]

This patch needs a rebase over the formatting.c fixes that have gone
in over the last couple of days.

Looking at the rejects, I notice that in your changes to parse_format(),
you seem to be making it rely even more heavily on remembering state about
the previous input.  I recommend against that --- it was broken before,
and it's a pretty fragile approach.  Backslashes are not that hard to
deal with in-line.

The larger picture though is that I'm not real sure that this patch is
going in the direction we want.  All of these cases work in both our
code and Oracle:

select to_timestamp('97/Feb/16', 'YY:Mon:DD')
select to_timestamp('97/Feb/16', 'YY Mon DD')
select to_timestamp('97 Feb 16', 'YY/Mon/DD')

(Well, Oracle thinks these mean 2097 where we think 1997, but the point is
you don't get an error.)  I see from your regression test additions that
you want to make some of these throw an error, and I think that any such
proposal is dead in the water.  Nobody is going to consider it an
improvement if it both breaks working PG apps and disagrees with Oracle.

regards, tom lane