Re: [HACKERS] Rectifying wrong Date outputs

2011-09-07 Thread Bruce Momjian

Applied, with a function rename.  The only odd case we have left is:

test= select to_date('079', 'YYY');
  to_date

 1979-01-01
(1 row)

(Note the zero is ignored.)  I can't see an easy way to fix this and
continue to be easily documented.

---

Bruce Momjian wrote:
 Bruce Momjian wrote:
  Piyush Newe wrote:
   Hi,
   
   I was randomly testing some date related stuff on PG  observed that the
   outputs were wrong.
   
   e.g.
   postgres=# SELECT TO_DATE('01-jan-2010',  'DD-MON-YY');
 to_date
   
3910-01-01  - Look at this
   (1 row)
   
   postgres=# SELECT TO_DATE('01-jan-2010',  'DD-MON-');
 to_date
   
2010-01-01
   (1 row)
  
  I have done some work on this problem, and have developed the attached
  patch.  It genarates the output in the final column of this table:
  
  Oracle  PostgreSQL  
  With PG Patch
   1  TO_DATE('01-jan-1',  'DD-MON-Y')01-JAN-2011 01-JAN-2001 
  01-JAN-2001+
   2  TO_DATE('01-jan-1',  'DD-MON-YY')   01-JAN-2001 01-JAN-2001 
  01-JAN-2001
   3  TO_DATE('01-jan-1',  'DD-MON-YYY')  01-JAN-2001 01-JAN-2001 
  01-JAN-2001
   4  TO_DATE('01-jan-1',  'DD-MON-') 01-JAN-0001 01-JAN-0001 
  01-JAN-0001
   5  TO_DATE('01-jan-10',  'DD-MON-Y')   Error   01-JAN-2010 
  01-JAN-2010
   6  TO_DATE('01-jan-10',  'DD-MON-YY')  01-JAN-2010 01-JAN-2010 
  01-JAN-2010
   7  TO_DATE('01-jan-10',  'DD-MON-YYY') 01-JAN-2010 01-JAN-2010 
  01-JAN-2010
   8  TO_DATE('01-jan-10',  'DD-MON-')01-JAN-0010 01-JAN-0010 
  01-JAN-0010
   9  TO_DATE('01-jan-067',  'DD-MON-Y')  Error   01-JAN-2067 
  01-JAN-2067
  10  TO_DATE('01-jan-111',  'DD-MON-YY') 01-JAN-0111 01-JAN-2011 
  01-JAN-2111*+
  11  TO_DATE('01-jan-678',  'DD-MON-YYY')01-JAN-2678 01-JAN-1678 
  01-JAN-1678+
  12  TO_DATE('01-jan-001',  'DD-MON-')   01-JAN-0001 01-JAN-0001 
  01-JAN-0001
  13  TO_DATE('01-jan-2010',  'DD-MON-Y') Error   01-JAN-4010 
  01-JAN-2010*
  14  TO_DATE('01-jan-2010',  'DD-MON-YY')01-JAN-2010 01-JAN-3910 
  01-JAN-2010*
  15  TO_DATE('01-jan-2010',  'DD-MON-YYY')   Error   01-JAN-3010 
  01-JAN-2010*
  16  TO_DATE('01-jan-2010',  'DD-MON-')  01-JAN-2010 01-JAN-2010 
  01-JAN-2010
 
 In an attempt to make the to_date/to_timestamp behavior documentable, I
 have modified the patch to have dates adjust toward the year 2020, and
 added code so if four digits are supplied, we don't do any adjustment. 
 Here is the current odd behavior, which is fixed by the patch:
 
   test= select to_date('222', 'YYY');
 to_date
   
-01-01
   (1 row)
   
   test= select to_date('0222', 'YYY');
 to_date
   
-01-01
   (1 row)
 
 If they supply a full 4-digit year, it seems we should honor that, even
 for YYY.   still does no adjustment, and I doubt we want to change
 that:
 
   test= select to_date('222', '');
 to_date
   
0222-01-01
   (1 row)
   
   test= select to_date('0222', '');
 to_date
   
0222-01-01
   (1 row)
 
 -- 
   Bruce Momjian  br...@momjian.ushttp://momjian.us
   EnterpriseDB http://enterprisedb.com
 
   + It's impossible for everything to be true. +

[ text/x-diff is unsupported, treating like TEXT/PLAIN ]

 diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
 new file mode 100644
 index c03dd6c..282bb0d
 *** a/doc/src/sgml/func.sgml
 --- b/doc/src/sgml/func.sgml
 *** SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1
 *** 5550, 
 --- 5550,5564 
   
listitem
 para
 +If the year format specification is less than four digits, e.g.
 +literalYYY/, and the supplied year is less than four digits,
 +the year will be adjusted to be nearest to year 2020, e.g.
 +literal95/ becomes 1995.
 +   /para
 +  /listitem
 + 
 +  listitem
 +   para
  The literal/literal conversion from string to 
 typetimestamp/type or
  typedate/type has a restriction when processing years with more 
 than 4 digits. You must
  use some non-digit character or template after 
 literal/literal,
 diff --git a/src/backend/utils/adt/formatting.c 
 b/src/backend/utils/adt/formatting.c
 new file mode 100644
 index 726a1f4..1a3ec1c
 *** a/src/backend/utils/adt/formatting.c
 --- b/src/backend/utils/adt/formatting.c
 *** static void dump_node(FormatNode *node, 
 *** 964,969 
 --- 964,970 
   
   static char *get_th(char *num, int type);
   static char *str_numth(char *dest, 

Re: [HACKERS] Rectifying wrong Date outputs

2011-09-06 Thread Bruce Momjian
Bruce Momjian wrote:
 Piyush Newe wrote:
  Hi,
  
  I was randomly testing some date related stuff on PG  observed that the
  outputs were wrong.
  
  e.g.
  postgres=# SELECT TO_DATE('01-jan-2010',  'DD-MON-YY');
to_date
  
   3910-01-01  - Look at this
  (1 row)
  
  postgres=# SELECT TO_DATE('01-jan-2010',  'DD-MON-');
to_date
  
   2010-01-01
  (1 row)
 
 I have done some work on this problem, and have developed the attached
 patch.  It genarates the output in the final column of this table:
 
 Oracle  PostgreSQL  
 With PG Patch
  1  TO_DATE('01-jan-1',  'DD-MON-Y')01-JAN-2011 01-JAN-2001 
 01-JAN-2001+
  2  TO_DATE('01-jan-1',  'DD-MON-YY')   01-JAN-2001 01-JAN-2001 
 01-JAN-2001
  3  TO_DATE('01-jan-1',  'DD-MON-YYY')  01-JAN-2001 01-JAN-2001 
 01-JAN-2001
  4  TO_DATE('01-jan-1',  'DD-MON-') 01-JAN-0001 01-JAN-0001 
 01-JAN-0001
  5  TO_DATE('01-jan-10',  'DD-MON-Y')   Error   01-JAN-2010 
 01-JAN-2010
  6  TO_DATE('01-jan-10',  'DD-MON-YY')  01-JAN-2010 01-JAN-2010 
 01-JAN-2010
  7  TO_DATE('01-jan-10',  'DD-MON-YYY') 01-JAN-2010 01-JAN-2010 
 01-JAN-2010
  8  TO_DATE('01-jan-10',  'DD-MON-')01-JAN-0010 01-JAN-0010 
 01-JAN-0010
  9  TO_DATE('01-jan-067',  'DD-MON-Y')  Error   01-JAN-2067 
 01-JAN-2067
 10  TO_DATE('01-jan-111',  'DD-MON-YY') 01-JAN-0111 01-JAN-2011 
 01-JAN-2111*+
 11  TO_DATE('01-jan-678',  'DD-MON-YYY')01-JAN-2678 01-JAN-1678 
 01-JAN-1678+
 12  TO_DATE('01-jan-001',  'DD-MON-')   01-JAN-0001 01-JAN-0001 
 01-JAN-0001
 13  TO_DATE('01-jan-2010',  'DD-MON-Y') Error   01-JAN-4010 
 01-JAN-2010*
 14  TO_DATE('01-jan-2010',  'DD-MON-YY')01-JAN-2010 01-JAN-3910 
 01-JAN-2010*
 15  TO_DATE('01-jan-2010',  'DD-MON-YYY')   Error   01-JAN-3010 
 01-JAN-2010*
 16  TO_DATE('01-jan-2010',  'DD-MON-')  01-JAN-2010 01-JAN-2010 
 01-JAN-2010

In an attempt to make the to_date/to_timestamp behavior documentable, I
have modified the patch to have dates adjust toward the year 2020, and
added code so if four digits are supplied, we don't do any adjustment. 
Here is the current odd behavior, which is fixed by the patch:

test= select to_date('222', 'YYY');
  to_date

 -01-01
(1 row)

test= select to_date('0222', 'YYY');
  to_date

 -01-01
(1 row)

If they supply a full 4-digit year, it seems we should honor that, even
for YYY.   still does no adjustment, and I doubt we want to change
that:

test= select to_date('222', '');
  to_date

 0222-01-01
(1 row)

test= select to_date('0222', '');
  to_date

 0222-01-01
(1 row)

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index c03dd6c..282bb0d
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1
*** 5550, 
--- 5550,5564 
  
   listitem
para
+If the year format specification is less than four digits, e.g.
+literalYYY/, and the supplied year is less than four digits,
+the year will be adjusted to be nearest to year 2020, e.g.
+literal95/ becomes 1995.
+   /para
+  /listitem
+ 
+  listitem
+   para
 The literal/literal conversion from string to typetimestamp/type or
 typedate/type has a restriction when processing years with more than 4 digits. You must
 use some non-digit character or template after literal/literal,
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
new file mode 100644
index 726a1f4..1a3ec1c
*** a/src/backend/utils/adt/formatting.c
--- b/src/backend/utils/adt/formatting.c
*** static void dump_node(FormatNode *node, 
*** 964,969 
--- 964,970 
  
  static char *get_th(char *num, int type);
  static char *str_numth(char *dest, char *num, int type);
+ static int	add_era_to_partial_year(int year);
  static int	strspace_len(char *str);
  static int	strdigits_len(char *str);
  static void from_char_set_mode(TmFromChar *tmfc, const FromCharDateMode mode);
*** is_next_separator(FormatNode *n)
*** 1968,1973 
--- 1969,1999 
  	return TRUE;/* some non-digit input (separator) */
  }
  
+ 
+ static int
+ add_era_to_partial_year(int year)
+ {
+ 	/*
+ 	 * Adjust all dates toward 2020;  this is effectively what happens
+ 	 * when we assume '70' is 1970 

Re: [HACKERS] Rectifying wrong Date outputs

2011-09-05 Thread Bruce Momjian
Piyush Newe wrote:
 Hi,
 
 I was randomly testing some date related stuff on PG  observed that the
 outputs were wrong.
 
 e.g.
 postgres=# SELECT TO_DATE('01-jan-2010',  'DD-MON-YY');
   to_date
 
  3910-01-01  - Look at this
 (1 row)
 
 postgres=# SELECT TO_DATE('01-jan-2010',  'DD-MON-');
   to_date
 
  2010-01-01
 (1 row)

I have done some work on this problem, and have developed the attached
patch.  It genarates the output in the final column of this table:

Oracle  PostgreSQL  
With PG Patch
 1  TO_DATE('01-jan-1',  'DD-MON-Y')01-JAN-2011 01-JAN-2001 
01-JAN-2001+
 2  TO_DATE('01-jan-1',  'DD-MON-YY')   01-JAN-2001 01-JAN-2001 
01-JAN-2001
 3  TO_DATE('01-jan-1',  'DD-MON-YYY')  01-JAN-2001 01-JAN-2001 
01-JAN-2001
 4  TO_DATE('01-jan-1',  'DD-MON-') 01-JAN-0001 01-JAN-0001 
01-JAN-0001
 5  TO_DATE('01-jan-10',  'DD-MON-Y')   Error   01-JAN-2010 
01-JAN-2010
 6  TO_DATE('01-jan-10',  'DD-MON-YY')  01-JAN-2010 01-JAN-2010 
01-JAN-2010
 7  TO_DATE('01-jan-10',  'DD-MON-YYY') 01-JAN-2010 01-JAN-2010 
01-JAN-2010
 8  TO_DATE('01-jan-10',  'DD-MON-')01-JAN-0010 01-JAN-0010 
01-JAN-0010
 9  TO_DATE('01-jan-067',  'DD-MON-Y')  Error   01-JAN-2067 
01-JAN-2067
10  TO_DATE('01-jan-111',  'DD-MON-YY') 01-JAN-0111 01-JAN-2011 
01-JAN-2111*+
11  TO_DATE('01-jan-678',  'DD-MON-YYY')01-JAN-2678 01-JAN-1678 
01-JAN-1678+
12  TO_DATE('01-jan-001',  'DD-MON-')   01-JAN-0001 01-JAN-0001 
01-JAN-0001
13  TO_DATE('01-jan-2010',  'DD-MON-Y') Error   01-JAN-4010 
01-JAN-2010*
14  TO_DATE('01-jan-2010',  'DD-MON-YY')01-JAN-2010 01-JAN-3910 
01-JAN-2010*
15  TO_DATE('01-jan-2010',  'DD-MON-YYY')   Error   01-JAN-3010 
01-JAN-2010*
16  TO_DATE('01-jan-2010',  'DD-MON-')  01-JAN-2010 01-JAN-2010 
01-JAN-2010

I marked with '*' every case where the patch doesn't match current PG,
and used a '+' to mark every case where it doesn't match Oracle.

I know Tom was worried that because the year field took more digits than
specified, it would prevent numeric columns from being pulled apart, but
our code has this check:

if (S_FM(node-suffix) || is_next_separator(node))
{
/*
 * This node is in Fill Mode, or the next node is known to be a
 * non-digit value, so we just slurp as many characters as we can get.
 */
errno = 0;
result = strtol(init, src, 10);
}

The reason these tests are accepting an unlimited number of digits is
because it is at the end of the string.  If you place a digit field
right after it, it will not use more characters than specified:

test= select to_date('9876', 'YYY');
  to_date

 9876-01-01
(1 row)

test= select to_date('9876', 'YYYMM');
  to_date

 1987-06-01
(1 row)

Yes, not documented, but I assume the coder was trying to be helpful.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
new file mode 100644
index 726a1f4..f4677af
*** a/src/backend/utils/adt/formatting.c
--- b/src/backend/utils/adt/formatting.c
*** static void dump_node(FormatNode *node, 
*** 964,969 
--- 964,970 
  
  static char *get_th(char *num, int type);
  static char *str_numth(char *dest, char *num, int type);
+ static int add_era_to_partial_year(int year);
  static int	strspace_len(char *str);
  static int	strdigits_len(char *str);
  static void from_char_set_mode(TmFromChar *tmfc, const FromCharDateMode mode);
*** is_next_separator(FormatNode *n)
*** 1968,1973 
--- 1969,1995 
  	return TRUE;/* some non-digit input (separator) */
  }
  
+ 
+ static int
+ add_era_to_partial_year(int year)
+ {
+ 	/* Force 0-69 into the 2000's */
+ 	if (year  70)
+ 		return year + 2000;
+ 	/* Force 70-99 into the 1900's */
+ 	else if (year = 70  year  100)
+ 		return year + 1900;
+ 	/* Force 100-499 into the 2000's */
+ 	else if (year = 100  year  500)
+ 		return year + 2000;
+ 	/* Force 500-999 into the 1000's */
+ 	else if (year = 500  year  1000)
+ 		return year + 1000;
+ 	else
+ 		return year;
+ }
+ 
+ 
  static int
  strspace_len(char *str)
  {
*** DCH_from_char(FormatNode *node, char *in
*** 2931,2972 
  			case DCH_YYY:
  			case DCH_IYY:
  from_char_parse_int(out-year, s, n);
  out-yysz = 3;
- 
- /*
-  * 3-digit year: '100' ... '999' = 1100 ... 1999 '000' ...
-  * '099' = 2000 ... 2099
-  */
- if (out-year = 100)
- 	out-year += 1000;
- else
- 	out-year += 2000;
  

Re: [HACKERS] Rectifying wrong Date outputs

2011-03-24 Thread Piyush Newe
Any inputs further ?

On Tue, Mar 22, 2011 at 2:37 PM, Piyush Newe
piyush.n...@enterprisedb.comwrote:

 Thanks Heikki, Tom  Robert for your valuable inputs.

 According to the code, PG is behaving what is mentioned below.

 1. Format = Y
 0 ... 9 = 2000 ... 2009  (we are always adding 2000 to the year)

 2. Format = YY
 00 ... 69 = 2000 ... 2069  (we are adding 2000 to the year)
 70 ... 99 = 1970 ... 1999  (we are adding 1900 to the year)

 3. Format = YYY
 100 ... 999 = 1100 ... 1999 (we are adding 1000 to the year)
 000 ... 099 = 2000 ... 2099 (we are adding 2000 to the year)

 4. Format = 
 Any number -4712 to 

 If we closely look at the code, we can say that, if the given INPUT value
 is not in the range of the DESIRED format (i.e. Y/YY/YYY/), then it
 results some weired YEAR.

 e.g.
 1. TO_DATE('01-jan-2010',  'DD-MON-Y')
 Here it falls in the 1st format case i.e. Y. As per the code, we are
 blindly adding 2000 in the year value, hence the result is returned as
 4010.

 2. TO_DATE('01-jan-2010',  'DD-MON-YYY')
 Here it falls in the 3rd case i.e. YYY. As per the code, without checking
 the input we are adding 1000 to the value  hence it results 3010.

 IMHO, before deciding the century we should check what is the INPUT. This
 check is missing in the code. As Heikki said, we really don't have such
 document anywhere. We need to atleast document what we are doing. However,
 if we are doing something vague then we need to decide what we are really
 going to follow. We need to follow one standard, then it might be either
 Oracle or something else.

 It might happened that the given input contains the Century. In this case,
 we should intelligent enough and decide not to add anything to the year. To
 avoid this situation, I still feel we have to follow the format given by the
 user. i.e.  if the number of digits specified exceeds the number of Y,
 lets throw an error. This will make our life easier  relatively easier to
 fix.

 Robert,
 If we follow the standard what Oracle is using, we will not break any
 case.

 One more observation in Oracle

 SQL SELECT to_char(TO_DATE('01-jan-0001',  'DD-MON-YY'), 'DD-MON-')
 from dual;

 TO_CHAR(TO_DATE('01-
 
 01-JAN-0001

 SQL  SELECT to_char(TO_DATE('01-jan-1',  'DD-MON-YY'), 'DD-MON-') from
 dual;

 TO_CHAR(TO_DATE('01-
 
 01-JAN-2001

 If we observe this closely, in the later case, century is not given in the
 input, hence Oracle is taking the current century. But the first case is bit
 puzzling. Because if we convert the '0001' to number it is still interprets
 as 1, then why it is not adding current century their ? I think it is
 checking the INPUT  accordingly taking the decision.

 Anyways, will it work if we check the input before deciding the century ?


 On Mon, Mar 21, 2011 at 10:35 PM, Robert Haas robertmh...@gmail.comwrote:

 On Mon, Mar 21, 2011 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  As far as I can see, that would completely destroy the use-case of
  trying to parse a string where there's not non-digit delimiters and
  so you have to take exactly the specified number of digits, not more.
 
  Yeah, I thought about that, but it seems that use case is already
  hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to
  worry about it.
 
  How so?
 
  regression=# select to_date('20110321', 'MMDD');
   to_date
  
   2011-03-21
  (1 row)
 
  regression=# select to_date('110321', 'YYMMDD');
   to_date
  
   2011-03-21
  (1 row)
 
  If you break the latter case, I am sure the villagers will be on your
  doorstep shortly.

 Oh, dear.  No wonder this code is so hard to get right.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




 --
 --
 Piyush S Newe
 Principal Engineer
 EnterpriseDB
 office: +91 20 3058 9500
 www.enterprisedb.com

 Website: www.enterprisedb.com
 EnterpriseDB Blog: http://blogs.enterprisedb.com/
 Follow us on Twitter: http://www.twitter.com/enterprisedb

 This e-mail message (and any attachment) is intended for the use of the
 individual or entity to whom it is addressed. This message contains
 information from EnterpriseDB Corporation that may be privileged,
 confidential, or exempt from disclosure under applicable law. If you are not
 the intended recipient or authorized to receive this for the intended
 recipient, any use, dissemination, distribution, retention, archiving, or
 copying of this communication is strictly prohibited. If you have received
 this e-mail in error, please notify the sender immediately by reply e-mail
 and delete this message.




-- 
-- 
Piyush S Newe
Principal Engineer
EnterpriseDB
office: +91 20 3058 9500
www.enterprisedb.com

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: 

Re: [HACKERS] Rectifying wrong Date outputs

2011-03-22 Thread Piyush Newe
Thanks Heikki, Tom  Robert for your valuable inputs.

According to the code, PG is behaving what is mentioned below.

1. Format = Y
0 ... 9 = 2000 ... 2009  (we are always adding 2000 to the year)

2. Format = YY
00 ... 69 = 2000 ... 2069  (we are adding 2000 to the year)
70 ... 99 = 1970 ... 1999  (we are adding 1900 to the year)

3. Format = YYY
100 ... 999 = 1100 ... 1999 (we are adding 1000 to the year)
000 ... 099 = 2000 ... 2099 (we are adding 2000 to the year)

4. Format = 
Any number -4712 to 

If we closely look at the code, we can say that, if the given INPUT value is
not in the range of the DESIRED format (i.e. Y/YY/YYY/), then it results
some weired YEAR.

e.g.
1. TO_DATE('01-jan-2010',  'DD-MON-Y')
Here it falls in the 1st format case i.e. Y. As per the code, we are
blindly adding 2000 in the year value, hence the result is returned as
4010.

2. TO_DATE('01-jan-2010',  'DD-MON-YYY')
Here it falls in the 3rd case i.e. YYY. As per the code, without checking
the input we are adding 1000 to the value  hence it results 3010.

IMHO, before deciding the century we should check what is the INPUT. This
check is missing in the code. As Heikki said, we really don't have such
document anywhere. We need to atleast document what we are doing. However,
if we are doing something vague then we need to decide what we are really
going to follow. We need to follow one standard, then it might be either
Oracle or something else.

It might happened that the given input contains the Century. In this case,
we should intelligent enough and decide not to add anything to the year. To
avoid this situation, I still feel we have to follow the format given by the
user. i.e.  if the number of digits specified exceeds the number of Y,
lets throw an error. This will make our life easier  relatively easier to
fix.

Robert,
If we follow the standard what Oracle is using, we will not break any case.

One more observation in Oracle

SQL SELECT to_char(TO_DATE('01-jan-0001',  'DD-MON-YY'), 'DD-MON-')
from dual;

TO_CHAR(TO_DATE('01-

01-JAN-0001

SQL  SELECT to_char(TO_DATE('01-jan-1',  'DD-MON-YY'), 'DD-MON-') from
dual;

TO_CHAR(TO_DATE('01-

01-JAN-2001

If we observe this closely, in the later case, century is not given in the
input, hence Oracle is taking the current century. But the first case is bit
puzzling. Because if we convert the '0001' to number it is still interprets
as 1, then why it is not adding current century their ? I think it is
checking the INPUT  accordingly taking the decision.

Anyways, will it work if we check the input before deciding the century ?


On Mon, Mar 21, 2011 at 10:35 PM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Mar 21, 2011 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  As far as I can see, that would completely destroy the use-case of
  trying to parse a string where there's not non-digit delimiters and
  so you have to take exactly the specified number of digits, not more.
 
  Yeah, I thought about that, but it seems that use case is already
  hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to
  worry about it.
 
  How so?
 
  regression=# select to_date('20110321', 'MMDD');
   to_date
  
   2011-03-21
  (1 row)
 
  regression=# select to_date('110321', 'YYMMDD');
   to_date
  
   2011-03-21
  (1 row)
 
  If you break the latter case, I am sure the villagers will be on your
  doorstep shortly.

 Oh, dear.  No wonder this code is so hard to get right.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




-- 
-- 
Piyush S Newe
Principal Engineer
EnterpriseDB
office: +91 20 3058 9500
www.enterprisedb.com

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are not
the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-21 Thread Heikki Linnakangas

On 21.03.2011 07:40, Piyush Newe wrote:

On Thu, Mar 17, 2011 at 7:56 PM, Tom Lanet...@sss.pgh.pa.us  wrote:


Robert Haasrobertmh...@gmail.com  writes:

On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
alvhe...@commandprompt.com  wrote:

Keep in mind that the datetime stuff was abandoned by the maintainer
some years ago with quite some rough edges.  Some of it has been fixed,
but a lot of bugs remain.  Looks like this is one of those places and it
seems appropriate to spend some time fixing it.  Since it would involve
a behavior change, it should only go to 9.2, of course.



I wouldn't object to fixing the problem with # of digits  # of Ys in
9.1, if the fix is simple and clear-cut.  I think we are still
accepting patches to make minor tweaks, like the tab-completion patch
I committed yesterday.  It also doesn't bother me tremendously if we
push it off, but I don't think that anyone's going to be too sad if
TO_DATE('01-jan-2010',  'DD-MON-YYY') starts returning something more
sensible than 3010-01-01.


Agreed, it's certainly not too late for bug fixes in 9.1.  I agree
that this isn't something we would want to tweak in released branches,
but 9.1 isn't there yet.



I feel the patch for the same would be easier and was attached in the
initial mail of this mail thread. For your ready reference, I am attaching
the same patch here again.



Having said that, it's not entirely clear to me what sane behavior is
here.  Personally I would expect that an n-Ys format spec would consume
at most n digits from the input.  Otherwise how are you going to use
to_date to pick apart strings that don't have any separators?


Yeah, seems reasonable.


 So
I think the problem is actually upstream of the behavior complained of
here.  However, what we should first do is see what Oracle does in such
cases, because the main driving factor for these functions is Oracle
compatibility not what might seem sane in a vacuum.


Following is the extended chart which is comparing the behavior of Oracle,
PG  EDBAS.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-1',  'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1',  'DD-MON-YY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1',  'DD-MON-YYY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1',  'DD-MON-') 01-JAN-0001 01-JAN-0001 01-JAN-0001

In this case, all the cases are in sync except the 1st one. I didn't
understand why Oracle is interpreting year '1' as '2011'.


It makes sense to me. Year 1, when dat format is Y, means the year 
closest to current date that ends with 1. Or maybe the year that ends 
with 1 in the current decade. This is analoguous to how two-digit years 
are interpreted (except that we've hardcoded that the current date to 
compare against is year 2000 - an assumption that will start to bite us 
some time before year 2100).



*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-10',  'DD-MON-Y') Error 01-JAN-2010 Error
TO_DATE('01-jan-10',  'DD-MON-YY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
TO_DATE('01-jan-10',  'DD-MON-YYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
TO_DATE('01-jan-10',  'DD-MON-') 01-JAN-0010 01-JAN-0010 01-JAN-0010

In this case, it seems in last 3 cases PostgreSQL is behaving correctly.
Oracle is throwing error in 1st case since the Format ('Y') is lesser than
the actual value ('10'). But PostgreSQL is ignoring this case and throwing
whatever is input. The output is might not be the same was user is
expecting.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-067',  'DD-MON-Y') Error 01-JAN-2067 Error
TO_DATE('01-jan-111',  'DD-MON-YY') 01-JAN-0111 01-JAN-2011 Error
TO_DATE('01-jan-678',  'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-2678
TO_DATE('01-jan-001',  'DD-MON-') 01-JAN-0001 01-JAN-0001 01-JAN-0001

In this case, just last case was correct in PG. Rest other cases are not in
sync with Oracle, rather the output is vague.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-2010',  'DD-MON-Y') Error 01-JAN-4010 Error
TO_DATE('01-jan-2010',  'DD-MON-YY') 01-JAN-2010 01-JAN-3910 Error
TO_DATE('01-jan-2010',  'DD-MON-YYY') Error 01-JAN-3010 Error
TO_DATE('01-jan-2010',  'DD-MON-') 01-JAN-2010 01-JAN-2010 01-JAN-2010

In this case, PG is giving wrong output in first 3 cases. Those need to get
rectified. Oracle is throwing error in 1st and 3rd case and the reason is,
the format is lesser than the actual value. It seems this rule is not
applicable for 2nd case in Oracle.


Yeah, quite inconsistent :-(.

These results are not in favor of the idea that a format with n Ys 
always consumess up to n digits from the input. With that rule, 
to_date('01-jan-2010', 'DD-MON-YY') would return 01-JAN-2020, which 
isn't what Oracle does and seems quite surprising too.


So ignoring the cases where Oracle throws an error but PostgreSQL 
doesn't, there's four cases where the results differ:


 *Data Format Oracle PostgreSQL EDBAS*
 TO_DATE('01-jan-1',  'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
 

Re: [HACKERS] Rectifying wrong Date outputs

2011-03-21 Thread Robert Haas
On Mon, Mar 21, 2011 at 6:24 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Having said that, it's not entirely clear to me what sane behavior is
 here.  Personally I would expect that an n-Ys format spec would consume
 at most n digits from the input.  Otherwise how are you going to use
 to_date to pick apart strings that don't have any separators?

 Yeah, seems reasonable.

On the flip side, what if you want to allow either a two digit year or
a four digit year?  It doesn't seem unreasonable to allow YY to
emcompass what  would have allowed, unless there's a separate
notion for 'either YY or '.

 It makes sense to me. Year 1, when dat format is Y, means the year
 closest to current date that ends with 1. Or maybe the year that ends with 1
 in the current decade. This is analoguous to how two-digit years are
 interpreted (except that we've hardcoded that the current date to compare
 against is year 2000 - an assumption that will start to bite us some time
 before year 2100).

Agree with all of this.

 So ignoring the cases where Oracle throws an error but PostgreSQL doesn't,
 there's four cases where the results differ:

 *Data Format Oracle PostgreSQL EDBAS*
 TO_DATE('01-jan-1',  'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
 TO_DATE('01-jan-111',  'DD-MON-YY') 01-JAN-0111 01-JAN-2011 Error
 TO_DATE('01-jan-678',  'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-2678
 TO_DATE('01-jan-2010',  'DD-MON-YY') 01-JAN-2010 01-JAN-3910 Error

 IMHO our current behavior in 2nd and 4th case is so bizarre that we should
 change them to match Oracle. I think we should fix the 1st too, the notion
 that a single-digit year means something between 2000-2009 seems pretty
 useless (granted, using a single digit for year is brain-dead to begin
 with).

I agree, but do we understand what Oracle does categorically, rather
than just its output on this specific input?

 The 3rd one is debatable. The range for three-digit years is currently
 1100-2099, which is enough range for many applications. But should we change
 it for the sake of matching Oracle's behavior? Not that anyone uses YYY in
 practice, but still.

I'm OK with that, but again, exactly what rule is Oracle applying here?

 BTW, whatever behavior we choose, this needs to be documented. I don't see
 anything in the docs on how Y, YY or YYY are expanded.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 21, 2011 at 6:24 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Having said that, it's not entirely clear to me what sane behavior is
 here.  Personally I would expect that an n-Ys format spec would consume
 at most n digits from the input.  Otherwise how are you going to use
 to_date to pick apart strings that don't have any separators?

 Yeah, seems reasonable.

 On the flip side, what if you want to allow either a two digit year or
 a four digit year?  It doesn't seem unreasonable to allow YY to
 emcompass what  would have allowed, unless there's a separate
 notion for 'either YY or '.

What I was thinking was that  would take either 2 or 4 digits.
Whatever you do here, the year will have to be delimited by a non-digit
for such cases to be parseable.

 I'm OK with that, but again, exactly what rule is Oracle applying here?

Yeah.  Hopefully they documented it, and we don't have to try to
reverse-engineer the intention from an undersized set of samples.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-21 Thread Robert Haas
On Mon, Mar 21, 2011 at 9:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 21, 2011 at 6:24 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Having said that, it's not entirely clear to me what sane behavior is
 here.  Personally I would expect that an n-Ys format spec would consume
 at most n digits from the input.  Otherwise how are you going to use
 to_date to pick apart strings that don't have any separators?

 Yeah, seems reasonable.

 On the flip side, what if you want to allow either a two digit year or
 a four digit year?  It doesn't seem unreasonable to allow YY to
 emcompass what  would have allowed, unless there's a separate
 notion for 'either YY or '.

 What I was thinking was that  would take either 2 or 4 digits.
 Whatever you do here, the year will have to be delimited by a non-digit
 for such cases to be parseable.

I was assuming a slightly more general variant of that - namely, Y,
YY, or YYY would all accept that many digits, or more; and the result
of Y with 2, 3, or 4 digits would be the same as if YY, YYY, or ,
respectively, had been used.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 21, 2011 at 9:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 What I was thinking was that  would take either 2 or 4 digits.
 Whatever you do here, the year will have to be delimited by a non-digit
 for such cases to be parseable.

 I was assuming a slightly more general variant of that - namely, Y,
 YY, or YYY would all accept that many digits, or more; and the result
 of Y with 2, 3, or 4 digits would be the same as if YY, YYY, or ,
 respectively, had been used.

As far as I can see, that would completely destroy the use-case of
trying to parse a string where there's not non-digit delimiters and
so you have to take exactly the specified number of digits, not more.

Why not head in the other direction of allowing fewer digits than
suggested by the format, instead of more?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-21 Thread Robert Haas
On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 21, 2011 at 9:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 What I was thinking was that  would take either 2 or 4 digits.
 Whatever you do here, the year will have to be delimited by a non-digit
 for such cases to be parseable.

 I was assuming a slightly more general variant of that - namely, Y,
 YY, or YYY would all accept that many digits, or more; and the result
 of Y with 2, 3, or 4 digits would be the same as if YY, YYY, or ,
 respectively, had been used.

 As far as I can see, that would completely destroy the use-case of
 trying to parse a string where there's not non-digit delimiters and
 so you have to take exactly the specified number of digits, not more.

Yeah, I thought about that, but it seems that use case is already
hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to
worry about it.  If Piyush's table is to be believed, Oracle only
throws an error for the wrong number of digits if the format is Y or
YYY, and the actual number of digits is more.  If the format is YY,
then it accepts 2, 3, or 4 digit years.  And since YY is exponentially
more likely to be used than Y or YYY, that pretty much means you can't
do what you're talking about using this syntax anyway.

 Why not head in the other direction of allowing fewer digits than
 suggested by the format, instead of more?

Well, that seems a bit counterintuitive to me.  I think it's much more
likely that someone wants to insist on a four-digit year (and not
allow just two digits) than that they want to insist on a two-digit
year (and not allow four digits).  I also think that would be pretty
terrible for Oracle compatibility, since they're clearly interpreting
99 vs.  as meaning either 0099, not 1999 or 2099.  I don't think
we want to be randomly incompatible there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 As far as I can see, that would completely destroy the use-case of
 trying to parse a string where there's not non-digit delimiters and
 so you have to take exactly the specified number of digits, not more.

 Yeah, I thought about that, but it seems that use case is already
 hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to
 worry about it.

How so?

regression=# select to_date('20110321', 'MMDD');
  to_date   

 2011-03-21
(1 row)

regression=# select to_date('110321', 'YYMMDD');
  to_date   

 2011-03-21
(1 row)

If you break the latter case, I am sure the villagers will be on your
doorstep shortly.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-21 Thread Robert Haas
On Mon, Mar 21, 2011 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 As far as I can see, that would completely destroy the use-case of
 trying to parse a string where there's not non-digit delimiters and
 so you have to take exactly the specified number of digits, not more.

 Yeah, I thought about that, but it seems that use case is already
 hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to
 worry about it.

 How so?

 regression=# select to_date('20110321', 'MMDD');
  to_date
 
  2011-03-21
 (1 row)

 regression=# select to_date('110321', 'YYMMDD');
  to_date
 
  2011-03-21
 (1 row)

 If you break the latter case, I am sure the villagers will be on your
 doorstep shortly.

Oh, dear.  No wonder this code is so hard to get right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-20 Thread Piyush Newe
On Thu, Mar 17, 2011 at 7:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Robert Haas robertmh...@gmail.com writes:
  On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
  alvhe...@commandprompt.com wrote:
  Keep in mind that the datetime stuff was abandoned by the maintainer
  some years ago with quite some rough edges.  Some of it has been fixed,
  but a lot of bugs remain.  Looks like this is one of those places and it
  seems appropriate to spend some time fixing it.  Since it would involve
  a behavior change, it should only go to 9.2, of course.

  I wouldn't object to fixing the problem with # of digits  # of Ys in
  9.1, if the fix is simple and clear-cut.  I think we are still
  accepting patches to make minor tweaks, like the tab-completion patch
  I committed yesterday.  It also doesn't bother me tremendously if we
  push it off, but I don't think that anyone's going to be too sad if
  TO_DATE('01-jan-2010',  'DD-MON-YYY') starts returning something more
  sensible than 3010-01-01.

 Agreed, it's certainly not too late for bug fixes in 9.1.  I agree
 that this isn't something we would want to tweak in released branches,
 but 9.1 isn't there yet.


I feel the patch for the same would be easier and was attached in the
initial mail of this mail thread. For your ready reference, I am attaching
the same patch here again.


 Having said that, it's not entirely clear to me what sane behavior is
 here.  Personally I would expect that an n-Ys format spec would consume
 at most n digits from the input.  Otherwise how are you going to use
 to_date to pick apart strings that don't have any separators?  So
 I think the problem is actually upstream of the behavior complained of
 here.  However, what we should first do is see what Oracle does in such
 cases, because the main driving factor for these functions is Oracle
 compatibility not what might seem sane in a vacuum.


Following is the extended chart which is comparing the behavior of Oracle,
PG  EDBAS.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-1',  'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1',  'DD-MON-YY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1',  'DD-MON-YYY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1',  'DD-MON-') 01-JAN-0001 01-JAN-0001 01-JAN-0001

In this case, all the cases are in sync except the 1st one. I didn't
understand why Oracle is interpreting year '1' as '2011'.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-10',  'DD-MON-Y') Error 01-JAN-2010 Error
TO_DATE('01-jan-10',  'DD-MON-YY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
TO_DATE('01-jan-10',  'DD-MON-YYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
TO_DATE('01-jan-10',  'DD-MON-') 01-JAN-0010 01-JAN-0010 01-JAN-0010

In this case, it seems in last 3 cases PostgreSQL is behaving correctly.
Oracle is throwing error in 1st case since the Format ('Y') is lesser than
the actual value ('10'). But PostgreSQL is ignoring this case and throwing
whatever is input. The output is might not be the same was user is
expecting.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-067',  'DD-MON-Y') Error 01-JAN-2067 Error
TO_DATE('01-jan-111',  'DD-MON-YY') 01-JAN-0111 01-JAN-2011 Error
TO_DATE('01-jan-678',  'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-2678
TO_DATE('01-jan-001',  'DD-MON-') 01-JAN-0001 01-JAN-0001 01-JAN-0001

In this case, just last case was correct in PG. Rest other cases are not in
sync with Oracle, rather the output is vague.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-2010',  'DD-MON-Y') Error 01-JAN-4010 Error
TO_DATE('01-jan-2010',  'DD-MON-YY') 01-JAN-2010 01-JAN-3910 Error
TO_DATE('01-jan-2010',  'DD-MON-YYY') Error 01-JAN-3010 Error
TO_DATE('01-jan-2010',  'DD-MON-') 01-JAN-2010 01-JAN-2010 01-JAN-2010

In this case, PG is giving wrong output in first 3 cases. Those need to get
rectified. Oracle is throwing error in 1st and 3rd case and the reason is,
the format is lesser than the actual value. It seems this rule is not
applicable for 2nd case in Oracle.

In all above mentioned cases, the observation is, If the # Ys are lesser
than the # of digits,, then it should throw an error. Only in case of 'YY',
its not correct, unless the year is later than . In this way, we can fix
the wrong outputs in PG.





regards, tom lane




-- 
-- 
Piyush S Newe
Principal Engineer
EnterpriseDB
office: +91 20 3058 9500
www.enterprisedb.com

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are not
the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, 

Re: [HACKERS] Rectifying wrong Date outputs

2011-03-17 Thread Alvaro Herrera
Excerpts from Piyush Newe's message of jue mar 17 02:30:06 -0300 2011:
 Sorry for creating the confusion. The table drawn was PostgreSQL vs EDB
 Advanced Server.
 Thanks Burce for clarification.
 
 For the 1-digit, 2-digit  3-digit Year inputs, as I said, I didn't see any
 document in PG which will explain what would be the century considered if it
 is not given. If I missed out it somewhere please let me know.

Keep in mind that the datetime stuff was abandoned by the maintainer
some years ago with quite some rough edges.  Some of it has been fixed,
but a lot of bugs remain.  Looks like this is one of those places and it
seems appropriate to spend some time fixing it.  Since it would involve
a behavior change, it should only go to 9.2, of course.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-17 Thread Robert Haas
On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Piyush Newe's message of jue mar 17 02:30:06 -0300 2011:
 Sorry for creating the confusion. The table drawn was PostgreSQL vs EDB
 Advanced Server.
 Thanks Burce for clarification.

 For the 1-digit, 2-digit  3-digit Year inputs, as I said, I didn't see any
 document in PG which will explain what would be the century considered if it
 is not given. If I missed out it somewhere please let me know.

 Keep in mind that the datetime stuff was abandoned by the maintainer
 some years ago with quite some rough edges.  Some of it has been fixed,
 but a lot of bugs remain.  Looks like this is one of those places and it
 seems appropriate to spend some time fixing it.  Since it would involve
 a behavior change, it should only go to 9.2, of course.

I wouldn't object to fixing the problem with # of digits  # of Ys in
9.1, if the fix is simple and clear-cut.  I think we are still
accepting patches to make minor tweaks, like the tab-completion patch
I committed yesterday.  It also doesn't bother me tremendously if we
push it off, but I don't think that anyone's going to be too sad if
TO_DATE('01-jan-2010',  'DD-MON-YYY') starts returning something more
sensible than 3010-01-01.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-17 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 Keep in mind that the datetime stuff was abandoned by the maintainer
 some years ago with quite some rough edges.  Some of it has been fixed,
 but a lot of bugs remain.  Looks like this is one of those places and it
 seems appropriate to spend some time fixing it.  Since it would involve
 a behavior change, it should only go to 9.2, of course.

 I wouldn't object to fixing the problem with # of digits  # of Ys in
 9.1, if the fix is simple and clear-cut.  I think we are still
 accepting patches to make minor tweaks, like the tab-completion patch
 I committed yesterday.  It also doesn't bother me tremendously if we
 push it off, but I don't think that anyone's going to be too sad if
 TO_DATE('01-jan-2010',  'DD-MON-YYY') starts returning something more
 sensible than 3010-01-01.

Agreed, it's certainly not too late for bug fixes in 9.1.  I agree
that this isn't something we would want to tweak in released branches,
but 9.1 isn't there yet.

Having said that, it's not entirely clear to me what sane behavior is
here.  Personally I would expect that an n-Ys format spec would consume
at most n digits from the input.  Otherwise how are you going to use
to_date to pick apart strings that don't have any separators?  So
I think the problem is actually upstream of the behavior complained of
here.  However, what we should first do is see what Oracle does in such
cases, because the main driving factor for these functions is Oracle
compatibility not what might seem sane in a vacuum.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-17 Thread Alvaro Herrera
Excerpts from Robert Haas's message of jue mar 17 11:09:56 -0300 2011:
 On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:

  Keep in mind that the datetime stuff was abandoned by the maintainer
  some years ago with quite some rough edges.  Some of it has been fixed,
  but a lot of bugs remain.  Looks like this is one of those places and it
  seems appropriate to spend some time fixing it.  Since it would involve
  a behavior change, it should only go to 9.2, of course.
 
 I wouldn't object to fixing the problem with # of digits  # of Ys in
 9.1, if the fix is simple and clear-cut.  I think we are still
 accepting patches to make minor tweaks, like the tab-completion patch
 I committed yesterday.  It also doesn't bother me tremendously if we
 push it off, but I don't think that anyone's going to be too sad if
 TO_DATE('01-jan-2010',  'DD-MON-YYY') starts returning something more
 sensible than 3010-01-01.

If it can be delivered quickly and it is simple, sure.  But anything
more involved should respect the release schedule.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-17 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 what we should first do is see what Oracle does in such cases,
 because the main driving factor for these functions is Oracle
 compatibility not what might seem sane in a vacuum.
 
+1
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-16 Thread Robert Haas
On Wed, Mar 16, 2011 at 8:21 AM, Piyush Newe
piyush.n...@enterprisedb.com wrote:
 Data Format                 PostgreSQL EDBAS
 TO_DATE('01-jan-10',  'DD-MON-Y')         2010-01-01 Error
 TO_DATE('01-jan-10',  'DD-MON-YY')         2010-01-01 01-JAN-2010
 TO_DATE('01-jan-10',  'DD-MON-YYY') 2010-01-01 01-JAN-2010
 TO_DATE('01-jan-10',  'DD-MON-') 0010-01-01 01-JAN-0010
 In this case, it seems in last 3 cases PG is behaving correctly. Whereas in
 1st case the output is not correct since the Format ('Y') is lesser than the
 actual input ('10'). But PG is ignoring this condition and throwing whatever
 is input. The output year is might not be the year, what user is expecting.
 Hence PG should throw an error.

I can't get worked up about this.  If there's a consensus that
throwing an error here is better, fine, but on first blush the PG
behavior doesn't look unreasonable to me.

 Data Format                 PostgreSQL EDBAS
 TO_DATE('01-jan-2010',  'DD-MON-Y') 4010-01-01 Error
 TO_DATE('01-jan-2010',  'DD-MON-YY') 3910-01-01 Error
 TO_DATE('01-jan-2010',  'DD-MON-YYY') 3010-01-01 Error
 TO_DATE('01-jan-2010',  'DD-MON-') 2010-01-01 01-JAN-2010

These cases look a lot stranger.  I'm inclined to think that if the
number of digits specified exceeds the number of Ys, then we can
either (1) throw an error, as you suggest or (2) give the same answer
we would have given if the number of Ys were equal to the number of
digits given.  In other words, if we're not going to error out here,
all of these should return 2010-01-01.

 Data Format PostgreSQL EDBAS
 TO_DATE('01-jan-067',  'DD-MON-Y') 2067-01-01 Error
 TO_DATE('01-jan-111',  'DD-MON-YY') 2011-01-01 Error
 TO_DATE('01-jan-678',  'DD-MON-YYY') 1678-01-01 01-JAN-2678
 TO_DATE('01-jan-001',  'DD-MON-') 0001-01-01 01-JAN-0001

These are so strange that it's hard to reason about them; who uses
three-digit years?In the third case above, you should EDBAS
deciding that 678 means 2678 instead of 1678, but that seems quite
arbitrary.  1678 seems just as plausible.  But the behavior in the
second case looks wrong (shouldn't the answer should be either  or
2111?) and the first case looks inconsistent with the third one (why
does 067 mean 2067 rather than 1967 while 678 means 1678 rather than
2678?).

I'm inclined to think that we have a bug here in the case where the #
of digits given is greater than the # of Ys.  See also this:

rhaas=# select to_date('01-jan-678', 'DD-MON-Y');
  to_date

 2678-01-01
(1 row)

rhaas=# select to_date('01-jan-678', 'DD-MON-YY');
  to_date

 2578-01-01
(1 row)

rhaas=# select to_date('01-jan-678', 'DD-MON-YYY');
  to_date

 1678-01-01
(1 row)

It's a lot less clear to me that we have a bug in the other direction
(# of digits given is less than the # of Ys), but maybe

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-16 Thread Bruce Momjian
Robert Haas wrote:
 On Wed, Mar 16, 2011 at 8:21 AM, Piyush Newe
 piyush.n...@enterprisedb.com wrote:
  Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
  TO_DATE('01-jan-10', ?'DD-MON-Y') ?? ? ? ?2010-01-01 Error
  TO_DATE('01-jan-10', ?'DD-MON-YY') ?? ? ? ?2010-01-01 01-JAN-2010
  TO_DATE('01-jan-10', ?'DD-MON-YYY') 2010-01-01 01-JAN-2010
  TO_DATE('01-jan-10', ?'DD-MON-') 0010-01-01 01-JAN-0010
  In this case, it seems in last 3 cases PG is behaving correctly. Whereas in
  1st case the output is not correct since the Format ('Y') is lesser than the
  actual input ('10'). But PG is ignoring this condition and throwing whatever
  is input. The output year is might not be the year, what user is expecting.
  Hence PG should throw an error.
 
 I can't get worked up about this.  If there's a consensus that
 throwing an error here is better, fine, but on first blush the PG
 behavior doesn't look unreasonable to me.
 
  Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS

To clarify, the user is reporting EDB Advanced Server, though the
community PG has the same issues, or at least similar;  with git HEAD:

test= SELECT TO_DATE('01-jan-2010',  'DD-MON-YY');
  to_date

 3910-01-01
(1 row)

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-16 Thread Robert Haas
On Wed, Mar 16, 2011 at 5:52 PM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
 On Wed, Mar 16, 2011 at 8:21 AM, Piyush Newe
 piyush.n...@enterprisedb.com wrote:
  Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
  TO_DATE('01-jan-10', ?'DD-MON-Y') ?? ? ? ?2010-01-01 Error
  TO_DATE('01-jan-10', ?'DD-MON-YY') ?? ? ? ?2010-01-01 01-JAN-2010
  TO_DATE('01-jan-10', ?'DD-MON-YYY') 2010-01-01 01-JAN-2010
  TO_DATE('01-jan-10', ?'DD-MON-') 0010-01-01 01-JAN-0010
  In this case, it seems in last 3 cases PG is behaving correctly. Whereas in
  1st case the output is not correct since the Format ('Y') is lesser than 
  the
  actual input ('10'). But PG is ignoring this condition and throwing 
  whatever
  is input. The output year is might not be the year, what user is expecting.
  Hence PG should throw an error.

 I can't get worked up about this.  If there's a consensus that
 throwing an error here is better, fine, but on first blush the PG
 behavior doesn't look unreasonable to me.

  Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS

 To clarify, the user is reporting EDB Advanced Server, though the
 community PG has the same issues, or at least similar;  with git HEAD:

        test= SELECT TO_DATE('01-jan-2010',  'DD-MON-YY');
          to_date
        
         3910-01-01
        (1 row)

Actually, I think he's comparing PostgreSQL to Advanced Server.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-16 Thread Bruce Momjian
Robert Haas wrote:
 On Wed, Mar 16, 2011 at 5:52 PM, Bruce Momjian br...@momjian.us wrote:
  Robert Haas wrote:
  On Wed, Mar 16, 2011 at 8:21 AM, Piyush Newe
  piyush.n...@enterprisedb.com wrote:
   Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
   TO_DATE('01-jan-10', ?'DD-MON-Y') ?? ? ? ?2010-01-01 Error
   TO_DATE('01-jan-10', ?'DD-MON-YY') ?? ? ? ?2010-01-01 01-JAN-2010
   TO_DATE('01-jan-10', ?'DD-MON-YYY') 2010-01-01 01-JAN-2010
   TO_DATE('01-jan-10', ?'DD-MON-') 0010-01-01 01-JAN-0010
   In this case, it seems in last 3 cases PG is behaving correctly. Whereas 
   in
   1st case the output is not correct since the Format ('Y') is lesser than 
   the
   actual input ('10'). But PG is ignoring this condition and throwing 
   whatever
   is input. The output year is might not be the year, what user is 
   expecting.
   Hence PG should throw an error.
 
  I can't get worked up about this. ?If there's a consensus that
  throwing an error here is better, fine, but on first blush the PG
  behavior doesn't look unreasonable to me.
 
   Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
 
  To clarify, the user is reporting EDB Advanced Server, though the
  community PG has the same issues, or at least similar; ?with git HEAD:
 
  ? ? ? ?test= SELECT TO_DATE('01-jan-2010', ?'DD-MON-YY');
  ? ? ? ? ?to_date
  ? ? ? ?
  ? ? ? ? 3910-01-01
  ? ? ? ?(1 row)
 
 Actually, I think he's comparing PostgreSQL to Advanced Server.

Oh, I understand now.  I was confused that the headings didn't line up
with the values.  I see now the first value is community PG and the
second is EDBAS.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rectifying wrong Date outputs

2011-03-16 Thread Piyush Newe
Sorry for creating the confusion. The table drawn was PostgreSQL vs EDB
Advanced Server.
Thanks Burce for clarification.

For the 1-digit, 2-digit  3-digit Year inputs, as I said, I didn't see any
document in PG which will explain what would be the century considered if it
is not given. If I missed out it somewhere please let me know.

I refer to following link which explains the behavior of Oracle.
http://forums.oracle.com/forums/thread.jspa?threadID=312239

Now, if
1. # of digits given is greater than the # of Ys

i.e.
postgres=# select to_date('01-jan-111', 'DD-MON-Y');
  to_date

 2111-01-01
(1 row)

What we should do ? Either we should throw an error or we should give what
user has provided.
IMHO, we should follow what format is given by user. However, even if the
'format' gets wrong rather invalid, it is not throwing any error.

e.g.
postgres=# select to_date('01-jan-111', 'DD-MON-Y POSTGRES');   Look
at this
  to_date

 2111-01-01
(1 row)



2. # of digits given is less than the # of Ys
Consider following case,

postgres=# select to_date('01-jan-6', 'DD-MON-Y'), to_date('01-jan-6',
'DD-MON-');
  to_date   |  to_date
+
 2006-01-01 | 0006-01-01


Why this behaviour not predictable ? I think we are always considering the
current century, if it is not provided. If I missed out any related
document, please share.

And yes,

postgres=# select to_date('01-jan-1761', 'DD-MON-Y');
  to_date

 3761-01-01- Look at this.
(1 row)

Definitely, their is a bug in this case.

Am I missing something ?

-Piyush

On Thu, Mar 17, 2011 at 3:30 AM, Bruce Momjian br...@momjian.us wrote:

 Robert Haas wrote:
  On Wed, Mar 16, 2011 at 5:52 PM, Bruce Momjian br...@momjian.us wrote:
   Robert Haas wrote:
   On Wed, Mar 16, 2011 at 8:21 AM, Piyush Newe
   piyush.n...@enterprisedb.com wrote:
Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
TO_DATE('01-jan-10', ?'DD-MON-Y') ?? ? ? ?2010-01-01 Error
TO_DATE('01-jan-10', ?'DD-MON-YY') ?? ? ? ?2010-01-01 01-JAN-2010
TO_DATE('01-jan-10', ?'DD-MON-YYY') 2010-01-01 01-JAN-2010
TO_DATE('01-jan-10', ?'DD-MON-') 0010-01-01 01-JAN-0010
In this case, it seems in last 3 cases PG is behaving correctly.
 Whereas in
1st case the output is not correct since the Format ('Y') is lesser
 than the
actual input ('10'). But PG is ignoring this condition and throwing
 whatever
is input. The output year is might not be the year, what user is
 expecting.
Hence PG should throw an error.
  
   I can't get worked up about this. ?If there's a consensus that
   throwing an error here is better, fine, but on first blush the PG
   behavior doesn't look unreasonable to me.
  
Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
  
   To clarify, the user is reporting EDB Advanced Server, though the
   community PG has the same issues, or at least similar; ?with git HEAD:
  
   ? ? ? ?test= SELECT TO_DATE('01-jan-2010', ?'DD-MON-YY');
   ? ? ? ? ?to_date
   ? ? ? ?
   ? ? ? ? 3910-01-01
   ? ? ? ?(1 row)
 
  Actually, I think he's comparing PostgreSQL to Advanced Server.

 Oh, I understand now.  I was confused that the headings didn't line up
 with the values.  I see now the first value is community PG and the
 second is EDBAS.

 --
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




-- 
-- 
Piyush S Newe
Principal Engineer
EnterpriseDB
office: +91 20 3058 9500
www.enterprisedb.com

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are not
the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.