Re: [HACKERS] Simplifying timezone support

2003-03-02 Thread Ross J. Reedstrom
According to my sent folder, this went out Monday afternoon, but I
haven't seen it yet, so I'm resending to the list only, without the
attached patch.  I'll send the patch over to patches.

Any comment on the behavior, specifically, the heuristic for deciding
tzset() failed, and the proposed order of application of tzset()
vs. table lookup?

Ross

On Mon, Feb 24, 2003 at 03:34:56PM -0600, Ross J. Reedstrom wrote:
 On Fri, Feb 21, 2003 at 08:39:12PM -0600, Ross J. Reedstrom wrote:
  
  Every other validly formatted TZ variable that returns GMT should be
  caught be the datetktbl check.
  
  I'll play with it this weekend, see how hard it is to make it work.
 
 O.K., the weekend's over, And I've created two different version
 of this.  Both work, ipass all the regression test, and solve the
 'CST is just a funny way to say GMT' problem.  I was able to make use
 of DecodePosixTimezone (DPT) from Thomas's datetime parsing code in
 assign_timezone. However, the order of application of this vis. tzset
 is unclear.
 
 I had proposed doing the DPT first, then tzset, then a NOTICE if it
 looked like tzset didn't. Got that working, but discovered a change of
 behavior: for some of those who have a timezone in the zoneinfo database
 that is a three letter abbreviation, the current code (tzset only) will
 provide daylight savings time transitions, so that a timestamp in July
 returns a different timezone than one in February.  This is not true for
 our internal values of set time zone: there, we convert to a numerical
 offset, which is constant no matter when the timestamp occurs.
 
 This is still a win for those who's timezone abbreviation is _not_ in the
 zoneinfo DB, (such as CST), which currently is silently interpreted as
 an odd spelling of GMT.
 
 Second solution - try tzset() first, and apply the following heuristic
 to see if it took:
 
 tzname[0]==$TZ and tzname[1]== and timezone=0 and daylight=0
 
 In other words, _all_ the timezone related information remains the
 default.  I tested this against the 1607 zoneinfo files on my system:
 every one was filtered out, even things that _are_ GMT with no DST (they
 all had a non-null tzname[1] == tzname[0])
 
 If this succeeds (i.e. tzset didn't recognize the TZ), go ahead and look
 it up in our big table'o date/time strings. This also works, fixing the
 bogus GMT spellings, without changing current behavior for any string
 that is not bogus.
 
 Note that the sysadmin can always tell if tzset or the table was used, by
 looking at the format of the 'show time zone' result. If tzset was called,
 this is the string that was passed to 'set time zone'. If the table was
 used, it will be an hours west of GMT offset.
 
 The problem with this approach is that it does nothing to reduce our 
 dependency on the OS timezone functionality.
 
 Comments? I've attached the second patch for discussion.
 
 Ross
 

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


Re: [HACKERS] Simplifying timezone support

2003-03-02 Thread Ross J. Reedstrom
On Fri, Feb 21, 2003 at 08:39:12PM -0600, Ross J. Reedstrom wrote:
 
 Every other validly formatted TZ variable that returns GMT should be
 caught be the datetktbl check.
 
 I'll play with it this weekend, see how hard it is to make it work.

O.K., the weekend's over, And I've created two different version
of this.  Both work, ipass all the regression test, and solve the
'CST is just a funny way to say GMT' problem.  I was able to make use
of DecodePosixTimezone (DPT) from Thomas's datetime parsing code in
assign_timezone. However, the order of application of this vis. tzset
is unclear.

I had proposed doing the DPT first, then tzset, then a NOTICE if it
looked like tzset didn't. Got that working, but discovered a change of
behavior: for some of those who have a timezone in the zoneinfo database
that is a three letter abbreviation, the current code (tzset only) will
provide daylight savings time transitions, so that a timestamp in July
returns a different timezone than one in February.  This is not true for
our internal values of set time zone: there, we convert to a numerical
offset, which is constant no matter when the timestamp occurs.

This is still a win for those who's timezone abbreviation is _not_ in the
zoneinfo DB, (such as CST), which currently is silently interpreted as
an odd spelling of GMT.

Second solution - try tzset() first, and apply the following heuristic
to see if it took:

tzname[0]==$TZ and tzname[1]== and timezone=0 and daylight=0

In other words, _all_ the timezone related information remains the
default.  I tested this against the 1607 zoneinfo files on my system:
every one was filtered out, even things that _are_ GMT with no DST (they
all had a non-null tzname[1] == tzname[0])

If this succeeds (i.e. tzset didn't recognize the TZ), go ahead and look
it up in our big table'o date/time strings. This also works, fixing the
bogus GMT spellings, without changing current behavior for any string
that is not bogus.

Note that the sysadmin can always tell if tzset or the table was used, by
looking at the format of the 'show time zone' result. If tzset was called,
this is the string that was passed to 'set time zone'. If the table was
used, it will be an hours west of GMT offset.

The problem with this approach is that it does nothing to reduce our 
dependency on the OS timezone functionality.

Comments? I've attached the second patch for discussion.

Ross

Index: src//backend/commands/variable.c
===
RCS file: /projects/cvsroot/pgsql-server/src/backend/commands/variable.c,v
retrieving revision 1.73
diff -c -r1.73 variable.c
*** src//backend/commands/variable.c2003/02/01 18:31:28 1.73
--- src//backend/commands/variable.c2003/02/24 21:29:44
***
*** 243,252 
  const char *
  assign_timezone(const char *value, bool doit, bool interactive)
  {
!   char   *result;
!   char   *endptr;
double  hours;
  
/*
 * Check for INTERVAL 'foo'
 */
--- 243,257 
  const char *
  assign_timezone(const char *value, bool doit, bool interactive)
  {
!   char*result;
!   char*endptr;
!   chartztmp[sizeof(tzbuf)];
!   char*lp;
!   const char  *cp;
double  hours;
+   int tzval,i;
  
+ 
/*
 * Check for INTERVAL 'foo'
 */
***
*** 337,348 
else
{
/*
!* Otherwise assume it is a timezone name.
 *
!* XXX unfortunately we have no reasonable way to check 
whether a
!* timezone name is good, so we have to just assume that it
!* is.
 */
if (doit)
{
strcpy(tzbuf, TZ=);
--- 342,360 
else
{
/*
!* Otherwise assume it is a timezone name. 
!* Try tzset() first. If that fails, see if our internal 
!* table of timezone names, can handle it.
 *
!* XXX unfortunately we have only an approximate way to check 
!* whether a timezone name is good: if the value of TZ is 
returned
!* as the canonical tzname, there is no daylight savings time 
!* (flag or canonical second name) _and_ the offset is GMT,  
!* tzset() punted, i.e., returned all default values. 
!* If that happens, try the internal table - if _that_ fails,
!* throw a notice.rjr 2003/02/23
 */
+ 
if (doit)
{
   

Re: [HACKERS] Simplifying timezone support

2003-02-28 Thread Ross J. Reedstrom
On Thu, Feb 20, 2003 at 04:19:21PM -0500, Tom Lane wrote:
 Ross J. Reedstrom [EMAIL PROTECTED] writes:
  On Thu, Feb 20, 2003 at 03:21:09PM -0500, Tom Lane wrote:
  Provide a portable way of getting libc to tell us whether it likes TZ,
  and I'll be glad to fix this.
 
  Dang that lovely word 'portable'. However, given your proposed change,
  perhaps the hurdle for portable time handling is now lower: it seems we've
  not been exposed to as broad a range of broken systems as in the past.
 
 On this particular point my threshold of 'portable' is actually pretty
 low, as long as it's fail-soft.  Failure to detect bad TZ on some
 systems would leave them no worse off than before, right?
 
 But I haven't seen *any* published API that directly tells you whether
 tzset liked TZ or not --- AFAICT it's supposed to just silently
 substitute GMT.  Which would be okay if GMT were the only allowed
 spelling of GMT, but it ain't ...

I've been digging in the date and time code a bit, and now have a proposal
for dealing with SET TIME ZONE 'someunknownstring'.  First, we use the
time token table from the time constant parser in utils/adt/datetime.c
to see if we've got a recognized time zone abbreviation. If it is,
we generate a canonical POSIX timezone name for use in setting TZ,
call tzset(), and we're done.

If the time zone came back UNKOWN, we go ahead and see if tzset() can
interpret it. Criteria for failure: if the timezone offset came back 0,
and the reported tzname[0] is the same as the string that we passed in. If
it does, we fire a NOTICE about an unknown spelling of GMT. Note that we
would have already caught all _known_ spellings of GMT in the first step,
so we won't be spamming the DBA with warnings about 'GMT' and 'UTC', etc.

An extension to this would be to use the tzset() trick above directly
in the datetime constant parser, as a fallback after not matching the
table. In that case, we'd probably want to treat the unknown spelling
of GMT as an error, though (as it currently does).

Thoughts? If this seems acceptable, I can implement it this weekend.

Ross

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html


Re: [HACKERS] Simplifying timezone support

2003-02-28 Thread Ross J. Reedstrom
On Fri, Feb 21, 2003 at 05:45:53PM -0600, Ross J. Reedstrom wrote:
 On Fri, Feb 21, 2003 at 06:15:31PM -0500, Tom Lane wrote:
  Ross J. Reedstrom [EMAIL PROTECTED] writes:
 snip
  
  I'm worried about cases like Africa/Benin for places that just happen
  to be on the prime meridian, but don't call their time GMT or UTC.
  Looking at a globe, it also seems possible that there are places an hour
  west of Greenwich, for which this could fail during daylight-savings
  season.
 
 Well, that'll either get caught by the existing table (we've got six
 different spellings of GMT, currently) or by the 'string in != string out'
 case - the zoneinfo format requires a 3 or more character abbreviation
 for the time zone. For every case I'v looked at in my zoneinfo directory,
 it's either 3 or 4 uppercase characters, and _never_ matches the filename
 path string used to set it. I'll do an exhaustive test after dinner.

O.K., I've run the test: of the 1108 files in my zoneinfo database,
only 11 have matching filenames to the canonical name returned after
setting TZ.  Of those 11, 4 are some version of GMT (GMT, UCT, UTC,
WET), of which, one is in fact missing from our table - UCT. At minimum,
I'll add that.

Every other validly formatted TZ variable that returns GMT should be
caught be the datetktbl check.

I'll play with it this weekend, see how hard it is to make it work.

Ross

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Simplifying timezone support

2003-02-28 Thread Ross J. Reedstrom
On Fri, Feb 21, 2003 at 06:15:31PM -0500, Tom Lane wrote:
 Ross J. Reedstrom [EMAIL PROTECTED] writes:
snip
 
 I'm worried about cases like Africa/Benin for places that just happen
 to be on the prime meridian, but don't call their time GMT or UTC.
 Looking at a globe, it also seems possible that there are places an hour
 west of Greenwich, for which this could fail during daylight-savings
 season.

Well, that'll either get caught by the existing table (we've got six
different spellings of GMT, currently) or by the 'string in != string out'
case - the zoneinfo format requires a 3 or more character abbreviation
for the time zone. For every case I'v looked at in my zoneinfo directory,
it's either 3 or 4 uppercase characters, and _never_ matches the filename
path string used to set it. I'll do an exhaustive test after dinner.

 
  An extension to this would be to use the tzset() trick above directly
  in the datetime constant parser, as a fallback after not matching the
  table. In that case, we'd probably want to treat the unknown spelling
  of GMT as an error, though (as it currently does).
 
 I think tzset() is probably much too slow to consider calling on every
 pass through timestamptz_in ...

It wouldn't happen on every call - only with funky timezone
representations.  We could NOTICE use of tzset(), as well, to alert the
DBA about something fishy, if you'd like.

Ross

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


Re: [HACKERS] Simplifying timezone support

2003-02-21 Thread Ross J. Reedstrom
On Thu, Feb 20, 2003 at 03:21:09PM -0500, Tom Lane wrote:
 Ross J. Reedstrom [EMAIL PROTECTED] writes:
  question about pgsql's time zone parsers. It appears there's at least
  two, since SET TIME ZONE accepts strings like 'US/Eastern', while general
  timestamp parsing doesn't:
 
 The TIME ZONE string is fed to libc (via TZ environment variable); the
 other cases are not.
 
  SET TIME ZONE will silently accept any string at all, and fall back to
  providing GMT when a timestamptz is requested.
 
 Provide a portable way of getting libc to tell us whether it likes TZ,
 and I'll be glad to fix this.

Dang that lovely word 'portable'. However, given your proposed change,
perhaps the hurdle for portable time handling is now lower: it seems we've
not been exposed to as broad a range of broken systems as in the past.
I'll look at it. but no promises.

 Ultimately we should probably get rid of our dependence on the libc
 time routines altogether ... but I have no intention of opening that
 can of worms right now.  See past discussions in the archives.

Agreed. I see we're inheriting the actually misleading case from the
OS/libc, as well:

wallace$ unset TZ
wallace$ date
Thu Feb 20 15:00:04 CST 2003
wallace$ export TZ=US/Central
wallace$ date
Thu Feb 20 15:00:16 CST 2003
wallace$ export TZ=US/Zanzibar
wallace$ date
Thu Feb 20 21:00:33 US/Zanzibar 2003
wallace$ export TZ=CST
wallace$ date
Thu Feb 20 21:00:42 CST 2003
wallace$ 

Ross

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Simplifying timezone support

2003-02-21 Thread Tom Lane
Ross J. Reedstrom [EMAIL PROTECTED] writes:
 If the time zone came back UNKOWN, we go ahead and see if tzset() can
 interpret it. Criteria for failure: if the timezone offset came back 0,
 and the reported tzname[0] is the same as the string that we passed in. If
 it does, we fire a NOTICE about an unknown spelling of GMT. Note that we
 would have already caught all _known_ spellings of GMT in the first step,
 so we won't be spamming the DBA with warnings about 'GMT' and 'UTC', etc.

I'm worried about cases like Africa/Benin for places that just happen
to be on the prime meridian, but don't call their time GMT or UTC.
Looking at a globe, it also seems possible that there are places an hour
west of Greenwich, for which this could fail during daylight-savings
season.

 An extension to this would be to use the tzset() trick above directly
 in the datetime constant parser, as a fallback after not matching the
 table. In that case, we'd probably want to treat the unknown spelling
 of GMT as an error, though (as it currently does).

I think tzset() is probably much too slow to consider calling on every
pass through timestamptz_in ...

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly


Re: [HACKERS] Simplifying timezone support

2003-02-20 Thread Ross J. Reedstrom
On Wed, Feb 19, 2003 at 10:35:58PM -0500, Tom Lane wrote:
snip Tom discussion backend internal tracking of timezone 
 Any objections?

Not to your suggestion per se, but looking at the bug report raises a
question about pgsql's time zone parsers. It appears there's at least
two, since SET TIME ZONE accepts strings like 'US/Eastern', while general
timestamp parsing doesn't:

test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 CST';
 timestamptz  
--
 2003-02-18 09:36:06.00933-06
(1 row)

test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
 timestamptz  
--
 2003-02-18 08:36:06.00933-06
(1 row)

test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 US/Eastern';
ERROR:  Bad timestamp external representation '2003/02/18 09:36:06.00933 US/Eastern'

Further testing says it's even worse that that: 

SET TIME ZONE will silently accept any string at all, and fall back to
providing GMT when a timestamptz is requested. This includes the TLA
TZ abbreviations that the constant parsing code understands, like CST
and EST.

test=# set TIME ZONE 'CST';
SET
test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
 timestamptz  
--
 2003-02-18 14:36:06.00933+00
(1 row)

test=# set TIME ZONE 'FOOBAR';
SET
test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
 timestamptz  
--
 2003-02-18 14:36:06.00933+00
(1 row)

Here's an especially fun one: with DATESTYLE set to 'Postgresql,US', whatever
string is handed to SET TIME ZONE comes out the other end, if it can't
be parsed:

test=# set TIME ZONE 'FOOBAR';
SET
test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
  timestamptz  
---
 Tue Feb 18 14:36:06.00933 2003 FOOBAR
(1 row)


Leading to this erroneous pair:

test=# set TIME ZONE 'US/Central';
SET
test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
timestamptz 

 Tue Feb 18 08:36:06.00933 2003 CST
(1 row)

test=# set TIME ZONE 'CST';
SET
test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
timestamptz 

 Tue Feb 18 14:36:06.00933 2003 CST
(1 row)

test=# 

Tom, since you're in (or near) that code right now, how painful would
it be to unify the time zone parsing? What's the correct behavior?
Certainly SET TIME ZONE should at leat NOTICE about invalide time zone
names?

Ross

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Simplifying timezone support

2003-02-20 Thread Tom Lane
Ross J. Reedstrom [EMAIL PROTECTED] writes:
 question about pgsql's time zone parsers. It appears there's at least
 two, since SET TIME ZONE accepts strings like 'US/Eastern', while general
 timestamp parsing doesn't:

The TIME ZONE string is fed to libc (via TZ environment variable); the
other cases are not.

 SET TIME ZONE will silently accept any string at all, and fall back to
 providing GMT when a timestamptz is requested.

Provide a portable way of getting libc to tell us whether it likes TZ,
and I'll be glad to fix this.

Ultimately we should probably get rid of our dependence on the libc
time routines altogether ... but I have no intention of opening that
can of worms right now.  See past discussions in the archives.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Simplifying timezone support

2003-02-20 Thread Tom Lane
Ross J. Reedstrom [EMAIL PROTECTED] writes:
 On Thu, Feb 20, 2003 at 03:21:09PM -0500, Tom Lane wrote:
 Provide a portable way of getting libc to tell us whether it likes TZ,
 and I'll be glad to fix this.

 Dang that lovely word 'portable'. However, given your proposed change,
 perhaps the hurdle for portable time handling is now lower: it seems we've
 not been exposed to as broad a range of broken systems as in the past.

On this particular point my threshold of 'portable' is actually pretty
low, as long as it's fail-soft.  Failure to detect bad TZ on some
systems would leave them no worse off than before, right?

But I haven't seen *any* published API that directly tells you whether
tzset liked TZ or not --- AFAICT it's supposed to just silently
substitute GMT.  Which would be okay if GMT were the only allowed
spelling of GMT, but it ain't ...

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]