Re: [HACKERS] Simplifying timezone support
According to my sent folder, this went out Monday afternoon, but I haven't seen it yet, so I'm resending to the list only, without the attached patch. I'll send the patch over to patches. Any comment on the behavior, specifically, the heuristic for deciding tzset() failed, and the proposed order of application of tzset() vs. table lookup? Ross On Mon, Feb 24, 2003 at 03:34:56PM -0600, Ross J. Reedstrom wrote: On Fri, Feb 21, 2003 at 08:39:12PM -0600, Ross J. Reedstrom wrote: Every other validly formatted TZ variable that returns GMT should be caught be the datetktbl check. I'll play with it this weekend, see how hard it is to make it work. O.K., the weekend's over, And I've created two different version of this. Both work, ipass all the regression test, and solve the 'CST is just a funny way to say GMT' problem. I was able to make use of DecodePosixTimezone (DPT) from Thomas's datetime parsing code in assign_timezone. However, the order of application of this vis. tzset is unclear. I had proposed doing the DPT first, then tzset, then a NOTICE if it looked like tzset didn't. Got that working, but discovered a change of behavior: for some of those who have a timezone in the zoneinfo database that is a three letter abbreviation, the current code (tzset only) will provide daylight savings time transitions, so that a timestamp in July returns a different timezone than one in February. This is not true for our internal values of set time zone: there, we convert to a numerical offset, which is constant no matter when the timestamp occurs. This is still a win for those who's timezone abbreviation is _not_ in the zoneinfo DB, (such as CST), which currently is silently interpreted as an odd spelling of GMT. Second solution - try tzset() first, and apply the following heuristic to see if it took: tzname[0]==$TZ and tzname[1]== and timezone=0 and daylight=0 In other words, _all_ the timezone related information remains the default. I tested this against the 1607 zoneinfo files on my system: every one was filtered out, even things that _are_ GMT with no DST (they all had a non-null tzname[1] == tzname[0]) If this succeeds (i.e. tzset didn't recognize the TZ), go ahead and look it up in our big table'o date/time strings. This also works, fixing the bogus GMT spellings, without changing current behavior for any string that is not bogus. Note that the sysadmin can always tell if tzset or the table was used, by looking at the format of the 'show time zone' result. If tzset was called, this is the string that was passed to 'set time zone'. If the table was used, it will be an hours west of GMT offset. The problem with this approach is that it does nothing to reduce our dependency on the OS timezone functionality. Comments? I've attached the second patch for discussion. Ross ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Simplifying timezone support
On Fri, Feb 21, 2003 at 08:39:12PM -0600, Ross J. Reedstrom wrote: Every other validly formatted TZ variable that returns GMT should be caught be the datetktbl check. I'll play with it this weekend, see how hard it is to make it work. O.K., the weekend's over, And I've created two different version of this. Both work, ipass all the regression test, and solve the 'CST is just a funny way to say GMT' problem. I was able to make use of DecodePosixTimezone (DPT) from Thomas's datetime parsing code in assign_timezone. However, the order of application of this vis. tzset is unclear. I had proposed doing the DPT first, then tzset, then a NOTICE if it looked like tzset didn't. Got that working, but discovered a change of behavior: for some of those who have a timezone in the zoneinfo database that is a three letter abbreviation, the current code (tzset only) will provide daylight savings time transitions, so that a timestamp in July returns a different timezone than one in February. This is not true for our internal values of set time zone: there, we convert to a numerical offset, which is constant no matter when the timestamp occurs. This is still a win for those who's timezone abbreviation is _not_ in the zoneinfo DB, (such as CST), which currently is silently interpreted as an odd spelling of GMT. Second solution - try tzset() first, and apply the following heuristic to see if it took: tzname[0]==$TZ and tzname[1]== and timezone=0 and daylight=0 In other words, _all_ the timezone related information remains the default. I tested this against the 1607 zoneinfo files on my system: every one was filtered out, even things that _are_ GMT with no DST (they all had a non-null tzname[1] == tzname[0]) If this succeeds (i.e. tzset didn't recognize the TZ), go ahead and look it up in our big table'o date/time strings. This also works, fixing the bogus GMT spellings, without changing current behavior for any string that is not bogus. Note that the sysadmin can always tell if tzset or the table was used, by looking at the format of the 'show time zone' result. If tzset was called, this is the string that was passed to 'set time zone'. If the table was used, it will be an hours west of GMT offset. The problem with this approach is that it does nothing to reduce our dependency on the OS timezone functionality. Comments? I've attached the second patch for discussion. Ross Index: src//backend/commands/variable.c === RCS file: /projects/cvsroot/pgsql-server/src/backend/commands/variable.c,v retrieving revision 1.73 diff -c -r1.73 variable.c *** src//backend/commands/variable.c2003/02/01 18:31:28 1.73 --- src//backend/commands/variable.c2003/02/24 21:29:44 *** *** 243,252 const char * assign_timezone(const char *value, bool doit, bool interactive) { ! char *result; ! char *endptr; double hours; /* * Check for INTERVAL 'foo' */ --- 243,257 const char * assign_timezone(const char *value, bool doit, bool interactive) { ! char*result; ! char*endptr; ! chartztmp[sizeof(tzbuf)]; ! char*lp; ! const char *cp; double hours; + int tzval,i; + /* * Check for INTERVAL 'foo' */ *** *** 337,348 else { /* !* Otherwise assume it is a timezone name. * !* XXX unfortunately we have no reasonable way to check whether a !* timezone name is good, so we have to just assume that it !* is. */ if (doit) { strcpy(tzbuf, TZ=); --- 342,360 else { /* !* Otherwise assume it is a timezone name. !* Try tzset() first. If that fails, see if our internal !* table of timezone names, can handle it. * !* XXX unfortunately we have only an approximate way to check !* whether a timezone name is good: if the value of TZ is returned !* as the canonical tzname, there is no daylight savings time !* (flag or canonical second name) _and_ the offset is GMT, !* tzset() punted, i.e., returned all default values. !* If that happens, try the internal table - if _that_ fails, !* throw a notice.rjr 2003/02/23 */ + if (doit) {
Re: [HACKERS] Simplifying timezone support
On Thu, Feb 20, 2003 at 04:19:21PM -0500, Tom Lane wrote: Ross J. Reedstrom [EMAIL PROTECTED] writes: On Thu, Feb 20, 2003 at 03:21:09PM -0500, Tom Lane wrote: Provide a portable way of getting libc to tell us whether it likes TZ, and I'll be glad to fix this. Dang that lovely word 'portable'. However, given your proposed change, perhaps the hurdle for portable time handling is now lower: it seems we've not been exposed to as broad a range of broken systems as in the past. On this particular point my threshold of 'portable' is actually pretty low, as long as it's fail-soft. Failure to detect bad TZ on some systems would leave them no worse off than before, right? But I haven't seen *any* published API that directly tells you whether tzset liked TZ or not --- AFAICT it's supposed to just silently substitute GMT. Which would be okay if GMT were the only allowed spelling of GMT, but it ain't ... I've been digging in the date and time code a bit, and now have a proposal for dealing with SET TIME ZONE 'someunknownstring'. First, we use the time token table from the time constant parser in utils/adt/datetime.c to see if we've got a recognized time zone abbreviation. If it is, we generate a canonical POSIX timezone name for use in setting TZ, call tzset(), and we're done. If the time zone came back UNKOWN, we go ahead and see if tzset() can interpret it. Criteria for failure: if the timezone offset came back 0, and the reported tzname[0] is the same as the string that we passed in. If it does, we fire a NOTICE about an unknown spelling of GMT. Note that we would have already caught all _known_ spellings of GMT in the first step, so we won't be spamming the DBA with warnings about 'GMT' and 'UTC', etc. An extension to this would be to use the tzset() trick above directly in the datetime constant parser, as a fallback after not matching the table. In that case, we'd probably want to treat the unknown spelling of GMT as an error, though (as it currently does). Thoughts? If this seems acceptable, I can implement it this weekend. Ross ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Simplifying timezone support
On Fri, Feb 21, 2003 at 05:45:53PM -0600, Ross J. Reedstrom wrote: On Fri, Feb 21, 2003 at 06:15:31PM -0500, Tom Lane wrote: Ross J. Reedstrom [EMAIL PROTECTED] writes: snip I'm worried about cases like Africa/Benin for places that just happen to be on the prime meridian, but don't call their time GMT or UTC. Looking at a globe, it also seems possible that there are places an hour west of Greenwich, for which this could fail during daylight-savings season. Well, that'll either get caught by the existing table (we've got six different spellings of GMT, currently) or by the 'string in != string out' case - the zoneinfo format requires a 3 or more character abbreviation for the time zone. For every case I'v looked at in my zoneinfo directory, it's either 3 or 4 uppercase characters, and _never_ matches the filename path string used to set it. I'll do an exhaustive test after dinner. O.K., I've run the test: of the 1108 files in my zoneinfo database, only 11 have matching filenames to the canonical name returned after setting TZ. Of those 11, 4 are some version of GMT (GMT, UCT, UTC, WET), of which, one is in fact missing from our table - UCT. At minimum, I'll add that. Every other validly formatted TZ variable that returns GMT should be caught be the datetktbl check. I'll play with it this weekend, see how hard it is to make it work. Ross ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Simplifying timezone support
On Fri, Feb 21, 2003 at 06:15:31PM -0500, Tom Lane wrote: Ross J. Reedstrom [EMAIL PROTECTED] writes: snip I'm worried about cases like Africa/Benin for places that just happen to be on the prime meridian, but don't call their time GMT or UTC. Looking at a globe, it also seems possible that there are places an hour west of Greenwich, for which this could fail during daylight-savings season. Well, that'll either get caught by the existing table (we've got six different spellings of GMT, currently) or by the 'string in != string out' case - the zoneinfo format requires a 3 or more character abbreviation for the time zone. For every case I'v looked at in my zoneinfo directory, it's either 3 or 4 uppercase characters, and _never_ matches the filename path string used to set it. I'll do an exhaustive test after dinner. An extension to this would be to use the tzset() trick above directly in the datetime constant parser, as a fallback after not matching the table. In that case, we'd probably want to treat the unknown spelling of GMT as an error, though (as it currently does). I think tzset() is probably much too slow to consider calling on every pass through timestamptz_in ... It wouldn't happen on every call - only with funky timezone representations. We could NOTICE use of tzset(), as well, to alert the DBA about something fishy, if you'd like. Ross ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Simplifying timezone support
On Thu, Feb 20, 2003 at 03:21:09PM -0500, Tom Lane wrote: Ross J. Reedstrom [EMAIL PROTECTED] writes: question about pgsql's time zone parsers. It appears there's at least two, since SET TIME ZONE accepts strings like 'US/Eastern', while general timestamp parsing doesn't: The TIME ZONE string is fed to libc (via TZ environment variable); the other cases are not. SET TIME ZONE will silently accept any string at all, and fall back to providing GMT when a timestamptz is requested. Provide a portable way of getting libc to tell us whether it likes TZ, and I'll be glad to fix this. Dang that lovely word 'portable'. However, given your proposed change, perhaps the hurdle for portable time handling is now lower: it seems we've not been exposed to as broad a range of broken systems as in the past. I'll look at it. but no promises. Ultimately we should probably get rid of our dependence on the libc time routines altogether ... but I have no intention of opening that can of worms right now. See past discussions in the archives. Agreed. I see we're inheriting the actually misleading case from the OS/libc, as well: wallace$ unset TZ wallace$ date Thu Feb 20 15:00:04 CST 2003 wallace$ export TZ=US/Central wallace$ date Thu Feb 20 15:00:16 CST 2003 wallace$ export TZ=US/Zanzibar wallace$ date Thu Feb 20 21:00:33 US/Zanzibar 2003 wallace$ export TZ=CST wallace$ date Thu Feb 20 21:00:42 CST 2003 wallace$ Ross ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Simplifying timezone support
Ross J. Reedstrom [EMAIL PROTECTED] writes: If the time zone came back UNKOWN, we go ahead and see if tzset() can interpret it. Criteria for failure: if the timezone offset came back 0, and the reported tzname[0] is the same as the string that we passed in. If it does, we fire a NOTICE about an unknown spelling of GMT. Note that we would have already caught all _known_ spellings of GMT in the first step, so we won't be spamming the DBA with warnings about 'GMT' and 'UTC', etc. I'm worried about cases like Africa/Benin for places that just happen to be on the prime meridian, but don't call their time GMT or UTC. Looking at a globe, it also seems possible that there are places an hour west of Greenwich, for which this could fail during daylight-savings season. An extension to this would be to use the tzset() trick above directly in the datetime constant parser, as a fallback after not matching the table. In that case, we'd probably want to treat the unknown spelling of GMT as an error, though (as it currently does). I think tzset() is probably much too slow to consider calling on every pass through timestamptz_in ... regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Simplifying timezone support
On Wed, Feb 19, 2003 at 10:35:58PM -0500, Tom Lane wrote: snip Tom discussion backend internal tracking of timezone Any objections? Not to your suggestion per se, but looking at the bug report raises a question about pgsql's time zone parsers. It appears there's at least two, since SET TIME ZONE accepts strings like 'US/Eastern', while general timestamp parsing doesn't: test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 CST'; timestamptz -- 2003-02-18 09:36:06.00933-06 (1 row) test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST'; timestamptz -- 2003-02-18 08:36:06.00933-06 (1 row) test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 US/Eastern'; ERROR: Bad timestamp external representation '2003/02/18 09:36:06.00933 US/Eastern' Further testing says it's even worse that that: SET TIME ZONE will silently accept any string at all, and fall back to providing GMT when a timestamptz is requested. This includes the TLA TZ abbreviations that the constant parsing code understands, like CST and EST. test=# set TIME ZONE 'CST'; SET test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST'; timestamptz -- 2003-02-18 14:36:06.00933+00 (1 row) test=# set TIME ZONE 'FOOBAR'; SET test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST'; timestamptz -- 2003-02-18 14:36:06.00933+00 (1 row) Here's an especially fun one: with DATESTYLE set to 'Postgresql,US', whatever string is handed to SET TIME ZONE comes out the other end, if it can't be parsed: test=# set TIME ZONE 'FOOBAR'; SET test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST'; timestamptz --- Tue Feb 18 14:36:06.00933 2003 FOOBAR (1 row) Leading to this erroneous pair: test=# set TIME ZONE 'US/Central'; SET test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST'; timestamptz Tue Feb 18 08:36:06.00933 2003 CST (1 row) test=# set TIME ZONE 'CST'; SET test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST'; timestamptz Tue Feb 18 14:36:06.00933 2003 CST (1 row) test=# Tom, since you're in (or near) that code right now, how painful would it be to unify the time zone parsing? What's the correct behavior? Certainly SET TIME ZONE should at leat NOTICE about invalide time zone names? Ross ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Simplifying timezone support
Ross J. Reedstrom [EMAIL PROTECTED] writes: question about pgsql's time zone parsers. It appears there's at least two, since SET TIME ZONE accepts strings like 'US/Eastern', while general timestamp parsing doesn't: The TIME ZONE string is fed to libc (via TZ environment variable); the other cases are not. SET TIME ZONE will silently accept any string at all, and fall back to providing GMT when a timestamptz is requested. Provide a portable way of getting libc to tell us whether it likes TZ, and I'll be glad to fix this. Ultimately we should probably get rid of our dependence on the libc time routines altogether ... but I have no intention of opening that can of worms right now. See past discussions in the archives. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Simplifying timezone support
Ross J. Reedstrom [EMAIL PROTECTED] writes: On Thu, Feb 20, 2003 at 03:21:09PM -0500, Tom Lane wrote: Provide a portable way of getting libc to tell us whether it likes TZ, and I'll be glad to fix this. Dang that lovely word 'portable'. However, given your proposed change, perhaps the hurdle for portable time handling is now lower: it seems we've not been exposed to as broad a range of broken systems as in the past. On this particular point my threshold of 'portable' is actually pretty low, as long as it's fail-soft. Failure to detect bad TZ on some systems would leave them no worse off than before, right? But I haven't seen *any* published API that directly tells you whether tzset liked TZ or not --- AFAICT it's supposed to just silently substitute GMT. Which would be okay if GMT were the only allowed spelling of GMT, but it ain't ... regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]