Re: [HACKERS] Redhat 7.3 time manipulation bug

cbbrowne Fri, 24 May 2002 17:25:00 -0700

> > > The last phase could be extending the API to allow multiple simultaneous
> > > time zones, detection of bad time zones, etc etc. This would involve API
> > > changes or extensions, and breaks compatibility with system-supplied
> > > infrastructure.
> > One thing that wasn't clear to me, but could use investigation: if so
> > many systems are using the same underlying timezone database info, maybe
> > there is some commonality at a level below the ISO mktime/tzset/etc API.
> > If we could make use of the system-provided TZ database at a lower level
> > while still using our own APIs not tied to time_t, it'd answer the issue
> > of compatibility with the surrounding system.  (Which is a real issue,
> > I agree --- we should be able to accept the system's standard TZ setting
> > if possible.)


> The fundamental problem (which of course can have a fundamental
> solution ;) is that a time zone database built with a 32-bit time_t
> will have time zone info through 2038 only (it is a binary file with
> 32-bit time fields -- almost certainly anyway). So if we have an
> extended time zone infrastructure using something different for time_t
> we would need to handle the case of reading non-extended time zones
> databases, which puts us back to having limitations.

Ah, but the database in question _doesn't_ consist of 32 bit time_t
values.

It consists of things like:

# @(#)zone.tab  1.26
#
# TZ zone descriptions
#
# From Paul Eggert <[EMAIL PROTECTED]> (1996-08-05):
#
# This file contains a table with the following columns:
# 1.  ISO 3166 2-character country code.  See the file `iso3166.tab'.
# 2.  Latitude and longitude of the zone's principal location
#     in ISO 6709 sign-degrees-minutes-seconds format,
#     either +-DDMM+-DDDMM or +-DDMMSS+-DDDMMSS,
#     first latitude (+ is north), then longitude (+ is east).
# 3.  Zone name used in value of TZ environment variable.
# 4.  Comments; present if and only if the country has multiple rows.
#
# Columns are separated by a single tab.
# The table is sorted first by country, then an order within the country that
# (1) makes some geographical sense, and
# (2) puts the most populous zones first, where that does not contradict (1).
#
# Lines beginning with `#' are comments.
#
#country-
#code   coordinates     TZ                      comments
AD      +4230+00131     Europe/Andorra
AE      +2518+05518     Asia/Dubai
AF      +3431+06912     Asia/Kabul
AG      +1703-06148     America/Antigua
AI      +1812-06304     America/Anguilla
AL      +4120+01950     Europe/Tirane
AM      +4011+04430     Asia/Yerevan
AN      +1211-06900     America/Curacao
AO      -0848+01314     Africa/Luanda
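
To make the point concrete, a row of this table can be read with nothing but string handling. What follows is only a minimal sketch: the function names are made up for illustration, and it splits on runs of whitespace (the file itself says single tabs) because that is how the excerpt appears here.

```python
import re

# ISO 6709 sign-degrees-minutes[-seconds]: +-DDMM+-DDDMM or +-DDMMSS+-DDDMMSS.
_COORD = re.compile(r"([+-])(\d{2})(\d{2})(\d{2})?([+-])(\d{3})(\d{2})(\d{2})?")


def _to_degrees(sign, deg, minutes, seconds):
    """Convert one sign/deg/min/sec group to signed decimal degrees."""
    value = int(deg) + int(minutes) / 60 + int(seconds or 0) / 3600
    return -value if sign == "-" else value


def parse_coordinates(text):
    """Return (latitude, longitude) in decimal degrees; + is north/east."""
    m = _COORD.fullmatch(text)
    if m is None:
        raise ValueError("not an ISO 6709 coordinate pair: %r" % text)
    g = m.groups()
    return _to_degrees(*g[0:4]), _to_degrees(*g[4:8])


def parse_zone_tab_line(line):
    """Parse one zone.tab row; return None for comments and blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split(None, 3)  # country, coordinates, TZ, optional comments
    lat, lon = parse_coordinates(fields[1])
    return {
        "country": fields[0],
        "lat": lat,
        "lon": lon,
        "tz": fields[2],
        "comments": fields[3].strip() if len(fields) > 3 else "",
    }


if __name__ == "__main__":
    print(parse_zone_tab_line("AD      +4230+00131     Europe/Andorra"))
```

Nothing in there ever touches a time_t.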

Then a "leapseconds" table, looking like:
# The correction (+ or -) is made at the given time, so lines
# will typically look like:
#       Leap    YEAR    MON     DAY     23:59:60        +       R/S
# or
#       Leap    YEAR    MON     DAY     23:59:59        -       R/S

# If the leapsecond is Rolling (R) the given time is local time
# If the leapsecond is Stationary (S) the given time is UTC

# Leap  YEAR    MONTH   DAY     HH:MM:SS        CORR    R/S
Leap    1972    Jun     30      23:59:60        +       S
Leap    1972    Dec     31      23:59:60        +       S
Leap    1973    Dec     31      23:59:60        +       S
Leap    1974    Dec     31      23:59:60        +       S
Leap    1975    Dec     31      23:59:60        +       S
Leap    1976    Dec     31      23:59:60        +       S
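
The leap second entries are equally plain text. A rough sketch of reading them, assuming only the seven-field layout shown in the header comment above (parse_leap_line is a name invented for this example):

```python
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def parse_leap_line(line):
    """Parse one 'Leap YEAR MON DAY HH:MM:SS CORR R/S' line.

    Returns None for comments, blank lines, and anything else that is
    not a seven-field Leap entry.
    """
    fields = line.split()
    if len(fields) != 7 or fields[0] != "Leap":
        return None
    _, year, month, day, hms, corr, kind = fields
    return {
        "year": int(year),
        "month": MONTHS[month],
        "day": int(day),
        "time": hms,                            # kept as text, e.g. 23:59:60
        "correction": 1 if corr == "+" else -1,
        "stationary": kind == "S",              # S: given time is UTC
    }


if __name__ == "__main__":
    print(parse_leap_line("Leap    1972    Jun     30      23:59:60        +       S"))
```

The year is an ordinary integer field, so no 2038 limit is baked into the data.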

And then a set of rules about timezone adjustments for all sorts of
localities, including the following:

# Rule  NAME    FROM    TO      TYPE    IN      ON      AT      SAVE    LETTER/S
# Summer Time Act, 1916
Rule    GB-Eire 1916    only    -       May     21      2:00s   1:00    BST
Rule    GB-Eire 1916    only    -       Oct      1      2:00s   0       GMT
# S.R.&O. 1917, No. 358
Rule    GB-Eire 1917    only    -       Apr      8      2:00s   1:00    BST
Rule    GB-Eire 1917    only    -       Sep     17      2:00s   0       GMT


# Zone  NAME            GMTOFF  RULES   FORMAT  [UNTIL]
Zone Antarctica/Casey   0       -       zzz     1969
                        8:00    -       WST     # Western (Aus) Standard Time
Zone Antarctica/Davis   0       -       zzz     1957 Jan 13
                        7:00    -       DAVT    1964 Nov # Davis Time
                        0       -       zzz     1969 Feb
                        7:00    -       DAVT
Zone Antarctica/Mawson  0       -       zzz     1954 Feb 13
                        6:00    -       MAWT    # Mawson Time
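
The Rule lines work the same way: FROM and TO are calendar years written as text, so whatever reads them decides how wide the year range is. A small sketch, assuming the ten-column layout in the header above and handling only a numeric FROM with a TO of a year, "only", or "max" (the function names are hypothetical):

```python
def parse_rule_line(line):
    """Parse one 'Rule' line from the tz source; return None otherwise."""
    fields = line.split("#", 1)[0].split()  # drop any trailing comment
    if len(fields) != 10 or fields[0] != "Rule":
        return None
    _, name, year_from, year_to, _type, month, on, at, save, letters = fields
    first = int(year_from)
    if year_to == "only":
        last = first          # rule applies to a single year
    elif year_to == "max":
        last = None           # open-ended rule
    else:
        last = int(year_to)
    return {
        "name": name, "from": first, "to": last,
        "in": month, "on": on, "at": at, "save": save, "letters": letters,
    }


def rule_applies(rule, year):
    """True if the rule covers the given calendar year."""
    if year < rule["from"]:
        return False
    return rule["to"] is None or year <= rule["to"]


if __name__ == "__main__":
    rule = parse_rule_line(
        "Rule    GB-Eire 1916    only    -       May     21      2:00s   1:00    BST"
    )
    print(rule, rule_applies(rule, 1916), rule_applies(rule, 2040))
```

A rule from 1916 is already outside what a signed 32 bit time_t can hold, and an open-ended one is not tied to 2038 at all.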

> I'm guessing that a better approach might be to have our time zone
> stuff inside our own API, which then could choose to call, for
> example, mktime() or pg_mktime(), which could each have different
> signatures.  Then the heuristics for matching one to the other are
> isolated to our thin API implementation, not to the underlying system-
> or pg-provided libraries.

> matching "stringy time zones" to numeric offsets for input date/times.
> The time zone databases themselves don't lend themselves to this,
> since the tables have those stringy zones somewhere on the right hand
> side of each row of information and the fields can change from year to
> year.

The ultimate goal would seem likely to be to store dates internally in
some form like UTC, with some reasonably huge dynamic range, that is,
not limited to 32 bit timestamps, but rather using something like a
proleptic Gregorian calendar (per _Calendrical Calculations_, page 50).

Some reasonable treatments would include:

  - 32 bits as a signed int indicating the number of days since
    GREG_EPOCH, where logical epochs would include January 1, 1,
    January 1, 1900, or perhaps even a date that is still in the
    future, such as January 1, 2038.  (Strictly, "proleptic" means
    applying the calendar to dates before it was actually adopted.)

  - 8 bits indicating the month; 8 bits indicating the day of month;
    16 bits providing a range of years from -32768 to 32767.

Both have merits...
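
A sketch of both layouts, assuming an epoch of January 1, 1 for the day count and a signed 16 bit year for the packed form. The day-count arithmetic is the usual proleptic Gregorian formula with the year starting in March, not anything taken from the book, and all of the names are invented here.

```python
def _civil_days(year, month, day):
    """Days from 0000-03-01 (proleptic Gregorian) to the given date."""
    if month <= 2:
        year -= 1                      # treat Jan/Feb as months 11/12 of prior year
    era = year // 400                  # floor division also handles negative years
    year_of_era = year - era * 400
    month_index = month - 3 if month > 2 else month + 9
    day_of_year = (153 * month_index + 2) // 5 + day - 1
    day_of_era = (year_of_era * 365 + year_of_era // 4
                  - year_of_era // 100 + day_of_year)
    return era * 146097 + day_of_era


def days_since_epoch(year, month, day, epoch=(1, 1, 1)):
    """Signed day count relative to the chosen GREG_EPOCH."""
    return _civil_days(year, month, day) - _civil_days(*epoch)


def pack_ymd(year, month, day):
    """Pack into 32 bits: 16 bit signed year, 8 bit month, 8 bit day."""
    if not -32768 <= year <= 32767:
        raise ValueError("year does not fit in a signed 16 bit field")
    return ((year & 0xFFFF) << 16) | (month << 8) | day


def unpack_ymd(packed):
    """Inverse of pack_ymd."""
    year = packed >> 16
    if year >= 0x8000:                 # restore the sign of the 16 bit year
        year -= 0x10000
    return year, (packed >> 8) & 0xFF, packed & 0xFF


if __name__ == "__main__":
    print(days_since_epoch(2002, 5, 24))
    print(unpack_ymd(pack_ymd(2002, 5, 24)))
```

The day count makes date arithmetic a plain subtraction and, at 32 bits, reaches millions of years in either direction; the packed form makes year, month, and day trivial to pull out but needs conversion before any arithmetic.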

Timestamps would then forcibly expand things by _at least_ 24 bits, the
minimum needed to express 1/100ths of seconds (a day holds 8,640,000 of
them).  Might as well head on to 32 bits for the time and so have
something that can easily represent values down to well below a
millisecond.
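
The storage arithmetic is easy to check. By this calculation, counting hundredths of a second within one day takes 24 bits, and a full 32 bits per day gives a tick of roughly 20 microseconds. (bits_needed is just a helper made up for the check.)

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bits_needed(count):
    """Smallest number of bits that can hold the values 0 .. count-1."""
    return (count - 1).bit_length()


def resolution_seconds(bits):
    """Tick size when one day is spread evenly over 2**bits values."""
    return SECONDS_PER_DAY / 2 ** bits


if __name__ == "__main__":
    print(bits_needed(SECONDS_PER_DAY))          # whole seconds in a day
    print(bits_needed(SECONDS_PER_DAY * 100))    # hundredths of a second
    print(resolution_seconds(32))                # tick size with 32 bits
```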

The "stringy stuff" indicates how values are to be displayed or parsed.
It does nothing about what is stored internally, or at least shouldn't.
--
(reverse (concatenate 'string "gro.gultn@" "enworbbc"))
http://www.cbbrowne.com/info/emacs.html
In the name of the Lord-High mutant, we sacrifice this suburban girl
-- `Future Schlock'
