I looked into the timezone specifications and basically extracted a list of
existing offsets from the zic database.

My proposed format for the timezone files is something like this:

HADT   -32400 D  # Hawaii-Aleutian Daylight Time
                 #     (America/Adak)
HAST   -36000    # Hawaii-Aleutian Standard Time
                 #     (America/Adak)

That is: the abbreviation, the offset from UTC in seconds, an optional D to
mark daylight saving time (this goes into tm->tm_isdst), and, as comments,
the full name of the timezone and the zic names that use it.
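
To make the format concrete, here is a minimal sketch of how one line of
such a file could be parsed. It is only an illustration in Python, not the
backend code; the names TzAbbrev and parse_tz_line are made up for this
example, and it assumes that everything after a '#' is a comment.

```python
from typing import NamedTuple, Optional


class TzAbbrev(NamedTuple):
    """One abbreviation entry (hypothetical type, for illustration only)."""
    abbrev: str
    offset_seconds: int
    is_dst: bool


def parse_tz_line(line: str) -> Optional[TzAbbrev]:
    """Parse 'ABBREV OFFSET [D]  # comment'; return None for comment-only lines."""
    # Assumption: '#' starts a comment that runs to the end of the line.
    fields = line.split("#", 1)[0].split()
    if not fields:
        return None  # blank line or continuation comment
    if len(fields) not in (2, 3):
        raise ValueError(f"malformed timezone line: {line!r}")
    abbrev, offset = fields[0], int(fields[1])
    is_dst = False
    if len(fields) == 3:
        if fields[2] != "D":
            raise ValueError(f"unexpected flag {fields[2]!r} in: {line!r}")
        is_dst = True
    return TzAbbrev(abbrev, offset, is_dst)


hadt = parse_tz_line("HADT   -32400 D  # Hawaii-Aleutian Daylight Time")
hast = parse_tz_line("HAST   -36000    # Hawaii-Aleutian Standard Time")
cont = parse_tz_line("                 #     (America/Adak)")
```

Continuation lines that carry only the zic names come back as None, so a
loader can simply skip them.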

I also made the extraction script find all conflicts, and I annotated them
manually as shown here. Most of the conflicts are between America and Asia.

# CONFLICT! ADT is not unique
# Other timezones:
#  - ADT: Arabic Daylight Time (Asia)
ADT    -10800 D  # Atlantic Daylight Time
                 #     (America/Glace_Bay)
                 #     (America/Goose_Bay)
                 #     (America/Halifax)
                 #     (America/Thule)
                 #     (Atlantic/Bermuda)

However, even among the "America/..." names there are conflicts. For
example, CST is used both for US Central Time and for Cuba Central Standard
Time. US Central Time is UTC-6h, while Cuba Central Standard Time is UTC-5h.
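
The conflict check itself is simple. The following is a rough sketch of the
idea in Python, not the actual extraction script: group the entries by
abbreviation and report every abbreviation that maps to more than one
offset. The function name find_conflicts and the tuple layout are
assumptions made for this example.

```python
from collections import defaultdict


def find_conflicts(entries):
    """Return {abbrev: sorted offsets} for abbreviations with several offsets.

    entries is an iterable of (abbrev, offset_seconds, description) tuples.
    """
    offsets_by_abbrev = defaultdict(set)
    for abbrev, offset_seconds, _description in entries:
        offsets_by_abbrev[abbrev].add(offset_seconds)
    return {
        abbrev: sorted(offsets)
        for abbrev, offsets in offsets_by_abbrev.items()
        if len(offsets) > 1
    }


entries = [
    ("CST", -6 * 3600, "US Central Time"),
    ("CST", -5 * 3600, "Cuba Central Standard Time"),
    ("HAST", -36000, "Hawaii-Aleutian Standard Time"),
]
conflicts = find_conflicts(entries)
print(conflicts)  # {'CST': [-21600, -18000]}
```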

Another problem is that many of the timezone names that are hardcoded into
the backend seem to be badly outdated or just doubtful; many of them do not
show up in the zic database.

For example, NT (Nome Time) seems to have existed until 1967; America/Nome
is listed in the zic database as AKDT/AKST, which is Alaska Daylight/Standard
Time. Other examples:

JAYT, Jayapura Time: Asia/Jayapura is listed as EIT (East Indonesia Time) in
the zic database.

JAVT, Java Time (07:00? see JT): the zic database says that it is outdated
and was used until 1932.
JT, Java Time (07:30? see JAVT): I did not find proof that this is really
+7.5 hours; some sources say it's just 7 hours.

HMT is the strangest of the bunch. I have found the name "Heard and
McDonald Time", but with a different offset. I could not find any reference
to a "Hellas" time as indicated in the comment.

So could we remove some of those on the grounds that they do not seem to
be used any more (please correct me here if someone knows more) and that
you can easily add offsets for those if you need them?

By the same argument we could even remove timezones like BDST (British
Double Summer Time), DNT (Dansk Normal Tid), FST (French Summer Time), NOR
(Norway Standard Time), SWT (Swedish Winter Time). Could anybody from those
countries comment on whether those are still used or just outdated? I
figure that most of those countries have long since moved to the more common
timezone names...

Ok, after all this has been sorted out I propose to make different files for
the different continents and let the user specify with a guc which ones he
wants to use.

I could think of three possible ways:

1) (See Tom's idea in
http://archives.postgresql.org/pgsql-hackers/2006-05/msg01048.php )

Conflicts within one set can just be commented out - we would try to include
whatever will probably be used by the majority of users and comment out the
other one(s). Conflicts between two sets would show up when postmaster gets
started: it would complain about different definitions for the same
timezone. An American who wants to use some Asian timezones would have to
work through both files and comment out conflicting timezones on one side or
the other to make postmaster start up without errors.

2) Find out which timezones do not conflict, put them in a set and load this
by default. Create other sets that are conflicting but that have some
"override" capability with regard to previous timezone definitions. Decide
on the default value for the guc (could point to American timezones for
example). An Australian could either select only the Australian file or
could specify "America, Australia" and the Australian set overrides the
American timezones in case of conflicts. This way, most people do not have
to make changes and those who have to can specify their "override"-file and
keep all the rest, including non-conflicting timezones from a conflicting
timezone set.

3) Combine both, let the user specify the guc variable as "A, B, C" and look
into C first, then in B and then in A.... *thinking*  Right now I actually
think that the "overriding" idea is not that intuitive, most people would
probably expect that this is a list of priorities, so A overrides B which
overrides C.
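
To compare the three behaviours side by side, here is a small sketch in
Python. It is not a proposal for the actual implementation; the function
names are invented for this example, each set is modelled as a plain dict
from abbreviation to offset in seconds, and the two EST values are only
illustrative sample data that does not come from the files above.

```python
def merge_strict(tz_sets):
    """Variant 1: refuse to start if two sets define an abbreviation differently."""
    merged = {}
    for set_name, definitions in tz_sets:
        for abbrev, offset in definitions.items():
            if abbrev in merged and merged[abbrev] != offset:
                raise ValueError(
                    f"conflicting definitions for {abbrev} (seen again in {set_name})"
                )
            merged[abbrev] = offset
    return merged


def merge_later_overrides(tz_sets):
    """Variant 2: in "A, B, C" a later set overrides the earlier ones."""
    merged = {}
    for _set_name, definitions in tz_sets:
        merged.update(definitions)
    return merged


def merge_priority_list(tz_sets):
    """Variant 3 read as a priority list: A overrides B which overrides C."""
    merged = {}
    for _set_name, definitions in tz_sets:
        for abbrev, offset in definitions.items():
            merged.setdefault(abbrev, offset)  # first definition wins
    return merged


# Illustrative sample data only (assumption, not extracted from zic).
tz_sets = [
    ("America", {"EST": -18000, "HAST": -36000}),
    ("Australia", {"EST": 36000}),
]

later = merge_later_overrides(tz_sets)   # EST resolves to the Australian value
first = merge_priority_list(tz_sets)     # EST resolves to the American value
```

In both merging variants the non-conflicting HAST stays available, which is
the point of 2): only the conflicting entries depend on the order.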

What do you think?

Having a larger token table in datetime.c does not seem to affect
performance all that much. I ran parsing tests with 2 million timestamps
equally distributed over all the timezone abbreviations that I had loaded
previously, and the difference between runs with 154 timezones and runs
with just 35 was about 120ms (on my quite slow laptop computer).
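
A small difference is what one would expect if the token lookup is a binary
search over a sorted table, which is an assumption in this sketch and not
something measured here. Under that assumption the worst-case number of
comparisons grows only logarithmically with the table size, as this short
Python calculation shows (worst_case_probes is a name made up for the
example):

```python
def worst_case_probes(table_size: int) -> int:
    """Worst-case comparisons for a binary search over table_size sorted entries.

    Equals floor(log2(table_size)) + 1 for table_size >= 1.
    """
    return table_size.bit_length()


small = worst_case_probes(35)    # 6
large = worst_case_probes(154)   # 8
```

So going from 35 to 154 entries would add about two comparisons per lookup
in the worst case.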

The timezone definition files should be read at server start but should they
also be read at SIGHUP? If so, should they be read only by the postmaster or
by all backends?


Joachim

