On Jul 14, 2008, at 07:24, Tom Lane wrote:
"David E. Wheeler" <[EMAIL PROTECTED]> writes:
Could I supply two comparison files, one for Mac OS X with en_US.UTF-8
and one for everything else, as described in the last three paragraphs
here?
The fallacy in that proposal is the assumption that there are only two
behaviors out there.
Well, no, that's not the assumption at all. The assumption is that the
type works properly with multibyte characters under multibyte-aware
locales. So I want to have tests to ensure that such is true by having
multibyte characters run under a very specific locale and platform. I
don't really care what platform or locale; the point is to make sure
that the type is actually multibyte-aware.
Let me recalibrate your thoughts a bit: so far
I have tried citext on three different machines (Mac, Fedora 8, HPUX),
and I got three different answers from those tests. That's despite
endeavoring to make the database locales match ... which is less than
trivial in itself because they use three slightly different spellings
of "en_US.UTF8".
<rant>
This is a truly pitiful state of affairs. Rhetorical question: Why is
there no standardization of locales? I'm sure there are a lot of
opinions out there (should all uppercase chars precede all
lowercase chars or be mixed in with lowercase chars), but I should
think that, in this day and age, there would be some sort of standard
defining locales and how they work -- and to allow such opinions to be
expressed by different locales, not in the same locale names on
different platforms.
</rant>
Given that you were more or less deliberately testing corner cases,
I think it's quite likely that the number of observable reactions from
N platforms would be more nearly O(N) than O(1).
To me they're not corner cases. To me it is just, "given a specific
platform/locale, does CITEXT respect the locale's rules?" I don't care
to test all platforms and locales (I'm not *that* stupid :-)).
In the real world, to the extent that we are able to control the locale
of the regression tests, we make it "C" --- and to a large extent we
can't control it at all, which means you have another uncontrolled
variable besides platform. So in the current universe there is
absolutely no value in submitting locale-specific tests for a contrib
module.
Then how do we know that it will continue to be locale-aware over
time? Someone could replace the comparison function with one that just
lowercases ASCII characters, like CITEXT 1 does, and no tests would
fail. How do you prevent that from happening without being hyper-
vigilant (and never leaving the project, I might add)?
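
A minimal sketch of that failure mode, in Python rather than the C that
citext is written in. The function names are hypothetical, and Python's
str.lower() follows Unicode rules rather than the operating system
locale, so this only illustrates the regression; it is not citext's
actual comparison code. An ASCII-only comparison passes a test suite
that contains nothing but ASCII strings and fails only once a multibyte
test case is present:

```python
def ascii_lower(s: str) -> str:
    """Lowercase only A-Z; every other character is left untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def ascii_only_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison that only understands ASCII letters."""
    return ascii_lower(a) == ascii_lower(b)


def unicode_aware_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison using Python's Unicode-aware lower()."""
    return a.lower() == b.lower()


# Hypothetical test cases: one pure ASCII pair, one pair with accented
# Latin letters (written as escapes so the file encoding does not matter).
samples = [
    ("PostgreSQL", "postgresql"),
    ("\u00c0\u00c9\u00ce", "\u00e0\u00e9\u00ee"),
]

for a, b in samples:
    print(ascii_only_equal(a, b), unicode_aware_equal(a, b))
```

With only the first pair in the suite, both implementations look
identical; the second pair is the one that tells them apart.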
I see some discussion in the thread about improving the situation, but
until we are able to decouple database locale from environment locale,
I doubt we'll be able to do a whole lot about automating this kind
of test. There are too many variables at the moment.
Is the decoupling of database locale from environment locale likely to
happen anytime soon? Now that I've written CITEXT, I dare say that
such might become my top-desired feature (aside from replication).
Thanks for the discussion, much appreciated, and I'm learning a ton. I
retain the right to be opinionated, however. ;-)
Best,
David
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)