Re: [HACKERS] PATCH: CITEXT 2.0 v3

David E. Wheeler Mon, 14 Jul 2008 10:49:11 -0700

On Jul 14, 2008, at 07:24, Tom Lane wrote:

"David E. Wheeler" <[EMAIL PROTECTED]> writes:
Could I supply two comparison files, one for Mac OS X with en_US.UTF-8 and one for everything else, as described in the last three paragraphs
here?
The fallacy in that proposal is the assumption that there are only two
behaviors out there.

Well, no, that's not the assumption at all. The assumption is that the type works properly with multibyte characters under multibyte-aware locales. So I want to have tests to ensure that such is true by having multibyte characters run under a very specific locale and platform. I don't really care what platform or locale; the point is to make sure that the type is actually multibyte-aware.

Let me recalibrate your thoughts a bit: so far
I have tried citext on three different machines (Mac, Fedora 8, HPUX),
and I got three different answers from those tests.  That's despite
endeavoring to make the database locales match ... which is less than
trivial in itself because they use three slightly different spellings of
"en_US.UTF8".


<rant>

This is a truly pitiful state of affairs. Rhetorical question: Why is there no standardization of locales? I'm sure there are a lot of opinions out there (should all uppercase chars precede all lowercase chars, or be mixed in with lowercase chars?), but I should think that, in this day and age, there would be some sort of standard defining locales and how they work -- and that such opinions would be expressed by different locales, not by the same locale names on different platforms.

</rant>

Given that you were more or less deliberately testing corner cases,
I think it's quite likely that the number of observable reactions from
N platforms would be more nearly O(N) than O(1).

To me they're not corner cases. To me it is just, "given a specific platform/locale, does CITEXT respect the locale's rules?" I don't care to test all platforms and locales (I'm not *that* stupid :-)).

In the real world, to the extent that we are able to control the locale
of the regression tests, we make it "C" --- and to a large extent we
can't control it at all, which means you have another uncontrolled
variable besides platform.  So in the current universe there is
absolutely no value in submitting locale-specific tests for a contrib
module.

Then how do we know that it will continue to be locale-aware over time? Someone could replace the comparison function with one that just lowercases ASCII characters, like CITEXT 1 does, and no tests would fail. How do you prevent that from happening without being hyper-vigilant (and never leaving the project, I might add)?
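
To make the concern concrete, here is a minimal sketch in Python. It is not part of the patch, and the function names are hypothetical. It assumes an ASCII-only fold that lowercases just A-Z, and compares it with Python's Unicode-aware str.lower(), which does not depend on the environment locale. It therefore illustrates only the multibyte-awareness half of the problem, not locale-specific rules. A test that uses only ASCII input cannot tell the two functions apart; one multibyte pair can.

```python
def ascii_only_equal(a: str, b: str) -> bool:
    """Case-insensitive compare that folds only A-Z (hypothetical ASCII-only fold)."""
    table = {code: code + 32 for code in range(ord("A"), ord("Z") + 1)}
    return a.translate(table) == b.translate(table)


def unicode_equal(a: str, b: str) -> bool:
    """Case-insensitive compare using Python's Unicode-aware lowercasing."""
    return a.lower() == b.lower()


# Escapes keep the test data unambiguous (precomposed characters).
cases = [
    ("ABC", "abc"),                # ASCII only: both functions agree
    ("\u00c0", "\u00e0"),          # A-grave vs. a-grave
    ("\u00c9COLE", "\u00e9cole"),  # E-acute + "COLE" vs. e-acute + "cole"
]

for left, right in cases:
    print(repr(left), repr(right),
          ascii_only_equal(left, right), unicode_equal(left, right))
```

Only the first pair compares equal under the ASCII-only fold, so a regression test containing either of the other pairs would fail if the comparison function were replaced with one that ignores multibyte characters.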

I see some discussion in the thread about improving the situation, but
until we are able to decouple database locale from environment locale,
I doubt we'll be able to do a whole lot about automating this kind
of test.  There are too many variables at the moment.

Is the decoupling of database locale from environment locale likely to happen anytime soon? Now that I've written CITEXT, I dare say that such might become my top-desired feature (aside from replication).

Thanks for the discussion, much appreciated, and I'm learning a ton. I retain the right to be opinionated, however. ;-)


Best,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
