On Jul 14, 2008, at 07:24, Tom Lane wrote:

"David E. Wheeler" <[EMAIL PROTECTED]> writes:
Could I supply two comparison files, one for Mac OS X with en_US.UTF-8 and one for everything else, as described in the last three paragraphs
here?

The fallacy in that proposal is the assumption that there are only two
behaviors out there.

Well, no, that's not the assumption at all. The assumption is that the type works properly with multibyte characters under multibyte-aware locales. So I want tests that verify this by running multibyte characters under one very specific locale and platform. I don't really care which platform or locale; the point is to make sure that the type is actually multibyte-aware.

Let me recalibrate your thoughts a bit: so far
I have tried citext on three different machines (Mac, Fedora 8, HPUX),
and I got three different answers from those tests.  That's despite
endeavoring to make the database locales match ... which is less than
trivial in itself because they use three slightly different spellings of
"en_US.UTF8".
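
To illustrate the spelling problem: the message does not say which three spellings were involved, so the names below are assumed examples, and canonical_locale() is a hypothetical helper rather than anything in PostgreSQL. It is only a sketch of how a test harness might treat differently spelled codesets as the same locale.

```python
def canonical_locale(name):
    """Return a locale name with its codeset folded to one spelling.

    Hypothetical helper: lowercases the codeset and drops hyphens, so
    "UTF-8", "UTF8" and "utf8" all compare equal. The language and
    territory part ("en_US") is left untouched.
    """
    lang, sep, codeset = name.partition(".")
    if not sep:
        # No codeset given, e.g. "C" or "en_US".
        return lang
    return lang + "." + codeset.replace("-", "").lower()


# Assumed example spellings, one per platform.
spellings = ["en_US.UTF-8", "en_US.utf8", "en_US.UTF8"]
canonical = {canonical_locale(s) for s in spellings}
print(canonical)
```

Matching the names this way says nothing about whether the platforms actually sort or case-fold the same way, which is the larger problem described above.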

<rant>
This is a truly pitiful state of affairs. Rhetorical question: Why is there no standardization of locales? I'm sure there are a lot of opinions out there (should all uppercase chars precede all lowercase chars, or be mixed in with them?), but I should think that, in this day and age, there would be some sort of standard defining locales and how they work -- one that lets such opinions be expressed by different locales, not by the same locale name behaving differently on different platforms.
</rant>

Given that you were more or less deliberately testing corner cases,
I think it's quite likely that the number of observable reactions from
N platforms would be more nearly O(N) than O(1).

To me they're not corner cases. To me it is just, "given a specific platform/locale, does CITEXT respect the locale's rules?" I don't care to test all platforms and locales (I'm not *that* stupid :-)).

In the real world, to the extent that we are able to control the locale
of the regression tests, we make it "C" --- and to a large extent we
can't control it at all, which means you have another uncontrolled
variable besides platform.  So in the current universe there is
absolutely no value in submitting locale-specific tests for a contrib
module.

Then how do we know that it will continue to be locale-aware over time? Someone could replace the comparison function with one that just lowercases ASCII characters, like CITEXT 1 does, and no tests would fail. How do you prevent that from happening without being hyper-vigilant (and never leaving the project, I might add)?
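
A minimal sketch of the regression I have in mind, in Python rather than the module's C. The function names are hypothetical, and Python's str.lower() applies Unicode case mapping rather than the database locale, so this only stands in for a locale-aware comparison. The point is that an ASCII-only test cannot tell the two implementations apart, while a single multibyte test can.

```python
def ascii_lower(s):
    """Lowercase only A-Z, leaving every other character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def equal_ascii_only(a, b):
    """Case-insensitive equality that folds ASCII letters only."""
    return ascii_lower(a) == ascii_lower(b)


def equal_unicode_aware(a, b):
    """Case-insensitive equality using Unicode-aware lowercasing."""
    return a.lower() == b.lower()


ascii_case = ("HELLO", "hello")
# U+00C0 and U+00E0: capital and small "a" with grave accent.
multibyte_case = ("\u00c0BC", "\u00e0bc")

# Both implementations pass the ASCII test ...
print(equal_ascii_only(*ascii_case), equal_unicode_aware(*ascii_case))
# ... but only the Unicode-aware one passes the multibyte test.
print(equal_ascii_only(*multibyte_case), equal_unicode_aware(*multibyte_case))
```

With only the first pair in the test suite, swapping one function for the other would go unnoticed.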

I see some discussion in the thread about improving the situation, but
until we are able to decouple database locale from environment locale,
I doubt we'll be able to do a whole lot about automating this kind
of test.  There are too many variables at the moment.

Is the decoupling of database locale from environment locale likely to happen anytime soon? Now that I've written CITEXT, I dare say that such might become my top-desired feature (aside from replication).

Thanks for the discussion, much appreciated, and I'm learning a ton. I retain the right to be opinionated, however. ;-)

Best,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
