[HACKERS] Making the regression tests locale-proof

2002-05-10 Thread Peter Eisentraut

Since locale support is now enabled by default, it is desirable that the
regression tests can pass if the cluster's locale is not C.

As a first step I have included the following statements in pg_regress
right after the database is created:

alter database $dbname set lc_messages to 'C';
alter database $dbname set lc_monetary to 'C';
alter database $dbname set lc_numeric to 'C';
alter database $dbname set lc_time to 'C';

This gets rid of a boatload of failures related to number formatting.
For that purpose I have changed the permissions on these options to
USERSET.  (I'm still debating making lc_messages SUSET, because otherwise
users can screw with admins by changing the language of the log output all
the time.  Comments?)

The remaining issue is the sort order.  I think this can be solved for
practical purposes by creating two expected files for each affected test,
say char.out and char-locale.out.  The regression test driver would try
the first one, if that fails try the second one.
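As a rough illustration of the "try the first expected file, then the second" idea, here is a minimal Python sketch. It is not how pg_regress is written; the function name `first_matching_expected` and the file handling are assumptions made for the example, and only the file names char.out and char-locale.out come from the message above.

```python
from pathlib import Path


def first_matching_expected(actual_text, expected_paths):
    """Return the first expected file whose contents equal actual_text.

    expected_paths is tried in order (e.g. char.out, then char-locale.out).
    Returns None when no candidate matches, i.e. the test fails.
    """
    for path in expected_paths:
        candidate = Path(path)
        # A missing alternative file is simply skipped, not treated as an error.
        if candidate.is_file() and candidate.read_text() == actual_text:
            return candidate
    return None
```

A driver would call this with the actual test output and the ordered list of candidate files, and report a failure only when the result is None.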

The assumption here is that all locales will choose the same sort order as
long as they're dealing only with the core 26 letters.  This does not have
to be true in theory, but I think it works for the vast majority of
practical cases.

We could also cut down the number of affected tests by making the
select_implicit and select_having not use mixed-case strings in the test
tables.  Then we have only char, varchar, and select_views left.

Comments?

-- 
Peter Eisentraut   [EMAIL PROTECTED]


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Making the regression tests locale-proof

2002-05-10 Thread Trond Eivind Glomsrød

Peter Eisentraut [EMAIL PROTECTED] writes:

> The assumption here is that all locales will choose the same sort order as
> long as they're dealing only with the core 26 letters.  This does not have
> to be true in theory, but I think it works for the vast majority of
> practical cases.


Not for uppercase vs. lowercase versions of them.

With no locale used (straight ASCII), you get A C b, with a locale
you'll get A b C.
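The difference can be reproduced without relying on any installed locale. The sketch below is only an approximation: `case_folded_key` is a hypothetical stand-in for a locale's collation (a real comparison would go through something like `locale.strxfrm`, whose result depends on the locales available on the system).

```python
words = ["b", "A", "C"]

# Plain code point comparison, which is what the C locale does:
# every uppercase ASCII letter sorts before every lowercase one.
ascii_order = sorted(words)


def case_folded_key(s):
    # Assumed approximation of a typical non-C locale: compare the
    # letters ignoring case first, and use case only to break ties.
    return (s.lower(), s)


locale_like_order = sorted(words, key=case_folded_key)

print(ascii_order)        # ['A', 'C', 'b']
print(locale_like_order)  # ['A', 'b', 'C']
```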

-- 
Trond Eivind Glomsrød
Red Hat, Inc.




Re: [HACKERS] Making the regression tests locale-proof

2002-05-10 Thread Hannu Krosing

On Sat, 2002-05-11 at 02:25, Peter Eisentraut wrote:
> The remaining issue is the sort order.  I think this can be solved for
> practical purposes by creating two expected files for each affected test,
> say char.out and char-locale.out.  The regression test driver would try
> the first one, if that fails try the second one.
>
> The assumption here is that all locales will choose the same sort order as
> long as they're dealing only with the core 26 letters.  This does not have
> to be true in theory, but I think it works for the vast majority of
> practical cases.

The et_EE locale has the following order for the core 26 letters (each _
stands for other letters):

ABCDEFGHIJKLMNOPQRS_Z_TUVWXY  (notice position of Z)

and I'm not sure if V and W are distinguished when sorting words that
have anything after them.
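To see what such an alphabet does to test output, here is a small Python sketch that sorts by the letter order listed above, with the _ placeholders dropped. It is an assumption-laden toy, not the real et_EE collation: the names `ET_LIKE_ORDER` and `et_like_key` are made up for the example, the non-ASCII letters are ignored, and V and W are treated as distinct.

```python
# Core 26 letters in the order given above, placeholders removed.
ET_LIKE_ORDER = "ABCDEFGHIJKLMNOPQRSZTUVWXY"
RANK = {letter: position for position, letter in enumerate(ET_LIKE_ORDER)}


def et_like_key(word):
    # Map each letter to its position in the alphabet above.
    # Raises KeyError for anything outside the core 26 letters.
    return [RANK[letter] for letter in word.upper()]


words = ["TALU", "ZOO", "SAUN"]

print(sorted(words))                   # ['SAUN', 'TALU', 'ZOO']
print(sorted(words, key=et_like_key))  # ['SAUN', 'ZOO', 'TALU']
```

Even with test data restricted to the core 26 letters, the two orders differ as soon as a Z is involved.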

I've heard that some other locales have other weird behaviours
(like sorting one or two of the same letter as equivalent).


Hannu






Re: [HACKERS] Making the regression tests locale-proof

2002-05-10 Thread Tom Lane

Peter Eisentraut [EMAIL PROTECTED] writes:
> For that purpose I have changed the permissions on these options to
> USERSET.  (I'm still debating making lc_messages SUSET, because otherwise
> users can screw with admins by changing the language of the log output all
> the time.  Comments?)

Hm.  Don't the regression tests already assume they are run by the
superuser?  They've got create/drop user commands in them.  So I'd
say SUSET is fine from the point of view of the tests, and I agree
with your concern about making the logs unreadable.

> The assumption here is that all locales will choose the same sort order as
> long as they're dealing only with the core 26 letters.

Nope.  For instance, on HPUX I get this sort order in English:

$ LANG=en_US.iso88591 sort testll
eix
ela
ella
ellm
elm
eln
enx

and this in Spanish:

$ LANG=es_ES.iso88591 sort testll
eix
ela
elm
eln
ella
ellm
enx

because the Spanish treat LL as a single collating element.  (Actually,
my very-rusty recollection is that they sort LL the same as one L, which
would mean that HPUX's behavior is not quite right here: it's treating
LL as one symbol that sorts after L.  Linux seems to have no clue that
LL is special at all though...)
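The two listings can be reproduced with a small Python sketch that treats "ll" as a single collating element ranking just after "l". This only imitates the behaviour shown for the es_ES locale on that HPUX release; the function name `ll_as_one_element_key` is hypothetical, and the sketch handles only lowercase ASCII letters.

```python
import string


def ll_as_one_element_key(word):
    """Build a sort key in which 'll' is one symbol sorting after 'l'."""
    key = []
    i = 0
    w = word.lower()
    while i < len(w):
        if w.startswith("ll", i):
            # Same primary rank as 'l', but a tiebreaker of 1 places it
            # after a single 'l' and before 'm'.
            key.append((string.ascii_lowercase.index("l"), 1))
            i += 2
        else:
            # index() raises ValueError for non a-z characters.
            key.append((string.ascii_lowercase.index(w[i]), 0))
            i += 1
    return key


words = ["eix", "ela", "ella", "ellm", "elm", "eln", "enx"]

print(sorted(words))
# ['eix', 'ela', 'ella', 'ellm', 'elm', 'eln', 'enx']
print(sorted(words, key=ll_as_one_element_key))
# ['eix', 'ela', 'elm', 'eln', 'ella', 'ellm', 'enx']
```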

> We could also cut down the number of affected tests by making the
> select_implicit and select_having not use mixed-case strings in the test
> tables.  Then we have only char, varchar, and select_views left.

In practice we could perhaps use test data that doesn't hit any of the
special cases in the popular languages.  But I wonder whether this would
not be shirking our responsibility as testers.  Seems like if you avoid
exercising these kinds of cases, you avoid finding corner-case bugs.

regards, tom lane




Re: [HACKERS] Making the regression tests locale-proof

2002-05-10 Thread Alvaro Herrera

Tom Lane wrote:

> Peter Eisentraut [EMAIL PROTECTED] writes:

> > The assumption here is that all locales will choose the same sort order as
> > long as they're dealing only with the core 26 letters.
>
> Nope.  For instance, on HPUX I get this sort order in English:
[...]

> because the Spanish treat LL as a single collating element.  (Actually,
> my very-rusty recollection is that they sort LL the same as one L, which
> would mean that HPUX's behavior is not quite right here: it's treating
> LL as one symbol that sorts after L.  Linux seems to have no clue that
> LL is special at all though...)

HPUX's behaviour is broken, because in Spanish LL (as well as CH)
stopped being a special symbol some five years ago (it used to be
treated as one collating element sorted after L, so HPUX behaviour was
right then).


> > We could also cut down the number of affected tests by making the
> > select_implicit and select_having not use mixed-case strings in the test
> > tables.  Then we have only char, varchar, and select_views left.

Maybe it would be better to prepare various results, one for each of a
subset of the locales supported (C, en_EN, some other western and
maybe a couple multibyte?). That way at least you make sure the C
library is working as expected.
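One way to picture per-locale results is a lookup that prefers a locale-specific expected file and falls back to the default one. The Python sketch below is purely illustrative: the function `expected_file_for` and the `<test>-<locale>.out` naming scheme are assumptions for the example, not an existing pg_regress convention.

```python
def expected_file_for(test_name, lc_collate, available):
    """Pick an expected-output file name for a test under a given locale.

    available is the set of expected file names that exist for the test.
    """
    # Drop a codeset suffix such as ".iso88591" so es_ES.iso88591 -> es_ES.
    language = lc_collate.split(".")[0]
    candidate = f"{test_name}-{language}.out"
    if language not in ("C", "POSIX") and candidate in available:
        return candidate
    # No locale-specific result prepared: fall back to the C-locale file.
    return f"{test_name}.out"


available = {"char.out", "char-es_ES.out"}
print(expected_file_for("char", "es_ES.iso88591", available))  # char-es_ES.out
print(expected_file_for("char", "C", available))               # char.out
```

Locales without a prepared result would still be compared against the C output, so they would show up as failures rather than be silently skipped.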

-- 
Alvaro Herrera (alvherre[a]atentus.com)
It is nonetheless humbling for a person of wit to know
that there is no fool who cannot teach them something. (Jean B. Say)





Re: [HACKERS] Making the regression tests locale-proof

2002-05-10 Thread Tom Lane

Alvaro Herrera [EMAIL PROTECTED] writes:
> HPUX's behaviour is broken, because in Spanish LL (as well as CH)
> stopped being a special symbol some five years ago (it used to be
> treated as one collating element sorted after L, so HPUX behaviour was
> right then).

Well, this is an old release ;-) ... the localedef files are dated
around 1996.  (And you don't want to know how long it's been since
I could speak passable Spanish.)

In any case, the fact that the official rules have changed does not
invalidate my point: there are systems on which the assumption Peter
wants to make will fail.

regards, tom lane
