[HACKERS] Making the regression tests locale-proof
Since locale support is now enabled by default, it is desirable that the regression tests pass even if the cluster's locale is not C. As a first step, I have included the following statements in pg_regress right after the database is created:

    alter database $dbname set lc_messages to 'C';
    alter database $dbname set lc_monetary to 'C';
    alter database $dbname set lc_numeric to 'C';
    alter database $dbname set lc_time to 'C';

This gets rid of a boatload of failures related to number formatting. For that purpose I have changed the permissions on these options to USERSET. (I'm still debating making lc_messages SUSET, because otherwise users can screw with admins by changing the language of the log output all the time. Comments?)

The remaining issue is the sort order. I think this can be solved for practical purposes by creating two expected files for each affected test, say char.out and char-locale.out. The regression test driver would try the first one, if that fails try the second one. The assumption here is that all locales will choose the same sort order as long as they're dealing only with the core 26 letters. This does not have to be true in theory, but I think it works for the vast majority of practical cases.

We could also cut down the number of affected tests by making the select_implicit and select_having not use mixed-case strings in the test tables. Then we have only char, varchar, and select_views left.

Comments?

--
Peter Eisentraut [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
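The two-file fallback Peter proposes can be sketched as a small shell function. This is a hypothetical illustration, not actual pg_regress code: the expected/ and results/ directory layout and the function name check_result are assumptions; only the char.out / char-locale.out naming comes from the proposal above.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed fallback comparison.
# Assumed layout: expected/<test>.out, optional expected/<test>-locale.out,
# and the actual output of the test run in results/<test>.out.

# check_result TEST: succeed if results/TEST.out matches either
# expected/TEST.out or, failing that, expected/TEST-locale.out.
check_result() {
    test_name=$1
    result="results/$test_name.out"
    for expected in "expected/$test_name.out" "expected/$test_name-locale.out"; do
        # Skip the alternative if it was never created for this test.
        [ -f "$expected" ] || continue
        # cmp -s compares byte for byte and prints nothing.
        if cmp -s "$expected" "$result"; then
            echo "$test_name ... ok ($(basename "$expected"))"
            return 0
        fi
    done
    echo "$test_name ... FAILED"
    return 1
}
```

A driver loop would call `check_result char` after running each test. The order of the loop encodes the preference: the C-locale file is tried first, so a cluster running in C never depends on the locale variant.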
Re: [HACKERS] Making the regression tests locale-proof
Peter Eisentraut [EMAIL PROTECTED] writes:
> The assumption here is that all locales will choose the same sort order as long as they're dealing only with the core 26 letters. This does not have to be true in theory, but I think it works for the vast majority of practical cases.

Not for uppercase vs. lowercase versions of them. With no locale used (straight ASCII), you get A C b; with a locale you'll get A b C.

--
Trond Eivind Glomsrød
Red Hat, Inc.
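The case-ordering difference is easy to see with sort(1). This is a minimal sketch: OTHER_LOCALE is a hypothetical variable, and en_US.UTF-8 is only an example that must actually be installed (see `locale -a`); if it is not, sort may silently fall back to C ordering. Only the C-locale result is predictable across systems.

```shell
#!/bin/sh
# Compare how the same three words sort in the C locale and in another locale.
# OTHER_LOCALE is hypothetical; en_US.UTF-8 is just an example locale name.
OTHER_LOCALE=${OTHER_LOCALE:-en_US.UTF-8}

# Print the sample words, one per line, in deliberately unsorted order.
sample_words() { printf '%s\n' b C A; }

# Sort standard input using the collation rules of locale $1.
sort_in_locale() { LC_ALL=$1 sort; }

echo "C (byte order, all uppercase before lowercase):"
sample_words | sort_in_locale C

echo "$OTHER_LOCALE (result depends on the system's locale data):"
sample_words | sort_in_locale "$OTHER_LOCALE"
```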
Re: [HACKERS] Making the regression tests locale-proof
On Sat, 2002-05-11 at 02:25, Peter Eisentraut wrote:
> The remaining issue is the sort order. I think this can be solved for practical purposes by creating two expected files for each affected test, say char.out and char-locale.out. The regression test driver would try the first one, if that fails try the second one. The assumption here is that all locales will choose the same sort order as long as they're dealing only with the core 26 letters. This does not have to be true in theory, but I think it works for the vast majority of practical cases.

The et_EE locale has the following order for the core 26 letters (_ marks other letters):

    ABCDEFGHIJKLMNOPQRS_Z_TUVWXY

(notice the position of Z), and I'm not sure whether V and W are distinguished when sorting words that have anything after them.

I've heard that some other locales have other weird behaviours (like treating one or two of the same letter as equivalent when sorting).

Hannu
Re: [HACKERS] Making the regression tests locale-proof
Peter Eisentraut [EMAIL PROTECTED] writes:
> For that purpose I have changed the permissions on these options to USERSET. (I'm still debating making lc_messages SUSET, because otherwise users can screw with admins by changing the language of the log output all the time. Comments?)

Hm. Don't the regression tests already assume they are run by the superuser? They've got create/drop user commands in them. So I'd say SUSET is fine from the point of view of the tests, and I agree with your concern about making the logs unreadable.

> The assumption here is that all locales will choose the same sort order as long as they're dealing only with the core 26 letters.

Nope. For instance, on HPUX I get this sort order in English:

    $ LANG=en_US.iso88591 sort testll
    eix
    ela
    ella
    ellm
    elm
    eln
    enx

and this in Spanish:

    $ LANG=es_ES.iso88591 sort testll
    eix
    ela
    elm
    eln
    ella
    ellm
    enx

because the Spanish treat LL as a single collating element. (Actually, my very-rusty recollection is that they sort LL the same as one L, which would mean that HPUX's behavior is not quite right here: it's treating LL as one symbol that sorts after L. Linux seems to have no clue that LL is special at all though...)

> We could also cut down the number of affected tests by making the select_implicit and select_having not use mixed-case strings in the test tables. Then we have only char, varchar, and select_views left.

In practice we could perhaps use test data that doesn't hit any of the special cases in the popular languages. But I wonder whether this would not be shirking our responsibility as testers. Seems like if you avoid exercising these kinds of cases, you avoid finding corner-case bugs.

regards, tom lane
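Tom's experiment can be repeated by recreating the testll word list and sorting it under several locales. This is a sketch under assumptions: the locale names are the HP-UX spellings from the example, names differ between platforms, and each must be installed (check `locale -a`). As Tom notes, Linux may not treat LL specially, so the Spanish output there may match the English one.

```shell
#!/bin/sh
# Recreate the testll word list from the HP-UX example and sort it under a
# few locales. The non-C locale names are platform-specific assumptions.

# Write the sample words to ./testll, one per line, unsorted.
make_testll() {
    printf '%s\n' enx elm ella eix eln ela ellm > testll
}

make_testll
for loc in C en_US.iso88591 es_ES.iso88591; do
    echo "== $loc"
    # LC_ALL overrides LANG, so LC_* settings already present in the
    # environment cannot mask the locale under test.
    LC_ALL=$loc sort testll
done
```

In the C locale the result is plain byte order (eix ela ella ellm elm eln enx), which happens to coincide with the English HP-UX output above.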
Re: [HACKERS] Making the regression tests locale-proof
Tom Lane wrote:
> Peter Eisentraut [EMAIL PROTECTED] writes:
>> The assumption here is that all locales will choose the same sort order as long as they're dealing only with the core 26 letters.
>
> Nope. For instance, on HPUX I get this sort order in English:
> [...]
> because the Spanish treat LL as a single collating element. (Actually, my very-rusty recollection is that they sort LL the same as one L, which would mean that HPUX's behavior is not quite right here: it's treating LL as one symbol that sorts after L. Linux seems to have no clue that LL is special at all though...)

HPUX's behaviour is broken, because in Spanish LL (as well as CH) stopped being a special symbol some five years ago (it used to be treated as one collating element sorted after L, so HPUX behaviour was right then).

>> We could also cut down the number of affected tests by making the select_implicit and select_having not use mixed-case strings in the test tables. Then we have only char, varchar, and select_views left.

Maybe it would be better to prepare various results, one for each of a subset of the supported locales (C, en_EN, some other western ones, and maybe a couple of multibyte ones?). That way you at least make sure the C library is working as expected.

--
Alvaro Herrera (alvherre[a]atentus.com)
"It is humiliating nonetheless for a person of wit to know that there is no fool who cannot teach him something." (Jean B. Say)
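Alvaro's alternative, one expected file per supported locale, could be sketched as a lookup that picks the file matching the cluster's collation locale and falls back to the C-locale file. Everything here is a hypothetical illustration: the expected/<test>_<locale>.out naming scheme and the function name expected_file are assumptions, and how the driver learns the collation locale is left to the caller.

```shell
#!/bin/sh
# Hypothetical per-locale expected-file selection.
# Assumed naming: expected/<test>_<lang>.out for locale-specific results,
# expected/<test>.out for the C-locale result.

# expected_file TEST LOCALE: print the path of the expected output to use.
expected_file() {
    test_name=$1
    # Strip any codeset suffix, e.g. es_ES.iso88591 -> es_ES.
    lang=${2%%.*}
    if [ -f "expected/${test_name}_${lang}.out" ]; then
        echo "expected/${test_name}_${lang}.out"
    else
        # No locale-specific variant: fall back to the C-locale file.
        echo "expected/${test_name}.out"
    fi
}
```

A driver would then run something like `diff "$(expected_file char "$collate_locale")" results/char.out`, where collate_locale is whatever value the driver was given for the cluster's collation. Unlike the trial-and-error fallback, this approach fails loudly when a supported locale sorts differently than recorded, which is what makes it a check on the C library.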
Re: [HACKERS] Making the regression tests locale-proof
Alvaro Herrera [EMAIL PROTECTED] writes:
> HPUX's behaviour is broken, because in Spanish LL (as well as CH) stopped being a special symbol some five years ago (it used to be treated as one collating element sorted after L, so HPUX behaviour was right then).

Well, this is an old release ;-) ... the localedef files are dated around 1996. (And you don't want to know how long it's been since I could speak passable Spanish.) In any case, the fact that the official rules have changed does not invalidate my point: there are systems on which the assumption Peter wants to make will fail.

regards, tom lane