Travis Vitek wrote:
Martin Sebor wrote:
My only requirement is to get those tests to pass in a reasonable
amount of time (i.e., without timing out), and without compromising
their effectiveness.
> Do we want to give up on the locale name matching, or do we want to
> include zh_CN in the list of locales to test? What about matching the
> encoding? Should we ignore all of this and just find one locale for
> each value of MB_CUR_MAX from 1 to MB_LEN_MAX and run the test on them?
Maybe. I'll let you propose what makes the most sense to you :)
Martin
Well, the AIX machine I'm testing on has 683 installed locale files. Of
those, many are links to locales with different names. For example, we have:
$ locale -a | grep "_CN" | grep -v "\."
ZH_CN
Zh_CN
zh_CN
$ ls -l /usr/lib/nls/loc/ZH_CN
lrwxrwxrwx 1 bin bin bin 28 Feb 8 2008
/usr/lib/nls/loc/ZH_CN -> /usr/lib/nls/loc/ZH_CN.UTF-8
$ ls -l /usr/lib/nls/loc/Zh_CN
lrwxrwxrwx 1 bin bin bin 28 Feb 8 2008
/usr/lib/nls/loc/Zh_CN -> /usr/lib/nls/loc/Zh_CN.GB18030
$ ls -l /usr/lib/nls/loc/zh_CN
lrwxrwxrwx 1 bin bin bin 28 Feb 8 2008
/usr/lib/nls/loc/zh_CN -> /usr/lib/nls/loc/zh_CN.IBM-eucCN
The locales that are mapped to [ZH_CN.UTF-8, Zh_CN.GB18030, zh_CN.IBM-eucCN]
also appear in the locale list, so we have many duplicated locales. So, for
an immediate reduction in the number of tested locales, we could eliminate
these duplicates. How to tell if a locale is a duplicate? I'm not sure.
Test the result of setlocale(LC_ALL, name) for equality?
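To make the setlocale() idea concrete, here is a rough sketch. The function name unique_locales is made up for illustration. It assumes that the C library returns the same string from setlocale() for two names that refer to the same locale; the C standard does not promise this, so it would need checking on each platform.

```cpp
#include <clocale>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: keep only the first name seen for each distinct
// string returned by setlocale(). Assumes aliased names yield the same
// return value, which is implementation-defined.
std::vector<std::string>
unique_locales(const std::vector<std::string>& names)
{
    // remember the current locale so it can be restored afterwards
    const char* const cur = std::setlocale(LC_ALL, 0);
    const std::string saved(cur ? cur : "C");

    std::set<std::string> seen;
    std::vector<std::string> result;

    for (std::size_t i = 0; i != names.size(); ++i) {
        const char* const canon =
            std::setlocale(LC_ALL, names[i].c_str());

        if (!canon)
            continue;   // locale is not installed or not usable

        // insert() reports whether the canonical name is new
        if (seen.insert(canon).second)
            result.push_back(names[i]);
    }

    std::setlocale(LC_ALL, saved.c_str());
    return result;
}
```

If the names returned for the link and its target differ on some platform, this would not eliminate anything there, and some other property would have to be compared instead.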
Another option would be to ignore all locales that don't match the regular
expression "[a-z][a-z]_[A-Z][A-Z]([EMAIL PROTECTED])?$" or the fnmatch
expressions "[a-z][a-z]_[A-Z][A-Z]" and "[EMAIL PROTECTED]". The C/POSIX
locales don't match this, but we can explicitly allow them.
I suspect this kind of matching isn't going to be robust enough.
This alone cuts the number of locales down significantly, though it does
affect other platforms. Here is a small table showing the total number of
locales, and the number of locales that match the above regular expression.
          Okay  Total
AIX        226    603
Compaq      33     40
HP-UX      142    160
Irix        39     60
Linux      479    582
Solaris    223    331
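For reference, the name filter described above might look roughly like this. The function name is_candidate_locale is hypothetical, and the sketch only covers the base pattern "[a-z][a-z]_[A-Z][A-Z]" plus the explicit C/POSIX exception; the optional suffix part of the expression is left out here. It uses std::regex for brevity, which is an assumption about the available library and not something the test driver currently relies on.

```cpp
#include <regex>
#include <string>

// Hypothetical filter: accept "C", "POSIX", and names of the form
// ll_CC (two lowercase letters, underscore, two uppercase letters).
// The optional suffix from the full expression is intentionally omitted.
bool is_candidate_locale(const std::string& name)
{
    // the C/POSIX locales don't match the pattern, so allow them explicitly
    if (name == "C" || name == "POSIX")
        return true;

    static const std::regex base("[a-z][a-z]_[A-Z][A-Z]");

    // regex_match requires the whole name to match, not just a prefix
    return std::regex_match(name, base);
}
```

With this, zh_CN is kept while ZH_CN and Zh_CN are dropped, which is where the reduction on AIX comes from.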
Another option is to build up a list of all installed locales [their names
and other properties], and then provide a mechanism to search through, or
iterate over that list.
This would work.
If you want to run a test on all locales that have a
name matching some expression, you write a function or function object to
return true on match. You pass that to the rw_locales_match() routine, and
it gives you the first match. Call again to get the next match or null.
    for (const rw_locale_entry* e = rw_locales_match(0, fun);
         e; e = rw_locales_match(e, fun))
    {
        // run the test on the locale described by *e
    }
If you want to select only locales with mb_cur_max of 4, you either write a
filter, or you explicitly iterate over the list. If we really decide that it
is necessary to write up a SQL-like language for selecting locales, then
that system can be implemented on top of this.
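A minimal sketch of how the list and the matching routine could fit together follows. Only the names rw_locale_entry and rw_locales_match come from the proposal above; the struct members, the predicate typedef, the static table, and its contents are all made up for illustration. A real version would fill the list from the installed locales instead of a fixed array.

```cpp
#include <cstddef>

// Hypothetical layout; the real entry would carry whatever properties
// the tests need to filter on.
struct rw_locale_entry {
    const char* name;         // locale name (assumed member)
    int         mb_cur_max;   // MB_CUR_MAX in that locale (assumed member)
};

// Made-up sample data standing in for the list of installed locales.
static const rw_locale_entry rw_locale_table[] = {
    { "C",        1 },
    { "sample-a", 2 },
    { "sample-b", 4 },
    { "sample-c", 4 }
};

typedef bool (*rw_locale_pred)(const rw_locale_entry&);

// Return the first entry after prev (or the first entry overall when
// prev is null) for which pred returns true, or null if there is none.
const rw_locale_entry*
rw_locales_match(const rw_locale_entry* prev, rw_locale_pred pred)
{
    const std::size_t count =
        sizeof rw_locale_table / sizeof *rw_locale_table;
    const rw_locale_entry* const end = rw_locale_table + count;

    for (const rw_locale_entry* e = prev ? prev + 1 : rw_locale_table;
         e != end; ++e) {
        if (pred(*e))
            return e;
    }
    return 0;
}

// Example filter: select only locales with mb_cur_max of 4.
static bool has_mb_cur_max_4(const rw_locale_entry& e)
{
    return 4 == e.mb_cur_max;
}
```

Passing has_mb_cur_max_4 as fun in the loop above would visit only the entries with mb_cur_max of 4. A plain function pointer keeps the sketch short; a template parameter would allow function objects as well.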
I like it!
Martin