[forwarding back to the list]

Travis Vitek wrote:
> Martin,
>
> Not all supported platforms have the GB18030 encoding [HP, Compaq and
> IRIX don't], and of those that do, they use different names [gb18030
> vs GB18030]. Same with UTF-8 [utf8, UTF8 or UTF-8].
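Since the same codeset goes by several spellings (utf8, UTF8, UTF-8), one way to compare encoding names is to normalize them before matching. A minimal sketch of the idea; `normalize_codeset()` is a hypothetical helper for illustration, not code from the test driver:

```cpp
#include <cctype>    // isalnum(), toupper()
#include <string>

// Normalize a codeset name for comparison by uppercasing it and
// stripping non-alphanumeric characters, so that "utf8", "UTF8"
// and "UTF-8" all compare equal. (Hypothetical helper illustrating
// the idea discussed in this thread.)
std::string normalize_codeset (const std::string& name)
{
    std::string result;
    for (std::string::size_type i = 0; i != name.size (); ++i) {
        const unsigned char c = (unsigned char)name [i];
        if (std::isalnum (c))
            result += char (std::toupper (c));
    }
    return result;
}
```

With this, "en_US.utf8" and "en_US.UTF-8" reduce to the same codeset key, which sidesteps the per-platform spelling differences (though not Windows code page numbers).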
I realize that. That's the reason why I mentioned the (currently quite
inefficient) find_mb_locale() in 22.locale.codecvt.out.cpp: it looks
for any multibyte locale with MB_CUR_MAX of some value.

> On top of that, Windows uses code page numbers instead of encodings.
> Now I can easily convert names to uppercase and strip out
> non-alphanumeric characters. I might even be able to do something
> with Windows. That isn't the problem.
>
> The problem is that I don't seem to have a clear understanding of
> what exactly you want from this. The original proposal was to be able
> to filter locales by name or encoding, like so...
>
>   char* locales = rw_locales (_RWSTD_LC_ALL, "en_US de", 0, true);
>   // would retrieve C, en_US and all German locales
>
>   char* locales = rw_locales (_RWSTD_LC_ALL, 0, "UTF-8", true);
>   // would retrieve all UTF-8 locales [those that end in .utf8,
>   // .UTF-8 or .UTF8]
>
> So I wrote that. Unfortunately, as mentioned above, it has
> limitations.

Right. It wasn't a thought-out proposal. I was just brainstorming :)
We may not be able to use any of it to fix the hanging tests.

> What I really need is some actual requirements so that I can write
> some code and get this bug closed.

My only requirement is to get those tests to pass in a reasonable
amount of time (i.e., without timing out), and without compromising
their effectiveness.

> It seems that you want to guarantee that we test multibyte locales.

It seems important to exercise ctype::do_narrow() in this case but I
haven't looked at the code very carefully. It could be that the code
path in the multibyte case isn't any different from the single byte
case.

> Do we want to give up on the locale name matching, or do we want to
> include zh_CN in the list of locales to test? What about matching the
> encoding? Should we ignore all of this and just find one locale for
> each value of MB_CUR_MAX from 1 to MB_LEN_MAX and run the test on
> them?

Maybe.
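The last idea quoted above, finding one locale for each value of MB_CUR_MAX, could be sketched roughly like this. It assumes a POSIX system where `locale -a` lists the installed locales; the actual find_mb_locale() in the test suite enumerates locales differently, so treat this as an illustration only:

```cpp
#include <stdio.h>    // popen(), pclose() [POSIX]
#include <stdlib.h>   // MB_CUR_MAX
#include <clocale>    // std::setlocale()
#include <map>
#include <string>

// Keep the first installed locale found for each value of MB_CUR_MAX.
// Enumerating locales via `locale -a` is an assumption of this sketch.
std::map<int, std::string> find_locales_by_mb_cur_max ()
{
    std::map<int, std::string> found;

    FILE* const pipe = popen ("locale -a", "r");
    if (!pipe)
        return found;

    char line [256];
    while (fgets (line, sizeof line, pipe)) {
        std::string name (line);

        // strip the trailing newline, if any
        const std::string::size_type nl = name.find ('\n');
        if (nl != std::string::npos)
            name.erase (nl);

        // skip locales that cannot be set
        if (0 == std::setlocale (LC_ALL, name.c_str ()))
            continue;

        const int bytes = int (MB_CUR_MAX);
        if (found.find (bytes) == found.end ())
            found [bytes] = name;
    }

    pclose (pipe);
    std::setlocale (LC_ALL, "C");   // restore the default locale

    return found;
}
```

Running the tests over the handful of locales this returns (at most MB_LEN_MAX of them) rather than over every installed locale is what would keep the run time bounded.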
I'll let you propose what makes the most sense to you :)

Martin

> Travis
>
>> -----Original Message-----
>> From: Martin Sebor [mailto:[EMAIL PROTECTED] On Behalf Of Martin Sebor
>> Sent: Wednesday, January 02, 2008 1:41 PM
>> To: [email protected]
>> Subject: Re: low hanging fruit while cleaning up test failures
>>
>> Travis Vitek wrote:
>>>
>>> Martin Sebor wrote:
>>>> Travis Vitek wrote:
>>>>> Martin Sebor wrote:
>>>>>> Travis Vitek wrote:
>>>>>>> Martin Sebor wrote:
>>>>>>>> I added a new function, rw_fnmatch(), to the test driver. It
>>>>>>>> behaves just like the POSIX fnmatch() (the FNM_XXX constants
>>>>>>>> aren't implemented yet). While the main purpose behind the new
>>>>>>>> function is to support STDCXX-683 it should make it easier to
>>>>>>>> also implement a scheme like the one outlined below.
>>>>>>>>
>>>>>>>> Travis, feel free to experiment/prototype a solution :)
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>> What expression should be used to get an appropriate set of
>>>>>>> locales for a given platform? I can't really expect a filter
>>>>>>> for all UTF-8 locales to work on all platforms as some don't
>>>>>>> have those encodings available at all. If I filter by language,
>>>>>>> then I may be limiting the testing to some always correct
>>>>>>> subset. Is that acceptable for the MT tests?
>>>>>> I think the MT ctype tests just need to exercise a representative
>>>>>> sample of multi-byte encodings (i.e., MB_CUR_MAX between 1 and
>>>>>> MB_LEN_MAX). There already is some code in the test suite to find
>>>>>> locales that use these encodings, although it could be made more
>>>>>> efficient. I don't know how useful rw_fnmatch() will turn out to
>>>>>> be in finding these codesets since their names don't matter.
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>>> Travis
>>>>> Actually, I think I meant to say single threaded tests. Those are
>>>>> the ones that currently test every locale.
>>>>> The multi-threaded tests already test a subset of locales, though
>>>>> the method for selecting those locales may vary between tests.
>>>>>
>>>>> I don't think it is right to test a fixed set of locales based on
>>>>> language, country, or encoding. If you agree, then we probably
>>>>> agree that the proposed enhancement doesn't actually do anything
>>>>> useful [and I've wasted a bunch of time]. If this is the case,
>>>>> then we need to propose another solution for selecting locales.
>>>> I think testing a small subset of installed locales should be
>>>> enough. In fact, for white box testing of the ctype facets,
>>>> exercising three locales, "C" and two named ones, should be
>>>> sufficient.
>>>>
>>>>> If I am wrong, and it is useful for testing [and more
>>>>> specifically how it would be useful for fixing STDCXX-608], then
>>>>> I'd like to hear how.
>>>> What do you propose?
>>>>
>>>> Martin
>>>>
>>> Okay. I can live with that. Then the issue now becomes deciding
>>> which additional locales to test. How about just testing all
>>> Spanish and German locales?
>> I'd make sure at least one of them uses a multibyte encoding. Maybe
>> zh_CN.GB18030? (with MB_CUR_MAX of 4)?
>>
>> Martin
>>
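For reference, a name-based filter of the kind the rw_locales()/rw_fnmatch() discussion above envisions can be sketched directly on top of the POSIX fnmatch(); `matches_any()` is a hypothetical helper, and the real rw_fnmatch() in the test driver has its own interface:

```cpp
#include <fnmatch.h>   // POSIX fnmatch()

// Return true if the locale name matches any of the given shell-style
// patterns, e.g. { "*.utf8", "*.UTF-8", "*.UTF8" } for the common
// spellings of UTF-8. (Hypothetical helper; rw_fnmatch() differs.)
bool matches_any (const char* name, const char* const patterns [], int npat)
{
    for (int i = 0; i != npat; ++i)
        if (0 == fnmatch (patterns [i], name, 0))
            return true;
    return false;
}
```

As noted earlier in the thread, such a filter only helps where the platform actually spells its locale names this way, which is exactly the portability problem with name-based matching.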
