[forwarding back to the list]

Travis Vitek wrote:
> Martin,
>
> Not all supported platforms have the GB18030 encoding [HP, Compaq and
> IRIX don't], and of those that do, they use different names [gb18030
> vs GB18030]. Same with UTF-8 [utf8, UTF8 or UTF-8].
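Since the same codeset goes by several spellings (utf8, UTF8, UTF-8), one way to compare encoding names is to normalize them before matching. A minimal sketch of the idea; `normalize_codeset()` is a hypothetical helper for illustration, not code from the test driver:

```cpp
#include <cctype>    // isalnum(), toupper()
#include <string>

// Normalize a codeset name for comparison by uppercasing it and
// stripping non-alphanumeric characters, so that "utf8", "UTF8"
// and "UTF-8" all compare equal. (Hypothetical helper illustrating
// the idea discussed in this thread.)
std::string normalize_codeset (const std::string& name)
{
    std::string result;
    for (std::string::size_type i = 0; i != name.size (); ++i) {
        const unsigned char c = (unsigned char)name [i];
        if (std::isalnum (c))
            result += char (std::toupper (c));
    }
    return result;
}
```

With this, "en_US.utf8" and "en_US.UTF-8" reduce to the same codeset key, which sidesteps the per-platform spelling differences (though not Windows code page numbers).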
I realize that. That's the reason why I mentioned the (currently quite
inefficient) find_mb_locale() in 22.locale.codecvt.out.cpp: it looks
for any multibyte locale with MB_CUR_MAX of some value.

> On top of that, Windows uses code page numbers instead of encodings.
> Now I can easily convert names to uppercase and strip out
> non-alphanumeric characters. I might even be able to do something
> with Windows. That isn't the problem.
>
> The problem is that I don't seem to have a clear understanding of
> what exactly you want from this. The original proposal was to be able
> to filter locales by name or encoding, like so...
>
>   char* locales = rw_locales (_RWSTD_LC_ALL, "en_US de", 0, true);
>   // would retrieve C, en_US and all German locales
>
>   char* locales = rw_locales (_RWSTD_LC_ALL, 0, "UTF-8", true);
>   // would retrieve all UTF-8 locales [those that end in .utf8,
>   // .UTF-8 or .UTF8]
>
> So I wrote that. Unfortunately, as mentioned above, it has
> limitations.

Right. It wasn't a thought-out proposal. I was just brainstorming :)
We may not be able to use any of it to fix the hanging tests.

> What I really need is some actual requirements so that I can write
> some code and get this bug closed.

My only requirement is to get those tests to pass in a reasonable
amount of time (i.e., without timing out), and without compromising
their effectiveness.

> It seems that you want to guarantee that we test multibyte locales.

It seems important to exercise ctype::do_narrow() in this case but I
haven't looked at the code very carefully. It could be that the code
path in the multibyte case isn't any different from the single byte
case.

> Do we want to give up on the locale name matching, or do we want to
> include zh_CN in the list of locales to test? What about matching the
> encoding? Should we ignore all of this and just find one locale for
> each value of MB_CUR_MAX from 1 to MB_LEN_MAX and run the test on
> them?

Maybe.
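The last idea quoted above, finding one locale for each value of MB_CUR_MAX, could be sketched roughly like this. It assumes a POSIX system where `locale -a` lists the installed locales; the actual find_mb_locale() in the test suite enumerates locales differently, so treat this as an illustration only:

```cpp
#include <stdio.h>    // popen(), pclose() [POSIX]
#include <stdlib.h>   // MB_CUR_MAX
#include <clocale>    // std::setlocale()
#include <map>
#include <string>

// Keep the first installed locale found for each value of MB_CUR_MAX.
// Enumerating locales via `locale -a` is an assumption of this sketch.
std::map<int, std::string> find_locales_by_mb_cur_max ()
{
    std::map<int, std::string> found;

    FILE* const pipe = popen ("locale -a", "r");
    if (!pipe)
        return found;

    char line [256];
    while (fgets (line, sizeof line, pipe)) {
        std::string name (line);

        // strip the trailing newline, if any
        const std::string::size_type nl = name.find ('\n');
        if (nl != std::string::npos)
            name.erase (nl);

        // skip locales that cannot be set
        if (0 == std::setlocale (LC_ALL, name.c_str ()))
            continue;

        const int bytes = int (MB_CUR_MAX);
        if (found.find (bytes) == found.end ())
            found [bytes] = name;
    }

    pclose (pipe);
    std::setlocale (LC_ALL, "C");   // restore the default locale

    return found;
}
```

Running the tests over the handful of locales this returns (at most MB_LEN_MAX of them) rather than over every installed locale is what would keep the run time bounded.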
I'll let you propose what makes the most sense to you :)

Martin

> Travis
>
>> -----Original Message-----
>> From: Martin Sebor [mailto:[EMAIL PROTECTED] On Behalf Of Martin Sebor
>> Sent: Wednesday, January 02, 2008 1:41 PM
>> To: [email protected]
>> Subject: Re: low hanging fruit while cleaning up test failures
>>
>> Travis Vitek wrote:
>>>
>>> Martin Sebor wrote:
>>>> Travis Vitek wrote:
>>>>> Martin Sebor wrote:
>>>>>> Travis Vitek wrote:
>>>>>>> Martin Sebor wrote:
>>>>>>>> I added a new function, rw_fnmatch(), to the test driver. It
>>>>>>>> behaves just like the POSIX fnmatch() (the FNM_XXX constants
>>>>>>>> aren't implemented yet). While the main purpose behind the new
>>>>>>>> function is to support STDCXX-683 it should make it easier to
>>>>>>>> also implement a scheme like the one outlined below.
>>>>>>>>
>>>>>>>> Travis, feel free to experiment/prototype a solution :)
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>> What expression should be used to get an appropriate set of
>>>>>>> locales for a given platform? I can't really expect a filter
>>>>>>> for all UTF-8 locales to work on all platforms as some don't
>>>>>>> have those encodings available at all. If I filter by language,
>>>>>>> then I may be limiting the testing to some always correct
>>>>>>> subset. Is that acceptable for the MT tests?
>>>>>> I think the MT ctype tests just need to exercise a representative
>>>>>> sample of multi-byte encodings (i.e., MB_CUR_MAX between 1 and
>>>>>> MB_LEN_MAX). There already is some code in the test suite to find
>>>>>> locales that use these encodings, although it could be made more
>>>>>> efficient. I don't know how useful rw_fnmatch() will turn out to
>>>>>> be in finding these codesets since their names don't matter.
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>>> Travis
>>>>> Actually, I think I meant to say single threaded tests. Those are
>>>>> the ones that currently test every locale.
>>>>> The multi-threaded tests already test a subset of locales, though
>>>>> the method for selecting those locales may vary between tests.
>>>>>
>>>>> I don't think it is right to test a fixed set of locales based on
>>>>> language, country, or encoding. If you agree, then we probably
>>>>> agree that the proposed enhancement doesn't actually do anything
>>>>> useful [and I've wasted a bunch of time]. If this is the case,
>>>>> then we need to propose another solution for selecting locales.
>>>> I think testing a small subset of installed locales should be
>>>> enough. In fact, for white box testing of the ctype facets,
>>>> exercising three locales, "C" and two named ones, should be
>>>> sufficient.
>>>>
>>>>> If I am wrong, and it is useful for testing [and more
>>>>> specifically how it would be useful for fixing STDCXX-608], then
>>>>> I'd like to hear how.
>>>> What do you propose?
>>>>
>>>> Martin
>>>>
>>> Okay. I can live with that. Then the issue now becomes deciding
>>> which additional locales to test. How about just testing all
>>> Spanish and German locales?
>> I'd make sure at least one of them uses a multibyte encoding. Maybe
>> zh_CN.GB18030? (with MB_CUR_MAX of 4)?
>>
>> Martin
>>
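For reference, a name-based filter of the kind the rw_locales()/rw_fnmatch() discussion above envisions can be sketched directly on top of the POSIX fnmatch(); `matches_any()` is a hypothetical helper, and the real rw_fnmatch() in the test driver has its own interface:

```cpp
#include <fnmatch.h>   // POSIX fnmatch()

// Return true if the locale name matches any of the given shell-style
// patterns, e.g. { "*.utf8", "*.UTF-8", "*.UTF8" } for the common
// spellings of UTF-8. (Hypothetical helper; rw_fnmatch() differs.)
bool matches_any (const char* name, const char* const patterns [], int npat)
{
    for (int i = 0; i != npat; ++i)
        if (0 == fnmatch (patterns [i], name, 0))
            return true;
    return false;
}
```

As noted earlier in the thread, such a filter only helps where the platform actually spells its locale names this way, which is exactly the portability problem with name-based matching.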
