RE: expectation vs requirements for locale facets

Travis Vitek Mon, 20 Aug 2007 03:32:08 -0700

>Martin Sebor wrote:
>
>
>Yes. But notice the text doesn't say anything about time_put_byname or
>time_get_byname ;-)
>


Well, the standard doesn't say much at all about the *_byname<>
facets. All it really says about them is

  [22.1.1.1.2 p4] For some standard facets a standard "..._byname" class,
  derived from it, implements the virtual function semantics equivalent
  to the facet of the  locale constructed by  locale(const char*)  with
  the same name.  Each such  facet provides  a constructor that takes a
  const char*  argument which  names the  locale,  and a refs argument,
  which is passed to the base class constructor. [...]

So, if I'm reading that right, the *_byname<> facet classes are just
there to prevent the user from having to instantiate a std::locale
directly.
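
To make that concrete, here is a minimal sketch (my own illustration, not
taken from the standard or from our sources) of the two routes to the same
named facet. It assumes the "C" locale name, since that one always exists;
format_date, named_locale and byname_locale are hypothetical helpers.

```cpp
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: format the date in `t` with the time_put<char>
// facet installed in `loc`, using the 'x' (locale date) specifier.
std::string format_date (const std::locale& loc, const std::tm& t)
{
    std::ostringstream os;
    os.imbue (loc);

    const std::time_put<char>& tp =
        std::use_facet<std::time_put<char> >(loc);

    tp.put (std::ostreambuf_iterator<char>(os), os, ' ', &t, 'x');
    return os.str ();
}

// Route 1: let std::locale build every facet for the named locale.
std::locale named_locale (const char* name)
{
    return std::locale (name);
}

// Route 2: construct only the *_byname facet and install it in a copy
// of the classic locale. refs defaults to 0, so the locale owns the
// facet and deletes it when the last reference goes away.
std::locale byname_locale (const char* name)
{
    return std::locale (std::locale::classic (),
                        new std::time_put_byname<char>(name));
}
```

For the name "C", format_date (named_locale ("C"), t) and
format_date (byname_locale ("C"), t) should produce the same string.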

> The C++ standard (or even the C standard for that
> matter) isn't going to be of help here.

Wait. Say what now? I'm not sure what you're trying to tell me here.
If the C++ Standard says that these facets read or write years as
roman numerals, then they should probably do so, regardless of what
any other standard document requires. I think this will actually get
cleared up in a few seconds...

>> Of
>> course that isn't what I'm seeing.
>
>Test case?

Yeah. See attachment. Only tested on Win32/VC8 and Linux/GCC.

>
>It's hard to say from just looking at the code (and I haven't looked
>very carefully). In general, we [try to] implement the POSIX
>semantics, so if it works with strptime()/strftime() it should work
>with our time_put_byname/ time_get_byname.
>

Well, there's the problem right there. The standard requires that the
time_put<> facet format its output according to the POSIX function
strftime(), with the option of supporting extensions. It gives no
indication that the time_get<> facet should read data in such a way as
to be compatible with strptime(). The only thing I see that says
anything about the format expected by time_get<> is here...

  [22.2.5.1 p1]  Each  get  member parses  a  format  as  produced by a
  corresponding format specifier to  time_put<>::put.  If  the sequence
  being parsed matches the correct format, the corresponding members of
  the  struct tm  argument are  set  to  the values used to produce the
  sequence; otherwise either an error is reported or unspecified values
  are assigned. note.232)

  232) In other words, user confirmation is required for reliable
  parsing of user-entered dates and times, but *machine-generated
  formats can be parsed reliably.* This allows parsers to be
  aggressive about interpreting user variations on standard formats.
  [emphasis added]

This paragraph says that time_get<>::get_date() is supposed to process
the output of time_put<>::put(..., 'x').

  [22.2.5.1.2 p4] Effects: Reads characters starting at  s until it has
  extracted  those  struct tm members, and remaining format characters,
  used by  time_put<>::put  to produce  the  format specified by 'x' or
  until it encounters an error.
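
As a rough sketch of what that round trip looks like when the parsed
values are checked rather than just the stream state: date_round_trips
below is a hypothetical helper, and it compares only tm_mon and tm_mday
because I am assuming that the mapping of a two-digit year back to
tm_year may differ between implementations.

```cpp
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>

// Hypothetical helper: write `src` with put(..., 'x'), read it back
// with get_date(), and report whether day and month survived the trip.
bool date_round_trips (const std::locale& loc, const std::tm& src)
{
    std::stringstream ss;
    ss.imbue (loc);

    const std::time_put<char>& tp =
        std::use_facet<std::time_put<char> >(loc);
    const std::time_get<char>& tg =
        std::use_facet<std::time_get<char> >(loc);

    tp.put (std::ostreambuf_iterator<char>(ss), ss, ' ', &src, 'x');

    std::tm dst = std::tm ();
    std::ios_base::iostate state = std::ios_base::goodbit;

    tg.get_date (std::istreambuf_iterator<char>(ss),
                 std::istreambuf_iterator<char>(), ss, state, &dst);

    if (state & std::ios_base::failbit)
        return false;

    // The year is left out on purpose (see the assumption above).
    return dst.tm_mon == src.tm_mon && dst.tm_mday == src.tm_mday;
}
```

Calling date_round_trips (std::locale::classic (), t) with a fully
initialized std::tm is the simplest case; the named locales are where
it gets interesting.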

>
>If we test this behavior it's gotta be right ;-) Where does POSIX
>say leading spaces must be skipped? I see this under %e: Equivalent
>to %d. And under %d: The day of the month [01,31]; leading zeros
>are permitted but not required. Nothing about ignoring spaces.
>

Absolutely. The docs for POSIX strftime()...

  %d Replaced by the day of the month as a decimal number [01,31].
     [tm_mday]
  %e Replaced by the day of the month as a decimal number [1,31]; a
     single digit is preceded by a space. [tm_mday]

Here is the problem. The docs for POSIX strptime()...

  %d The day of the month [01,31]; leading zeros are permitted but not
     required.
  %e Equivalent to %d.

So strftime() isn't even compatible with strptime() when it comes to '%e'.
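
The output side of that mismatch is easy to see with plain strftime().
This is only a sketch; format_with and day_of_month are hypothetical
helpers, and it assumes a C library that supports %e.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper: format `t` with a single strftime() format.
std::string format_with (const char* fmt, const std::tm& t)
{
    char buf [64];
    const std::size_t n = std::strftime (buf, sizeof buf, fmt, &t);
    return std::string (buf, n);
}

// A tm holding only a day of the month, which is all that %d and %e
// look at.
std::tm day_of_month (int mday)
{
    std::tm t = std::tm ();
    t.tm_mday = mday;
    return t;
}
```

For the seventh of the month, format_with ("%d", day_of_month (7))
gives "07" while format_with ("%e", day_of_month (7)) gives " 7", and
it is that leading space that the strptime() description of %e says
nothing about.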

>
>Without too much research, my first take on this is that it will
>probably fall under the "not every output format can be parsed"
>category. But we need to do some more reading to confirm this
>hypothesis.
>

Unfortunately, without consistent input/output it is going to be
difficult for this multi-threading test to verify that no data
corruption is occurring with arbitrary locales. Hopefully there is some
system in place that allows us to explicitly specify which locales are
to be used for a test.
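
I don't know what the test driver offers here, so purely as a sketch of
the idea: usable_locales is a hypothetical helper that takes an explicit
list of names (from the command line, say) and keeps only the ones that
std::locale can actually construct on the machine running the test.

```cpp
#include <cstddef>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: keep only the requested locale names that can
// be constructed here, preserving their order.
std::vector<std::string>
usable_locales (const std::vector<std::string>& requested)
{
    std::vector<std::string> usable;

    for (std::size_t i = 0; i != requested.size (); ++i) {
        try {
            const std::locale loc (requested [i].c_str ());
            usable.push_back (requested [i]);
        }
        catch (const std::runtime_error&) {
            // std::locale throws runtime_error for a name it cannot
            // resolve; skip that name and carry on.
        }
    }

    return usable;
}
```

A test could then iterate over the returned names only, which keeps the
set of locales explicit and avoids depending on whatever happens to be
installed.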

>Martin
>

#include <ctime>       // std::tm
#include <locale>
#include <iterator>
#include <iostream>
#include <exception>
#include <sstream>

#ifdef _WIN32

const char* locales[] = {
    "Afrikaans", "Albanian", "Basque", "Belarusian", "Bulgarian",
    "Catalan", "Croatian", "Czech", "Danish", "Dutch_Belgium",
    "Estonian", "Finnish", "Galician", "Greek", "Hungarian",
    "Icelandic", "Indonesian", "Latvian", "Lithuanian", "Norwegian",
    "Polish", "Romanian", "Russian", "Slovak", "Slovenian", "Swahili",
    "Swedish", "Tatar", "Turkish", "Ukrainian",
    "Dutch_Netherlands", "English_Australia", "English_Belize",
    "English_Canada", "English_Caribbean", "English_Ireland",
    "English_Jamaica", "English_Zimbabwe", "French_Belgium",
    "French_Canada", "French_France", "French_Switzerland",
    "German_Austria", "German_Germany", "German_Liechtenstein",
    "German_Luxembourg", "German_Switzerland", "Italian_Italy",
    "Italian_Switzerland", "Malay_Malaysia", "Portuguese_Brazil",
    "Portuguese_Portugal", "Spanish_Bolivia", "Spanish_Chile",
    "Spanish_Colombia", "Spanish_Ecuador", "Spanish_Guatemala",
    "Spanish_Honduras", "Spanish_Mexico", "Spanish_Nicaragua",
    "Spanish_Panama", "Spanish_Paraguay", "Spanish_Peru",
    "Spanish_Uruguay", "Spanish_Venezuela", "Swedish_Finland" };

#else

const char* locales[] = {
    "bokmal", "catalan", "croatian", "czech", "danish", "dansk",
    "deutsch", "dutch", "estonian", "finnish", "french", "galego",
    "galician", "german", "greek", "hebrew", "hrvatski", "hungarian",
    "icelandic", "italian", "japanese", "korean", "lithuanian",
    "norwegian", "nynorsk", "polish", "portuguese", "romanian",
    "russian", "slovak", "slovene", "slovenian", "spanish", "swedish",
    "thai", "turkish",
    "aa_DJ", "aa_ER", "aa_ET", "af_ZA", "am_ET", "an_ES", "ar_AE", "ar_BH",
    "ar_DZ", "ar_EG", "ar_IN", "ar_IQ", "ar_JO", "ar_KW", "ar_LB", "ar_LY",
    "ar_MA", "ar_OM", "ar_QA", "ar_SA", "ar_SD", "ar_SY", "ar_TN", "ar_YE",
    "be_BY", "bg_BG", "bn_BD", "bn_IN", "br_FR", "bs_BA", "ca_ES", "cs_CZ",
    "cy_GB", "da_DK", "de_AT", "de_BE", "de_CH", "de_DE", "de_LU", "el_GR",
    "en_AU", "en_BW", "en_CA", "en_DK", "en_GB", "en_HK", "en_IE", "en_IN",
    "en_NZ", "en_PH", "en_SG", "en_US", "en_ZA", "en_ZW", "es_AR", "es_BO",
    "es_CL", "es_CO", "es_CR", "es_DO", "es_EC", "es_ES", "es_GT", "es_HN",
    "es_MX", "es_NI", "es_PA", "es_PE", "es_PR", "es_PY", "es_SV", "es_US",
    "es_UY", "es_VE", "et_EE", "eu_ES", "fa_IR", "fi_FI", "fo_FO", "fr_BE",
    "fr_CA", "fr_CH", "fr_FR", "fr_LU", "ga_IE", "gd_GB", "gl_ES", "gu_IN",
    "gv_GB", "he_IL", "hi_IN", "hr_HR", "hu_HU", "id_ID", "is_IS", "it_CH",
    "it_IT", "iw_IL", "ja_JP", "ka_GE", "kk_KZ", "kl_GL", "kn_IN", "ko_KR",
    "kw_GB", "lg_UG", "lo_LA", "lt_LT", "lv_LV", "mi_NZ", "mk_MK", "ml_IN",
    "mn_MN", "mr_IN", "ms_MY", "mt_MT", "nb_NO", "ne_NP", "nl_BE", "nl_NL",
    "nn_NO", "no_NO", "oc_FR", "om_ET", "om_KE", "pa_IN", "pl_PL", "pt_BR",
    "pt_PT", "ro_RO", "ru_RU", "ru_UA", "se_NO", "sk_SK", "sl_SI", "so_DJ",
    "so_ET", "so_KE", "so_SO", "sq_AL", "sr_CS", "st_ZA", "sv_FI", "sv_SE",
    "ta_IN", "te_IN", "tg_TJ", "th_TH", "ti_ER", "ti_ET", "tl_PH", "tr_TR",
    "uk_UA", "ur_PK", "uz_UZ", "vi_VN", "wa_BE", "xh_ZA", "yi_US", "zh_CN",
    "zh_HK", "zh_SG", "zh_TW", "zu_ZA" };

#endif

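int main(int argc, char* [])
{
    // Value-initialize both structs so that no member of std::tm
    // (including implementation-specific ones) is left indeterminate.
    std::tm src = std::tm (), dst = std::tm ();
    src.tm_isdst = 0;
    src.tm_sec   = 1;
    src.tm_min   = 2;
    src.tm_hour  = 3;
    src.tm_wday  = 4;
    src.tm_mon   = 5;
    src.tm_mday  = 7;
    src.tm_yday  = 7;
    src.tm_year  = 8;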
    unsigned n;
    for (n = 0; n < sizeof locales / sizeof *locales; ++n) {
        try {
            const std::locale loc (locales [n]);

            std::stringstream ss;
            ss.imbue (loc);

            const std::time_put<char>& tp =
                std::use_facet<std::time_put<char> >(loc);


            const std::time_get<char>& tg =
                std::use_facet<std::time_get<char> >(loc);

            std::ios_base::iostate state = std::ios_base::goodbit;

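            // No arguments: round-trip the date ('x' / get_date).
            // Any argument: round-trip the time ('X' / get_time).
            if (argc < 2) {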
                tp.put (std::ostreambuf_iterator<char>(ss),
                        ss, ' ', &src, 'x');

                tg.get_date (std::istreambuf_iterator<char>(ss),
                             std::istreambuf_iterator<char>(), ss,
                             state, &dst);
            }
            else {
                tp.put (std::ostreambuf_iterator<char>(ss),
                        ss, ' ', &src, 'X');

                tg.get_time (std::istreambuf_iterator<char>(ss),
                             std::istreambuf_iterator<char>(), ss,
                             state, &dst);
            }

            const bool failed = (state & std::ios_base::failbit) != 0;

            std::cout << "string="   << ss.str ().c_str ()
                      << "\tresult=" << (failed ? "fail" : "good")
                      << "\tlocale=" << locales [n] << std::endl;
                      
        }
        catch(const std::exception& ex) {
            std::cout << ex.what () << std::endl;
        }
    }

    return 0;
}
