Hey, Wim

Amazingly complete recommendations! I think everyone who reads them will
agree with me.
Thank you.

Regards,
Dmitriy Igrishin

2010/2/24 Wim Dumon <[email protected]>

> Markus,
>
> Your interpretation of solution 2 is definitely not how it is intended
> to be used.
> On the other hand, your mail inspired me to investigate some of Wt's
> string conversion methods and a patch is on its way to the git
> repository.
>
> The one and only important point wrt internationalization and Wt is:
> once your data is correctly stored in (or converted to) a WString, Wt
> knows how it is encoded and what every character means. The string
> will then be rendered correctly in the browser. (internally, a WString
> stores data in UTF-8 format, but that could be anything) But how do
> you get your characters correctly into a WString? There's a number of
> ways.
>
> - WString::WString(std::wstring) or WString::WString(wchar_t *)
> For starters: wchar_t and std::wstring are not portable across
> compilers. wchar_t is 2 bytes on Windows and 4 bytes on Unix. 2 bytes
> are not enough to cover all characters, so on Windows a single
> character may occasionally still need more than one wchar_t.
> WString immediately converts a std::wstring or a wchar_t * into a
> UTF-8 string, using boost's conversion functions.
> You can use this WString constructor if you have a properly
> constructed std::wstring (e.g. from a database library, from reading
> a file, ...) and want to display it, but also with the L"blah"
> notation, if you're sure that the compiler assumes the right character
> encoding for your file. A correct wstring becomes a correct WString.
> It is not always possible to convert a WString to a std::wstring
> without losses, as wchar_t is compiler-dependent and may be as small
> as 8 bits. Use WString::widen() to convert a WString to a
> std::wstring.
>
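To make the 2-byte versus 4-byte point concrete, here is a minimal sketch of a wide-to-UTF-8 conversion that works with either wchar_t size. It is not Wt's or boost's implementation; the names appendUtf8 and wideToUtf8 are made up for illustration. With a 2-byte wchar_t, a character outside the basic plane arrives as a surrogate pair and has to be recombined first.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to 'out' as 1 to 4 UTF-8 bytes.
void appendUtf8(std::string& out, std::uint32_t cp)
{
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Hypothetical helper: convert a wide string to UTF-8, whatever the
// size of wchar_t is on this compiler.
std::string wideToUtf8(const std::wstring& in)
{
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    std::uint32_t cp = static_cast<std::uint32_t>(in[i]);

    // A high surrogate followed by a low surrogate encodes one
    // character above U+FFFF (the usual case with 2-byte wchar_t).
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
      const std::uint32_t low = static_cast<std::uint32_t>(in[i + 1]);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        ++i;
      }
    }

    // Anything still in the surrogate range was unpaired; emit the
    // replacement character instead of invalid UTF-8.
    if (cp >= 0xD800 && cp <= 0xDFFF)
      cp = 0xFFFD;

    appendUtf8(out, cp);
  }
  return out;
}
```

Writing the literal as L"Frankf\u00FCrt" instead of typing the umlaut directly also sidesteps the question of which encoding the compiler assumes for the source file.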
> - WString::WString(std::string, encoding) and WString::WString(char *,
> encoding)
> 'encoding' is either UTF-8 or LocalEncoding.
> (a) UTF-8
> Tells WString that the char * uses the UTF-8 (multibyte) encoding. The
> string is stored without modification. Perfectly safe to use.
> (b) LocalEncoding
> This is the most error-prone way to convert internationalized strings
> to a WString. It is best to only use these methods when your char* or
> string does not contain internationalized symbols. WString will
> convert the string parameter to UTF-8, after widening it. Current
> versions of Wt (i.e. 3.1.1) use an unspecified (and buggy) method. We
> should probably use the C++ global locale to do the conversion (I'll
> submit a patch for this soon). The global locale can be set by
> standard C++ methods, e.g. by calling
> std::locale::global(std::locale("")), which reads the locale from the
> environment. I'm not sure if this is a good idea, as you'll understand
> that it is not desirable to have WString("string with weird
> characters") interpreted differently depending on the environment in
> which it's executed... But if you don't set it, what is then the
> default locale for every compiler/OS combination? I leave it as an
> exercise to the reader.
> On Linux, this usually works because UTF-8 encoding is generally used.
>
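On the exercise for the reader: as far as the C++ standard is concerned, the global locale starts out as the classic "C" locale on every conforming compiler until the program changes it. A small sketch of checking and switching it follows; the helper names globalLocaleName and adoptEnvironmentLocale are hypothetical, and falling back to the classic locale is just one possible policy.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Name of the current global C++ locale. This is "C" at program
// start, before anyone calls std::locale::global().
std::string globalLocaleName()
{
  return std::locale().name();
}

// Switch the global locale to whatever the environment specifies.
// std::locale("") throws std::runtime_error if the environment names
// a locale that is not installed, so fall back to the classic locale.
// Returns true if the environment's locale was adopted.
bool adoptEnvironmentLocale()
{
  try {
    std::locale::global(std::locale(""));
    return true;
  } catch (const std::runtime_error&) {
    std::locale::global(std::locale::classic());
    return false;
  }
}
```

Calling adoptEnvironmentLocale() once at startup makes the behaviour depend on the environment, which is exactly the trade-off described above.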
> - WString::fromUTF8(std::string) and WString::fromUTF8(char *)
> Convenient shorthand for the previous constructor with UTF8 encoding
> parameter. Perfectly safe to use if your string is UTF-8 formatted.
>
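Since fromUTF8() is only safe when the bytes really are UTF-8, it can help to check input of unknown origin first. The following is a minimal validity check, written as a sketch; isValidUtf8 is a made-up name and not part of Wt.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true if 's' is well-formed UTF-8.
bool isValidUtf8(const std::string& s)
{
  // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
  static const std::uint32_t minValue[] = { 0x0, 0x80, 0x800, 0x10000 };

  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t extra;   // number of continuation bytes expected
    std::uint32_t cp;    // decoded code point

    if (lead < 0x80)                { extra = 0; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
    else return false;   // stray continuation byte or invalid lead byte

    if (extra > s.size() - i - 1)
      return false;      // sequence is cut off at the end of the string

    for (std::size_t k = 1; k <= extra; ++k) {
      const unsigned char c = static_cast<unsigned char>(s[i + k]);
      if ((c & 0xC0) != 0x80)
        return false;    // not a continuation byte
      cp = (cp << 6) | (c & 0x3F);
    }

    if (cp < minValue[extra]) return false;            // overlong form
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;    // surrogate
    if (cp > 0x10FFFF) return false;                   // out of range

    i += extra + 1;
  }
  return true;
}
```

A string such as "Frankf\xFCrt" (ISO 8859-1) fails this check, which is a hint that it needs converting before it goes anywhere near fromUTF8().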
> - WString::tr(const char *key) and WString::tr(std::string key)
> tr() is intended to facilitate development of internationalized
> applications (= 1 app in many languages, configurable at runtime, per
> session). It looks up the key in a map/database and replaces it with
> the translation of that key, according to the locale that was set by
> WApplication::setLocale(). Note that WApplication::setLocale() has
> nothing to do with string encodings! It specifies to what language the
> tr(key) should be translated.
> The default method to handle tr() is to look it up in a 'message
> bundle'. This is an XML file, which contains mappings of the keys to
> their translations. As an XML file specifies its encoding, there is no
> discussion about the meaning of a character (Wt supports UTF-8 and
> UTF-16 as XML encodings). If the key is not found, Wt will render the
> key with two question marks in front of and behind it. For example:
> ??button.ok?? (the .xml file should map button.ok to Jäwhöl or
> whatever).
> There is no reason to use non-ASCII characters in your keys; if you
> do, you're back in the
> what-encoding-for-my-C-file-does-my-compiler-assume game, which you
> want to avoid for portable C source files.
>
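The lookup behaviour described for tr() can be modelled in a few lines. This is only a toy model with a std::map standing in for the message bundle, not how Wt implements it; the function name translate is hypothetical. It shows the ASCII key, the UTF-8 translation, and the ??key?? fallback.

```cpp
#include <map>
#include <string>

// Toy model of a message bundle lookup: ASCII keys map to UTF-8
// translations. A missing key is rendered as ??key??.
std::string translate(const std::map<std::string, std::string>& bundle,
                      const std::string& key)
{
  const std::map<std::string, std::string>::const_iterator it =
      bundle.find(key);
  if (it != bundle.end())
    return it->second;
  return "??" + key + "??";
}
```

Because the keys stay pure ASCII, the source file compiles identically no matter which encoding the compiler assumes; only the bundle contents carry non-ASCII text.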
> So what are my recommendations wrt string literals with non-ASCII
> characters in C++ files...
> 1. Don't do it. Use Wt's tr() mechanism to avoid the encoding mess
> completely. Use a pure ASCII key within the tr() (note that I wrote
> tr("Frankfurt") without umlaut on the u, that was not a typo). Use
> WApplication::messageResourceBundle() and store all encoded strings in
> a properly formatted external XML file. XML files do specify their
> encoding, a C file does not.
> 2. If you can't resist, use L"íntèrnätïñonal Ç++ string". But then you
> have to ensure that the source file encoding assumed by your C
> compiler (could be UTF-8, ISO 8859-1, ...) corresponds to the actual
> encoding of the C file.
>
>
> To come back to your other questions:
> - What if you are reading strings from a file, a db, ...
> You must absolutely know the encoding of the string that you retrieve
> from the file, db, ... I strongly recommend saving your files encoded
> in UTF-8, configuring your database to return UTF-8, ... If not, use
> boost, iconv, or std C++ methods to convert the returned string to
> UTF-8, and WString::fromUTF8() when you use it in Wt. (Alternatively,
> convert it to std::wstring but remember the implications on
> portability).
> - If you're constructing WStrings from literal international strings,
> WString(char *, LocalEncoding) is no good for you (unless you
> correctly configured the C++ global locale). Try to avoid it, but if
> you can't resist, use WString(L"bläh")
>
> Regarding your last example:
> > std::wstring s = L"Frankfürt";
> > WString x = WString(s);// now the 'utf8_' member of WString seems to be
> > translated
> > std::string s1 = x.narrow(); // now the conversion gets lost...
> .narrow() is implemented like this:
> std::string narrow(const std::wstring& s)
> {
>  return std::string(s.begin(), s.end());
> }
> That is a bug and I fixed it. Widen was buggy in a similar way
> (especially on Windows, where char is signed); I fixed that too.
> Locales in C++ always scared me a bit, but I never expected that the
> end result would look so simple...
> for (std::wstring::const_iterator i = s.begin(); i != s.end(); ++i)
>    retval += std::use_facet<std::ctype<wchar_t> >(loc).narrow(*i, '?');
>
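Fleshing out the two-line loop above into complete functions might look roughly like this. It is a sketch based only on the fragment quoted here, not the code that went into the Wt repository; narrowWith and widenWith are hypothetical names. Note that the std::ctype facet maps one wchar_t to one char, so it cannot produce multi-byte output such as UTF-8.

```cpp
#include <locale>
#include <string>

// Narrow a wide string one character at a time using the ctype facet
// of 'loc'. Characters the locale cannot represent become '?'.
std::string narrowWith(const std::wstring& s,
                       const std::locale& loc = std::locale())
{
  const std::ctype<wchar_t>& facet =
      std::use_facet<std::ctype<wchar_t> >(loc);
  std::string result;
  result.reserve(s.size());
  for (std::wstring::const_iterator i = s.begin(); i != s.end(); ++i)
    result += facet.narrow(*i, '?');
  return result;
}

// The reverse direction: widen each char according to 'loc'. Going
// through the facet avoids the sign-extension problem of copying a
// signed char straight into a wchar_t.
std::wstring widenWith(const std::string& s,
                       const std::locale& loc = std::locale())
{
  const std::ctype<wchar_t>& facet =
      std::use_facet<std::ctype<wchar_t> >(loc);
  std::wstring result;
  result.reserve(s.size());
  for (std::string::const_iterator i = s.begin(); i != s.end(); ++i)
    result += facet.widen(*i);
  return result;
}
```

With the classic locale only ASCII survives the round trip, which is why the choice of locale passed in matters so much.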
> While fixing the bugs, I extended WString's interface:
> - Added WString(char*/string, std::locale)
> - Added locale parameter to Wt::narrow() and Wt::widen()
> So there's a new method to create a string from a char *, where you
> can specify your favorite std::locale which has to be used to
> interpret your chars. If you know what you're doing, this will also
> result in perfectly constructed WStrings.
>
> I hope this clarifies a bit.
>
> Best regards,
> Wim.
>
> 2010/2/23 Markus Quatember <[email protected]>:
> > Hi Wt-Community!
> >
> > I had the same problem as Jiongliang, and Wim sent me the two quick
> > fixes.
> >
> > First of all: The fixes work fine ;-)
> > But I didn't make friends with them so far...
> >
> > I will try to explain my problems:
> >
> >> Quick fix 1: try WString(L"Frankfürt") (and make sure your compiler
> > speaks your .cpp file's locale)
> > OK for constructing WString from literals, but what if you are reading
> > the strings from a variable (DB, file, ...)?
> > If I am constructing WString from literals, I must change every
> > call from WString("...") to WString(L"...") anyway :(
> >
> >> Quick fix 2: try WString(tr("Frankf?rt")) and put Frankf?rt in a
> > message bundle
> > For me this fix is better than the first because it is more
> > general.
> > Nevertheless I have to change all calls from WString(x) to
> > WString(tr(x))...
> > I derived from Wt::WLocalizedStrings, did the following
> > MS-Windows-specific conversion, and called
> > WApplication::setLocalizedStrings(...):
> >
> > class ConvertStrings : public Wt::WLocalizedStrings
> > {
> > protected:
> >  virtual bool resolveKey( const std::string& key, std::string& result )
> > override
> >  {
> >    if( key.empty() )
> >      return true;
> >
> >    std::vector< wchar_t > w;
> >    w.resize( key.size() * sizeof( wchar_t ) * 2 );
> >    MultiByteToWideChar( CP_ACP, 0, key.c_str(), -1, &w[ 0 ], w.size()
> > );
> >
> >    std::vector< char > s;
> >    s.resize( w.size() );
> >    WideCharToMultiByte( CP_UTF8, 0, &w[ 0 ], -1, &s[ 0 ], s.size(), 0,
> > 0 );
> >
> >    result = &s[ 0 ];
> >    return true;
> >  }
> > };
> >
> > But now the problems start, because I cannot tell whether the 'key' is
> > already translated or not!
> > So the following will get me into trouble:
> >
> > std::string s1 = "Frankfürt";
> > Wt::WString w1(tr( s1 )); // OK
> > std::string s2 = w1.narrow();
> > Wt::WString w2(tr( s2 )); // Bad, because s2 is already translated
> >
> > So my question is:
> > Are WString and its conversion operations in the constructors OK, or am
> > I missing something essential?
> > Think about the following:
> >
> > std::wstring s = L"Frankfürt";
> > WString x = WString(s);// now the 'utf8_' member of WString seems to be
> > translated
> > std::string s1 = x.narrow(); // now the conversion gets lost...
> >
> > best regards
> > Max
> >
> >
> >
> ------------------------------------------------------------------------------
> > Download Intel® Parallel Studio Eval
> > Try the new software tools for yourself. Speed compiling, find bugs
> > proactively, and fine-tune applications for parallel performance.
> > See why Intel Parallel Studio got high marks during beta.
> > http://p.sf.net/sfu/intel-sw-dev
> > _______________________________________________
> > witty-interest mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/witty-interest
> >