>
> I very strongly believe that the first option [assume that the string is
> utf8] is the right one as:
> 1) It's computationally cheaper
> 2) AMQP defines strings as UTF8, so it's not unreasonable to assume a
> std::string is UTF8 in a well-designed interoperable application (which is
> the sort of behaviour we should be encouraging :-))
> 3) If the encoding fails an exception can be thrown - quite rightly IMHO,
> as it really ought to be a UTF8 string. Question though: does it actually
> fail during the encoding process, or is the encoding just wrong and thus
> risks confusing JMS etc. clients? Even if the latter, I suspect the risk
> is modest - binary values in strings are the Devil's work :-)
> 4) "Unlike a java.lang.String, the c++ std::string does not imply textual
> data ". Actually IMHO std::string really does *imply* textual data. I think
> it's very poor practice to use std::string on binary data, use a char* a
> uint8t* or better yet a proper class to manipulate the actual type that is
> under consideration. I'd take a fairly dim view of my developers if they did
> that sort of thing without really good justification, that's the sort of
> thing that ends up making code unmaintainable in the long run (shall I get
> down off my high horse now :-))
>

In my opinion it is not so obvious, because as far as I know:
- AMQP allows UTF-8 or UTF-16 strings.
- Many C++ applications supporting Unicode store strings in std::wstring
with UCS-2 encoding. Having a fixed size of 2 bytes per code point allows
for simple and efficient string manipulation. If required, conversions
to/from UTF-8 are performed at interfaces to the outside world. (BTW I
think this is also the case for Java.)
- In C++ it is fairly common to use std::string as a container for binary
data. I would not say it is wrong to do that.
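
To illustrate the binary-container point: std::string tracks its length
separately from any terminator, so arbitrary bytes, including embedded NULs,
are perfectly legal in it (a sketch of the language rules, not an endorsement
of the practice either way):

```cpp
#include <string>

// A std::string holding three arbitrary bytes, including a NUL; the
// (pointer, length) constructor stops std::string from truncating
// the data at the first '\0'.
std::string make_blob() {
    return std::string("\x00\xff\x01", 3);
}
```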

I personally would say that in C++ there is no "default" character encoding.
Defaulting to UTF-8 makes some sense because all 7-bit ASCII strings are
valid UTF-8. But it may be dangerous to assume UTF-8 for all strings, and it
would probably be safer to somehow force C++ programs to explicitly specify
the encoding when reading and writing strings.
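
One way to force that explicitness (the type and function names below are
entirely hypothetical, not part of any Qpid API) is to have the client accept
small tagged wrapper types instead of a bare std::string, so the caller must
state the encoding at the call site:

```cpp
#include <cstddef>
#include <string>

// Hypothetical wrappers: the caller must say which encoding the payload
// is in, so an unlabelled std::string can never slip through.
struct Utf8String { std::string bytes; };
struct Ucs2String { std::u16string units; };

// Hypothetical client entry points; overloading on the wrapper type
// replaces any guess about the encoding of a plain std::string.
std::size_t encodedSize(const Utf8String& s) { return s.bytes.size(); }
std::size_t encodedSize(const Ucs2String& s) { return s.units.size() * 2; }
```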

In Java, the default encoding is apparently UTF-8, but the Java client
should still be able to accept strings encoded in UTF-16.

I think that the Qpid client libraries should support implicit conversions
between UTF-8 and UTF-16/UCS-2. I believe it is acceptable to support only
the UCS-2 character set (Unicode's Basic Multilingual Plane) in the C++
client.
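
As a sketch of what such a conversion layer could look like (the function
names are mine; std::wstring_convert arrived with C++11 and was deprecated in
C++17, so a production client might well use ICU or its own tables instead):

```cpp
#include <codecvt>
#include <locale>
#include <string>

// BMP-only conversions: codecvt_utf8<char16_t> maps each char16_t code
// unit directly to/from UTF-8, i.e. it is UCS-2, with no surrogate-pair
// handling - which matches restricting the C++ client to the BMP.
std::string ucs2_to_utf8(const std::u16string& in) {
    std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> conv;
    return conv.to_bytes(in);
}

std::u16string utf8_to_ucs2(const std::string& in) {
    std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> conv;
    return conv.from_bytes(in);
}
```

Note that std::wstring_convert throws std::range_error on ill-formed input,
which lines up with the "throw an exception if the encoding fails" behaviour
discussed above.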
