AMQP defines wire encodings for both binary and string types. The
std::string type in c++ can (depending on its contents) be mapped to
either of these categories.
This is a fun thread :-)
So I did get that AMQP has encodings for both binary and string types. My
point was really that the AMQP string type is defined as UTF8, and also
that if I were to see a std::string in code I would assume that it was an,
erm, string :-)
Similar to my comments to Rajith, there's a question of "intuitive"
behaviour here, so Re your options mentioned previously:
"
(i) assume (from the context) that the string is utf8
(ii) check whether the string contains any chars that are not legal for
utf8, and treat it as utf8 if not
The issue with the first option is that if the string is not valid utf8,
then encoding of the message will fail. The issue with the second option
is that it imposes checking every time such a property is set. Both of
these can of course be avoided by setting the encoding explicitly. "
I very strongly believe that the first option is the right one as:
1) It's computationally cheaper
2) AMQP defines strings as UTF8, so it's not actually unreasonable to
assume that a std::string is UTF8 in a well designed, interoperable
application (which is the sort of behaviour we should be
encouraging :-))
3) If the encoding fails an exception can be thrown - and as it really
ought to be a UTF8 string, quite rightly IMHO. Question though: does it
actually fail during the encoding process, or is the encoding just wrong,
which risks confusing JMS etc. clients? Even if the latter I suspect
that the risk is modest - binary values in strings are the Devil's work :-)
4) "Unlike a java.lang.String, the c++ std::string does not imply
textual data ". Actually IMHO std::string really does *imply* textual
data. I think it's very poor practice to use std::string on binary data;
use a char*, a uint8_t* or better yet a proper class to manipulate the
actual type that is under consideration. I'd take a fairly dim view of
my developers if they did that sort of thing without really good
justification, as that's the sort of thing that ends up making code
unmaintainable in the long run (shall I get down off my high horse now? :-))
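To make the cost argument in 1) and the failure mode in 3) a bit more
concrete, here's a rough sketch of what option (ii) would have to do on
every set, and what "throw if it isn't UTF8" might look like for option (i).
Both isValidUtf8() and encodeAsUtf8OrThrow() are names I've made up for
illustration; neither is part of qpid::messaging, and I'm not claiming this
is how the encoder actually behaves.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a qpid::messaging function): returns true if s is
// well-formed UTF-8. Note that it has to walk every byte of the string, which
// is the per-set cost that option (ii) would impose.
bool isValidUtf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }  // plain ASCII byte

        std::size_t len = 0;
        unsigned long cp = 0;
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte

        if (i + len > n) return false;  // sequence truncated at end of string
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }

        // Reject overlong forms, UTF-16 surrogates and values above U+10FFFF.
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000)) return false;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += len;
    }
    return true;
}

// Hypothetical stand-in for option (i): assume UTF8 when the property is set
// and only complain at encode time if that assumption turns out to be wrong.
void encodeAsUtf8OrThrow(const std::string& s) {
    if (!isValidUtf8(s)) {
        throw std::invalid_argument("property value is not valid UTF-8");
    }
    // ... the real wire encoding would happen here ...
}
```

With something like this the check happens once, at encode time, rather
than on every setProperty() call, and a binary blob masquerading as a
string gets an exception instead of quietly going out mislabelled.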
I'm pretty content with some of your other comments Re:
"
message.setUtf8Property("key", "value");
It would be clear that it is the applications responsibility to ensure
that the data is indeed utf8.
A similar (and I think nicer) solution would be e.g
message.setProperty("key", utf8("value"));
This is perhaps slightly less obvious than an explicitly named method
but that could be addressed with some clear reference documentation for
the method (needed anyway if the generic setProperty() remains unchanged).
"
Though I wonder if a better direction would be the opposite:
message.setProperty("key", binary("value"));
In other words having an assumption of UTF8 as the default, but giving
an option to explicitly break things - oops I mean use a binary value
:-D (you can see where I stand on this can't you!!).
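A minimal sketch of how that "UTF8 by default, binary by explicit request"
idea could be expressed with a plain overload. To be clear, Message,
Binary, binary(), Encoding and encodingOf() below are all hypothetical
names for illustration, not the existing qpid::messaging classes.

```cpp
#include <map>
#include <string>
#include <utility>

// Everything in this block is a hypothetical illustration, not the real API.
enum class Encoding { Utf8, Binary };

// Thin wrapper whose only job is to mark a value as binary at the call site.
struct Binary {
    std::string bytes;
};

inline Binary binary(std::string bytes) { return Binary{std::move(bytes)}; }

class Message {
public:
    // Default case: a bare std::string (or string literal) is taken to be UTF8.
    void setProperty(const std::string& key, const std::string& value) {
        props_[key] = Property{Encoding::Utf8, value};
    }

    // Explicit opt-out: the caller has to say binary(...) to get binary.
    void setProperty(const std::string& key, const Binary& value) {
        props_[key] = Property{Encoding::Binary, value.bytes};
    }

    // Reports which encoding was recorded; throws std::out_of_range if the
    // key was never set.
    Encoding encodingOf(const std::string& key) const {
        return props_.at(key).encoding;
    }

private:
    struct Property {
        Encoding encoding;
        std::string data;
    };
    std::map<std::string, Property> props_;
};
```

So message.setProperty("key", "value") would pick the UTF8 overload, and
only message.setProperty("key", binary("value")) would record a binary
value, which keeps the common case short and makes the unusual one visible
in the code.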
Re "The fact that this matches what the old API does is a fair sign that
it will not cause great confusion!" I think that's a pretty good selling
point. I'm happy not to break the qpid::messaging API, but I do actually
quite like the idea of extending it with the "JMS like" approach of
having a "generic" accessor/mutator plus additional accessor/mutators
that are more type specific. I personally think that it makes the
"intent" a lot clearer. If I saw setString() I'd expect a string and if
someone put something else in it then they're an idiot and deserve what
they get :-D
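For what it's worth, this is roughly the shape I have in mind for the
"generic plus type specific" approach. It's only a sketch: TypedMessage,
PropertyType, setString(), setBinary(), getString() and getBinary() are
hypothetical names, and I'm not suggesting this is how the existing API is
or should be implemented internally.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical illustration only; none of these names exist in qpid::messaging.
enum class PropertyType { String, Binary };

class TypedMessage {
public:
    // Generic mutator: keeps the existing style of call, defaulting to string.
    void setProperty(const std::string& key, const std::string& value,
                     PropertyType type = PropertyType::String) {
        props_[key] = Entry{type, value};
    }

    // Type specific mutators: the method name states the intent.
    void setString(const std::string& key, const std::string& utf8Text) {
        setProperty(key, utf8Text, PropertyType::String);
    }
    void setBinary(const std::string& key, const std::string& bytes) {
        setProperty(key, bytes, PropertyType::Binary);
    }

    // Type specific accessors: throw std::runtime_error if the stored type
    // does not match, and std::out_of_range if the key is missing.
    const std::string& getString(const std::string& key) const {
        return get(key, PropertyType::String);
    }
    const std::string& getBinary(const std::string& key) const {
        return get(key, PropertyType::Binary);
    }

private:
    struct Entry {
        PropertyType type;
        std::string data;
    };

    const std::string& get(const std::string& key, PropertyType expected) const {
        const Entry& e = props_.at(key);
        if (e.type != expected) {
            throw std::runtime_error("property '" + key + "' has a different type");
        }
        return e.data;
    }

    std::map<std::string, Entry> props_;
};
```

The generic setProperty() stays as it is for existing callers, while
setString()/getString() make it obvious at the call site what is expected
to be in there.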
Cheers - and thanks for an interesting discussion, like I say it's fun
and really interesting to hear different perspectives on this topic.
Frase.