AMQP defines wire encodings for both binary and string types. The std::string type in C++ can (depending on its contents) be mapped to either of these categories.

This is a fun thread :-)

So I did get that AMQP has encodings for both binary and string types; my point was really that the AMQP string type is defined as UTF8, and also that if I were to see a std::string in code I would assume that it was, erm, a string :-)

As with my comments to Rajith, there's a question of "intuitive" behaviour here, so, re your options mentioned previously:

"
(i) assume (from the context) that the string is utf8
(ii) check whether the string contains any chars that are not legal for utf8, and treat it as utf8 if not

The issue with the first option is that if the string is not valid utf8, then encoding of the message will fail. The issue with the second option is that it imposes checking every time such a property is set. Both of these can of course be avoided by setting the encoding explicitly. "
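For a sense of what option (ii) costs: checking means scanning every byte of the value each time such a property is set. Below is a minimal sketch of a strict UTF-8 well-formedness check, applying the RFC 3629 rules (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF). The function name isValidUtf8 is hypothetical and not part of qpid::messaging.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of qpid::messaging): returns true if `s` is
// well-formed UTF-8. Every byte is inspected, so the cost is O(n) in the
// length of the value, paid on every property set under option (ii).
bool isValidUtf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned int cp;
        if (c < 0x80) { ++i; continue; }               // plain ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                             // stray continuation or bad lead byte
        if (i + len > n) return false;                 // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;     // expected a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong encodings, UTF-16 surrogates and out-of-range values.
        static const unsigned int minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < minForLen[len]) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        if (cp > 0x10FFFF) return false;
        i += len;
    }
    return true;
}
```

Option (i) simply skips this call, which is where its cost advantage comes from.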


I very strongly believe that the first option is the right one, because:
1) It's computationally cheaper.
2) AMQP defines strings as UTF8, so it's not an unreasonable assumption that a std::string is UTF8 in a well-designed, interoperable application (which is the sort of behaviour we should be encouraging :-)).
3) If the encoding fails, an exception can be thrown, and since the value really ought to be a UTF8 string, quite rightly IMHO. A question, though: does it actually fail during the encoding process, or is the encoding just wrong, risking confusing JMS etc. clients? Even if the latter, I suspect that the risk is modest; binary values in strings are the Devil's work :-)
4) "Unlike a java.lang.String, the C++ std::string does not imply textual data". Actually, IMHO std::string really does *imply* textual data. I think it's very poor practice to use std::string for binary data; use a char*, a uint8_t* or, better yet, a proper class to manipulate the actual type under consideration. I'd take a fairly dim view of my developers if they did that sort of thing without a really good justification; that's the sort of thing that ends up making code unmaintainable in the long run (shall I get down off my high horse now? :-))


I'm pretty content with some of your other comments Re:

"
message.setUtf8Property("key", "value");

It would be clear that it is the application's responsibility to ensure that the data is indeed utf8.

A similar (and I think nicer) solution would be e.g.

 message.setProperty("key", utf8("value"));

This is perhaps slightly less obvious than an explicitly named method, but that could be addressed with some clear reference documentation for the method (needed anyway if the generic setProperty() remains unchanged).
"
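The utf8(...) wrapper could be done with a distinct tag type, so that overload resolution picks the UTF8 path at compile time and no runtime check is involved. A minimal sketch; Message, TaggedString, Encoding and utf8() are hypothetical stand-ins, not the real qpid::messaging types.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical types for illustration only; not the qpid::messaging API.
enum class Encoding { Unspecified, Utf8 };

struct TaggedString {
    std::string value;
    Encoding encoding;
};

// utf8("...") only tags the value as text. Making sure the bytes really are
// UTF8 remains the application's responsibility; nothing is checked here.
inline TaggedString utf8(std::string s) {
    return TaggedString{std::move(s), Encoding::Utf8};
}

class Message {
public:
    // Generic setter: behaviour unchanged, encoding left unspecified.
    void setProperty(const std::string& key, const std::string& value) {
        props_[key] = TaggedString{value, Encoding::Unspecified};
    }
    // Chosen by overload resolution when the caller writes utf8("...").
    void setProperty(const std::string& key, TaggedString value) {
        props_[key] = std::move(value);
    }
    // The explicitly named alternative, defined in terms of the wrapper.
    void setUtf8Property(const std::string& key, const std::string& value) {
        setProperty(key, utf8(value));
    }
    Encoding encodingOf(const std::string& key) const {
        return props_.at(key).encoding;
    }

private:
    std::map<std::string, TaggedString> props_;
};
```

Both spellings end up in the same place, so the choice between them is purely about how readable the call site is.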

Though I wonder if a better direction would be the opposite:

message.setProperty("key", binary("value"));


In other words, assume UTF8 by default, but give an option to explicitly break things, oops, I mean use a binary value :-D (you can see where I stand on this, can't you!!).
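The inverse is just as cheap to express: a plain std::string defaults to UTF8, and a binary(...) wrapper opts out. Again a sketch with illustrative names (Message, Binary, binary(), Encoding), not the qpid::messaging API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical types for illustration only; not the qpid::messaging API.
enum class Encoding { Utf8, Binary };

// A distinct type, so the choice is made by overload resolution at compile
// time rather than by inspecting the bytes at run time.
struct Binary {
    std::string bytes;
};

inline Binary binary(std::string s) { return Binary{std::move(s)}; }

class Message {
public:
    // Default: a plain std::string is assumed to be UTF8 text (option (i)).
    void setProperty(const std::string& key, const std::string& value) {
        props_[key] = std::make_pair(value, Encoding::Utf8);
    }
    // Explicit opt-out: wrapping the value in binary(...) selects binary.
    void setProperty(const std::string& key, Binary value) {
        props_[key] = std::make_pair(std::move(value.bytes), Encoding::Binary);
    }
    Encoding encodingOf(const std::string& key) const {
        return props_.at(key).second;
    }

private:
    std::map<std::string, std::pair<std::string, Encoding>> props_;
};
```

Compared with the utf8(...) variant, only the default flips: the unusual case (binary) is the one that has to be spelled out at the call site.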


Re "The fact that this matches what the old API does is a fair sign that it will not cause great confusion!": I think that's a pretty good selling point. I'm happy not to break the qpid::messaging API, but I do quite like the idea of extending it with the "JMS-like" approach of having a "generic" accessor/mutator plus additional, more type-specific accessors/mutators. I personally think that makes the "intent" a lot clearer: if I saw setString() I'd expect a string, and if someone put something else in it then they're an idiot and deserve what they get :-D
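The "generic plus type-specific" shape might look roughly like this, loosely modelled on JMS, where setObjectProperty()/getObjectProperty() are the generic pair and setStringProperty(), setIntProperty() etc. are the typed ones. All names here are hypothetical, and std::variant (C++17) stands in for whatever variant type the real API would use.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Hypothetical sketch, not the qpid::messaging API.
struct Binary { std::string bytes; };
using Value = std::variant<std::monostate, std::int64_t, std::string, Binary>;

class Message {
public:
    // Generic accessor/mutator: any supported type goes in and comes out.
    void setProperty(const std::string& key, Value v) { props_[key] = std::move(v); }
    const Value& getProperty(const std::string& key) const { return props_.at(key); }

    // Type-specific accessors/mutators: the method name states the intent.
    void setStringProperty(const std::string& key, const std::string& s) {
        props_[key] = s;   // stored as text, i.e. assumed to be UTF8
    }
    std::string getStringProperty(const std::string& key) const {
        const Value& v = props_.at(key);
        if (const std::string* s = std::get_if<std::string>(&v)) return *s;
        throw std::runtime_error("property '" + key + "' is not a string");
    }
    void setIntProperty(const std::string& key, std::int64_t i) { props_[key] = i; }
    std::int64_t getIntProperty(const std::string& key) const {
        return std::get<std::int64_t>(props_.at(key));   // throws std::bad_variant_access
    }

private:
    std::map<std::string, Value> props_;
};
```

A typed getter that refuses the wrong type is what makes the intent visible: asking for a string where binary was stored throws rather than silently handing back raw bytes.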

Cheers, and thanks for an interesting discussion; like I say, it's fun and really interesting to hear different perspectives on this topic.

Frase.




---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]
