Pieter, I appreciate your overall rationale, but there seems to be a gap in the explanation: the reasoning for which 23 of the 33 non-alphanumeric printable ASCII characters to include.
As far as I can see, you've disqualified single and double quotes, and probably the backslash. That still leaves a free choice of 7 other characters to exclude. Was that choice arbitrary, or did you use additional criteria? If so, what were they?

It seems to me that an additional valuable consideration would be XML safety. Bjorn is correct to observe that including < and & in the alphabet makes escaping a requirement for well-formed XML, whether the encoded data is included in text nodes or in attribute values:

> The ampersand character (&) and the left angle bracket (<) must not appear
> in their literal form, except when used as markup delimiters, or within a
> comment, a processing instruction, or a CDATA section. If they are needed
> elsewhere, they must be escaped using either numeric character references
> or the strings "&amp;" and "&lt;" respectively.

(Source: http://www.w3.org/TR/REC-xml/#syntax )

For URLs there are 18 reserved characters, but since encoded data is going to be in either the path or the query string, it might be possible to restrict the excluded characters to those which are unsafe in those contexts in the http scheme, which I think would be

/+?&%#

Putting those together suggests that the 10 excluded characters should be

"#%&'+/<?\

Do any of space or

!$()*,-.:;=>@[]^_`{|}~

have stronger reasons for exclusion? (The strongest, it seems to me, would be space, as potentially trimmed by mistake, and > for XML text nodes.)
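
For anyone who wants to check the counting, here is a rough Python sketch of the set arithmetic. The grouping into three "unsafe" sets and the variable names are mine, for illustration only; in particular the URL set is just my guess from above, not something taken from a spec.

```python
import string

# The 33 printable ASCII characters that are neither letters nor digits:
# the 32 punctuation characters plus the space.
non_alnum = set(string.punctuation) | {" "}

# Quotes and backslash, which I take to be ruled out already
# (my reading of the rationale, not confirmed).
string_literal_unsafe = set("\"'\\")

# Characters that force escaping in well-formed XML.
xml_unsafe = set("<&")

# Characters I think are unsafe in an http path or query string (assumption).
url_unsafe = set("/+?&%#")

# & appears in both the XML and the URL set, so the union has 10 members.
excluded = string_literal_unsafe | xml_unsafe | url_unsafe
remaining = non_alnum - excluded

print(len(non_alnum), len(excluded), len(remaining))
print("excluded: ", "".join(sorted(excluded)))
print("remaining:", "".join(sorted(remaining)))
```

The three sets together exclude exactly 10 characters and leave 23, which is the number the alphabet needs on top of the 62 alphanumerics. The remaining set still contains space and >, which is why I ask about them.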
