Hi Marvin,

Thanks for the suggestions. I've also wanted to fix some of the character choices in Z85. Why don't we make a V2 spec with your changes?
The process is: fork the RFC, make the necessary changes, send a pull request to the RFC repository, and post the draft to rfc.zeromq.org. We can then fix the reference implementation (also in the git repo) and update libzmq.

As we're now into versioning, I'd add another thing, a 5-byte version header in encoded text. We can use characters that aren't legal in version 1.0.

-Pieter

On Mon, Apr 21, 2014 at 10:46 PM, Marvin Renich <[email protected]> wrote:
> [I am not subscribed; please CC me in responses]
>
> I was looking to ASCII-encode binary values to be saved as HTTP cookies,
> and I came across Z85 (as well as Adobe's Ascii85). I disqualified
> Ascii85 because the encoding character set includes characters not
> allowed in cookies. I like Z85, but it requires the binary frame to be
> a multiple of 4 bytes and the encoded frame to be a multiple of 5 bytes,
> while Ascii85 allows short frames.
>
> Allowing the binary data to be any length would greatly increase the
> usefulness of Z85, so I would like to propose the following
> backward-compatible changes to the specification:
>
> Remove the paragraph beginning "The binary frame SHALL have a length
> that is divisible by 4".
>
> Add the following to the end of the paragraph beginning "To encode a
> frame":
>
>   If fewer than four octets remain after processing all four-octet
>   groups, the remaining 1, 2, or 3 octets SHALL be treated as an
>   unsigned 8-, 16-, or 24-bit integer, respectively, in network byte
>   order. The integer SHALL be output as a base-85 value of 2, 3, or 4
>   digits, respectively, using the above representation of the digits,
>   from most significant to least significant.
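
The encoding rule proposed above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the reference implementation: the function name `z85_encode_any` is hypothetical, and the alphabet string is assumed to be the Z85 version 1.0 character set.

```python
# Assumed Z85 v1.0 alphabet: digit value 0..84 maps to the character at that index.
Z85_CHARS = (
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    ".-:+=^!/*?&<>()[]{}@%$#"
)


def z85_encode_any(data: bytes) -> str:
    """Encode a frame of any length (hypothetical helper for the proposal)."""
    out = []
    for i in range(0, len(data), 4):
        chunk = data[i:i + 4]                 # 4 octets, or 1..3 at the tail
        value = int.from_bytes(chunk, "big")  # network byte order
        digits = []
        # n octets become n + 1 base-85 digits: 4 -> 5, 3 -> 4, 2 -> 3, 1 -> 2
        for _ in range(len(chunk) + 1):
            value, digit = divmod(value, 85)
            digits.append(Z85_CHARS[digit])
        out.append("".join(reversed(digits)))  # most significant digit first
    return "".join(out)


if __name__ == "__main__":
    print(z85_encode_any(bytes.fromhex("864fd26fb559f75b")))  # two full groups
    print(z85_encode_any(b"\xff"))                            # one trailing octet
```

Full four-octet groups are encoded exactly as in version 1.0, so frames whose length is a multiple of 4 produce the same output as before.
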
>
> Add the following to the end of the paragraph beginning "To decode a
> string":
>
>   If fewer than five characters remain after processing all
>   five-character groups, the 2, 3, or 4 remaining characters SHALL be
>   converted to an 8-, 16-, or 24-bit number, respectively, and output
>   into 1, 2, or 3 octets, respectively, as above.
>
> Add the following paragraph after that paragraph:
>
>   The string to be decoded SHALL NOT consist of exactly one more than
>   a multiple of five characters. A group of 5, 4, 3, or 2 characters
>   aligned to be decoded as such a group according to the above
>   algorithm SHALL NOT represent a number that exceeds 4,294,967,295,
>   16,777,215, 65,535, or 255, respectively.
>
> Note that this differs slightly from the Ascii85 method of dealing with
> frames that are not multiples of 4 or 5 (encoding or decoding). The
> Ascii85 method for encoding is to pad with octets with value 0, then
> encode, then drop the number of encoded characters equal to the number
> of octets of padding added. This requires, on decoding, to pad with
> characters representing the digit 84. It also means that there are
> typically three four-character sequences that correctly decode to the
> same three-octet binary sequence (and similarly for one- and two-octet
> sequences). The method I use above has a one-to-one correspondence
> between binary frames and encoded strings; no string that is not the
> result of encoding a binary frame is a valid input to the decoding
> process.
>
> Ascii85 also allows extraneous white space and line-break characters,
> and it seems that many implementations ignore all control characters.
> While I do not need this for my current use, I think specifically
> allowing this would be beneficial in some applications.
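
The decoding rule and the range restrictions proposed above can be sketched in Python as follows. Again this is a sketch under assumptions rather than the reference implementation: `z85_decode_any` is a hypothetical name, the alphabet is assumed to be the Z85 version 1.0 character set, and raising `ValueError` is just one possible way to reject invalid input. White space is not skipped here.

```python
# Assumed Z85 v1.0 alphabet: digit value 0..84 maps to the character at that index.
Z85_CHARS = (
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    ".-:+=^!/*?&<>()[]{}@%$#"
)
Z85_INDEX = {char: value for value, char in enumerate(Z85_CHARS)}


def z85_decode_any(text: str) -> bytes:
    """Decode a string of any valid length (hypothetical helper for the proposal)."""
    if len(text) % 5 == 1:
        raise ValueError("length must not be one more than a multiple of five")
    out = bytearray()
    for i in range(0, len(text), 5):
        group = text[i:i + 5]                # 5 characters, or 2..4 at the tail
        value = 0
        for char in group:
            if char not in Z85_INDEX:
                raise ValueError(f"invalid character {char!r}")
            value = value * 85 + Z85_INDEX[char]
        n_octets = len(group) - 1            # 5 -> 4, 4 -> 3, 3 -> 2, 2 -> 1
        # Reject numbers above 2**32 - 1, 2**24 - 1, 2**16 - 1 or 2**8 - 1.
        if value >= 1 << (8 * n_octets):
            raise ValueError(f"group {group!r} is out of range")
        out += value.to_bytes(n_octets, "big")
    return bytes(out)


if __name__ == "__main__":
    print(z85_decode_any("HelloWorld").hex())  # two full groups
    print(z85_decode_any("30"))                # short group, one octet
```

Because out-of-range groups are rejected instead of being truncated, each binary frame has exactly one accepted string. For example, "30" decodes to the single octet 255, while "31" (the number 256) is refused.
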
>
> I would also be in favor of replacing some of the base-85 characters, as
> suggested by Peter Taylor on 10 July 2013 in
> <http://lists.zeromq.org/pipermail/zeromq-dev/2013-July/022119.html>,
> except that I would like to keep space, double quote, comma, semicolon,
> and backslash out, since these are not allowed in cookies (RFC 6265). I
> think it is easier to get around invalid characters in XML entities and
> in URLs than in cookies. Both XML and URLs have well-defined quoting
> that can be applied after encoding and before decoding, while the whole
> point of my using Z85 in cookies is for its quoting properties.
>
> Thanks...Marvin
>
> _______________________________________________
> zeromq-dev mailing list
> [email protected]
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
