[I am not subscribed; please CC me in responses]

I was looking to ASCII-encode binary values to be saved as HTTP cookies,
and I came across Z85 (as well as Adobe's Ascii85).  I disqualified
Ascii85 because the encoding character set includes characters not
allowed in cookies.  I like Z85, but it requires the binary frame to be
a multiple of 4 bytes and the encoded frame to be a multiple of 5 bytes,
while Ascii85 allows short frames.
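
As a quick check of the character-set point, here is a small Python
sketch.  It assumes the Z85 alphabet as published in the Z85
specification, the Ascii85 digits '!' through 'u', and the cookie-octet
rule from RFC 6265, section 4.1.1.  The function names are only
illustrative.

```python
# Z85 alphabet as given in the Z85 specification (85 characters).
Z85 = ("0123456789abcdefghijklmnopqrstuvwxyz"
       "ABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#")

# Ascii85 digits are the 85 consecutive characters '!' (33) to 'u' (117).
ASCII85 = "".join(chr(c) for c in range(33, 118))


def is_cookie_octet(ch):
    """True if ch is allowed in a cookie value (RFC 6265, cookie-octet)."""
    c = ord(ch)
    return (c == 0x21 or 0x23 <= c <= 0x2B or 0x2D <= c <= 0x3A
            or 0x3C <= c <= 0x5B or 0x5D <= c <= 0x7E)


def disallowed(alphabet):
    """Characters of the alphabet that may not appear in a cookie value."""
    return [ch for ch in alphabet if not is_cookie_octet(ch)]


print(disallowed(Z85))      # nothing is rejected
print(disallowed(ASCII85))  # double quote, comma, semicolon, backslash
```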

Allowing the binary data to be any length would greatly increase the
usefulness of Z85, so I would like to propose the following
backward-compatible changes to the specification:

Remove the paragraph beginning "The binary frame SHALL have a length
that is divisible by 4".

Add the following to the end of the paragraph beginning "To encode a
frame":

    If fewer than four octets remain after processing all four-octet
    groups, the remaining 1, 2, or 3 octets SHALL be treated as an
    unsigned 8-, 16-, or 24-bit integer, respectively, in network byte
    order.  The integer SHALL be output as a base-85 value of 2, 3, or 4
    digits, respectively, using the above representation of the digits,
    from most significant to least significant.
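
For illustration only, the proposed encoding rule could be sketched in
Python roughly as follows.  This is not a reference implementation; the
name z85_encode is hypothetical, and the alphabet is assumed to be the
one in the current Z85 specification.

```python
# Z85 alphabet as given in the Z85 specification (85 characters).
Z85 = ("0123456789abcdefghijklmnopqrstuvwxyz"
       "ABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#")


def z85_encode(data: bytes) -> str:
    """Encode data of any length; a short final group uses fewer digits."""
    out = []
    for i in range(0, len(data), 4):
        group = data[i:i + 4]
        # Network byte order: first octet is the most significant.
        value = int.from_bytes(group, "big")
        # 4 octets -> 5 digits; 1, 2, 3 octets -> 2, 3, 4 digits.
        ndigits = len(group) + 1
        digits = []
        for _ in range(ndigits):
            value, d = divmod(value, 85)
            digits.append(Z85[d])
        # Digits were produced least significant first, so reverse them.
        out.append("".join(reversed(digits)))
    return "".join(out)
```

Input whose length is a multiple of 4 encodes exactly as it does today,
so the test vector from the specification still yields "HelloWorld".
A single octet 0xFF becomes the two characters "30" (255 = 3 * 85 + 0).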

Add the following to the end of the paragraph beginning "To decode a
string":

    If fewer than five characters remain after processing all
    five-character groups, the 2, 3, or 4 remaining characters SHALL be
    converted to an 8-, 16-, or 24-bit number, respectively, and output
    into 1, 2, or 3 octets, respectively, as above.

Add the following paragraph after that paragraph:

    The string to be decoded SHALL NOT consist of exactly one more than
    a multiple of five characters.  A group of 5, 4, 3, or 2 characters
    aligned to be decoded as such a group according to the above
    algorithm SHALL NOT represent a number that exceeds 4,294,967,295,
    16,777,215, 65,535, or 255, respectively.
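
A decoder that follows the two proposed decoding paragraphs might look
something like the Python sketch below.  The name z85_decode_strict and
the choice of ValueError are my own assumptions; the specification text
says nothing about how an implementation reports invalid input.

```python
# Z85 alphabet as given in the Z85 specification (85 characters).
Z85 = ("0123456789abcdefghijklmnopqrstuvwxyz"
       "ABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#")
DIGIT = {ch: i for i, ch in enumerate(Z85)}


def z85_decode_strict(text: str) -> bytes:
    """Decode text, rejecting every string the encoder cannot produce."""
    if len(text) % 5 == 1:
        raise ValueError("length is one more than a multiple of five")
    out = bytearray()
    for i in range(0, len(text), 5):
        group = text[i:i + 5]
        value = 0
        for ch in group:
            if ch not in DIGIT:
                raise ValueError("not a Z85 character: %r" % ch)
            value = value * 85 + DIGIT[ch]
        # 5 characters -> 4 octets; 2, 3, 4 characters -> 1, 2, 3 octets.
        size = len(group) - 1
        if value >= 1 << (8 * size):
            raise ValueError("group value out of range: %r" % group)
        out += value.to_bytes(size, "big")
    return bytes(out)
```

With these checks, "30" decodes to the single octet 0xFF, while "31"
(value 256), a lone "0", and "#####" (value 85^5 - 1) are all rejected.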

Note that this differs slightly from the Ascii85 method of dealing with
frames whose lengths are not multiples of 4 (binary) or 5 (encoded).
To encode, Ascii85 pads the final group with octets of value 0, encodes
it, and then drops as many encoded characters as octets of padding were
added.  On decoding, this requires padding the short group with
characters representing the digit 84.  It also means that there are
typically three four-character sequences that correctly decode to the
same three-octet binary sequence (and similarly for one- and two-octet
sequences).  The method I use above has a one-to-one correspondence
between binary frames and encoded strings; no string that is not the
result of encoding a binary frame is a valid input to the decoding
process.
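
The aliasing in the Ascii85 approach can be seen with a short Python
sketch.  It works on base-85 digit values rather than characters, so it
is independent of the alphabet; the function names are hypothetical and
the search window of 400 is just a generous bound for the demonstration.

```python
def ascii85_style_tail_decode(digits):
    """Decode a short group the Ascii85 way: pad to five digits with 84,
    convert to 32 bits, then drop as many octets as digits were padded."""
    pad = 5 - len(digits)
    value = 0
    for d in list(digits) + [84] * pad:
        value = value * 85 + d
    if value > 0xFFFFFFFF:
        return None  # does not fit in four octets
    return value.to_bytes(4, "big")[:4 - pad]


def to_digits(n, count):
    """Split n into count base-85 digits, most significant first."""
    digits = []
    for _ in range(count):
        n, d = divmod(n, 85)
        digits.append(d)
    return digits[::-1]


def aliases(octets):
    """Short groups (as integers) that the decoder above maps to octets."""
    count = len(octets) + 1
    padded = int.from_bytes(octets + bytes(4 - len(octets)), "big")
    # The group an Ascii85-style encoder would emit: encode the padded
    # value and drop the trailing digits.
    centre = padded // 85 ** (5 - count)
    lo, hi = max(0, centre - 400), min(85 ** count, centre + 400)
    return [n for n in range(lo, hi)
            if ascii85_style_tail_decode(to_digits(n, count)) == octets]


print(len(aliases(b"\x01\x02\x03")))
```

For the octets 01 02 03 this finds three distinct four-digit groups,
only one of which an encoder would emit.  Under the proposal above, the
other two would exceed no limit but simply decode to different octets,
since each valid group maps to exactly one value.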

Ascii85 also allows extraneous white space and line-break characters,
and it seems that many implementations ignore all control characters.
While I do not need this for my current use, I think explicitly
allowing this would be beneficial in some applications.

I would also be in favor of replacing some of the base-85 characters, as
suggested by Peter Taylor on 10 July 2013 in
<http://lists.zeromq.org/pipermail/zeromq-dev/2013-July/022119.html>,
except that I would like to keep space, double quote, comma, semicolon,
and backslash out, since these are not allowed in cookies (RFC 6265).  I
think it is easier to get around invalid characters in XML entities and
in URLs than in cookies.  Both XML and URLs have well-defined quoting
that can be applied after encoding and before decoding, while the whole
point of my using Z85 in cookies is for its quoting properties.

Thanks...Marvin

_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
