This discussion piqued my interest (what did Scott mean when he said the
whitespace issues are a real mess?), so I did a little research.  For those
who know as little as I, here's what I found.

Base64 appears to have a number of variants.  The basic encoding is
consistent, but there's variation in:

 - what whitespace is allowed, and where, and
 - whether line wrapping is allowed and/or required.

RFC 2045 (MIME Part One) specifies that no more than 76 characters can
appear on a line.  Line breaks are thus required for any encoded data longer
than 76 characters.  Any whitespace (or other unrecognized character) is to
be ignored by decoders.
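
For illustration, here is a minimal Python sketch of those two RFC 2045
rules (my own, not taken from any particular library): the encoder wraps at
76 characters with CRLF, and the decoder throws away anything outside the
base64 alphabet before decoding.  The names mime_encode and mime_decode are
made up for this example.

```python
import base64
import string

# The 64-character base64 alphabet plus the '=' pad character.
_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def mime_encode(data: bytes) -> str:
    """Encode as RFC 2045 body text: lines of at most 76 chars, CRLF-separated."""
    encoded = base64.b64encode(data).decode("ascii")
    lines = (encoded[i:i + 76] for i in range(0, len(encoded), 76))
    return "\r\n".join(lines)


def mime_decode(text: str) -> bytes:
    """Decode leniently: line breaks and any other non-alphabet characters are skipped."""
    cleaned = "".join(ch for ch in text if ch in _ALPHABET)
    return base64.b64decode(cleaned)
```

Python's own base64.encodebytes also wraps at 76 characters, but with a bare
LF rather than CRLF, which is exactly the kind of variation described above.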

XML Schema's base64Binary type is specifically disallowed from enforcing
MIME's 76-character limit.  (This seems appropriate, since XML documents are
not subject to the same transmission constraints as MIME messages.)  A
single whitespace character is allowed between characters in the base64
alphabet, but other characters outside the base64 alphabet are not allowed.
On the other hand, the canonical lexical form of a base64Binary data value
consists of lines of 76 base64 characters (except for the last line, which
may be shorter).
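
A rough Python sketch of the base64Binary lexical rule as summarized above.
It is simplified: it assumes any XML whitespace normalization has already
been applied (so the only separator it accepts is a single U+0020 space),
and it does not enforce the Schema grammar's finer restrictions on which
characters may precede the padding.  is_schema_lexical is a hypothetical
name.

```python
import re

# One base64 alphabet character or the '=' pad.
_CHAR = "[A-Za-z0-9+/=]"
# Characters optionally separated by at most one space; no leading/trailing space.
_SPACED = re.compile(f"(?:{_CHAR}(?: ?{_CHAR})*)?")
# Once spaces are removed: alphabet characters, then at most two pad characters.
_COMPACT = re.compile("[A-Za-z0-9+/]*={0,2}")


def is_schema_lexical(text: str) -> bool:
    """Simplified check of the base64Binary lexical space (hypothetical helper)."""
    if _SPACED.fullmatch(text) is None:
        return False  # double spaces, line breaks, or other stray characters
    compact = text.replace(" ", "")
    return len(compact) % 4 == 0 and _COMPACT.fullmatch(compact) is not None
```

With this sketch, "SGVs bG8=" is accepted, while "SGVs  bG8=" (two spaces)
and "SGVs\r\nbG8=" (a line break) are rejected.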

RFC 3548 (The Base16, Base32, and Base64 Data Encodings) says,
"Implementations MUST NOT not [sic] add line feeds to base encoded data unless the
specification referring to this document explicitly directs base encoders to
add line feeds after a specific number of characters."  Furthermore,
"Implementations MUST reject the encoding if it contains characters outside
the base alphabet when interpreting base encoded data, unless the
specification referring to this document explicitly states otherwise."  (CR
and LF are both outside the base64 alphabet.)
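
In Python, the standard base64 module can already behave the RFC 3548 way.
The wrapper names below are hypothetical, but b64encode and
b64decode(..., validate=True) are real standard-library calls.

```python
import base64
import binascii


def rfc3548_encode(data: bytes) -> str:
    """Encode without line feeds, as RFC 3548 requires by default."""
    # base64.b64encode never inserts line breaks.
    return base64.b64encode(data).decode("ascii")


def rfc3548_decode(text: str) -> bytes:
    """Decode strictly: any character outside the base64 alphabet is an error."""
    # With validate=True, b64decode raises binascii.Error for non-alphabet
    # characters (including CR and LF) instead of silently discarding them.
    return base64.b64decode(text, validate=True)


if __name__ == "__main__":
    print(rfc3548_decode("SGVsbG8="))  # b'Hello'
    try:
        rfc3548_decode("SGVs\r\nbG8=")
    except binascii.Error as exc:
        print("rejected:", exc)
```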

So, it looks like generic base64 encoders and decoders would have to be told
what the whitespace and line wrapping rules are.  Rather than have such
generic functions, Xerces-C includes functions that attempt to implement
what is needed and no more: Schema base64Binary.  I haven't looked at the
implementation, but it seems to me that an encoder that generates
base64Binary and complies with RFC 3548 would either omit whitespace
(including line breaks) entirely OR generate Schema's canonical form.
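
To make the "have to be told" point concrete, here is a hypothetical
generic encoder in Python whose caller must pass the wrapping rules.  The
function and parameter names are my own, and using LF as the separator for
Schema's canonical form is an assumption on my part.

```python
import base64
from typing import Optional


def encode(data: bytes, line_length: Optional[int] = None, line_sep: str = "\r\n") -> str:
    """Hypothetical generic base64 encoder: the caller supplies the wrapping rules.

    line_length=None means "never insert line breaks".
    """
    encoded = base64.b64encode(data).decode("ascii")
    if line_length is None:
        return encoded
    if line_length <= 0:
        raise ValueError("line_length must be positive")
    chunks = (encoded[i:i + line_length] for i in range(0, len(encoded), line_length))
    return line_sep.join(chunks)


payload = bytes(range(100))
mime_text = encode(payload, 76, "\r\n")  # RFC 2045: at most 76 chars per line, CRLF
unwrapped = encode(payload)              # RFC 3548 default: no line feeds at all
canonical = encode(payload, 76, "\n")    # Schema canonical form (assumption: LF separators)
```

Both unwrapped and canonical would be base64Binary output that also
satisfies RFC 3548; mime_text is a third profile entirely.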

I may have missed something or gotten it wrong, but I hope this summary
gives an idea of how much of a mess this stuff is.
