On Sat, 13 Oct 2018 at 18:58, Steffen Nurpmeso via Unicode <unicode@unicode.org> wrote:
> Philippe Verdy via Unicode wrote in <CAGa7JC3UomnN+Qzr3JGhqgJY+e-y6AYFk+\
> w9+jearw4ghyk...@mail.gmail.com>:
> |You forget that Base64 (as used in MIME) does not follow these rules
> |as it allows multiple different encodings for the same source binary.
> |MIME actually splits a binary object into multiple fragments at random
> |positions, and then encodes these fragments separately. Also MIME uses
> |an extension of Base64 where it allows some variations in the encoding
> |alphabet (so even the same fragment of the same length may have two
> |distinct encodings).
> |
> |Base64 in MIME is different from standard Base64 (which never splits
> |the binary object before encoding it, and uses a strict alphabet of
> |64 ASCII characters, allowing no variation). So MIME requires special
> |handling: the assumption that a binary message is encoded the same is
> |wrong, but MIME still requires that this non unique Base64 encoding
> |will be decoded back to the same initial (unsplit) binary object
> |(independently of its size and independently of the splitting
> |boundaries used in the transport, which may change during the
> |transport).
>
> Base64 is defined in RFC 2045 (Multipurpose Internet Mail
> Extensions (MIME) Part One: Format of Internet Message Bodies).
> It is a content-transfer-encoding and encodes any data
> transparently into a 7 bit clean ASCII _and_ EBCDIC compatible
> (the authors commemorate that) text.
> When decoding it reverts this representation into its original form.
> Ok, there is the CRLF newline problem, as below.
> What do you mean by "splitting"?
>
> ...
> The only variance is described as:
>
>   Care must be taken to use the proper octets for line breaks if base64
>   encoding is applied directly to text material that has not been
>   converted to canonical form.  In particular, text line breaks must be
>   converted into CRLF sequences prior to base64 encoding.
>   The important thing to note is that this may be done directly by the
>   encoder rather than in a prior canonicalization step in some
>   implementations.
>
> This is MIME, it specifies (in the same RFC):

I was not talking about the encoding of newlines **in the actual encoded text**: if the text, in its existing text encoding, is converted to Base64 as if the whole text were an opaque binary object, its initial text encoding is preserved (so yes, the way these embedded newlines are encoded as CR, LF, CR+LF, NL... is preserved).

I was talking about the newlines used in the transport syntax to split the initial binary object (which may actually contain text, but that does not matter). MIME defines this operation and even requires splitting the binary object into fragments of a maximum binary size, so that these binary fragments can be converted with Base64 into lines of a maximum length. In the MIME Base64 representation you can insert newlines anywhere between fragments encoded separately.

The maximum fragment size is not fixed: it is usually about 57 binary octets, which are converted to lines of 76 ASCII characters (the maximum line length RFC 2045 allows for Base64), each followed by a newline (CR+LF is strongly suggested for MIME, but other newline sequences are tolerated). Email forwarding agents frequently needed these line lengths to process the mail properly: not just the MIME headers but also the content body, where they want at least some whitespace or a newline in the middle, at which they can freely rearrange the lines by compressing whitespace or splitting lines to a shorter length as their processing requires. This is much less frequent today, because most mail agents are 8-bit clean and allow arbitrary line lengths... except in MIME headers.

In MIME headers the situation is different: there really is a maximum line length, and if a header is too long, it has to be split over multiple lines (using continuation sequences, i.e.
a newline (CR+LF is standard here) followed by at least one space). This insertion, change, or removal of whitespace is permitted everywhere in the MIME header after the header type, and even before the colon that follows the header type.

So a MIME header value whose text gets encoded with Base64 is split using "=?" sequences, which introduce the indication that the fragment is Base64-encoded (instead of Quoted-Printable-encoded), followed by a separator and the encapsulated Base64 encoding of the fragment. A single header value may contain multiple Base64-encoded fragments, and there is a lot of freedom about where to split the value to isolate fragments of a convenient size that satisfies the MIME requirements. These fragments may then occur on the same line (separated by whitespace) or on multiple lines (separated by continuation sequences).

In that case, the same initial text can have multiple valid representations in a MIME envelope format using Base64: it is not Base64 itself that splits the message, but the MIME transport syntax (which does not itself alter the text encoding of the initial text... except in parts that are NOT binary-encoded using Base64 or Quoted-Printable).

We are in a case where Base64 is not applied uniquely, because it is driven not by the actual transported text but by the MIME transport syntax. MIME allows freely changing the Base64 fragment sizes (or even switching to another encoding) as long as the binary value of the embedded object is preserved. It also allows changing the text encoding (UTF-8, ISO 8859-*, etc.) if the encoded fragments are identified as actually containing text. This does not apply to content bodies, unless they are declared with a "text/*" MIME type in the headers, but it does apply to known headers whose value is necessarily text (such as "From:", "To:", "Cc:", "Subject:", "Date:"...).
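To make the header case concrete, here is a minimal Python sketch, not a definitive implementation. It assumes the RFC 2047 encoded-word form "=?charset?B?...?=" and uses the standard library's email.header module to decode. The helper encoded_words(), the sample subject text, and the chunk sizes are hypothetical and chosen only for illustration. It builds two different header values for the same text, one on a single line and one folded with continuation sequences, and decodes both.

```python
import base64
from email.header import decode_header, make_header

def encoded_words(text, chunk_chars):
    """Hypothetical helper: split text every chunk_chars characters and
    wrap each piece as a separate Base64 encoded-word in UTF-8."""
    pieces = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    return [
        "=?utf-8?B?" + base64.b64encode(p.encode("utf-8")).decode("ascii") + "?="
        for p in pieces
    ]

def decode(value):
    """Decode a header value back to the text it carries."""
    return str(make_header(decode_header(value)))

# Sample text (an assumption, any text would do).
subject = "Grüße aus München: Base64 in MIME headers"

# Same text, two different fragmentations:
one_line = " ".join(encoded_words(subject, 12))   # fragments separated by spaces
folded = "\r\n ".join(encoded_words(subject, 7))  # fragments on continuation lines

print(one_line)
print(folded)
print(decode(one_line) == decode(folded) == subject)
```

The two header values differ character for character, yet both decode to the same text, because whitespace between adjacent encoded-words is dropped when decoding.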
MIME defines two distinct syntaxes, one for declaration headers, another for content bodies. Each can use Base64 encoding and split the content (but differently).

HTTP also has a mechanism for splitting a large body into fragments (this notably allows creating streaming protocols where fragments can easily be multiplexed with parallel streams, or including digital fingerprints or security signatures for individual fragments to secure the stream). This fragmentation is independent of the network transport (generally TCP, but not only), which has its own transparent MTUs at the session and link layers, and which can itself be encapsulated in tunnels carried by other means with different MTUs and fragmentation: HTTP does not have to manage that lower layer.

Both MIME (for mail) and HTTP define allowed transformations that drive how Base64 is used. Both have enough flexibility to allow variable fragment sizes, and even allow them to be changed as needed for the transport (this is challenging for data signatures of the exchanged contents, but both MIME and HTTP can safely preserve the content without breaking these signatures in the middle): the recipient may not receive exactly the same Base64-encoded message, but it will get the same message content (once it is Base64-decoded).

Base64 is used precisely to support this flexibility in transport (or storage) without altering any bit of the initial content once it is decoded.
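For content bodies, the same point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the payload is arbitrary, wrap() is a hypothetical helper, and the 60-character width stands in for whatever a relay might choose when it re-wraps lines. It relies on base64.b64decode() discarding characters outside the Base64 alphabet (here the CR+LF line breaks) in its default, non-validating mode.

```python
import base64

# Arbitrary binary content standing in for an attachment (an assumption).
payload = bytes(range(256)) * 2

def wrap(b64_text, width):
    """Hypothetical helper: break a Base64 string into CRLF-terminated lines."""
    lines = [b64_text[i:i + width] for i in range(0, len(b64_text), width)]
    return "\r\n".join(lines) + "\r\n"

flat = base64.b64encode(payload).decode("ascii")
as_sent = wrap(flat, 76)     # 76 characters per line, the RFC 2045 maximum
as_relayed = wrap(flat, 60)  # a hypothetical relay re-wrapping to shorter lines

# The two transport forms differ as text...
print(as_sent == as_relayed)
# ...but the line breaks are discarded on decoding, so both give the same octets.
print(base64.b64decode(as_sent) == payload)
print(base64.b64decode(as_relayed) == payload)
```

The recipient of either form gets back the same octets, even though the encoded text it received is not the text that was sent.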