Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?
I think the reverse is also true! Decoding a Base64 entity does not guarantee that the result is valid text in any known encoding, so Unicode normalization of the output cannot apply. Even if it represents text, nothing indicates that the result is encoded in some Unicode encoding form (unless this is tagged separately, as in MIME).

If you use Base64 to decode MIME contents (e.g. for emails), the Base64 decoding itself will not transform the encoding. The email parser then has to ensure that the text encoding is valid, at which point it may have to transform it (possibly replacing some invalid sequences or truncating it), and only then may it apply normalization to help render that text. But these transforms are part of the MIME application and are independent of whether you used Base64 or any other binary encoding or transport syntax.

In other words: "If m is not equal to m', then t will not equal t'" is reversible, but nothing indicates that m or m', once Base64-decoded, are texts. They are just opaque binary objects, which are equal in value exactly when their Base64 encodings t and t' are equal.

Note: some Base64 envelope formats (like MIME) allow multiple representations t and t' of the same message m, by adding padding or transport syntaxes such as line-splitting (with variable line length). Base64 alone does not allow that variation (it normally uses a fixed alphabet), but there are variants whose decoders accept extended alphabets as binary equivalents. So you may have two MIME-encoded texts that have different encodings (Base64 or Quoted-Printable, with variable line lengths) but that represent the same source binary object, and decoding these different encoded messages will yield the same binary object. This does not depend on Base64 but on the permissiveness/flexibility of decoders for these envelope formats (using **extensions** of Base64 specific to the envelope format).

On Fri, Oct 12, 2018 at 18:27, Doug Ewell via Unicode wrote:

> J Decker wrote:
>
> >> How about the opposite direction: If m is base64 encoded to yield t
> >> and then t is base64 decoded to yield n, will it always be the case
> >> that m equals n?
> >
> > False.
> > Canonical translation may occur, so that different base64 texts may be
> > the same sort of string...
>
> Base64 is a binary-to-text encoding. Neither encoding nor decoding
> should presume any special knowledge of the meaning of the binary data,
> or do anything extra based on that presumption.
>
> Converting Unicode text to and from base64 should not perform any sort
> of Unicode normalization, convert between UTFs, insert or remove BOMs,
> etc. This is like saying that converting a JPEG image to and from base64
> should not resize or rescale the image, change its color depth, convert
> it to another graphic format, etc.
>
> So I'd say "true" to Roger's question.
>
> I touched on this a little bit in UTN #14, from the standpoint of trying
> to improve compression by normalizing the Unicode text first.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
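The point about envelope formats can be illustrated with a small Python sketch. It assumes the standard-library `base64` module, whose `encodebytes` wraps output in MIME-style 76-character lines and whose `b64decode` discards non-alphabet characters unless `validate=True` is passed. The `payload` value is made up for illustration and is not from the thread.

```python
import base64
import binascii

# Arbitrary binary data (illustrative): not valid UTF-8, so it is not "text".
payload = bytes(range(256))

# Bare Base64: a single unbroken run of alphabet characters.
t = base64.b64encode(payload)

# MIME-style envelope: the same data with a newline after every 76 characters.
t_wrapped = base64.encodebytes(payload)

# Two different encoded texts...
assert t != t_wrapped
# ...that a lenient decoder maps back to the same binary object.
assert base64.b64decode(t) == payload
assert base64.b64decode(t_wrapped) == payload

# A strict decoder treats the line breaks as errors; the leniency belongs
# to the decoder / envelope format, not to Base64 itself.
try:
    base64.b64decode(t_wrapped, validate=True)
    strict_accepts_wrapped = True
except binascii.Error:
    strict_accepts_wrapped = False

# The decoded bytes need not be text in any encoding.
try:
    payload.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False
```

Here the two encoded forms differ only because of the transport syntax (line splitting), which is the kind of variation described above.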
RE: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?
I agree with Doug. Base64 maps each group of three source bytes to a unique group of four characters in the destination string (with padding for a final partial group). Decoding is also a unique mapping.

If the encoded string is “translated” in some way by additional processes, canonical or otherwise, then all bets are off.

If you disagree, please offer an example or additional details of how two base64 strings might be equivalent.

Tex

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of J Decker via Unicode
Sent: Friday, October 12, 2018 9:29 AM
To: d...@ewellic.org
Cc: Unicode Discussion
Subject: Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

On Fri, Oct 12, 2018 at 9:23 AM Doug Ewell via Unicode wrote:

J Decker wrote:

>> How about the opposite direction: If m is base64 encoded to yield t
>> and then t is base64 decoded to yield n, will it always be the case
>> that m equals n?
>
> False.
> Canonical translation may occur, so that different base64 texts may be
> the same sort of string...

Base64 is a binary-to-text encoding. Neither encoding nor decoding should presume any special knowledge of the meaning of the binary data, or do anything extra based on that presumption.

Converting Unicode text to and from base64 should not perform any sort of Unicode normalization, convert between UTFs, insert or remove BOMs, etc. This is like saying that converting a JPEG image to and from base64 should not resize or rescale the image, change its color depth, convert it to another graphic format, etc.

So I'd say "true" to Roger's question.

On the first side (X to base64), definitely true. But there is potential that text resulting from some decoded buffer is translated, resulting in a 'congruent' string that's not exactly the same... and the base64 will be different. Comparing one base64 string with another may show a binary difference even though they may still be the 'same' string.

I touched on this a little bit in UTN #14, from the standpoint of trying to improve compression by normalizing the Unicode text first.

--
Doug Ewell | Thornton, CO, US | ewellic.org
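As a rough illustration of the mapping Base64 performs, the sketch below uses Python's standard `base64` module to show that encoding works on 3-byte groups, each of which becomes 4 output characters. The sample `data` value is an assumption chosen for illustration.

```python
import base64

# 7 bytes (illustrative): two full 3-byte groups plus one leftover byte.
data = b"Unicode"

encoded = base64.b64encode(data)

# Every 3-byte group becomes 4 characters; a final partial group is
# padded with "=" so the output length is always a multiple of 4.
expected_len = 4 * ((len(data) + 2) // 3)
assert len(encoded) == expected_len

# Encoding each 3-byte group separately and concatenating the results
# gives the same text, because groups are encoded independently.
chunks = [data[i:i + 3] for i in range(0, len(data), 3)]
assert b"".join(base64.b64encode(c) for c in chunks) == encoded

# Decoding inverts the mapping exactly.
assert base64.b64decode(encoded) == data
```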
Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?
On Fri, Oct 12, 2018 at 9:23 AM Doug Ewell via Unicode wrote:

> J Decker wrote:
>
> >> How about the opposite direction: If m is base64 encoded to yield t
> >> and then t is base64 decoded to yield n, will it always be the case
> >> that m equals n?
> >
> > False.
> > Canonical translation may occur, so that different base64 texts may be
> > the same sort of string...
>
> Base64 is a binary-to-text encoding. Neither encoding nor decoding
> should presume any special knowledge of the meaning of the binary data,
> or do anything extra based on that presumption.
>
> Converting Unicode text to and from base64 should not perform any sort
> of Unicode normalization, convert between UTFs, insert or remove BOMs,
> etc. This is like saying that converting a JPEG image to and from base64
> should not resize or rescale the image, change its color depth, convert
> it to another graphic format, etc.
>
> So I'd say "true" to Roger's question.
>

On the first side (X to base64), definitely true. But there is potential that text resulting from some decoded buffer is translated, resulting in a 'congruent' string that's not exactly the same... and the base64 will be different. Comparing one base64 string with another may show a binary difference even though they may still be the 'same' string.

> I touched on this a little bit in UTN #14, from the standpoint of trying
> to improve compression by normalizing the Unicode text first.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?
J Decker wrote:

>> How about the opposite direction: If m is base64 encoded to yield t
>> and then t is base64 decoded to yield n, will it always be the case
>> that m equals n?
>
> False.
> Canonical translation may occur, so that different base64 texts may be
> the same sort of string...

Base64 is a binary-to-text encoding. Neither encoding nor decoding should presume any special knowledge of the meaning of the binary data, or do anything extra based on that presumption.

Converting Unicode text to and from base64 should not perform any sort of Unicode normalization, convert between UTFs, insert or remove BOMs, etc. This is like saying that converting a JPEG image to and from base64 should not resize or rescale the image, change its color depth, convert it to another graphic format, etc.

So I'd say "true" to Roger's question.

I touched on this a little bit in UTN #14, from the standpoint of trying to improve compression by normalizing the Unicode text first.

--
Doug Ewell | Thornton, CO, US | ewellic.org
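To make the normalization point concrete, here is a minimal Python sketch. It assumes the standard-library `base64` and `unicodedata` modules and UTF-8 as the byte serialization; the sample character "é" is chosen for illustration and does not come from the thread.

```python
import base64
import unicodedata

# The same visible text "é" in two canonically equivalent forms.
nfc = unicodedata.normalize("NFC", "\u00e9")  # one code point: U+00E9
nfd = unicodedata.normalize("NFD", "\u00e9")  # two code points: U+0065 U+0301

t_nfc = base64.b64encode(nfc.encode("utf-8"))
t_nfd = base64.b64encode(nfd.encode("utf-8"))

# Base64 sees only bytes, so the two forms yield different base64 texts...
assert t_nfc != t_nfd

# ...and each round-trips to exactly the code points that went in.
# Nothing is normalized, converted, or otherwise altered along the way.
assert base64.b64decode(t_nfc).decode("utf-8") == nfc
assert base64.b64decode(t_nfd).decode("utf-8") == nfd

# Treating the two as the "same" string is a separate, explicit step that
# an application applies after decoding; it is not part of Base64.
same_after_nfc = (
    unicodedata.normalize("NFC", base64.b64decode(t_nfd).decode("utf-8")) == nfc
)
```

In this sketch the base64 texts differ at the binary level, and only the application-level normalization step makes the decoded strings compare equal.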
Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?
On Fri, Oct 12, 2018 at 3:57 AM Costello, Roger L. via Unicode <unicode@unicode.org> wrote:

> Hi Unicode Experts,
>
> Suppose base64 encoding is applied to m to yield base64 text t.
>
> Next, suppose base64 encoding is applied to m' to yield base64 text t'.
>
> If m is not equal to m', then t will not equal t'.
>
> In other words, given different inputs, base64 encoding always yields
> different base64 texts.
>
> True or false?
>

True. Base64 to and from is always the same thing.

> How about the opposite direction: If m is base64 encoded to yield t and
> then t is base64 decoded to yield n, will it always be the case that m
> equals n?
>

False. Canonical translation may occur, so that different base64 texts may be the same sort of string...

https://en.wikipedia.org/wiki/Unicode_equivalence
https://en.wikipedia.org/wiki/Canonical_form

> /Roger
Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?
Hi Unicode Experts,

Suppose base64 encoding is applied to m to yield base64 text t.

Next, suppose base64 encoding is applied to m' to yield base64 text t'.

If m is not equal to m', then t will not equal t'.

In other words, given different inputs, base64 encoding always yields different base64 texts.

True or false?

How about the opposite direction: If m is base64 encoded to yield t and then t is base64 decoded to yield n, will it always be the case that m equals n?

/Roger
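Both questions can be tried out with a short Python sketch. It assumes the standard-library `base64` module and UTF-8 as the serialization of the texts to bytes; the sample strings are illustrative only.

```python
import base64

# Base64 operates on bytes, so each text is serialized first (UTF-8 assumed).
m = "résumé".encode("utf-8")
m_prime = "resume".encode("utf-8")

t = base64.b64encode(m)
t_prime = base64.b64encode(m_prime)

# First question: different inputs yield different base64 texts.
assert m != m_prime
assert t != t_prime

# Second question: encoding m to t and decoding t to n gives n equal to m.
n = base64.b64decode(t)
assert n == m
```

This checks a single pair of inputs and so does not prove the general claim, but it shows what the two statements mean at the byte level.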