On Fri, Oct 12, 2018 at 9:23 AM Doug Ewell via Unicode <unicode@unicode.org> wrote:
> J Decker wrote:
>
> >> How about the opposite direction: If m is base64 encoded to yield t
> >> and then t is base64 decoded to yield n, will it always be the case
> >> that m equals n?
> >
> > False.
> > Canonical translation may occur which the different base64 may be the
> > same sort of string...
>
> Base64 is a binary-to-text encoding. Neither encoding nor decoding
> should presume any special knowledge of the meaning of the binary data,
> or do anything extra based on that presumption.
>
> Converting Unicode text to and from base64 should not perform any sort
> of Unicode normalization, convert between UTFs, insert or remove BOMs,
> etc. This is like saying that converting a JPEG image to and from base64
> should not resize or rescale the image, change its color depth, convert
> it to another graphic format, etc.
>
> So I'd say "true" to Roger's question.
>

On the first side (X to base64), definitely true.

But there is the potential that text resulting from some decoded buffer
gets translated, resulting in a 'congruent' string that's not exactly the
same... and its base64 will then be different. Comparing one base64
string with another may show a binary difference even though the two are
still the 'same' string.

>
> I touched on this a little bit in UTN #14, from the standpoint of trying
> to improve compression by normalizing the Unicode text first.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
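
To make both points concrete, here is a minimal Python sketch, offered as an illustration only. It assumes the text is stored as UTF-8, and the helper names `round_trips` and `same_text` are made up for this example. The first helper checks that the bytes themselves survive an encode/decode round trip. The second shows that two base64 payloads can differ while the decoded text is canonically equivalent, which is the case if something outside base64 has normalized the text in between.

```python
import base64
import unicodedata


def round_trips(data: bytes) -> bool:
    """Encode to base64, decode again, and report whether the bytes are unchanged."""
    return base64.b64decode(base64.b64encode(data)) == data


def same_text(b64_a: bytes, b64_b: bytes, form: str = "NFC") -> bool:
    """Hypothetical helper: decode both payloads as UTF-8 (an assumption)
    and compare the text after normalizing both sides to the same form."""
    text_a = base64.b64decode(b64_a).decode("utf-8")
    text_b = base64.b64decode(b64_b).decode("utf-8")
    return unicodedata.normalize(form, text_a) == unicodedata.normalize(form, text_b)


composed = "\u00e9"                                  # precomposed e-acute
decomposed = unicodedata.normalize("NFD", composed)  # "e" followed by U+0301

b64_composed = base64.b64encode(composed.encode("utf-8"))
b64_decomposed = base64.b64encode(decomposed.encode("utf-8"))

print(round_trips(composed.encode("utf-8")))    # True: base64 itself is lossless
print(b64_composed == b64_decomposed)           # False: the bytes differ
print(same_text(b64_composed, b64_decomposed))  # True: canonically equivalent text
```

So a plain comparison of two base64 strings is a comparison of bytes. To ask whether they carry the 'same' text, decode both and normalize before comparing.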