Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-12 Thread Philippe Verdy via Unicode
I think the reverse is also true!

Decoding a Base64 entity does not guarantee that it will return valid text in
any known encoding. So Unicode normalization of the output cannot apply.

Even if it represents text, nothing indicates that the result will be
encoded with some Unicode encoding form (unless this is tagged separately,
like in MIME).
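
A minimal sketch of that point, using Python's standard base64 module. The
payload bytes and the helper name is_valid_utf8 are made up for illustration;
UTF-8 stands in for "some Unicode encoding form". The decoder returns the
original bytes unchanged, and it is a separate step that finds out they are
not valid text.

```python
import base64

# Hypothetical payload: arbitrary bytes that happen not to be valid UTF-8.
raw = bytes([0xFF, 0xFE, 0x00, 0xC3])

encoded = base64.b64encode(raw)      # binary -> Base64 text
decoded = base64.b64decode(encoded)  # Base64 text -> the same binary


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Base64 round-trips the bytes, but says nothing about whether they are text.
print(decoded == raw)          # the binary object is preserved
print(is_valid_utf8(decoded))  # validity is decided by a separate check
```

Any charset check or normalization would have to happen after this step, by
whatever application knows (or is told) what the bytes are supposed to be.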

If you use Base64 to decode MIME content (e.g. for email), the Base64
decoding itself will not transform the encoding, but the email parser will
then have to ensure that the text encoding is valid, at which point it may
have to transform it (possibly replacing or truncating invalid sequences),
and only then may it apply normalization to help render that text. But
these transforms are part of the MIME application and independent of whether
you used Base64 or any other binary encoding or transport syntax.

In other words: "If m is not equal to m', then t will not equal t'" also
holds in reverse (if t is not equal to t', then m is not equal to m'), but
nothing indicates that the Base64-decoded m or m' are texts: they are just
opaque binary objects, which are equal in value exactly when their Base64
encodings t and t' are.

Note: some Base64 envelope formats (like MIME) allow multiple
representations t and t' of the same message m, by adding padding or
transport syntaxes like line-splitting (with variable length). Base64 alone
does not allow that variation (it normally uses a static alphabet), but
there are variants that accept decoding extended alphabets as binary
equivalents. So you may have two MIME-encoded texts that have different
encodings (with Base64 or Quoted-Printable, with variable line lengths)
but that represent the same source binary object, and decoding these
different encoded messages will yield the same binary object: this does not
depend on Base64 but on the permissiveness/flexibility of decoders for these
envelope formats (using **extensions** of Base64 specific to the envelope
format).
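
As a rough illustration of the line-splitting case, here is a sketch with
Python's standard base64 module. The payload is arbitrary, and encodebytes is
used only as a stand-in for a MIME-style encoder that inserts line breaks.
The two encoded texts differ, a lenient decoder maps both back to the same
bytes, and a strict decoder rejects the one with line breaks.

```python
import base64
import binascii

m = bytes(range(100))  # arbitrary binary payload (assumption for the demo)

t_plain = base64.b64encode(m)   # one unbroken line of Base64
t_mime = base64.encodebytes(m)  # same data, with a newline after each 76 chars

# Two different texts t and t' for the same message m.
texts_differ = t_plain != t_mime

# The default decoder (validate=False) discards non-alphabet characters such
# as "\n", so both texts decode to the same binary object.
n_plain = base64.b64decode(t_plain)
n_mime = base64.b64decode(t_mime)
same_object = n_plain == m and n_mime == m

# A strict decoder refuses anything outside the Base64 alphabet.
try:
    base64.b64decode(t_mime, validate=True)
    strict_accepts = True
except binascii.Error:
    strict_accepts = False

print(texts_differ, same_object, strict_accepts)
```

Whether the two texts count as "the same" therefore depends on how permissive
the decoder is, not on the Base64 mapping itself.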


On Fri, Oct 12, 2018 at 18:27, Doug Ewell via Unicode wrote:

> J Decker wrote:
>
> >> How about the opposite direction: If m is base64 encoded to yield t
> >> and then t is base64 decoded to yield n, will it always be the case
> >> that m equals n?
> >
> > False.
> > Canonical translation may occur which the different base64 may be the
> > same sort of string...
>
> Base64 is a binary-to-text encoding. Neither encoding nor decoding
> should presume any special knowledge of the meaning of the binary data,
> or do anything extra based on that presumption.
>
> Converting Unicode text to and from base64 should not perform any sort
> of Unicode normalization, convert between UTFs, insert or remove BOMs,
> etc. This is like saying that converting a JPEG image to and from base64
> should not resize or rescale the image, change its color depth, convert
> it to another graphic format, etc.
>
> So I'd say "true" to Roger's question.
>
> I touched on this a little bit in UTN #14, from the standpoint of trying
> to improve compression by normalizing the Unicode text first.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
>


RE: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-12 Thread Tex via Unicode
I agree with Doug. Base64 maps each group of three source bytes to a unique
group of four characters in the destination string (with padding for a final
partial group). Decoding is also a unique mapping.

If the encoded string is “translated” in some way by additional processes, 
canonical or otherwise, then all bets are off.

If you disagree, please offer an example or additional details of how 2 base64 
strings might be equivalent.

Tex

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of J Decker via Unicode
Sent: Friday, October 12, 2018 9:29 AM
To: d...@ewellic.org
Cc: Unicode Discussion
Subject: Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

On Fri, Oct 12, 2018 at 9:23 AM Doug Ewell via Unicode wrote:

> J Decker wrote:
>
> >> How about the opposite direction: If m is base64 encoded to yield t
> >> and then t is base64 decoded to yield n, will it always be the case
> >> that m equals n?
> >
> > False.
> > Canonical translation may occur which the different base64 may be the
> > same sort of string...
>
> Base64 is a binary-to-text encoding. Neither encoding nor decoding
> should presume any special knowledge of the meaning of the binary data,
> or do anything extra based on that presumption.
>
> Converting Unicode text to and from base64 should not perform any sort
> of Unicode normalization, convert between UTFs, insert or remove BOMs,
> etc. This is like saying that converting a JPEG image to and from base64
> should not resize or rescale the image, change its color depth, convert
> it to another graphic format, etc.
>
> So I'd say "true" to Roger's question.

On the first side (X to base64) definitely true.

But there is potential that text resulting from some decoded buffer is
translated, resulting in a 'congruent' string that's not exactly the
same... and the base64 will be different.

Comparing some base64 string with some other base64 string shows a binary
difference, but may be still the 'same' string.

> I touched on this a little bit in UTN #14, from the standpoint of trying
> to improve compression by normalizing the Unicode text first.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org



Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-12 Thread J Decker via Unicode
On Fri, Oct 12, 2018 at 9:23 AM Doug Ewell via Unicode 
wrote:

> J Decker wrote:
>
> >> How about the opposite direction: If m is base64 encoded to yield t
> >> and then t is base64 decoded to yield n, will it always be the case
> >> that m equals n?
> >
> > False.
> > Canonical translation may occur which the different base64 may be the
> > same sort of string...
>
> Base64 is a binary-to-text encoding. Neither encoding nor decoding
> should presume any special knowledge of the meaning of the binary data,
> or do anything extra based on that presumption.
>
> Converting Unicode text to and from base64 should not perform any sort
> of Unicode normalization, convert between UTFs, insert or remove BOMs,
> etc. This is like saying that converting a JPEG image to and from base64
> should not resize or rescale the image, change its color depth, convert
> it to another graphic format, etc.
>
> So I'd say "true" to Roger's question.
>
On the first side (X to base64) definitely true.

But there is potential that text resulting from some decoded buffer is
translated, resulting in a 'congruent' string that's not exactly the
same... and the base64 will be different.

Comparing some base64 string with some other base64 string shows a binary
difference, but may be still the 'same' string.
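
One way to picture the 'congruent' case is the following sketch, which uses
Python's standard base64 and unicodedata modules. The sample character and the
choice of UTF-8 as the byte encoding are assumptions made for the example. Two
canonically equivalent strings are different byte sequences, so their Base64
texts differ; they only compare equal after something outside Base64
normalizes them.

```python
import base64
import unicodedata

# The same user-perceived character in two canonically equivalent forms.
nfc = unicodedata.normalize("NFC", "\u00e9")  # precomposed: U+00E9
nfd = unicodedata.normalize("NFD", "\u00e9")  # decomposed: U+0065 U+0301

# Base64 only sees bytes, so the strings must be encoded first (UTF-8 here).
t_nfc = base64.b64encode(nfc.encode("utf-8"))
t_nfd = base64.b64encode(nfd.encode("utf-8"))

print(t_nfc, t_nfd)    # two different Base64 texts
print(t_nfc == t_nfd)  # binary comparison says "different"

# Equality as text needs an explicit normalization step after decoding.
s1 = base64.b64decode(t_nfc).decode("utf-8")
s2 = base64.b64decode(t_nfd).decode("utf-8")
equal_after_nfc = (
    unicodedata.normalize("NFC", s1) == unicodedata.normalize("NFC", s2)
)
print(equal_after_nfc)
```

The normalization happens before encoding or after decoding, in the
application; the Base64 step itself neither performs it nor undoes it.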


>
> I touched on this a little bit in UTN #14, from the standpoint of trying
> to improve compression by normalizing the Unicode text first.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
>


Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-12 Thread Doug Ewell via Unicode
J Decker wrote:

>> How about the opposite direction: If m is base64 encoded to yield t
>> and then t is base64 decoded to yield n, will it always be the case
>> that m equals n?
>
> False.
> Canonical translation may occur which the different base64 may be the
> same sort of string...

Base64 is a binary-to-text encoding. Neither encoding nor decoding
should presume any special knowledge of the meaning of the binary data,
or do anything extra based on that presumption.

Converting Unicode text to and from base64 should not perform any sort
of Unicode normalization, convert between UTFs, insert or remove BOMs,
etc. This is like saying that converting a JPEG image to and from base64
should not resize or rescale the image, change its color depth, convert
it to another graphic format, etc.

So I'd say "true" to Roger's question.
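
Both halves of Roger's question can be checked empirically with a short
sketch, here using Python's standard base64 module. The sample inputs and the
helper name round_trips are made up for illustration, and an exhaustive check
over short inputs is of course evidence rather than a proof.

```python
import base64
import os
from itertools import product


def round_trips(m: bytes) -> bool:
    """Encode m, decode the result, and compare with the original bytes."""
    return base64.b64decode(base64.b64encode(m)) == m


# Inputs chosen to tempt a "helpful" codec: a BOM, a combining sequence,
# a NUL byte, the empty string, and random binary data.
samples = [
    b"",
    b"\x00",
    b"\xef\xbb\xbfBOM",
    "e\u0301".encode("utf-8"),
    os.urandom(64),
]
all_round_trip = all(round_trips(m) for m in samples)

# Different inputs give different outputs: check every byte string of
# length 0, 1 and 2 (65793 inputs) for collisions.
inputs = [bytes(p) for n in range(3) for p in product(range(256), repeat=n)]
encodings = {base64.b64encode(m) for m in inputs}
no_collisions = len(encodings) == len(inputs)

print(all_round_trip, no_collisions)
```

Neither the BOM nor the combining sequence is touched, which is the behavior
described above: the codec treats its input as opaque bytes.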

I touched on this a little bit in UTN #14, from the standpoint of trying
to improve compression by normalizing the Unicode text first.
 
--
Doug Ewell | Thornton, CO, US | ewellic.org



Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-12 Thread J Decker via Unicode
On Fri, Oct 12, 2018 at 3:57 AM Costello, Roger L. via Unicode <
unicode@unicode.org> wrote:

> Hi Unicode Experts,
>
> Suppose base64 encoding is applied to m to yield base64 text t.
>
> Next, suppose base64 encoding is applied to m' to yield base64 text t'.
>
> If m is not equal to m', then t will not equal t'.
>
> In other words, given different inputs, base64 encoding always yields
> different base64 texts.
>
> True or false?
>
true.  base64 to and from is always the same thing.

>
> How about the opposite direction: If m is base64 encoded to yield t and
> then t is base64 decoded to yield n, will it always be the case that m
> equals n?
>
False.
Canonical translation may occur which the different base64 may be the same
sort of string...

https://en.wikipedia.org/wiki/Unicode_equivalence
https://en.wikipedia.org/wiki/Canonical_form


> /Roger
>
>


Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-12 Thread Costello, Roger L. via Unicode
Hi Unicode Experts,

Suppose base64 encoding is applied to m to yield base64 text t. 

Next, suppose base64 encoding is applied to m' to yield base64 text t'.

If m is not equal to m', then t will not equal t'.

In other words, given different inputs, base64 encoding always yields different 
base64 texts.

True or false?

How about the opposite direction: If m is base64 encoded to yield t and then t 
is base64 decoded to yield n, will it always be the case that m equals n?

/Roger